Good news for deaf people: artificial intelligence halves the lip-reading error rate

Source: China Science Daily

For millions of deaf people, lip reading can provide a window for communicating with the outside world. But the practice is difficult, and the results are often inaccurate.

Now, researchers have written a new artificial intelligence (AI) program that outperforms both professional lip readers and the best previous AI, with an error rate only half that of the previous best algorithm. If it is improved and integrated into smart devices, the method could make lip reading available to everyone.

"This is great work," said Helen Bear, a computer scientist at Queen Mary University of London, UK, who was not involved in the study.

Writing computer code that can read lips is maddeningly difficult.

So in the new research, the scientists turned to machine learning, letting the computer learn from data.

They fed the system thousands of hours of video along with transcripts and let the computer work out the problem on its own.

The project began with 140,000 hours of YouTube video showing people talking in a variety of situations.

Then the researchers designed a program to create clips a few seconds long, with the mouth movement for each phoneme, or word sound, annotated.

The program filtered out non-English speech, faces that were not the ones speaking, low-quality video, and video that was not shot head-on. The researchers then cropped the video around the speaker's mouth.

This produced nearly 4,000 hours of video containing more than 127,000 English words.
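The filtering and cropping steps above can be pictured as a chain of simple checks applied to every clip. The sketch below is only an illustration under stated assumptions: the `Clip` fields (`language`, `face_is_speaking`, `quality_score`, `yaw_degrees`), both thresholds and the mouth bounding box are hypothetical stand-ins, not the researchers' actual criteria or data format.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_QUALITY = 0.5  # hypothetical quality cut-off
MAX_YAW = 30.0     # hypothetical: larger head turns count as "not head-on"


@dataclass
class Clip:
    # Hypothetical metadata for one short video clip
    clip_id: str
    language: str           # detected spoken language, e.g. "en"
    face_is_speaking: bool  # does the visible face belong to the speaker?
    quality_score: float    # 0.0 (unusable) to 1.0 (pristine)
    yaw_degrees: float      # how far the head is turned from the camera


def keep(clip: Clip) -> bool:
    """Apply the four filters the article describes."""
    return (
        clip.language == "en"                  # drop non-English speech
        and clip.face_is_speaking              # drop faces that are not speaking
        and clip.quality_score >= MIN_QUALITY  # drop low-quality video
        and abs(clip.yaw_degrees) <= MAX_YAW   # drop video not shot head-on
    )


def filter_clips(clips: List[Clip]) -> List[Clip]:
    return [c for c in clips if keep(c)]


def crop_mouth(frame: List[List[int]], box: Tuple[int, int, int, int]) -> List[List[int]]:
    """Crop a frame (rows of pixel values) to a hypothetical mouth box (top, left, height, width)."""
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]
```

In a real pipeline the mouth box would have to come from some face detector; here it is simply passed in.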

Hassan Akbari, a computer scientist at Columbia University who did not participate in the study, said that the process and the resulting data set, which is seven times larger than any comparable data set, are "important and valuable" for anyone who wants to train similar systems to read lips.

The process also relied in part on neural networks.

These are AI algorithms made up of many simple computing elements connected together, which learn and process information in a way similar to the human brain.

When the researchers fed the system unlabeled video, these networks cropped out the mouth movements. The next program in the system, also built on neural networks, produced a list of possible phonemes and their probabilities for each video frame. A final set of algorithms sorted through the possible phoneme sequences and generated sequences of English words.
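The last two stages can be illustrated with a deliberately simplified sketch. It picks the most probable phoneme in each frame, merges repeats and drops a "blank" symbol (a convention borrowed from CTC-style decoding; that choice is an assumption here, not a detail from the article), and then splits the phoneme string into words using a pronunciation dictionary. The phoneme symbols, the blank symbol `_` and the tiny lexicon are hypothetical. A practical decoder would weigh many candidate sequences, typically with a language model, rather than committing to the single best phoneme per frame.

```python
from typing import Dict, List, Optional, Tuple

BLANK = "_"  # hypothetical "no new phoneme in this frame" symbol


def greedy_phonemes(frame_probs: List[Dict[str, float]]) -> List[str]:
    """Take the most probable symbol in each frame, merge repeats, drop blanks."""
    best = [max(probs, key=probs.get) for probs in frame_probs]
    phonemes: List[str] = []
    prev = None
    for symbol in best:
        # The same symbol in consecutive frames counts as one phoneme
        if symbol != prev and symbol != BLANK:
            phonemes.append(symbol)
        prev = symbol
    return phonemes


def phonemes_to_words(
    phonemes: List[str], lexicon: Dict[Tuple[str, ...], str]
) -> Optional[List[str]]:
    """Split a phoneme sequence into lexicon words; return None if no split exists."""
    n = len(phonemes)
    # best[k] holds one word sequence covering phonemes[:k], or None if none was found
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is None:
                continue
            word = lexicon.get(tuple(phonemes[start:end]))
            if word is not None:
                best[end] = prefix + [word]
                break
    return best[n]


if __name__ == "__main__":
    lexicon = {("B", "UW", "T"): "boot", ("B", "IY", "T"): "beet", ("B", "IY"): "bee"}
    frames = [
        {"B": 0.8, "UW": 0.1, BLANK: 0.1},
        {"B": 0.6, "UW": 0.3, BLANK: 0.1},
        {"UW": 0.7, "IY": 0.2, BLANK: 0.1},
        {BLANK: 0.9, "T": 0.1},
        {"T": 0.8, BLANK: 0.2},
    ]
    phonemes = greedy_phonemes(frames)
    print(phonemes, phonemes_to_words(phonemes, lexicon))  # ['B', 'UW', 'T'] ['boot']
```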

After training, the researchers tested the system on 37 minutes of video it had not seen before.

In their paper posted on arXiv, they report a word error rate of only 41%.

That score may not sound very good, but the previous best algorithm, which focused on individual letters rather than phonemes, had an error rate of 77%.

In the same study, professional lip readers had an error rate of 93% (although in real life they can draw on context and body language, which help with lip reading).
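Word error rate, the metric behind the 41%, 77% and 93% figures, is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. A 41% rate therefore means roughly 41 word errors per 100 reference words. The sketch below implements that standard definition with a word-level edit distance; the article does not say how the researchers normalized their transcripts, so treat it as illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Each row holds edit distances from a reference prefix to every hypothesis prefix
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: a reference word is missing
                curr[j - 1] + 1,     # insertion: an extra word was output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Two substitutions ("the" -> "a", "mat" -> "hat") over six reference words
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on a hat'):.2f}")  # 0.33
```

Because insertions are counted too, the rate can exceed 100% when a system outputs many extra words.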

The work was done by DeepMind, a London-based AI company, which declined to comment on the record.

Bear notes that the program recognizes that a phoneme can look different depending on what is said before and after it. (For example, the mouth takes a different shape for the "t" in "boot" than for the "t" in "beet".)

The system also has separate stages for predicting phonemes from lip shapes and for predicting words from phonemes. This means that to teach the system new words, only the last stage needs to be retrained.
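Why separate stages make new words cheap can be shown with a toy two-stage reader. This is a sketch under loud simplifications: `fake_phoneme_model` is a hypothetical placeholder for the trained neural network (it just merges repeated per-frame labels), and the word stage is reduced to a lookup table, so "retraining the last stage" shrinks to adding one entry. The class and method names are invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Pronunciation = Tuple[str, ...]


def fake_phoneme_model(frames: List[str]) -> List[str]:
    """Hypothetical stand-in for stage 1: merge repeated per-frame phoneme labels."""
    out: List[str] = []
    for label in frames:
        if not out or out[-1] != label:
            out.append(label)
    return out


class TwoStageLipReader:
    """Hypothetical pipeline: lip frames -> phonemes (stage 1) -> word (stage 2)."""

    def __init__(
        self,
        phoneme_model: Callable[[List[str]], List[str]],
        lexicon: Dict[Pronunciation, str],
    ) -> None:
        self.phoneme_model = phoneme_model  # stage 1: trained once, left untouched here
        self.lexicon = dict(lexicon)        # stage 2: maps phoneme sequences to words

    def add_word(self, pronunciation: Pronunciation, word: str) -> None:
        # Teaching a new word only changes stage 2; stage 1 is not modified
        self.lexicon[pronunciation] = word

    def read(self, frames: List[str]) -> Optional[str]:
        phonemes = tuple(self.phoneme_model(frames))
        return self.lexicon.get(phonemes)
```

For example, a reader built with only "boot" returns `None` for the frames `["B", "IY", "T"]` until `add_word(("B", "IY", "T"), "beet")` is called, and the phoneme stage is never touched.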

But she says the AI also has weaknesses: it needs clear, head-on video, and a 41% error rate is still far from perfect.

Akbari says that integrating the program into a smart device could let people with hearing impairments carry a "translator" with them.

Such a translator could also help people who cannot speak, such as those with damaged vocal cords. For everyone else, it could simply help them make sense of conversations.

The technology could also be applied elsewhere, such as analyzing security footage, interpreting historical film, or working out what a Skype partner is saying when the audio drops out.

The new AI method might even answer one of sport's biggest mysteries: in the 2006 World Cup final, French football player Zidane was sent off for headbutting an opponent.

He was clearly provoked by his opponent's insults, but what exactly did the opponent say?

We may yet be able to solve the mystery.

