

Leading breakthroughs in speech recognition software at Microsoft, Google, IBM

Groundbreaking speech recognition research by the University of Toronto’s Department of Computer Science (DCS) is transforming the way Microsoft, Google and IBM build speech recognition software.

At a conference in Asia recently, Microsoft’s Chief Research Officer demonstrated an almost instantaneous translation of spoken English into Chinese speech, using software that preserved the sound of the speaker’s voice. It was the latest in a series of breakthroughs in the field involving U of T faculty and students.

“A few years ago, researchers at Microsoft Research and the University of Toronto came together to develop another breakthrough in the field of speech recognition,” Rick Rashid told the crowd. “The idea that they had was to use a technology in a way patterned after the way the human brain works; it’s called deep neural networks.

“That one change, that particular breakthrough, increased recognition rates by approximately thirty percent. That’s a big deal.”

The breakthrough lies in the computer’s improved recognition of phonemes, the small units of sound that make up speech, and it has reduced the number of errors the computer makes, Rashid said.

“That’s the difference between going from 20 to 25 per cent errors, or about one out of every five words, to roughly 15 per cent errors, or roughly one out of every seven or perhaps one out of every eight words,” Rashid said. “It’s still not perfect, there’s still a long way to go, but I think you can see that we have already made a significant amount of progress in the recognition of speech.”
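Figures like Rashid’s are usually reported as word error rate (WER): the word-level edit distance (substitutions, deletions and insertions) between a recognizer’s output and a correct reference transcript, divided by the number of reference words. Assuming his percentages are WERs, the minimal Python sketch below computes WER with a standard dynamic-programming edit distance and then works through his arithmetic. The example sentences are hypothetical, and the helper names are illustrative rather than taken from any Microsoft or U of T code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions + deletions + insertions)
    divided by the number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits that turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction of the old errors that have been eliminated."""
    return (old_rate - new_rate) / old_rate


def words_per_error(rate: float) -> float:
    """An error rate r means roughly one wrong word in every 1/r words."""
    return 1.0 / rate


if __name__ == "__main__":
    # Hypothetical example: one substituted word out of six.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))  # 0.16666666666666666
    # Rashid's figures, read as word error rates:
    print(round(relative_reduction(0.20, 0.15), 2))  # 0.25
    print(round(relative_reduction(0.25, 0.15), 2))  # 0.4
    print(round(words_per_error(0.15), 1))           # 6.7
```

Read this way, the drop to about 15 per cent is a relative reduction of 25 to 40 per cent depending on the starting point, and a 15 per cent error rate works out to roughly one wrong word in every six or seven.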

DCS research in speech recognition is conducted by Professors Geoffrey E. Hinton (Machine Learning) and Gerald Penn (Computational Linguistics), and this latest breakthrough draws on Hinton’s work on deep neural networks.

Graduate students Abdel-rahman Mohamed and George Dahl began collaborating in 2009, applying deep neural networks to speech recognition. (Artificial neural networks are simplified mathematical models of neural circuits in the human brain.)
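To make the parenthetical concrete: in an artificial neural network, each “unit” computes a weighted sum of its inputs and passes it through a simple nonlinearity, and a deep network stacks several layers of such units between input and output. The following is a minimal, untrained forward pass in plain Python. The layer sizes, sigmoid activation and softmax output are illustrative assumptions, not details of the Toronto models, and training (adjusting the weights to reduce errors on labelled examples) is left out.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Squash any real number into (0, 1), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_layer(inputs, weights, biases, activation):
    """One layer of units: each unit takes a weighted sum of its inputs,
    adds a bias, and applies the activation function."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    """Small random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, hidden_layers, output_layer):
    """'Deep' just means several hidden layers applied one after another."""
    h = x
    for weights, biases in hidden_layers:
        h = dense_layer(h, weights, biases, sigmoid)
    weights, biases = output_layer
    scores = dense_layer(h, weights, biases, lambda z: z)  # linear output scores
    return softmax(scores)


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = [13, 32, 32, 32]  # hypothetical: 13 input features, three hidden layers
    hidden = [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    output = random_layer(sizes[-1], 5, rng)  # hypothetical: 5 output classes
    probs = forward([0.5] * 13, hidden, output)
    print(len(probs), round(sum(probs), 6))  # 5 1.0
```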

“Even before I started my PhD at U of T with Gerald Penn, I was always thinking about how I might make a breakthrough in the speech recognition field,” said Mohamed, “bringing Automatic Speech Recognition (ASR) technology closer to the end users.”

Inspired by one of Hinton’s lectures on deep neural networks, Mohamed began applying them to speech. But deep neural networks demanded more computing power than conventional computers could supply, so Hinton and Mohamed enlisted Dahl. A student in Hinton’s lab, Dahl had discovered how to train and simulate neural networks efficiently on the same high-end graphics cards that make vivid computer games feasible on personal computers.

“They applied the same method to the problem of recognizing fragments of phonemes in very short windows of speech,” said Hinton. “They got significantly better results than previous methods on a standard three-hour benchmark.”
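Hinton’s description suggests the general shape of the input: speech is cut into very short frames, each summarized by a vector of acoustic features, and the network looks at a frame together with its neighbours to score which phoneme fragment is being spoken. The sketch below shows only that windowing step plus a per-frame choice of the best-scoring label. The window sizes, label names and toy `classify` function are hypothetical stand-ins; in a real system the scorer would be a trained network, and its per-frame scores would typically feed a decoder rather than being used directly.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # one short slice of audio, summarized as acoustic features


def stack_context(frames: Sequence[Frame], left: int, right: int) -> List[Frame]:
    """Concatenate each frame with `left` preceding and `right` following
    frames. Positions past either end reuse the first or last frame."""
    n = len(frames)
    windows: List[Frame] = []
    for t in range(n):
        window: Frame = []
        for offset in range(-left, right + 1):
            k = min(max(t + offset, 0), n - 1)  # clamp to a valid frame index
            window.extend(frames[k])
        windows.append(window)
    return windows


def label_frames(
    frames: Sequence[Frame],
    classify: Callable[[Frame], List[float]],
    labels: List[str],
    left: int = 2,
    right: int = 2,
) -> List[str]:
    """Give every frame the label whose score is highest for its window.
    `classify` stands in for a trained network (hypothetical here)."""
    result = []
    for window in stack_context(frames, left, right):
        scores = classify(window)
        best = max(range(len(labels)), key=lambda i: scores[i])
        result.append(labels[best])
    return result


def toy_classify(window: Frame) -> List[float]:
    """Hypothetical scorer: looks only at the centre of a 5-frame window of
    one-feature frames (index 2) and prefers 'aa' when it exceeds 0.5."""
    centre = window[2]
    return [0.5 - centre, centre - 0.5]


if __name__ == "__main__":
    frames = [[0.1], [0.2], [0.9], [0.8]]  # made-up one-feature frames
    print(label_frames(frames, toy_classify, ["sil", "aa"]))  # ['sil', 'sil', 'aa', 'aa']
```

Giving the scorer a few neighbouring frames, rather than one frame in isolation, lets it use how the sound is changing over time.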

Dahl and Mohamed presented their results at a 2009 Neural Information Processing Systems (NIPS) workshop, where the reaction was mixed.

“Many participants in the workshop were excited about our results,” recalled Dahl, “but at the time there was a lot of healthy skeptical concern that our results might not translate into similar gains on more realistic speech recognition problems.”

Researchers at Microsoft, however, were interested enough to invite both students to intern at Microsoft Research in Redmond the following year. There, Mohamed and Dahl successfully applied their methods to larger speech tasks with much larger vocabularies.

Fellow computer science graduate student Navdeep Jaitly also joined the research and worked with Google to implement it in the company’s system. Google now uses a deep neural network for voice search in its Android 4.1 operating system, part of the company’s answer to the iPhone’s Siri conversational agent.

“I was expecting this move,” said Mohamed, “given the great results our model achieved consistently on so many benchmarks.”

Dahl continued: “It is very gratifying, particularly because there was a lot of initial resistance from the speech community to using deep neural networks for acoustic modeling.”

Today, most top speech labs are embracing the technology, including IBM, a long-time leader in speech recognition research, with whom Mohamed has also worked on this topic. Penn’s speech lab has since developed an alternative neural network model in collaboration with York University Professor Hui Jiang and graduate student Ossama Abdel-Hamid, who has also worked on neural networks at Microsoft Research.

And the U of T researchers say the new business opportunities they’ve helped create are just the beginning. Hinton’s lab has already applied deep neural networks to several other pattern recognition problems, and Penn’s speech lab is digitizing the last 23 years of CBC NewsWorld video to develop search algorithms for large collections of speech.

Unlike Google voice search, which uses spoken queries to search web pages of text, this work uses text queries to search speech data for related news coverage or interviews.
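The article does not say how Penn’s lab performs this search. One common way to search speech with text queries, shown here purely as an assumption, is to run a recognizer over the audio once, keep the time at which each recognized word starts, and build an ordinary inverted index over those transcripts so that a hit points back to a moment in the video. In the plain-Python sketch below, the recording ids, transcripts and ranking rule (number of distinct query words matched, then earliest hit) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A transcript is a list of (start_time_in_seconds, word) pairs, as an
# automatic speech recognizer might produce for one recording.
Transcript = List[Tuple[float, str]]
Index = Dict[str, List[Tuple[str, float]]]


def build_index(transcripts: Dict[str, Transcript]) -> Index:
    """Map each lower-cased word to every (recording_id, time) where it was recognized."""
    index: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for rec_id, words in transcripts.items():
        for start, word in words:
            index[word.lower()].append((rec_id, start))
    return dict(index)


def search(index: Index, query: str) -> List[Tuple[str, int, float]]:
    """Rank recordings by how many distinct query words they contain.
    Returns (recording_id, words_matched, earliest_hit_time), best first."""
    matched: Dict[str, Set[str]] = defaultdict(set)
    earliest: Dict[str, float] = {}
    for term in set(query.lower().split()):
        for rec_id, start in index.get(term, []):
            matched[rec_id].add(term)
            earliest[rec_id] = min(start, earliest.get(rec_id, start))
    ranked = sorted(matched, key=lambda r: (-len(matched[r]), earliest[r], r))
    return [(r, len(matched[r]), earliest[r]) for r in ranked]


if __name__ == "__main__":
    transcripts = {  # hypothetical recognizer output
        "clip-a": [(0.0, "the"), (0.4, "election"), (1.1, "results"), (2.0, "ottawa")],
        "clip-b": [(0.0, "hockey"), (0.6, "night"), (5.2, "election")],
    }
    index = build_index(transcripts)
    print(search(index, "election results"))  # [('clip-a', 2, 0.4), ('clip-b', 1, 5.2)]
```

A design caveat: an index built only from the recognizer’s single best transcript will miss any word the recognizer got wrong, which is part of what makes searching large speech collections harder than searching text.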

“This is important not just for speech researchers,” said Penn, “but for journalists, historians and anyone else who is interested in documenting the Canadian perspective on world affairs. Having all of this data around is great, but it’s of limited application if we can’t somehow navigate or search through it for topics of interest.”
