The topic was speech recognition, and whether a new and unproven approach to machine intelligence, something called deep learning, could help computers more accurately identify the spoken word. Microsoft funded the mini-conference, held in Whistler, British Columbia, just before Christmas 2009, and two of its researchers invited the world’s preeminent deep learning expert, the University of Toronto’s Geoff Hinton, to give a speech.
Hinton’s idea was that machine learning models could work a lot like neurons in the human brain. He wanted to build “neural networks” that could gradually assemble an understanding of spoken words as more and more of them arrived. Neural networks were hot in the 1980s, but by 2009, they hadn’t lived up to their potential.
At Whistler, the gathered speech researchers were polite about the idea, “but not that interested,” says Peter Lee, the head of Microsoft’s research arm. These researchers had already settled on their own algorithms. But Microsoft’s team felt that deep learning was worth a shot, so the company had a couple of engineers work with Hinton’s researchers and run some experiments with real data. The results were “stunning,” Lee remembers: a more than 25 percent improvement in accuracy, in a field where a 5 percent improvement is game-changing. “We published those results, then the world changed,” he says.
Now, nearly five years later, neural network algorithms are hitting the mainstream, making computers smarter in new and exciting ways. Google has used them to beef up Android’s voice recognition. IBM uses them, too. And, most remarkably, Microsoft uses neural networks as part of the Star Trek-like Skype Translator, which translates what you say into another language almost instantly. People “were very skeptical at first,” Hinton says, “but our approach has now taken over.”