Alexa Speech Science Technology
The Amazon Alexa Speech Science and Machine Learning team employs cutting-edge research and technology to create magical customer experiences with Alexa, the voice service that powers Amazon’s family of Echo products, Amazon Fire TV, and more.
Alexa is a spoken language understanding system composed of several data-driven audio and text processing components that together process and respond to customer voice requests. These components include signal processing, wake word detection, automatic speech recognition (ASR), natural language understanding (NLU), question answering, dialog management, and text-to-speech synthesis.
For example, on a hands-free WiFi-connected audio device such as Echo, a customer may ask, “Alexa, what’s the weather in Boston?” The wake word engine then detects the word “Alexa” and begins streaming audio to the speech platform in the Amazon cloud. The platform passes the audio stream to the ASR module, which returns a list of the most likely transcriptions of the user’s speech. This result is then transferred to NLU, which analyzes the top ASR results to extract the most likely interpretation of the user’s request, consisting of an intent (e.g., “GetWeatherForecast”) and any associated slots (e.g., “Boston”). Based on the type of intent, the speech platform routes the intent to the appropriate skill, which then specifies what Alexa should say to the user. Finally, Text-to-Speech technology converts Alexa’s text response into audio, which is streamed to the device to play back to the user from the device’s speaker.
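The flow described above can be sketched in a few lines of Python. This is only an illustrative toy, not Alexa's actual implementation: it works on text rather than audio, and every name in it (`detect_wake_word`, `recognize`, `understand`, `weather_skill`, `handle`, and the `city` slot key) is a hypothetical stand-in chosen for this example. It shows only the order of the stages: wake word detection, ASR producing an n-best list, NLU producing an intent with slots, and routing to a skill that returns the text to be synthesized.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

WAKE_WORD = "alexa"  # assumption: a single fixed wake word


@dataclass
class Interpretation:
    """NLU output: an intent name plus any associated slot values."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def detect_wake_word(utterance: str) -> bool:
    """Toy check on text; a real wake word engine operates on the audio signal."""
    return utterance.strip().lower().startswith(WAKE_WORD)


def recognize(utterance: str) -> List[str]:
    """Stand-in for ASR: return an n-best list of transcriptions (here, one entry)."""
    text = utterance.strip()[len(WAKE_WORD):]
    text = text.lstrip(" ,").rstrip("?.! ")
    return [text.lower()]


def understand(n_best: List[str]) -> Interpretation:
    """Stand-in for NLU: pick the first hypothesis that matches a known pattern."""
    marker = "weather in "
    for text in n_best:
        if marker in text:
            city = text.split(marker, 1)[1].strip().title()
            return Interpretation("GetWeatherForecast", {"city": city})
    return Interpretation("Unknown")


def weather_skill(interp: Interpretation) -> str:
    """Hypothetical skill: decides what should be said back to the user."""
    return f"Here is the weather forecast for {interp.slots['city']}."


# Routing table from intent name to the skill that handles it.
SKILLS: Dict[str, Callable[[Interpretation], str]] = {
    "GetWeatherForecast": weather_skill,
}


def handle(utterance: str) -> Optional[str]:
    """Run the stages in order and return the text that would be sent to TTS."""
    if not detect_wake_word(utterance):
        return None  # nothing is streamed until the wake word is detected
    interp = understand(recognize(utterance))
    skill = SKILLS.get(interp.intent)
    if skill is None:
        return "Sorry, I don't know that one."
    return skill(interp)


if __name__ == "__main__":
    print(handle("Alexa, what's the weather in Boston?"))
```

In this sketch the final string stands in for the text response that text-to-speech would convert to audio; the dictionary lookup plays the role of routing by intent type.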
This seamless interaction requires relentless focus on the customer experience and customer feedback. The speech science team uses highly scalable deep learning techniques to learn from every customer interaction and accelerate Alexa’s improvement. The team trains deep neural networks on large datasets using distributed processing, working at massive Amazon scale to optimize training for the AWS network. Learning at scale requires the right balance of invention and simplification to find the set of algorithms that maximize Alexa’s accuracy given the data. The challenge of interacting with the Amazon-scale catalog of shopping, music, and media requires world-class solutions for almost every known NLP task, from anaphora resolution to semantic parsing.
Past Interspeech publications and submissions from the Amazon team are listed below.
Meet the Team
Are you ready for your next opportunity? Check out our open positions on this page, and meet some of our speech scientists here. We have global opportunities available, and speech and machine learning scientists from the following locations will be available to meet:
If you would like to meet with a speech scientist in person at the conference, please contact firstname.lastname@example.org.