Speech Recognition: The Tip of the Iceberg of the Powers of New Technologies
Speech recognition is closely tied to Natural Language Processing (NLP), a field of AI that lets computers work with human language. Together they allow the human voice to be translated to text and, through speech synthesis, text to be turned back into voice. Voice assistants such as Siri and Alexa are some of the best-known examples of speech recognition.
How does Speech Recognition work?
The computer performs these tasks to understand what we are saying:
- Breaks the incoming speech down into individual sounds.
- Converts them into digital format.
- Uses algorithms to find the most appropriate words.
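The first two steps can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: a pure tone stands in for a real voice, and the sample rate, chunk length, and function names (sample_tone, quantise, split_frames) are made up for this example. Real systems usually digitise the sound first and then cut it into short chunks, which is the order used here.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed for this sketch)
FRAME_MS = 20        # length of each chunk in milliseconds (assumed)

def sample_tone(freq_hz, duration_ms, rate=SAMPLE_RATE):
    """Stand-in for a microphone: sample a pure tone as floats in [-1, 1]."""
    n = rate * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

def quantise(samples, bits=16):
    """Map each float sample to a signed integer (the 'digital format')."""
    peak = 2 ** (bits - 1) - 1
    return [round(s * peak) for s in samples]

def split_frames(samples, rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Cut the signal into short, equal-length chunks; drop any remainder."""
    size = rate * frame_ms // 1000
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

analogue = sample_tone(440, 100)   # 100 ms of a 440 Hz tone
digital = quantise(analogue)       # integers the computer can store
frames = split_frames(digital)     # short chunks for later analysis
print(len(digital), len(frames), len(frames[0]))
```

Each chunk is short enough that the sound inside it barely changes, which is why recognisers tend to analyse speech one chunk at a time.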
When we speak, we create vibrations in the air. A system known as an analogue-to-digital converter (ADC) converts these vibrations into binary numbers (a set of 0s and 1s). The software then removes unwanted noise, normalises the volume and speed of the sound, and separates it into frequency bands. The computer is also programmed with the sounds that make up each word so that it can recognise them. For example, "Hello" is stored in the computer as H-el-l-oh. When the sounds it hears match a stored word, the computer understands what was said. It can also be programmed with the response it should give.
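The matching step can be illustrated with a toy lookup. This is a sketch, not how any particular product works: the PRONUNCIATIONS table and the best_word function are hypothetical, and the similarity score comes from Python's standard difflib module rather than from a real acoustic model.

```python
from difflib import SequenceMatcher

# Hypothetical table of words and the sounds they are stored as.
PRONUNCIATIONS = {
    "hello": ["H", "EL", "L", "OH"],
    "yellow": ["Y", "EL", "L", "OH"],
    "help": ["H", "EL", "P"],
}

def best_word(heard, dictionary=PRONUNCIATIONS):
    """Return the word whose stored sounds are most similar to what was heard."""
    def score(word):
        return SequenceMatcher(None, heard, dictionary[word]).ratio()
    return max(dictionary, key=score)

print(best_word(["H", "EL", "L", "OH"]))  # exact match: hello
print(best_word(["H", "EL", "L", "OO"]))  # last sound misheard, still: hello
```

Picking the closest entry instead of demanding an exact match is what lets the computer cope with a slightly misheard sound.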
In the case of Alexa, when it hears the word "Alexa" it knows that it is being activated and that a command is about to follow. This is called "trigger word detection". Next, the ADC digitises the speech and the words are broken down into sounds. It then finds the most appropriate word from those programmed into it. Finally, the response is spoken using speech synthesis.
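The flow from trigger word to response can be sketched as follows. This is a simplified illustration that works on words that have already been recognised, whereas a real device listens for the trigger word in the audio itself. The RESPONSES table and the handle_transcript function are hypothetical names chosen for this example, and the returned text stands in for what speech synthesis would say aloud.

```python
WAKE_WORD = "alexa"

# Hypothetical commands and the replies programmed for them.
RESPONSES = {
    "play music": "Playing music.",
    "turn on the lights": "Turning on the lights.",
}

def handle_transcript(words, wake_word=WAKE_WORD, responses=RESPONSES):
    """Ignore speech until the wake word, then treat the rest as a command."""
    lowered = [w.lower().strip(",.?!") for w in words]
    if wake_word not in lowered:
        return None  # no trigger word heard, so keep listening
    start = lowered.index(wake_word) + 1
    command = " ".join(lowered[start:])
    return responses.get(command, "Sorry, I did not understand that.")

print(handle_transcript("Alexa, play music".split()))
print(handle_transcript("play music".split()))
```

The second call returns None because the trigger word was never spoken, so the command is ignored.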
Applications:
Thank you
Aniruddha KP
References:
(1) YouTube
(2) Croma (Fig. 1)
(3) TechGIG (Fig. 2)