Speech Recognition in Artificial Intelligence

Instructor: Prashant Mishra

Prashant is currently pursuing his bachelor's degree in Computer Science and Engineering.

In this lesson, you will learn how speech recognition works in artificial intelligence systems. You will also learn about language models and how speech recognizers work.

"Siri, please call John." A request like this would have sounded absurd at the start of the 21st century, yet it is now very common. Doesn't it make you wonder how a device only five or six inches long understands what we say? The answer lies in speech recognition.

What is Speech Recognition?

Speech recognition is a technique in which sound signals are recorded and processed into text transcriptions. Its use is growing rapidly: it is built into smartphones, computers, ovens, and other electronic devices. Devices that produce written text from speech alone are also proving to be a boon for people with disabilities.

How Speech Recognition Works in Artificial Intelligence Systems

The following steps are followed to convert a sound wave into a text transcription:


Recording

The first step is carried out by the voice recorder present in the device. The user's voice is captured as an audio signal and stored.


Sampling

Computers and other electronic devices work with data in discrete form, whereas a sound wave is continuous in nature. For the system to understand and process the wave, it must therefore be converted into discrete values. This conversion is done by measuring the wave at a fixed rate, known as the sampling frequency.
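
The conversion from a continuous wave to discrete values can be sketched in a few lines of Python. This is only an illustration: the function name sample_sine, the 440 Hz tone, the 16 kHz sampling frequency, and the 16-bit value range are assumptions chosen for the example, not details from this lesson.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s, bits=16):
    """Sample a continuous sine tone and quantize it to signed integers."""
    n_samples = round(sample_rate_hz * duration_s)
    max_amp = 2 ** (bits - 1) - 1  # largest positive level, 32767 for 16 bits
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                       # time of the n-th sample, in seconds
        value = math.sin(2 * math.pi * freq_hz * t)  # continuous value in [-1, 1]
        samples.append(round(value * max_amp))       # nearest discrete integer level
    return samples

samples = sample_sine(freq_hz=440, sample_rate_hz=16000, duration_s=0.01)
print(len(samples))  # 160 samples for 10 ms of audio at 16 kHz
```

A higher sampling frequency keeps more detail of the original wave but produces more values to store and process.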

Transforming to Frequency Domain

In this step, the audio signal is converted from the time domain to the frequency domain using a Fourier transform. The step is crucial for two reasons: a lot of information about the audio can be assessed only in the frequency domain, and any error introduced here carries into every later step and can cause large differences in the output.
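
As a rough illustration of the time-to-frequency conversion, the sketch below computes a discrete Fourier transform directly from its definition and reads off the strongest frequency in a test tone. The function name dft_magnitudes and the test values (an 8 kHz sampling frequency, 200 samples, a 440 Hz tone) are assumptions made for the example. Practical systems would normally use a fast Fourier transform from a numerical library, for example NumPy's numpy.fft.rfft, rather than this slow direct form.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(N^2) discrete Fourier transform; returns magnitudes of bins 0..N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            # Correlate the signal with a complex sinusoid of k cycles per frame.
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags

sample_rate = 8000  # samples per second (assumed for the example)
n = 200             # frame length, so each bin covers 8000 / 200 = 40 Hz
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(n)]

mags = dft_magnitudes(tone)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sample_rate / n
print(peak_hz)  # 440.0
```

The time-domain samples give no direct hint of the tone's pitch, while the frequency-domain view shows it as a single dominant bin.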

Information Extraction from Audio

This step forms the core of any speech recognition system. Here, the audio is converted into a feature vector that the later steps can use. Extraction techniques such as perceptual linear prediction (PLP) and mel-frequency cepstral coefficients (MFCC) are used for this conversion.
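
To make the idea of a feature vector concrete, here is a simplified MFCC-style sketch for a single frame of audio: window the frame, take its power spectrum, sum the power through triangular filters spaced on the mel scale, take logarithms, and apply a discrete cosine transform. All function names and parameter values (10 filters, 5 coefficients, a 128-sample frame at 8 kHz) are assumptions picked to keep the example small, and steps that a real front end may add, such as pre-emphasis or overlapping frames, are left out.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Apply a Hamming window, then return the power of DFT bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        power.append(abs(acc) ** 2 / n)
    return power

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge of the triangle
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank

def mfcc(frame, sample_rate, n_filters=10, n_coeffs=5):
    """Return a small MFCC-style feature vector for one frame."""
    power = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # A small constant avoids log(0) when a filter sees no energy.
    log_energies = [math.log(sum(f * p for f, p in zip(filt, power)) + 1e-10)
                    for filt in bank]
    # DCT-II decorrelates the log filterbank energies.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]

sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(128)]
features = mfcc(frame, sample_rate)
print(len(features))  # 5
```

Each frame of audio becomes one short vector like this, and a whole utterance becomes a sequence of such vectors.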

Recognition of Extracted Information

In this step, the extracted information is compared with already-defined reference information, and the closest match is taken as the recognized result. This comparison is achieved through pattern matching. The Google Speech API is one of the most widely used services for this purpose.
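
One classic way to compare extracted features with stored references is dynamic time warping (DTW), which tolerates the same word being spoken faster or slower. The sketch below is an illustration of that idea only; the lesson does not say which matching method any particular product uses, and the function names and the tiny two-dimensional templates are hypothetical toy data.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean distance (Python 3.8+)
            # Extend the cheapest of the three allowed alignment moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]

def recognize(features, templates):
    """Return the label of the stored template closest to the input features."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))

# Hypothetical reference templates: one short feature sequence per word.
templates = {
    "yes": [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0]],
    "no": [[0.0, 3.0], [0.0, 2.0], [1.0, 0.0]],
}
# A slightly noisy, slower version of "yes" (four frames instead of three).
utterance = [[1.0, 0.1], [1.1, 0.0], [2.0, 0.9], [3.1, 1.0]]

print(recognize(utterance, templates))  # yes
```

The input has a different length from the stored template, yet it is still matched to the right word because DTW aligns the two sequences before comparing them.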
