Speech Recognition, Text to Speech, and Voice Interfaces


By: Taryne Cahalin, Stephanie Sirico, Christiana Vasquez

Adelphi University - Mobile Learning, Fall 2013

What is Speech Recognition?

Instead of listening to an automated voice recording and pressing buttons, a person can speak specific words into a device and give it commands with the help of a speech recognition program.

The Uses

Individuals With Disabilities – Assists those who have visual impairment, hand immobility, dyslexia, etc.

Medical Transcription – Reduces delays in writing out medical transcriptions.

Dictation – Converts words to text in emails or other word documents (also helpful for English Language Learners).

Access Menu Commands – Opens files using voice commands.

Using Dragon Mobile

How does it work?

Speech recognition functions as a pipeline:

The pipeline converts PCM (pulse code modulation) digital audio from a sound card into recognized speech.

Transforming PCM Digital Audio

16,000 PCM values per second, a “wavy line”, that repeat while the user speaks

Information is converted for better recognition in the program

Fast-Fourier transform identifies frequency components of a specific sound

The program can approximate how our ears distinguish the sound
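The transformation above can be sketched in a few lines of Python. This is only an illustration, not the code of any particular speech program: the function names, the test tone, and the use of a plain discrete Fourier transform (a fast Fourier transform computes the same values more quickly) are all assumptions made for the example. It cuts one second of 16,000 PCM values into 1/100th-second pieces and finds the strongest frequency component in the first piece.

```python
import cmath
import math

SAMPLE_RATE = 16_000             # PCM values per second, as stated above
FRAME_LEN = SAMPLE_RATE // 100   # 160 values in each 1/100th of a second


def split_into_frames(samples, frame_len=FRAME_LEN):
    """Cut a list of PCM values into consecutive 1/100th-second frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def magnitude_spectrum(frame):
    """Return the amplitude of each frequency component in one frame.

    A direct discrete Fourier transform is used for readability;
    an FFT gives the same result with less computation.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(total) / n)
    return spectrum


# Hypothetical input: one second of a pure 1,000 Hz tone.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
        for t in range(SAMPLE_RATE)]

frames = split_into_frames(tone)
spectrum = magnitude_spectrum(frames[0])

bin_width_hz = SAMPLE_RATE / FRAME_LEN   # each spectrum entry covers 100 Hz
peak_hz = spectrum.index(max(spectrum)) * bin_width_hz
print(len(frames), "frames; strongest component at", peak_hz, "Hz")
```

The list in `spectrum` plays the role of the amplitude graph for one 1/100th of a second: one amplitude for each frequency component.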

Transform PCM digital audio using Fast-Fourier Transform

The Fast-Fourier transform analyzes every 1/100th of a second of audio and converts it into its frequency components

Each 1/100th of a second produces an amplitude graph

Each graph is compared against reference graphs stored in a database called a “codebook”

The sound is matched to the most similar entry in the codebook

The sound is given a number that describes it, called the “feature number”
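The codebook lookup described above can be sketched as a nearest-match search. Everything here is hypothetical and chosen for illustration: the four-entry codebook, the four-band graphs, and the use of straight-line (Euclidean) distance to decide which entry is "most similar". A real codebook would be far larger.

```python
import math

# Hypothetical codebook: each entry is a tiny amplitude graph with
# four frequency bands. The entry's position is its feature number.
CODEBOOK = [
    [0.9, 0.1, 0.0, 0.0],  # feature number 0: energy in the lowest band
    [0.1, 0.8, 0.1, 0.0],  # feature number 1
    [0.0, 0.1, 0.8, 0.1],  # feature number 2
    [0.0, 0.0, 0.2, 0.8],  # feature number 3: energy in the highest band
]


def feature_number(graph, codebook=CODEBOOK):
    """Return the index of the codebook entry closest to `graph`.

    Similarity is measured with Euclidean distance, which is an
    assumption of this sketch rather than a documented choice.
    """
    return min(range(len(codebook)),
               key=lambda i: math.dist(graph, codebook[i]))


# Hypothetical amplitude graphs from three consecutive 1/100ths of a second.
observed_graphs = [
    [0.85, 0.15, 0.05, 0.0],
    [0.05, 0.7, 0.2, 0.05],
    [0.0, 0.05, 0.3, 0.7],
]

features = [feature_number(g) for g in observed_graphs]
print(features)
```

Each 1/100th of a second of audio is reduced to a single feature number, so one second of speech becomes a sequence of 100 numbers for the later stages of the pipeline.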

Two Categories

Small Vocabulary/many-users:
• Leaves room for speech disparity (e.g., accents)
• Limited, preset number of commands that can be used

Large Vocabulary/limited-users:
• Best for business settings
• Train system to work with a small number of users
• Accuracy rate will increase as it learns its users

Discrete vs. Continuous Speech

Discrete
• Easier for program to understand
• Noticeable pause after each word

Continuous
• Allows speaking at conversational speed
• Used in most modern systems

Programs now can recognize accents and pronunciations better. In earlier programs, accents, pronunciations, speed, and background noise were all variables that made sounds difficult for programs to understand.

Using Talk – Text to Voice

This app allows you to type and then have the device speak what was typed. In this case, instead of pronouncing Taryne as “Ta-rin”, the device pronounced it as “Ta-reen”. This is an example of how speech programs still need work, here because of which syllable is emphasized. The codebook did not have Taryne in it, so the device was unable to pronounce her name correctly.

The Future of Assistive Technology in Schools

Students who need assistance in their writing skills because they have stronger oral skills.

Students who were absent for a class, have poor memory, or need assistance hearing the lesson.

Students who need assistance during Guided Reading.

Students who are English Language Learners.

Students with visual/hearing impairments and learning disabilities regarding reading/spelling/writing.
