Speech Recognition, Text to Speech, and Voice Interfaces
By: Taryne Cahalin, Stephanie Sirico, Christiana Vasquez
Adelphi University - Mobile Learning, Fall 2013
What is Speech Recognition?
Instead of navigating an automated voice recording by pressing buttons, a person speaks specific words into a device, and a speech recognition program interprets them and carries out the corresponding commands.
The Uses
Individuals With Disabilities – Assists those who have visual impairment, hand immobility, dyslexia, etc.
Medical Transcription – Reduces delays in producing medical transcriptions.
Dictation – Converts spoken words to text in emails and other word-processing documents (also helpful for English Language Learners).
Access Menu Commands – Opens files using voice commands.
Using Dragon Mobile
How does it work?
Speech recognition functions as a pipeline:
The pipeline converts PCM (pulse code modulation) digital audio from a sound card into recognized speech.
Transforming PCM Digital Audio
The sound card produces 16,000 PCM values per second, a “wavy line” that continues for as long as the user speaks.
This information is converted into a form the program can recognize more easily.
A Fast-Fourier Transform identifies the frequency components of a specific sound.
This lets the program approximate how our ears distinguish sounds.
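The frequency-identification step can be illustrated with a minimal sketch: a textbook recursive radix-2 Cooley-Tukey FFT applied to a signal made of two tones, reporting the frequencies with the largest amplitudes. The frame length of 256 samples is an assumption chosen because this simple FFT needs a power-of-two length; the helper names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 16_000  # PCM values per second


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def strongest_frequencies(samples, count=2):
    """Return the `count` frequencies (in Hz) with the largest FFT amplitude."""
    n = len(samples)
    spectrum = fft(samples)
    # For real-valued input only the first half of the bins is informative.
    magnitudes = [abs(spectrum[k]) for k in range(n // 2)]
    top_bins = sorted(range(n // 2), key=lambda k: magnitudes[k], reverse=True)[:count]
    # Bin k corresponds to k * SAMPLE_RATE / n Hz.
    return sorted(k * SAMPLE_RATE / n for k in top_bins)


# A 500 Hz tone plus a quieter 2000 Hz tone, 256 samples long.
tone = [
    math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE)
    for t in range(256)
]
print(strongest_frequencies(tone))  # [500.0, 2000.0]
```

The mixed "wavy line" gives no direct view of its ingredients, but the FFT separates it back into the two frequencies it was built from.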
Transforming PCM digital audio with the Fast-Fourier Transform
The Fast-Fourier Transform analyzes the audio data every 1/100th of a second.
Each 1/100th of a second produces an amplitude graph showing how strong each frequency is.
Reference graphs are stored in a database called a “codebook”.
Each sound is matched to the most similar entry in the codebook.
The sound is then given a number that describes it, called the “feature number”.
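The codebook matching described above can be sketched as vector quantization: each 1/100th-of-a-second frame (160 samples at 16,000 values per second) becomes an amplitude graph, and its feature number is the index of the closest codebook entry. This is a minimal sketch under stated assumptions: the 8-band summary, the three-entry codebook built from pure tones, and all helper names are hypothetical, and a plain DFT stands in for the FFT (same values, just slower) so that 160-sample frames work.

```python
import cmath
import math

SAMPLE_RATE = 16_000
FRAME_SIZE = SAMPLE_RATE // 100  # 1/100th of a second = 160 samples
BANDS = 8                        # hypothetical: 8 frequency bands per graph


def amplitude_graph(frame):
    """Amplitude spectrum of one frame, averaged into BANDS bands."""
    n = len(frame)
    mags = [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]
    width = len(mags) // BANDS
    return [sum(mags[b * width:(b + 1) * width]) / width for b in range(BANDS)]


def tone_frame(freq):
    """One frame of a pure tone (used here to build test data)."""
    return [math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for t in range(FRAME_SIZE)]


# Hypothetical codebook of reference graphs. Real systems build larger
# codebooks from recorded speech rather than from pure tones.
CODEBOOK = [amplitude_graph(tone_frame(f)) for f in (500, 2500, 5500)]


def feature_number(frame):
    """Index of the codebook entry closest (Euclidean distance) to the frame."""
    graph = amplitude_graph(frame)
    distances = [math.dist(graph, entry) for entry in CODEBOOK]
    return distances.index(min(distances))


def feature_numbers(samples):
    """Split audio into 1/100 s frames and return one feature number per frame."""
    return [
        feature_number(samples[i:i + FRAME_SIZE])
        for i in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE)
    ]


audio = tone_frame(600) * 2 + tone_frame(5400)
print(feature_numbers(audio))  # [0, 0, 2]
```

The 600 Hz frames are not identical to any entry but land nearest the 500 Hz one, which is the point of matching to the most similar entry: the rest of the recognizer then works with a short list of feature numbers instead of raw audio.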
Two Categories
Small Vocabulary/Many Users:
• Leaves room for variation in speech (e.g. accents)
• Only a limited, preset number of commands can be used
Large Vocabulary/Limited Users:
• Best for business settings
• The system is trained to work with a small number of users
• Accuracy increases as the system learns its users' speech
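A small-vocabulary system only has to decide which of its preset commands is closest to what it heard, which is one reason it can tolerate variation between speakers. The sketch below illustrates that decision using text similarity from Python's standard `difflib` module as a stand-in for acoustic similarity; the command list, the `match_command` name, and the 0.6 cutoff are hypothetical.

```python
import difflib

# Hypothetical preset command list for a small-vocabulary system.
COMMANDS = ["open file", "save file", "close window", "new email"]


def match_command(heard, cutoff=0.6):
    """Map recognized words to the closest preset command, or None if
    nothing is similar enough."""
    matches = difflib.get_close_matches(heard.lower(), COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_command("opin fyle"))  # open file
print(match_command("xyz"))        # None
```

Because the candidate list is short, an imperfect match still lands on the right command; a large-vocabulary system has too many similar words for this to work without training on its users.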
Discrete vs. Continuous Speech
Discrete:
• Easier for the program to understand
• The speaker pauses noticeably after each word
Continuous:
• Allows speaking at conversational speed
• Used in most modern systems
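Discrete speech is easier because the pause after each word tells the program where one word ends and the next begins. A minimal sketch of that idea, assuming a simple loudness (RMS energy) test on 1/100th-of-a-second frames of 16-bit PCM: the silence threshold, the 200 ms minimum pause, and the function names are hypothetical values for illustration.

```python
import math

SAMPLE_RATE = 16_000
FRAME_SIZE = SAMPLE_RATE // 100  # 10 ms frames
SILENCE_RMS = 500                # hypothetical loudness threshold (16-bit samples)
MIN_PAUSE_FRAMES = 20            # hypothetical: 200 ms of quiet ends a word


def rms(frame):
    """Root-mean-square loudness of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def split_words(samples):
    """Return (start, end) sample indices of each word, treating a run of
    at least MIN_PAUSE_FRAMES quiet frames as the pause between words."""
    words = []
    start = None    # where the current word began
    last_end = 0    # index just past the last loud frame
    quiet_run = 0   # consecutive quiet frames since the last loud one
    for i in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        if rms(samples[i:i + FRAME_SIZE]) >= SILENCE_RMS:
            if start is None:
                start = i
            last_end = i + FRAME_SIZE
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= MIN_PAUSE_FRAMES:
                words.append((start, last_end))
                start = None
                quiet_run = 0
    if start is not None:
        words.append((start, last_end))
    return words


def tone(n):
    """Hypothetical stand-in for a spoken word: n samples of a 300 Hz tone."""
    return [round(8000 * math.sin(2 * math.pi * 300 * t / SAMPLE_RATE)) for t in range(n)]


# Two "words" separated by a 0.3 s pause.
print(split_words(tone(4800) + [0] * 4800 + tone(3200)))  # [(0, 4800), (9600, 12800)]
```

With only a 0.1 s gap, as in conversational speech, the same function returns a single span, which is why continuous recognition needs more sophisticated methods than pause detection.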
Programs can now recognize accents and pronunciations better than before. In earlier programs, accents, pronunciation, speaking speed, and background noise were all variables that made speech difficult for the program to understand.
Using Talk – Text to Voice
This app lets you type text and then has the device read aloud what was typed. In this case, instead of pronouncing Taryne as “Ta-rin”, the device pronounced it “Ta-reen”. This shows that speech programs still need work, particularly with stressing the correct syllable. The codebook did not contain the name Taryne, so the program could not pronounce her name correctly.
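The mispronunciation can be illustrated with a minimal sketch, assuming the app looks each word up in a pronunciation dictionary (the role the slide gives the codebook) and falls back to letter-based rules for words it does not know. The dictionary entry, the fallback rules, and the function names below are hypothetical and deliberately crude; real text-to-speech engines use phoneme symbols and much larger rule sets.

```python
# Hypothetical pronunciation dictionary: spelling -> syllables.
LEXICON = {
    "stephanie": "ste-fa-nee",
}


def guess_pronunciation(word):
    """Crude, hypothetical letter-to-sound fallback for unknown words."""
    sound = word.lower()
    if sound.endswith("e") and len(sound) > 2 and sound[-2] not in "aeiouy":
        sound = sound[:-1]              # treat a final "e" as silent
    sound = sound.replace("y", "ee")    # read "y" like "ee"
    for i, ch in enumerate(sound):
        if ch in "aeiou":               # break after the first vowel
            return sound[:i + 1] + "-" + sound[i + 1:]
    return sound


def pronounce(word):
    """Use the dictionary entry if there is one, otherwise guess."""
    key = word.lower()
    if key in LEXICON:
        return LEXICON[key]
    return guess_pronunciation(word)


print(pronounce("Taryne"))  # ta-reen (name missing, so the rules guess)
LEXICON["taryne"] = "ta-rin"
print(pronounce("Taryne"))  # ta-rin (dictionary entry now used)
```

Once the name is added to the dictionary, the guesswork is skipped, which is how adding a word to the program's vocabulary fixes a mispronunciation.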
The Future of Assistive Technology in Schools
Students whose oral skills are stronger than their writing skills and who need help with writing.
Students who were absent for a class, have poor memory, or need assistance hearing the lesson.
Students who need assistance during Guided Reading.
Students who are English Language Learners.
Students with visual/hearing impairments and learning disabilities regarding reading/spelling/writing.