Speech user interface
1 SPEECH USER INTERFACE
Husain Firoz Master (302093)
Guided by Prof. Vidya Patil
2 Outline
Introduction
Need for SUI
Expectations from SUI
Overview of speech recognition
Voice feature extraction and its techniques
Implementation of SUI
Applications
Future scope
Shortcomings
Conclusion
3 SPEECH USER INTERFACE (SUI)
A user interface that works with human voice commands
It offers truly hands-free, eyes-free interaction with computers
It provides an interface for operating computers with the following considerations:
Technology support
User category
User support
4 NEED FOR A SPEECH USER INTERFACE
Speech user interfaces:
Offer truly hands-free, eyes-free interaction
Have unmatched throughput rates, because speech is faster than typing on a keyboard
Are the only plausible interaction modality for illiterate users across the world
Present opportunities for illiterate users in developing regions, giving them a feasible way to access computing
Are not yet developed widely enough to support every type of user, language, or acoustic scenario
5 Expectations from speech user interface (SUI)
Recognize speech from any untrained user
Understand the meaning of the spoken words
Take action according to the extracted meaning
Deal with multiple languages
Handle large vocabularies
Provide a good level of fault tolerance
Provide help and messages to users during interaction
Operate in real time
6 Speech Recognition
Translation of spoken words into text
It is the ability of machines to understand natural human spoken language.
There are two types of speech recognition: speaker-dependent and speaker-independent.
Speaker-dependent: systems that require training for each speaker
Speaker-independent: systems that do not require training
Basically, it is the process by which a computer maps an acoustic speech signal to text.
7 SPEECH RECOGNITION MODEL
8 VOICE FEATURE EXTRACTION
Voice feature extraction is also known as front-end processing.
It is performed in both recognition and training modes.
It converts the digital speech signal into sets of numerical descriptors called feature vectors, which contain the key characteristics of the speech signal.
The different types of features extracted from the voice are evaluated to determine their suitability for voice recognition.
MFCC is one of the most widely used feature extraction techniques, and HMM is one of the most widely used techniques for modelling the extracted features.
9 Feature Extraction and Recognition Techniques
Mel-frequency cepstral coefficients (MFCC)
Hidden Markov model (HMM)
Dynamic time warping (DTW)
Fusion of HMM and DTW
10 Mel-frequency Cepstral coefficients (MFCC)
Mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up an MFC.
The frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than linearly spaced frequency bands.
About 20 MFCCs are commonly used in automatic speech recognition (ASR), although 10-12 coefficients are often considered sufficient for coding speech.
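The mel-spaced bands and the cosine transform mentioned on this slide can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the conversion formula (2595 and 700 constants) is one commonly used convention for the mel scale, and the function names `hz_to_mel`, `mel_to_hz`, `mel_band_centers`, and `dct_ii` are hypothetical. A full MFCC pipeline would also include framing, windowing, an FFT, and triangular filters, which are omitted here.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (one common convention, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(f_low_hz, f_high_hz, n_bands):
    """Centre frequencies (Hz) of n_bands equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II: maps log band energies to cepstral coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


# Bands that are evenly spaced in mels are narrow at low frequencies
# and progressively wider at high frequencies when measured in Hz.
centers = mel_band_centers(0.0, 8000.0, 20)
```

Given the log energies of the mel bands for one frame, `dct_ii(log_energies, 12)` would return the first 12 cepstral coefficients for that frame.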
11 Hidden Markov model (HMM)
HMM models are used for representing the possible symbol sequences underlying speech utterances.
The states in the HMM represent basic linguistic units (e.g. phonemes or smaller phases of phonemes) that a speaker uses to pronounce a word.
For each word there are one or more composite HMMs, which model the probability of the state sequence that represents the word.
HMMs are usually trained on large sets of recorded, feature-analyzed speech samples.
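How a word HMM assigns a probability to an utterance can be sketched with the forward algorithm. This is a toy example under stated assumptions: the observations are discrete symbol indices (for instance, vector-quantized feature vectors), the two word models and all their probabilities are made up for illustration, and the names `forward_likelihood` and `recognize` are hypothetical.

```python
def forward_likelihood(obs, start_p, trans_p, emit_p):
    """Return P(obs | model) using the forward algorithm."""
    n_states = len(start_p)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[prev] * trans_p[prev][s] for prev in range(n_states))
            * emit_p[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize(obs, word_models):
    """Pick the word whose HMM gives the observations the highest likelihood."""
    return max(word_models, key=lambda w: forward_likelihood(obs, *word_models[w]))


# Two-state left-to-right models; every number below is illustrative only.
left_to_right = [[0.6, 0.4], [0.0, 1.0]]
word_models = {
    "yes": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.2, 0.8]]),
    "no": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.8, 0.2]]),
}

best_word = recognize([0, 0, 1, 1], word_models)
```

In a trained system the transition and emission probabilities would be estimated from the recorded samples rather than written by hand.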
12 Dynamic time warping (DTW)
Dynamic time warping (DTW) is an algorithm for measuring similarity between two spoken word sequences which may vary in time or speed.
It compares two speech sequences by stretching or compressing them along the time axis to find the best alignment; the cost of that alignment is the similarity measure.
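The comparison described on this slide can be sketched with the classic dynamic-programming form of DTW. This is a minimal sketch that assumes one-dimensional sequences and absolute difference as the local distance; a recognizer would more likely compare sequences of feature vectors with a vector distance. The name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Cumulative cost of the best time alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning the first i items of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its current item
                cost[i][j - 1],      # b advances, a repeats its current item
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# The same shape spoken at half speed aligns at zero cost.
slow_match = dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3])
```

A lower distance means the two sequences are more similar once differences in speaking rate are discounted.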
13 Fusion HMM and DTW
In this method HMM and DTW are combined.
Basically, the results of HMM and DTW are combined in weighted mean vectors.
DTW finds the similarity between two signals based on time.
HMM, meanwhile, trains clusters and iteratively moves between clusters based on the likelihoods learned during training.
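One way a weighted-mean combination of the two results could look is sketched below. The slides do not give the actual fusion formula, so everything here is an assumption: per-word HMM log-likelihoods (higher is better) and DTW distances (lower is better) are each rescaled to the range 0 to 1 and then averaged with a weight. The names `min_max`, `fuse`, and `hmm_weight`, and all scores, are hypothetical.

```python
def min_max(scores):
    """Rescale a dict of scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {word: 0.5 for word in scores}
    return {word: (value - lo) / (hi - lo) for word, value in scores.items()}


def fuse(hmm_loglik, dtw_dist, hmm_weight=0.5):
    """Weighted mean of normalized HMM and DTW scores; returns (best word, scores)."""
    hmm_norm = min_max(hmm_loglik)
    # Negate distances so that, like likelihoods, larger means a better match.
    dtw_norm = min_max({word: -dist for word, dist in dtw_dist.items()})
    fused = {
        word: hmm_weight * hmm_norm[word] + (1.0 - hmm_weight) * dtw_norm[word]
        for word in hmm_loglik
    }
    return max(fused, key=fused.get), fused


# Made-up scores for a three-word vocabulary.
hmm_scores = {"one": -42.0, "two": -40.0, "three": -55.0}
dtw_scores = {"one": 3.0, "two": 9.0, "three": 12.0}
best_word, fused_scores = fuse(hmm_scores, dtw_scores)
```

The weight controls how much each recognizer is trusted; setting `hmm_weight=1.0` reduces the decision to the HMM result alone.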
14 SUI IMPLEMENTATION
15 SUI IMPLEMENTATION (contd.)
Recording speech
Applying noise cancellation
End point detection
Feature extraction: the MFCC algorithm is used and the parameters are separated for further use in the training part.
Normalization: word length is calculated for all groups and averaged for each group.
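The end point detection step in the list above can be sketched with a simple short-time energy threshold. The slides do not say which method was used, so this is only one common approach shown under assumptions: the frame length and threshold are arbitrary illustrative values, and the names `frame_energies` and `detect_endpoints` are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_endpoints(samples, frame_len=4, threshold=0.01):
    """Return (start, end) sample indices of the speech region, or None if silent."""
    energies = frame_energies(samples, frame_len)
    active = [i for i, energy in enumerate(energies) if energy > threshold]
    if not active:
        return None
    # From the first frame above the threshold to the end of the last one.
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Synthetic signal: silence, a short burst, then silence again.
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 8
endpoints = detect_endpoints(signal)
```

Only the samples between the detected end points would then be passed on to feature extraction.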
16 SUI IMPLEMENTATION (contd.)
Training using HMM
Fusion of HMM and DTW
The recognized word or sentence is given to the application
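The final hand-off, where the recognized words are passed to the application, can be sketched as a small command-based interface. The example assumes a speech-operated calculator with a deliberately tiny vocabulary; the word lists and the name `run_command` are hypothetical and not taken from the presenter's system.

```python
import operator

# Small, fixed vocabulary (assumed for illustration).
DIGITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
OPERATORS = {"plus": operator.add, "minus": operator.sub, "times": operator.mul}


def run_command(words):
    """Evaluate a recognized '<digit> <operator> <digit>' command, or return None."""
    if len(words) != 3:
        return None
    left, op, right = words
    if left not in DIGITS or op not in OPERATORS or right not in DIGITS:
        return None  # outside the vocabulary: reject rather than guess
    return OPERATORS[op](DIGITS[left], DIGITS[right])


result = run_command(["three", "plus", "four"])
```

Rejecting anything outside the vocabulary keeps a misrecognized word from triggering an unintended action.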
17 Applications
Speech-operated calculator
Voice dialing
Intelligent voice assistants (personal agents) like Apple Siri, Google voice talk, etc.
Home and building automation systems using SUI
Live subtitling on television
Speech-to-text conversion or note-taking systems
18 Future Scope
Speech user interface for learning foreign languages
Dictation tools in the medical and legal professions
19 SHORTCOMINGS
To work within the limitations of current speech recognition:
Train the speech recognition system in the implementation environment.
Keep the vocabulary size small.
Keep each speech input short (word length).
Use speech inputs that sound distinctly different from each other.
Keep the user interface simple.
Don't use speech to position objects.
Use a command-based user interface.
Allow users to quickly and easily turn the speech recognizer off and on.
Use a highly directional, noise-canceling microphone.
20 CONCLUSION