Speech user interface


Transcript of Speech user interface

Page 1: Speech user interface


SPEECH USER INTERFACE

Husain Firoz Master (302093)

Guided by Prof. Vidya Patil

Page 2: Speech user interface

Outline

- Introduction
- Need for SUI
- Expectations from SUI
- Overview of speech recognition
- Voice feature extraction and its techniques
- Implementation of SUI
- Applications
- Future scope
- Shortcomings
- Conclusion

Page 3: Speech user interface

SPEECH USER INTERFACE (SUI)

- A user interface that works with human voice commands.
- It offers truly hands-free, eyes-free interaction with computers.
- It provides an interface for operating computers with the following understandings: technology support, user category, and user support.

Page 4: Speech user interface

NEED FOR A SPEECH USER INTERFACE

- It offers truly hands-free, eyes-free interaction.
- It has unmatched throughput rates: speech is faster than typing on a keyboard.
- Speech is the only plausible interaction modality for illiterate users across the world.
- It presents opportunities for illiterate users in developing regions, giving them a feasible way to access computing.
- However, SUIs are not yet developed widely enough to support every type of user, language, or acoustic scenario.

Page 5: Speech user interface

Expectations from a speech user interface (SUI)

- Recognize speech from any untrained user
- Understand the meaning of the spoken words
- Take action according to the extracted meaning
- Deal with multiple languages
- Incorporate large vocabularies
- Provide a good level of fault tolerance
- Provide help and messages to users during interaction
- Operate in real time

Page 6: Speech user interface

Speech Recognition

- Translation of spoken words into text.
- It is the ability of machines to understand natural human spoken language.
- There are two types of speech recognition: speaker-dependent and speaker-independent.
- Speaker-dependent: systems that require training.
- Speaker-independent: systems that do not require training.
- Basically, it is the process by which a computer maps an acoustic speech signal to text (a minimal sketch follows).
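
For illustration, here is a minimal speaker-independent speech-to-text sketch. It assumes the third-party Python package SpeechRecognition and a hypothetical recording command.wav; the slides do not prescribe any particular toolkit.

    # Minimal speech-to-text sketch (assumes: pip install SpeechRecognition;
    # "command.wav" is a hypothetical recording).
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile("command.wav") as source:
        audio = recognizer.record(source)        # read the entire file

    try:
        # recognize_google() sends the audio to Google's free web API,
        # a speaker-independent recognizer that needs no user training.
        text = recognizer.recognize_google(audio)
        print("Recognized:", text)
    except sr.UnknownValueError:
        print("Speech was unintelligible")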

Page 7: Speech user interface

SPEECH RECOGNITION MODEL

Page 8: Speech user interface

VOICE FEATURE EXTRACTION

- Voice feature extraction is also known as front-end processing.
- It is performed in both recognition and training modes.
- It converts the digital speech signal into sets of numerical descriptors called feature vectors, which contain the key characteristics of the speech signal.
- The different types of features extracted from voice are evaluated to determine their suitability for voice recognition.
- MFCC and HMM are among the most commonly used techniques today.

Page 9: Speech user interface

Feature Extraction Techniques

- Mel-frequency cepstral coefficients (MFCC)
- Hidden Markov model (HMM)
- Dynamic time warping (DTW)
- Fusion of HMM and DTW

Page 10: Speech user interface

Mel-frequency cepstral coefficients (MFCC)

- The mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
- Mel-frequency cepstral coefficients (MFCCs) are the coefficients that make up an MFC, derived from the frequency bands of an audio input.
- The frequency bands are equally spaced on the mel scale, by which MFCC approximates the human auditory system's response more closely than linearly spaced bands.
- The use of about 20 MFCC coefficients is common in ASR, although 10-12 coefficients are often considered sufficient for coding speech (a short extraction sketch follows).
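
As a concrete illustration of MFCC extraction, here is a short sketch assuming the librosa library (not named in the slides) and a hypothetical recording utterance.wav.

    # MFCC extraction sketch (assumes: pip install librosa).
    import librosa

    y, sr = librosa.load("utterance.wav", sr=16000)   # mono signal at 16 kHz

    # 20 coefficients per frame, as is common in ASR; the filter bank is
    # spaced evenly on the mel scale: mel = 2595 * log10(1 + f / 700).
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

    print(mfccs.shape)   # (20, n_frames): one 20-dim feature vector per frame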

Page 11: Speech user interface

Hidden Markov model (HMM)

- HMMs are used to represent the possible symbol sequences underlying speech utterances.
- The states in an HMM represent basic spoken linguistic units (e.g. phonemes or smaller sub-phoneme segments) that a human uses to pronounce a word.
- For each word, one or more composite HMMs exist, which model the probability of articulating a state sequence representing that word.
- HMMs are usually trained from large sets of recorded and feature-analyzed samples (a training sketch follows).
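
A sketch of per-word HMM training and scoring, assuming the hmmlearn library (not named in the slides); the `samples` layout, the state count, and the helper names are illustrative assumptions.

    # Per-word HMM sketch (assumes: pip install hmmlearn numpy).
    # Each entry in `samples` is assumed to be an MFCC matrix of shape
    # (n_frames, n_features) for one recording of the word.
    import numpy as np
    from hmmlearn import hmm

    def train_word_hmm(samples, n_states=5):
        X = np.vstack(samples)               # stack all training utterances
        lengths = [len(s) for s in samples]  # frame count per utterance
        model = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=100)
        model.fit(X, lengths)                # Baum-Welch training
        return model

    def recognize(models, features):
        # pick the word whose HMM gives the highest log-likelihood
        return max(models, key=lambda w: models[w].score(features))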

Page 12: Speech user interface

Dynamic time warping (DTW)

- Dynamic time warping (DTW) is an algorithm for measuring similarity between two spoken word sequences that may vary in time or speed.
- It compares two speech sequences and measures the similarity between them by finding an optimal alignment (a plain NumPy sketch follows).
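
Since DTW is a short dynamic-programming algorithm, a plain NumPy sketch of the standard formulation may help; the local Euclidean cost is an assumption.

    # Standard DTW distance between two feature sequences, e.g. the MFCC
    # frames of two spoken words, each of shape (n_frames, n_features).
    import numpy as np

    def dtw_distance(a, b):
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])   # local Euclidean cost
                cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
        return cost[n, m]   # lower means the sequences are more similar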

Page 13: Speech user interface

Fusion of HMM and DTW

- In this method, HMM and DTW are combined: the results of HMM and DTW are merged as weighted mean vectors (a minimal sketch follows).
- DTW finds the similarity between two signals based on time alignment.
- Meanwhile, the HMM trains clusters and iteratively moves between clusters based on the likelihoods given to it during training.
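
A minimal sketch of how such weighted fusion could look, reusing the dtw_distance and HMM sketches above; the equal weights and the sign convention are assumptions, and in practice the two scores would first be normalized to comparable scales.

    # Weighted score fusion sketch. The HMM yields a log-likelihood (higher
    # is better) while DTW yields a distance (lower is better), so the DTW
    # score is negated; the 0.5/0.5 weights are an assumption.
    def fused_score(hmm_model, template, features, w_hmm=0.5, w_dtw=0.5):
        hmm_score = hmm_model.score(features)           # log-likelihood
        dtw_score = -dtw_distance(template, features)   # distance -> similarity
        return w_hmm * hmm_score + w_dtw * dtw_score

The candidate word with the highest fused score would then be chosen, exactly as in the HMM-only recognizer above.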

Page 14: Speech user interface

SUI IMPLEMENTATION

Page 15: Speech user interface

SUI IMPLEMENTATION (contd.)

- Recording speech
- Applying noise cancellation
- End-point detection (see the sketch after this list)
- Feature extraction: the MFCC algorithm is used and the resulting parameters are separated out, to be used later in the training phase
- Normalization: the word length is calculated for all groups and an average is taken for each
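
A simple energy-based detector is one common way to realize the end-point detection step; this sketch, its frame length, and its threshold are illustrative assumptions (normalized float audio), not the slides' exact method.

    # Energy-based end-point detection sketch. `signal` is a 1-D NumPy
    # array of audio samples, assumed normalized to roughly [-1, 1].
    import numpy as np

    def endpoints(signal, frame_len=400, threshold=0.02):
        # short-time energy per non-overlapping frame
        n = len(signal) // frame_len
        frames = signal[:n * frame_len].reshape(n, frame_len)
        energy = (frames ** 2).mean(axis=1)
        voiced = np.where(energy > threshold)[0]   # frames above the floor
        if len(voiced) == 0:
            return None                            # silence only
        start, end = voiced[0], voiced[-1] + 1
        return start * frame_len, end * frame_len  # sample span of the word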

Page 16: Speech user interface

SUI IMPLEMENTATION (contd.)

- Training using HMM
- Fusion of HMM and DTW
- The recognized word or sentence is handed to the application (the sketch below shows how these stages could fit together)
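
To show how the stages could fit together end to end, here is a hypothetical flow that reuses the earlier sketches; record() and denoise() stand in for the recording and noise-cancellation steps, and `models`/`templates` are the trained per-word HMMs and DTW reference templates.

    # Hypothetical end-to-end recognition flow, reusing endpoints() and
    # fused_score() from the sketches above; record() and denoise() are
    # placeholders for the capture and noise-cancellation stages.
    import librosa

    def recognize_command(models, templates):
        signal = denoise(record())       # record speech + noise cancellation
        span = endpoints(signal)         # trim leading/trailing silence
        if span is None:
            return None                  # nothing voiced was captured
        start, end = span
        feats = librosa.feature.mfcc(y=signal[start:end], sr=16000,
                                     n_mfcc=20).T   # (n_frames, 20) vectors
        # fused HMM + DTW scoring over the vocabulary
        return max(models,
                   key=lambda w: fused_score(models[w], templates[w], feats))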

Page 17: Speech user interface

Applications

- Speech-operated calculator
- Voice dialing
- Intelligent voice assistants (personal agents) such as Apple Siri, Google voice search, etc.
- Home and building automation systems using SUI
- Live subtitling on television
- Speech-to-text conversion and note-taking systems

Page 18: Speech user interface

Future Scope

- Speech user interfaces for learning foreign languages
- Dictation tools in the medical and legal professions

Page 19: Speech user interface

SHORTCOMINGS

Current SUI limitations impose the following design guidelines:

- Train the speech recognition system in the implementation environment.
- Keep the vocabulary size small.
- Keep each speech input (word length) short.
- Use speech inputs that sound distinctly different from each other.
- Keep the user interface simple.
- Don't use speech to position objects.
- Use a command-based user interface.
- Allow users to quickly and easily turn the speech recognizer off and on.
- Use a highly directional, noise-canceling microphone.

Page 20: Speech user interface

CONCLUSION