Speech user interface


Transcript of Speech user interface

Page 1: Speech user interface


SPEECH USER INTERFACE

Husain Firoz Master (302093)

Guided by Prof. Vidya Patil

Page 2: Speech user interface

Outline

- Introduction
- Need for SUI
- Expectations from SUI
- Overview of speech recognition
- Voice feature extraction and its techniques
- Implementation of SUI
- Applications
- Future scope
- Shortcomings
- Conclusion

Page 3: Speech user interface

SPEECH USER INTERFACE (SUI)

- A user interface that works with human voice commands.
- It offers truly hands-free, eyes-free interaction with computers.
- It provides an interface for operating computers with the following understandings: technology support, user category, and user support.

Page 4: Speech user interface

NEED FOR A SPEECH USER INTERFACE

- It offers truly hands-free, eyes-free interaction.
- It has unmatched throughput rates: speech is faster than typing on a keyboard.
- Speech is the only plausible interaction modality for illiterate users across the world.
- It presents opportunities for illiterate users in developing regions, giving them a feasible way to access computing.
- However, SUIs are not yet developed widely enough to support every type of user, language, or acoustic scenario.

Page 5: Speech user interface

Expectations from a speech user interface (SUI)

- Recognize speech from any untrained user
- Understand the meaning of the spoken words
- Take action according to the extracted meaning
- Deal with multiple languages
- Incorporate large vocabularies
- Provide a good level of fault tolerance
- Provide help and messages to users during interaction
- Operate in real time

Page 6: Speech user interface

Speech Recognition

- Translation of spoken words into text.
- It is the ability of machines to understand natural human spoken language.
- There are two types of speech recognition: speaker-dependent and speaker-independent.
- Speaker-dependent: systems that require training.
- Speaker-independent: systems that do not require training.
- Basically, it is the process by which a computer maps an acoustic speech signal to text (a minimal sketch follows).
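
For illustration, here is a minimal speaker-independent speech-to-text sketch. It assumes the third-party Python package SpeechRecognition and a hypothetical recording command.wav; the slides do not prescribe any particular toolkit.

    # Minimal speech-to-text sketch (assumes: pip install SpeechRecognition;
    # "command.wav" is a hypothetical recording).
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile("command.wav") as source:
        audio = recognizer.record(source)        # read the entire file

    try:
        # recognize_google() sends the audio to Google's free web API,
        # a speaker-independent recognizer that needs no user training.
        text = recognizer.recognize_google(audio)
        print("Recognized:", text)
    except sr.UnknownValueError:
        print("Speech was unintelligible")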

Page 7: Speech user interface

SPEECH RECOGNITION MODEL

Page 8: Speech user interface

VOICE FEATURE EXTRACTION

- Voice feature extraction is also known as front-end processing.
- It is performed in both recognition and training modes.
- It converts the digital speech signal into sets of numerical descriptors called feature vectors, which contain the key characteristics of the speech signal.
- The different types of features extracted from voice are evaluated to determine their suitability for voice recognition.
- MFCC and HMM are among the most commonly used techniques today.

Page 9: Speech user interface

Feature Extraction Techniques

- Mel-frequency cepstral coefficients (MFCC)
- Hidden Markov model (HMM)
- Dynamic time warping (DTW)
- Fusion of HMM and DTW

Page 10: Speech user interface

Mel-frequency cepstral coefficients (MFCC)

- The mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
- Mel-frequency cepstral coefficients (MFCCs) are the coefficients that make up an MFC, derived from the frequency bands of an audio input.
- The frequency bands are equally spaced on the mel scale, by which MFCC approximates the human auditory system's response more closely than linearly spaced bands.
- The use of about 20 MFCC coefficients is common in ASR, although 10-12 coefficients are often considered sufficient for coding speech (a short extraction sketch follows).
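
As a concrete illustration of MFCC extraction, here is a short sketch assuming the librosa library (not named in the slides) and a hypothetical recording utterance.wav.

    # MFCC extraction sketch (assumes: pip install librosa).
    import librosa

    y, sr = librosa.load("utterance.wav", sr=16000)   # mono signal at 16 kHz

    # 20 coefficients per frame, as is common in ASR; the filter bank is
    # spaced evenly on the mel scale: mel = 2595 * log10(1 + f / 700).
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

    print(mfccs.shape)   # (20, n_frames): one 20-dim feature vector per frame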

Page 11: Speech user interface

Hidden Markov model (HMM)

- HMMs are used to represent the possible symbol sequences underlying speech utterances.
- The states in an HMM represent basic spoken linguistic units (e.g. phonemes or smaller sub-phoneme segments) that a human uses to pronounce a word.
- For each word, one or more composite HMMs exist, which model the probability of articulating a state sequence representing that word.
- HMMs are usually trained from large sets of recorded and feature-analyzed samples (a training sketch follows).
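
A sketch of per-word HMM training and scoring, assuming the hmmlearn library (not named in the slides); the `samples` layout, the state count, and the helper names are illustrative assumptions.

    # Per-word HMM sketch (assumes: pip install hmmlearn numpy).
    # Each entry in `samples` is assumed to be an MFCC matrix of shape
    # (n_frames, n_features) for one recording of the word.
    import numpy as np
    from hmmlearn import hmm

    def train_word_hmm(samples, n_states=5):
        X = np.vstack(samples)               # stack all training utterances
        lengths = [len(s) for s in samples]  # frame count per utterance
        model = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=100)
        model.fit(X, lengths)                # Baum-Welch training
        return model

    def recognize(models, features):
        # pick the word whose HMM gives the highest log-likelihood
        return max(models, key=lambda w: models[w].score(features))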

Page 12: Speech user interface

Dynamic time warping (DTW)

- Dynamic time warping (DTW) is an algorithm for measuring similarity between two spoken word sequences that may vary in time or speed.
- It compares two speech sequences and measures the similarity between them by finding an optimal alignment (a plain NumPy sketch follows).
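
Since DTW is a short dynamic-programming algorithm, a plain NumPy sketch of the standard formulation may help; the local Euclidean cost is an assumption.

    # Standard DTW distance between two feature sequences, e.g. the MFCC
    # frames of two spoken words, each of shape (n_frames, n_features).
    import numpy as np

    def dtw_distance(a, b):
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])   # local Euclidean cost
                cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
        return cost[n, m]   # lower means the sequences are more similar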

Page 13: Speech user interface

Fusion of HMM and DTW

- In this method, HMM and DTW are combined: the results of HMM and DTW are merged as weighted mean vectors (a minimal sketch follows).
- DTW finds the similarity between two signals based on time alignment.
- Meanwhile, the HMM trains clusters and iteratively moves between clusters based on the likelihoods given to it during training.
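
A minimal sketch of how such weighted fusion could look, reusing the dtw_distance and HMM sketches above; the equal weights and the sign convention are assumptions, and in practice the two scores would first be normalized to comparable scales.

    # Weighted score fusion sketch. The HMM yields a log-likelihood (higher
    # is better) while DTW yields a distance (lower is better), so the DTW
    # score is negated; the 0.5/0.5 weights are an assumption.
    def fused_score(hmm_model, template, features, w_hmm=0.5, w_dtw=0.5):
        hmm_score = hmm_model.score(features)           # log-likelihood
        dtw_score = -dtw_distance(template, features)   # distance -> similarity
        return w_hmm * hmm_score + w_dtw * dtw_score

The candidate word with the highest fused score would then be chosen, exactly as in the HMM-only recognizer above.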

Page 14: Speech user interface

SUI IMPLEMENTATION

Page 15: Speech user interface

SUI IMPLEMENTATION (contd.)

- Recording speech
- Applying noise cancellation
- End-point detection (see the sketch after this list)
- Feature extraction: the MFCC algorithm is used and the resulting parameters are separated out, to be used later in the training phase
- Normalization: the word length is calculated for all groups and an average is taken for each
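
A simple energy-based detector is one common way to realize the end-point detection step; this sketch, its frame length, and its threshold are illustrative assumptions (normalized float audio), not the slides' exact method.

    # Energy-based end-point detection sketch. `signal` is a 1-D NumPy
    # array of audio samples, assumed normalized to roughly [-1, 1].
    import numpy as np

    def endpoints(signal, frame_len=400, threshold=0.02):
        # short-time energy per non-overlapping frame
        n = len(signal) // frame_len
        frames = signal[:n * frame_len].reshape(n, frame_len)
        energy = (frames ** 2).mean(axis=1)
        voiced = np.where(energy > threshold)[0]   # frames above the floor
        if len(voiced) == 0:
            return None                            # silence only
        start, end = voiced[0], voiced[-1] + 1
        return start * frame_len, end * frame_len  # sample span of the word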

Page 16: Speech user interface

SUI IMPLEMENTATION (contd.)

- Training using HMM
- Fusion of HMM and DTW
- The recognized word or sentence is handed to the application (the sketch below shows how these stages could fit together)
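
To show how the stages could fit together end to end, here is a hypothetical flow that reuses the earlier sketches; record() and denoise() stand in for the recording and noise-cancellation steps, and `models`/`templates` are the trained per-word HMMs and DTW reference templates.

    # Hypothetical end-to-end recognition flow, reusing endpoints() and
    # fused_score() from the sketches above; record() and denoise() are
    # placeholders for the capture and noise-cancellation stages.
    import librosa

    def recognize_command(models, templates):
        signal = denoise(record())       # record speech + noise cancellation
        span = endpoints(signal)         # trim leading/trailing silence
        if span is None:
            return None                  # nothing voiced was captured
        start, end = span
        feats = librosa.feature.mfcc(y=signal[start:end], sr=16000,
                                     n_mfcc=20).T   # (n_frames, 20) vectors
        # fused HMM + DTW scoring over the vocabulary
        return max(models,
                   key=lambda w: fused_score(models[w], templates[w], feats))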

Page 17: Speech user interface

Applications

- Speech-operated calculator
- Voice dialing
- Intelligent voice assistants (personal agents) such as Apple Siri, Google voice search, etc.
- Home and building automation systems using SUI
- Live subtitling on television
- Speech-to-text conversion and note-taking systems

Page 18: Speech user interface

Future Scope

- Speech user interfaces for learning foreign languages
- Dictation tools in the medical and legal professions

Page 19: Speech user interface

SHORTCOMINGS

Current SUI limitations impose the following design guidelines:

- Train the speech recognition system in the implementation environment.
- Keep the vocabulary size small.
- Keep each speech input (word length) short.
- Use speech inputs that sound distinctly different from each other.
- Keep the user interface simple.
- Don't use speech to position objects.
- Use a command-based user interface.
- Allow users to quickly and easily turn the speech recognizer off and on.
- Use a highly directional, noise-canceling microphone.

Page 20: Speech user interface

CONCLUSION