Speech Recognition ECE5526 Wilson Burgos. Outline Introduction Objective Existing Solutions...

uSpeak2Me Esol Android Tool Speech Recognition ECE5526 Wilson Burgos


Page 1: Speech Recognition ECE5526 Wilson Burgos. Outline Introduction Objective Existing Solutions Implementation Test and Result Conclusion.

uSpeak2Me ESOL Android Tool

Speech Recognition ECE5526

Wilson Burgos

Page 2: Outline

Introduction
Objective
Existing Solutions
Implementation
Test and Results
Conclusion

Page 3: Introduction

A lot of money is spent annually to improve the language skills of non-native speakers.
Classes for ESOL (English for Speakers of Other Languages)
Lack of effective tools
Speech recognition can help in some areas

Page 4: Objective

Create a tool that helps people learn to speak English correctly and effectively.
Engage people using new technology (smartphones)
Uses pocketsphinx, Android, and text-to-speech technology
Simple and intuitive to use
Fun

Page 5: Existing Solutions

EyeSpeak - http://www.eyespeakenglish.com
Pros
  Uses native speakers
  Pronunciation, pitch, timing & loudness
Cons
  Difficult to use
  Runs only on Windows

Page 6: Concept of Operation

The user can start the game from the main menu.
The game screen leads the user through a series of words and logs the number of positive responses (the score).
Each word has a corresponding graphic to display. For example, the game might show the user a picture of a mountain.
The user has at most 30 seconds to respond.
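The round logic described on this slide can be sketched in plain Java as follows. This is a minimal illustration, not the application's code: the class `WordGameSketch`, its method names, and the case-insensitive comparison are assumptions; only the word sequence, the score count, and the 30-second limit come from the slide.

```java
// Hypothetical sketch of the game round: names and matching rule are assumptions.
final class WordGameSketch {
    // "At most 30 seconds to respond" (from the slide).
    static final long RESPONSE_LIMIT_MS = 30_000;

    private final String[] words;
    private int index = 0;
    private int score = 0;

    WordGameSketch(String[] words) {
        this.words = words.clone();
    }

    boolean isOver() { return index >= words.length; }

    String currentWord() { return words[index]; }

    int score() { return score; }

    /**
     * Records the response for the current word and advances to the next one.
     * Returns true when the response counts as positive: it arrived within
     * the time limit and matches the prompted word (case-insensitive).
     */
    boolean submit(String heard, long elapsedMs) {
        if (isOver()) {
            throw new IllegalStateException("game is over");
        }
        boolean positive = elapsedMs <= RESPONSE_LIMIT_MS
                && heard != null
                && heard.trim().equalsIgnoreCase(words[index]);
        if (positive) {
            score++;
        }
        index++;
        return positive;
    }

    public static void main(String[] args) {
        WordGameSketch game = new WordGameSketch(new String[] {"mountain", "river"});
        game.submit("mountain", 2_000);   // in time and matching
        game.submit("river", 31_000);     // too late, not counted
        System.out.println("score: " + game.score());
    }
}
```

Keeping the timing and scoring rules in a class with no Android dependencies makes them easy to check separately from the game screen.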

Page 7: Development Environment

Eclipse IDE with Android plugin
Cygwin
Emulator
  QEMU-based ARM emulator
  Runs the same image as the device
  Limitations
    No camera support

Page 8: Development Environment

Actual Device

Page 9: Implementation

Using Java with the Android SDK
Pocketsphinx
  Lightweight speech recognition decoder library
  Implemented in C

Page 10: Android Architecture

Page 11: Application Building Blocks

Activity
IntentReceiver
Service
ContentProvider

Page 12: Application Architecture

Page 13: Implementation

QuizGameActivity
  The screen at the heart of the application: the game play screen.
  Prompts the user to answer a series of trivia questions and stores the resulting score.
  Uses text-to-speech technology to speak the word when in simple mode.

Page 14: Implementation

RecognizerTask
AudioTask
PocketSphinx
VUMeter

Page 15: Implementation

RecognizerTask
  Interfaces directly to the pocketsphinx library using JNI calls.
    The Java Native Interface (JNI) enables the integration of code written in the Java programming language with code written in other languages such as C and C++.
  Consumes data from the audio queue, produced by the AudioTask.
    Calls process_raw
  Scoring
    Based on positive detection of the utterance
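The producer/consumer relationship between AudioTask and RecognizerTask might look roughly like the sketch below. It is an illustration under stated assumptions: `RawDecoder` is a hypothetical stand-in for the native decoder reached through JNI, the empty end-of-stream buffer is a convention invented for this sketch, and none of the names are taken from the application's source.

```java
// Hypothetical sketch of the audio queue between a capture task and a recognizer task.
final class AudioPipelineSketch {
    /** Stand-in for the native decoder; the real one is reached through JNI. */
    interface RawDecoder {
        void processRaw(short[] samples, int count);
    }

    /** Empty buffer used as an end-of-stream marker (an assumption of this sketch). */
    static final short[] END_OF_STREAM = new short[0];

    /** Consumer side: blocks on the queue and forwards each buffer to the decoder. */
    static int consume(java.util.concurrent.BlockingQueue<short[]> queue, RawDecoder decoder)
            throws InterruptedException {
        int totalSamples = 0;
        while (true) {
            short[] buffer = queue.take();       // waits until audio is available
            if (buffer == END_OF_STREAM) {       // identity check on the marker
                return totalSamples;
            }
            decoder.processRaw(buffer, buffer.length);
            totalSamples += buffer.length;
        }
    }

    /** Starts a producer thread that enqueues the buffers, then drains them here. */
    static int run(final short[][] buffers, RawDecoder decoder) {
        final java.util.concurrent.BlockingQueue<short[]> queue =
                new java.util.concurrent.LinkedBlockingQueue<short[]>();
        Thread producer = new Thread(() -> {
            try {
                for (short[] b : buffers) {
                    queue.put(b);
                }
                queue.put(END_OF_STREAM);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            int total = consume(queue, decoder);
            producer.join();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining audio", e);
        }
    }

    public static void main(String[] args) {
        short[][] fakeAudio = { new short[160], new short[160] };
        int total = run(fakeAudio, (samples, count) -> { /* a real decoder would score here */ });
        System.out.println("samples consumed: " + total);
    }
}
```

A blocking queue lets the capture thread keep reading from the microphone while the decoder works through earlier buffers at its own pace.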

Page 16: Implementation

AudioTask
  Interfaces directly to the audio peripherals to gather data.
  Format: sample rate 8000 Hz, 16-bit PCM, 8192k buffer
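To get a feel for what this format means in data terms, the arithmetic can be written out as below. This is only a sketch: mono capture is an assumption (the slide does not state the channel count), and the buffer size passed in `main` treats 8192 as a byte count purely for illustration, since the slide's unit is not clear.

```java
// Hypothetical helper for PCM data-rate arithmetic; mono capture is assumed.
final class PcmFormatSketch {
    static final int SAMPLE_RATE_HZ = 8000;   // from the slide
    static final int BYTES_PER_SAMPLE = 2;    // 16-bit PCM, from the slide
    static final int CHANNELS = 1;            // assumption: mono microphone input

    /** Bytes of audio produced per second of recording. */
    static int bytesPerSecond() {
        return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS;
    }

    /** Seconds of audio that fit in a buffer of the given size in bytes. */
    static double bufferSeconds(int bufferBytes) {
        return (double) bufferBytes / bytesPerSecond();
    }

    public static void main(String[] args) {
        System.out.println("bytes per second: " + bytesPerSecond());
        // Illustrative only: interprets 8192 as bytes, which the slide does not confirm.
        System.out.println("seconds in 8192 bytes: " + bufferSeconds(8192));
    }
}
```

Under these assumptions the recorder produces 16,000 bytes per second, so any fixed-size buffer holds only a short stretch of speech and must be drained continuously.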

Page 17: PocketSphinx

Very limited documentation
Packaged pocketsphinx into a shared library
  Created a Java counterpart library (jar) to be added to the Android application
  Compiled using Cygwin
Initialized with a custom dictionary and language model
  Speak2me.dic
  Speak2me.lm
  Loaded at startup from Java code
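For readers unfamiliar with Sphinx-style pronunciation dictionaries, each line of a `.dic` file holds a word followed by its phones, and alternate pronunciations carry a `(2)`, `(3)`, ... suffix on the word. The sketch below parses one such line. The class name `DictEntrySketch` is hypothetical and the sample lines are illustrative; they are not taken from Speak2me.dic.

```java
// Hypothetical parser for one line of a Sphinx-style pronunciation dictionary.
final class DictEntrySketch {
    final String word;      // headword without the "(n)" alternate marker
    final int variant;      // 1 for the primary pronunciation, n for "WORD(n)"
    final String[] phones;  // phone symbols in order

    private DictEntrySketch(String word, int variant, String[] phones) {
        this.word = word;
        this.variant = variant;
        this.phones = phones;
    }

    /** Parses a line such as "WORD PH1 PH2" or "WORD(2) PH1 PH2". */
    static DictEntrySketch parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected a word and at least one phone: " + line);
        }
        String head = parts[0];
        int variant = 1;
        int open = head.lastIndexOf('(');
        if (open > 0 && head.endsWith(")")) {
            variant = Integer.parseInt(head.substring(open + 1, head.length() - 1));
            head = head.substring(0, open);
        }
        String[] phones = new String[parts.length - 1];
        System.arraycopy(parts, 1, phones, 0, phones.length);
        return new DictEntrySketch(head, variant, phones);
    }

    public static void main(String[] args) {
        // Illustrative line, not copied from the project's dictionary.
        DictEntrySketch entry = parse("MOUNTAIN M AW N T AH N");
        System.out.println(entry.word + " has " + entry.phones.length + " phones");
    }
}
```

A small custom dictionary restricts the recognizer to the words the game actually prompts, which is one reason a custom dictionary and language model can outperform a general-purpose one on this task.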

Page 18: Limitations

Hardware memory
  In the sphinx4 demos the recognizer was active all the time, gathering data. On the device, the AudioRecord buffer fills up, which prevents the recognizer from staying active all the time.
  The game needs to be responsive; how can this problem be solved?

Page 19: Limitations

Hardware memory
  The VUMeter class calculates the energy of the sampled data, removing the DC offset with a filter.
  Detection logic was added to trigger end of utterance automatically, with configurable lock/unlock thresholds.
  The game timer automatically starts the recognizer after every given word.
Device speed
  To improve detection, the application uses the partial results to determine whether a match has been found; it does not penalize an incorrect partial result.
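One way the energy measurement and lock/unlock detection could work is sketched below. It is an assumption-laden illustration, not the VUMeter class itself: DC removal is done here by subtracting the buffer mean (the slides only say "a filter"), energy is taken as the mean square of the result, and the class name `EnergyGateSketch` and threshold values are hypothetical.

```java
// Hypothetical sketch of energy-based end-of-utterance detection with hysteresis.
final class EnergyGateSketch {
    private final double lockThreshold;    // energy at or above this marks speech start
    private final double unlockThreshold;  // energy at or below this, once locked, ends the utterance
    private boolean locked = false;

    EnergyGateSketch(double lockThreshold, double unlockThreshold) {
        if (unlockThreshold > lockThreshold) {
            throw new IllegalArgumentException("unlock threshold must not exceed lock threshold");
        }
        this.lockThreshold = lockThreshold;
        this.unlockThreshold = unlockThreshold;
    }

    /** Mean-square energy after subtracting the buffer mean (a simple DC removal). */
    static double energy(short[] samples, int count) {
        if (count <= 0) {
            return 0.0;
        }
        double mean = 0.0;
        for (int i = 0; i < count; i++) {
            mean += samples[i];
        }
        mean /= count;
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            double d = samples[i] - mean;
            sum += d * d;
        }
        return sum / count;
    }

    /** Feeds one buffer; returns true only on the buffer that ends an utterance. */
    boolean update(short[] samples, int count) {
        double e = energy(samples, count);
        if (!locked) {
            if (e >= lockThreshold) {
                locked = true;          // speech detected, start tracking the utterance
            }
            return false;
        }
        if (e <= unlockThreshold) {
            locked = false;             // energy fell back to silence
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        EnergyGateSketch gate = new EnergyGateSketch(1.0e5, 1.0e3);
        short[] silence = {500, 500, 500, 500};      // pure DC offset
        short[] speech = {1500, -500, 1500, -500};   // same offset plus a signal
        gate.update(speech, speech.length);
        System.out.println("utterance ended: " + gate.update(silence, silence.length));
    }
}
```

Using two thresholds rather than one keeps the gate from flickering when the energy hovers near a single cut-off, so the recognizer is stopped once per utterance.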

Page 20: Screenshots

Page 23: Test and Results

The cmu07a.dic recognized very poorly

hub4_wsj_sc_3s_8k.cd_semi_5000
  TOTAL Words: 91  Correct: 56  Errors: 46
  TOTAL Percent correct = 61.54%  Error = 50.55%  Accuracy = 49.45%
  TOTAL Insertions: 11  Deletions: 3  Substitutions: 32

hub4_wsj_sc_3s_8k.cd_semi_5000adapt
  TOTAL Words: 91  Correct: 71  Errors: 25
  TOTAL Percent correct = 78.02%  Error = 27.47%  Accuracy = 72.53%
  TOTAL Insertions: 5  Deletions: 9  Substitutions: 11
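The relationship between the counts and the percentages reported here can be written out as a short sketch. The class name `WordErrorStatsSketch` is hypothetical; the formulas are the usual word-alignment definitions (errors = insertions + deletions + substitutions, correct = words - deletions - substitutions), and they reproduce the figures on this slide.

```java
// Hypothetical helper that derives the reported percentages from the raw counts.
final class WordErrorStatsSketch {
    final int words;
    final int insertions;
    final int deletions;
    final int substitutions;

    WordErrorStatsSketch(int words, int insertions, int deletions, int substitutions) {
        if (words <= 0) {
            throw new IllegalArgumentException("reference word count must be positive");
        }
        this.words = words;
        this.insertions = insertions;
        this.deletions = deletions;
        this.substitutions = substitutions;
    }

    /** Total errors: every insertion, deletion, and substitution counts once. */
    int errors() { return insertions + deletions + substitutions; }

    /** Reference words recognized correctly (insertions do not reduce this). */
    int correct() { return words - deletions - substitutions; }

    double percentCorrect() { return 100.0 * correct() / words; }

    double errorRate() { return 100.0 * errors() / words; }

    /** Accuracy also penalizes insertions, so it can be lower than percent correct. */
    double accuracy() { return 100.0 - errorRate(); }

    public static void main(String[] args) {
        WordErrorStatsSketch base = new WordErrorStatsSketch(91, 11, 3, 32);
        WordErrorStatsSketch adapted = new WordErrorStatsSketch(91, 5, 9, 11);
        System.out.printf("base accuracy: %.2f%%%n", base.accuracy());
        System.out.printf("adapted accuracy: %.2f%%%n", adapted.accuracy());
    }
}
```

The gap between percent correct and accuracy in both runs comes entirely from insertions, which count as errors but do not reduce the number of correctly recognized reference words.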

Page 24: Test and Results

Using the custom corpus and a custom language model, the tool accurately detects speech in a timely fashion (about 2 s).

Page 25: Installation

Requires installing the custom dictionary (lexicon) and language model files.

Page 26: Future Additions

Adapt scoring based on pitch and phoneme recognition.
Add different levels of difficulty
Show progress reports

Page 27: References

http://developer.android.com
http://sites.google.com/site/io
Sams Teach Yourself Android Application Development, Lauren Darcey & Shane Conder (2010)