Speech Recognition Application
Voice Enabled Phone Directory
Yousef Rabah (يوسف رباح)
Why Speech Enabled Phone Directory
Growing technology
Easy access
Mainly used for:
– Educational purposes
– People with certain disabilities
– Mobile use
Problem
Automated phone directory assistance that interacts with the user through speech
Automatic Speech Recognition - Sphinx
Speaker Dependent vs. Independent
Acoustic modeling
Isolated vs. Continuous
HMM – Probabilities, Parameters, Training
Language Model
– Unigrams: <s> & </s>
– Bigrams: P(word2 | word1)
Phonemes
Lexicon Structure
– ZERO Z IH R OW
– TWO T UW
– H A HEIGH H
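The bigram term P(word2 | word1) above can be illustrated with a small maximum-likelihood estimate. This is a minimal sketch, not the Sphinx language-model tooling: the function name `bigram_probs` and the two spelled-out training names are hypothetical, and no smoothing or back-off is applied.

```python
from collections import Counter

def bigram_probs(sentences):
    """Estimate P(word2 | word1) by maximum likelihood from tokenized sentences."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        # <s> and </s> mark the sentence start and end, as on the slide.
        tokens = ["<s>"] + list(words) + ["</s>"]
        # Count every token that can start a bigram (all but the last).
        unigram_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {
        (w1, w2): count / unigram_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }

# Hypothetical training data: names spelled one letter per token.
corpus = [["S", "A", "M"], ["S", "U", "E"]]
probs = bigram_probs(corpus)
print(probs[("S", "A")])  # count(S, A) / count(S)
```

With letters as the vocabulary, a model of this kind lets the decoder prefer letter sequences that actually occur in the directory's names.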
Input / Output
24003 samples in file /usr/local/share/sphinx3/model/lm/an4/hell.raw
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil>
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> A(2)
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> EIGHTH
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L OH
Backtrace (null)
LatID  SFrm  EFrm      AScr     LScr  Type
  254     0    45   -391470   -74100    -1  <sil>
  594    46    81   -472155  -148846     0  H
 1291    82   102   -288621  -148846     0  E
 1850   103   126   -235274  -148846     0  L
 2599   127   147   -430694  -148846     0  L
 2650   148   148         0  -148846     0  </s>
          0   148  -1818214  -818330        (Total)
FWDVIT: H E L L (null)
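To show how the backtrace fits together, the sketch below parses its rows, sums the acoustic (AScr) and language (LScr) scores, and rebuilds the hypothesis. The rows are copied from the backtrace above, with the word written as its own column; the parser itself (`parse_backtrace` and its field names) is a hypothetical helper, not part of Sphinx.

```python
BACKTRACE = """\
254 0 45 -391470 -74100 -1 <sil>
594 46 81 -472155 -148846 0 H
1291 82 102 -288621 -148846 0 E
1850 103 126 -235274 -148846 0 L
2599 127 147 -430694 -148846 0 L
2650 148 148 0 -148846 0 </s>
"""

def parse_backtrace(text):
    """Turn whitespace-separated backtrace rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        lat_id, sfrm, efrm, ascr, lscr, typ, word = line.split()
        rows.append({
            "lat_id": int(lat_id),
            "start_frame": int(sfrm),
            "end_frame": int(efrm),
            "ascr": int(ascr),   # acoustic score
            "lscr": int(lscr),   # language-model score
            "type": int(typ),
            "word": word,
        })
    return rows

rows = parse_backtrace(BACKTRACE)
total_ascr = sum(r["ascr"] for r in rows)
total_lscr = sum(r["lscr"] for r in rows)
# Drop silence and sentence-end markers to recover the decoded letters.
hypothesis = " ".join(r["word"] for r in rows if r["word"] not in ("<sil>", "</s>"))

print(total_ascr, total_lscr, hypothesis)
```

The sums match the (Total) row, and the remaining words match the FWDVIT line.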
Difficulties
Hardware issues
ASR software issues
Letter phonemes
Time
Solution
4-stage process:
Solution
Database (PostgreSQL)
– Names
– Phone numbers
– Fast access
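A directory table of names and phone numbers with an index for fast lookup might look like the sketch below. It is an assumption-laden illustration: the slides use PostgreSQL, whereas this uses Python's built-in `sqlite3` so that it runs without a server, and the table, column, and function names are hypothetical. The sample entry is the one shown in the deck's example.

```python
import sqlite3

def create_directory(conn):
    """Create the (hypothetical) people table and an index on first name."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS people (
               id INTEGER PRIMARY KEY,
               first_name TEXT NOT NULL,
               last_name TEXT NOT NULL,
               phone TEXT NOT NULL
           )"""
    )
    # The index keeps lookups by first name fast as the directory grows.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_people_first ON people (first_name)"
    )

def add_person(conn, first, last, phone):
    """Insert one entry, storing names in upper case like the recognizer output."""
    conn.execute(
        "INSERT INTO people (first_name, last_name, phone) VALUES (?, ?, ?)",
        (first.upper(), last.upper(), phone),
    )

def find_by_first_name(conn, first):
    """Return (first, last, phone) tuples whose first name matches exactly."""
    cur = conn.execute(
        "SELECT first_name, last_name, phone FROM people "
        "WHERE first_name = ? ORDER BY last_name",
        (first.upper(),),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
create_directory(conn)
add_person(conn, "Sam", "Smith", "765-973-2145")
matches = find_by_first_name(conn, "SAM")
print(matches)
```

Parameterized queries (`?` placeholders) are used so that decoded text is never spliced directly into SQL.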
Solution
Architecture of application
– db.pm
– people.pm
– people.pl
– record.pl
– wav_to_raw.pl
– get_speech.pl
– display_speech.pm
– display_speech.pl
– VEPD.pm
– VEPD.pl
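Judging by its name, wav_to_raw.pl converts a recorded WAV file into the headerless raw samples that the decoder reads (the log refers to a .raw file). The original script is not shown, so the following is only a guess at that step in Python, using the standard `wave` module; the function name and return value are assumptions.

```python
import wave

def wav_to_raw(wav_path, raw_path):
    """Copy the PCM frames of a WAV file into a headerless raw file.

    Returns the number of samples written (frames times channels).
    """
    with wave.open(wav_path, "rb") as wav:
        n_frames = wav.getnframes()
        n_samples = n_frames * wav.getnchannels()
        frames = wav.readframes(n_frames)  # raw PCM bytes, header stripped
    with open(raw_path, "wb") as raw:
        raw.write(frames)
    return n_samples
```

A call such as `wav_to_raw("name.wav", "name.raw")` (hypothetical file names) would leave the sample rate and sample width unchanged, so the recording step has to produce audio in whatever format the acoustic model expects.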
Example:
…
PC: press space bar before and after you speak:
User: S AH EM
PC: Decoded as, SAM ?
Results | 1
1. SAM | SMITH | 765-973-2145
…
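The exchange above, where decoded letters become a name and the name is looked up, can be sketched as follows. This is an illustrative assumption rather than the application's Perl code: the function names and the in-memory `DIRECTORY` list are hypothetical, the single entry is the one from the example, and the handling of markers such as `A(2)` assumes they denote pronunciation variants of the same word.

```python
DIRECTORY = [("SAM", "SMITH", "765-973-2145")]

MARKERS = ("<sil>", "<s>", "</s>")

def letters_to_name(hypothesis):
    """Join a decoded letter sequence such as '<sil> S A M' into 'SAM'."""
    letters = []
    for token in hypothesis.split():
        if token in MARKERS:
            continue
        # Strip a variant suffix, e.g. 'A(2)' becomes 'A'.
        letters.append(token.split("(")[0])
    return "".join(letters)

def format_results(name, directory):
    """Render matching entries in the same layout as the example output."""
    matches = [entry for entry in directory if entry[0] == name]
    lines = [f"Results | {len(matches)}"]
    for i, (first, last, phone) in enumerate(matches, start=1):
        lines.append(f"{i}. {first} | {last} | {phone}")
    return "\n".join(lines)

name = letters_to_name("<sil> S A(2) M")
report = format_results(name, DIRECTORY)
print(report)
```

Asking the user to confirm the decoded name before searching, as the example dialogue does, guards against acting on a misrecognized letter.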
Results
A first step towards a hands-free, speech-enabled phone directory
Speaker independent
Application's features:
- Adding user
- Retrieving user (via speech)
- Manual search
- Viewing current phone directory
Possible Future Enhancement
ASR enabled for:
– Adding users
– Phone # search
– Word recognition (instead of letters)
More accurate ASR (as tech. grows)
Graphical outlook (via Perl/Tk)
Communication through VoiceXML
Special Thanks
To friends and family
– Jim Rogers
– Hassan Halta
– Skylar Thompson
– Kushboo Goel
– Rabah family
– El-Shabab el-taybeh
Questions/Comments