Speech Recognition Application


Voice Enabled Phone Directory

- Yousef Rabah

يوسف رباح -

Why Speech Enabled Phone Directory

Growing Technology
Easy Access
Mainly used for:
– Educational purposes
– People with certain disabilities
– Mobile use

Problem

Automatic, speech-interactive phone directory assistance

Automatic Speech Recognition - Sphinx

Speaker Dependent vs. Independent
Acoustic modeling
Isolated vs. Continuous
HMM
– Probabilities, Parameters, Training
Language Model
– Unigrams: <s> & </s>
– Bigrams: P(word2 | word1)
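The bigram term P(word2 | word1) listed above can be illustrated with a small counting sketch. This is only an unsmoothed maximum-likelihood estimate over a hypothetical toy corpus of spelled names; the function name and corpus are assumptions for illustration, and it is not the smoothed, backed-off model a Sphinx language model file would normally contain.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Return unsmoothed estimates P(word2 | word1) = count(word1 word2) / count(word1)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> mark the start and end of each utterance.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            history_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


# Hypothetical toy corpus: names spelled letter by letter.
corpus = ["S A M", "S A L", "T O M"]
probs = bigram_probabilities(corpus)
print(probs[("<s>", "S")])  # two of the three utterances start with S
print(probs[("A", "M")])    # A is followed by M once and by L once
```

Without smoothing, any letter pair that never occurs in the corpus has no entry at all, which is why practical language models reserve some probability for unseen pairs.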

Phonemes
Lexicon Structure
– ZERO Z IH R OW
– TWO T UW
– H A HEIGH H
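The lexicon structure above (a word followed by its phonemes) can be read into a lookup table with a few lines of code. This is a minimal sketch, assuming whitespace-separated entries and assuming that a suffix such as `(2)` marks an alternate pronunciation of the same word; `parse_lexicon` is a hypothetical helper, not part of Sphinx.

```python
def parse_lexicon(lines):
    """Map each word to a list of pronunciations, each a list of phonemes."""
    lexicon = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: "WORD(2)" is an alternate pronunciation of "WORD".
        word = fields[0].split("(")[0]
        lexicon.setdefault(word, []).append(fields[1:])
    return lexicon


# Entries taken from the slide.
entries = ["ZERO Z IH R OW", "TWO T UW"]
lexicon = parse_lexicon(entries)
print(lexicon["ZERO"])  # [['Z', 'IH', 'R', 'OW']]
print(lexicon["TWO"])   # [['T', 'UW']]
```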

Input / Output

24003 samples in file /usr/local/share/sphinx3/model/lm/an4/hell.raw
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil>
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> A(2)
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> EIGHTH
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L OH

Backtrace (null)

LatID  SFrm  EFrm      AScr     LScr  Type
  254     0    45   -391470   -74100    -1  <sil>
  594    46    81   -472155  -148846     0  H
 1291    82   102   -288621  -148846     0  E
 1850   103   126   -235274  -148846     0  L
 2599   127   147   -430694  -148846     0  L
 2650   148   148         0  -148846     0  </s>
          0   148  -1818214  -818330        (Total)

FWDVIT: H E L L (null)
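The backtrace above can be checked mechanically: summing the per-segment scores should reproduce the (Total) row, and the words between the silence and end markers should spell the final hypothesis. The sketch below assumes that AScr and LScr are the acoustic and language scores, and that the fused `-1<sil>` field is a type of -1 followed by the word `<sil>`; `Segment` and `parse_backtrace` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    lat_id: int
    start_frame: int
    end_frame: int
    ascr: int      # assumed: acoustic score
    lscr: int      # assumed: language score
    seg_type: int
    word: str


# Rows copied from the backtrace, one segment per line.
BACKTRACE = """\
254 0 45 -391470 -74100 -1 <sil>
594 46 81 -472155 -148846 0 H
1291 82 102 -288621 -148846 0 E
1850 103 126 -235274 -148846 0 L
2599 127 147 -430694 -148846 0 L
2650 148 148 0 -148846 0 </s>
"""


def parse_backtrace(text):
    """Turn whitespace-separated backtrace rows into Segment records."""
    segments = []
    for line in text.splitlines():
        *numbers, word = line.split()
        segments.append(Segment(*(int(n) for n in numbers), word))
    return segments


segments = parse_backtrace(BACKTRACE)
total_ascr = sum(s.ascr for s in segments)
total_lscr = sum(s.lscr for s in segments)
hypothesis = " ".join(
    s.word for s in segments if s.word not in ("<sil>", "</s>")
)
print(total_ascr, total_lscr, hypothesis)  # -1818214 -818330 H E L L
```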

Difficulties

Hardware issues
ASR software issues
Letter phonemes
Time

Solution

4 Stage Process:

Solution

Database (PostgreSQL)
– Names
– Phone numbers
– Fast access
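A directory table holding names and phone numbers, with an index for fast access, might look roughly like the sketch below. The application uses PostgreSQL; to keep the example self-contained it uses Python's built-in `sqlite3` module as a stand-in, and the table name `people`, its columns, and the helper functions are all assumptions rather than the application's actual schema.

```python
import sqlite3


def open_directory():
    """Create an in-memory directory (sqlite3 stand-in for PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people ("
        "first_name TEXT NOT NULL, "
        "last_name TEXT NOT NULL, "
        "phone TEXT NOT NULL)"
    )
    # An index on the lookup column keeps name searches fast as the table grows.
    conn.execute("CREATE INDEX people_first_name_idx ON people (first_name)")
    return conn


def add_person(conn, first_name, last_name, phone):
    """Store names in upper case so they match decoded letter sequences."""
    conn.execute(
        "INSERT INTO people (first_name, last_name, phone) VALUES (?, ?, ?)",
        (first_name.upper(), last_name.upper(), phone),
    )


def find_by_first_name(conn, first_name):
    """Return (first_name, last_name, phone) rows for an exact first-name match."""
    cursor = conn.execute(
        "SELECT first_name, last_name, phone FROM people "
        "WHERE first_name = ? ORDER BY last_name",
        (first_name.upper(),),
    )
    return cursor.fetchall()


conn = open_directory()
add_person(conn, "Sam", "Smith", "765-973-2145")
print(find_by_first_name(conn, "SAM"))  # [('SAM', 'SMITH', '765-973-2145')]
```

Parameterized queries (the `?` placeholders) keep a decoded name from being interpreted as SQL.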

Solution

Architecture of application
– db.pm
– people.pm
– people.pl
– record.pl
– wav_to_raw.pl
– get_speech.pl
– display_speech.pm
– display_speech.pl
– VEPD.pm
– VEPD.pl

Example:…

PC: press space bar before and after you speak:

User: S AH EM

PC: Decoded as, SAM ?

Results | 1

1. SAM |SMITH | 765-973-2145

…
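The exchange above suggests a simple flow: the recognizer returns a sequence of spelled letters, the letters are joined into a name, and matching directory rows are printed. The sketch below illustrates that flow under the assumption that the decoder emits one token per letter; `letters_to_name`, `format_results`, and the in-memory `DIRECTORY` list are hypothetical and stand in for the application's Perl modules and database.

```python
# Tokens that mark silence or utterance boundaries rather than letters.
NON_LETTER_TOKENS = {"<sil>", "<s>", "</s>"}

# Hypothetical in-memory directory with the row shown in the example.
DIRECTORY = [("SAM", "SMITH", "765-973-2145")]


def letters_to_name(hypothesis):
    """Join decoded letter tokens into a name, dropping markers like <sil>."""
    letters = [
        token.split("(")[0]  # assumed: "A(2)" is a variant of the letter A
        for token in hypothesis.split()
        if token not in NON_LETTER_TOKENS
    ]
    return "".join(letters)


def format_results(rows):
    """Render matches in the same layout as the example output."""
    lines = [f"Results | {len(rows)}"]
    for number, (first, last, phone) in enumerate(rows, start=1):
        lines.append(f"{number}. {first} | {last} | {phone}")
    return "\n".join(lines)


name = letters_to_name("<sil> S A M")
matches = [row for row in DIRECTORY if row[0] == name]
print(f"Decoded as, {name} ?")
print(format_results(matches))
```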

Solution

Results

A first step towards a hands-free, speech-enabled phone directory

Speaker Independent
Application's Features:
- Adding user
- Retrieving user (via speech)
- Manual search
- Viewing current phone directory

Possible Future Enhancement

ASR enabled for:
– Adding users
– Phone # search
– Word Recognition (instead of letters)
More accurate ASR (as tech. grows)
Graphical outlook (via Perl/Tk)
Communication through VoiceXML

Special Thanks

To friends and family
– Jim Rogers
– Hassan Halta
– Skylar Thompson
– Kushboo Goel
– Rabah family
– El-Shabab el-taybeh

Questions/Comments
