Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector ...
![Page 1: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/1.jpg)
Speech Recognition
Raymond Sastraputera
![Page 2: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/2.jpg)
Introduction
Frame/Buffer
Algorithm
Silent Detector
Estimate Pitch
◦ Correlation and Candidate
◦ Optimal Candidate
◦ Buffer Delay
Added Bias
Test and Result
Conclusion
![Page 3: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/3.jpg)
Estimates the pitch of a speech signal
Written in C++
![Page 4: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/4.jpg)
Frame segments are shifted with no overlap
[Figure labels: Frame segment, Buffer]
![Page 5: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/5.jpg)
Initial detection of silence
$$|\max(x)| + |\max(y)| + |\max(z)| + |\min(x)| + |\min(y)| + |\min(z)| < \text{Threshold Value (50 dB)}$$
[Figure: frame segments X, Y, Z]
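The silence test on this slide could be sketched as below. This is only an illustration: the names `peak_sum` and `is_silent` are hypothetical, the direction of the comparison (sum below the threshold means silent) is assumed from the slide title, and the threshold is assumed to be expressed in the same units as the samples.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// |max| + |min| of one frame segment (hypothetical helper).
float peak_sum(const std::vector<float>& seg) {
    if (seg.empty()) return 0.0F;
    const auto mm = std::minmax_element(seg.begin(), seg.end());
    return std::fabs(*mm.second) + std::fabs(*mm.first);
}

// Assumption: the buffer is silent when the summed peaks of the three
// frame segments x, y, z fall below the threshold.
bool is_silent(const std::vector<float>& x,
               const std::vector<float>& y,
               const std::vector<float>& z,
               float threshold = 50.0F) {
    return peak_sum(x) + peak_sum(y) + peak_sum(z) < threshold;
}
```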
![Page 6: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/6.jpg)
Correlation of two vectors
$$P_{V_1,V_2} = \frac{\sum_j V_1(j)\,V_2(j)}{\sqrt{\sum_j V_1(j)^2 \, \sum_j V_2(j)^2}}$$
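A normalized correlation of two vectors, as used on this slide, might be computed along these lines. The function name `correlation` is hypothetical, and returning 0 when either vector has zero energy is an assumption made here to avoid dividing by zero.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized correlation: sum(v1*v2) / sqrt(sum(v1^2) * sum(v2^2)).
// Only the common prefix is used if the lengths differ (assumption).
double correlation(const std::vector<float>& v1, const std::vector<float>& v2) {
    const std::size_t n = std::min(v1.size(), v2.size());
    double dot = 0.0, e1 = 0.0, e2 = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        dot += static_cast<double>(v1[j]) * v2[j];
        e1  += static_cast<double>(v1[j]) * v1[j];
        e2  += static_cast<double>(v2[j]) * v2[j];
    }
    const double denom = std::sqrt(e1 * e2);
    return denom > 0.0 ? dot / denom : 0.0;  // zero-energy guard (assumption)
}
```

The result lies between -1 and 1, which is what lets a fixed threshold such as 0.88 be applied regardless of signal level.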
![Page 7: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/7.jpg)
Correlation P(x,y)
Calculate for different window sizes (nm)
◦ Window size will be the pitch value (in samples)
◦ Correlation values above the threshold become candidates with score 1
[Figure: frame segments X, Y, Z with Vector x and Vector y, each labeled nm]
![Page 8: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/8.jpg)
Correlation P(y,z)
Calculate for different nm
◦ Only for window sizes among the candidates with score 1
◦ Correlation values above the threshold become candidates with score 2
[Figure: frame segments X, Y, Z with Vector y and Vector z, each labeled nm]
![Page 9: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/9.jpg)
Correlation Q(n,m)
Calculate for different nm
◦ nMAX is the maximum nm among the candidates
Optimal Candidate
◦ A candidate becomes the optimal candidate if its Qnm × 0.77 is higher than the preceding candidate's Qnm
[Figure: frame segments X, Y, Z with Vector x and Vector z; labels nMAX, nMAX, nm]
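One possible reading of the optimal-candidate rule is sketched below. Two assumptions go beyond the slide: candidates are visited in order of increasing window size, and "preceding candidate" is taken to mean the candidate currently held as optimal. The `Candidate` struct and `pick_optimal` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

struct Candidate {
    std::size_t nm;  // window size in samples
    double q;        // Q(n,m) correlation value for this window size
};

// Returns the index of the optimal candidate. Precondition: c is non-empty
// and sorted by increasing nm (assumption). A later candidate replaces the
// current optimum only if its q, scaled by the multiplier, is still higher.
std::size_t pick_optimal(const std::vector<Candidate>& c,
                         double multiplier = 0.77) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < c.size(); ++i) {
        if (c[i].q * multiplier > c[best].q) best = i;
    }
    return best;
}
```

With a multiplier below 1, the rule favors the earlier candidate unless a later one is clearly stronger.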
![Page 10: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/10.jpg)
Candidate score 1: Correlation P(x,y)
◦ No candidate → silence
◦ Single candidate → compute P(y,z)
  Score stays at 1 → hold
  Score 2 → estimated pitch
◦ Multi candidate → compute P(y,z)
Candidate score 2: Correlation P(y,z)
◦ No candidate → compute Q(n,m) on the candidates with score 1
◦ Single candidate → estimated pitch
◦ Multi candidate → compute Q(n,m)
Optimal Pitch: Correlation Q(n,m)
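The decision flow on this slide can be condensed into a small function of two counts. This is a sketch of one reading of the slide, not the original code; the `Decision` enum and `decide` are hypothetical names.

```cpp
#include <cstddef>

enum class Decision {
    Silence,          // no score-1 candidate
    Hold,             // single candidate that stayed at score 1
    EstimatedPitch,   // exactly one score-2 candidate
    ComputeQOnScore1, // several score-1 candidates, none reached score 2
    ComputeQOnScore2  // several score-2 candidates
};

// n1: number of candidates with score 1 from P(x,y).
// n2: number of those that reached score 2 from P(y,z).
Decision decide(std::size_t n1, std::size_t n2) {
    if (n1 == 0) return Decision::Silence;
    if (n2 == 1) return Decision::EstimatedPitch;
    if (n2 > 1)  return Decision::ComputeQOnScore2;
    // From here on no candidate reached score 2.
    return n1 == 1 ? Decision::Hold : Decision::ComputeQOnScore1;
}
```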
![Page 11: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/11.jpg)
Single candidate with score 2
From Q(n,m) of
◦ Candidate score 2
◦ Candidate score 1
On hold, and the next frame's estimated pitch is neither silence nor on hold.
![Page 12: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/12.jpg)
Delay the return value of the estimated pitch
◦ Needed to limit the duration of the on-hold state
![Page 13: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/13.jpg)
Conditions:
◦ The two previous frames are not silent
◦ The previous frame is not on hold
◦ The previous frame's pitch is between 5/8 and 7/4 of the pitch of the frame preceding it
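The three conditions could be checked as in the sketch below. The `FrameResult` struct and `bias_applies` are hypothetical names, treating the 5/8 and 7/4 bounds as inclusive is an assumption, and so is rejecting a non-positive earlier pitch.

```cpp
struct FrameResult {
    bool silent;   // frame was classified as silence
    bool on_hold;  // frame's pitch decision was deferred
    float pitch;   // estimated pitch of the frame
};

// prev is the previous frame, before_prev the frame preceding it.
bool bias_applies(const FrameResult& prev,
                  const FrameResult& before_prev,
                  float lo = 5.0F / 8.0F,
                  float hi = 7.0F / 4.0F) {
    if (prev.silent || before_prev.silent) return false;
    if (prev.on_hold) return false;
    if (before_prev.pitch <= 0.0F) return false;  // guard (assumption)
    const float ratio = prev.pitch / before_prev.pitch;
    return ratio >= lo && ratio <= hi;  // inclusive bounds (assumption)
}
```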
![Page 14: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/14.jpg)
P(x,y) is doubled
![Page 15: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/15.jpg)
correlation_threshold_silent(0.88)
Qnm_optimal_multiplier(0.77)
sample_rate(20000.0F)
max_pitch(400)
min_pitch(50)
pitch_buffer_size(20)
bias_max_frequency(7/4)
bias_min_frequency(5/8)
silent_threshold(50.0F)
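These parameters might be grouped into a configuration struct such as the one below. The struct name `PitchConfig`, the member types, and the helpers `min_window` and `max_window` are assumptions; only the parameter names and values come from the slide. The helpers further assume that `max_pitch` and `min_pitch` are in Hz, so that dividing the sample rate by them gives the window-size range in samples.

```cpp
struct PitchConfig {
    float correlation_threshold_silent = 0.88F;
    float Qnm_optimal_multiplier = 0.77F;
    float sample_rate = 20000.0F;
    int max_pitch = 400;
    int min_pitch = 50;
    int pitch_buffer_size = 20;
    // Written with float literals: the integer expression 7/4 would
    // truncate to 1 and 5/8 to 0 in C++.
    float bias_max_frequency = 7.0F / 4.0F;
    float bias_min_frequency = 5.0F / 8.0F;
    float silent_threshold = 50.0F;
};

// Smallest window size in samples (highest pitch), assuming pitch in Hz.
inline int min_window(const PitchConfig& c) {
    return static_cast<int>(c.sample_rate / static_cast<float>(c.max_pitch));
}

// Largest window size in samples (lowest pitch), assuming pitch in Hz.
inline int max_window(const PitchConfig& c) {
    return static_cast<int>(c.sample_rate / static_cast<float>(c.min_pitch));
}
```

Under these assumptions the search covers window sizes from 50 to 400 samples.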
![Page 16: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/16.jpg)
![Page 17: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/17.jpg)
![Page 18: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/18.jpg)
![Page 19: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/19.jpg)
Some improvements can be made to increase the performance of the pitch estimation.
◦ Reduce the search space
◦ Add the 1st-order derivative of the pitch
◦ Filter outliers / noise
The current algorithm might not be fast enough to run in real time.
![Page 20: Speech Recognition Raymond Sastraputera. Introduction Frame/Buffer Algorithm Silent Detector Estimate Pitch ◦ Correlation and Candidate ◦ Optimal.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf7a1a28abf838c8312f/html5/thumbnails/20.jpg)
Bagshaw, Paul Christopher. Automatic Prosodic Analysis for Computer Aided Pronunciation Teaching. The University of Edinburgh (1994).