Improved ASR in noise using harmonic decomposition
![Page 1: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/1.jpg)
Improved ASR in noise using harmonic decomposition
• Introduction
• Pitch-Scaled Harmonic Filter
• Recognition Experiments
• Results
• Conclusion
![Page 2: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/2.jpg)
Motivation & Aims
• Most speech sounds are predominantly voiced or unvoiced.
What happens when the two components are “mixed”?
• Voiced and unvoiced components have different natures:
unvoiced: aperiodic signal from turbulence-noise sources
voiced: quasi-periodic signal from vocal-fold vibration
Why not extract their features separately?
Do the two contributions contain complementary information?
• Human speech recognition still performs well in noise.
How? Does it take advantage of harmonic properties?
Introduction
![Page 3: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/3.jpg)
Voiced and unvoiced parts of a speech signal
Figure: production of /z/, showing the periodic contribution and the aperiodic contribution.
Introduction
![Page 4: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/4.jpg)
Automatic Speech Recognition
speech signal → Front End → Pattern Recognition → speech labels
Feature Extraction:
conversion of speech signals to a sequence of parameter vectors
Dynamic Programming:
matching of observation sequences to models of known utterances
Introduction
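To make the dynamic-programming step concrete, here is a minimal sketch using dynamic time warping (DTW). The slides say only "dynamic programming", so DTW is an assumed stand-in chosen for brevity; the function name `dtw_cost` and the toy sequences are illustrative.

```python
import numpy as np

def dtw_cost(obs, template):
    """Cumulative DTW distance between two sequences of feature vectors."""
    n, m = len(obs), len(template)
    # cost[i, j] = best cumulative distance aligning obs[:i] with template[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(obs[i - 1] - template[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i, j] = local + min(cost[i - 1, j],
                                     cost[i, j - 1],
                                     cost[i - 1, j - 1])
    return cost[n, m]

# Toy example: a template and a slower rendition of the same "utterance".
template = np.array([[0.0], [1.0], [2.0]])
slow = np.repeat(template, 2, axis=0)
other = np.array([[5.0], [5.0], [5.0]])
print(dtw_cost(slow, template), dtw_cost(other, template))
```

The time-stretched sequence aligns at no cost, while the unrelated one does not, which is the property that lets observation sequences of different durations be matched to stored models.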
![Page 5: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/5.jpg)
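Harmonic Decomposition
PSHF block diagram. Labelled blocks: pitch optimisation, window w(n) (two instances), and a +/− summing junction. Labelled signals: input waveform s(n), raw pitch f0raw, optimised pitch f0opt, Nopt, windowed signal sw(n), periodic estimate v̂w(n), aperiodic estimate ûw(n), output periodic waveform v(n), and output aperiodic waveform u(n).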
PSHF
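The idea behind the decomposition can be sketched in a few lines. This is a simplified illustration, not the PSHF as implemented by the authors: it assumes each frame spans exactly four pitch periods (so harmonics of f0 land on every fourth DFT bin), uses a rectangular window rather than the w(n) in the diagram, and skips pitch optimisation. The names `harmonic_split` and `periods_per_frame` are hypothetical.

```python
import numpy as np

def harmonic_split(frame, periods_per_frame=4):
    """Split one frame into periodic and aperiodic estimates.

    Assumes len(frame) covers exactly `periods_per_frame` pitch periods,
    so pitch harmonics fall on DFT bins that are multiples of that number.
    """
    spectrum = np.fft.rfft(frame)
    bins = np.arange(spectrum.size)
    # Harmonic bins: multiples of periods_per_frame, excluding DC.
    harmonic = (bins % periods_per_frame == 0) & (bins > 0)
    periodic = np.fft.irfft(np.where(harmonic, spectrum, 0), n=len(frame))
    # The aperiodic estimate is whatever the harmonic bins do not explain.
    aperiodic = frame - periodic
    return periodic, aperiodic

# Toy frame: pitch period of 50 samples, 4 periods, plus a little noise.
n = np.arange(200)
voiced = np.sin(2 * np.pi * n / 50) + 0.5 * np.sin(2 * np.pi * 2 * n / 50)
noise = 0.1 * np.random.default_rng(0).normal(size=n.size)
periodic, aperiodic = harmonic_split(voiced + noise)
```

Because the aperiodic estimate is formed by subtraction, the two outputs always sum back to the input frame, mirroring the +/− junction in the block diagram.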
![Page 6: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/6.jpg)
Decomposition example (waveforms)
Panels: Original, Periodic part, Aperiodic part
PSHF
![Page 7: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/7.jpg)
Decomposition example (spectrograms)
Panels: Original, Periodic part, Aperiodic part
PSHF
![Page 8: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/8.jpg)
Decomposition example (MFCC specs.)
Panels: Original, Periodic part, Aperiodic part
PSHF
![Page 9: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/9.jpg)
Parameterisations
BASE: waveform → MFCC → +Δ, +Δ2 → features
SPLIT: PSHF → MFCC → +Δ, +Δ2 → cat
PCA26, PCA78, PCA13, PCA39: blocks PSHF, MFCC, +Δ, +Δ2, cat and PCA (one diagram per variant)
Method
![Page 10: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/10.jpg)
Speech Database: Aurora 2.0
• TIdigits database at 8 kHz, filtered with G.712 channel
• Connected English digit strings (male & female speakers)
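| Group | Condition | Signal-to-Noise Ratio (dB) |
|---|---|---|
| Train | clean condition | clean |
| Train | multi-condition | 20, 15, 10, 5 |
| Test | set A (same noises) | 20, 15, 10, 5, 0, -5 |
| Test | set B (different noises) | 20, 15, 10, 5, 0, -5 |
| Test | set C (different channel) | 20, 15, 10, 5, 0, -5 |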
Method
![Page 11: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/11.jpg)
Description of the experiments
• Baseline experiment: [base]
standard parameterisation of the original waveforms (i.e., MFCC with delta and acceleration coefficients: MFCC +Δ, +Δ2)
• Split experiments: [split]
adjustment of stream weights (voiced vs. unvoiced)
• PCA experiments: [pca26, pca78, pca13 and pca39]
decorrelation of the feature vectors, and reduction of the number of coefficients
Method
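The PCA step (decorrelation followed by truncation) can be sketched as below. This is a generic eigen-decomposition PCA, not the authors' code; the 78-to-39 reduction in the usage lines is an assumed reading of the name "pca39", and `pca_project` is a hypothetical helper.

```python
import numpy as np

def pca_project(features, n_keep):
    """Decorrelate feature vectors (rows) and keep the n_keep strongest axes."""
    centred = features - features.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_keep]
    return centred @ eigvecs[:, order]

# Synthetic stand-in for concatenated two-stream vectors (500 frames, 78 dims).
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 78)) @ rng.normal(size=(78, 78))
reduced = pca_project(frames, 39)
print(reduced.shape)
```

After projection the retained coefficients are mutually uncorrelated, which suits diagonal-covariance models, and dropping the low-variance axes reduces the number of coefficients.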
![Page 12: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/12.jpg)
Split experiments results
Results
![Page 13: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/13.jpg)
Split experiments results
Results
![Page 14: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/14.jpg)
Split experiments results
Results
![Page 15: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/15.jpg)
Summary of results

| Parameterisation | Word accuracy, clean (%) | Word accuracy, multi (%) | Word accuracy, overall (%) | WER reduction, abs. (%) | WER reduction, rel. (%) |
|---|---|---|---|---|---|
| base | 52.6 | 78.3 | 65.4 | -- | -- |
| split | 77.9 | 89.1 | 83.0 | 17.6 | 50.9 |
| pca26 | 71.2 | 88.8 | 78.8 | 13.4 | 38.7 |
| pca78 | 61.9 | 88.1 | 74.7 | 9.3 | 26.9 |
| pca13 | 72.6 | 87.6 | 79.7 | 14.3 | 41.3 |
| pca39 | 70.9 | 87.5 | 78.8 | 13.4 | 38.7 |

Results
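The abs. and rel. WER figures in the summary appear to be reductions in word error rate relative to the baseline. The sketch below reproduces them from the overall accuracies, assuming WER = 100 − word accuracy; `wer_reduction` is an illustrative name.

```python
def wer_reduction(base_acc, new_acc):
    """Absolute and relative WER reduction (%) versus a baseline accuracy."""
    base_wer = 100.0 - base_acc   # assumption: WER = 100 - word accuracy
    new_wer = 100.0 - new_acc
    absolute = base_wer - new_wer
    relative = 100.0 * absolute / base_wer
    return absolute, relative

# Overall accuracies from the summary table: base 65.4, split 83.0.
abs_red, rel_red = wer_reduction(65.4, 83.0)
print(round(abs_red, 1), round(rel_red, 1))  # 17.6 50.9
```

For the split system this gives 34.6 − 17.0 = 17.6 points absolute, and 17.6 / 34.6 ≈ 50.9 % relative, matching the table.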
![Page 16: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/16.jpg)
Conclusions
• The PSHF module split Aurora’s speech waveforms into two synchronous streams (periodic and aperiodic).
• Used separately, the streams slightly degraded accuracy; used together, they substantially increased it in noisy conditions.
• Periodic speech segments provide robustness to noise.
Further Work
• Apply Linear Discriminant Analysis (LDA) to the two-stream feature vector.
• Evaluate the performance of this front end in a more general task, such as phoneme recognition.
• Test the technique for speaker recognition.
![Page 17: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/17.jpg)
COLUMBO PROJECT: Harmonic Decomposition applied to ASR
David M. Moreno (1)
Philip J.B. Jackson (2)
Javier Hernando (1)
Martin J. Russell (3)
http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/
Affiliations 1, 2 and 3: shown as logos on the original slide.
![Page 18: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/18.jpg)
Pitch Optimisation: vowel /u/
Cost function
Spectrum derived from a 268-point DFT
![Page 19: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/19.jpg)
Harmonic Decomposition: vowel /u/
![Page 20: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/20.jpg)
Word accuracy results (%)
![Page 21: Improved ASR in noise using harmonic decomposition](https://reader036.fdocuments.us/reader036/viewer/2022070406/568141a4550346895dad885c/html5/thumbnails/21.jpg)
Observation probability, with stream weights
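The equation from this slide is not in the transcript. As a hedged illustration, a common multi-stream formulation (used, for example, in HTK-style systems) raises each stream's likelihood to the power of its stream weight, which becomes a weighted sum of per-stream log-likelihoods. The sketch assumes a single diagonal-covariance Gaussian per stream; all names and values are hypothetical.

```python
import numpy as np

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def weighted_log_obs_prob(streams, means, variances, weights):
    """log b(o) = sum over streams s of weight_s * log b_s(o_s)."""
    return sum(w * log_gaussian_diag(x, m, v)
               for x, m, v, w in zip(streams, means, variances, weights))

# Two toy streams (e.g. periodic and aperiodic features), 3 coefficients each.
o = [np.array([0.1, -0.2, 0.3]), np.array([1.0, 0.5, -0.5])]
mu = [np.zeros(3), np.ones(3)]
var = [np.ones(3), 2.0 * np.ones(3)]
print(weighted_log_obs_prob(o, mu, var, weights=(1.0, 0.5)))
```

With both weights at 1 the result equals the ordinary joint log-likelihood of independent streams; lowering one weight reduces that stream's influence on the score, which is the knob adjusted in the split experiments.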