Page 1: Covariation and weighting of harmonically decomposed streams for ASR

[Figure: production of /z/, labelled "aperiodic" and "periodic"]

Introduction

Pitch-scaled harmonic filter

Recognition experiments

Results

Conclusion

Page 2: Motivation and aims

• Most speech sounds are either voiced or unvoiced, which have very different properties:

– voiced: quasi-periodic signal from phonation

– unvoiced: aperiodic signal from turbulence noise

• Do these properties allow humans to recognize speech in noise?

Maybe we can use this information to help ASR, by computing separate features for the two parts.

• Are the two contributions complementary?

http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/ INTRODUCTION

Page 3: Voiced and unvoiced parts of a speech signal

[Figure: production of /z/, showing the aperiodic contribution and the periodic contribution]

Page 4: Pitch-scaled harmonic filter

[Block diagram: the speech waveform s(n) feeds pitch extraction (f0raw) and pitch optimisation (optimised pitch f0opt, Nopt); the PSHF, with time shifting and re-splicing, outputs the periodic waveform v̂(n) and the aperiodic waveform û(n)]
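A minimal sketch of the idea behind pitch-scaled harmonic filtering, not the authors' implementation. It assumes the analysis frame spans exactly a whole number of pitch periods (four here, an illustrative choice), uses a rectangular window, and leaves out the pitch optimisation, time shifting and re-splicing stages named in the diagram. Under those assumptions the harmonics of f0 fall on every fourth DFT bin, so those bins give a periodic estimate and the remainder gives an aperiodic one. All function names are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (adequate for one short frame)."""
    size = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / size) for t in range(size))
            for k in range(size)]


def idft(spec):
    """Inverse DFT, returning the real part of each sample."""
    size = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / size)
                for k in range(size)).real / size
            for t in range(size)]


def harmonic_split(frame, periods_per_frame):
    """Split a pitch-scaled frame into periodic and aperiodic estimates.

    Assumes len(frame) is exactly periods_per_frame pitch periods, so the
    harmonics of f0 sit on bins that are multiples of periods_per_frame.
    """
    spec = dft(frame)
    harmonic_bins = [c if k % periods_per_frame == 0 else 0.0
                     for k, c in enumerate(spec)]
    periodic = idft(harmonic_bins)
    aperiodic = [s - p for s, p in zip(frame, periodic)]
    return periodic, aperiodic


# Synthetic stand-in for a voiced fricative: two harmonics plus noise.
period = 10                       # pitch period in samples (illustrative)
n = 4 * period                    # frame length scaled to four pitch periods
random.seed(0)
voiced = [math.sin(2 * math.pi * t / period) + 0.5 * math.sin(4 * math.pi * t / period)
          for t in range(n)]
noise = [random.gauss(0.0, 0.1) for _ in range(n)]
frame = [v + u for v, u in zip(voiced, noise)]
periodic, aperiodic = harmonic_split(frame, periods_per_frame=4)
```

With a rectangular window, keeping only the harmonic bins amounts to averaging the four pitch periods in the frame, which is why the periodic estimate repeats exactly every `period` samples and the two outputs always sum back to the input frame.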

Page 5: Decomposition example (waveforms)

[Figure: waveforms of the original, periodic and aperiodic signals]

Page 6: Decomposition example (spectrograms)

[Figure: spectrograms of the original, periodic and aperiodic signals]

Page 7: Decomposition example (MFCC specs.)

[Figure: MFCC representations of the original, periodic and aperiodic signals]

Page 8: Speech database: Aurora 2.0

• Derived from the TIdigits database of connected English digit strings (male and female speakers), filtered with G.712 at 8 kHz.

        Data type                    Signal-to-Noise Ratio (dB)
TRAIN   clean-condition
        multi-condition              20 15 10 5
TEST    set A (same noises)          20 15 10 5 0 -5
        set B (different noises)     20 15 10 5 0 -5
        set C (different channel)    20 15 10 5 0 -5

Page 9: Description of the experiments

• Baseline experiment: [base]

– standard parameterisation of the original waveforms (i.e., MFCC, +Δ, +ΔΔ)

• PCA experiments: [pca26, pca78, pca13 and pca39]

– decorrelation of the feature vectors, and reduction of the number of coefficients

• Split experiments: [split, split1]

– adjustment of stream weights (periodic vs. aperiodic)

Caveat: pitch values were derived from the clean speech files, for the entire database!
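As an illustration of the "MFCC, +Δ, +ΔΔ" parameterisation, the sketch below appends first and second time derivatives to a sequence of static feature vectors. The slides do not say how the derivatives were computed; the regression formula, the window of two frames and the edge handling (repeating the first and last frames) are assumptions, and the function names are hypothetical. With 13 static coefficients per frame (a number suggested by the experiment names, not stated in the slides) this would give 39 coefficients.

```python
def deltas(frames, window=2):
    """Regression-based time derivative of a sequence of feature vectors."""
    n_frames = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n_frames):
        vec = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, n_frames - 1)][d]   # repeat last frame at the end
                behind = frames[max(t - k, 0)][d]             # repeat first frame at the start
                acc += k * (ahead - behind)
            vec.append(acc / denom)
        out.append(vec)
    return out


def add_dynamic_features(static):
    """Return static coefficients followed by their deltas and delta-deltas."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Toy input: one coefficient rising by 2 per frame, one constant coefficient.
static = [[2.0 * t, 1.0] for t in range(12)]
features = add_dynamic_features(static)
```

Away from the edges, the rising coefficient gets a delta of 2 and a delta-delta of 0, and the constant coefficient gets 0 for both.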

Page 10: Parameterisations

[Block diagrams of the processing chain for each parameterisation. Blocks present in each diagram:]

BASE: waveform, MFCC, +Δ, +Δ2, features

PCA26: PSHF, MFCC, +Δ, +Δ2, cat, PCA

PCA78: PSHF, MFCC, +Δ, +Δ2, cat, PCA

PCA13: PSHF, MFCC, +Δ, +Δ2, cat, PCA

PCA39: PSHF, MFCC, +Δ, +Δ2, cat, PCA

SPLIT: PSHF, MFCC, +Δ, +Δ2, cat

SPLIT1: PSHF, MFCC, +Δ, +Δ2, cat
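The "cat" and "PCA" blocks can be illustrated with a short NumPy sketch: per-frame features from the periodic and aperiodic streams are concatenated, decorrelated by principal component analysis, and optionally truncated. This is a sketch under stated assumptions, not the authors' pipeline: the 39 coefficients per stream, the reduction from 78 to 39, the random stand-in data and the function names are all illustrative, and the slides do not show whether PCA was applied before or after the derivatives in each variant.

```python
import numpy as np


def fit_pca(features):
    """Return (mean, components, variances), components sorted by variance."""
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    variances, vectors = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(variances)[::-1]            # largest variance first
    return mean, vectors[:, order], variances[order]


def project(features, mean, components, n_keep):
    """Decorrelate the features and keep the first n_keep components."""
    return (features - mean) @ components[:, :n_keep]


rng = np.random.default_rng(0)
periodic = rng.normal(size=(500, 39))                     # stand-in periodic-stream features
aperiodic = 0.5 * periodic + rng.normal(size=(500, 39))   # correlated stand-in aperiodic features
combined = np.hstack([periodic, aperiodic])               # "cat": 78 coefficients per frame
mean, components, variances = fit_pca(combined)
reduced = project(combined, mean, components, n_keep=39)  # 78 -> 39 coefficients
```

After projection the covariance of `reduced` is diagonal, which is the decorrelation property a diagonal-covariance acoustic model benefits from.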

Page 11: Full-sized PCA results

Word Error Rate (%)

         clean   multi   overall
base     47.4    21.7    34.6
pca26    33.8    11.4    22.6
pca78    42.7    12.8    27.7
pca13    28.3    13.0    20.7
pca39    30.3    14.5    22.4

Page 12: Variance of Principal Components

[Figure: variance of the principal components for PCA26 and PCA39; legend: clean, multi]

Page 13: PCA26 experiment’s results

[Figure: two panels, CLEAN and MULTI]

Page 14: Summary of best PCA results

Word Error Rate (%)

         clean   multi   overall
base     47.4    21.7    34.6
pca26    29.0    11.4    20.2
pca78    38.3    12.1    25.2
pca13    27.6    12.6    20.1
pca39    29.3    12.5    20.9

Page 15: Split experiment’s results

Page 16: Sample Split results

Word Error Rate (%)

                            clean   multi   overall
base                        47.4    21.7    34.6
split (stream weight = 0)   62.9    44.3    53.6
split (stream weight = 1)   28.5    11.7    20.1
split (stream weight = 2)   22.7    11.5    17.1

Note: for Split, the same stream-weight values were used in training as in testing.
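To make the role of stream weights concrete, the sketch below combines per-stream log-likelihoods by weighting each stream's contribution, a common way of scoring multi-stream observations. It is an illustration under assumptions: the diagonal-Gaussian models, the toy observations and parameters, the weight pairs and the function names are all made up, and the slides do not state how the weights of the two streams were tied or normalised.

```python
import math


def diag_gaussian_loglik(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def weighted_loglik(stream_logliks, stream_weights):
    """Weighted sum of per-stream log-likelihoods (weights act as exponents)."""
    return sum(w * ll for w, ll in zip(stream_weights, stream_logliks))


# Toy two-dimensional streams with made-up model parameters.
periodic_obs, aperiodic_obs = [0.2, -0.1], [1.0, 0.5]
ll_periodic = diag_gaussian_loglik(periodic_obs, [0.0, 0.0], [1.0, 1.0])
ll_aperiodic = diag_gaussian_loglik(aperiodic_obs, [0.5, 0.5], [2.0, 2.0])

for w_periodic, w_aperiodic in [(1.0, 1.0), (1.5, 0.5), (1.0, 0.0)]:
    score = weighted_loglik([ll_periodic, ll_aperiodic], [w_periodic, w_aperiodic])
    print(w_periodic, w_aperiodic, round(score, 3))
```

With both weights equal to 1 the score is the ordinary joint log-likelihood of the concatenated observation under independent streams; a weight of 0 removes a stream from the decision entirely.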

Page 17: Split1 experiment’s results

Page 18: Summary of PCA & Split results

          Word Error Rate (%)        WER reduction vs. base (%)
          clean   multi   overall    abs.    rel.
base      47.4    21.7    34.6        0.0     0.0
pca26     29.0    11.4    20.2       14.4    41.6
pca78     38.3    12.1    25.2        9.4    27.2
pca13     27.6    12.6    20.1       14.5    41.9
pca39     29.3    12.5    20.9       13.7    39.6
split     22.6    11.0    16.8       17.8    51.4
split1    21.0    10.9    16.0       18.6    53.8
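The "abs." and "rel." figures in the summary are consistent with the absolute and relative reduction in overall WER with respect to base. The sketch below recomputes them from the overall WER values in the table; the function and variable names are illustrative.

```python
# Overall WER (%) for each system, copied from the summary table.
overall_wer = {
    "base": 34.6,
    "pca26": 20.2,
    "pca78": 25.2,
    "pca13": 20.1,
    "pca39": 20.9,
    "split": 16.8,
    "split1": 16.0,
}


def wer_reduction(baseline, system):
    """Absolute (percentage points) and relative (%) WER reduction."""
    absolute = baseline - system
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


reductions = {
    name: wer_reduction(overall_wer["base"], wer)
    for name, wer in overall_wer.items()
}
```

For example, split1 lowers the overall WER from 34.6% to 16.0%, an absolute reduction of 18.6 points, which is 53.8% of the baseline error.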

Page 19: Conclusions

• PSHF module split Aurora’s speech waveforms into two synchronous streams (periodic and aperiodic)

– large improvements over the single-stream Baseline

• Split was better than all PCA combinations:

– PCA26/13 better than PCA78/39, and PCA13 best

– Split1 marginally better than Split

• Periodic speech segments give robustness to noise.

Further work

– Modeling: how best to combine the streams?

– LVCSR: evaluate front end on TIMIT (phone recognition).

– Robust pitch tracking

Page 20: COLUMBO PROJECT: Harmonic decomposition applied to ASR

Philip J.B. Jackson (1)

David M. Moreno (2)

Javier Hernando (2)

Martin J. Russell (3)

[Affiliation logos 1, 2 and 3]

http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/