Introducing COVAREP:
A collaborative voice analysis repository for speech technologies
John Kane
Wednesday, November 27th, 2013
SIGMEDIA group
TCD
COVAREP - Open-source speech processing repository 1
Introduction
(a) Gilles Degottex (b) Thomas Drugman
(c) Tuomo Raitio (d) Stefan Scherer
Motivation
“...open, well-documented, and well-tested scientific code is essential not only to reproducibility in modern scientific research, but to the very progression of research itself.”
Related toolkits
KALDI - Speech recognition toolkit
- Speech processing toolkit
VOICEBOX - Speech analysis toolkit
COVAREP - Aims
Website: http://covarep.github.io/covarep/index.html
GitHub: https://github.com/covarep/covarep
COVAREP - Aims
- More reproducible research
- Increase the availability and impact of speech processing algorithms
- Participation and feedback
COVAREP - Scope
- Broad scope: any speech signal processing algorithms
- Speech analysis, synthesis, conversion, transformation, speech quality, enhancement, glottal source/voice quality analysis, etc.
- Use! Contribute!
Overview of COVAREP
[Block diagram of COVAREP components]
Speech Signal
Polarity Detection
GCI Estimation
Pitch Tracking
Glottal Flow Estimation
Glottal Flow Parameterization
Spectral Envelope Estimation
Formant Tracking
Phase-based Representation
Sinusoidal Modeling
1. Periodicity
2. Spectral envelope
3. Sine modelling
4. Glottal analysis
5. Phase analysis
COVAREP - Periodicity & synchronicity
- Polarity detection
- F0 and voicing decision extraction
- Detection of glottal closure instants
Periodicity & synchronicity - F0 extraction
[Figure: Speech amplitude spectrum; amplitude (dB) vs. frequency (Hz), 0-5000 Hz]
[Figure: Envelope-removed speech amplitude spectrum. Panels: speech spectrum and residual spectrum; amplitude (dB) vs. frequency (Hz), 0-5000 Hz]
$$\mathrm{SRH}(f) = E(f) + \sum_{k=2}^{N} \left[ E(k \cdot f) - E\big((k - 0.5) \cdot f\big) \right], \quad f \in [F0_{\min}, F0_{\max}]$$

where $E$ is the residual spectrum, $f$ is frequency (Hz) and $N$ is the number of harmonics considered.
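The summation of residual harmonics can be sketched in a few lines. This is a minimal illustration of the formula only, not COVAREP's own implementation: it assumes the residual spectrum is already available as a list sampled on 1 Hz bins, and the names `srh`, `estimate_f0` and the synthetic `spectrum` are hypothetical.

```python
def srh(E, f, n_harmonics):
    """Summation of residual harmonics at candidate frequency f (Hz).

    Assumption: E[i] is the residual spectrum amplitude at i Hz (1 Hz bins).
    """
    def at(freq):
        idx = int(round(freq))
        return E[idx] if 0 <= idx < len(E) else 0.0

    total = at(f)
    for k in range(2, n_harmonics + 1):
        # Reward energy at the k-th harmonic, penalise energy halfway
        # between harmonics k-1 and k.
        total += at(k * f) - at((k - 0.5) * f)
    return total


def estimate_f0(E, f0_min, f0_max, n_harmonics=5):
    """Return the candidate in [f0_min, f0_max] that maximises SRH."""
    return max(range(f0_min, f0_max + 1), key=lambda f: srh(E, f, n_harmonics))


# Synthetic residual spectrum (illustrative data): unit peaks at multiples of 120 Hz.
spectrum = [0.0] * 5001
for harmonic in range(120, 5001, 120):
    spectrum[harmonic] = 1.0

f0 = estimate_f0(spectrum, 50, 400)
```

The subtracted inter-harmonic term is what discourages octave errors: in this toy spectrum the candidate at 240 Hz collects energy at both its harmonics and its half-harmonic positions, so its score stays low.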
[Figure: Residual harmonic summation over time; frequency (Hz), 50-250 Hz, vs. time (seconds)]
COVAREP - Periodicity & synchronicity
[Figure: Detected glottal closure instants. Panels: Discrete All-Pole (DAP) envelope, frequency (Hz) vs. time (s); glottal flow (GF) derivative with GCIs, amplitude vs. time (s)]
COVAREP - Spectral envelope estimation
- Discrete all-pole (DAP) model
- “True envelope” (TE): spectral envelope by iterative cepstral smoothing
- Weighted linear prediction
- Conversion from envelope to Mel-Frequency Cepstral Coefficients (MFCC)
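The last item, converting a spectral envelope to MFCCs, can be sketched as mel-spaced triangular filtering followed by a log and a DCT. This is a minimal sketch under stated assumptions, not COVAREP's implementation: the envelope is assumed to be a list of linear amplitudes on uniformly spaced bins from 0 Hz to fs/2, the mel formula is the common 2595·log10(1 + f/700) variant, and all function names and default sizes are hypothetical.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, fs):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    bin_hz = (fs / 2.0) / (n_bins - 1)
    mel_max = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        weights = []
        for b in range(n_bins):
            f = b * bin_hz
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def envelope_to_mfcc(envelope, fs, n_filters=24, n_coeffs=13):
    """Filterbank energies -> log -> DCT-II, applied to an amplitude envelope."""
    bank = mel_filterbank(n_filters, len(envelope), fs)
    log_energies = []
    for weights in bank:
        energy = sum(w * a * a for w, a in zip(weights, envelope))
        log_energies.append(math.log(max(energy, 1e-12)))  # floor avoids log(0)
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
            for j, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]


# Illustrative envelope: a smooth spectral tilt over 257 bins at fs = 16 kHz.
envelope = [1.0 / (1.0 + b / 50.0) for b in range(257)]
mfcc = envelope_to_mfcc(envelope, 16000)
mfcc_louder = envelope_to_mfcc([2.0 * a for a in envelope], 16000)
```

Scaling the envelope by a constant only shifts the zeroth coefficient; the remaining coefficients describe the envelope shape, which is why the same conversion applies to a TE envelope as to a raw spectrum.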
[Figure: Speech amplitude spectrum; amplitude (dB) vs. frequency (Hz), 0-8000 Hz]
[Figure: Speech spectrum with mel-spaced triangular filters; amplitude (dB) and filter weight (0-1) vs. frequency (Hz), 0-8000 Hz]
[Figure: Speech spectrum with TE ("True Envelope") spectral envelope; amplitude (dB) vs. frequency (Hz), 0-8000 Hz]
[Figure: TE spectral envelope with mel-spaced triangular filters; amplitude (dB) and filter weight (0-1) vs. frequency (Hz), 0-8000 Hz]
COVAREP - Sinusoidal modelling
- Harmonic model
- Quasi-Harmonic Model (QHM)
- Adaptive Harmonic Model (aHM)
- Harmonic synthesis
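As a rough illustration of the plain harmonic model and harmonic synthesis, a stationary frame can be written as a sum of sinusoids at integer multiples of F0. The sketch below assumes constant F0, amplitudes and phases over the frame; it does not cover QHM or aHM, and the function name and parameter values are hypothetical.

```python
import math


def synthesize_harmonics(f0, amplitudes, phases, fs, n_samples):
    """Sum of cosines at k * f0 (k = 1, 2, ...) with fixed amplitudes and phases."""
    signal = []
    for n in range(n_samples):
        t = n / fs
        sample = sum(
            a * math.cos(2.0 * math.pi * k * f0 * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases), start=1)
        )
        signal.append(sample)
    return signal


# Three harmonics of a 100 Hz fundamental at fs = 8 kHz (illustrative values).
frame = synthesize_harmonics(100.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0], 8000, 400)
```

With these values the frame repeats every 80 samples (fs / F0), which is the periodicity the harmonic model assumes.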
COVAREP - Glottal analysis
- Deconvolution of glottal source and vocal tract components
- Algorithms for parameterising the glottal source
- Detection of changes in tone-of-voice and voice quality
[Figure: Wavelet decomposition of an impulse; frequency (Hz), octave bands from 125 to 8000 Hz, vs. time (seconds)]
[Figure: All peaks across the different frequency bands (125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz) for breathy (top) and tense (bottom) speech samples; amplitude vs. time (seconds)]
COVAREP - Phase processing
- Relative phase shift: speaker verification
- Phase distortion: emotional valence detection
- Chirp group delay representation: detection of voice disorders
Emotion classification experiment
- Speech data: Berlin emotion database (10 speakers, 7 acted emotions, 500+ utterances)
- Class labelling: Emotion vs non-emotion (binary), Passive-neutral-active (3-class)
- Feature extraction: Using COVAREP v1.1.0
- Classification: Support vector machines (RBF kernel)
- Validation: Speaker independent, leave-one-speaker-out
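The leave-one-speaker-out protocol listed above can be sketched as a simple index splitter: each fold holds out every utterance of one speaker and trains on the rest. This is an illustration of the validation scheme only; the speaker labels are made up and the classifier step (for example an RBF-kernel SVM) is left out.

```python
def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker."""
    for speaker in sorted(set(speaker_ids)):
        test = [i for i, s in enumerate(speaker_ids) if s == speaker]
        train = [i for i, s in enumerate(speaker_ids) if s != speaker]
        yield speaker, train, test


# Hypothetical speaker label for each utterance in a small corpus.
utterance_speakers = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_speaker_out(utterance_speakers))

for speaker, train_idx, test_idx in folds:
    # A classifier would be fitted on train_idx and scored on test_idx here.
    print(speaker, "train:", train_idx, "test:", test_idx)
```

Because no speaker ever appears on both sides of a fold, the reported error reflects generalisation to unseen voices rather than memorisation of speaker identity.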
Feature sets
- MFCC: Standard Mel-frequency cepstral coefficients
- TE-MFCC: MFCCs derived from the True Envelope representation
- Glottal/VQ: Glottal and voice quality related features
- ALL: TE-MFCC and Glottal/VQ combined
- SEL: 10 most discriminative features
Speaker independent: leave-one-speaker-out classification experiments
Emotion classification experiment - Results
[Figure: peakSlope and Rd values by emotion (Neutral, Anger, Bored, Disgust, Fear, Happy, Sad)]
[Figure: Classification error (%) by feature set (MFCCs, TE_MFCCs, Glottal/VQ, ALL, SEL) for two tasks: Emotion vs neutral and Activation (3-class)]
Table: Confusion matrix (%)
            MFCCs               Glottal/VQ
            Neutral  Emotion    Neutral  Emotion
Neutral     48       52         82       18
Emotion     18       82         27       73
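One way to read the table is to compare per-class recall for the two feature sets. The sketch below assumes rows are the true class and columns the predicted class, which is consistent with each row summing to 100 but is not stated on the slide. The mean of the per-class recalls is an illustrative summary computed here, not a figure reported in the talk.

```python
# Values copied from the confusion matrix above (%); rows assumed to be the true class.
confusion = {
    "MFCCs": [[48, 52], [18, 82]],
    "Glottal/VQ": [[82, 18], [27, 73]],
}


def per_class_recall(matrix):
    """Percentage of each true class that was labelled correctly (diagonal / row sum)."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def mean_recall(matrix):
    recalls = per_class_recall(matrix)
    return sum(recalls) / len(recalls)


summary = {name: mean_recall(m) for name, m in confusion.items()}
for name, value in summary.items():
    print(name, per_class_recall(confusion[name]), value)
```

Under that reading, the MFCC system labels more than half of the neutral utterances as emotional, while the Glottal/VQ features recover most neutral utterances at the cost of some emotional ones.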
Potential applications for COVAREP algorithms
- Speech synthesis
- Speech recognition
- Modelling variation in speaking styles and affective states
- Speaker verification
- Voice pathology detection
- Lots of others!
COVAREP summary
- Repository of open-source speech processing algorithms
- Cross-university, cross-country effort
- Fast access to newly developed state-of-the-art algorithms
- Improve visibility and impact
- More reproducible research
Thank you!
Resources:
Website: http://covarep.github.io/covarep/
GitHub: https://github.com/covarep/covarep
Paper: Degottex, G., Kane, J., Drugman, T., Raitio, T., “COVAREP - A collaborative voice analysis repository for speech technologies”, submitted to ICASSP 2014