Speech Signal Processing I By Edmilson Morais And Prof. Greg. Dogil Stuttgart, October 18, 2001.
Goals of the Course
Our part
  Basic theoretical concepts about
    Speech Signal Processing - SDSP
    Waveform generation for TTS systems - TTS
    Automatic Speech Recognition (Statistical approach) - ASR
  Fundamentals of programming in Matlab
    It will be the tool used for our simulations
Your part ?
  Describe and justify the important aspects and drawbacks in the algorithm.
Next term: Speech Signal Processing II
  Going deeper into more theoretical and practical aspects of : SSP, TTS and ASR.
Tutorial of Matlab
Principles of linear algebra
  Vectors, matrices, linear systems
Programming in Matlab
  Variables, operators, ...
  if statements, switch statements, for loops, while loops, continue statements, break statements, ...
  I/O operations
  Graphical visualization
  Executable files
  Subroutines
Matlab : Graphical visualization
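% Sample the function sin(R)/R on a 2-D grid
[X,Y] = meshgrid(-8:.5:8);
R = sqrt(X.^2 + Y.^2) + eps;   % eps avoids division by zero at the origin
Z = sin(R)./R;
% Wireframe view
figure; mesh(X,Y,Z,'EdgeColor','black')
% Shaded surface view with lighting
figure; surf(X,Y,Z,'FaceColor','red','EdgeColor','none');
camlight left; lighting phong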
Matlab : Graphical visualization – Optimization on a hyperbolic (quadratic) surface
[Figure: mean squared error E versus weight w, marking the minimum of E and the weight iterates w(n); two contour plots with both axes running from -200 to 200; surface plot of the squared error over the coefficients H(1) and H(2)]
SDSP : Looking through time
[Figure: speech waveform, amplitude versus time]
Speech signal : Analog and digital
  Sampling rate
  Quantization
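The two steps that turn an analog signal into a digital one can be sketched in a few lines of Matlab. This is only an illustration: the sampling rate, the test tone and the number of bits are assumed values, and the mid-rise uniform quantizer is one common choice, not necessarily the one used in the course.

```matlab
% Illustrative values only: none of these numbers come from the slides.
fs   = 8000;                 % sampling rate in Hz (assumed)
f0   = 200;                  % frequency of the test tone in Hz (assumed)
bits = 4;                    % bits per sample (assumed)

% Sampling: evaluate the "analog" signal at the instants n/fs.
n = 0:(fs/100 - 1);          % 10 ms of signal, i.e. 80 samples
x = 0.9*sin(2*pi*f0*n/fs);

% Uniform (mid-rise) quantization of the range [-1, 1) into 2^bits levels.
step = 2/2^bits;
xq   = step*floor(x/step) + step/2;

% The quantization error never exceeds half a step.
maxErr = max(abs(x - xq));
fprintf('step = %.4f, largest error = %.4f\n', step, maxErr);
```

Raising `bits` shrinks `step` and therefore the error; raising `fs` only changes how densely the waveform is sampled in time.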
SDSP : Transformations and Digital filters
Transformations
  Z-transforms, Fourier transforms
Digital filters
  FIR, IIR
[Figure: frequency response of a digital filter, magnitude (dB) and phase (degrees) versus normalized frequency (×π rad/sample)]
SDSP – Frame-based analysis
  Hanning window : w
  Waveform multiplied by the Hanning window : xw
  Magnitude of the spectrum of xw
  Frequency response of the LP filter
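The first three steps of the frame-based analysis can be sketched as follows. The frame is a synthetic two-tone signal rather than real speech, and the frame length, FFT size and sampling rate are assumptions made for the example. The window is written out as a formula so that no toolbox function is needed.

```matlab
% Assumed analysis settings (not taken from the slides).
fs   = 8000;                 % sampling rate in Hz
N    = 256;                  % frame length in samples
Nfft = 512;                  % FFT size (zero-padded)

% Synthetic frame: two sinusoids at 500 Hz and 1500 Hz.
n = (0:N-1)';
x = sin(2*pi*500*n/fs) + 0.5*sin(2*pi*1500*n/fs);

% Hanning window w and windowed waveform xw.
w  = 0.5*(1 - cos(2*pi*n/(N-1)));
xw = x .* w;

% Magnitude of the spectrum of xw, in dB, for 0 ... fs/2.
Xw    = fft(xw, Nfft);
magdB = 20*log10(abs(Xw(1:Nfft/2+1)) + eps);
f     = (0:Nfft/2)' * fs/Nfft;

% Frequency of the strongest spectral peak.
[~, k] = max(magdB);
peakHz = f(k);
fprintf('strongest peak near %.1f Hz\n', peakHz);
```

Calling `plot(f, magdB)` afterwards shows the two peaks; for real speech the same steps are repeated frame by frame along the signal.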
SDSP - Looking at frequency components through time
[Figure: current and previous frames, shown before smoothing and after smoothing]
SDSP : Vector quantization
Voronoi space : Centroid and distortion measure
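How centroids, Voronoi cells and the distortion measure relate can be seen in a small codebook-training loop. The sketch below assumes the squared Euclidean distance as the distortion measure and uses made-up 2-D data with a two-entry codebook; it is a plain k-means style iteration, which may differ from the algorithm used in the course simulations.

```matlab
% Toy training vectors (rows) and initial codebook: made-up values.
X = [0 0; 0 1; 1 0; 1 1; 9 9; 9 10; 10 9; 10 10];
C = [0 0; 10 10];

for it = 1:10
    % Squared Euclidean distance from every vector to every centroid.
    D = zeros(size(X,1), size(C,1));
    for k = 1:size(C,1)
        d = X - repmat(C(k,:), size(X,1), 1);
        D(:,k) = sum(d.^2, 2);
    end

    % Each vector falls in the Voronoi cell of its nearest centroid.
    [dmin, idx] = min(D, [], 2);
    distortion  = mean(dmin);        % average distortion of this codebook

    % Move every centroid to the mean of the vectors in its cell.
    for k = 1:size(C,1)
        if any(idx == k)
            C(k,:) = mean(X(idx == k, :), 1);
        end
    end
end
disp(C); disp(distortion);
```

The distortion is measured before each centroid update, so the value left at the end belongs to the final codebook.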
TTS - Waveform generation for TTS
Analysis and Resynthesis – Coding and Decoding
[Block diagram. Coding: original speech signal x, LP analysis A(z), inverse filter, pitch marks, prototype sampling, storage environment (A, En, Fo, U/UV). Decoding: prosodic information (marks, Fo, U/UV), TFI residue synthesis, synthesis filter, synthesized speech signal.]
Parametrization : Mapping the waveform into a set of parameters
Reconstruction : Synthesis of the waveform from the set of parameters
Prosody :
  F0
  Duration
  Amplitude
A – LP coefficients
e – LP residue
En – Prototypes
Fo – Fundamental frequency
U/UV – Voiced / Unvoiced transitions
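The roles of A and e in the analysis and resynthesis chain can be illustrated with a short Matlab sketch. It assumes the autocorrelation method for the LP analysis, a prediction order of 4 and a synthetic frame (pulse train through a toy all-pole filter); none of these choices come from the slides, and the prototype and prosody stages are left out.

```matlab
% Synthetic voiced frame: pulses every 80 samples through a toy
% all-pole filter. All values are assumptions for the example.
N   = 240;
exc = zeros(N,1);  exc(1:80:N) = 1;
x   = filter(1, [1 -1.3 0.8], exc);

% LP analysis (autocorrelation method), prediction order p.
p = 4;
r = zeros(p+1,1);
for k = 0:p
    r(k+1) = sum(x(1:N-k) .* x(1+k:N));   % autocorrelation at lag k
end
alpha = toeplitz(r(1:p)) \ r(2:p+1);      % normal equations
a     = [1; -alpha];                      % coefficients of A(z)

% Coding: the inverse filter A(z) turns the waveform into the residue e.
e = filter(a, 1, x);

% Decoding: the synthesis filter 1/A(z) rebuilds the waveform from e.
xr = filter(1, a, e);

fprintf('residue energy / signal energy = %.4f\n', sum(e.^2)/sum(x.^2));
fprintf('largest reconstruction error   = %.2e\n', max(abs(x - xr)));
```

With an unmodified residue the reconstruction is exact up to rounding; in a TTS system the residue and the prosodic parameters would be altered between the two filters.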
TTS - Waveform generation for TTS
Speech coding
  Parametric coders, Waveform coders, Hybrid coders
TTS – Concatenative approach
  Time scale and Frequency scale modifications
  Spectral smoothing
  Unit selection
[Examples: Original, Resynthesized, Modified : sin(x+ ); Original, TTS]
ASR - Automatic Speech Recognition
Front-End Signal Processing
  Feature extraction
    Perceptual domain, Articulatory domain
Acoustic modeling
  HMM : Hidden Markov Model
  ANN/HMM : Hybrid models - Artificial Neural Network and HMM
Statistical Language Modeling
  N-grams, smoothing techniques
Search : Decoding
  Viterbi, Stack decoding, ...
ASR – HMM - Topology
  Ergodic model
  Left-right model
ASR – HMM – Basic principle
[Figure: observation vectors X1 ... X17 aligned with alternative state sequences over the models u, n, i, k, ã, p; emission densities p(x | u), p(x | n), p(x | i), p(x | k), p(x | ã), p(x | p); transition probabilities a on the arcs]
ASR – HMM - Viterbi alignment
[Figure: four panels (a), (b), (c), (d), horizontal axes from 50 to 200]
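A Viterbi alignment can be sketched for a very small left-right HMM. The transition matrix, the emission likelihoods and the number of frames below are made-up numbers, and the variable names are chosen for the example only. The recursion is done in the log domain, a common way to avoid underflow.

```matlab
% Toy left-right HMM: 3 states, 6 frames. All numbers are made up.
A = [0.6 0.4 0.0;            % A(i,j) = P(state j at t | state i at t-1)
     0.0 0.7 0.3;
     0.0 0.0 1.0];
prior = [1; 0; 0];           % the alignment must start in state 1
B = [0.9 0.8 0.1 0.1 0.1 0.1;    % B(j,t) = p(x_t | state j)
     0.1 0.2 0.8 0.9 0.2 0.1;
     0.1 0.1 0.1 0.1 0.8 0.9];

[S, T] = size(B);
logA = log(A);               % log(0) = -Inf forbids a transition
logB = log(B);

delta = -Inf(S, T);          % best log score of a path ending in (j,t)
psi   = zeros(S, T);         % best predecessor of (j,t)
delta(:,1) = log(prior) + logB(:,1);
for t = 2:T
    for j = 1:S
        [best, psi(j,t)] = max(delta(:,t-1) + logA(:,j));
        delta(j,t) = best + logB(j,t);
    end
end

% Backtracking from the best final state gives the state of each frame.
q = zeros(1, T);
[~, q(T)] = max(delta(:,T));
for t = T-1:-1:1
    q(t) = psi(q(t+1), t+1);
end
disp(q);                     % prints 1 1 2 2 3 3
```

The vector `q` assigns each frame to a state, which is the alignment itself; in training, such alignments are typically used to re-estimate the model parameters.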
ASR – HMM – Forward-Backward
[Figure: labels / # / and / ã /]
ASR – ANN/HMM
[Figure: network input window X(n-c) ... X(n) ... X(n+d); network output P(q_k | x_n, ...)]
Evaluation : Exercises and Simulations
List of Exercises
  SDSP, TTS, ASR
Simulations
  SDSP
    Vector quantization
  TTS
    Waveform Interpolation
  ASR
    Acoustic modeling using : HMM and ANN+HMM
    Language modeling
    Decoding
Evaluation : Report
Reports
  Write the analysis and results of the simulation in the format of a paper
  4 pages, two columns.
  Sections
    Abstract
    Introduction
    Brief theoretical description of the method
    Methodology used to perform the experiment
    Results
    Conclusions and suggestions for further work
    Bibliography
Days of classes
Normal semester
  2001
    October : 18, 25, (01 is a holiday)
    November : 8, 15, 22, 29
    December : 6, 13, 20
  2002
    January : 10, 17, 24, 31
    February : 7, 14
  Total : 15 days.
Option two
  2001
    October : 18, 25
    November : 8, 15, 22, 29
  2002
    February : 7, 14
    March : A one-week block seminar : 1.5 hours a day.
  Total : 13 days.
Option one
  2001
    October : 16, 18, 23, 25, 30
    November : 6, 8, 13, 15, 20, 22, 27, 29
  2002
    February : 5, 7, 12, 14
  Total : 17 days.
Option three
  2002
    March : A one-week block seminar : 3 hours a day.
  Equivalent to 15 days