Speech Signal Processing I By Edmilson Morais And Prof. Greg. Dogil Stuttgart, October 18, 2001.


Page 1: Speech Signal Processing I

By Edmilson Morais and Prof. Greg. Dogil

Stuttgart, October 18, 2001

Page 2: Goals of the Course

Our part
- Basic theoretical concepts about
  - Speech Signal Processing (SDSP)
  - Waveform generation for TTS systems (TTS)
  - Automatic Speech Recognition, statistical approach (ASR)
- Fundamentals of programming in Matlab
  - It will be the tool used for our simulations

Your part?
- Describe and justify the important aspects and drawbacks in the algorithm.

Next term: Speech Signal Processing II
- Going deeper into more theoretical and practical aspects of SSP, TTS and ASR.

Page 3: Tutorial of Matlab

Principles of linear algebra
- Vectors, matrices, linear systems

Programming in Matlab
- Variables, operators, ...
- if statements, switch statements, for loops, while loops, continue statements, break statements, ...
- I/O operations
- Graphical visualization
- Executable files
- Subroutines

Page 4: Matlab: Graphical visualization

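[X,Y] = meshgrid(-8:.5:8);          % grid over [-8, 8] in steps of 0.5
R = sqrt(X.^2 + Y.^2) + eps;        % radius; eps avoids 0/0 at the origin
Z = sin(R)./R;                      % sinc-like surface
mesh(X,Y,Z,'EdgeColor','black')     % wireframe view
figure                              % new window, so the mesh plot is not overwritten
surf(X,Y,Z,'FaceColor','red','EdgeColor','none');   % shaded surface
camlight left; lighting phong       % add a light source and smooth shading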

Page 5: Matlab: Graphical visualization - Optimization on a hyperbolic (quadratic) surface

[Figure: mean squared error E plotted against the weight w, with the minimum E_min, the weight w0 and the iterate w(n) marked; a contour plot with both axes running from -200 to 200; and a 3-D plot of the quadratic error over H(1) and H(2).]
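The error curve on this slide can be explored with a few lines of Matlab. The following is only a minimal sketch, assuming a one-dimensional quadratic error E(w) = Emin + a*(w - w0)^2 that is minimized by steepest descent. The constants Emin, w0, a, the step size mu and the starting weight are illustrative assumptions, not values taken from the slide.

```matlab
% Steepest descent on a quadratic error curve (all constants are assumed)
Emin = 2;  w0 = 3;  a = 0.5;      % E(w) = Emin + a*(w - w0)^2
mu   = 0.2;                       % step size
w    = zeros(1, 50);
w(1) = -4;                        % initial weight
for n = 1:49
    grad   = 2*a*(w(n) - w0);     % dE/dw at the current weight
    w(n+1) = w(n) - mu*grad;      % move against the gradient
end
E = Emin + a*(w - w0).^2;         % error along the trajectory
```

With these values each step shrinks the distance to w0 by the factor 1 - 2*a*mu = 0.8, so the iterates approach the minimum geometrically; plot(w, E, 'o-') shows the path along the curve.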

Page 6: SDSP: Looking through time

Speech signal: analog and digital
- Sampling rate
- Quantization

[Figure: speech waveform, amplitude versus time.]
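Sampling rate and quantization can be illustrated as follows. This is a sketch under stated assumptions: a 100 Hz sine stands in for the analog speech signal, and the sampling rate of 8 kHz and the 8-bit uniform quantizer are example values chosen here.

```matlab
% Sample a 100 Hz sine at 8 kHz and quantize it to 8 bits (assumed values)
fs = 8000;                        % sampling rate in Hz
t  = (0:159) / fs;                % 160 sample instants = 20 ms
x  = 0.9 * sin(2*pi*100*t);       % "analog" signal evaluated at the samples

nbits = 8;
step  = 2 / 2^nbits;              % step size for the amplitude range [-1, 1)
xq    = step * round(x / step);   % uniform (mid-tread) quantizer
qerr  = x - xq;                   % quantization error
```

The quantization error never exceeds half a step, so every additional bit halves its maximum size.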

Page 7: SDSP: Transformations and digital filters

Transformations: Z-transforms, Fourier transforms

Digital filters: FIR, IIR

[Figure: frequency response of a digital filter in two panels, magnitude (dB, from 0 down to -80) and phase (degrees, from 100 down to -500), both against normalized frequency (×π rad/sample) from 0 to 1.]
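A magnitude and phase response like the one plotted on this slide can be computed without any toolbox. The sketch below assumes a 5-point moving-average FIR filter as the example; the slide does not say which filter was actually plotted. The response H(e^{jw}) is evaluated directly from its definition.

```matlab
% 5-point moving-average FIR filter (an assumed example filter)
b = ones(1, 5) / 5;               % impulse response h(0..4)
x = [zeros(1, 5) ones(1, 15)];    % step input
y = filter(b, 1, x);              % y(n) = sum_k b(k) x(n-k)

% Frequency response H(e^{jw}) = sum_k b(k) e^{-jwk}, evaluated directly
w  = linspace(0, pi, 256);        % 0 ... pi rad/sample
k  = 0:numel(b)-1;
H  = b * exp(-1j * k' * w);       % (1x5) times (5x256)
magdB = 20*log10(abs(H) + eps);   % magnitude in dB (eps guards the zeros of H)
phdeg = unwrap(angle(H)) * 180/pi;% phase in degrees
```

An IIR filter differs only in having a non-trivial denominator, for example filter(b, [1 -0.9], x).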

Page 8: SDSP - Frame-based analysis

[Figure panels:]
- Hanning window: w
- Waveform multiplied by the Hanning window: xw
- Magnitude of the spectrum of xw
- Frequency response of the LP filter
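The first three panels can be reproduced with the sketch below. It assumes a frame of 256 samples at 8 kHz and uses a sum of two sinusoids as a stand-in for a speech frame. The window is computed explicitly so that no toolbox function is needed.

```matlab
% One analysis frame of a synthetic signal (all values are assumptions)
fs = 8000;  N = 256;
n  = (0:N-1)';
x  = sin(2*pi*500*n/fs) + 0.5*sin(2*pi*1500*n/fs);   % stand-in for a speech frame

w   = 0.5 - 0.5*cos(2*pi*n/(N-1));  % Hann window, computed explicitly
xw  = x .* w;                       % windowed frame
X   = fft(xw, 512);                 % zero-padded FFT
mag = abs(X(1:257));                % magnitude for 0 ... fs/2
f   = (0:256)' * fs/512;            % frequency axis in Hz
[~, imax] = max(mag);
peak_hz = f(imax);                  % strongest component
```

The strongest spectral peak falls at the 500 Hz component; without the window the abrupt frame edges would spread energy across neighbouring frequencies.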

Page 9: SDSP - Looking at frequency components through time

[Figure: two panels comparing the current and the previous frame, one before smoothing and one after smoothing.]

Page 10: SDSP: Vector quantization

Voronoi space: centroid and distortion measure
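The two ideas on this slide, nearest-centroid (Voronoi) cells and the distortion measure, can be sketched as an iteration that alternates between assigning vectors to cells and recomputing centroids. The data, the initial codebook and the squared Euclidean distortion are assumptions made for this example.

```matlab
% Toy 2-D data and a 2-entry codebook (assumed values)
X = [0 0; 1 0; 0 1; 9 9; 10 9; 9 10];     % one vector per row
C = [1 1; 8 8];                            % initial centroids

for iter = 1:10
    % Nearest-neighbour rule: squared Euclidean distance to each centroid
    D = zeros(size(X,1), size(C,1));
    for k = 1:size(C,1)
        diffk  = X - repmat(C(k,:), size(X,1), 1);
        D(:,k) = sum(diffk.^2, 2);
    end
    [dmin, idx] = min(D, [], 2);           % Voronoi cell of each vector
    % Centroid rule: each centroid becomes the mean of its cell
    for k = 1:size(C,1)
        if any(idx == k)
            C(k,:) = mean(X(idx == k, :), 1);
        end
    end
end
distortion = mean(dmin);                   % average distortion of the last assignment
```

For this data the codebook settles after the first pass at (1/3, 1/3) and (28/3, 28/3), with an average distortion of 4/9.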

Page 11: TTS - Waveform generation for TTS

Analysis and resynthesis - coding and decoding

[Block diagram.
Coding: original speech signal x, LP analysis, inverse filter A(z), residue e, pitch marks, prototype sampling, storage environment.
Decoding: storage environment and prosodic information (marks, Fo, En, U/UV), TFI residue synthesis, synthesis filter 1/A(z), synthesized speech signal.]

Parametrization: mapping the waveform into a set of parameters.

Reconstruction: synthesis of the waveform from the set of parameters.

Prosody:
- F0
- Duration
- Amplitude

Parameters:
- A: LP coefficients
- e: LP residue
- En: prototypes
- Fo: fundamental frequency
- U/UV: voiced/unvoiced transitions
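The analysis and resynthesis chain can be sketched with base Matlab functions only. This sketch covers just LP analysis, the inverse filter A(z) and the synthesis filter 1/A(z); pitch marks, prototypes and TFI are left out. The test signal (an impulse train through a two-pole resonator), the LP order and the autocorrelation method are assumptions made for this example.

```matlab
% Synthetic "voiced" signal: impulse train through a two-pole resonator (assumed)
N = 400;
excitation = zeros(N, 1);
excitation(1:50:N) = 1;                  % one pulse every 50 samples
x = filter(1, [1 -1.3 0.8], excitation);

% LP analysis (autocorrelation method), order p
p = 2;
r = zeros(p+1, 1);
for k = 0:p
    r(k+1) = sum(x(1:N-k) .* x(1+k:N));  % autocorrelation at lag k
end
alpha = toeplitz(r(1:p)) \ r(2:p+1);     % predictor coefficients
a = [1; -alpha];                         % A(z) = 1 - sum_k alpha(k) z^-k

e    = filter(a, 1, x);                  % inverse filter A(z): LP residue
xhat = filter(1, a, e);                  % synthesis filter 1/A(z)
reconstruction_error = max(abs(xhat - x));
```

Because the synthesis filter is the exact inverse of the analysis filter, the waveform is recovered up to rounding error as long as the residue is left untouched; coding gains come from replacing e with a more compact description.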

Page 12: TTS - Waveform generation for TTS

Speech coding
- Parametric coders, waveform coders, hybrid coders

TTS - Concatenative approach
- Time scale and frequency scale modifications
- Spectral smoothing
- Unit selection

[Examples: Original, Resynthesized, Modified: sin(x+ ); Original, TTS]

Page 13: ASR - Automatic Speech Recognition

Front-end signal processing
- Feature extraction
  - Perceptual domain, articulatory domain

Acoustic modeling
- HMM: Hidden Markov Model
- ANN/HMM: hybrid models - Artificial Neural Network and HMM

Statistical language modeling
- N-grams, smoothing techniques

Search: decoding
- Viterbi, stack decoding, ...

Page 14: ASR - HMM - Topology

Ergodic model

Left-right model
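The difference between the two topologies shows up directly in the transition matrix. The sketch below assumes three states and made-up probabilities; "ergodic" is taken here in the usual speech-recognition sense of a fully connected model.

```matlab
% Transition matrices A(i,j) = P(next state j | current state i); values assumed
A_ergodic   = [0.5 0.3 0.2;     % every state can reach every other state
               0.2 0.6 0.2;
               0.3 0.3 0.4];
A_leftright = [0.6 0.4 0.0;     % only self-loops and moves to the right
               0.0 0.7 0.3;
               0.0 0.0 1.0];

rows_ok      = all(abs(sum(A_ergodic, 2)   - 1) < 1e-12) && ...
               all(abs(sum(A_leftright, 2) - 1) < 1e-12);
is_ergodic   = all(A_ergodic(:) > 0);                    % fully connected
is_leftright = isequal(A_leftright, triu(A_leftright));  % upper triangular
```

The zeros below the diagonal of the left-right matrix forbid going back to an earlier state, which matches the time ordering of speech sounds.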

Page 15: ASR - HMM - Basic principle

[Figure: a sequence of 17 feature vectors X1 ... X17 shown with several candidate alignments to the phone states u, n, i, k, ã, p. Each state has an emission likelihood, p(x|u), p(x|n), p(x|i), p(x|k), p(x|ã), p(x|p), and the states are connected by transition probabilities a.]

Page 16: ASR - HMM - Viterbi alignment

[Figure: four panels (a) to (d), horizontal axes from 50 to 200.]
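Viterbi alignment finds the single most likely state sequence for a given sequence of frames. The following is a minimal log-domain sketch for a toy 3-state left-right HMM; the transition matrix A, the initial distribution p0 and the emission likelihoods B are invented for illustration and assumed to be already evaluated for six frames.

```matlab
% Toy 3-state left-right HMM (all numbers are illustrative assumptions)
A  = [0.6 0.4 0.0;      % A(i,j) = P(state j at t+1 | state i at t)
      0.0 0.7 0.3;
      0.0 0.0 1.0];
p0 = [1 0 0];           % always start in state 1
B  = [0.9 0.8 0.2 0.1 0.1 0.1;     % B(j,t) = p(x_t | state j), T = 6 frames
      0.1 0.2 0.7 0.8 0.2 0.1;
      0.1 0.1 0.1 0.2 0.8 0.9];

[S, T] = size(B);
logA  = log(A);         % log(0) = -Inf marks forbidden transitions
delta = zeros(S, T);    % best log-score of any path ending in state j at time t
psi   = zeros(S, T);    % back-pointers
delta(:,1) = log(p0') + log(B(:,1));
for t = 2:T
    for j = 1:S
        [best, psi(j,t)] = max(delta(:,t-1) + logA(:,j));
        delta(j,t) = best + log(B(j,t));
    end
end

% Backtrack from the best final state
path = zeros(1, T);
[~, path(T)] = max(delta(:,T));
for t = T-1:-1:1
    path(t) = psi(path(t+1), t+1);
end
```

For these numbers the best alignment is path = [1 1 2 2 3 3], that is, two frames per state.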

Page 17: ASR - HMM - Forward-Backward

[Figure: panels labelled / # / and / ã /.]
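Where Viterbi keeps only the best path, the forward-backward procedure sums over all paths and yields a posterior probability for every state at every frame. The sketch below uses a toy 3-state HMM whose parameters are assumptions made for illustration; no scaling is applied, which is only safe for very short sequences such as this one.

```matlab
% Toy 3-state left-right HMM (all numbers are illustrative assumptions)
A  = [0.6 0.4 0.0;
      0.0 0.7 0.3;
      0.0 0.0 1.0];
p0 = [1 0 0];
B  = [0.9 0.8 0.2 0.1 0.1 0.1;     % B(j,t) = p(x_t | state j)
      0.1 0.2 0.7 0.8 0.2 0.1;
      0.1 0.1 0.1 0.2 0.8 0.9];

[S, T] = size(B);
alpha = zeros(S, T);               % forward:  p(x_1..x_t, state j at t)
beta  = ones(S, T);                % backward: p(x_t+1..x_T | state j at t)
alpha(:,1) = p0' .* B(:,1);
for t = 2:T
    alpha(:,t) = (A' * alpha(:,t-1)) .* B(:,t);
end
for t = T-1:-1:1
    beta(:,t) = A * (B(:,t+1) .* beta(:,t+1));
end
likelihood = sum(alpha(:,T));               % p(x_1..x_T)
gamma = alpha .* beta / likelihood;         % state posteriors per frame
```

Each column of gamma sums to one; these soft state occupancies are the quantities accumulated when HMM parameters are re-estimated.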

Page 18: ASR - ANN/HMM

[Figure: neural network fed with a context window of feature vectors X(n-c), ..., X(n), ..., X(n+d); the output is a state posterior of the form P(q_k | x_n, ...).]

Page 19: Evaluation: Exercises and simulations

List of exercises
- SDSP, TTS, ASR

Simulations
- SDSP
  - Vector quantization
- TTS
  - Waveform interpolation
- ASR
  - Acoustic modeling using HMM and ANN+HMM
  - Language modeling
  - Decoding

Page 20: Evaluation: Report

Reports
- Write the analysis and results of the simulation in the format of a paper
- 4 pages, two columns
- Sections
  - Abstract
  - Introduction
  - Brief theoretical description of the method
  - Methodology used to perform the experiment
  - Results
  - Conclusions and suggestions for further work
  - Bibliography

Page 21: Days of classes

Normal semester
  2001
    October: 18, 25
    November: 8, 15, 22, 29 (November 1 is a holiday)
    December: 6, 13, 20
  2002
    January: 10, 17, 24, 31
    February: 7, 14
  Total: 15 days

Option one
  2001
    October: 16, 18, 23, 25, 30
    November: 6, 8, 13, 15, 20, 22, 27, 29
  2002
    February: 5, 7, 12, 14
  Total: 17 days

Option two
  2001
    October: 18, 25
    November: 8, 15, 22, 29
  2002
    February: 7, 14
    March: a one-week block seminar, 1.5 hours a day
  Total: 13 days

Option three
  2002
    March: a one-week block seminar, 3 hours a day
  Equivalent to 15 days