Speech Signal Processing I By Edmilson Morais And Prof. Greg. Dogil Stuttgart, October 18, 2001.


Goals of the Course

Our part
  Basic theoretical concepts about
    Speech Signal Processing - SDSP
    Waveform generation for TTS systems - TTS
    Automatic Speech Recognition (statistical approach) - ASR
  Fundamentals of programming in Matlab
    It will be the tool used for our simulations
Your part?
  Describe and justify the important aspects and drawbacks in the algorithm.
Next term: Speech Signal Processing II
  Going deeper into more theoretical and practical aspects of SSP, TTS and ASR.

Tutorial of Matlab

Principles of linear algebra
  Vectors, matrices, linear systems
Programming in Matlab
  Variables, operators, ...
  if statements, switch statements, for loops, while loops, continue statements, break statements, ...
  I/O operations
  Graphical visualization
  Executable files
  Subroutines
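
A small illustration of the tutorial topics listed above: a linear system solved with the backslash operator, followed by a for loop with an if statement and a while loop. The matrix and all numeric values are made up for this sketch.

```matlab
% Linear system A*x = b solved with the backslash operator
A = [2 1; 1 3];
b = [3; 5];
x = A \ b;

% for loop with an if statement: sum of the even numbers from 1 to 10
s = 0;
for k = 1:10
    if mod(k, 2) == 0
        s = s + k;
    end
end

% while loop: smallest power of two that is at least 100
p = 1;
while p < 100
    p = 2 * p;
end

fprintf('x = [%g %g], s = %d, p = %d\n', x(1), x(2), s, p);
```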

Matlab: Graphical visualization

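% Sample the surface z = sin(r)/r on a square grid
[X,Y] = meshgrid(-8:.5:8);
R = sqrt(X.^2 + Y.^2) + eps;   % eps avoids 0/0 at the origin
Z = sin(R)./R;

% Wireframe view
mesh(X,Y,Z,'EdgeColor','black')

% Shaded surface view, in a new figure so the mesh plot is kept
figure
surf(X,Y,Z,'FaceColor','red','EdgeColor','none');
camlight left; lighting phong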

Matlab: Graphical visualization – Optimization on a hyperbolic (quadratic) surface

[Figure: mean squared error E plotted against the weight w, showing the minimum E_min at w_0 and the weight iterates w(n); two contour plots with both axes running from -200 to 200; a surface plot of the quadratic error over the coefficients H(1) and H(2).]
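
The search for the minimum of a quadratic error surface can be sketched with a steepest-descent loop. This is only an illustration: the matrix R, the minimum h0, the floor Emin, the step size mu and the starting point are all assumed values, not taken from the course material.

```matlab
% Quadratic error surface E(h) = Emin + (h - h0)' * R * (h - h0)
% R, h0, Emin, mu and the starting point are illustrative values.
R    = [2 0.5; 0.5 1];     % symmetric, positive definite
h0   = [1; -2];            % location of the minimum
Emin = 0.1;
E    = @(h) Emin + (h - h0)' * R * (h - h0);

mu = 0.1;                  % step size, must stay below 1/max(eig(R))
h  = [4; 4];               % starting point
for n = 1:200
    g = 2 * R * (h - h0);  % gradient of E at the current weights
    h = h - mu * g;        % steepest-descent update
end
fprintf('h = [%.4f %.4f], E = %.4f\n', h(1), h(2), E(h));
```

With these values the weights converge to h0 and the error to Emin; a step size above the stated bound would make the iteration diverge.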

SDSP: Looking through time

[Figure: speech waveform, amplitude versus time.]

Speech signal: analog and digital
  Sampling rate
  Quantization
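
Sampling and quantization can be illustrated on a synthetic tone. The sampling rate, tone frequency, amplitude and number of bits below are assumptions chosen for the sketch; a real speech signal would be read from a file instead.

```matlab
fs = 8000;                      % sampling rate in Hz (assumed value)
N  = 80;                        % number of samples (10 ms at 8 kHz)
t  = (0:N-1) / fs;              % sampling instants in seconds
x  = 0.9 * sin(2*pi*200*t);     % sampled amplitude, inside [-1, 1)

B  = 4;                         % bits per sample (assumed value)
q  = 2 / 2^B;                   % quantization step for the range [-1, 1)
xq = q * floor(x / q) + q/2;    % mid-rise uniform quantizer

err = x - xq;                   % quantization error, bounded by q/2
fprintf('step = %g, max |error| = %g\n', q, max(abs(err)));
```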

SDSP: Transformations and digital filters

Transformations
  Z-transforms, Fourier transforms
Digital filters
  FIR, IIR

[Figure: frequency response of a digital filter. Panels: magnitude (dB) and phase (degrees), each versus normalized frequency (×π rad/sample).]
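
A minimal sketch of an FIR and an IIR filter, with the frequency response evaluated directly from the Z-transform coefficients on the unit circle. The two filters (a 5-point moving average and a one-pole low-pass) are assumed examples; only base Matlab functions are used.

```matlab
% FIR: 5-point moving average.  IIR: one-pole low-pass.
b_fir = ones(1, 5) / 5;   a_fir = 1;
b_iir = 0.1;              a_iir = [1 -0.9];

% Impulse responses obtained by filtering a unit impulse
d = [1 zeros(1, 63)];
h_fir = filter(b_fir, a_fir, d);   % finite: zero after 5 samples
h_iir = filter(b_iir, a_iir, d);   % infinite: decays as 0.1 * 0.9^n

% Frequency response H(e^{jw}) = B(e^{jw}) / A(e^{jw}) on 512 points
% between 0 and pi
w = pi * (0:511) / 512;
H = @(b, a) (exp(-1j * w(:) * (0:numel(b)-1)) * b(:)) ./ ...
            (exp(-1j * w(:) * (0:numel(a)-1)) * a(:));
H_fir = H(b_fir, a_fir);
H_iir = H(b_iir, a_iir);

mag_dB    = 20 * log10(abs(H_iir));          % magnitude in dB
phase_deg = unwrap(angle(H_iir)) * 180/pi;   % phase in degrees
```

Plotting mag_dB and phase_deg against w/pi gives the kind of magnitude and phase curves shown on the slide.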

SDSP – Frame based analysis

[Figure panels:]
  Hanning window: w
  Waveform multiplied by the Hanning window: xw
  Magnitude of the spectrum of xw
  Frequency response of the LP filter
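
The four panels can be reproduced with a short sketch. The test frame is synthetic (filtered noise), and the frame length, LP order and FFT size are assumed values; the LP coefficients are obtained with the autocorrelation method, which is one common choice and not necessarily the one used in the course.

```matlab
N = 240;  p = 10;                              % frame length, LP order (assumed)
x = filter(1, [1 -1.3 0.8], randn(N, 1));      % synthetic test frame

w  = 0.5 - 0.5 * cos(2*pi*(0:N-1)' / (N-1));   % Hanning window w
xw = x .* w;                                   % windowed waveform xw

Nfft = 512;
Xmag = abs(fft(xw, Nfft));                     % magnitude spectrum of xw

% LP analysis by the autocorrelation method: solve R * c = r
r = zeros(p+1, 1);
for k = 0:p
    r(k+1) = sum(xw(1:N-k) .* xw(1+k:N));
end
c = toeplitz(r(1:p)) \ r(2:p+1);               % predictor coefficients
a = [1; -c];                                   % A(z) = 1 - sum_k c_k z^-k

G   = sqrt(r(1) - c' * r(2:p+1));              % gain from the residual energy
Hlp = G ./ abs(fft(a, Nfft));                  % |G / A(e^{jw})|, the LP envelope
```

Hlp follows the envelope of Xmag, which is what the comparison between the last two panels shows.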

SDSP - Looking at frequency components through time

[Figure: frequency components of the current and previous frames, shown before smoothing and after smoothing.]

SDSP: Vector quantization

Voronoi space: centroid and distortion measure
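
The two ingredients named above, the nearest-neighbour (Voronoi) rule and the centroid rule, can be sketched as a small k-means style loop. The training vectors, the initial codebook and the squared Euclidean distortion are assumptions made for the example.

```matlab
X = [0 0; 0 1; 1 0; 1 1; 5 5; 5 6; 6 5; 6 6];   % training vectors (rows)
C = [0 0; 6 6];                                  % initial codebook
K = size(C, 1);

for it = 1:10
    % Squared Euclidean distortion between every vector and every centroid
    D = zeros(size(X, 1), K);
    for k = 1:K
        d = X - repmat(C(k, :), size(X, 1), 1);
        D(:, k) = sum(d.^2, 2);
    end
    [dmin, idx] = min(D, [], 2);      % nearest-neighbour (Voronoi) rule

    % Centroid rule: each codeword becomes the mean of its cell
    for k = 1:K
        if any(idx == k)
            C(k, :) = mean(X(idx == k, :), 1);
        end
    end
end
distortion = mean(dmin);              % average distortion of the codebook
```

Here the codebook settles on the centres of the two clusters, [0.5 0.5] and [5.5 5.5], with an average distortion of 0.5.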

TTS - Waveform generation for TTS

Analysis and Resynthesis – Coding and Decoding

[Block diagram. Coding: the original speech signal x goes through LP analysis, which gives A(z); the inverse filter A(z) produces the residue e; pitch marks and prototype sampling produce the prototypes En; A, En, Fo, U/UV and the marks are kept in the storage environment. Decoding: prosodic information (Fo, marks, U/UV) drives the TFI residue synthesis, and the synthesis filter 1/A(z) produces the synthesized speech signal.]

Parametrization: mapping the waveform into a set of parameters
Reconstruction: synthesis of the waveform from the set of parameters.

Prosody:
  F0
  Duration
  Amplitude

A – LP coefficients
e – LP residue
En – Prototypes
Fo – Fundamental frequency
U/UV – Voiced / unvoiced transitions
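
The parametrization and reconstruction steps can be sketched for the LP part alone: the inverse filter A(z) maps the waveform x to the residue e, and the synthesis filter 1/A(z) rebuilds the waveform from e. The input is synthetic filtered noise standing in for speech, and the frame length and LP order are assumed; prototypes, pitch marks and prosody are left out of this sketch.

```matlab
N = 400;  p = 8;                           % frame length, LP order (assumed)
x = filter(1, [1 -1.3 0.8], randn(N, 1));  % synthetic stand-in for speech

% Coding: LP analysis (autocorrelation method), then inverse filtering
r = zeros(p+1, 1);
for k = 0:p
    r(k+1) = sum(x(1:N-k) .* x(1+k:N));
end
c = toeplitz(r(1:p)) \ r(2:p+1);
A = [1; -c]';                              % A(z), the LP coefficients
e = filter(A, 1, x);                       % LP residue: e = A(z) applied to x

% Decoding: the synthesis filter 1/A(z) rebuilds the waveform from e
y = filter(1, A, e);

fprintf('max reconstruction error = %g\n', max(abs(x - y)));
```

Without any modification of e or A the reconstruction is exact up to rounding; in a TTS system the residue and the prosodic parameters are modified before resynthesis.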

TTS - Waveform generation for TTS

Speech coding
  Parametric coders, waveform coders, hybrid coders
TTS – Concatenative approach
  Time scale and frequency scale modifications
  Spectral smoothing
  Unit selection

[Examples: Original, Resynthesized, Modified: sin(x+ )]
[Examples: Original, TTS]

ASR - Automatic Speech Recognition

Front-End Signal Processing
  Feature extraction
    Perceptual domain, articulatory domain
Acoustic modeling
  HMM: Hidden Markov Model
  ANN/HMM: Hybrid models - Artificial Neural Network and HMM
Statistical Language Modeling
  N-grams, smoothing techniques
Search: Decoding
  Viterbi, stack decoding, ...

ASR – HMM - Topology

Ergodic model
Left-right model

ASR – HMM – Basic principle

[Figure: a sequence of 17 observation frames X1 ... X17 aligned to the phone states u, n, i, k, ã, p. Several alternative state alignments are shown, together with the emission densities p(x | u), p(x | n), p(x | i), p(x | k), p(x | ã), p(x | p) and the transition probabilities a.]

ASR – HMM - Viterbi alignment

[Figure: four panels (a), (b), (c), (d), each with axis ticks from 50 to 200.]
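
A minimal sketch of Viterbi alignment for a left-right HMM. The three states, the transition matrix and the emission likelihoods B(j,t) = p(x_t | state j) are invented for the illustration; in a real system B would come from the acoustic model. The path is forced to start in the first state and end in the last one.

```matlab
% Left-right HMM with 3 states; all numbers are illustrative
A  = [0.6 0.4 0.0;          % transition probabilities a(i,j)
      0.0 0.7 0.3;
      0.0 0.0 1.0];
B  = [0.9 0.8 0.2 0.1 0.1 0.1;    % B(j,t) = p(x_t | state j)
      0.1 0.2 0.7 0.8 0.2 0.1;
      0.1 0.1 0.1 0.2 0.8 0.9];
[S, T] = size(B);

logA  = log(A);                   % log(0) = -Inf forbids a transition
delta = -Inf(S, T);               % best log score ending in state j at time t
psi   = zeros(S, T);              % back-pointers
delta(1, 1) = log(B(1, 1));       % the path must start in state 1

for t = 2:T
    for j = 1:S
        [best, psi(j, t)] = max(delta(:, t-1) + logA(:, j));
        delta(j, t) = best + log(B(j, t));
    end
end

% Backtracking from the final state S
path = zeros(1, T);
path(T) = S;
for t = T-1:-1:1
    path(t) = psi(path(t+1), t+1);
end
disp(path)
```

For these numbers the best alignment assigns two frames to each state, that is, the path 1 1 2 2 3 3.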

ASR – HMM – Forward-Backward

[Figure labels: / # /, / ã /]

ASR – ANN/HMM

[Figure: neural network with the input frames X_{n-c}, ..., X_n, ..., X_{n+d} and the output P(q_k | x_n, ...]

Evaluation: Exercises and Simulations

List of Exercises
  SDSP, TTS, ASR
Simulations
  SDSP
    Vector quantization
  TTS
    Waveform Interpolation
  ASR
    Acoustic modeling using: HMM and ANN+HMM
    Language modeling
    Decoding

Evaluation: Report

Reports
  Write the analysis and results of the simulation in the format of a paper
  4 pages, two columns.
  Sections
    Abstract
    Introduction
    Brief theoretical description of the method
    Methodology used to perform the experiment
    Results
    Conclusions and suggestions for further work
    Bibliography

Days of classes

Normal semester
  2001
    October: 18, 25
    November: (01 is a holiday) 8, 15, 22, 29
    December: 6, 13, 20
  2002
    January: 10, 17, 24, 31
    February: 7, 14
  Total: 15 days.

Option two
  2001
    October: 18, 25
    November: 8, 15, 22, 29
  2002
    February: 7, 14
    March: a one-week block seminar, 1.5 hours a day.
  Total: 13 days.

Option one
  2001
    October: 16, 18, 23, 25, 30
    November: 6, 8, 13, 15, 20, 22, 27, 29
  2002
    February: 5, 7, 12, 14
  Total: 17 days.

Option three
  2002
    March: a one-week block seminar, 3 hours a day.
  Equivalent to 15 days.
