Speech Signal Processing I By Edmilson Morais And Prof. Greg. Dogil Second Lecture Stuttgart,...

Speech Signal Processing I

By

Edmilson Morais and Prof. Greg. Dogil

Second Lecture

Stuttgart, October 25, 2001

The Speech Signal

Non-stationary signal
  Voiced – almost periodic (concept of pitch)
  Unvoiced (random)
  Transitions (bursts, ...)

Range of the pitch
  Male :
  Female :

Sampling Theory

[Block diagram: Low-pass filter → Sample and hold → Low-pass filter]

The signal has to be band-limited before it is sampled to x(n).

The sampling frequency has to be higher than or equal to 2 times the maximum frequency present in the signal.
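The sampling condition above can be illustrated with a minimal sketch. It is not part of the original slides: the helper name `dominant_frequency` and the frequencies (3 kHz, 5 kHz, 8 kHz sampling rate) are assumptions chosen for illustration. A sine above half the sampling rate shows up at a lower, aliased frequency.

```python
import numpy as np

def dominant_frequency(f_signal, fs, duration=1.0):
    """Sample a sine of f_signal Hz at fs Hz and return the strongest FFT bin in Hz."""
    n = int(fs * duration)
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * f_signal * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

# 3 kHz is below fs/2 = 4 kHz, so it is represented correctly.
f_ok = dominant_frequency(3000.0, 8000.0)
# 5 kHz is above fs/2, so it folds back to 8 kHz - 5 kHz = 3 kHz.
f_alias = dominant_frequency(5000.0, 8000.0)
print(f_ok, f_alias)
```

This is why the low-pass filter comes before the sampler: it removes the components that would otherwise fold back into the band of interest.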

Linear Filters

Finite impulse response filters
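As a rough sketch of what a finite impulse response filter computes, the output is a weighted sum of the current and past input samples, y(n) = sum over k of h(k) x(n-k). The function name `fir_filter` and the 4-tap moving-average coefficients are assumptions for illustration, not taken from the lecture.

```python
import numpy as np

def fir_filter(h, x):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k], with x[n] = 0 for n < 0."""
    h = np.asarray(h, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x))
    for n in range(len(x)):
        for k in range(len(h)):
            if n - k >= 0:
                y[n] += h[k] * x[n - k]
    return y

h = np.ones(4) / 4.0              # hypothetical 4-tap moving average
impulse = np.zeros(8)
impulse[0] = 1.0
response = fir_filter(h, impulse)  # the impulse response equals h, then zeros
print(response)
```

The impulse response dies out after len(h) samples, which is what "finite" refers to.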

Matlab : Graphical visualization – Optimization on a quadratic (paraboloid) error surface

[Figure: mean squared error E as a function of the weight w, with min E, w0, w(n) and w(n+1) marked. Two contour plots, both axes from -200 to 200. Surface plot of the squared error over the filter coefficients H(1) and H(2).]
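One way to read the error-surface picture is as steepest descent on the mean squared error: each step moves the weights from w(n) to w(n+1) against the gradient of E. The following is a minimal sketch under assumed values (a hypothetical 2-tap filter `h_true`, white-noise input, step size `mu = 0.1`); it is not the Matlab demonstration from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
h_true = np.array([0.8, -0.3])      # hypothetical 2-tap filter to identify
x = rng.standard_normal(2000)
# Each row is [x(n), x(n-1)]; d is the desired output of the unknown filter.
X = np.column_stack([x[1:], x[:-1]])
d = X @ h_true

def mse(w):
    """Mean squared error E(w), a quadratic bowl in the two weights."""
    return np.mean((d - X @ w) ** 2)

w = np.zeros(2)
mu = 0.1                             # step size (assumed)
errors = [mse(w)]
for _ in range(200):
    grad = -2.0 * X.T @ (d - X @ w) / len(d)   # gradient dE/dw
    w = w - mu * grad                           # w(n+1) = w(n) - mu * dE/dw
    errors.append(mse(w))

print(w, errors[-1])
```

Because E is quadratic in the weights, there is a single minimum, and a small enough step size reduces the error at every iteration.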

SDSP : Looking through time

[Figure: speech waveform, amplitude versus time]

Speech signal : Analog and digital

Sampling rate

Quantization
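To make the quantization step concrete, here is a small sketch of a uniform quantizer. The function names, the test tone and the bit depths are assumptions for illustration. Each extra bit halves the step size, so the signal-to-noise ratio grows by roughly 6 dB per bit.

```python
import numpy as np

def quantize(x, n_bits):
    """Uniform mid-rise quantizer for samples in [-1, 1)."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    idx = np.clip(np.floor(x / step), -levels // 2, levels // 2 - 1)
    return (idx + 0.5) * step        # reconstruct at the centre of each cell

def snr_db(x, xq):
    """Signal-to-quantization-noise ratio in dB."""
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - xq) ** 2))

fs = 16000                           # sampling rate in Hz (assumed)
t = np.arange(fs) / fs
x = 0.9 * np.sin(2 * np.pi * 200 * t)

snr8 = snr_db(x, quantize(x, 8))
snr16 = snr_db(x, quantize(x, 16))
print(snr8, snr16)
```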

SDSP : Transformation and Digital filters

Transformations : Z-transforms, Fourier transforms

Digital filters : FIR, IIR

[Figure: frequency response of a digital filter. Phase (degrees, from -500 to 100) and magnitude (dB, from -80 to 0) versus normalized frequency (×π rad/sample, from 0 to 1).]
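A frequency response like the one plotted can be obtained by evaluating the Z-transform H(z) = B(z)/A(z) on the unit circle, z = e^{jw}. The sketch below uses plain NumPy; the helper `freq_response` and the two example filters (a 5-point moving average and a one-pole low-pass) are assumptions, not the filter shown in the slide.

```python
import numpy as np

def freq_response(b, a, n_points=512):
    """Evaluate H(e^{jw}) = B(e^{jw}) / A(e^{jw}) for w in [0, pi)."""
    w = np.linspace(0.0, np.pi, n_points, endpoint=False)
    z_inv = np.exp(-1j * w)
    num = sum(bk * z_inv ** k for k, bk in enumerate(b))
    den = sum(ak * z_inv ** k for k, ak in enumerate(a))
    return w, num / den

# FIR example: 5-point moving average (denominator is 1).
w, h_fir = freq_response(np.ones(5) / 5.0, [1.0])
# IIR example: one-pole low-pass, y[n] = 0.1 x[n] + 0.9 y[n-1].
_, h_iir = freq_response([0.1], [1.0, -0.9])

mag_fir_db = 20 * np.log10(np.maximum(np.abs(h_fir), 1e-12))
mag_iir_db = 20 * np.log10(np.abs(h_iir))
phase_iir_deg = np.degrees(np.unwrap(np.angle(h_iir)))
print(mag_iir_db[0], mag_iir_db[-1])
```

An FIR filter has only the numerator B(z), while an IIR filter also has the feedback polynomial A(z).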

SDSP – Frame based analysis

Hanning window : w

Waveform multiplied by the Hanning window : xw

Magnitude of the spectrum of xw

Freq. response of the LP-filter
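The first three steps of the frame-based analysis can be sketched as follows. The frame length, sampling rate and the two-tone test signal are assumptions for illustration; the names `w` and `xw` follow the slide.

```python
import numpy as np

fs = 16000                            # sampling rate in Hz (assumed)
frame_len = 512                       # 32 ms frame (assumed)
n = np.arange(frame_len)
# Hypothetical test frame: a strong 1 kHz tone plus a weaker 3 kHz tone.
x = np.sin(2 * np.pi * 1000 * n / fs) + 0.5 * np.sin(2 * np.pi * 3000 * n / fs)

w = np.hanning(frame_len)             # Hanning window w
xw = x * w                            # waveform multiplied by the window
spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(xw)) + 1e-12)
freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
peak_hz = freqs[np.argmax(spectrum_db)]
print(peak_hz)
```

The window tapers the frame to zero at both ends, which reduces the spectral leakage that an abrupt cut would cause.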

SDSP - Looking at frequency components through time

[Figure: current and previous frames, before smoothing and after smoothing]

SDSP : Vector quantization

Voronoi space : Centroid and distortion measure
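The two ingredients named above, Voronoi cells and centroids, are the two alternating steps of codebook training. The sketch below is a plain k-means (Lloyd) iteration with squared Euclidean distance as the distortion measure; the function name, the synthetic 2-D data and the codebook size are assumptions, and the lecture's simulation may use a different training procedure.

```python
import numpy as np

def train_codebook(data, n_codes, n_iter=20, seed=0):
    """Lloyd iteration: nearest-neighbour assignment, then centroid update."""
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), n_codes, replace=False)].copy()
    history = []
    for _ in range(n_iter):
        # Each vector is assigned to the closest codeword (its Voronoi cell).
        dist = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        history.append(dist[np.arange(len(data)), labels].mean())
        # Each codeword moves to the centroid (mean) of its cell.
        for k in range(n_codes):
            members = data[labels == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    # labels come from the last assignment step, before the final update.
    return codebook, labels, history

rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])   # hypothetical clusters
data = np.vstack([c + 0.3 * rng.standard_normal((100, 2)) for c in centers])
codebook, labels, history = train_codebook(data, n_codes=3)
print(history[0], history[-1])
```

The average distortion recorded in `history` never increases from one iteration to the next, which is the property that makes this alternation a training rule.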

TTS - Waveform generation for TTS

Analysis and Resynthesis – Coding and Decoding

[Block diagram]
Coding : Original speech signal x → LP analysis → A ; Inverse filter A(z) → e ; Pitch marks ; Prototypes sampling → En
Storage environment : A, En, Fo, U/UV, Marks
Decoding : Prosodic information and stored parameters → TFI residue synthesis → e → Synthesis filter 1/A(z) → Synthesized speech signal x

Parametrization : Mapping the waveform into a set of parameters

Reconstruction : Synthesis of the waveform from the set of parameters.

Prosody :
  F0
  Duration
  Amplitude

A – LP coefficients

e – LP residue

En – Prototypes

Fo – Fundamental frequency

U/UV – Voiced / Unvoiced transitions
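The core of the coding and decoding chain, LP analysis, inverse filtering with A(z) to get the residue e, and synthesis filtering with 1/A(z), can be sketched as below. This is a simplified illustration under stated assumptions: the LP coefficients come from the autocorrelation method on one synthetic frame, and the prototype sampling, pitch marks and TFI stages are left out. All function names are hypothetical.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation method: solve R a = r, return A(z) = [1, -a1, ..., -ap]."""
    r = np.array([np.dot(frame[: len(frame) - k], frame[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    return np.concatenate(([1.0], -a))

def inverse_filter(A, x):
    """Residue e[n] = sum_k A[k] x[n-k], i.e. FIR filtering with A(z)."""
    return np.convolve(x, A)[: len(x)]

def synthesis_filter(A, e):
    """x[n] = e[n] - sum_{k>=1} A[k] x[n-k], i.e. all-pole filtering with 1/A(z)."""
    x = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, len(A)):
            if n - k >= 0:
                acc -= A[k] * x[n - k]
        x[n] = acc
    return x

rng = np.random.default_rng(0)
# Synthetic frame: white noise through a hypothetical 2-pole resonator.
x = synthesis_filter(np.array([1.0, -1.3, 0.8]), rng.standard_normal(400))
A = lp_coefficients(x, order=2)      # analysis
e = inverse_filter(A, x)             # coding side: residue
x_hat = synthesis_filter(A, e)       # decoding side: resynthesis
print(A, np.max(np.abs(x - x_hat)))
```

With the unmodified residue the synthesis filter returns the original frame, so any change in the output of a full system comes from how e is parametrized and regenerated.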

TTS - Waveform generation for TTS

Speech coding
  Parametric coders, Waveform coders, Hybrid coders

TTS – Concatenative approach
  Time scale and Frequency scale modifications
  Spectral smoothing
  Unit selection

[Audio examples: Original, Resynthesized, Modified : sin(x+ )]

[Audio examples: Original, TTS]

ASR - Automatic Speech Recognition

Front-End Signal Processing
  Feature extraction
    Perceptual domain, Articulatory domain

Acoustic modeling
  HMM : Hidden Markov Model
  ANN/HMM : Hybrid models - Artificial Neural Network and HMM

Statistical Language Modeling
  N-grams, smoothing techniques

Search : Decoding
  Viterbi, Stack decoding, ...
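As a small illustration of N-gram language modeling with smoothing, the sketch below trains a bigram model and applies add-one (Laplace) smoothing, which is only one of many smoothing techniques and is picked here for brevity. The toy corpus and function names are assumptions; for simplicity the vocabulary includes the sentence markers.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams, with <s> and </s> as sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])                  # counts of each history word
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams, vocab

def prob(w_prev, w, unigrams, bigrams, vocab):
    """Add-one estimate of P(w | w_prev)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + len(vocab))

corpus = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"]]   # toy data
unigrams, bigrams, vocab = train_bigram(corpus)
p_seen = prob("the", "cat", unigrams, bigrams, vocab)      # bigram seen once
p_unseen = prob("the", "sleeps", unigrams, bigrams, vocab)  # bigram never seen
print(p_seen, p_unseen)
```

Without smoothing the unseen bigram would get probability zero, and any sentence containing it could never be recognized.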

ASR – HMM - Topology

Ergodic model
Left-right model

ASR – HMM – Basic principle

[Figure: a sequence of 17 observation vectors X1 ... X17 and several alternative alignments of the frames to the phone states u, n, i, k, ã, p. Each state has an emission density p(x | u), p(x | n), p(x | i), p(x | k), p(x | ã), p(x | p); the arcs between states are labelled a.]

ASR – HMM - Viterbi alignment

[Figure: four panels (a), (b), (c), (d), horizontal axes from 50 to 200]
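A Viterbi alignment assigns each frame to one HMM state by finding the single most likely state sequence. The sketch below works in the log domain; the 3-state left-right model and the emission likelihoods are made-up numbers for illustration, not the model behind the figure.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """log_B[t, j] = log p(x_t | state j). Returns the best path and its log score."""
    T, N = log_B.shape
    delta = np.full((T, N), -np.inf)
    psi = np.zeros((T, N), dtype=int)
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: from state i to state j
        psi[t] = scores.argmax(axis=0)           # best predecessor of each state
        delta[t] = scores.max(axis=0) + log_B[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                # backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1], float(delta[-1].max())

with np.errstate(divide="ignore"):               # log(0) = -inf is intended here
    log_pi = np.log(np.array([1.0, 0.0, 0.0]))
    log_A = np.log(np.array([[0.6, 0.4, 0.0],    # left-right topology
                             [0.0, 0.6, 0.4],
                             [0.0, 0.0, 1.0]]))
# Hypothetical emission likelihoods p(x_t | state) for 6 frames.
B = np.array([[0.90, 0.05, 0.05],
              [0.80, 0.10, 0.10],
              [0.10, 0.80, 0.10],
              [0.10, 0.70, 0.20],
              [0.10, 0.10, 0.80],
              [0.05, 0.05, 0.90]])
path, score = viterbi(log_pi, log_A, np.log(B))
print(path, score)
```

The returned path is the frame-to-state alignment; in training it can be used to re-estimate each state's emission density from the frames assigned to it.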

ASR – HMM – Forward-Backward

[Figure labels: / # /, / ã /]
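In contrast to a hard alignment, the forward-backward procedure gives each frame a soft assignment: the posterior probability of every state given the whole observation sequence. Below is a minimal unscaled sketch, suitable only for short sequences because the probabilities underflow on long ones; the 2-state model and the likelihood values are assumptions.

```python
import numpy as np

def forward_backward(pi, A, B):
    """B[t, j] = p(x_t | state j). Returns posteriors gamma[t, j] and p(X)."""
    T, N = B.shape
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[0]
    for t in range(1, T):                     # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[t]
    for t in range(T - 2, -1, -1):            # backward pass
        beta[t] = A @ (B[t + 1] * beta[t + 1])
    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood         # state occupation probabilities
    return gamma, likelihood

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
# Hypothetical emission likelihoods for 4 frames.
B = np.array([[0.9, 0.2],
              [0.8, 0.3],
              [0.1, 0.7],
              [0.2, 0.9]])
gamma, likelihood = forward_backward(pi, A, B)
print(gamma, likelihood)
```

Each row of `gamma` sums to one, and these posteriors are the weights used when re-estimating the model parameters.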

ASR – ANN/HMM

[Figure: network input is a context window of frames X(n-c) ... X(n) ... X(n+d); network output is P(q_k | x_n, ...)]

Evaluation : Exercises and Simulations

List of Exercises
  SDSP, TTS, ASR

Simulations
  SDSP
    Vector quantization
  TTS
    Waveform Interpolation
  ASR
    Acoustic modeling using : HMM and ANN+HMM
    Language modeling
    Decoding

Evaluation : Report

Reports
  Write the analysis and results of the simulation in the format of a paper
  4 pages, two columns.
  Sections
    Abstract
    Introduction
    Brief theoretical description of the method
    Methodology used to perform the experiment
    Results
    Conclusions and suggestions for further work
    Bibliography

Days of classes

Normal semester

2001
  October : 18, 25
  November : 8, 15, 22, 29 (the 1st is a holiday)
  December : 6, 13, 20

2002
  January : 10, 17, 24, 31
  February : 7, 14

Total : 15 days.

Option two

2001
  October : 18, 25
  November : 8, 15, 22, 29

2002
  February : 7, 14
  March : A one-week block seminar, 1.5 hours a day.

Total : 13 days.

Option one

2001
  October : 16, 18, 23, 25, 30
  November : 6, 8, 13, 15, 20, 22, 27, 29

2002
  February : 5, 7, 12, 14

Total : 17 days.

Option three

2002
  March : A one-week block seminar, 3 hours a day.

Equivalent to 15 days
