Introduction to Speech Signal Processing

Dr. Zhang Sen

[email protected]

Chinese Academy of Sciences, Beijing, China

23/04/24

• Introduction
  – Sampling and quantization
  – Speech coding

• Features and Analysis
  – Main features
  – Some transformations

• Text-to-Speech
  – State of the art
  – Main approaches

• Speech-to-Text
  – State of the art
  – Main approaches

• Applications
  – Human-machine dialogue systems

• The speech signal viewed mathematically
  – Can be described by a continuous function, but
  – Hard to find an explicit analytical form
  – Non-linear
  – Non-stationary, time-varying
  – Some parts resemble noise
  – Some parts resemble a pseudo-periodic signal

• The speech signal viewed physically
  – A wave generated by vibration
  – Transmitted through air or another medium

• Analysis approaches
  – Divide-and-conquer
  – Approximation and simplification
  – Transformation between the time domain (TD) and the frequency domain (FD)

• Analysis purpose
  – To find speech features
  – Which features are important and which are trivial
  – Correlation between features
  – How features change
  – How to change the original signal

• Features can be classified as
  – Time-domain features
  – Frequency-domain features

• Or
  – Short-term features
  – Long-term features

• Feature representation
  – Numerical: vector or distribution
  – Diagram: curve or image

• Windowing (framing)
  – Within a short frame (10 ms to 25 ms), the non-stationary signal can be treated as approximately stationary
  – and the non-linear signal as approximately linear
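
The framing step can be sketched as follows. This is a minimal illustration, not the method used in the lecture: the 16 kHz sampling rate, the 25 ms frame with a 10 ms hop, the Hamming window, and the helper names `hamming` and `frame_signal` are all assumptions chosen for the example.

```python
import math

def hamming(n):
    """Hamming window of length n (one common choice of window)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames and multiply each by a Hamming window."""
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append([x[start + i] * w[i] for i in range(frame_len)])
    return frames

fs = 16000                        # assumed sampling rate in Hz
frame_len = fs * 25 // 1000       # 25 ms -> 400 samples
hop = fs * 10 // 1000             # 10 ms -> 160 samples
x = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(fs)]  # 1 s test tone
frames = frame_signal(x, frame_len, hop)
```

Each element of `frames` is one short segment over which the signal is treated as stationary; the features discussed later are computed per frame.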

• Window types

• Window shapes

• A few words on window functions

• Commonly used speech features
  – Zero-crossing rate (ZCR)
  – Peaks
  – Power and energy
  – Correlation, auto-correlation, AMDF
  – Formant
  – Pitch
  – Frequency spectrum
  – Cepstrum and MFCC
  – Linear predictive coefficients (LPC), LPCC

• ZCR
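
A minimal sketch of the zero-crossing rate for one frame, assuming the usual definition (the fraction of adjacent sample pairs whose signs differ). The function name `zero_crossing_rate` and the choice to count a zero-valued sample as non-negative are illustrative assumptions.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    A sample equal to zero is treated as non-negative (an assumption).
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return crossings / (len(frame) - 1)
```

Noise-like (unvoiced) frames typically give a high ZCR and voiced frames a low one, which is why ZCR is often paired with energy for a rough voiced/unvoiced decision.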

• Level-crossing-rate

• Peaks

• Power and energy
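
Short-time energy and power can be sketched as below, assuming the standard definitions (energy is the sum of squared samples in a frame, power is energy divided by frame length). The function names and the `floor` parameter that guards the logarithm are assumptions for illustration.

```python
import math

def frame_energy(frame):
    """Short-time energy: sum of squared samples in the frame."""
    return sum(s * s for s in frame)

def frame_power(frame):
    """Short-time power: energy divided by the number of samples."""
    return frame_energy(frame) / len(frame) if frame else 0.0

def frame_power_db(frame, floor=1e-12):
    """Power in dB; `floor` avoids log(0) on silent frames."""
    return 10.0 * math.log10(max(frame_power(frame), floor))
```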

• Correlation, auto-correlation, AMDF (average magnitude difference function)
  – Used to measure the similarity of two signals or to detect the periodicity of a signal
  – Auto-correlation: sum x(k+i)*x(k+m+i) over a range of i, where k is the reference point and m is the lag
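
The sum above, and the AMDF that plays the same role with differences instead of products, can be sketched as follows. The argument names follow the slide (`k` is the reference point, `m` is the lag); the range length `n` and the normalisation of the AMDF by `n` are assumptions.

```python
def autocorrelation(x, k, n, m):
    """R(m) = sum over i in [0, n) of x[k+i] * x[k+m+i]."""
    return sum(x[k + i] * x[k + m + i] for i in range(n))

def amdf(x, k, n, m):
    """Average magnitude difference: mean of |x[k+i] - x[k+m+i]| over i in [0, n)."""
    return sum(abs(x[k + i] - x[k + m + i]) for i in range(n)) / n

# A signal with a period of 4 samples.
x = [0, 1, 0, -1] * 10
r0 = autocorrelation(x, 0, 8, 0)   # lag 0
r4 = autocorrelation(x, 0, 8, 4)   # lag equal to one period
d4 = amdf(x, 0, 8, 4)              # AMDF at one period
```

At a lag equal to the period the auto-correlation returns to its lag-0 value and the AMDF drops to zero, which is the property used to detect periodicity.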

• Center-clipping technique
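
A minimal sketch of center clipping as it is commonly applied before auto-correlation pitch analysis: samples below a threshold are set to zero and the rest are shifted toward zero by that threshold. Setting the threshold to a fraction of the frame peak, the default `ratio=0.3`, and the name `center_clip` are assumptions, not values taken from the lecture.

```python
def center_clip(frame, ratio=0.3):
    """Zero samples with |s| <= thr and shift the others toward zero by thr.

    thr = ratio * (peak absolute value of the frame); `ratio` is an assumed setting.
    """
    peak = max((abs(s) for s in frame), default=0.0)
    thr = ratio * peak
    out = []
    for s in frame:
        if s > thr:
            out.append(s - thr)
        elif s < -thr:
            out.append(s + thr)
        else:
            out.append(0.0)
    return out
```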

• Auto-correlation peaks

• Auto-correlation display

• Formant
  – LPC -> FFT

• Formant displays

• Some typical formant values

• Pitch, fundamental frequency
  – Referred to as F0; determines tone and prosody
  – Pitch estimation methods
    • Auto-correlation and AMDF
    • Cepstrum
    • LPC
    • Peak detection
  – Pitch smoothing methods
    • Dynamic programming
    • N-point smoothing filter
    • HMM
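
As one concrete instance of an N-point smoothing filter, the sketch below applies a running median to a pitch track. Choosing the median (rather than, say, a moving average), repeating the edge values as padding, and the name `median_smooth` are assumptions made for the example.

```python
def median_smooth(track, n=3):
    """N-point running median of a pitch track; edges are padded by repetition."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    if not track:
        return []
    half = n // 2
    padded = [track[0]] * half + list(track) + [track[-1]] * half
    return [sorted(padded[i:i + n])[half] for i in range(len(track))]

# A single outlier (200 Hz) in an otherwise steady track is removed.
smoothed = median_smooth([100, 102, 200, 101, 99], 3)
```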

• Pitch display
  – The pitch of a3 obtained by the auto-correlation method
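
The auto-correlation method of pitch estimation can be sketched as follows, using a synthetic tone in place of real speech. The search range of 60 Hz to 400 Hz, the 8 kHz sampling rate, and the function name `estimate_pitch` are assumptions; practical systems add voicing decisions and smoothing on top of this.

```python
import math

def estimate_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate F0 in Hz from the largest auto-correlation value in the lag range.

    Returns 0.0 when no positive peak is found (e.g. a silent frame).
    """
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag if best_lag else 0.0

fs = 8000                                    # assumed sampling rate in Hz
tone = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(400)]  # 50 ms at 200 Hz
f0 = estimate_pitch(tone, fs)
```

Because the number of overlapping samples shrinks as the lag grows, multiples of the true period score lower than the period itself, which helps this simple search avoid octave errors on a clean tone.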

• Spectrogram
  – Representation of a signal highlighting several of its properties, based on short-time Fourier analysis
  – Two-dimensional: time on the horizontal axis and frequency on the vertical axis
  – Third ‘dimension’: gray or color level indicating energy

• Spectrum of a frame (vowel)

• Spectrum of a frame (consonant)

• Cepstrum analysis

• Cepstrum and MFCC computation
  – Cepstrum: s(n) -> DFT -> log|DFT| -> IDFT -> cepstrum
  – MFCC: s(n) -> DFT -> filter-bank -> log -> DCT -> MFCC
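
The cepstrum path (DFT, log magnitude, inverse DFT) can be sketched in a few lines. This is an illustrative sketch using a direct O(N^2) DFT for clarity; the function names, the `floor` that guards the logarithm, and the impulse-plus-echo test signal are assumptions.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform (O(N^2), for clarity only)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def real_cepstrum(x, floor=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    n = len(x)
    log_mag = [math.log(max(abs(v), floor)) for v in dft(x)]
    return [
        sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

# An impulse followed by a half-amplitude echo 8 samples later.
x = [0.0] * 64
x[0], x[8] = 1.0, 0.5
c = real_cepstrum(x)
peak_quefrency = max(range(1, 33), key=lambda q: c[q])
```

The echo shows up as a peak at quefrency 8. A periodic excitation in voiced speech produces a similar peak at the pitch period, which is the basis of cepstral pitch estimation.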

• Filter-bank
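
Assuming the filter-bank here is the mel-spaced bank usually used for MFCC, the sketch below shows how filter center frequencies can be placed uniformly on the mel scale. The formula mel = 2595 * log10(1 + f / 700) is one widely used definition of the mel scale; the function names and the example settings (20 filters from 0 Hz to 8000 Hz) are assumptions.

```python
import math

def hz_to_mel(f):
    """Convert frequency in Hz to mel (a widely used formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_low, f_high):
    """Center frequencies in Hz, equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

centers = mel_center_frequencies(20, 0.0, 8000.0)
```

The centers are dense at low frequencies and sparse at high frequencies, mirroring the coarser frequency resolution of hearing at high frequencies.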

• Perceptual measures

• Linear predictive analysis
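
Linear prediction models each sample as a weighted sum of the previous p samples. A common way to obtain the weights is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below. The predictor convention x̂(n) = a_1 x(n-1) + ... + a_p x(n-p) and the function names are assumptions; the slides do not state which formulation was presented.

```python
def autocorr_lags(frame, order):
    """Auto-correlation values r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + m] for i in range(n - m)) for m in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations from auto-correlation values r[0..order].

    Returns (a, err): a[k-1] weights x(n-k) in the predictor, and err is the
    final prediction-error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break                                  # degenerate (e.g. silent) frame
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err                              # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(i):
            a[j] = prev[j] - k * prev[i - 1 - j]
        err *= 1.0 - k * k
    return a, err

# Auto-correlation of a first-order process with coefficient 0.5.
a, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this input the recursion recovers the single coefficient 0.5 and sets the second-order term to zero, since a first-order predictor already explains the correlation.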

• Prediction errors

• LP coefficients to cepstral coefficients
  – The computation of LPCC
  – LPCC is often used as the feature vector in ASR
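
One standard way to compute LPCC is the recursion c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}, with a_n taken as 0 for n > p. The sketch below assumes the all-pole model H(z) = 1 / (1 - a_1 z^-1 - ... - a_p z^-p) and omits the gain term c_0; both the sign convention and the function name are assumptions, since the slide's own formula is not recoverable.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of H(z) = 1 / (1 - sum_k a_k z^-k).

    a[k-1] holds a_k. The gain term c_0 is not computed.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)                 # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

# Single pole at 0.5: the cepstrum is known in closed form, c_n = 0.5**n / n.
lpcc = lpc_to_cepstrum([0.5], 3)
```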

• Some transformations in SSP
  – DFT, FFT, DCT and their inverses
    • Frequency analysis
    • TD-FD conversion
  – Z-transform
    • LPC analysis
    • Filter design
  – Wavelet transform
    • Frequency analysis
    • Compression

• Fourier Transform

• Discrete Fourier Transform

The computational load of the DFT is O(N^2); the Fast Fourier Transform (FFT) reduces it to O(N log N) using the divide-and-conquer principle.
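
The divide-and-conquer idea can be sketched with a recursive radix-2 FFT next to the direct DFT. This is an illustrative sketch that assumes the input length is a power of two; the function names are chosen for the example, and production code would use an optimised library routine instead.

```python
import cmath

def dft(x):
    """Direct DFT: N output bins, each a sum over N samples, so O(N^2)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fft(x):
    """Recursive radix-2 FFT: split into even and odd samples, then combine."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
slow = dft(x)
fast = fft(x)
```

Each level of recursion does O(N) work and there are log2(N) levels, which gives the O(N log N) total.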

• Basic phonetic knowledge
  – Consonant/unvoiced
  – Vowel/voiced
  – Co-articulation
  – Phone and phoneme
  – Uni-, bi-, tri-phone
  – Canonical form, surface form, reduced form
  – Tone and prosody

• Co-articulation
  – Very common in English, where it causes many difficulties in ASR
  – Less serious in Mandarin
  – Bi-phones and tri-phones are used to cope with this issue
  – Some examples:
    • Mandarin: A yi, yi yi, wu yun, …
    • English: this issue, in a box, …

• Some research topics
  – Speech signal detection, endpoint detection
  – Consonant/vowel separation
  – Pitch estimation
  – Echo cancellation
  – De-noising and filter design
  – Multi-signal separation
  – Robust features
  – Perceptual features
  – Re-sampling and re-construction
  – etc.

References

• Speech & Language Processing
  – Jurafsky & Martin, Prentice Hall, 2000
• Spoken Language Processing
  – X. D. Huang et al., Prentice Hall, Inc., 2000
• Statistical Methods for Speech Recognition
  – Jelinek, MIT Press, 1999
• Foundations of Statistical Natural Language Processing
  – Manning & Schutze, MIT Press, 1999
• Fundamentals of Speech Recognition
  – L. R. Rabiner and B. H. Juang, Prentice-Hall, 1993
• Dr. J. Picone, Speech Website
  – www.isip.msstate.edu

Test

• Mode
  – A final 4-page report, or
  – A 30-min presentation

• Content
  – Review of speech processing
  – Speech features and processing approaches
  – Review of TTS or ASR
  – Audio in computer engineering

THANKS
