Introduction to Speech Signal Processing

Dr. Zhang Sen

zhangsen@gscas.ac.cn

Chinese Academy of Sciences, Beijing, China

23/04/24

• Introduction
  – Sampling and quantization
  – Speech coding

• Features and Analysis
  – Main features
  – Some transformations

• Text-to-Speech
  – State of the art
  – Main approaches

• Speech-to-Text
  – State of the art
  – Main approaches

• Applications
  – Human-machine dialogue systems

• The speech signal viewed mathematically
  – Can be described by a continuous function, but
  – Hard to find an explicit analytical form
  – Non-linear
  – Non-stationary, time-varying
  – Some parts resemble noise
  – Some parts resemble a pseudo-periodic signal

• The speech signal viewed physically
  – A wave generated by vibration
  – Transmitted through air or other media

• Analysis approaches
  – Divide-and-conquer
  – Approximation and simplification
  – Transformation between time domain and frequency domain (TD-FD)

• Analysis purposes
  – To find speech features
  – Which are important and which are trivial?
  – Correlation between features
  – How do features change?
  – How to change the original signal

• Features can be classified as
  – Time-domain features
  – Frequency-domain features

• Or
  – Short-term features
  – Long-term features

• Feature representation
  – Numerical: vector or distribution
  – Diagram: curve or image

• Windowing (framing)
  – Over a short term (10 ms-25 ms), a non-stationary signal can be treated as stationary
  – and a non-linear one as linear
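
Framing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 25 ms frame length is the upper end of the range on the slide, while the 10 ms hop, the Hamming window, and the names `hamming` and `frame_signal` are choices made for this example, not taken from the slides.

```python
import math

def hamming(n):
    """Hamming window of length n: 0.54 - 0.46*cos(2*pi*i/(n-1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split samples into overlapping frames and apply a Hamming window to each."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    window = hamming(frame_len)
    frames = []
    # Only full frames are kept; trailing samples that do not fill a frame are dropped.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

With an 8 kHz sampling rate, each frame holds 200 samples and consecutive frames start 80 samples apart, so neighbouring frames overlap.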

• Window types

• Window shapes

• A few words on window functions

• Commonly used speech features
  – Zero-crossing rate (ZCR)
  – Peaks
  – Power and energy
  – Correlation, auto-correlation, AMDF
  – Formant
  – Pitch
  – Frequency spectrum
  – Cepstrum and MFCC
  – Linear Predictive Coefficients (LPC), LPCC

• ZCR
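
A minimal sketch of the zero-crossing rate for one frame, assuming the usual definition as the fraction of adjacent sample pairs whose signs differ. Treating a sample equal to zero as non-negative is a convention chosen for this example, and `zero_crossing_rate` is a hypothetical helper name.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as non-negative)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)
```

Noise-like (unvoiced) frames tend to give a high rate and pseudo-periodic (voiced) frames a low one, which is why ZCR is often used as a cheap voicing cue.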

• Level-crossing-rate

• Peaks

• Power and energy
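
As a rough sketch, short-time energy can be taken as the sum of squared samples in a frame, and short-time power as that energy divided by the frame length. These are common textbook definitions assumed here; the function names are illustrative only.

```python
def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(s * s for s in frame)

def short_time_power(frame):
    """Energy averaged over the frame length (0.0 for an empty frame)."""
    return short_time_energy(frame) / len(frame) if frame else 0.0
```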

• Correlation, auto-correlation, AMDF (average magnitude difference function)
  – Used to measure the similarity of two signals or to detect the periodicity of a signal
  – Sum x(k+i)*x(k+m+i) over a range of i, where k is the reference point and m is the lag
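
The sum above translates directly into code. The sketch below follows the slide's notation (reference point k, lag m) and adds a range length n as an explicit parameter. The AMDF shown, the mean of |x(k+i) - x(k+m+i)| over the same range, is the usual definition and is assumed here rather than taken from the slides.

```python
def autocorrelation(x, k, m, n):
    """Sum of x[k+i] * x[k+m+i] for i = 0 .. n-1."""
    return sum(x[k + i] * x[k + m + i] for i in range(n))

def amdf(x, k, m, n):
    """Mean of |x[k+i] - x[k+m+i]| for i = 0 .. n-1."""
    return sum(abs(x[k + i] - x[k + m + i]) for i in range(n)) / n

# A signal with period 4: the auto-correlation peaks and the AMDF dips at lag 4.
signal = [0, 1, 0, -1] * 4
```

For a periodic signal the auto-correlation has peaks, and the AMDF has valleys, at lags that are multiples of the period.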

• Center-clipping technique
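
Center clipping removes low-amplitude detail before auto-correlation so that the pitch peaks stand out. A minimal sketch, assuming the common form in which samples within ±C are set to zero and the rest are shifted toward zero by C. Setting C to 30% of the frame's peak amplitude is an illustrative choice, not a value from the slides.

```python
def center_clip(frame, ratio=0.3):
    """Zero out samples within +/-C and shift the rest toward zero by C."""
    if not frame:
        return []
    threshold = ratio * max(abs(s) for s in frame)  # clipping level C
    clipped = []
    for s in frame:
        if s > threshold:
            clipped.append(s - threshold)
        elif s < -threshold:
            clipped.append(s + threshold)
        else:
            clipped.append(0.0)
    return clipped
```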

• Auto-correlation peaks

• Auto-correlation display

• Formant
  – LPC -> FFT

• Formant displays

• Some typical formant values

• Pitch, fundamental frequency
  – Referred to as F0; determines tone and prosody
  – Pitch estimation methods
    • Auto-correlation and AMDF
    • Cepstrum
    • LPC
    • Peak detection
  – Pitch smoothing methods
    • Dynamic programming
    • N-point smoothing filter
    • HMM

• Pitch display
  – The pitch of a3 by the auto-correlation method
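
The auto-correlation method can be sketched as follows: compute the auto-correlation of a frame over the lags that correspond to plausible pitch values, take the lag with the largest value, and convert it to a frequency. The 50-500 Hz search range and the function names are assumptions made for this example, and the test tone is synthetic, not the recording shown on the slide.

```python
import math

def frame_autocorr(frame, lag):
    """Auto-correlation of one frame at the given lag, summed over available samples."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))

def estimate_f0(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Return the F0 estimate (Hz) from the strongest auto-correlation peak."""
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: frame_autocorr(frame, lag))
    return sample_rate / best_lag

# Synthetic 200 Hz tone sampled at 8 kHz (one period = 40 samples).
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
f0 = estimate_f0(tone, fs)
```

Real speech usually needs extra care (voicing decisions, center clipping, smoothing across frames) because the raw peak can jump to a multiple of the true period.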

• Spectrogram
  – Representation of a signal highlighting several of its properties, based on short-time Fourier analysis
  – Two-dimensional: time on the horizontal axis and frequency on the vertical axis
  – Third 'dimension': gray or color level indicating energy

• Spectrum of a frame (vowel)

• Spectrum of a frame (consonant)

• Cepstrum analysis

• Cepstrum and MFCC computation
  – Cepstrum: s(n) -> DFT -> log|DFT| -> IDFT -> cepstrum
  – MFCC branch: Filter-bank -> DCT -> MFCC
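
The cepstrum path (DFT, log magnitude, inverse DFT) can be sketched with the standard library alone. A naive O(N²) DFT keeps the example self-contained. The small floor inside the logarithm is an assumption added to avoid log(0), and the function names are illustrative.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum of a frame: IDFT(log|DFT(frame)|)."""
    log_magnitude = [math.log(max(abs(v), floor)) for v in dft(frame)]
    return [v.real for v in dft(log_magnitude, inverse=True)]
```

In practice an FFT routine would replace the naive `dft`, and the frame would be windowed first.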

• Filter-bank
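
For MFCC, the filter-bank is usually a set of triangular filters spaced evenly on the mel scale. The sketch below assumes one common mel convention, mel = 2595·log10(1 + f/700); the slides do not say which formula or how many filters were used, so these, together with the function names, are illustrative assumptions.

```python
import math

def hz_to_mel(f):
    """Convert frequency in Hz to mel (2595*log10(1 + f/700) convention)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(n_filters, f_low, f_high):
    """Return n_filters + 2 edge frequencies (Hz), evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]

def triangle_weight(f, left, center, right):
    """Weight of frequency f in a triangular filter with the given edges."""
    if f <= left or f >= right:
        return 0.0
    if f <= center:
        return (f - left) / (center - left)
    return (right - f) / (right - center)
```

Filter i uses edges i, i+1 and i+2 as its left, center and right frequencies, so the filters are narrow at low frequencies and wide at high frequencies.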

• Perceptual measures

• Linear predictive analysis
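
In linear prediction, each sample is approximated by a weighted sum of the p previous samples. One standard way to obtain the weights is the Levinson-Durbin recursion applied to the frame's auto-correlation values. The sketch below assumes the sign convention x̂(n) = a_1·x(n-1) + ... + a_p·x(n-p) and r[0] > 0; the slides do not state which solver they use, so this is an illustration, not the author's method.

```python
def levinson_durbin(r, order):
    """Solve for LP coefficients a_1..a_order from auto-correlation r[0..order].

    Returns (coefficients, residual_energy) for the predictor
    x_hat(n) = sum_k a_k * x(n - k).
    """
    a = [0.0] * (order + 1)      # a[0] is unused
    error = r[0]                 # prediction error energy of the order-0 model
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error          # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= (1.0 - k * k)
    return a[1:], error
```

The returned residual energy is the energy of the prediction error for the final model order.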

• Prediction errors

• LP coefficients to cepstral coefficients
  – The computation of LPCC
  – LPCC is often used in ASR as a feature vector
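
LPCC can be computed from the LP coefficients with a simple recursion, without going through a Fourier transform. The sketch below assumes the all-pole model 1 / (1 - Σ a_k z^-k), that is, the predictor convention x̂(n) = Σ a_k·x(n-k), and omits the gain term c_0. The function name is illustrative.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LP coefficients a_1..a_p (a[k-1] holds a_k) to c_1..c_n_ceps.

    Recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k},
    with a_n taken as 0 for n > p.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused (gain term omitted)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For a first-order model with coefficient a, the recursion gives c_n = a^n / n, which matches the series expansion of -ln(1 - a·z^-1).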

• Some transformations in SSP
  – DFT, FFT, DCT and their inverses
    • Frequency analysis
    • TD-FD conversion
  – Z transformation
    • LPC analysis
    • Filter design
  – Wavelet transformation
    • Frequency analysis
    • Compression

• Fourier Transform

• Discrete Fourier Transform

The computational load of the DFT is O(N^2); the Fast Fourier Transform (FFT) reduces it to O(N log N) using the divide-and-conquer principle.
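
The divide-and-conquer idea can be illustrated with a recursive radix-2 FFT next to the direct O(N^2) DFT. This is a teaching sketch that assumes the input length is a power of two; it is not an optimized implementation, and the function names are chosen for this example.

```python
import cmath

def naive_dft(x):
    """Direct DFT: N output bins, each a sum over N samples, so O(N^2)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])   # DFT of even-indexed samples
    odd = fft(x[1::2])    # DFT of odd-indexed samples
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddled[k] for k in range(n // 2)] +
            [even[k] - twiddled[k] for k in range(n // 2)])
```

Each level of recursion does O(N) work and there are log2(N) levels, which is where the O(N log N) cost comes from.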

• Basic phonetic knowledge
  – Consonant/unvoiced
  – Vowel/voiced
  – Co-articulation
  – Phone and phoneme
  – Uni-, bi-, tri-phone
  – Canonical form, surface form, reduced form
  – Tone and prosody

• Co-articulation
  – Very common in English; it causes many difficulties in ASR
  – In Mandarin, not very serious
  – The use of bi-phones and tri-phones is intended to cope with this issue
  – Some examples:
    • Mandarin: A yi, yi yi, wu yun, …
    • English: this issue, in a box, …

• Some research topics
  – Speech signal detection, endpoint detection
  – Consonant/vowel separation
  – Pitch estimation
  – Echo cancellation
  – De-noising and filter design
  – Multi-signal separation
  – Robust features
  – Perceptual features
  – Re-sampling and re-construction
  – etc.

References
• Speech & Language Processing
  – Jurafsky & Martin, Prentice Hall, 2000
• Spoken Language Processing
  – X. D. Huang et al., Prentice Hall, Inc., 2000
• Statistical Methods for Speech Recognition
  – Jelinek, MIT Press, 1999
• Foundations of Statistical Natural Language Processing
  – Manning & Schutze, MIT Press, 1999
• Fundamentals of Speech Recognition
  – L. R. Rabiner and B. H. Juang, Prentice-Hall, 1993
• Dr. J. Picone, Speech Website
  – www.isip.msstate.edu

• Mode
  – A final 4-page report or
  – A 30-min presentation

• Content
  – Review of speech processing
  – Speech features and processing approaches
  – Review of TTS or ASR
  – Audio in computer engineering

THANKS
