COLEA : A MATLAB Tool for Speech Analysis

A MATLAB software tool for speech analysis

About COLEA

Installation Instructions

Getting started & Guided Tour

Buttons in the MAIN COLEA WINDOW

PULL-DOWN MENUS

REFERENCES

CONCLUSION

• COLEA was originally developed in MATLAB 5.x and is actually a subset of a COchLEAr Implants Toolbox.

• It does not exploit the new features of MATLAB 7.x.

System Requirements

• IBM-compatible PC running Windows 95 (we used Windows XP and Windows 7/8)

• MATLAB ver. 5.x and MATLAB’s Signal Processing Toolbox (we currently use 7.10.x)

• Sound card (any sound card that runs in Windows, e.g., SoundBlaster)

• 700 Kbytes of disk space (we have gigabytes of free disk space)

Installation Steps

• Download from http://www.utdallas.edu/~loizou/speech/colea.html

• PC/Windows: After downloading the file ‘colea.zip’ to your PC, create a new directory/folder and unzip the file in that directory.

• Unix: After downloading the file ‘colea.tar’, type tar xvf colea.tar to un-tar the file. This will automatically create a new directory called ‘colea’.

After extracting the files, note that COLEA can read several file formats and identifies the format from the file extension:

.WAV : Microsoft Windows audio files

.WAV : NIST’s SPHERE format - new TIMIT format

.ILS

.ADF : CSRE software package format

.ADC : old TIMIT database format

.VOC : Creative Labs’ format

The file extension is very important because each file format has different header information. COLEA determines the file’s sampling frequency, the number of samples, etc., by reading the header.

The following steps illustrate some of COLEA’s features:

Start MATLAB.

Open the colea.m file.

Run this file.

Click Change Folder if MATLAB asks.

Select the had.ils file (from the folder where COLEA was extracted).

Click on the waveform.

This spectrum was obtained by performing a 12-pole LPC analysis on the 10-msec speech segment.

When you click anywhere on the waveform using the left mouse button, the program takes a 10-msec window of the speech segment immediately after the cursor line and performs LPC analysis.

You may change the size of the window using the Duration pull-down option shown in the controls window.

Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model.

It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate, and it provides highly accurate estimates of speech parameters.

IDEA: The basic idea behind linear predictive analysis is that a speech sample at the current time can be approximated as a linear combination of past speech samples.
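A minimal sketch of this idea, written in plain Python for illustration rather than taken from COLEA's MATLAB source: the autocorrelation method followed by the Levinson-Durbin recursion. The function names `autocorrelation` and `lpc` are hypothetical, and the choice of the autocorrelation method (rather than, say, the covariance method) is an assumption.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the autocorrelation of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion on the frame's autocorrelation.

    Returns (a, err) with a[0] == 1. The predicted sample is
    x_hat[n] = -(a[1]*x[n-1] + ... + a[order]*x[n-order]),
    and err is the remaining prediction-error energy.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


# Example: a decaying exponential obeys x[n] = 0.9 * x[n-1],
# so a first-order predictor should recover a[1] close to -0.9.
frame = [0.9 ** n for n in range(200)]
coeffs, residual = lpc(frame, order=2)
```

For a 12-pole analysis like the one described earlier, `order` would be 12 and `frame` would hold the samples of one 10-msec window.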

LPC order

FFT Spectrum

FFT size : selects the size of the FFT

Overlay : overlays the FFT spectrum on top of the LPC spectrum

Among other things, the controls window (Figure 2, Controls) displays estimates of the formant frequencies and formant amplitudes (in dB).

The formant frequencies are computed by peak-picking the LPC spectrum. To get accurate estimates of the formant frequencies, one needs to choose the LPC order properly depending on the sampling frequency.

Increasing the LPC order to 18 will yield a better estimate of the second and third formants.
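Peak-picking the LPC spectrum can be sketched as follows, again in plain Python rather than COLEA's own code. The sketch assumes `a` holds the coefficients of the prediction-error polynomial A(z) = 1 + a1 z^-1 + ... + ap z^-p; the function names and the 512-point frequency grid are hypothetical choices.

```python
import cmath
import math


def lpc_spectrum_db(a, fs, n_points=512):
    """Evaluate 20*log10(1/|A(e^{jw})|) at n_points frequencies in [0, fs/2)."""
    freqs, mags = [], []
    for k in range(n_points):
        f = k * (fs / 2.0) / n_points
        w = 2.0 * math.pi * f / fs
        A = sum(a[m] * cmath.exp(-1j * w * m) for m in range(len(a)))
        freqs.append(f)
        mags.append(-20.0 * math.log10(abs(A)))
    return freqs, mags


def pick_peaks(freqs, mags):
    """Return (frequency_Hz, level_dB) for every local maximum of the spectrum."""
    return [(freqs[k], mags[k]) for k in range(1, len(mags) - 1)
            if mags[k - 1] < mags[k] >= mags[k + 1]]


# Example: a single resonance (pole pair) placed near 1000 Hz at fs = 8000 Hz.
radius, angle = 0.95, 2.0 * math.pi * 1000.0 / 8000.0
a = [1.0, -2.0 * radius * math.cos(angle), radius * radius]
freqs, mags = lpc_spectrum_db(a, fs=8000)
formants = pick_peaks(freqs, mags)
```

With too low an LPC order, neighbouring formants can merge into one broad peak and the picker returns fewer candidates, which is one way to see why the order has to match the sampling frequency.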

There are four pull-down menus in the LPC spectrum window:

Print | Save | Label | Options

The Label menu is used for adding text or legends to the figure or deleting existing text in the figure.

Options menu : Set Frequency Range

This sub-menu is used for setting the frequency range.

Options menu : LPC analysis

This sub-menu is for setting a few options in LPC analysis as well as FFT analysis, such as whether or not to use a pre-emphasis FIR filter.
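A pre-emphasis FIR filter is commonly a first-order difference. The Python sketch below illustrates that general form; the coefficient 0.95 is a typical textbook value assumed here, not a value confirmed for COLEA, and the function name is hypothetical.

```python
def pre_emphasis(signal, coeff=0.95):
    """First-order FIR filter y[n] = x[n] - coeff * x[n-1], taking x[-1] = 0."""
    out = []
    prev = 0.0
    for x in signal:
        out.append(x - coeff * prev)
        prev = x
    return out


# A constant (pure low-frequency) input is strongly attenuated after the first sample.
flattened = pre_emphasis([1.0, 1.0, 1.0, 1.0])
```

Because the filter boosts high frequencies relative to low ones, it offsets the downward spectral tilt of voiced speech before LPC or FFT analysis.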

Zoom In (selected region) & Zoom Out

Play: All & Sel (plays the selected interval)

This tool is used for comparing two waveforms or two frames using either time-domain measures (e.g., SNR) or spectral-domain measures (e.g., the Itakura-Saito measure).

To use this tool, you first need to load two waveforms, where the top is the approximated waveform and the bottom is the original waveform.

The user has the option of making an overall (global) comparison or a segmental (local) comparison between the two waveforms.

Overall : The two speech files are segmented into 10-msec frames and the comparison is performed for each frame.

At Cursor : Compares two particular speech segments of the two files.

The following distance measures are used:

SNR : Signal-to-noise ratio

CEP : Cepstrum

WCEP : Weighted cepstrum (by a ramp)

IS : Itakura-Saito

LR : Likelihood ratio

LLR : Log-likelihood ratio

WLR : Weighted likelihood ratio

WSM : Weighted slope distance metric (Klatt's)
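As an illustration of the simplest measure in this list, the Python sketch below computes an overall SNR and a frame-by-frame (segmental) SNR over 10-msec frames. It is a sketch under stated assumptions, not COLEA's implementation: the function names are hypothetical, frames do not overlap, and the per-frame values are combined by a plain average of the finite ones.

```python
import math


def snr_db(original, approx):
    """SNR in dB: energy of the original over energy of the error."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, approx))
    if noise == 0.0:
        return float("inf")   # identical segments
    if signal == 0.0:
        return float("-inf")  # silent original, non-zero error
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(original, approx, fs, frame_ms=10.0):
    """Average of per-frame SNRs over non-overlapping frames of frame_ms."""
    n = int(round(fs * frame_ms / 1000.0))
    count = min(len(original), len(approx)) // n
    values = [snr_db(original[i * n:(i + 1) * n], approx[i * n:(i + 1) * n])
              for i in range(count)]
    finite = [v for v in values if math.isfinite(v)]
    return sum(finite) / len(finite) if finite else float("nan")


# Example: a 10 % amplitude error on every sample gives an SNR of about 20 dB.
original = [1.0] * 80
approx = [0.9] * 80
overall = snr_db(original, approx)
segmental = segmental_snr_db(original, approx, fs=8000)
```

The spectral measures (CEP, IS, LLR and so on) follow the same frame-by-frame pattern but compare spectra or LPC models instead of raw samples.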

This tool is used for adjusting the volume.

There are three different modes:

Autoscale (default) : The signal is automatically scaled to the maximum value allowed by the hardware. In this mode, you cannot use the slider bar.

No scale : In this mode, the signal can be made louder or softer by moving the slider bar.

Absolute : In this mode, the signal is played as is. No scaling is done. Moving the slider bar has no effect.

Dual time-waveform and spectrogram displays

Records speech directly into MATLAB (new)

Displays time-aligned phonetic transcriptions

Manual segmentation of speech waveforms - creates label files which can be used to train speech recognition systems

Waveform editing - cutting, copying or pasting speech segments

Formant analysis - displays formant tracks of F1, F2 and F3

Pitch analysis

Filter tool - filters speech signal at cut-off frequencies specified by the user

Comparison tool - compares two waveforms using several spectral distance measures

Speech degradation - adds noise to the speech signal at an SNR specified by the user

L. Rabiner and R. Schafer, Digital Processing of Speech Signals, Englewood Cliffs: Prentice Hall, 1978.

A. Noll, “Cepstrum pitch determination,” J. Acoust. Soc. Am., vol. 41, pp. 293-309, February 1967.

J.D. Markel and A.H. Gray, Jr., Linear Prediction of Speech, Berlin: Springer-Verlag, 1976.

A.H. Gray and J.D. Markel, “Distance measures for speech processing,” IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-24(5), pp. 380-391, October 1976.

L. Rabiner and B-H. Juang, Fundamentals of Speech Recognition, Englewood Cliffs: Prentice Hall, 1993.

D. Klatt, “Prediction of perceived phonetic distance from critical band spectra: A first step,” Proc. ICASSP, pp. 1278-1281, 1982.

With the COLEA tool, it is easy to analyze and compare speech signals in both the time domain and the frequency domain, and to extract speech parameters accurately.

• Pre-emphasis Filtering: A pre-emphasis filter compresses the dynamic range of the speech signal’s power spectrum by flattening the spectral tilt.

• Power Spectral Density: This option displays an estimate of the power spectral density (long-time average FFT spectrum) obtained using Welch’s method.

• Energy plot: This option displays the energy contour, computed over 20-msec intervals and expressed in dB.

• Convert to SCN noise: This option converts the speech signal to Signal Correlated Noise (SCN) using a method proposed by Schroeder. This method preserves the shape of the time waveform but destroys the spectral content of the signal.
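Two of the options above are simple enough to sketch in a few lines of Python. The energy contour below uses non-overlapping 20-msec frames, which is an assumption about the framing. The SCN function assumes that Schroeder's method amounts to flipping the sign of each sample at random with probability 0.5, which is how it is commonly described; whether COLEA does exactly this is not confirmed here. Both function names are hypothetical.

```python
import math
import random


def energy_contour_db(signal, fs, frame_ms=20.0, floor=1e-12):
    """Energy in dB of consecutive non-overlapping frames of frame_ms."""
    n = int(round(fs * frame_ms / 1000.0))
    contour = []
    for start in range(0, len(signal) - n + 1, n):
        frame = signal[start:start + n]
        energy = sum(x * x for x in frame)
        contour.append(10.0 * math.log10(max(energy, floor)))  # floor avoids log(0)
    return contour


def signal_correlated_noise(signal, seed=0):
    """Randomly flip the sign of each sample (assumed form of Schroeder's method)."""
    rng = random.Random(seed)
    return [x if rng.random() < 0.5 else -x for x in signal]


# Example: the second frame has one tenth the amplitude, so it sits 20 dB lower.
contour = energy_contour_db([1.0] * 20 + [0.1] * 20, fs=1000)
noise = signal_correlated_noise([0.5, -0.25, 0.75, 1.0], seed=1)
```

Sign flipping leaves every sample's magnitude untouched, so the envelope of the waveform survives while the sample-to-sample correlation that carries the spectral shape is lost.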

The Weighted Likelihood Ratio (WLR) was first proposed in 1984 by Sugiyama [2] as a distortion measure for comparing two speech spectra. It places more emphasis on the peaks of the spectrum. This is consistent with human perception and with the fact that the spectral peaks (formants) play a more important role in recognition. Notably, the peaks are also much less corrupted by noise. WLR has been used successfully for vowel classification and for isolated word recognition based on dynamic programming (DP).

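• The Itakura–Saito distance is a measure of the perceptual difference between an original spectrum and an approximation of that spectrum. It was proposed by Fumitada Itakura and Shuzo Saito in the 1960s while they were with NTT.

• The distance is defined as:

$$D_{IS}\bigl(P(\omega), \hat{P}(\omega)\bigr) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left[ \frac{P(\omega)}{\hat{P}(\omega)} - \log \frac{P(\omega)}{\hat{P}(\omega)} - 1 \right] d\omega$$

where P(ω) is the original spectrum and P̂(ω) is its approximation.

• The Itakura–Saito distance is a Bregman divergence, but is not a true metric since it is not symmetric.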
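On sampled power spectra, the Itakura–Saito distance is usually approximated by averaging the term P/P̂ - log(P/P̂) - 1 over uniformly spaced frequency bins. The Python sketch below shows that discretization and the lack of symmetry; the function name and the example spectra are hypothetical.

```python
import math


def itakura_saito(p, p_hat):
    """Mean over frequency bins of P/P_hat - log(P/P_hat) - 1.

    p and p_hat are strictly positive power spectra sampled on the same
    uniform frequency grid; p is the original, p_hat the approximation.
    """
    if len(p) != len(p_hat):
        raise ValueError("spectra must have the same number of bins")
    total = 0.0
    for a, b in zip(p, p_hat):
        ratio = a / b
        total += ratio - math.log(ratio) - 1.0
    return total / len(p)


# Example: swapping the arguments changes the result, so this is not a metric.
original = [1.0, 1.0, 4.0]
approximation = [2.0, 2.0, 2.0]
forward = itakura_saito(original, approximation)
backward = itakura_saito(approximation, original)
```

Each term is zero only when the two spectra agree in that bin and positive otherwise, so the distance is zero exactly when the spectra are identical.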

• The Itakura–Saito distance

• Traditional speech information hiding methods have several disadvantages, for example constant embedding amplitude, lower speech quality, and higher bit error rate. A novel speech information hiding method based on the Itakura-Saito measure and a psychoacoustic model has been proposed. The embedding amplitude is controlled jointly by the Itakura-Saito measure and the psychoacoustic model. The host speech is decomposed by wavelet packet transformation and then mapped into the critical bands. The embedding amplitude in each subband is determined according to the audio masking threshold. Adjustment factors are then calculated with the Itakura-Saito measure to control the embedding amplitude in each frame so that the speech quality remains good. The embedding amplitude is thus determined automatically. Experimental results show that this method performs better than the traditional methods.

• WSM - Weighted slope distance metric (Klatt's) [6]. This measure is reported to give the highest recognition accuracy.

• The overall distortion is obtained by averaging the spectral distortion over all frames in an utterance.

• A cepstrum is the result of taking the inverse Fourier transform (IFT) of the logarithm of the estimated spectrum of a signal. There is a complex cepstrum, a real cepstrum, a power cepstrum, and a phase cepstrum. The power cepstrum in particular finds applications in the analysis of human speech.
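A real cepstrum can be sketched with a direct DFT in plain Python. The sketch uses the inverse DFT of the log magnitude spectrum; because that log spectrum is real and even, the forward transform would give the same values up to a scale factor. The O(N²) DFT is chosen only to keep the example dependency-free, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of one frame."""
    n = len(frame)
    log_mag = [math.log(max(abs(v), floor)) for v in dft(frame)]  # floor avoids log(0)
    # log_mag is real and even for a real frame, so the result is real.
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


# Example: an impulse of height e has a flat log spectrum equal to 1,
# so only the zeroth cepstral coefficient is non-zero.
cepstrum = real_cepstrum([math.e, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

A cepstral distance such as CEP then compares the low-order coefficients of two frames, typically with a squared Euclidean distance.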

• A weighted cepstral distance measure has been proposed and tested in a speaker-independent isolated word recognition system using standard DTW (dynamic time warping) techniques. The measure is a statistically weighted distance measure with weights equal to the inverse variance of the cepstral coefficients.

• The most significant performance characteristic of the weighted cepstral distance was that it tended to equalize the performance of the recognizer across different talkers.
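The inverse-variance weighting can be sketched as below in Python. The variances are estimated from a set of training cepstral vectors, which is an assumption about where the statistics come from, and both function names are hypothetical.

```python
def inverse_variance_weights(training_vectors):
    """One weight per cepstral index: 1 / variance across the training vectors."""
    count = len(training_vectors)
    dim = len(training_vectors[0])
    weights = []
    for i in range(dim):
        column = [v[i] for v in training_vectors]
        mean = sum(column) / count
        var = sum((c - mean) ** 2 for c in column) / count
        weights.append(1.0 / var if var > 0.0 else 0.0)  # ignore constant coefficients
    return weights


def weighted_cepstral_distance(c1, c2, weights):
    """Sum over coefficients of weight * squared difference."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, c1, c2))


# Example: the second coefficient varies more, so its differences count less.
training = [[0.0, 0.0], [2.0, 4.0]]
weights = inverse_variance_weights(training)
distance = weighted_cepstral_distance([0.0, 0.0], [2.0, 4.0], weights)
```

Dividing by the variance puts every coefficient on a comparable scale, so high-variance low-order coefficients do not dominate the distance.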

By minimizing the sum of squared differences (over a finite interval) between the actual speech samples and the linearly predicted values, a unique set of parameters, the predictor coefficients, can be determined. These coefficients form the basis for linear predictive analysis of speech.
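The minimization can be written out as follows. The symbols are assumptions introduced for illustration: s(n) is the speech sample, p the prediction order, and α_k the predictor coefficients.

```latex
\hat{s}(n) = \sum_{k=1}^{p} \alpha_k \, s(n-k),
\qquad
E = \sum_{n} \bigl[ s(n) - \hat{s}(n) \bigr]^2

\frac{\partial E}{\partial \alpha_i} = 0
\;\Longrightarrow\;
\sum_{k=1}^{p} \alpha_k \sum_{n} s(n-k)\, s(n-i) = \sum_{n} s(n)\, s(n-i),
\qquad i = 1, \dots, p
```

These p linear equations in the p unknowns α_k are the normal equations; the sums over n run over the finite analysis interval.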

In reality, the actual predictor coefficients are never used in recognition, since they typically show high variance. The predictor coefficients are transformed to a more robust set of parameters known as cepstral coefficients.