Signal Processing For Speech Applications - Part 1 (csl.anthropomatik.kit.edu/downloads/...)

May 7, 2013




References:

• Huang et al., chapter on DSP

• Classical paper: Schafer/Rabiner, in Waibel/Lee (available on the web)

• Nahin: "Dr. Euler's Fabulous Formula", an excellent explanation of Fourier sums and the Fourier transform, written for engineering students

Note: many slides of this lecture are from Rich Stern.


Signal Processing For Speech Applications

• Major speech technologies:

– Speech coding

– Speech synthesis

– Speech recognition


Goals Of Speech Representations

• Capture important phonetic information in speech

• Computational efficiency

• Efficiency in storage requirements

• Optimize generalization


Representation of Speech

Definition: Digital representation of speech

Represent speech as a sequence of numbers

(as a prerequisite for automatic processing using computers)

1) Direct representation of the speech waveform:

represent the speech waveform as accurately as possible so that the acoustic signal can be reconstructed

2) Parametric representation:

represent a set of properties/parameters with regard to a certain model

Decide the targeted application first:

• Speech coding

• Speech synthesis

• Speech recognition

Classical paper: Schafer/Rabiner in Waibel/Lee (paper online)


Speech Coding

Objectives of Speech Coding:

– Quality versus bit rate

– Quantization noise

– High measured intelligibility

– Low bit rate (bits per second of speech)

– Low computational requirement

– Robustness to transmission errors

– Robustness to successive encode/decode cycles

Objectives for real-time operation:

– Low coding/decoding delay

– Work with non-speech signals (e.g. touch tones)


Speech Synthesis

The standard approach:

Speech Synthesis = Text-to-Speech conversion (don't confuse this with Speech Recognition)


Speech Recognition

• Major functional components:

– Signal processing to extract relevant features from speech waveforms

– Comparison of features to pre-stored templates

• Important design choices:

– Choice of features (optimize generalization)

– Specific method of comparing features to stored templates

[Figure: Speech -> A/D -> Feature extraction -> speech features -> Decision making procedure -> Hypotheses (phonemes)]


Goals Of This Lecture

How do we accomplish feature extraction for speech recognition?

Some specific topics:

– Quantization (A/D Conversion)

– Sampling

– Filter Bank Coefficients

– Linear predictive coding (LPC)

– LPC-derived cepstral coefficients (LPCC)

– Mel-frequency cepstral coefficients (MFCC)

Some of the underlying mathematics

– Continuous-time Fourier transform (CTFT)

– Discrete-time Fourier transform (DTFT)

– Z-transforms


Overview (I)

• Introduction

– A note on oscillations

• Frequency transformation

– Why Perform Signal Processing?

• Different Kinds of Signals

– Quantization

• Quantization of Signals

• Quantization of Speech Signals

– Sampling of continuous-time signals

• How Frequently Should we Sample? – Aliasing

• The Sampling Theorem

• Acoustic Features in the Sampled Signal


Overview (II)

• Frequency Representations in Continuous and Discrete Time (I)

– Why Perform Signal Processing In The Frequency Domain?

– The Source-Filter Model For Speech

– Frequency Representation

– Fourier Sums

– Continuous-time Fourier Transform (CTFT)

• Features of the Fourier Transform

– Discrete Time Fourier Transform (DTFT)

– Convolution

• Convolution of a Function with an Impulse Train


Overview (III)

• Frequency Representations in Continuous and Discrete Time (II)

– The Fourier Transform – Example

– Examples of CTFTs and DTFTs

– Correspondence Between Frequency In Discrete And Continuous Time

– Short-Term Spectral Analysis

– Short-time Fourier Analysis

– Windowing

• Two Popular Window Shapes

• The Effect of Applying Windows

• Breaking Up Incoming Speech Into Frames Using Hamming Windows

• Time and Frequency Response of Windows

• Speech Visualisation with Spectrograms

• Effect of Window Duration

• Using Filterbanks

– The Mel Scale

– The Bark Scale


Overview

• Introduction

– A note on oscillations

• Frequency transformation

– Why Perform Signal Processing?

• Different Kinds of Signals

– Quantization

• Quantization of Signals

• Quantization of Speech Signals

– Sampling of continuous-time signals

• How Frequently Should we Sample? – Aliasing

• The Sampling Theorem

• Acoustic Features in the Sampled Signal


A note on oscillations

• Very generally, frequency means "the number of occurrences of a repeating event per unit time" (Wikipedia (en), Frequency)

• Right-hand side: impulse trains with different frequencies (frequency increases from top to bottom)

• More specifically, if a signal contains a frequency, this means that it contains a pure oscillation

• We need both meanings


Frequency components

• Let's look at the following graphic

• Left: two sine waves. These are pure oscillations with only one frequency each.

• Right-hand side: visualization of the contained frequencies (for real signals this must be symmetric w.r.t. the vertical axis)

• Below: a sum of two oscillations -> its frequency content is also the sum of the contained frequencies
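The claim that a sum of two oscillations contains the sum of their frequencies can be checked numerically. A minimal sketch, assuming NumPy is available: it uses the FFT, a sampled version of the Fourier transform discussed later, with made-up values for the sampling rate and the two frequencies; the helper name `dominant_frequencies` is hypothetical.

```python
import numpy as np

def dominant_frequencies(x, fs, k):
    """Return the k frequencies (in Hz) with the largest magnitude in the spectrum of x."""
    spectrum = np.abs(np.fft.rfft(x))            # one-sided magnitude spectrum (x is real)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # frequency of each bin in Hz
    top = np.argsort(spectrum)[-k:]              # indices of the k largest bins
    return sorted(freqs[top].tolist())

fs = 1000                                # samples per second (assumed)
t = np.arange(fs) / fs                   # one second of sampling instants
a = np.sin(2 * np.pi * 50 * t)           # pure oscillation at 50 Hz
b = 0.5 * np.sin(2 * np.pi * 120 * t)    # pure oscillation at 120 Hz

print(dominant_frequencies(a, fs, 1))      # [50.0]
print(dominant_frequencies(a + b, fs, 2))  # [50.0, 120.0]
```

`np.fft.rfft` returns only the non-negative frequencies: for a real signal the negative half mirrors the positive half, which is the symmetry mentioned above.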


The complex oscillation

• We have learnt that the sine wave f(t) = sin(ωt) and, of course, the cosine are pure (real) oscillations.

• They can be combined to form a complex oscillation

$$f(t) = e^{i\omega t} = \cos(\omega t) + i \sin(\omega t),$$

where i is the imaginary unit.

• Visualization: as ωt runs from 0 to 2π, f(t) moves once around the unit circle. The real and imaginary parts are the real oscillations.

• We see that sin and cos are "complementary": $\sin^2(\omega t) + \cos^2(\omega t) = 1$.

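Euler's formula and the unit-circle picture can be verified numerically with the standard library. A minimal sketch; the angular frequency (3 Hz) and the sample instants are arbitrary choices, and `complex_oscillation` is a hypothetical helper name.

```python
import cmath
import math

def complex_oscillation(omega, t):
    """f(t) = e^{i omega t}, the complex oscillation."""
    return cmath.exp(1j * omega * t)

omega = 2 * math.pi * 3.0          # angular frequency in rad/s (assumed: 3 Hz)
for t in (0.0, 0.05, 0.1, 0.25):
    z = complex_oscillation(omega, t)
    # Euler's formula: real part is cos(omega t), imaginary part is sin(omega t)
    assert math.isclose(z.real, math.cos(omega * t), abs_tol=1e-12)
    assert math.isclose(z.imag, math.sin(omega * t), abs_tol=1e-12)
    # |z|^2 = cos^2 + sin^2 = 1, so f(t) stays on the unit circle
    assert math.isclose(abs(z), 1.0, abs_tol=1e-12)
```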

How to measure frequency components?

• Let's assume a complicated signal.

• Which frequencies does it contain?

• We measure the frequency components of a function f(t) with the integral

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt$$

• F(ω) is the Fourier transform of f(t). Does everybody recognize the complex oscillation?
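The integral can be approximated numerically by a Riemann sum over a long but finite interval. A minimal sketch, assuming NumPy; the test function f(t) = e^{-t²} is chosen because its transform has the known closed form √π·e^{-ω²/4}, and the interval length and grid size are assumptions.

```python
import numpy as np

def fourier_transform(f, omega, t_max=20.0, n=200001):
    """Approximate F(omega) = integral of f(t) e^{-i omega t} dt by a Riemann sum on [-t_max, t_max]."""
    t = np.linspace(-t_max, t_max, n)
    dt = t[1] - t[0]
    return np.sum(f(t) * np.exp(-1j * omega * t)) * dt

gaussian = lambda t: np.exp(-t ** 2)      # decays fast, so truncating the integral is harmless
for omega in (0.0, 1.0, 3.0):
    approx = fourier_transform(gaussian, omega)
    exact = np.sqrt(np.pi) * np.exp(-omega ** 2 / 4)   # closed-form transform of the Gaussian
    print(omega, approx.real, exact)
```

For signals that do not decay, such as pure oscillations, the integral does not converge in this simple form; that is one reason the lecture later moves to windowed, discrete transforms.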


Example

• Let's look at one example

• Here's a complicated signal which looks rather ugly: the signal is heavily disturbed by some kind of "noise".

• We take the Fourier transform, visualize it, and notice that the noise has very specific frequencies!

• We will get to know other applications of frequencies and the Fourier transformation.
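Because the disturbance in such an example occupies only a few specific frequencies, it can be removed in the frequency domain by zeroing those bins and transforming back. A minimal sketch, assuming NumPy; the 5 Hz "useful" signal, the 60 Hz disturbance and the helper name `remove_frequency` are made up for illustration, and a practical system would use a properly designed notch filter rather than zeroing FFT bins.

```python
import numpy as np

def remove_frequency(x, fs, f_noise, width_hz=2.0):
    """Zero all FFT bins within width_hz of f_noise and return the filtered signal."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[np.abs(freqs - f_noise) <= width_hz] = 0.0
    return np.fft.irfft(X, n=len(x))

fs = 1000
t = np.arange(2 * fs) / fs                         # two seconds
clean = np.sin(2 * np.pi * 5 * t)                  # slow "useful" signal at 5 Hz
noisy = clean + 0.8 * np.sin(2 * np.pi * 60 * t)   # strong disturbance at 60 Hz
restored = remove_frequency(noisy, fs, 60.0)
# close to zero here because both tones fall exactly on FFT bins
print(np.max(np.abs(restored - clean)))
```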


A digital signal representation

• A look at the time-domain waveform of “six”:

How do we get this into a computer?


Sampling & Quantization

Goal: Given a signal that is continuous in time and amplitude,

find a discrete representation.

Two steps are necessary: sampling and quantization.

• Quantization corresponds to a discretization of the y-axis

• Sampling corresponds to a discretization of the x-axis


Quantization of Signals

• Given a discrete signal f[i] to be quantized into q[i]

• Assume that f lies between fmin and fmax

• Partition the y-axis into a fixed number n of (equally sized) intervals

• Usually $n = 2^b$; in ASR typically b = 16, n = 65536 (16-bit quantization)

• q[i] can only take values that are centers of the intervals

• Quantization: assign to q[i] the center of the interval in which f[i] lies

• Quantization makes errors, i.e. adds noise to the signal: f[i] = q[i] + e[i]

• The magnitude of the quantization error e[i] is at most (fmax − fmin)/(2n), i.e. half an interval width

• Define the signal-to-noise ratio SNR[dB] = 10 · log10(power(f[i]) / power(e[i]))
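The uniform quantizer and the SNR definition above translate directly into code, which also shows the rule of thumb of about 6 dB per bit from the next slide. A minimal sketch, assuming NumPy; the test signal is uniform noise over the full range [−1, 1] (an assumption that makes the result easy to predict), and the helper names are hypothetical.

```python
import numpy as np

def quantize(f, f_min, f_max, b):
    """Uniform quantizer with n = 2**b equal intervals; returns the interval centres."""
    n = 2 ** b
    step = (f_max - f_min) / n
    idx = np.floor((f - f_min) / step)
    idx = np.clip(idx, 0, n - 1)            # f == f_max falls into the last interval
    return f_min + (idx + 0.5) * step       # centre of the interval containing f

def snr_db(f, q):
    """SNR in dB: 10 * log10(power of signal / power of the error e = f - q)."""
    e = f - q
    return 10 * np.log10(np.mean(f ** 2) / np.mean(e ** 2))

rng = np.random.default_rng(0)
f = rng.uniform(-1.0, 1.0, 100_000)         # test signal spanning the full range
for b in (8, 10, 12, 16):
    q = quantize(f, -1.0, 1.0, b)
    print(b, round(snr_db(f, q), 1))        # grows by roughly 6 dB per extra bit
```

For a full-range uniform signal the SNR works out to about 6.02·b dB; real speech, which rarely uses the full range, gets less.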


Quantization of Speech Signals

• Choice of quantization depth:

Speech signals usually span a dynamic range of about 50 dB to 60 dB

The lower the SNR, the lower the speech recognition performance

To get a reasonable SNR, b should be at least 10 to 12

Each bit contributes about 6 dB of SNR

Typically in ASR the samples are quantized with 16 bits

• Speech signal amplitudes are not uniformly distributed

Speech amplitudes are exponentially distributed around their mean

So the information-theoretic optimum is not to partition the y-axis into equal intervals

Therefore µ-law encoding is often used:

$$f^{(\mu)}[n] = f_{\max} \cdot \operatorname{sgn}(f[n]) \cdot \frac{\log\left(1 + \mu\,|f[n]| / f_{\max}\right)}{\log(1 + \mu)}, \qquad \mu = 100, \dots, 500$$

µ-law encoding makes speech amplitudes (approximately) equally distributed before quantization
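The µ-law formula and its inverse can be sketched in a few lines, together with a small experiment comparing plain 8-bit quantization with µ-law companding. A minimal sketch, assuming NumPy; µ = 255 (the value used in G.711 telephony) is chosen from the slide's range, and modelling speech as Laplacian samples with scale 0.05 is an assumption, not real speech.

```python
import numpy as np

def mu_law(f, f_max, mu=255.0):
    """Compress f as on the slide: f_max * sgn(f) * log(1 + mu*|f|/f_max) / log(1 + mu)."""
    return f_max * np.sign(f) * np.log1p(mu * np.abs(f) / f_max) / np.log1p(mu)

def mu_law_inverse(y, f_max, mu=255.0):
    """Undo the compression (expansion), applied after decoding."""
    return f_max * np.sign(y) * np.expm1(np.abs(y) / f_max * np.log1p(mu)) / mu

def quantize_8bit(x):
    """Uniform 8-bit quantizer on [-1, 1] returning interval centres."""
    step = 2.0 / 256
    idx = np.clip(np.floor((x + 1.0) / step), 0, 255)
    return -1.0 + (idx + 0.5) * step

def snr_db(ref, approx):
    return 10 * np.log10(np.mean(ref ** 2) / np.mean((ref - approx) ** 2))

rng = np.random.default_rng(0)
# Stand-in for speech: Laplacian samples, i.e. mostly small amplitudes (assumption)
x = np.clip(rng.laplace(0.0, 0.05, 100_000), -1.0, 1.0)

direct = quantize_8bit(x)                                         # equal intervals
companded = mu_law_inverse(quantize_8bit(mu_law(x, 1.0)), 1.0)   # mu-law, then equal intervals
print(snr_db(x, direct), snr_db(x, companded))   # the mu-law version has the higher SNR here
```

Small amplitudes are stretched before quantization (0.01 of full scale maps to roughly 0.23), so they receive finer effective steps than loud ones.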


Sampling Continuous-time Signals

Sampling converts the analog signal (in the time domain) into a digital

representation (also in the time domain!)

Time domain:

• x-axis = time

• y-axis = amplitude

Questions:

• Do we lose information during sampling?

• How can we prevent it?

[Figure: original speech signal (top) and sampled version of the signal (bottom)]


How Frequently Should we Sample? – Aliasing

• The number of samples per unit of time (usually seconds) taken from a continuous signal to make a discrete signal is called the sampling rate

• The unit of the sampling rate is hertz (inverse seconds, 1/s)

• Example: correct sampling (above) / undersampling (below)

– After undersampling we cannot (uniquely) recover the original frequency!

– This effect is called aliasing.

– How can we prevent aliasing?
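Aliasing is easy to demonstrate: once sampled, a tone above half the sampling rate produces exactly the same samples as a lower tone. A minimal sketch, assuming NumPy; the 8 kHz sampling rate and the 7 kHz / 1 kHz pair are illustrative choices, and `apparent_frequency` is a hypothetical helper that folds a frequency into the range 0 to fs/2.

```python
import numpy as np

def apparent_frequency(f, fs):
    """Frequency in [0, fs/2] that a sampled cosine of frequency f appears to have (folding)."""
    return abs(f - fs * round(f / fs))

fs = 8000                                 # sampling rate in Hz (assumed)
t = np.arange(32) / fs                    # sampling instants n / fs
x_high = np.cos(2 * np.pi * 7000 * t)     # 7 kHz: above fs/2, so undersampled
x_low = np.cos(2 * np.pi * 1000 * t)      # 1 kHz: below fs/2
print(np.allclose(x_high, x_low))         # True: the samples are identical
print(apparent_frequency(7000, fs))       # 1000
```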


The Aliasing Effect

• How can we prevent aliasing?

In practical applications, the signals are band-limited: they do not contain arbitrarily high frequencies, but have a maximum frequency (let's call it fl).

The band limitation can already be ensured during the recording process by an appropriate A/D converter (with analog frequency filters).

When the sampling rate is too low, the samples can contain "incorrect" (aliased) frequencies.

Nyquist or sampling theorem:

When an fl-band-limited signal is sampled with a sampling rate of at least 2fl, the signal can be exactly reproduced from the samples.

This frequency 2fl is also called the Nyquist rate.

For a given sampling rate, the maximal frequency that can be reproduced correctly is half the sampling rate; it is called the Nyquist frequency.
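The "exact reproduction" promised by the sampling theorem is achieved by sinc interpolation (Whittaker-Shannon). A minimal sketch, assuming NumPy; the theorem uses infinitely many samples, so this version truncates the sum to a finite range (an approximation whose error shrinks as more samples are used), and the test signal, band-limited to 5 Hz and sampled at 16 Hz, is made up.

```python
import numpy as np

def x(t):
    """Test signal band-limited to f_l = 5 Hz (two sinusoids, assumed for illustration)."""
    return np.cos(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 5 * t)

fs = 16.0                       # sampling rate in Hz, above the Nyquist rate 2 * 5 Hz
n = np.arange(-4000, 4001)      # sample indices (a finite stand-in for all n)
samples = x(n / fs)

def reconstruct(t):
    """Whittaker-Shannon interpolation: x(t) = sum_n x(n/fs) * sinc(fs*t - n).
    np.sinc(u) is the normalized sinc sin(pi*u) / (pi*u)."""
    return np.sum(samples * np.sinc(fs * t - n))

for t in (0.013, 0.2, 0.37):    # instants between the sampling points
    print(t, reconstruct(t), x(t))
```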


Why Perform Signal Processing?

• Another look at the time-domain waveform of “six”

• Digitization is done: now we have a discrete representation f[n]

• It’s still hard to infer much from the time-domain waveform

• We need to extract relevant features


Acoustic Features in the Sampled Signal

• Amplitude (1) (not very useful on its own). The envelope of the amplitude is correlated with the power (= integral of the squared signal). Power is useful for detecting speech vs. silence, and also syllables, phrase boundaries and prosody

• Peak-to-peak amplitude (2) (correlated with amplitude)

• Root mean square (RMS) amplitude (3): square root of the mean of the squared signal over time (its square is the average power)

• Zero crossing rate can help distinguish some weak sounds from silence

• Voiced/unvoiced: homogeneous periodic signal (voiced) vs. white noise in fricatives (unvoiced)

Source: Wikipedia
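Power, RMS amplitude and zero crossing rate are simple enough to compute directly from the samples. A minimal sketch, assuming NumPy; a 100 Hz sine stands in for a voiced sound and white Gaussian noise for a fricative (both assumptions), and the helper names are hypothetical.

```python
import numpy as np

def power(x):
    """Average power: mean of the squared signal (discrete counterpart of the integral)."""
    return np.mean(x ** 2)

def rms(x):
    """Root mean square amplitude: square root of the average power."""
    return np.sqrt(power(x))

def zero_crossing_rate(x, fs):
    """Number of sign changes per second."""
    negative = np.signbit(x)
    crossings = np.count_nonzero(negative[1:] != negative[:-1])
    return crossings / (len(x) / fs)

fs = 8000
t = np.arange(fs) / fs                                       # one second
voiced_like = np.sin(2 * np.pi * 100 * t + 0.1)              # periodic, 100 Hz (assumed)
noise_like = np.random.default_rng(0).normal(0.0, 0.1, fs)   # white noise, as in fricatives
print(rms(voiced_like), zero_crossing_rate(voiced_like, fs))
print(rms(noise_like), zero_crossing_rate(noise_like, fs))
```

The noise crosses zero far more often than the periodic signal, which is why the zero crossing rate helps with the voiced/unvoiced decision.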


The Frequency Domain

• So far we have considered the signal in the time domain, i.e. we have shown the amplitude (signal strength) as a function of time

• Now we wish to consider the signal in the frequency domain,

– i.e. we want to know which frequency components occur in the signal

– "Frequency domain" means that we look at speech as a function of frequency

• We know how to get a frequency-domain representation: the Fourier transformation

• So this is our application of the Fourier transformation: we extract frequency-based features


Overview

• Frequency Representations in Continuous and Discrete Time (I)

– Why Perform Signal Processing In The Frequency Domain?

– The Source-Filter Model For Speech

– Frequency Representation

– Fourier Sums

– Continuous-time Fourier Transform (CTFT)

• Features of the Fourier Transform

– Discrete Time Fourier Transform (DTFT)

– Convolution

• Convolution of a Function with an Impulse Train


Why Perform Signal Processing In The Frequency Domain?

• Use of frequency analysis often simplifies signal

processing

• Use of frequency analysis often facilitates

understanding

• Human hearing is based on frequency analysis


The Source-Filter Model For Speech

• Sounds are produced either by

- vibrating the vocal cords (voiced sounds) or

- random noise resulting from friction of the airflow (unvoiced sounds)

- voiced fricatives need a mixed excitation model

• The excitation signal u[n] is filtered by the vocal tract, whose impulse response is v[n]

• The resulting signal is filtered by the radiation response r[n] of the lips and nostrils

• Finally, the resulting signal f[n] = u[n] * v[n] * r[n] is emitted (* denotes convolution).


The Source-Filter Model For Speech

• A useful model for representing the generation of speech sounds:

[Figure: source-filter model with a pitch pulse train source, a noise source, an amplitude control, the excitation p[n] and the vocal tract model]

• The behavior of this model is best described in the frequency domain

• So now we have two reasons to consider speech in the frequency domain: speech generation and hearing
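The source-filter idea can be tried out by driving a filter with either a pulse train (voiced) or white noise (unvoiced). A minimal sketch, assuming NumPy; a single two-pole resonator with one formant at 500 Hz stands in for the vocal tract, the radiation response is ignored, and the 100 Hz pitch and 100 Hz bandwidth are arbitrary assumptions. Real vocal tract models have several formants.

```python
import numpy as np

def pulse_train(n_samples, period):
    """Voiced excitation p[n]: one unit impulse every `period` samples."""
    p = np.zeros(n_samples)
    p[::period] = 1.0
    return p

def resonator(x, fs, formant_hz, bandwidth_hz):
    """Two-pole IIR filter y[n] = x[n] + a1*y[n-1] + a2*y[n-2]: a crude one-formant vocal tract."""
    r = np.exp(-np.pi * bandwidth_hz / fs)     # pole radius from the bandwidth
    theta = 2 * np.pi * formant_hz / fs        # pole angle from the formant frequency
    a1, a2 = 2 * r * np.cos(theta), -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

fs = 8000
voiced = resonator(pulse_train(fs, period=80), fs, 500, 100)   # 100 Hz pitch, 500 Hz formant
unvoiced = resonator(np.random.default_rng(0).normal(size=fs), fs, 500, 100)
```

In the spectrum of `voiced` the harmonics of the 100 Hz pitch appear, shaped by the resonator so that the harmonic at the formant frequency is the strongest; `unvoiced` has the same envelope but no harmonic structure.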


Continuous-Time Fourier Transform

• We reconsider the Fourier transform defined before:

$$F(\omega) = FT(f(t)) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt$$

• $e^{-i\omega t} = \cos(\omega t) - i \sin(\omega t)$ is the complex oscillation (with the frequency ω).

• Thus the Fourier transformation is actually complex!


Continuous-Time Fourier Transform (CTFT)

We can express the Fourier transform F of a function f in polar coordinates:

$$F(\omega) = |F(\omega)|\, e^{i\varphi(\omega)} = |F(\omega)| \left(\cos\varphi(\omega) + i \sin\varphi(\omega)\right)$$

Then F(ω) is called the spectrum,

|F(ω)| is the amplitude spectrum,

φ(ω) is the phase spectrum,

and |F(ω)|² is the power spectrum.

• Often, when we say "spectrum" we actually mean "power spectrum".

• It is generally assumed that the phase spectrum is not important for speech recognition.
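In practice the complex spectrum is computed on a grid of frequencies with the FFT, and the amplitude, phase and power spectra follow directly from it. A minimal sketch, assuming NumPy; the 1 kHz test tone with a phase offset of 0.3 rad is an arbitrary choice that lands exactly on FFT bin 32, so the phase can be read off.

```python
import numpy as np

fs = 8000
t = np.arange(256) / fs
x = np.cos(2 * np.pi * 1000 * t + 0.3)   # test tone with phase offset 0.3 rad (assumed)

F = np.fft.rfft(x)                       # complex spectrum (discrete stand-in for F(omega))
amplitude = np.abs(F)                    # |F|
phase = np.angle(F)                      # phi, in radians
power = amplitude ** 2                   # |F|^2

# polar form: F = |F| e^{i phi} = |F| (cos phi + i sin phi) recovers the complex spectrum
assert np.allclose(amplitude * (np.cos(phase) + 1j * np.sin(phase)), F)
```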


Discrete Time Fourier Transform (DTFT)

• Now we have a discrete signal f[n].

• We can transfer the idea of the Fourier transform to a discrete signal. Then we get the Discrete Time Fourier Transform of a sequence x[k]:

$$X(e^{i\omega}) = \sum_{k=-\infty}^{\infty} x[k]\, e^{-i\omega k}$$

• The DTFT is periodic with period 2π! Thus the Nyquist frequency appears as π.

• Thus we use only an interval of length 2π for the inverse transformation:

$$x[k] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{i\omega})\, e^{i\omega k}\, d\omega$$

Note: the DTFT transforms a discrete signal into a continuous (periodic) function of ω.
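For a finite sequence, the DTFT sum and its inverse can be evaluated directly, which makes both the 2π periodicity and the one-period inverse integral visible. A minimal sketch, assuming NumPy; the inverse integral is approximated with m equally spaced frequencies, which is exact here because the sequence is shorter than m, and the example sequence is arbitrary.

```python
import numpy as np

def dtft(x, omega):
    """X(e^{i omega}) = sum_k x[k] e^{-i omega k} for a finite sequence x[0], ..., x[K-1]."""
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    k = np.arange(len(x))
    return np.exp(-1j * np.outer(omega, k)) @ x

def inverse_dtft(x, k, m=512):
    """x[k] = 1/(2 pi) * integral over [-pi, pi) of X(e^{i omega}) e^{i omega k} d omega,
    approximated with m equally spaced frequencies (exact here because len(x) < m)."""
    omega = -np.pi + 2 * np.pi * np.arange(m) / m
    return np.real(np.sum(dtft(x, omega) * np.exp(1j * omega * k)) / m)

x = np.array([1.0, -2.0, 0.5, 3.0])
w = np.array([0.3, 1.7, 2.9])
print(np.allclose(dtft(x, w), dtft(x, w + 2 * np.pi)))   # True: period 2 pi
print([inverse_dtft(x, k) for k in range(4)])            # recovers x up to rounding
```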


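Discrete Time Fourier Transform (DTFT)

The Dirac distribution δ(t) is defined as a single impulse with area 1.0. Its Fourier transform is 1.0.

A discrete function s[n] can be expressed as the weighted sum of shifted Dirac distributions:

$$s(t) = \sum_{n} s[n]\, \delta(t - n)$$

Consequently, since each shifted impulse δ(t − n) has the Fourier transform $e^{-i\omega n}$, we can define the discrete-time Fourier transform as a sum:

$$S(e^{i\omega}) = \sum_{n} s[n]\, e^{-i\omega n}$$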


Convolution

• The convolution of two functions f(t) * g(t) is defined as:

$$(f * g)(t) = \int_{-\infty}^{\infty} f(t - \tau)\, g(\tau)\, d\tau$$

• or in the discrete case:

$$(f * g)[k] = \sum_{i} f[k - i]\, g[i]$$

• Convolution is a commutative operation!
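The discrete convolution sum can be implemented literally and checked against NumPy's built-in version, including commutativity. A minimal sketch, assuming NumPy and finite sequences that start at index 0; the loop form is for clarity, not speed, and the example sequences are arbitrary.

```python
import numpy as np

def convolve(f, g):
    """(f*g)[k] = sum_i f[k - i] g[i] for finite sequences starting at index 0."""
    out = np.zeros(len(f) + len(g) - 1)
    for k in range(len(out)):
        for i in range(len(g)):
            if 0 <= k - i < len(f):          # f is zero outside its support
                out[k] += f[k - i] * g[i]
    return out

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])
print(convolve(f, g))                        # matches np.convolve(f, g)
print(np.allclose(convolve(f, g), convolve(g, f)))   # True: commutative
```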


Convolution

• Computing the Fourier transform of

$$h(t) = (f * g)(t) = \int_{-\infty}^{\infty} f(t - \tau)\, g(\tau)\, d\tau$$

on both sides, we get

$$H(\omega) = F(\omega)\, G(\omega),$$

i.e. the convolution becomes a multiplication in the frequency domain! (Why?)
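The convolution theorem can be checked numerically with the FFT. One detail matters: the FFT corresponds to circular convolution, so both sequences are zero-padded to the length of the linear convolution before multiplying their spectra. A minimal sketch, assuming NumPy; the random test sequences are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.normal(size=50)
g = rng.normal(size=30)

h = np.convolve(f, g)                       # time domain: h = f * g (length 79)
n = len(h)                                  # zero-pad so circular convolution equals linear convolution
H = np.fft.fft(h, n)
FG = np.fft.fft(f, n) * np.fft.fft(g, n)    # frequency domain: product of the spectra
print(np.allclose(H, FG))                   # True
```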


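Features of the Fourier Transform

• Finally, we summarize some properties of the Fourier transformation which will be useful later

Often, the effect of a channel on a signal can be described as a convolution in the time domain.

Then the corresponding effect in the frequency domain becomes a multiplication.

And in the log-spectral domain (after taking the logarithm of the spectrum) we get a simple summation.

$$\begin{array}{ll}
\text{Time shift:} & f(t - t_0) \;\leftrightarrow\; e^{-i\omega t_0} F(\omega) \\
\text{Frequency shift:} & e^{i\omega_0 t} f(t) \;\leftrightarrow\; F(\omega - \omega_0) \\
\text{Convolution:} & (f * g)(t) \;\leftrightarrow\; F(\omega)\, G(\omega) \\
\text{Linearity:} & a f(t) + b g(t) \;\leftrightarrow\; a F(\omega) + b G(\omega) \\
\text{Differentiation:} & f'(t) \;\leftrightarrow\; i\omega F(\omega)
\end{array}$$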
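The chain "convolution in time, multiplication in frequency, summation in the log-spectral domain" can be seen directly on a simulated channel. A minimal sketch, assuming NumPy; the random "speech" frame and the three-tap channel impulse response are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)
speech = rng.normal(size=400)               # stand-in for a speech frame (assumed)
channel = np.array([1.0, 0.6, 0.2])         # hypothetical channel impulse response
received = np.convolve(speech, channel)     # channel effect: convolution in time

n = len(received)                           # common FFT length (zero-padding)
log_S = np.log(np.abs(np.fft.rfft(speech, n)))
log_C = np.log(np.abs(np.fft.rfft(channel, n)))
log_R = np.log(np.abs(np.fft.rfft(received, n)))
print(np.allclose(log_R, log_S + log_C))    # True: the channel becomes an additive term
```

Because the channel only adds a fixed term to every log spectrum, it can be estimated and subtracted, which is the idea behind cepstral mean normalization.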


Overview

• Frequency Representations in Continuous and Discrete Time (II)

– The Fourier Transform – Example

– Examples of CTFTs and DTFTs

– Correspondence Between Frequency In Discrete And Continuous Time

– Short-Term Spectral Analysis

– Short-time Fourier Analysis

– Windowing

• Two Popular Window Shapes

• The Effect of Applying Windows

• Breaking Up Incoming Speech Into Frames Using Hamming Windows

• Time and Frequency Response of Windows

• Speech Visualisation with Spectrograms

• Effect of Window Duration

• Using Filterbanks

– The Mel Scale

– The Bark Scale


The Fourier Transform – Example

• Example for the Fourier Transform:

Left: audio signal, right: frequency spectrum

The audio signal has been sampled at 8 kHz

Thus the spectrum extends up to 4 kHz (the Nyquist frequency)

The vertical axis of the frequency spectrum is logarithmic.
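The frequency axis of such a plot follows from the sampling rate: the FFT bins run from 0 Hz up to half the sampling rate. A minimal sketch, assuming NumPy; the FFT length of 512 and the 440 Hz test tone standing in for the audio signal are assumptions, and the magnitude is converted to decibels to get a logarithmic vertical axis.

```python
import numpy as np

fs = 8000                                    # sampling rate of the example
n_fft = 512                                  # FFT length (assumed)
t = np.arange(n_fft) / fs
x = np.sin(2 * np.pi * 440 * t)              # stand-in for the audio signal (assumed)

freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)   # bin frequencies: 0 Hz ... fs/2
spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(x)) + 1e-12)   # logarithmic vertical axis
print(freqs[-1])                             # 4000.0: the spectrum stops at the Nyquist frequency
```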


Correspondence Between Frequency In Discrete And Continuous Time

• Continuous signals can have arbitrarily high frequencies

• Discrete signals have the Nyquist frequency as the maximum frequency

• This frequency appears as π in the discrete signal!

• Suppose that a continuous-time signal is sampled at a rate greater than the Nyquist rate, with a time between samples of T

• Let Ω represent continuous-time frequency in radians/sec

• Let ω represent discrete-time frequency in radians

• Then

$$\omega = \Omega T$$

• Comment: the maximum discrete-time frequency, ω = π, corresponds to the Nyquist frequency in continuous time, half the sampling rate
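The relation ω = ΩT is a one-line conversion; with Ω = 2πf and T = 1/fs, half the sampling rate maps to ω = π. A minimal sketch using the standard library; the 16 kHz sampling rate is an arbitrary example and `discrete_frequency` a hypothetical helper name.

```python
import math

def discrete_frequency(f_hz, fs):
    """omega = Omega * T, with Omega = 2*pi*f_hz (rad/s) and T = 1/fs (s); result in radians per sample."""
    Omega = 2 * math.pi * f_hz
    T = 1.0 / fs
    return Omega * T

fs = 16000                                  # sampling rate in Hz (assumed)
print(discrete_frequency(fs / 2, fs))       # approximately pi: the Nyquist frequency
print(discrete_frequency(1000, fs))         # approximately pi / 8
```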


Short-Term Spectral Analysis

Facts:

• The frequency distribution over an entire utterance does not help much for recognition.

• Most acoustic events (e.g. phonemes) have durations in the range of 10 to 100 ms.

• Many acoustic events are not static (e.g. diphthongs) and need more detailed analysis.

Solution:

• Partition the entire recording into a sequence of short segments

• The segments may overlap each other
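Partitioning a recording into overlapping segments is a simple indexing operation. A minimal sketch, assuming NumPy; the 20 ms segment length and 10 ms step (segment length = 2 · step) and the 16 kHz sampling rate are assumed values, and `frame_signal` is a hypothetical helper that drops any incomplete segment at the end.

```python
import numpy as np

def frame_signal(x, fs, win_ms=20.0, step_ms=10.0):
    """Cut x into overlapping segments of win_ms, one starting every step_ms."""
    win = int(round(fs * win_ms / 1000))
    step = int(round(fs * step_ms / 1000))
    if len(x) < win:
        return np.empty((0, win))
    n_frames = 1 + (len(x) - win) // step
    starts = step * np.arange(n_frames)
    return x[starts[:, None] + np.arange(win)[None, :]]   # shape (n_frames, win)

fs = 16000                        # sampling rate in Hz (assumed)
x = np.arange(fs, dtype=float)    # one second of dummy samples whose values equal their index
frames = frame_signal(x, fs)
print(frames.shape)               # (99, 320)
```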


Short-time Fourier Analysis

• Problem: conventional Fourier analysis does not capture the time-varying nature of speech signals:

$$X(e^{i\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-i\omega n}$$

• Solution: multiply the signal by a finite-duration window function, then compute the DTFT:

$$X[n, \omega] = \sum_{m=0}^{N-1} x[m]\, w[n-m]\, e^{-i\omega m}$$

• Side effect: windowing causes spectral blurring
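Short-time Fourier analysis amounts to: cut a frame, multiply it by the window, and sample its DTFT with an FFT. A minimal sketch, assuming NumPy and a Hamming window; because that window is symmetric, slicing frames this way gives the same magnitudes as the w[n − m] form above, up to a phase factor. The two-tone test signal and all lengths are assumptions.

```python
import numpy as np

def stft_magnitude(x, win_len, step, n_fft):
    """|X[n, omega_k]|: Hamming-windowed frames every `step` samples, DTFT sampled at n_fft//2 + 1 frequencies."""
    w = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, step)
    frames = np.array([x[s:s + win_len] * w for s in starts])   # finite-duration window
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))          # one spectrum per frame

fs = 8000
t = np.arange(fs // 2) / fs                                      # 0.5 s
x = np.concatenate([np.sin(2 * np.pi * 500 * t),                 # first half: 500 Hz
                    np.sin(2 * np.pi * 2000 * t)])               # second half: 2000 Hz
S = stft_magnitude(x, win_len=256, step=128, n_fft=256)
freqs = np.fft.rfftfreq(256, d=1.0 / fs)
print(freqs[S[0].argmax()], freqs[S[-1].argmax()])               # peak moves from 500 Hz to 2000 Hz
```

A single DTFT over the whole signal would show both tones at once; the frame-by-frame spectra reveal when each one occurs.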


Window Shapes


Two Popular Window Shapes

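• Rectangular window:

$$w[n] = \begin{cases} 1, & 0 \le n \le N \\ 0, & \text{otherwise} \end{cases}$$

• Hamming window:

$$w[n] = \begin{cases} 0.54 - 0.46 \cos(2\pi n / N), & 0 \le n \le N \\ 0, & \text{otherwise} \end{cases}$$

• Comment: the Hamming window is frequently preferred because of its frequency response: its sidelobes are much lower than those of the rectangular window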
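Both window formulas are easy to build, and comparing their highest sidelobes makes the reason for preferring the Hamming window concrete. A minimal sketch, assuming NumPy and N + 1 points (0 ≤ n ≤ N) as in the formulas; the sidelobe measurement by zero-padded FFT and "walk down the main lobe to its first null" is a simple heuristic chosen for illustration.

```python
import numpy as np

def rectangular(N):
    """w[n] = 1 for 0 <= n <= N (N + 1 points)."""
    return np.ones(N + 1)

def hamming(N):
    """w[n] = 0.54 - 0.46 cos(2 pi n / N) for 0 <= n <= N."""
    n = np.arange(N + 1)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / N)

def peak_sidelobe_db(w, pad=64):
    """Highest sidelobe of the window's magnitude response, in dB relative to the main lobe."""
    W = np.abs(np.fft.rfft(w, n=pad * len(w)))    # finely sampled frequency response
    k = 1
    while k < len(W) - 1 and W[k + 1] <= W[k]:    # walk down the main lobe to its first null
        k += 1
    return 20 * np.log10(W[k:].max() / W[0])

print(peak_sidelobe_db(rectangular(64)))   # about -13 dB
print(peak_sidelobe_db(hamming(64)))       # roughly -42 dB
```

The price for the lower sidelobes is a main lobe about twice as wide, i.e. somewhat coarser frequency resolution.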


The Effect of Applying Windows

• When we compute the Fourier transform of a windowed signal, i.e. the product of the signal s[n] and the window w[n], we get:

$$S_w(e^{i\omega}) = \sum_{n} s[n]\, w[n]\, e^{-i\omega n}$$

• Now let W be the Fourier transform of the window; then we can express the short-time spectrum in terms of a convolution:

$$S_w(e^{i\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(e^{i\theta})\, W(e^{i(\omega - \theta)})\, d\theta$$

• So applying a window to the speech signal before computing the short-time spectrum has the same effect as convolving the spectrum of the original (unwindowed) signal with the window's Fourier transform.
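The same relationship holds for the DFT, where the integral becomes a circular convolution of the two spectra divided by the frame length N. A minimal sketch, assuming NumPy; a random frame and a Hamming window are arbitrary test inputs.

```python
import numpy as np

N = 64
rng = np.random.default_rng(3)
s = rng.normal(size=N)             # signal segment (stand-in for speech)
w = np.hamming(N)                  # window

S = np.fft.fft(s)
W = np.fft.fft(w)
lhs = np.fft.fft(s * w)            # spectrum of the windowed signal

# circular convolution of the two spectra, divided by N (DFT form of the 1/(2 pi) factor)
k = np.arange(N)
rhs = np.array([np.sum(S * W[(m - k) % N]) for m in range(N)]) / N
print(np.allclose(lhs, rhs))       # True
```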


The Effect of Applying Windows

• A window has the effect of "smoothing" the resulting spectrum

• The step size (how often a short-time spectrum is computed) and the window size do not have to be the same (often: window size = 2 · step size)

• The most frequently used window shape is the Hamming window

[Figure: for a rectangular window and a Hamming window: the signal, the Fourier transform of the window, and the Fourier transform of the windowed signal]


Effect of Window Duration

[Figure: spectrograms (time vs. frequency) computed with a short-duration window (left) and a long-duration window (right)]

A short window gives good time resolution but poor frequency resolution; a long window gives the reverse.


Speech Visualisation with Spectrograms


Using Filterbanks

• The full set of Fourier coefficients reflects too much of the signal's microstructure

• The microstructure contains redundancies but also a lot of "misleading" information

• Solution: filterbanks. The human ear also works with "filterbanks"

• Different approaches to computing filterbank coefficients:

– Fixed-width filters

– Variable-width filters

– Overlapping filters

• Typical filterbanks: mel or bark scales


The Mel Scale

• Mel: perceptual scale of pitches (from "melody")

Proposed by Stevens, Volkmann and Newman in 1937

Let people listen to tones and judge what pitch they perceive

Find pitches that are perceived as equally distant from one another

Compare these with the actual frequencies and define a function

Result = mel scale, with f in Hz:

$$m = 1127 \ln\left(1 + \frac{f}{700}\right)$$

Above 500 Hz, larger and larger intervals are judged to be equal pitch increments (four octaves on the hertz scale above 500 Hz correspond to about two octaves on the mel scale)

Then: build filterbanks with equal bandwidths in mels

Source: Wikipedia
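The mel formula and the idea of "equal bandwidths in mels" can be combined into a filterbank. A minimal sketch, assuming NumPy; triangular, half-overlapping filters whose edges are equally spaced on the mel scale are a common construction but an assumption here, as are the 26 filters, the 512-point FFT and the 16 kHz sampling rate in the usage example.

```python
import numpy as np

def hz_to_mel(f):
    """m = 1127 ln(1 + f / 700)."""
    return 1127.0 * np.log1p(np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * np.expm1(np.asarray(m, dtype=float) / 1127.0)

def mel_filterbank(n_filters, n_fft, fs, f_low=0.0, f_high=None):
    """Triangular filters with edges equally spaced in mel; returns (weights, centre frequencies in Hz)."""
    f_high = fs / 2 if f_high is None else f_high
    mel_points = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fb = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        lo, centre, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - lo) / (centre - lo)     # left slope of the triangle
        falling = (hi - bin_freqs) / (hi - centre)    # right slope of the triangle
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb, hz_points[1:-1]

fb, centres = mel_filterbank(26, 512, 16000)
print(np.round(centres[:3]), np.round(centres[-3:]))   # narrow spacing at low, wide at high frequencies
```

Multiplying a power spectrum (257 bins here) by `fb.T` yields one energy per filter; taking logarithms and a cosine transform of these energies leads to the MFCCs listed among the lecture topics.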


The Bark Scale

• Experiment (by Barkhausen):

Let a person listen to two frequencies

If the frequencies differ significantly, the person notices this (hears two tones)

If the frequencies are close, the person can hear only one frequency

The minimal frequency distance (critical bandwidth) is called 1 bark

The scale ranges from 1 to 24 and corresponds to the first 24 critical bands of hearing.

The subsequent band edges are (in Hz) 20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500.
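With the band edges listed above, finding the critical band (bark number) of a frequency is a binary search. A minimal sketch using the standard library; the convention that band k spans from the k-th to the (k+1)-th edge, and returning `None` outside 20 to 15500 Hz, are assumptions for illustration.

```python
from bisect import bisect_right

# Critical-band edges in Hz, as listed above (25 edges -> 24 bands)
BARK_EDGES = [20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
              2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500]

def bark_band(f_hz):
    """Return the critical band number (1..24) containing f_hz, or None outside 20..15500 Hz."""
    if f_hz < BARK_EDGES[0] or f_hz >= BARK_EDGES[-1]:
        return None
    return bisect_right(BARK_EDGES, f_hz)   # number of edges <= f_hz equals the band number

print(bark_band(1000))   # 9 (the band from 920 Hz to 1080 Hz)
```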


Thanks for your interest!