Signal Processing For Speech Applications - Part 1 (csl.anthropomatik.kit.edu/downloads/...)

Signal Processing for Speech Applications - 1

May 7, 2013

Signal Processing For Speech Applications - Part 1


References:

• Huang et al., Chapter on DSP

• Classical paper: Schafer/Rabiner in Waibel/Lee (on the web)

• Nahin: "Dr. Euler's Fabulous Formula" – excellent explanation of Fourier sums and the Fourier Transform, written for engineering students

Note: many slides of this lecture are from Rich Stern


Signal Processing For Speech Applications

• Major speech technologies:

– Speech coding

– Speech synthesis

– Speech recognition


Goals Of Speech Representations

• Capture important phonetic information in speech

• Computational efficiency

• Efficiency in storage requirements

• Optimize generalization


Representation of Speech

Definition: Digital representation of speech

Represent speech as a sequence of numbers (as a prerequisite for automatic processing by computers)

1) Direct representation of the speech waveform: represent the speech waveform as accurately as possible so that the acoustic signal can be reconstructed

2) Parametric representation: represent a set of properties/parameters with respect to a certain model

Decide the targeted application first:

• Speech coding

• Speech synthesis

• Speech recognition

Classical paper: Schafer/Rabiner in Waibel/Lee (paper online)


Speech Coding

Objectives of Speech Coding:

– Quality versus bit rate

– Quantization Noise

– High measured intelligibility

– Low bit rate (b/s of speech)

– Low computational requirement

– Robustness to transmission errors

– Robustness to successive encode/decode cycles

Objectives for real-time:

– Low coding/decoding delay

– Work with non-speech signals (e.g. touch tone)


Speech Synthesis

The standard approach:

Speech Synthesis = Text-to-Speech conversion (don't confuse it with Speech Recognition)


Speech Recognition

• Major functional components:

– Signal processing to extract relevant features from speech waveforms

– Comparison of features to pre-stored templates

• Important design choices:

– Choice of features (Optimize Generalization)

– Specific method of comparing features to stored templates

[Figure: Speech → A/D → Feature extraction → Speech features → Decision making procedure → Hypotheses (phonemes)]


Goals Of This Lecture

How do we accomplish feature extraction for speech recognition?

Some specific topics:

– Quantization (A/D Conversion)

– Sampling

– Filter Bank Coefficients

– Linear predictive coding (LPC)

– LPC-derived cepstral coefficients (LPCC)

– Mel-frequency cepstral coefficients (MFCC)

Some of the underlying mathematics

– Continuous-time Fourier transform (CTFT)

– Discrete-time Fourier transform (DTFT)

– Z-transforms


Overview (I)

• Introduction

– A note on oscillations

• Frequency transformation

– Why Perform Signal Processing?

• Different Kinds of Signals

– Quantization

• Quantization of Signals

• Quantization of Speech Signals

– Sampling of continuous-time signals

• How Frequently Should we Sample? – Aliasing

• The Sampling Theorem

• Acoustic Features in the Sampled Signal


Overview (II)

• Frequency Representations in Continuous and Discrete Time (I)

– Why Perform Signal Processing In The Frequency Domain?

– The Source-Filter Model For Speech

– Frequency Representation

– Fourier Sums

– Continuous-time Fourier Transform (CTFT)

• Features of the Fourier Transform

– Discrete Time Fourier Transform (DTFT)

– Convolution

• Convolution of a Function with an Impulse Train


Overview (III)

• Frequency Representations in Continuous and Discrete Time (II)

– The Fourier Transform – Example

– Examples of CTFTs and DTFTs

– Correspondence Between Frequency In Discrete And Continuous Time

– Short-Term Spectral Analysis

– Short-time Fourier Analysis

– Windowing

• Two Popular Window Shapes

• The Effect of Applying Windows

• Breaking Up Incoming Speech Into Frames Using Hamming Windows

• Time and Frequency Response of Windows

• Speech Visualisation with Spectrograms

• Effect of Window Duration

• Using Filterbanks

– The Mel Scale

– The Bark Scale


Overview

• Introduction

– A note on oscillations

• Frequency transformation

– Why Perform Signal Processing?

• Different Kinds of Signals

– Quantization

• Quantization of Signals

• Quantization of Speech Signals

– Sampling of continuous-time signals

• How Frequently Should we Sample? – Aliasing

• The Sampling Theorem

• Acoustic Features in the Sampled Signal


A note on oscillations

• Very generally, frequency means "the number of occurrences of a repeating event per unit time" (Wikipedia (en), Frequency)

• Right-hand side: impulse trains with different frequencies (frequency increases from top to bottom)

• More specifically, saying that a signal contains a certain frequency means that it contains a pure oscillation (sine wave) with that frequency

• We need both meanings


Frequency components

• Let's look at the following graphic

• Left: two sine waves. These are pure oscillations with only one frequency each.

• Right-hand side: visualization of the contained frequencies (for real signals this is symmetric with respect to the vertical axis)

• Below: a sum of two oscillations -> its frequency content is the sum of the frequency contents of the two oscillations


The complex oscillation

• We have learned that the sine wave f(t) = sin(ωt) (and, of course, the cosine) is a pure (real) oscillation.

• They can be combined to form a complex oscillation $f(t) = e^{i\omega t} = \cos(\omega t) + i\sin(\omega t)$, where i is the imaginary unit.

• Visualization: as ωt runs from 0 to 2π, f(t) moves once around the unit circle. The real and imaginary parts are the real oscillations.

• We see that sin and cos are "complementary": $\sin^2(\omega t) + \cos^2(\omega t) = 1$.

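A quick numerical check of Euler's formula and of the unit-circle picture, as a minimal sketch assuming NumPy is available; the value `omega = 3.0` is an arbitrary illustrative choice, not taken from the slides:

```python
import numpy as np

omega = 3.0                                     # angular frequency in rad/s (arbitrary choice)
t = np.linspace(0.0, 2 * np.pi / omega, 1000)   # omega * t runs from 0 to 2*pi

z = np.exp(1j * omega * t)                      # complex oscillation e^{i omega t}
real_part = np.cos(omega * t)                   # real part: cos(omega t)
imag_part = np.sin(omega * t)                   # imaginary part: sin(omega t)

# Euler's formula holds sample by sample, and every point lies on the unit circle.
print(np.allclose(z, real_part + 1j * imag_part))   # True
print(np.allclose(np.abs(z), 1.0))                  # True
```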

How to measure frequency components?

• Let's assume a complicated signal.

• Which frequencies does it contain?

• We measure the frequency components of a function f(t) with the integral

$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt$

• F(ω) is the Fourier Transform of f(t). Does everybody recognize the complex oscillation?
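To make the integral concrete, it can be approximated numerically. The following sketch (assuming NumPy; `fourier_transform` is a hypothetical helper name) uses a plain Riemann sum on a finite interval and compares the result with the known closed form $\sqrt{2\pi}\,e^{-\omega^2/2}$ for the Gaussian $f(t) = e^{-t^2/2}$:

```python
import numpy as np

def fourier_transform(f, omega, t_max=20.0, num_points=20001):
    """Approximate F(omega) = integral of f(t) e^{-i omega t} dt by a Riemann sum on [-t_max, t_max]."""
    t = np.linspace(-t_max, t_max, num_points)
    dt = t[1] - t[0]
    return np.sum(f(t) * np.exp(-1j * omega * t)) * dt

def gaussian(t):
    return np.exp(-t ** 2 / 2)

omegas = np.array([0.0, 0.5, 1.0, 2.0])
approx = np.array([fourier_transform(gaussian, w) for w in omegas])
exact = np.sqrt(2 * np.pi) * np.exp(-omegas ** 2 / 2)   # known closed form for this Gaussian
```

Truncating the integral to [-20, 20] works here only because the Gaussian decays very quickly; slowly decaying signals would need a much longer interval.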


Example

• Let's look at one example

• Here's a complicated signal which looks rather ugly: it is heavily disturbed by some kind of "noise".

• We take the Fourier transform, visualize it, and notice that the noise has very specific frequencies!

• We will get to know other applications of frequencies and the Fourier transform.
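The same idea can be reproduced with a synthetic signal. In this sketch (assuming NumPy, with the discrete FFT standing in for the Fourier transform; the 5 Hz "signal", the 120 Hz "noise" and the 1000 Hz sampling rate are invented for illustration), the disturbance shows up as a single spectral peak and can be removed by zeroing that frequency:

```python
import numpy as np

fs = 1000                                  # sampling rate in Hz (illustrative)
t = np.arange(fs) / fs                     # one second of samples
clean = np.sin(2 * np.pi * 5 * t)          # the "useful" 5 Hz signal
noise = 0.5 * np.sin(2 * np.pi * 120 * t)  # disturbance concentrated at 120 Hz
x = clean + noise

X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1 / fs)  # frequency of each bin in Hz (1 Hz spacing here)

# The two strongest components: the signal and the disturbance.
top_two = np.sort(freqs[np.argsort(np.abs(X))[-2:]])

# Because the noise sits at one specific frequency, zeroing that bin removes it.
X_filtered = X.copy()
X_filtered[np.isclose(freqs, 120.0)] = 0
denoised = np.fft.irfft(X_filtered, n=len(x))
```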


A digital signal representation

• A look at the time-domain waveform of “six”:

How do we get this into a computer?


Sampling & Quantization

Goal: Given a signal that is continuous in time and amplitude,

find a discrete representation.

Two steps are necessary: sampling and quantization.

• Quantization corresponds to a discretization of the y-axis

• Sampling corresponds to a discretization of the x-axis


Quantization of Signals

• Given a discrete signal f[i] to be quantized into q[i]

• Assume that f lies between fmin and fmax

• Partition the y-axis into a fixed number n of (equally sized) intervals

• Usually $n = 2^b$; in ASR typically b = 16, n = 65536 (16-bit quantization)

• q[i] can only take values that are centers of the intervals

• Quantization: assign to q[i] the center of the interval in which f[i] lies

• Quantization introduces errors, i.e. adds noise to the signal: f[i] = q[i] + e[i]

• The quantization error |e[i]| is at most half an interval width, (fmax-fmin)/(2n)

• Define the signal-to-noise ratio SNR[dB] = 10 · log10(power(f[i]) / power(e[i]))
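A minimal sketch of this uniform quantizer and of the SNR definition, assuming NumPy and a test signal that is uniformly distributed over [fmin, fmax] = [-1, 1] (an illustrative assumption); `quantize` and `snr_db` are hypothetical helper names:

```python
import numpy as np

def quantize(f, b, f_min=-1.0, f_max=1.0):
    """Uniform quantizer with n = 2**b intervals; returns the interval centers."""
    n = 2 ** b
    width = (f_max - f_min) / n
    # Index of the interval containing each sample (clip so f_max falls into the last one).
    idx = np.clip(np.floor((f - f_min) / width), 0, n - 1)
    return f_min + (idx + 0.5) * width

def snr_db(f, q):
    """SNR[dB] = 10 log10(power(f) / power(e)) with e = f - q."""
    e = f - q
    return 10 * np.log10(np.mean(f ** 2) / np.mean(e ** 2))

rng = np.random.default_rng(0)
f = rng.uniform(-1.0, 1.0, 100_000)   # test signal filling the full range
snr_8 = snr_db(f, quantize(f, 8))
snr_9 = snr_db(f, quantize(f, 9))
```

For this kind of test signal the SNR grows by about 6 dB per additional bit, and the error never exceeds half an interval width.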


Quantization of Speech Signals

• Choice of quantization depth:

Speech signals usually have a dynamic range between 50 dB and 60 dB

The lower the SNR, the lower the speech recognition performance

Each bit contributes about 6 dB of SNR

To get a reasonable SNR, b should therefore be at least 10 to 12

Typically in ASR the samples are quantized with 16 bits

• Speech signals' amplitudes are not uniformly distributed

Speech amplitudes are exponentially distributed around their mean

So the information-theoretic optimum is not to partition the y-axis into equal intervals

Therefore µ-law encoding is often used:

$f^{(\mu)}[n] = f_{\max} \cdot \operatorname{sgn}(f[n]) \cdot \frac{\log(1 + \mu |f[n]| / f_{\max})}{\log(1 + \mu)}$, µ = 100, ..., 500

µ-law encoding makes the speech amplitudes approximately uniformly distributed before quantization
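The µ-law formula above can be sketched as follows, assuming NumPy; µ = 255 is one value inside the slide's range, and the Laplacian test signal (a two-sided exponential distribution) with scale 0.02 is an illustrative assumption. Small amplitudes get finer effective quantization steps, which raises the SNR for such signals compared with direct 8-bit uniform quantization:

```python
import numpy as np

def mu_law(f, f_max=1.0, mu=255.0):
    """Compress: f_max * sgn(f) * log(1 + mu*|f|/f_max) / log(1 + mu)."""
    return f_max * np.sign(f) * np.log1p(mu * np.abs(f) / f_max) / np.log1p(mu)

def mu_law_inverse(g, f_max=1.0, mu=255.0):
    """Expand: invert mu_law to get back linear amplitudes."""
    return f_max * np.sign(g) * np.expm1(np.abs(g) / f_max * np.log1p(mu)) / mu

def quantize(x, b):
    """Uniform b-bit quantizer on [-1, 1] that returns interval centers."""
    n = 2 ** b
    width = 2.0 / n
    idx = np.clip(np.floor((x + 1.0) / width), 0, n - 1)
    return -1.0 + (idx + 0.5) * width

def snr_db(f, q):
    return 10 * np.log10(np.mean(f ** 2) / np.mean((f - q) ** 2))

rng = np.random.default_rng(0)
# Mostly small amplitudes with a two-sided exponential (Laplacian) distribution.
f = np.clip(rng.laplace(0.0, 0.02, 100_000), -1.0, 1.0)

snr_linear = snr_db(f, quantize(f, 8))                        # quantize directly
snr_mu = snr_db(f, mu_law_inverse(quantize(mu_law(f), 8)))    # compress, quantize, expand
```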


Sampling Continuous-time Signals

Sampling converts the analog signal (in the time domain) into a discrete-time representation (also in the time domain!)

Time domain:

• x-axis = time

• y-axis = amplitude

Questions:

• Do we lose information during sampling?

• How can we prevent it?

[Figure: two panels, "Original speech signal" and "Sampled version of signal" (amplitude vs. time)]


How Frequently Should we Sample? – Aliasing

• The number of samples per unit of time (usually seconds) taken from a continuous signal to make a discrete signal is called the sampling rate

• The unit for sampling rate is hertz (inverse seconds, 1/s)

• Example: correct sampling (above) / undersampling (below)

– After undersampling we cannot (uniquely) recover the original frequency!

– This effect is called aliasing.

– How can we prevent aliasing?


The Aliasing Effect

• How can we prevent aliasing?

In practical applications, the signals are band-limited

Band-limited signals do not contain arbitrarily high frequencies; there is a maximum frequency (let's call it fl).

The band limitation can be ensured already during the recording process by an appropriate A/D converter (with analog low-pass filters)

When the sampling rate is too low, the samples can contain "incorrect" (aliased) frequencies.

Nyquist or sampling theorem:

When an fl-band-limited signal is sampled with a sampling rate greater than 2fl, the signal can be exactly reconstructed from the samples.

This frequency 2fl is also called the Nyquist rate.

The maximum frequency that can be represented correctly at a given sampling rate, i.e. half the sampling rate, is called the Nyquist frequency.
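Aliasing can be demonstrated in a few lines. In this sketch (assuming NumPy; the frequencies are illustrative), a 7 Hz sine sampled at 10 Hz, i.e. above the Nyquist frequency of 5 Hz, yields exactly the same samples as a sign-flipped 3 Hz sine, whereas sampling at 20 Hz keeps the two apart:

```python
import numpy as np

def sample(freq_hz, fs_hz, num_samples=50):
    """Samples of sin(2*pi*freq*t) taken at t = n / fs."""
    n = np.arange(num_samples)
    return np.sin(2 * np.pi * freq_hz * n / fs_hz)

# fs = 10 Hz: Nyquist frequency 5 Hz, so 7 Hz folds down to 10 - 7 = 3 Hz (sine flips sign).
aliased = np.allclose(sample(7.0, 10.0), -sample(3.0, 10.0))
# fs = 20 Hz: Nyquist frequency 10 Hz, both tones are represented correctly and differ.
distinct = not np.allclose(sample(7.0, 20.0), -sample(3.0, 20.0))
```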


Why Perform Signal Processing?

• Another look at the time-domain waveform of “six”

• Digitization is done: now we have a discrete representation f[n]

• It’s still hard to infer much from the time-domain waveform

• We need to extract relevant features


Acoustic Features in the Sampled Signal

• Amplitude (1) (not very useful): the envelope of the amplitude is correlated to the power (= integral of the squared signal)

Power is useful for detecting speech vs. silence, and also syllables, phrase boundaries, prosody

• Peak-to-peak amplitude (2) (correlated to amplitude)

• Root mean square (RMS) amplitude (3): square root of the mean of the squared signal over time (its square is the average power)

• Zero crossing rate can help distinguish some weak sounds from silence

• Voiced/unvoiced: homogeneous periodic signal (voiced sounds) vs. white-noise-like signal in fricatives

Source: Wikipedia
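A minimal sketch of some of these time-domain features for a single frame, assuming NumPy; the helper `frame_features`, the 200 Hz "voiced-like" sine and the weak white noise standing in for a fricative are illustrative assumptions:

```python
import numpy as np

def frame_features(frame):
    """Simple time-domain features of one frame of samples."""
    power = np.mean(frame ** 2)                       # average power
    rms = np.sqrt(power)                              # RMS: sqrt of the mean of the squared signal
    peak_to_peak = np.max(frame) - np.min(frame)      # peak-to-peak amplitude
    negative = np.signbit(frame)
    zcr = np.mean(negative[1:] != negative[:-1])      # fraction of neighbouring pairs with a sign change
    return power, rms, peak_to_peak, zcr

fs = 8000
n = np.arange(fs)
voiced_like = 0.8 * np.sin(2 * np.pi * 200 * n / fs + 0.3)        # periodic, low frequency
noise_like = 0.1 * np.random.default_rng(0).standard_normal(fs)   # weak white noise

_, rms_voiced, _, zcr_voiced = frame_features(voiced_like)
_, rms_noise, _, zcr_noise = frame_features(noise_like)
```

The weak noise has a much smaller RMS amplitude but a far higher zero crossing rate, which is the kind of contrast the slide refers to.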


The Frequency Domain

• So far we have considered the signal in the time domain, i.e. we have shown the amplitude (signal strength) as a function of time

• Now we wish to consider the signal in the frequency domain,

– i.e. we want to know which frequency components occur in the signal

– "Frequency domain" means that we look at speech as a function of frequency

• We know how to get a frequency-domain representation: the Fourier transform

• So this is our application of the Fourier transform: we extract frequency-based features


Overview

• Frequency Representations in Continuous and Discrete Time (I)

– Why Perform Signal Processing In The Frequency Domain?

– The Source-Filter Model For Speech

– Frequency Representation

– Fourier Sums

– Continuous-time Fourier Transform (CTFT)

• Features of the Fourier Transform

– Discrete Time Fourier Transform (DTFT)

– Convolution

• Convolution of a Function with an Impulse Train


Why Perform Signal Processing In The Frequency Domain?

• Use of frequency analysis often simplifies signal

processing

• Use of frequency analysis often facilitates

understanding

• Human hearing is based on frequency analysis


The Source-Filter Model For Speech

• Sounds are produced either by

- vibrating the vocal cords (voiced sounds) or

- random noise resulting from friction of the airflow (unvoiced sounds)

- voiced fricatives need a mixed excitation model

• The excitation signal u[n] is filtered by the vocal tract, whose "impulse response" is v[n]

• The resulting signal is further filtered by the radiation response r[n] of the lips and nostrils

• Eventually the resulting signal f[n] = u[n] * v[n] * r[n] (convolutions) is emitted.
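The model can be sketched in discrete time as two convolutions. This sketch assumes NumPy, a 100 Hz pulse train at 8 kHz as the voiced source, a hypothetical damped 500 Hz resonance as the vocal tract impulse response, and a first difference as a simple stand-in for the radiation response; none of these values come from the slides:

```python
import numpy as np

fs = 8000                 # sampling rate in Hz (assumption)
pitch_period = 80         # 100 Hz fundamental frequency -> 80 samples between pulses
num_samples = 800

# Excitation u[n]: periodic pulse train (voiced source).
u = np.zeros(num_samples)
u[::pitch_period] = 1.0

# Vocal tract v[n]: hypothetical damped resonance at 500 Hz, truncated to 60 samples.
k = np.arange(60)
v = np.exp(-k / 15.0) * np.sin(2 * np.pi * 500 * k / fs)

# Radiation r[n]: first difference, a simple approximation of the lip radiation.
r = np.array([1.0, -1.0])

# Source-filter model: f[n] = u[n] * v[n] * r[n].
f = np.convolve(np.convolve(u, v), r)
```

Because the excitation repeats every 80 samples and the combined filter response is shorter than that, the output repeats with the same pitch period.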


The Source-Filter Model For Speech

• A useful model for representing the generation of speech sounds:

[Figure: pulse train source (controlled by pitch) and noise source, amplitude, vocal tract model; signal p[n]]

• The behavior of this model is best described in the frequency domain

• So now we have two reasons to consider speech in the frequency domain: speech generation and hearing


Continuous-Time Fourier Transform

• We reconsider the Fourier transform defined before:

$F(\omega) = FT(f(t)) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt$

• $e^{-i\omega t} = \cos(\omega t) - i\sin(\omega t)$ is the complex oscillation (with the frequency ω).

• Thus the Fourier transform is in general complex-valued!


Continuous-Time Fourier Transform (CTFT)

We can express the Fourier transform F of a function f in polar coordinates:

$F(\omega) = |F(\omega)|\, e^{i\varphi(\omega)} = |F(\omega)|\,(\cos\varphi(\omega) + i\sin\varphi(\omega))$

Then

• $F(\omega)$ is called the spectrum,

• $|F(\omega)|$ is the amplitude spectrum,

• $\varphi(\omega)$ is the phase spectrum, and

• $|F(\omega)|^2$ is the power spectrum.

• Often, when we say "spectrum" we actually mean "power spectrum".

• It is generally assumed that the phase spectrum is not important for speech recognition.
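A short sketch of these quantities, assuming NumPy and using the discrete Fourier transform of an arbitrary random signal as a stand-in for F(ω):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)       # arbitrary real test signal
F = np.fft.fft(x)                  # discrete stand-in for F(omega)

amplitude = np.abs(F)              # amplitude spectrum |F(omega)|
phase = np.angle(F)                # phase spectrum phi(omega), in (-pi, pi]
power = amplitude ** 2             # power spectrum |F(omega)|^2

# Polar form: F = |F| (cos(phi) + i sin(phi))
reconstructed = amplitude * (np.cos(phase) + 1j * np.sin(phase))
```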


Discrete Time Fourier Transform (DTFT)

• Now we have a discrete signal f[n].

• We can transfer the idea of the Fourier transform to a discrete signal. Then we get the Discrete Time Fourier Transform of a sequence x[k]:

$X(e^{i\omega}) = \sum_{k=-\infty}^{\infty} x[k]\, e^{-i\omega k}$

• The DTFT is periodic with period 2π! Thus the Nyquist frequency appears as π.

Therefore only one interval of length 2π is needed for the inverse transformation:

$x[k] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{i\omega})\, e^{i\omega k}\, d\omega$

Note: the DTFT transforms a discrete signal into a continuous (2π-periodic) function of ω.
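A direct evaluation of the DTFT sum for a finite sequence (all other samples taken as zero), as a sketch assuming NumPy; `dtft` is a hypothetical helper. It checks the 2π periodicity and shows that sampling the DTFT at ω = 2πm/N gives the values computed by `np.fft.fft`:

```python
import numpy as np

def dtft(x, omega):
    """Evaluate X(e^{i omega}) = sum_k x[k] e^{-i omega k} for x[k], k = 0..len(x)-1."""
    k = np.arange(len(x))
    return np.exp(-1j * np.outer(omega, k)) @ x

x = np.array([1.0, 2.0, 0.5, -1.0])
omega = np.linspace(-np.pi, np.pi, 9)

X = dtft(x, omega)
X_shifted = dtft(x, omega + 2 * np.pi)     # same values: period 2*pi

# Sampling the DTFT at omega = 2*pi*m/N gives the DFT.
N = len(x)
X_at_bins = dtft(x, 2 * np.pi * np.arange(N) / N)
```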


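Discrete Time Fourier Transform (DTFT)

The Dirac distribution δ(t) is defined as a single impulse with area 1.0. Its Fourier transform is 1.0.

A discrete function s[n] can be expressed as the weighted sum of Dirac distributions (assuming unit sample spacing):

$s(t) = \sum_{n} s[n]\, \delta(t - n)$

Consequently (the Fourier transform of δ(t − n) is $e^{-i\omega n}$), we can define the discrete-time Fourier transform as a sum:

$S(e^{i\omega}) = \sum_{n} s[n]\, e^{-i\omega n}$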


Convolution

• The convolution of two functions f(t) * g(t) is defined as:

$(f * g)(t) = \int_{-\infty}^{\infty} f(t - \tau)\, g(\tau)\, d\tau$

• or in the discrete case:

$(f * g)[k] = \sum_{i} f[k - i]\, g[i]$

• Convolution is a commutative operation!
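The discrete definition translates directly into code. This sketch (assuming NumPy; `convolve` is a hypothetical helper for finite sequences) implements the sum literally and can be compared with `np.convolve`:

```python
import numpy as np

def convolve(f, g):
    """Direct discrete convolution (f * g)[k] = sum_i f[k - i] g[i] for finite sequences."""
    out = np.zeros(len(f) + len(g) - 1)
    for k in range(len(out)):
        for i in range(len(g)):
            if 0 <= k - i < len(f):
                out[k] += f[k - i] * g[i]
    return out

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])
h = convolve(f, g)    # [0.0, 1.0, 2.5, 4.0, 1.5]
```

Swapping the arguments, `convolve(g, f)`, gives the same result, which illustrates the commutativity stated above.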


Convolution

• Computing the Fourier transform of

$h(t) = (f * g)(t) = \int_{-\infty}^{\infty} f(t - \tau)\, g(\tau)\, d\tau$

on both sides, we get

$H(\omega) = F(\omega)\, G(\omega)$

i.e. the convolution becomes a multiplication in the frequency domain! (Why?)
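The discrete analogue of this property can be checked with the DFT, as a sketch assuming NumPy. Both sequences are zero-padded to the full output length, since without padding the product of DFTs corresponds to a circular rather than a linear convolution:

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.standard_normal(100)
g = rng.standard_normal(30)

h = np.convolve(f, g)                 # time-domain (linear) convolution
n = len(f) + len(g) - 1               # zero-pad to this length so the DFT product matches

H = np.fft.fft(h, n)
F = np.fft.fft(f, n)
G = np.fft.fft(g, n)                  # H equals F * G
```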


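Features of the Fourier Transform

• Finally, we summarize some properties of the Fourier transform which will be useful later

Often, the effect of a channel on a signal can be described as a convolution in the time domain.

Then the corresponding effect in the frequency domain becomes a multiplication.

And after taking the logarithm of the spectrum (log-spectral domain) we get a simple summation.

Time shift: $f(t - t_0) \leftrightarrow e^{-i\omega t_0} F(\omega)$

Frequency shift: $e^{i\omega_0 t} f(t) \leftrightarrow F(\omega - \omega_0)$

Convolution: $(f * g)(t) \leftrightarrow F(\omega)\, G(\omega)$

Linearity: $a f(t) + b g(t) \leftrightarrow a F(\omega) + b G(\omega)$

Differentiation: $f'(t) \leftrightarrow i\omega F(\omega)$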
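The log-domain statement can be illustrated with a hypothetical short channel, as a sketch assuming NumPy. Circular convolution is used so that the DFT identity holds exactly on a finite frame; the random "speech" frame and the channel coefficients are invented for illustration:

```python
import numpy as np

def circular_convolve(x, h):
    """Circular convolution y[n] = sum_m x[m] h[(n - m) mod N] for equal-length frames."""
    N = len(x)
    return np.array([sum(x[m] * h[(n - m) % N] for m in range(N)) for n in range(N)])

rng = np.random.default_rng(3)
N = 64
x = rng.standard_normal(N)            # hypothetical "speech" frame
h = np.zeros(N)
h[:3] = [1.0, 0.5, 0.25]              # hypothetical short channel impulse response
y = circular_convolve(x, h)           # channel effect in the time domain

log_X = np.log(np.abs(np.fft.fft(x)))
log_H = np.log(np.abs(np.fft.fft(h)))
log_Y = np.log(np.abs(np.fft.fft(y)))  # equals log_X + log_H: the channel becomes an additive term
```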


Overview

• Frequency Representations in Continuous and Discrete Time (II)

– The Fourier Transform – Example

– Examples of CTFTs and DTFTs

– Correspondence Between Frequency In Discrete And Continuous Time

– Short-Term Spectral Analysis

– Short-time Fourier Analysis

– Windowing

• Two Popular Window Shapes

• The Effect of Applying Windows

• Breaking Up Incoming Speech Into Frames Using Hamming Windows

• Time and Frequency Response of Windows

• Speech Visualisation with Spectrograms

• Effect of Window Duration

• Using Filterbanks

– The Mel Scale

– The Bark Scale


The Fourier Transform – Example

• Example for the Fourier Transform:

Left: audio signal, right: frequency spectrum

The audio signal has been sampled at 8 kHz

Thus the spectrum extends up to 4 kHz (the Nyquist frequency)

The vertical axis of the frequency spectrum is logarithmic.


Correspondence Between Frequency In Discrete And Continuous Time

• Continuous signals can have arbitrarily high frequencies

• Discrete signals have the Nyquist frequency as the maximum frequency

• This frequency appears as π in the discrete signal!

• Suppose that a continuous-time signal is sampled at a rate greater than the Nyquist rate, with a time between samples of T

• Let Ω represent continuous-time frequency in radians/sec

• Let ω represent discrete-time frequency in radians

• Then $\omega = \Omega T$

• Comment:

The maximum discrete-time frequency, ω = π, corresponds to the Nyquist frequency in continuous time, half the sampling rate
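A small sketch of this conversion, assuming NumPy; with Ω = 2πf (f in Hz) and T = 1/fs the relation reads ω = 2πf/fs. At fs = 8 kHz the Nyquist frequency of 4 kHz maps to π:

```python
import numpy as np

def to_discrete_frequency(f_hz, fs_hz):
    """omega = Omega * T with Omega = 2*pi*f (rad/s) and T = 1/fs (s); result in radians per sample."""
    Omega = 2 * np.pi * f_hz
    T = 1.0 / fs_hz
    return Omega * T

omega_nyquist = to_discrete_frequency(4000.0, 8000.0)   # half the sampling rate -> pi
omega_1k = to_discrete_frequency(1000.0, 8000.0)        # -> pi / 4
```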


Short-Term Spectral Analysis

Facts:

• The frequency distribution over an entire utterance does not help much for recognition.

• Most acoustic events (e.g. phonemes) have durations in the range of 10 to 100 ms.

• Many acoustic events are not static (e.g. diphthongs) and need more detailed analysis.

Solution:

• Partition the entire recording into a sequence of short segments

• The segments may overlap each other


Short-time Fourier Analysis

• Problem: Conventional Fourier analysis does not capture the time-varying nature of speech signals:

$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}$

• Solution: Multiply the signal by a finite-duration window function, then compute the DTFT:

$X[n, \omega] = \sum_{m=0}^{N-1} x[m]\, w[n - m]\, e^{-j\omega m}$

• Side effect: Windowing causes spectral blurring
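A minimal short-time Fourier analysis in the spirit of the formula above, as a sketch assuming NumPy: `stft` is a hypothetical helper that cuts the signal into overlapping frames, applies a Hamming window, and evaluates the DTFT at equally spaced frequencies with an FFT. Time is measured from each frame start, which changes only the phase, not the magnitudes. The test signal switching from 500 Hz to 1500 Hz is illustrative:

```python
import numpy as np

def stft(x, frame_len, hop, window=None):
    """Window each frame, then take an FFT (frequency samples of the frame's DTFT).

    Returns complex spectra of shape (num_frames, frame_len // 2 + 1)."""
    if window is None:
        window = np.hamming(frame_len)
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(num_frames)])
    return np.fft.rfft(frames * window, axis=1)

fs = 8000
t = np.arange(fs) / fs
# First half second at 500 Hz, second half at 1500 Hz: a time-varying signal.
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))

S = stft(x, frame_len=256, hop=128)
freqs = np.fft.rfftfreq(256, d=1 / fs)
peak_freqs = freqs[np.argmax(np.abs(S), axis=1)]   # dominant frequency in each frame
```

The frame-wise peak follows the change in frequency, which a single Fourier transform over the whole signal would not show.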


Window Shapes


Two Popular Window Shapes

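• Rectangular window:

$w[n] = \begin{cases} 1, & 0 \le n \le N \\ 0, & \text{otherwise} \end{cases}$

• Hamming window:

$w[n] = \begin{cases} 0.54 - 0.46\cos(2\pi n / N), & 0 \le n \le N \\ 0, & \text{otherwise} \end{cases}$

• Comment: the Hamming window is frequently preferred because of its frequency response (much lower sidelobes than the rectangular window)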
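Both windows can be written down directly from the definitions above. This sketch (assuming NumPy; `leakage` is a hypothetical measure) also compares how much energy a sinusoid that falls between two DFT bins leaks into frequencies far from its peak, which is where the Hamming window's frequency response pays off:

```python
import numpy as np

def rectangular_window(N):
    """w[n] = 1 for 0 <= n <= N (N + 1 samples)."""
    return np.ones(N + 1)

def hamming_window(N):
    """w[n] = 0.54 - 0.46 cos(2 pi n / N) for 0 <= n <= N (N + 1 samples)."""
    n = np.arange(N + 1)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / N)

def leakage(window, cycles=10.5, min_distance=20):
    """Largest magnitude at least `min_distance` bins away from the peak, relative to the peak.

    The test tone has a non-integer number of cycles, so it falls between two DFT bins."""
    M = len(window)
    n = np.arange(M)
    spectrum = np.abs(np.fft.rfft(window * np.cos(2 * np.pi * cycles * n / M)))
    peak = np.argmax(spectrum)
    far = np.abs(np.arange(len(spectrum)) - peak) >= min_distance
    return spectrum[far].max() / spectrum[peak]

N = 256
leak_rect = leakage(rectangular_window(N))
leak_hamm = leakage(hamming_window(N))
```

For N + 1 samples, `hamming_window(N)` coincides with NumPy's `np.hamming(N + 1)`.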


The Effect of Applying Windows

• When we compute the Fourier transform of a signal $s^{(m)}[n]$ that has been windowed with w[n], the window multiplies the signal in the time domain.

• Now let W be the Fourier transform of the window; then we can express the short-time spectrum in terms of a convolution, because multiplication in the time domain corresponds to convolution in the frequency domain.

• So applying a window to the speech signal before computing the short-time spectrum has the same effect as convolving the spectrum of the original signal with the window's Fourier transform.
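The discrete version of this statement can be verified exactly with the DFT, as a sketch assuming NumPy: the spectrum of the windowed signal equals the circular convolution of the signal's spectrum with the window's spectrum, divided by the frame length N:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 64
s = rng.standard_normal(N)        # signal segment
w = np.hamming(N)                 # window

S = np.fft.fft(s)
W = np.fft.fft(w)
S_windowed = np.fft.fft(s * w)    # spectrum of the windowed signal

# Circular convolution of the two spectra, divided by N.
conv = np.array([np.sum(S * W[(k - np.arange(N)) % N]) for k in range(N)]) / N
```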


The Effect of Applying Windows

• A window has the effect of "smoothing" the resulting spectrum

• The step size (the shift between successive short-time spectra) and the window size do not have to be the same (often: window size = 2 · step size)

• The most frequently used window shape is the Hamming window

[Figure: for a rectangular window and a Hamming window, the signal, the Fourier transform of the window, and the Fourier transform of the windowed signal]


Effect of Window Duration

[Figure: two spectrograms (frequency 0 to 5000 vs. time 0 to 1.2), left: short-duration window, right: long-duration window]
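A rough rule of thumb behind this trade-off: the DFT bin spacing of a window of duration T is 1/T, so longer windows resolve frequency more finely but blur changes over time. The sketch below (the 8 ms and 64 ms durations and the 16 kHz sampling rate are illustrative assumptions) computes it; the actual frequency resolution also depends on the window shape:

```python
def frequency_resolution(window_ms, fs_hz):
    """DFT bin spacing (Hz) of a window of the given duration: fs / N = 1 / duration."""
    n_samples = round(window_ms * 1e-3 * fs_hz)
    return fs_hz / n_samples

short_res = frequency_resolution(8, 16000)    # 128 samples -> 125 Hz
long_res = frequency_resolution(64, 16000)    # 1024 samples -> 15.625 Hz
```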


Speech Visualisation with Spectrograms


Using Filterbanks

• The full set of Fourier coefficients reflects too much of the signal's microstructure

• The microstructure contains redundancies but also a lot of "misleading" information

• Solution – filterbanks: the human ear also works with "filterbanks"

• Different approaches to computing filterbank coefficients:

– Fixed-width filters

– Variable-width filters

– Overlapping filters

• Typical filterbanks: mel or bark scales


The Mel Scale

• Mel: perceptual scale of pitches (from "melody")

Proposed by Stevens, Volkmann and Newman (1937)

Let people listen to tones and judge what pitch they perceive

Find pitches that are perceived as equally distant from one another

Compare them with the actual frequencies and define a function

Result = mel scale: $m = 1127 \ln(1 + f/700)$, with f in Hz

Above about 500 Hz, larger and larger intervals are judged to be equal pitch increments (four octaves on the hertz scale above 500 Hz correspond to about two octaves on the mel scale)

Then: build filterbanks with equal bandwidths in mels

Source: Wikipedia
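A sketch of the mel conversion and of a triangular filterbank with centers equally spaced in mel, assuming NumPy. The triangular filter shape, the 20 filters, the FFT length of 512 and the 16 kHz sampling rate are common illustrative choices, not values from the slides; `mel_filterbank` is a hypothetical helper:

```python
import numpy as np

def hz_to_mel(f):
    """m = 1127 ln(1 + f / 700)."""
    return 1127.0 * np.log1p(np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * np.expm1(np.asarray(m, dtype=float) / 1127.0)

def mel_filterbank(num_filters, n_fft, fs):
    """Triangular filters whose edges and centers are equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), num_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fbank = np.zeros((num_filters, len(bin_freqs)))
    for j in range(num_filters):
        left, center, right = hz_points[j], hz_points[j + 1], hz_points[j + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fbank[j] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank, hz_points[1:-1]

fbank, centers = mel_filterbank(num_filters=20, n_fft=512, fs=16000)
```

Equal spacing in mel translates into center spacing in Hz that grows with frequency, so high-frequency filters are wider.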


The Bark Scale

• Experiment (the Bark scale was proposed by Zwicker and is named after Barkhausen):

Let a person listen to two frequencies

If the frequencies differ significantly, the person notices two different tones

If the frequencies are close, the person can hear only one frequency

The minimal frequency distance (critical bandwidth) corresponds to 1 bark

The scale ranges from 1 to 24 and corresponds to the first 24 critical bands of hearing.

The successive band edges are (in Hz) 20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500.
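A rough conversion from Hz to Bark can be read off the band edges listed above. This sketch (assuming NumPy) maps edge j to j Bark, so the j-th critical band covers Bark values j-1 to j, and interpolates linearly in between; the linear interpolation is an assumption for illustration, not a standard analytic Bark formula:

```python
import numpy as np

# Critical-band edges in Hz as listed on the slide (24 bands between 25 edges).
BAND_EDGES_HZ = np.array([20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                          1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                          9500, 12000, 15500], dtype=float)

def hz_to_bark(f):
    """Approximate Bark value: edge j maps to j Bark, linear interpolation in between."""
    return np.interp(f, BAND_EDGES_HZ, np.arange(len(BAND_EDGES_HZ), dtype=float))

bark_1k = hz_to_bark(1000.0)   # between the edges 920 Hz (8 Bark) and 1080 Hz (9 Bark)
```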


Thanks for your interest!
