MELP Vocoders. Nima Moghadam (SN#: 82245502), Saeed Nari (SN#: 82270309). Supervisor: Dr. Saameti.


MELP Vocoders

Nima Moghadam, SN#: 82245502

Saeed Nari, SN#: 82270309

Supervisor: Dr. Saameti

April 2005, Sharif University of Technology


Outline

Introduction

MELP Vocoder Features

Algorithm Description

Parameters & Comparison


Introduction

Traditional pitch-excited LPC vocoders drive the synthesis filter with either a periodic pulse train or white noise.

They produce intelligible speech at very low bit rates.

But they sometimes sound mechanical or buzzy and are prone to tonal noise.


Introduction

These problems arise from:
– The inability of a simple pulse train to reproduce all kinds of voiced speech

The MELP vocoder uses a mixed-excitation model that represents a richer ensemble of speech characteristics.

It produces more natural-sounding speech.


MELP vocoder

Robust in background-noise environments

Based on the traditional LPC model, with additional features:
– Aperiodic pulses
– Adaptive spectral enhancement
– Mixed excitation
– Pulse dispersion


Mixed Excitation

Mixed excitation is implemented using a multi-band mixing model.

This model can simulate frequency-dependent voicing strength.

The excitation is a mixture of periodic or aperiodic pulses and white noise.

The primary effect of this unit is to reduce buzz, particularly in the presence of broadband acoustic noise.
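The multi-band mixing idea can be sketched in a few lines. This is only an illustration under stated assumptions: the 8 kHz sampling rate, the five band edges, the FFT-mask band splitting and the names `band_limit` and `mixed_excitation` are not taken from the slides.

```python
import numpy as np

FS = 8000  # assumed sampling rate in Hz
# Assumed band edges in Hz; the slides do not list them.
BANDS = [(0, 500), (500, 1000), (1000, 2000), (2000, 3000), (3000, 4000)]

def band_limit(x, lo, hi, fs=FS):
    """Keep only the [lo, hi) Hz content of x using an FFT mask."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # The top band also keeps the Nyquist bin so the bands tile the spectrum.
    mask = (freqs >= lo) & ((freqs < hi) | (hi >= fs / 2))
    return np.fft.irfft(spec * mask, n=len(x))

def mixed_excitation(pulses, noise, voicing):
    """Mix pulse and noise excitation band by band.

    voicing[k] in [0, 1] is the voicing strength of band k:
    1 drives the band with pulses only, 0 with noise only.
    """
    out = np.zeros(len(pulses))
    for (lo, hi), v in zip(BANDS, voicing):
        out += v * band_limit(pulses, lo, hi)
        out += (1.0 - v) * band_limit(noise, lo, hi)
    return out

rng = np.random.default_rng(0)
n = 360
pulses = np.zeros(n)
pulses[::60] = 1.0                 # periodic pulse train, period 60 samples
noise = rng.standard_normal(n)
# Voiced low bands, half-voiced middle band, noisy high bands.
excitation = mixed_excitation(pulses, noise, [1.0, 1.0, 0.5, 0.0, 0.0])
```

With all voicing strengths set to 1 the output reduces to the pulse train, and with all set to 0 it reduces to the noise, which matches the two excitation choices of a traditional LPC vocoder.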


Aperiodic pulses

When the input signal is voiced, the MELP vocoder can synthesize speech using either periodic or aperiodic pulses.

Aperiodic pulses are used during transition regions between voiced and unvoiced segments of the speech signal.

This lets the decoder produce erratic glottal pulses without introducing tonal noise.
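As a rough sketch of the difference between the two pulse types, the snippet below places pulses either strictly periodically or with each pitch period randomly perturbed. The function name `pulse_positions`, the uniform distribution and the 25% jitter amount are assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def pulse_positions(n_samples, pitch_period, jitter=0.0, rng=None):
    """Return sample indices of excitation pulses.

    jitter is the maximum fractional perturbation of each pitch period;
    0.0 gives a strictly periodic train. The value used below is an
    assumption, not taken from the slides.
    """
    if rng is None:
        rng = np.random.default_rng()
    positions = []
    t = 0.0
    while t < n_samples:
        positions.append(int(t))
        # Scale the next period by a uniform factor in [1 - jitter, 1 + jitter].
        t += pitch_period * (1.0 + jitter * rng.uniform(-1.0, 1.0))
    return np.array(positions)

periodic = pulse_positions(400, 50)
aperiodic = pulse_positions(400, 50, jitter=0.25, rng=np.random.default_rng(1))
```

The periodic train has a line spectrum at multiples of the pitch frequency, while the jittered train smears those lines, which is one way to see why it avoids tonal artifacts.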


Pulse Dispersion

Pulse dispersion is implemented using a fixed pulse dispersion filter based on a flattened triangle pulse.

The pulse dispersion filter improves the match between bandpass-filtered synthetic and natural speech waveforms in frequency bands that do not contain a formant resonance.

It spreads the excitation energy within a pitch period, which reduces the harsh quality of the synthetic speech.


Adaptive spectral enhancement filter

Based on the poles of the vocal tract filter

Used to enhance the formant structure in the synthetic speech

This filter improves the match between synthetic and natural bandpass waveforms, giving more natural speech output.


MELP Algorithm Description (Encoder)

1. Filter out any low-frequency noise.

2. The filtered speech is filtered again to perform the initial pitch search for the pitch estimation.

3. Perform the bandpass voicing analysis.

- In this step we decide whether to use the periodic/aperiodic pulse train or the white-noise model.


MELP Algorithm Description (Encoder) cont’d

In this stage a voicing-strength parameter is estimated in each band, based on the normalized correlation function of the speech signal and of the smoothed rectified signal in the non-DC band.

Let s_k(n) denote the speech signal in band k, and u_k(n) denote the DC-removed smoothed rectified signal of s_k(n). The correlation function is:

$$
R_x(p) = \frac{\sum_{n=0}^{N-1} x(n)\, x(n+p)}{\left[\sum_{n=0}^{N-1} x^2(n) \sum_{n=0}^{N-1} x^2(n+p)\right]^{1/2}}
$$

x – either s_k or u_k

P – the pitch of the current frame

N – the frame length

The voicing strength for band k is defined as max(R_{s_k}(P), R_{u_k}(P)).
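A minimal sketch of this per-band measurement follows. It assumes that N + P samples of the band signal are available and that the smoothed rectified signal is obtained with a short moving average; the smoother, its width and all function names are illustrative assumptions, not details given in the slides.

```python
import numpy as np

def normalized_correlation(x, p, N):
    """R_x(p) over n = 0..N-1; x must hold at least N + p samples."""
    a = x[:N]
    b = x[p:p + N]
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def envelope(s, width=8):
    """DC-removed smoothed rectified signal (moving-average smoother assumed)."""
    smoothed = np.convolve(np.abs(s), np.ones(width) / width, mode="same")
    return smoothed - smoothed.mean()

def band_voicing_strength(s_k, P, N):
    """max(R_sk(P), R_uk(P)) for one band."""
    return max(normalized_correlation(s_k, P, N),
               normalized_correlation(envelope(s_k), P, N))

n = np.arange(260)
s_k = np.sin(2 * np.pi * n / 40)   # perfectly periodic band signal, period 40
strength = band_voicing_strength(s_k, P=40, N=180)
```

For a perfectly periodic band signal evaluated at its own period, the strength is essentially 1; noisier or less periodic bands give lower values.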


MELP Algorithm Description (Encoder) cont’d

The jittery state is determined by the peakiness of the full-wave rectified LP residual e(n):

$$
\text{Peakiness} = \frac{\left[\frac{1}{N}\sum_{n=0}^{N-1} e^2(n)\right]^{1/2}}{\frac{1}{N}\sum_{n=0}^{N-1} \lvert e(n) \rvert}
$$

If the peakiness is greater than some threshold, the speech frame is flagged as jittered (the aperiodic flag is set).
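The peakiness measure is the ratio of the RMS value to the mean absolute value of the residual, so it can be sketched directly. The threshold constant below is a hypothetical placeholder, since the slides only say "some threshold".

```python
import numpy as np

def peakiness(e):
    """Ratio of the RMS value to the mean absolute value of e(n)."""
    e = np.asarray(e, dtype=float)
    rms = np.sqrt(np.mean(e ** 2))
    mean_abs = np.mean(np.abs(e))
    return float(rms / mean_abs) if mean_abs > 0 else 0.0

# Hypothetical threshold for illustration only; not given in the slides.
PEAKINESS_THRESHOLD = 1.5

flat = np.ones(180)        # constant magnitude, peakiness is 1
spiky = np.zeros(180)
spiky[::45] = 1.0          # four isolated pulses in the frame

aperiodic_flag = peakiness(spiky) > PEAKINESS_THRESHOLD
```

A residual with constant magnitude gives a peakiness of 1, while a frame with a few isolated pulses gives a much larger value (here the square root of 45), which is what lets the measure pick out frames with strong, irregular glottal pulses.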


MELP Algorithm Description (Encoder) cont’d

4. Apply LPC analysis.

5. Calculate the final pitch estimate.

6. Calculate the gain estimate.

7. Quantize the LPC coefficients, pitch, gain and bandpass voicing.

8. Determine and quantize the Fourier magnitudes. The information in these coefficients improves the accuracy of the speech production model at the perceptually important lower frequencies.


MELP Encoder

[Figure: encoder block diagram. Input: input signal. Blocks: pre-filter; pitch search; bandpass voicing decision; gain calculator; LPC analysis filter; final pitch and voicing decision; LSF quantization; quantization of gain, pitch, voicing and jitter; Fourier magnitude calculation; forward error correction. Output: transmitted bitstream.]


MELP Algorithm (Decoder)

1. Decoding the pitch

2. Applying gain attenuation

3. Linearly interpolating all of the synthesis parameters pitch-synchronously

4. Generating the mixed excitation


MELP Algorithm (Decoder) cont’d

5. Applying an adaptive spectral enhancement filter

6. LPC synthesis and applying gain factor

7. Dispersion filtering


MELP Decoder

[Figure: decoder block diagram. Input: received bitstream. Blocks: decode parameters; noise generator; noise shaping filter; pulse generator; pulse position jitter; pulse shaping filter; adder (+); adaptive spectral enhancement; LPC synthesis filter; gain; pulse dispersion filter. Output: synthesized speech.]


Parameter Quantization

Parameters                      Voiced   Unvoiced
LSF parameters                    25        25
Fourier magnitudes                 8         -
Gain (2 per frame)                 8         8
Pitch, overall voicing             7         7
Bandpass voicing                   4         -
Aperiodic flag                     1         -
Error protection                   -        13
Sync bit                           1         1
Total bits / 22.5 ms frame        54        54
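The bit allocation can be checked with a few lines of arithmetic: both frame types add up to 54 bits, and 54 bits every 22.5 ms gives the 2400 bps rate. The dictionary below simply restates the table, with 0 standing in for fields that are not transmitted; the variable names are illustrative.

```python
FRAME_MS = 22.5

# (voiced bits, unvoiced bits) per frame, restating the table; 0 means "not sent".
ALLOCATION = {
    "LSF parameters": (25, 25),
    "Fourier magnitudes": (8, 0),
    "Gain (2 per frame)": (8, 8),
    "Pitch, overall voicing": (7, 7),
    "Bandpass voicing": (4, 0),
    "Aperiodic flag": (1, 0),
    "Error protection": (0, 13),
    "Sync bit": (1, 1),
}

voiced_total = sum(v for v, _ in ALLOCATION.values())
unvoiced_total = sum(u for _, u in ALLOCATION.values())

# Bits per frame times frames per second.
bit_rate = voiced_total * 1000 / FRAME_MS

print(voiced_total, unvoiced_total, bit_rate)
```

In unvoiced frames the 13 bits freed by the unused Fourier magnitude, bandpass voicing and aperiodic flag fields are spent on error protection, so the frame size stays constant.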


Bit transmission order


Comparison of the 2400 bps MELP with other Standard Coders

Diagnostic Acceptability Measure

Two conditions:
– Quiet
– Office

Continuously Variable Slope Delta Modulation (CVSD)
• 16,000 bps

Code Excited Linear Prediction (CELP)
• 4800 bps
• FS1016

Mixed Excitation Linear Prediction (MELP)
• 2400 bps
• FIPS Publication 137

Linear Predictive Coding (LPC)
• 2400 bps


Comparison of the 2400 bps MELP with other Standard Coders (cont’d)

Mean Opinion Score in Six Conditions

Quiet
– Anechoic Sound Chamber
– Dynamic Microphone

Quiet - H250
– Anechoic Sound Chamber
– H250 Microphone

1% Random Bit Errors
– Anechoic Sound Chamber
– Dynamic Microphone

0.5% Random Block Errors
– Anechoic Sound Chamber
– Dynamic Microphone
– 50% Errors within a 35 ms block

Office
– Modern Office Environment
– Dynamic Microphone

Mobile Command Environment
– Field Shelter
– EV M87 Microphone


Comparison of the 2400 bps MELP with other Standard Coders (cont’d)

Complexity with three Measurements
– RAM
– ROM
– MIPS


Voice samples

Original Sound

MELP 1800

MELP 2000

MELP 2200


Any questions?

Thanks!