Multimedia communications ECP 610moodle.eece.cu.edu.eg/pluginfile.php/2112/mod_resource... · 2015....

Omar A. Nasr [email protected] April, 2015 Multimedia communications ECP 610 1


Page 1: Multimedia communications ECP 610moodle.eece.cu.edu.eg/pluginfile.php/2112/mod_resource... · 2015. 5. 27. · Input: 8kHz speech, PCM, 12 bits/sample Frame size: 180 samples = 22.5

Omar A. Nasr

[email protected]

April, 2015

Multimedia communications

ECP 610


Speech coding (compression)

A procedure to represent a digitized speech signal using as few bits as possible while maintaining a reasonable level of speech quality.

The standard defines the compression algorithm, not the implementation platform (DSP, GPP, FPGA, ASIC, etc.)

Uncoded speech: 8 kHz sampling x 16 bits/sample = 128 kbps

Issue: the effects of channel errors

A good speech coder

Low bit rate

High speech quality (intelligibility, naturalness, pleasantness, and speaker recognizability)

Robustness across different speakers and languages (males, females, adults, kids)

Robustness in the presence of channel errors

Low memory size and low computational complexity

Low coding delay

Coder delay

Classification of speech coders

Classification by coding technique

Waveform coders

Preserve the original shape of the signal waveform, so the resulting coders can generally be applied to any signal source.

Data rates: 24-64 kbps

Quality can be measured by SNR

Parametric coders

The speech signal is assumed to be generated from a model, which is controlled by some parameters

Do not preserve the shape of the signal

Low bit rates (can go below 2 kbps)
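
The SNR measure mentioned for waveform coders can be sketched as below. This is only an illustration: the function name `snr_db` and the per-frame energy ratio are assumptions, not taken from the slides. It is meaningful for waveform coders because they try to reproduce the waveform sample by sample; a parametric coder can sound acceptable while scoring a very low SNR.

```python
import math


def snr_db(original, decoded):
    """Signal-to-noise ratio in dB between a signal and its decoded version."""
    if len(original) != len(decoded):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise_energy == 0:
        return math.inf       # perfect reconstruction
    if signal_energy == 0:
        return -math.inf      # silent original, noisy output
    return 10.0 * math.log10(signal_energy / noise_energy)
```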

Classification by coding technique

Hybrid coders

Parametric + waveform

Assume a model, then add more parameters to reach a waveform that is close to the original waveform

Medium bit rate

Parametric speech coding

Models

Human auditory systems

Speech production model

Phase perception

Linear prediction

Basic idea: approximate each speech sample as a linear combination of the past few samples

The weights minimize the mean square prediction error

The resultant weights are the Linear Prediction Coefficients (LPCs)

LPCs change from frame to frame

Another interpretation of LP is as a spectrum estimation method

By computing the LPCs of a signal frame, it is possible to generate another signal whose spectral content is close to that of the original
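
One common way to obtain the weights is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below assumes that method (the slides do not say which one the course uses) and the convention that the prediction is `sum(a[i-1] * s[n-i])` for `i = 1..order`. The function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """R[lag] = sum of frame[i] * frame[i - lag] over the frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the predictor weights.

    Returns (a, k, err): a[i-1] is the weight on s[n-i], k holds the
    reflection coefficients, err is the final prediction error energy.
    """
    a, k_list, err = [], [], r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            # Silent or perfectly predictable frame: stop and pad with zeros.
            a = a + [0.0] * (order - len(a))
            k_list = k_list + [0.0] * (order - len(k_list))
            break
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err
        # Update the order-(m-1) weights to order m.
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= (1.0 - k * k)
        k_list.append(k)
    return a, k_list, err
```

For a frame that decays as 0.9^n, an order-1 predictor from this sketch comes out very close to 0.9, which matches the intuition that each sample is 0.9 times the previous one.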

Linear prediction

Prediction … redundancy removal

The problem of linear prediction

Derivation of the LPCs

Prediction Gain
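
The formula on this slide did not survive extraction. A commonly used definition of prediction gain is the ratio of signal energy to prediction-error energy in dB, PG = 10 log10(signal energy / error energy). The sketch below assumes that definition and treats samples before the start of the frame as zero; the function names are hypothetical.

```python
import math


def prediction_error(frame, a):
    """e[n] = s[n] - sum_i a[i-1] * s[n-i], with samples before the frame taken as 0."""
    errors = []
    for n, s in enumerate(frame):
        predicted = sum(a[i] * frame[n - 1 - i]
                        for i in range(len(a)) if n - 1 - i >= 0)
        errors.append(s - predicted)
    return errors


def prediction_gain_db(frame, a):
    """Prediction gain in dB: signal energy over prediction-error energy."""
    error = prediction_error(frame, a)
    signal_energy = sum(s * s for s in frame)
    error_energy = sum(e * e for e in error)
    if error_energy == 0:
        return math.inf
    return 10.0 * math.log10(signal_energy / error_energy)
```

With no predictor at all (`a = []`) the error equals the signal and the gain is 0 dB; a good predictor removes redundancy and pushes the gain up.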


For voiced frames, the LPC spectrum captures the spectral envelope

Reflection coefficients

There is a one-to-one, but nonlinear, mapping between the reflection coefficients and the linear prediction coefficients

Quantizing the reflection coefficients has less effect on the synthesized speech than quantizing the LPCs directly
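
The mapping between the two sets can be sketched with the step-up and step-down recursions. The sign convention here is an assumption (textbooks differ): the last weight of the order-m predictor equals the m-th reflection coefficient. The function names are hypothetical. The products of coefficients in the update are what make the mapping nonlinear.

```python
def reflection_to_lpc(k):
    """Step-up recursion: reflection coefficients -> predictor weights."""
    a = []
    for m, km in enumerate(k, start=1):
        a = [a[i] - km * a[m - 2 - i] for i in range(m - 1)] + [km]
    return a


def lpc_to_reflection(a):
    """Step-down recursion: predictor weights -> reflection coefficients."""
    a = list(a)
    k = []
    for m in range(len(a), 0, -1):
        km = a[m - 1]
        if abs(km) >= 1.0:
            raise ValueError("synthesis filter is not stable")
        k.append(km)
        denom = 1.0 - km * km
        # Recover the order-(m-1) weights from the order-m weights.
        a = [(a[i] + km * a[m - 2 - i]) / denom for i in range(m - 1)]
    k.reverse()
    return k
```

Usage: `reflection_to_lpc([0.5, -0.2])` gives weights close to `[0.6, -0.2]`, and `lpc_to_reflection` maps them back. A practical benefit of reflection coefficients is that stability is easy to check after quantization: every magnitude must stay below 1.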

Long term linear prediction

Prediction order should be > pitch period to accurately model voiced signals

Problem: time varying + high bit rate (many LPCs)
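
A usual way around the need for a very high prediction order is a separate long-term predictor with a single tap at the pitch lag, s[n] ≈ b * s[n - T]. The sketch below assumes an open-loop search that picks the lag maximizing the normalized correlation inside one frame; the function name and the lag range passed in are hypothetical, not values from the slides.

```python
def estimate_pitch_lag(frame, min_lag, max_lag):
    """Return (lag, gain) for the one-tap predictor s[n] ~ gain * s[n - lag].

    Returns (None, 0.0) when no lag shows positive correlation.
    """
    n = len(frame)
    best_lag, best_gain, best_score = None, 0.0, -1.0
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        cross = sum(frame[i] * frame[i - lag] for i in range(lag, n))
        energy = sum(frame[i - lag] ** 2 for i in range(lag, n))
        if energy == 0 or cross <= 0:
            continue
        score = cross * cross / energy   # error reduction achieved by this lag
        if score > best_score:
            best_lag, best_gain, best_score = lag, cross / energy, score
    return best_lag, best_gain
```

Only two numbers per frame or subframe (lag and gain) are transmitted, instead of dozens of extra LPCs.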

Long Term Linear Prediction

Synthesis filters

Pre-emphasis of the speech waveform

Compensates for the roll-off of the high frequencies in the speech spectrum
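
Pre-emphasis is typically a first-order high-pass filter, y[n] = x[n] - alpha * x[n-1], undone at the decoder by the inverse filter. In this sketch the value alpha = 0.9375 (15/16) is an assumption for illustration; the slides do not give the coefficient.

```python
def pre_emphasis(samples, alpha=0.9375):
    """y[n] = x[n] - alpha * x[n-1]; boosts high frequencies before analysis."""
    out, prev = [], 0.0
    for x in samples:
        out.append(x - alpha * prev)
        prev = x
    return out


def de_emphasis(samples, alpha=0.9375):
    """Inverse filter: x[n] = y[n] + alpha * x[n-1]."""
    out, prev = [], 0.0
    for y in samples:
        prev = y + alpha * prev
        out.append(prev)
    return out
```

A constant (DC) input is almost cancelled after the first sample, while rapid sample-to-sample changes pass through nearly unchanged.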

Waveform CODECs

G.711

Objective: minimize average distortion.

You need to know the distribution of the input signal

G.711 standard
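
G.711 uses logarithmic companding (mu-law or A-law) so that small amplitudes, which are far more frequent in speech, get finer quantization steps. The sketch below uses the continuous mu-law curve with mu = 255. It is an idealization: the standard itself specifies a segmented, piecewise-linear approximation with 8-bit codes, and the simple `quantize_8bit` helper here is a hypothetical stand-in, not the G.711 bit format.

```python
import math

MU = 255.0


def mu_law_compress(x):
    """Map x in [-1, 1] to [-1, 1] with the continuous mu-law curve."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_8bit(x):
    """Uniformly quantize the companded value to 255 levels, then expand."""
    level = round(mu_law_compress(x) * 127)   # integer in -127..127
    return mu_law_expand(level / 127)
```

The design point is that the quantization error scales roughly with the amplitude, so quiet and loud speakers see a similar SNR.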

G.726

Vector quantization

Every pair of numbers falling in a particular region is approximated by the red star associated with that region
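
The encoder side of this idea is a nearest-neighbor search: the input vector is replaced by the index of the closest codeword (the "red star" of its region), and only that index is transmitted. A minimal sketch, assuming squared Euclidean distance and a small hypothetical codebook:

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to vector (squared Euclidean distance)."""
    def distance(index):
        return sum((v - c) ** 2 for v, c in zip(vector, codebook[index]))
    return min(range(len(codebook)), key=distance)


# Hypothetical 2-D codebook with three codewords.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
index = nearest_codeword((0.9, 1.2), codebook)   # closest to (1.0, 1.0)
decoded = codebook[index]
```

The decoder only needs the same codebook and a table lookup, which is why the search cost sits entirely in the encoder.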

Linear Prediction Coding

FS1015, 2.4kbps, 1982

Originally for military applications. Its output speech sounds synthetic and often requires trained operators for reliable use

Each frame has parameters

Encoder estimates parameters

Linear Prediction Coding

Frame duration : 180 samples (22.5 ms)

FS1015 (LPC10)

Input: 8kHz speech, PCM, 12 bits/sample

Frame size: 180 samples = 22.5 ms

Possible pitch periods = only 60 values

54 bits per frame. Hence bit rate = 54*8000/180 = 2400 bps
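
The frame arithmetic above can be checked with a few lines. The helper names are hypothetical; the numbers (54 bits, 180 samples, 8 kHz, 12 bits/sample) are the ones on the slide.

```python
def bit_rate_bps(bits_per_frame, samples_per_frame, sample_rate_hz):
    """Coded bit rate: bits per frame times frames per second."""
    return bits_per_frame * sample_rate_hz / samples_per_frame


def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Frame length in milliseconds."""
    return 1000 * samples_per_frame / sample_rate_hz


coded = bit_rate_bps(54, 180, 8000)      # 2400.0 bps
duration = frame_duration_ms(180, 8000)  # 22.5 ms
uncoded = 8000 * 12                      # 96000 bps for 12-bit PCM input
ratio = uncoded / coded                  # compression ratio of 40
```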

Advantages and disadvantages

Advantages:

Low bit rate

Very simple encoder and decoder

Disadvantages:

Sometimes the speech frame cannot be classified as strictly voiced or unvoiced

Using only noise or an impulse train as the excitation is not a good model

Poor speech quality

Samples:

REGULAR-PULSE EXCITATION CODERS

Multipulse excitation

Open loop

Use a certain criterion to select only a few pulses of the prediction error

Regular pulse excitation

Closed loop (Analysis by Synthesis)

GSM 6.10 (1988)

Regular pulse excited Long Term prediction (RPE-LTP)

Low computational cost

High quality reproduction

Robustness against channel errors

Coding efficiency

8 reflection coefficients

One LPC vector every 160 samples (20ms)

Selects one of 4 subsampled error sequences at each subframe (40 samples)
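
The grid selection step can be sketched as follows. The assumptions here go beyond the slide: each candidate keeps every third sample (13 pulses per 40-sample subframe), the four candidates differ only in their starting offset 0..3, and the one with the highest energy wins. The function name is hypothetical, and the weighting filter applied before this step in the real coder is omitted.

```python
def select_rpe_grid(residual, num_grids=4, step=3, pulses=13):
    """Pick the decimated sequence with the highest energy.

    residual: one subframe of the (long-term) prediction error, 40 samples.
    Returns (offset, selected_pulses).
    """
    needed = (num_grids - 1) + step * (pulses - 1) + 1
    if len(residual) < needed:
        raise ValueError("subframe too short for the requested grids")
    best_offset, best_pulses, best_energy = 0, None, -1.0
    for offset in range(num_grids):
        candidate = [residual[offset + step * i] for i in range(pulses)]
        energy = sum(x * x for x in candidate)
        if energy > best_energy:
            best_offset, best_pulses, best_energy = offset, candidate, energy
    return best_offset, best_pulses
```

Only the grid offset and the 13 quantized pulse amplitudes need to be sent; the decoder fills the other positions with zeros.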

GSM 6.10 (1988)

Code Excited Linear Prediction (CELP)

Excitation codebook can be fixed/adaptive, deterministic/random

No strict (Voiced/unvoiced) classification

CELP

Analysis by synthesis
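
Analysis by synthesis means the encoder runs the decoder for every candidate excitation and keeps the one whose synthesized output is closest to the target speech. The sketch below is a bare-bones version under stated simplifications: a single fixed codebook, a zero-state all-pole synthesis filter, and a plain squared-error criterion. Real CELP coders add perceptual weighting and an adaptive (pitch) codebook, which are left out here. The function names are hypothetical.

```python
def synthesize(excitation, a):
    """All-pole synthesis with zero initial state: s[n] = e[n] + sum_i a[i-1] * s[n-i]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc += coeff * out[n - i]
        out.append(acc)
    return out


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the codevector that best matches target."""
    best = (None, 0.0, float("inf"))
    for index, codevector in enumerate(codebook):
        y = synthesize(codevector, a)
        energy = sum(v * v for v in y)
        if energy == 0:
            continue
        # Optimal gain for this codevector in the least-squares sense.
        gain = sum(t * v for t, v in zip(target, y)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if error < best[2]:
            best = (index, gain, error)
    return best
```

The exhaustive loop over the codebook is where most of the encoder's computation goes, which motivates the structured codebooks used by later coders.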

CELP

Advantages?

Disadvantages?

G.728 (LD-CELP)

20-sample frames, each made of four 5-sample subframes

Pitch period: first a coarse estimate, then a fine estimate

Compared to the previous pitch to check for halving or doubling

Pitch: once per frame (obtained first in a domain decimated by a factor of 4, then in the normal domain)

Bit rate: 16 kbps

Vector Sum Excited Linear Prediction (VSELP)

A CELP coder with a particular codebook structure that reduces computational cost.

IS54 (7.95 kbps), GSM 6.20 (5.6 kbps) "Half Rate"

Basic idea:

Form the codebook from some basis functions

G.729 uses CELP
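
The basis-function idea behind VSELP can be sketched as follows: each codevector is a sum of M basis vectors with signs +1 or -1, and the signs are read directly from the bits of the codeword index, so M basis vectors define 2^M codevectors. The bit-to-sign mapping (bit set means +1) and the function name are assumptions for illustration.

```python
def vselp_codevector(index, basis):
    """Build codevector number `index` as a signed sum of the basis vectors.

    Bit m of index selects the sign of basis[m]: 1 -> +1, 0 -> -1.
    """
    length = len(basis[0])
    out = [0.0] * length
    for bit, vector in enumerate(basis):
        sign = 1.0 if (index >> bit) & 1 else -1.0
        for n in range(length):
            out[n] += sign * vector[n]
    return out
```

Because complementing the index only flips the sign of the codevector, and changing one bit changes only one term of the sum, the encoder can search the codebook far more cheaply than by filtering 2^M unrelated vectors.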

GSM EFR ACELP

A-CELP based

12.2kbps bit rate + 10.6kbps channel coding = 22.8kbps

ETSI AMR (Adaptive Multirate)

All coders based on ACELP

12.2 (EFR), 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, and 4.75 kbps.

MELP (Mixed Excitation Linear Prediction)

2.4 kbps

Fourier Magnitudes

Shaping filters

MELP bit allocation