Embedded Audio Coder

Post on 22-Jan-2016


Transcript of Embedded Audio Coder

1/75

Embedded Audio Coder

Jin Li

2/75

Outline

  • Introduction
  • Embedded audio coder: algorithm
      • MLT with window switching
      • Quantizer
      • Entropy coder
      • Bitstream assembly
  • Modular software design
  • Experimental results & demos
  • Conclusion

3/75

Introduction

4/75

Introduction – Audio Compression

[Diagram: audio waveform, bitstream]

5/75

EAC vs Other Compression

  • Existing audio compression schemes: MP3, AAC, MPEG-4 audio, WMA, Real Audio, …
  • Why research a new audio codec?

6/75

Media vs File Compression

  • File compression
      • Every bit is important; the data has to be compressed losslessly
  • Media compression
      • The exact bit value is not important; distortion is tolerable
      • The amount of media is huge; a high compression ratio is required
      • Media needs adaptation

7/75

Key Features of EAC

  • Not only good compression performance
  • But also a flexible bitstream syntax: the compressed bitstream may be manipulated for
      • a different bitrate
      • a different number of audio channels
      • a different audio sampling rate
  • Versatile
      • Lossless
      • Low delay
      • Streaming/storage applications

8/75

EAC Encoder

[Diagram: the encoder produces a master bitstream and a companion file]

9/75

Parser

  • Except for the header, the application bitstream is a subset of the master bitstream (so parsing is fast)
  • The application bitstream may be changed according to the required bitrate, number of audio channels, and audio sampling rate

[Diagram: the parser takes the master bitstream and the companion file and produces the application bitstream]

10/75

EAC Decoder

[Diagram: the bitstream is decoded to a speaker (Direct Sound) or to a wav file]

11/75

Embedded Audio Coder – Algorithm Description

12/75

Framework – Encoder

[Diagram: the audio is carried as two channels, L+R (or mono) and L-R. Each channel passes through Transform, Entropy coder, and Bitstream assembly to form the bitstream.]

13/75

Audio Transform

  • Input: audio samples
  • Output: transform coefficients
  • Goal: convert the audio from the time domain to the frequency domain
      • Compact the energy
      • Better match psychoacoustic characteristics
      • Enable audio sampling rate change

14/75

Lossy vs Lossless Mode

  • Lossy mode: Audio, MLT(SW), Quantization
  • Lossless mode: Audio, Reversible MLT(SW)

15/75

Lossy (Float) Pass

16/75

MLT – Modulated Lapped Transforms

[Figure, two panels. "Spatial Response": amplitude from -1 to 1 over sample index 0 to 1000. "Frequency Domain": gain in dB from -100 to 40 over frequency from 0 to 1 (in units of pi).]
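For reference, a direct (slow) evaluation of the MLT can be sketched as below. The basis formula is the textbook sine-window MLT from the lapped-transform literature and is assumed here, since the slide only shows plots. The function names are hypothetical, and a tiny block size is used for illustration rather than the 2048 and 256 sample windows of the deck.

```python
import math

def mlt_basis(M):
    """Textbook MLT basis: M functions of length 2M (sine window times cosine)."""
    scale = math.sqrt(2.0 / M)
    return [[scale
             * math.sin((n + 0.5) * math.pi / (2 * M))                      # sine window
             * math.cos((n + (M + 1) / 2) * (k + 0.5) * math.pi / M)        # modulation
             for n in range(2 * M)]
            for k in range(M)]

def mlt_forward(x, M):
    """Analyse x in blocks of 2M samples that advance by M samples (50% overlap)."""
    P = mlt_basis(M)
    blocks = []
    for start in range(0, len(x) - 2 * M + 1, M):
        seg = x[start:start + 2 * M]
        blocks.append([sum(p * s for p, s in zip(row, seg)) for row in P])
    return blocks

def mlt_inverse(blocks, M):
    """Synthesise with the same basis and overlap-add neighbouring blocks."""
    P = mlt_basis(M)
    y = [0.0] * (M * (len(blocks) + 1))
    for b, coeffs in enumerate(blocks):
        for k, c in enumerate(coeffs):
            for n in range(2 * M):
                y[b * M + n] += c * P[k][n]
    return y
```

Because the basis is orthogonal across overlapping blocks, every sample covered by two blocks is reconstructed exactly (up to floating-point error); only the first and last M samples of a finite signal are not.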

17/75

MLT with Window Switching

Features:
  • Basic window size: 2048
  • Short window size: 256
  • Switching criterion: a frame (2048 samples) is switched to the short window if and only if
      • its energy is bigger than a certain threshold,
      • the energy within the 8 subframes (256 samples each) differs by more than Ta, and
      • there are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by Tb.
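The switching criterion can be sketched in Python as follows. This is an illustration under stated assumptions, not the coder's implementation: the function and parameter names are hypothetical, "differs more than Ta" is read as a max/min energy ratio, "greater by Tb" is read as an energy ratio between adjacent subframes, and the threshold values in the usage note are made up.

```python
def use_short_window(frame, t_energy, ta, tb, sub_len=256):
    """Return True if the frame should be coded with short windows."""
    # Energy of each subframe (8 subframes for a 2048-sample frame).
    energies = [sum(v * v for v in frame[i:i + sub_len])
                for i in range(0, len(frame), sub_len)]
    eps = 1e-12  # avoids division-like blow-ups on silent subframes

    # 1) The frame must carry enough energy overall.
    if sum(energies) <= t_energy:
        return False
    # 2) Subframe energies must differ by more than Ta (assumed: max/min ratio).
    if max(energies) <= ta * (min(energies) + eps):
        return False
    # 3) Some pair of neighbouring subframes must show the former exceeding
    #    the latter by Tb (assumed: ratio), as worded on the slide.
    return any(energies[i] > tb * (energies[i + 1] + eps)
               for i in range(len(energies) - 1))
```

For example, with `t_energy=1.0, ta=10.0, tb=4.0`, a stationary frame stays on the long window, while a frame whose first subframe is much louder than the rest is switched.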

18/75

Band Separation

[Diagram: audio (44.1 kHz sampling), MLT with window switching, band separation]

19/75

Synthesis (Half Sampling)

[Diagram: audio (22.05 kHz sampling), MLT with window switching, band separation]

20/75

Synthesis (Quarter Sampling)

[Diagram: audio (11.025 kHz sampling), MLT with window switching, band separation]

21/75

Quantizer

  • Input: coefficients
  • Output: quantized coefficients
  • Goal: convert the coefficients from float to integer
      • Reduce the number of signal levels
      • Fast implementation of entropy coding

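22/75

Quantizer

Scalar quantizer with a deadzone

[Equation and figure: each coefficient is mapped to a quantized magnitude and a sign; the quantization formula itself is not recoverable from the transcript.]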
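A generic deadzone scalar quantizer that outputs a magnitude and a sign can be sketched as below. Because the slide's formula did not survive extraction, this is the common textbook form and only an assumption about what the deck uses. The step size `delta`, the function names, and the mid-point reconstruction rule are all hypothetical.

```python
import math

def quantize(x, delta):
    """Map a float coefficient to (quantized magnitude, sign).

    Values with |x| < delta fall into the deadzone and quantize to magnitude 0.
    """
    magnitude = int(math.floor(abs(x) / delta))
    sign = -1 if x < 0 else 1
    return magnitude, sign

def dequantize(magnitude, sign, delta):
    """Reconstruct at the middle of the quantization bin (0 stays 0)."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + 0.5) * delta
```

Splitting the output into magnitude and sign matches the entropy coder's view of the data, where magnitude bits and sign bits are coded separately.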

23/75

Lossless (Integer) Pass

24/75

Key to Achieving Lossless Coding

  • Break the MLT into small steps
  • Make every step reversible
  • Definition of a reversible transform
      • Integer input, integer output
      • The transform should have a determinant of 1 (it does not expand the data volume)

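25/75

MLT Framework

[Diagram. Forward MLT blocks: Window; Pre-rotate; Complex FFT; Post-rotation. The rotation and FFT blocks are labelled "DCT IV", and the whole chain is labelled "Lapped Transform". Inverse MLT blocks: Pre-rotate^-1; Complex FFT^-1; Post-rotation^-1; Inverse window.]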

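26/75

Window Operation

[Diagram: the window operation is drawn as a "Complex Rotate" block acting on the sample pair x(n), x(-n-1); the rotation-angle expression (in terms of n and N) is not recoverable from the transcript.]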

27/75

Pre-Rotation

[Diagram: inputs xw(0) to xw(7) pass in pairs through four "Complex Rotate" blocks with angles -π/32, -5π/32, -9π/32, and -13π/32, giving outputs xp(0) to xp(7).]

28/75

FFT (4 Point Complex)

[Diagram: inputs xp(0) to xp(7) are treated as four complex values xc(0) to xc(3). Butterflies with sign inversions and one twiddle factor e^{-jπ/2} produce yc(0) to yc(3), which are written out as yp(0) to yp(7).]

29/75

Post-Rotation

[Diagram: inputs yp(0) to yp(7) pass in pairs through four "Conjugate Rotate" blocks with angles -0, -π/8, -2π/8, and -3π/8, giving outputs y(0) to y(7).]

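30/75

Reversible MLT

Make the following operations reversible:
  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

31/75

Reversible Unit Transform

[Equations: a 2x2 transform of the pair (a, b), written as a sequence of steps with an intermediate value t and constants c; the exact expressions are not recoverable from the transcript.]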
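One widely used way to make a rotation reversible on integers is to factor it into three lifting (shear) steps and round inside each step. The sketch below shows that generic technique; whether it matches the deck's exact equations cannot be confirmed from the transcript. The function names and the rounding rule are assumptions.

```python
import math

def _rnd(v):
    """Round to nearest integer (ties toward +infinity); any fixed rule works."""
    return math.floor(v + 0.5)

def forward_rotate(a, b, theta):
    """Integer-to-integer approximation of a rotation by theta (sin(theta) != 0)."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a + _rnd(c0 * b)   # lifting step 1: only a changes
    b = b + _rnd(c1 * a)   # lifting step 2: only b changes
    a = a + _rnd(c0 * b)   # lifting step 3: only a changes
    return a, b

def inverse_rotate(a, b, theta):
    """Undo forward_rotate exactly by subtracting the same rounded terms in reverse."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a - _rnd(c0 * b)
    b = b - _rnd(c1 * a)
    a = a - _rnd(c0 * b)
    return a, b
```

Each lifting step is a shear with determinant 1 and changes only one of the two values, so it can always be undone exactly regardless of the rounding, which is the property the "determinant of 1" requirement asks for.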

32/75

Entropy Coder

  • Input: quantized coefficients
  • Output: embedded coded bitstream with R-D performance curve
  • Goal
      • Compression
      • Embedded bitstream for future manipulation

33/75

Frame Grouping

[Diagram: frames 1 to 8 grouped into time slots]

34/75

Entropy Coder

[Diagram: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R)]

35/75

Entropy Coder

  • Embedded coding
  • Implicit psychoacoustic masking
  • Context modeling
  • Arithmetic coding
  • Implementation concerns

36/75

A block of coefficients

45 0 0 0    -74 -13 0 0
21 0 4 0    14 0 23 23
0 0 0 0     3 0 4 0
0 3 5 0     0 0 0 0
0 1 -1 0    -4 33 0 -1
0 0 1 0     0 0 0 0
-4 5 0 0    -18 0 0 19
4 0 23 0    -1 0 0 0

37/75

Bits of Coefficients

Coefficient  Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0             45     0  1  0  1  1  0  1    +
w1            -74     1  0  0  1  0  1  0    -
w2             21     0  0  1  0  1  0  1    +
w3             14     0  0  0  1  1  1  0    +
w4             -4     0  0  0  0  1  0  0    -
w5            -18     0  0  1  0  0  1  0    -
w6              4     0  0  0  0  1  0  0    +
w7             -1     0  0  0  0  0  0  1    -

38/75

Conventional Coding

Coefficients are coded one at a time, all bits of each: first w0, second w1, third w2.

Coefficient   b1 b2 b3 b4 b5 b6 b7   Sign
w0             0  1  0  1  1  0  1    +
w1             1  0  0  1  0  1  0    -
w2             0  0  1  0  1  0  1    +

Decoded values (w0 to w7): 46, -74, 22, 0, 0, 0, 0, 0

39/75

Embedded Coding

Bits are coded one bit-plane at a time, most significant first; the sign follows the first 1 bit of a coefficient.

Pass          w0   w1   w2   w3   w4   w5   w6   w7
First (b1)    0    1 -  0    0    0    0    0    0
Second (b2)   1 +  0    0    0    0    0    0    0
Third (b3)    0    0    1 +  0    0    1 -  0    0

Coefficient   Value   Range
w0              40    32 to 47
w1             -72    -79 to -64
w2              24    16 to 31
w3               0    -31 to 31
w4               0    -31 to 31
w5             -24    -31 to 31
w6               0    -31 to 31
w7               0    -31 to 31
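The bit-plane view of the eight example coefficients can be reproduced with a short sketch. It extracts one bit-plane at a time (most significant first) and computes what a decoder knows after a given number of planes. The function names are hypothetical, the 7-bit magnitude width is taken from the b1 to b7 columns, and the mid-point reconstruction is an assumption; the intervals are derived from the known leading bits rather than copied from the slide.

```python
def bit_plane(coeffs, plane, total_bits=7):
    """Bits of |c| in bit-plane `plane` (1 = most significant) for each coefficient."""
    shift = total_bits - plane
    return [(abs(c) >> shift) & 1 for c in coeffs]

def decode_after_planes(coeffs, planes, total_bits=7):
    """(value, low, high) known to the decoder after `planes` bit-planes."""
    shift = total_bits - planes          # number of magnitude bits still unknown
    span = 1 << shift                    # size of the uncertainty interval
    out = []
    for c in coeffs:
        top = abs(c) >> shift            # the magnitude bits received so far
        if top == 0:
            # Still insignificant: sign unknown, magnitude below `span`.
            out.append((0, -(span - 1), span - 1))
            continue
        low = top << shift
        high = low + span - 1
        mid = low + span // 2            # reconstruct near the interval centre
        out.append((mid, low, high) if c > 0 else (-mid, -high, -low))
    return out

coeffs = [45, -74, 21, 14, -4, -18, 4, -1]   # w0..w7 from the slide
```

For instance, `bit_plane(coeffs, 3)` gives the third bit-plane, and `decode_after_planes(coeffs, 3)` gives the reconstruction after three planes.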

40/75

Audio Masking

[Figure: masking over frequency. Labels: critical band, neighboring band, signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, noise-to-mask ratio.]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking; all existing approaches):
  • Calculate the mask
  • Transmit the mask
  • Modify the transform coefficients (or the coding approach) according to the mask
  • Encode the transform coefficients

Note: the mask modifies the coding content.

42/75

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same.

Implicit masking:
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

43/75

Embedded Coding with Implicit Psychoacoustic Masking

Pass          w0   w1   w2   w3   w4   w5   w6   w7
First         0    1 -  0    0    0    0    0    0

Coefficient   Value   Range
w0               0    -63 to 63
w1             -96    -127 to -64
w2               0    -63 to 63
w3               0    -63 to 63
w4               0    -63 to 63
w5               0    -63 to 63
w6               0    -127 to 127
w7               0    -127 to 127

[Legend: significant coefficient, insignificant coefficient, mask]

44/75

Embedded Coding with Implicit Psychoacoustic Masking

Pass          w0   w1   w2   w3   w4   w5   w6   w7
First         0    1 -  0    0    0    0    0    0
Second        1 +  0    0    0    0    0    0    0

Coefficient   Value   Range
w0              48    32 to 63
w1             -96    -127 to -64
w2               0    -31 to 31
w3               0    -31 to 31
w4               0    -31 to 31
w5               0    -31 to 31
w6               0    -63 to 63
w7               0    -63 to 63

[Legend: significant coefficient, insignificant coefficient]

45/75

Context Modeling

Context:
  • Zero coding
      • Significance statuses of neighbor coefficients
  • Refinement
      • Whether it is the 1st refinement pass
      • Significance statuses of neighbor coefficients
  • Sign
      • Neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

45 0 0 0    -74 -13 0 0
21 0 4 0    14 0 23 23
0 0 0 0     3 0 4 0
0 3 5 0     0 0 0 0
0 1 -1 0    -4 33 0 -1
0 0 1 0     0 0 0 0
-4 5 0 0    -18 0 0 19
4 0 23 0    -1 0 0 0

Bit (to be encoded):            0 1 1 0 0 0 0 0 0 1 0  0 0 0 0 0 0 0 0 ……
Ctx (automatically generated):  0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……

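47/75

Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Diagram: the interval from 0 to 1 is subdivided once per symbol, with sub-interval widths P0 and 1-P0, then P1 and 1-P1, then P2 and 1-P2, for the symbols S0=0, S1=1, S2=0. The final interval is A.]

Coding result: 0100 (the shortest binary bitstream that ensures the interval from B = 0100 0000000 to C = 0100 1111111 lies inside A, i.e. (B, C) ⊂ A)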
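The interval-narrowing idea in this illustration can be sketched with exact fractions. This is a teaching sketch of plain binary arithmetic coding, not the QM coder named in the title. The probabilities, the convention that symbol 0 takes the lower part of the interval, and the function names are all assumptions.

```python
from fractions import Fraction

def narrow(symbols, p0s):
    """Return the final interval [low, high) after coding binary symbols.

    p0s[i] is the assumed probability that symbol i equals 0; symbol 0 takes
    the lower part of the current interval, symbol 1 the upper part.
    """
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p0s):
        if s == 0:
            width *= p0
        else:
            low += width * p0
            width *= 1 - p0
    return low, low + width

def shortest_code(low, high):
    """Shortest bit string b such that every number 0.b... lies in [low, high)."""
    n = 1
    while True:
        step = Fraction(1, 2 ** n)
        k = -((-low) // step)            # smallest k with k * step >= low
        if (k + 1) * step <= high:       # the whole dyadic cell fits inside
            return format(k, "0{}b".format(n))
        n += 1

low, high = narrow([0, 1, 0], [Fraction(1, 2)] * 3)
code = shortest_code(low, high)
```

With equal probabilities the three symbols 0, 1, 0 narrow the interval to [1/4, 3/8) and the code is `010`; skewed probabilities give a different interval and therefore a different code.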

48/75

Entropy Coder (Summary)

[Diagram: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R)]

49/75

Speed Up Issues

  • Context modeling
      • Use stored context
      • Update the context when a coefficient becomes significant
  • Implicit masking
      • Fast calculation of energy in a critical band
      • Lookup table to convert energy to mask
  • R-D curve calculation
      • Lookup-table calculation of distortion
  • Context entropy coder
      • QM coder
      • Run-length Rice coder
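As background for the Rice coder mentioned in the last item, a plain Rice code for one non-negative integer (such as a run length) can be sketched as follows. How the deck forms its runs and picks the parameter is not stated, so the parameter `k`, the bit-string representation, and the function names are assumptions.

```python
def rice_encode(value, k):
    """Rice code: quotient value >> k in unary (ones, then a zero), then k remainder bits."""
    quotient = value >> k
    remainder = value & ((1 << k) - 1)
    bits = "1" * quotient + "0"
    if k:
        bits += format(remainder, "0{}b".format(k))
    return bits

def rice_decode(bits, k):
    """Decode one Rice code from the front of `bits`; return (value, bits consumed)."""
    pos = 0
    quotient = 0
    while bits[pos] == "1":      # unary part
        quotient += 1
        pos += 1
    pos += 1                     # skip the terminating zero
    remainder = int(bits[pos:pos + k], 2) if k else 0
    return (quotient << k) | remainder, pos + k
```

The appeal for speed is that encoding and decoding need only shifts and bit tests, with no probability state to update.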

50/75

Bitstream Assembly

  • Input: bitstream, R-D curve
  • Output: assembled bitstream, companion file

[Diagram: bitstream assembling]

51/75

EAC Bitstream Syntax

Timeslot header:
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Layout: EAC marker | Global header | Timeslot (head, body) | Timeslot (head, body) | Timeslot (head, body)]
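Since each channel body is an embedded bitstream, a parser can derive a lower-rate stream by keeping a prefix of each body and dropping channels that are not wanted. The sketch below illustrates only that idea. The in-memory `Timeslot` type, the channel names, and the equal split of the byte budget between channels are hypothetical; the real parser would use the R-D curves in the companion file and rewrite the timeslot headers.

```python
from dataclasses import dataclass

@dataclass
class Timeslot:
    channels: dict  # channel name -> embedded channel bitstream (bytes)

def parse(master, budgets, keep=("L+R", "L-R")):
    """Build an application stream from a master stream.

    master  : list of Timeslot
    budgets : bytes allowed for each timeslot
    keep    : channels to retain (keep only "L+R" for a mono stream)
    """
    out = []
    for slot, budget in zip(master, budgets):
        kept = {name: body for name, body in slot.channels.items() if name in keep}
        share = budget // max(len(kept), 1)      # assumed: equal split per channel
        # Truncating an embedded bitstream yields a valid lower-rate bitstream.
        out.append(Timeslot({name: body[:share] for name, body in kept.items()}))
    return out
```

Because the output is formed only by selecting and truncating existing bytes, no re-encoding is needed, which is consistent with the claim that parsing is fast.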

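52/75

Companion File

[Layout: Global header | Timeslot (head, R-D curve) | Timeslot (head, R-D curve) | Timeslot (head, R-D curve)]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four rate-distortion curves (D1 versus R1, D2 versus R2, D3 versus R3, D4 versus R4), shown twice; the second set marks the selected rates r1, r2, r3, r4.]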
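One common way to pick the rates r1 to r4 is to spend bytes where the distortion drops fastest per byte, which for convex curves ends with all units at roughly the same R-D slope. The sketch below shows that generic greedy allocation; the transcript does not spell out the author's procedure, and the function name, the curve representation, and the sample numbers are assumptions.

```python
import heapq

def allocate(curves, budget):
    """Greedy R-D allocation.

    curves : one list per unit of (rate, distortion) points, with rate
             increasing and distortion decreasing (assumed convex).
    budget : total bytes available.
    Returns the rate chosen for each unit.
    """
    idx = [0] * len(curves)                      # current point on each curve
    spent = sum(curve[0][0] for curve in curves)
    heap = []

    def push_next(u):
        i = idx[u]
        if i + 1 < len(curves[u]):
            (r0, d0), (r1, d1) = curves[u][i], curves[u][i + 1]
            slope = (d0 - d1) / (r1 - r0)        # distortion saved per byte
            heapq.heappush(heap, (-slope, u))    # max-heap via negation

    for u in range(len(curves)):
        push_next(u)

    while heap:
        _, u = heapq.heappop(heap)
        cost = curves[u][idx[u] + 1][0] - curves[u][idx[u]][0]
        if spent + cost > budget:
            break                                # steepest remaining step does not fit
        spent += cost
        idx[u] += 1
        push_next(u)
    return [curves[u][idx[u]][0] for u in range(len(curves))]

curves = [[(0, 100), (10, 40), (20, 20)],
          [(0, 50), (10, 35), (20, 30)]]
```

With these made-up curves and a 30-byte budget, the first unit receives 20 bytes and the second 10.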

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve. Axes: buffer occupancy (bytes) versus time (timeslots). Two illegal regions are marked.]

55/75

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$B_i = Buf_{i-1} - Buf_i + Rate_{trans} \cdot Time$

where
  • $B_i$: allocated bytes for timeslot $i$
  • $Buf_i$: buffer occupancy level at timeslot $i$
  • $Rate_{trans}$: coding (network) rate per second
  • $Time$: time duration of the timeslot
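As a quick numeric check of the allocation formula, the sketch below computes the bytes for each timeslot from a sequence of buffer occupancy levels. The function name, the choice of bytes per second as the rate unit, and the sample numbers are assumptions.

```python
def allocated_bytes(buf_levels, rate_bytes_per_s, slot_seconds):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time for i = 1 .. len(buf_levels) - 1."""
    per_slot = rate_bytes_per_s * slot_seconds   # bytes delivered during one timeslot
    return [buf_levels[i - 1] - buf_levels[i] + per_slot
            for i in range(1, len(buf_levels))]

# Buffer drops from 4000 to 3000 bytes, then rises to 3500 bytes.
example = allocated_bytes([4000, 3000, 3500], rate_bytes_per_s=2000, slot_seconds=0.5)
```

A falling buffer level adds to the bytes available for the timeslot, and a rising level takes away from them.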

56/75

Optimization

Given:
  • Initial buffer occupancy level
  • Final buffer occupancy level (or an intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the number of bytes allocated to the current timeslot.

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions. Labels: underflow (too many bytes), overflow (too few bytes), waste bytes.]

58/75

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve. Axes: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

60/75

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

61/75

Modular Software Design

[Diagram: the audio is carried as two channels, L+R (or mono) and L-R. Each channel passes through MLT(SW), Quantizer, Entropy coder, and Bitstream assembly to form the bitstream.]

62/75

Modular Software Design

  • Highly modularized pipeline design
      • The quantizer and entropy coder can be used for image/video compression as well
      • Probes and data inputs can be inserted into any part of the program
  • Data-flow driven (with the necessary memory regulator <buffer>)
      • No long delay
      • No need for large memory
  • Memory and computation efficient
      • Working memory preallocated

63/75

Experimental Results

64/75

EAC – Highly Efficient (NMR)

Results are based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.

Codec        48kbps   32kbps   16kbps   8kbps
MP4 TwinVQ    4.48     5.71     7.00     7.48
WMA           0.40     3.25     5.56     8.47
EAC          -2.2      2.80     5.68     6.69

65/75

EAC – Lossless

Results are based on the average of 16 MPEG-4 test clips.

Codec            Compression Ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32

66/75

EAC (Versatile)

Versatile:
  • Real-time 2-way communication (low delay mode)
  • Storage devices (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

  • Reducing the frame size
  • Timeslot = 1 frame
  • Fixed-length timeslot bitstream
  • Delay = 2 frames (ignoring encoding/decoding delay)
  • Network transmission time (for a modem line, delay = 3 frames)

68/75

EAC (Low Delay Mode)

[Diagram: timeline over frames i-1, i, i+1. Encoder: start encoding frame i, then MLT, Quantizer, Entropy, giving the bitstream. The bitstream crosses the network. Decoder: start decoding frame i, then Entropy, Quantizer. A marker shows the point labelled "Playable here".]

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax:
  • The parser may reassemble the bitstream at 1000x real time
  • Change
      • bitrate
      • number of audio channels
      • audio sampling rate

70/75

EAC – Software

Software:
  • Encoder: 8x real time (stereo, 44.1 kHz sampling)
  • Decoder: 20x real time
  • Parser: 1000x real time

71/75

EAC – Encoder

[Demo diagram: the encoder turns the audio into a stereo 128 kbps bitstream and a companion file]

72/75

EAC – Parser

[Demo diagram: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces:]
  • Stereo, 16 kbps
  • Mono, 8 kbps
  • Stereo, 16 kbps, slow start
  • Mono, 8 kbps, 11 kHz sampling

73/75

EAC – Decoder

[Demo diagram: the decoder plays:]
  • Stereo, 16 kbps
  • Mono, 8 kbps
  • Stereo, 16 kbps, slow start
  • Mono, 8 kbps, 11 kHz sampling

74/75

Comparison

[Demo clips: Original, MP4 TwinVQ, WMA, EAC, MP3]

75/75

Conclusions

An embedded audio coder is developed:
  • Highly efficient
  • Versatile
      • Low delay, constant bitrate, streaming
  • Flexible bitstream
      • Parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available
      • Real-time execution, small memory footprint



375

Introduction

475

Introduction ndash Audio Compression

Audio Waveform

Bitstream

575

EAC vs Other Compression

Existing audio compression schemes MP3 AAC MPEG4 audio WMA Real Audio hellip

Why research for a new audio codec

675

Media vs File Compression

File compression Every bit is important has to be compressed

losslessly

Media compression Exact bitvalue is not important distortion is

tolerable Amount of media is huge high compression ratio is

required Media needs adaptation

775

Key Features of EAC

Not only good compression performance

But also flexible bitstream syntax The compressed bitstream may be manipulated for

Different bitrate Different of audio channels Different audio sampling rate

Versatile Lossless Low delay Streamingstorage application

875

EAC Encoder

Encoder

Master Bitstream

Companion File

975

Parser

Except header application bitstream is a subset of the master bitstream (parsing is fast)

May be changed according to the required bitrate of audio channels and audio sampling rate

Parser

Master Bitstream

Companion File

Application Bitstream

1075

EAC Decoder

Encoder

Bitstream

Speaker (Direct Sound)

wav file

11/75

Embedded Audio Coder – Algorithm Description

12/75

Framework – Encoder

[Diagram: the input audio is coded as two channels, and the output is the bitstream.]

  • L+R (or mono): Transform → Entropy coder → Bitstream Assembly
  • L-R: Transform → Entropy coder → Bitstream Assembly

13/75

Audio Transform

  • Input: audio samples
  • Output: transform coefficients
  • Goal: convert audio from the time domain to the frequency domain
      • Compact energy
      • Better match with psychoacoustic characteristics
      • Enable audio sampling rate change

14/75

Lossy vs Lossless Mode

  • Lossy mode: Audio → MLT(SW) → Quantization
  • Lossless mode: Audio → Reversible MLT(SW)

15/75

Lossy (Float) Pass

16/75

MLT - Modulated Lapped Transforms

[Figure: two panels. "Spatial Response": amplitude from -1 to 1 over sample indices 0 to 1000. "Frequency Domain": gain (dB) from -100 to 40 versus frequency (in units of pi) from 0 to 1.]

17/75

MLT with Window Switching

  • Features
      • Basic window size: 2048
      • Short window size: 256
  • Switching criterion: a frame (2048 samples) is switched to the short window if and only if
      • Its energy is bigger than a certain threshold
      • The energy within the 8 subframes (256 samples each) differs by more than Ta
      • There are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by Tb
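The switching criterion can be sketched in a few lines. This is only an illustration under stated assumptions, not the coder's actual test: the slide does not say how "differs by more than Ta" or "greater by Tb" are measured, so the sketch assumes energy ratios, and it reads "two neighboring subframes" as one adjacent pair. The function names, the parameter `energy_threshold`, and the division guard are hypothetical.

```python
FRAME_SIZE = 2048      # basic window size from the slide
SUBFRAME_SIZE = 256    # short window size from the slide


def subframe_energies(frame):
    """Return the energy (sum of squares) of each 256-sample subframe."""
    if len(frame) != FRAME_SIZE:
        raise ValueError("expected a frame of %d samples" % FRAME_SIZE)
    return [
        sum(x * x for x in frame[start:start + SUBFRAME_SIZE])
        for start in range(0, FRAME_SIZE, SUBFRAME_SIZE)
    ]


def use_short_window(frame, energy_threshold, t_a, t_b):
    """Decide whether a frame should be coded with short windows.

    Assumptions (not from the slides): t_a and t_b are energy ratios,
    and one adjacent pair of subframes is enough for the third test.
    """
    energies = subframe_energies(frame)
    floor = 1e-12  # hypothetical guard against division by zero

    # Test 1: the frame must carry enough energy overall.
    if sum(energies) <= energy_threshold:
        return False

    # Test 2: the subframe energies must be spread by more than t_a.
    if max(energies) / max(min(energies), floor) <= t_a:
        return False

    # Test 3: some subframe must exceed its successor by more than t_b.
    return any(
        former / max(latter, floor) > t_b
        for former, latter in zip(energies, energies[1:])
    )
```

A stationary frame fails the second test and keeps the long window, while a frame whose first subframe is much louder than the rest passes all three tests.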

18/75

Band Separation

[Diagram labels: Audio (44.1 kHz sampling), MLT with window switching, Band separation]

19/75

Synthesis (Half Sampling)

[Diagram labels: Audio (22.05 kHz sampling), MLT with window switching, Band separation]

20/75

Synthesis (Quarter Sampling)

[Diagram labels: Audio (11.025 kHz sampling), MLT with window switching, Band separation]

21/75

Quantizer

  • Input: coefficients
  • Output: quantized coefficients
  • Goal: convert coefficients from float to integer
      • Reduce signal levels
      • Fast implementation of entropy coding

22/75

Quantizer

Scalar quantizer with a deadzone

[Equation and figure: the deadzone quantizer outputs a quantized magnitude and a sign for each coefficient.]
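A deadzone scalar quantizer of the kind named on this slide can be sketched as follows. The slide's own formula did not survive, so this uses the common textbook form (magnitude = floor(|s| / step), with the sign kept separately) as an assumption. The step size `step`, the function names, and the midpoint reconstruction rule are all hypothetical.

```python
import math


def deadzone_quantize(coeffs, step):
    """Map each float coefficient to a (magnitude, sign) pair.

    Values with |c| < step fall into the deadzone and get magnitude 0.
    """
    pairs = []
    for c in coeffs:
        magnitude = int(math.floor(abs(c) / step))
        sign = -1 if c < 0 else 1
        pairs.append((magnitude, sign))
    return pairs


def deadzone_dequantize(pairs, step):
    """Reconstruct at the midpoint of each nonzero quantization bin."""
    return [
        sign * (magnitude + 0.5) * step if magnitude else 0.0
        for magnitude, sign in pairs
    ]
```

Because everything with magnitude below `step` maps to zero, the zero bin is twice as wide as the other bins, which is what the term "deadzone" refers to.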

23/75

Lossless (Integer) Pass

24/75

Key to Achieve Lossless

  • Break the MLT into small steps
  • Make every step reversible
  • Definition of a reversible transform
      • Integer input, integer output
      • The transform should have a determinant of 1 (it does not expand the data volume)

25/75

MLT Framework

[Diagram: the lapped transform. Forward MLT: Window, then DCT IV (Pre-Rotate, Complex FFT, Post Rotation). Inverse MLT: the inverse stages (Pre-Rotate⁻¹, Complex FFT⁻¹, Post Rotation⁻¹, Inv Window).]

26/75

Window Operation

[Diagram: the window operation applies a Complex Rotate to each pair x(n), x(-n-1), with a rotation angle that depends on n and N.]

27/75

Pre-Rotation

[Diagram: the windowed inputs xw(0) … xw(7) pass through four complex rotations, by -π/32, -5π/32, -9π/32, and -13π/32, producing xp(0) … xp(7).]

28/75

FFT (4 Point Complex)

[Diagram: xp(0) … xp(7) are treated as four complex values xc(0) … xc(3). Butterfly operations, with one twiddle factor e^(-jπ/2), produce yc(0) … yc(3), which are unpacked again into eight real values.]

29/75

Post-Rotation

[Diagram: the FFT outputs yp(0) … yp(7) pass through four conjugate rotations, by -0, -π/8, -2π/8, and -3π/8, producing y(0) … y(7).]

30/75

Reversible MLT

  • Make the following operations reversible
      • Butterfly operation
      • Complex rotation
      • Conjugate rotation

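31/75

Reversible Unit Transform

[Equations garbled in extraction. Recoverable symbols: the input pair (a, b), a 2×2 matrix with entries of magnitude 1, an intermediate value t, and coefficients c0, c1, c2.]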
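One standard way to make a rotation reversible on integers is to factor it into three lifting (shear) steps, each with determinant 1, and round inside every step. The sketch below shows that well-known technique. It is offered as an illustration of the idea, not as the exact factorization on the slide, whose coefficients did not survive. The function names are hypothetical, and `theta` must not be a multiple of π.

```python
import math


def _round(x):
    """Deterministic rounding shared by the forward and inverse transform."""
    return math.floor(x + 0.5)


def lifting_rotate(a, b, theta):
    """Integer-to-integer approximation of rotating (a, b) by theta.

    Uses R(theta) = U(p) * L(s) * U(p) with p = (cos(theta) - 1) / sin(theta)
    and s = sin(theta). Each factor is a shear with determinant 1.
    """
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    a = a + _round(p * b)   # first shear
    b = b + _round(s * a)   # second shear
    a = a + _round(p * b)   # third shear
    return a, b


def lifting_rotate_inverse(a, b, theta):
    """Undo lifting_rotate exactly by subtracting the same rounded terms."""
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    a = a - _round(p * b)
    b = b - _round(s * a)
    a = a - _round(p * b)
    return a, b
```

Each step changes only one of the two values, using a rounded function of the other. The inverse can therefore recompute the same rounded term and subtract it, so integer inputs are recovered exactly even though the rotation itself is only approximated.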

32/75

Entropy Coder

  • Input: quantized coefficients
  • Output: embedded coded bitstream with an R-D performance curve
  • Goal
      • Compression
      • Embedded bitstream for future manipulation

33/75

Frame Grouping

[Diagram labels: Time slot (1 to 8), Frame]

34/75

Entropy Coder

[Diagram: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

35/75

Entropy Coder

  • Embedded coding
  • Implicit psychoacoustic masking
  • Context modeling
  • Arithmetic coding
  • Implementation concerns

36/75

A block of coefficients

45 0 0 0 -74 -13 0 0
21 0 4 0 14 0 23 23
0 0 0 0 3 0 4 0
0 3 5 0 0 0 0 0
0 1 -1 0 -4 33 0 -1
0 0 1 0 0 0 0 0
-4 5 0 0 -18 0 0 19
4 0 23 0 -1 0 0 0

37/75

Bits of Coefficients

Coefficient   Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0              45     0  1  0  1  1  0  1    +
w1             -74     1  0  0  1  0  1  0    -
w2              21     0  0  1  0  1  0  1    +
w3              14     0  0  0  1  1  1  0    +
w4              -4     0  0  0  0  1  0  0    -
w5             -18     0  0  1  0  0  1  0    -
w6               4     0  0  0  0  1  0  0    +
w7              -1     0  0  0  0  0  0  1    -

38/75

Conventional Coding

[Diagram: the coefficients are coded one after another (first w0, second w1, third w2), each with all of its bits b1 to b7 and its sign. With only these three coefficients coded, the decoded values of w0 to w7 are 46, -74, 22, 0, 0, 0, 0, 0.]

39/75

Embedded Coding

The bits are coded bitplane by bitplane, and the sign is coded together with the first nonzero bit of a coefficient:

  • First (b1 of w0 to w7): 0, 1 -, 0, 0, 0, 0, 0, 0
  • Second (b2 of w0 to w7): 1 +, 0, 0, 0, 0, 0, 0, 0
  • Third (b3 of w0 to w7): 0, 0, 1 +, 0, 0, 1 -, 0, 0

Coefficient   Value   Range
w0              40    32 to 47
w1             -72    -79 to -64
w2              24    16 to 31
w3               0    -31 to 31
w4               0    -31 to 31
w5             -24    -31 to 31
w6               0    -31 to 31
w7               0    -31 to 31
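The effect of cutting an embedded bitstream after a few bitplanes can be sketched on the eight example coefficients. This is a minimal illustration, not the coder itself: it ignores entropy coding and masking, assumes 7 magnitude bitplanes as in the table of bits, and reconstructs each significant coefficient at the middle of its remaining range. The function name and parameters are hypothetical.

```python
def decode_after_bitplanes(coeffs, planes_sent, total_planes=7):
    """Reconstruct coefficients when only the top `planes_sent` bitplanes arrived."""
    shift = total_planes - planes_sent      # number of bitplanes still missing
    step = 1 << shift                       # width of the remaining uncertainty
    decoded = []
    for c in coeffs:
        known = (abs(c) >> shift) << shift  # magnitude bits received so far
        if known == 0:
            decoded.append(0)               # coefficient is still insignificant
        else:
            magnitude = known + step // 2   # middle of the remaining range
            decoded.append(-magnitude if c < 0 else magnitude)
    return decoded


example = [45, -74, 21, 14, -4, -18, 4, -1]
print(decode_after_bitplanes(example, 3))   # → [40, -72, 24, 0, 0, -24, 0, 0]
```

With three bitplanes the sketch gives 40, -72, and 24 for w0, w1, and w2, the same values as the slide, and with all seven bitplanes it returns the original coefficients. Every prefix of the bitstream is therefore decodable, which is the property the parser relies on.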

40/75

Audio Masking

[Figure: level versus frequency, showing the critical band and a neighboring band, the signal, the masking threshold, the maximum mask, the noise level, the signal-to-mask ratio, and the noise-to-mask ratio.]

41/75

Psychoacoustic Masking

  • Traditional approach (explicit masking, all existing approaches)
      • Calculate the mask
      • Transmit the mask
      • Modify the transform coefficients (or the coding approach) according to the mask
      • Encode the transform coefficients
  • Note: the mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

  • Key: the mask modifies the coding order; the content is the same
  • Implicit masking
      • Calculate the static masking (Fletcher-Munson threshold)
      • Encode the MSB of the transform coefficients
      • Calculate the mask based on the MSB of the coefficients
      • Modify the coding order
      • Encode the next most important part of the coefficients
      • Repeat the process
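The key point, that a mask changes the order in which bitplanes are coded but not what is coded, can be sketched with a fixed mask. This is a simplified illustration under assumptions: the slides recompute the mask from the already coded bits after every pass, while this sketch takes one hypothetical integer offset per band (in units of bitplanes) and never updates it. The function name and the notion of a "band offset" are illustrative, not taken from the source.

```python
def coding_order(mask_offsets, num_bitplanes):
    """Order the (band, bitplane) coding units for a fixed per-band mask.

    mask_offsets[band] is a hypothetical delay, in bitplanes, for that band.
    Bitplane 0 is the most significant. A unit is coded earlier when its
    bitplane index plus its band's offset is smaller.
    """
    units = [
        (plane + offset, band, plane)
        for band, offset in enumerate(mask_offsets)
        for plane in range(num_bitplanes)
    ]
    units.sort()
    return [(band, plane) for _, band, plane in units]


# Band 1 is masked by one bitplane, so each of its planes is coded one pass later.
print(coding_order([0, 1], 3))
# → [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (1, 2)]
```

Every (band, bitplane) unit still appears exactly once, so a full decode is unaffected by the mask. Only a truncated bitstream differs, because the masked band has received fewer bitplanes at the cut point.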

43/75

Embedded Coding with Implicit Psychoacoustic Masking

After the first pass:

Coefficient   Value   Range
w0               0    -63 to 63
w1             -96    -127 to -64
w2               0    -63 to 63
w3               0    -63 to 63
w4               0    -63 to 63
w5               0    -63 to 63
w6               0    -127 to 127
w7               0    -127 to 127

[Figure legend: significant coefficient, insignificant coefficient, mask]

44/75

Embedded Coding with Implicit Psychoacoustic Masking

After the first and second passes:

Coefficient   Value   Range
w0              48    32 to 63
w1             -96    -127 to -64
w2               0    -31 to 31
w3               0    -31 to 31
w4               0    -31 to 31
w5               0    -31 to 31
w6               0    -63 to 63
w7               0    -63 to 63

[Figure legend: significant coefficient, insignificant coefficient]

45/75

Context Modeling

  • Context
      • Zero coding: significance statuses of the neighbor coefficients
      • Refinement: whether it is the 1st refinement pass; significance statuses of the neighbor coefficients
      • Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

45 0 0 0 -74 -13 0 0
21 0 4 0 14 0 23 23
0 0 0 0 3 0 4 0
0 3 5 0 0 0 0 0
0 1 -1 0 -4 33 0 -1
0 0 1 0 0 0 0 0
-4 5 0 0 -18 0 0 19
4 0 23 0 -1 0 0 0

Bit (to be encoded): 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ……
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……

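47/75

Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided once per coded symbol, with probabilities P0 and 1-P0, then P1 and 1-P1, then P2 and 1-P2. The symbols S0=0, S1=1, S2=0 select the final interval A.]

Coding result: 0100

(The shortest binary bitstream that ensures that the interval from B=0100 0000000… to C=0100 1111111… lies inside A: (B,C) ⊂ A.)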
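The interval picture can be sketched with exact fractions. This is a plain binary arithmetic coder written for illustration only, not the QM coder used in EAC: it assumes symbol 0 takes the lower part of the current interval, takes the probability of a 0 at each step as an input, and uses unbounded precision. The function names are hypothetical.

```python
import math
from fractions import Fraction


def final_interval(symbols, p_zero):
    """Return the interval [low, high) selected by a sequence of binary symbols.

    p_zero[i] is the probability of a 0 at step i. Symbol 0 is assumed to
    take the lower part of the current interval and symbol 1 the upper part.
    """
    low, width = Fraction(0), Fraction(1)
    for symbol, p0 in zip(symbols, p_zero):
        p0 = Fraction(p0)
        if symbol == 0:
            width = width * p0
        else:
            low = low + width * p0
            width = width * (1 - p0)
    return low, low + width


def shortest_codeword(low, high):
    """Find the shortest bit string whose dyadic interval lies inside [low, high)."""
    n = 1
    while True:
        scale = 2 ** n
        k = math.ceil(low * scale)           # smallest n-bit value not below low
        if Fraction(k + 1, scale) <= high:   # its whole interval must fit
            return format(k, "0%db" % n)
        n += 1


half = Fraction(1, 2)
low, high = final_interval([0, 1, 0], [half, half, half])
print(shortest_codeword(low, high))          # → 010
```

With equal probabilities every symbol costs one bit, so the three symbols produce a three-bit codeword. With skewed probabilities the likely symbols shrink the interval less and cost less than one bit each, which is where the compression comes from. The slide's own result, 0100, depends on the probabilities P0 to P2 in its figure.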

48/75

Entropy Coder (Summary)

[Diagram: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

  • Context modeling
      • Use stored context
      • Update the context when a coefficient becomes significant
  • Implicit masking
      • Fast calculation of the energy in a critical band
      • Lookup table: convert energy to mask
  • R-D curve calculation
      • Lookup table: calculation of distortion
  • Context entropy coder
      • QM coder
      • Run-length Rice coder

50/75

Bitstream Assembly

  • Input: bitstream, R-D curve
  • Output: assembled bitstream, companion file

[Diagram: Bitstream assembling]

51/75

EAC Bitstream Syntax

[Diagram: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]

  • Timeslot header
      • Whether a certain channel exists (1 bit)
      • Length of the channel bitstream (1-2 bytes)

52/75

Companion File

[Diagram: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]

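53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1 through D4 versus R4), shown twice. In the second set the operating rates r1, r2, r3, and r4 are marked on the curves.]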
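Choosing the rates r1 to r4 under a byte budget is commonly done by repeatedly extending whichever bitstream currently offers the steepest R-D slope. The sketch below shows that generic greedy technique. It is an assumption about how the figure's operating points could be found, not the coder's documented procedure. It expects each curve as a list of (rate, distortion) truncation points with strictly increasing rate and decreasing, convex distortion. All names are hypothetical.

```python
def allocate_rates(curves, budget):
    """Pick one truncation point per R-D curve within a total byte budget.

    curves: list of lists of (rate, distortion) points, rate strictly
            increasing, distortion decreasing and convex along each list.
    budget: total number of bytes available for the timeslot.
    Returns the chosen rate for each curve.
    """
    chosen = [0] * len(curves)                    # current point index per curve
    used = sum(curve[0][0] for curve in curves)   # bytes spent so far
    while True:
        best = None
        for i, curve in enumerate(curves):
            j = chosen[i]
            if j + 1 == len(curve):
                continue                          # this curve is fully sent
            extra = curve[j + 1][0] - curve[j][0]
            gain = curve[j][1] - curve[j + 1][1]
            if used + extra > budget:
                continue                          # next segment does not fit
            slope = gain / extra                  # distortion drop per byte
            if best is None or slope > best[0]:
                best = (slope, i, extra)
        if best is None:
            return [curve[j][0] for curve, j in zip(curves, chosen)]
        _, i, extra = best
        chosen[i] += 1
        used += extra


curves = [
    [(0, 100), (10, 40), (20, 20)],   # slopes 6, then 2
    [(0, 50), (10, 20), (20, 15)],    # slopes 3, then 0.5
]
print(allocate_rates(curves, 30))     # → [20, 10]
```

At the chosen operating points the next available segments all have similar or smaller slopes, which is the equal-slope condition usually associated with rate-distortion optimal allocation.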

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve. Buffer occupancy (bytes) versus time (timeslots); the curve runs between two illegal regions.]

55/75

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \times \mathrm{Time}$

where

  • $B_i$: allocated bytes for timeslot i
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot i
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
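The allocation formula is easy to evaluate for a whole buffer-occupancy curve. In this small sketch the function name, the argument names, and the choice of bytes per second as the unit of the transmission rate are assumptions made for the example.

```python
def allocated_bytes(buffer_levels, rate_bytes_per_second, timeslot_seconds):
    """Apply B_i = Buf_(i-1) - Buf_i + Rate_trans * Time to every timeslot.

    buffer_levels[0] is the initial buffer occupancy and buffer_levels[i]
    is the occupancy after timeslot i. Returns [B_1, B_2, ...].
    """
    per_slot = rate_bytes_per_second * timeslot_seconds
    return [
        buffer_levels[i - 1] - buffer_levels[i] + per_slot
        for i in range(1, len(buffer_levels))
    ]


# Buffer drains by 200 bytes, then refills by 100 bytes, at 2000 bytes/s.
print(allocated_bytes([1000, 800, 900], 2000, 0.5))   # → [1200.0, 900.0]
```

A timeslot that lets the buffer drain may spend more bytes than the channel delivers in that slot, and a timeslot that refills the buffer must spend fewer.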

56/75

Optimization

  • Given
      • Initial buffer occupancy level
      • Final buffer occupancy level (or an intermediate level with a sliding window)
      • Buffer occupancy constraint
  • Search for the allocated number of bytes for the current timeslot

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions labeled "Underflow (too many bytes)" and "Overflow (too few bytes)", and a region labeled "Waste bytes".]

58/75

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve. Buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

60/75

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

61/75

Modular Software Design

[Diagram: the input audio is coded as two channels, and the output is the bitstream.]

  • L+R (or mono): MLT(SW) → Quantizer → Entropy coder → Bitstream Assembly
  • L-R: MLT(SW) → Quantizer → Entropy coder → Bitstream Assembly

62/75

Modular Software Design

  • Highly modularized pipeline design
      • The quantizer and entropy coder can be used for image/video compression as well
      • Probes and data inputs can be inserted into any part of the program
  • Data flow driven (with the necessary memory regulator <buffer>)
      • No long delay
      • No need for large memory
  • Memory and computation efficient
      • Working memory preallocated

63/75

Experimental Results

64/75

EAC – Highly Efficient (NMR)

Results are based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.

Codec         8kbps   16kbps   32kbps   48kbps
EAC            669     568      280      -22
WMA            847     556      325      040
MP4 TwinVQ     748     700      571      448

65/75

EAC – Lossless

Results are based on the average of 16 MPEG-4 test clips.

Codec            Compression Ratio
WinZip           132
Monkey's Audio   272
EAC              272

66/75

EAC (Versatile)

  • Versatile
      • Real-time 2-way communication (low delay mode)
      • Storage device (Pocket PC, Xbox)
      • Internet streaming

67/75

EAC (Low Delay Mode)

  • Reducing frame size
  • Timeslot = 1 frame
  • Fixed-length timeslot bitstream
  • Delay = 2 frames
      • (Ignoring encoding/decoding delay)
      • Network transmission time (with a modem line, delay = 3 frames)

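68/75

EAC (Low Delay Mode)

[Diagram: timeline over frames i-1, i, i+1. Encoder: "Start Encoding Frame i", then MLT, Quantizer, Entropy. The bitstream crosses the network. Decoder: "Start Decoding Frame i", then Entropy, Quantizer, ending at the point marked "Playable here".]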

69/75

EAC – Flexible Bitstream Syntax

  • Flexible bitstream syntax
      • The parser may reassemble the bitstream at 1000x real time
      • Change
          • Bit rate
          • Number of audio channels
          • Audio sampling rate

70/75

EAC – Software

  • Software
      • Encoder: 8x real time (stereo, 44.1 kHz sampling)
      • Decoder: 20x real time
      • Parser: 1000x real time

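71/75

EAC - Encoder

[Demo: Audio → Encoder → Stereo 128 kbps bitstream, Companion file]

72/75

EAC - Parser

[Demo, on the server: the Parser takes the Stereo 128 kbps bitstream and the Companion file and produces the following bitstreams.]

  • Stereo 16 kbps
  • Mono 8 kbps
  • Stereo 16 kbps, slow start
  • Mono 8 kbps, 11 kHz sampling

73/75

EAC - Decoder

[Demo: the Decoder plays the following bitstreams.]

  • Stereo 16 kbps
  • Mono 8 kbps
  • Stereo 16 kbps, slow start
  • Mono 8 kbps, 11 kHz sampling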

74/75

Comparison

[Demo: Original, MP3, MP4 TwinVQ, WMA, EAC]

75/75

Conclusions

  • An embedded audio coder is developed
      • Highly efficient
      • Versatile
          • Low delay, constant bitrate, streaming
      • Flexible bitstream
          • Parsing for bitrate, number of audio channels, audio sampling rate
      • Good prototype available: real-time execution, small memory footprint




575

EAC vs Other Compression

Existing audio compression schemes MP3 AAC MPEG4 audio WMA Real Audio hellip

Why research for a new audio codec

675

Media vs File Compression

File compression Every bit is important has to be compressed

losslessly

Media compression Exact bitvalue is not important distortion is

tolerable Amount of media is huge high compression ratio is

required Media needs adaptation

775

Key Features of EAC

Not only good compression performance

But also flexible bitstream syntax The compressed bitstream may be manipulated for

Different bitrate Different of audio channels Different audio sampling rate

Versatile Lossless Low delay Streamingstorage application

875

EAC Encoder

Encoder

Master Bitstream

Companion File

975

Parser

Except header application bitstream is a subset of the master bitstream (parsing is fast)

May be changed according to the required bitrate of audio channels and audio sampling rate

Parser

Master Bitstream

Companion File

Application Bitstream

1075

EAC Decoder

Encoder

Bitstream

Speaker (Direct Sound)

wav file

1175

Embedded Audio Coder- Algorithm Description

1275

Frame Work - Encoder

Transform Entropy coder

BitstreamAssembly

TransformEntropy

coderBitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

1375

Audio Transform

Input audio sample

Output transform coefficient

Goal convert audio from space domain to frequency domain Compact energy Better match with psychoacoustic

characteristics Enable audio sampling rate change

1475

Lossy vs Lossless Mode

MLT(SW)Audio

Quantization

Lossy mode

Reversible MLT(SW)

Audio

Lossless mode

1575

Lossy (Float) Pass

1675

MLT - Modulated Lapped Transforms

0 100 200 300 400 500 600 700 800 900 1000-1

-08

-06

-04

-02

0

02

04

06

08

1

Spatial Response

0 01 02 03 04 05 06 07 08 09 1-100

-80

-60

-40

-20

0

20

40

Frequency (pi)G

ain

(dB

)

Frequency Domain

1775

MLT with Window Switching

Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and

only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta

There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

[Diagram. Forward MLT (lapped transform): Window -> DCT-IV, where the DCT-IV consists of Pre-Rotate -> Complex FFT -> Post-Rotation. Inverse MLT: the inverse of each stage (Pre-Rotate^-1, Complex FFT^-1, Post-Rotation^-1, Inverse Window).]

2675

Window Operation

[Diagram: each pair x(n), x(-n-1) passes through a Complex Rotate whose angle depends on n and N.]

2775

Pre-Rotation

[Diagram: the windowed inputs xw(0) ... xw(7) pass through four Complex Rotate blocks with angles -π/32, -5π/32, -9π/32, and -13π/32, producing xp(0) ... xp(7).]

2875

FFT (4 Point Complex)

[Diagram: 4-point complex FFT. The values xp(0) ... xp(7) are treated as four complex inputs xc(0) ... xc(3); a butterfly network (with sign changes and one twiddle factor e^(-jπ/2)) produces yc(0) ... yc(3), that is, yp(0) ... yp(7).]

2975

Post-Rotation

[Diagram: four Conjugate Rotate blocks with angles -0, -π/8, -2π/8, and -3π/8 relate yp(0) ... yp(7) to y(0) ... y(7).]

3075

Reversible MLT

Make the following operations reversible
  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

3175

Reversible Unit Transform

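[Equations: a 2x2 unit transform of the pair (a, b), such as the butterfly, is factored into a sequence of steps that use an intermediate value t and multipliers c.]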
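
One widely used way to make a rotation reversible on integers is to factor it into three lifting (shear) steps and round inside each step. The sketch below shows that general technique; it is not necessarily the exact factorization on this slide, and the function names are hypothetical.

```python
import math


def lifting_rotate(a, b, theta):
    """Rotate the integer pair (a, b) by theta using three lifting steps.

    Each step adds a rounded multiple of one value to the other, so it maps
    integers to integers and can be undone exactly. Assumes sin(theta) != 0.
    """
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a + round(c0 * b)
    b = b + round(c1 * a)
    a = a + round(c0 * b)
    return a, b


def lifting_unrotate(a, b, theta):
    """Undo lifting_rotate by running the same steps backwards."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a - round(c0 * b)
    b = b - round(c1 * a)
    a = a - round(c0 * b)
    return a, b
```

Because every step only adds a value that the inverse can recompute from data it still has, the round trip is exact even though the rotation itself is only approximated. Each shear has determinant 1, which matches the "determinant of 1" requirement stated for reversible transforms.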

3275

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal
  • Compression
  • Embedded bitstream for future manipulation

3375

Frame Grouping

[Diagram: frames 1 to 8 grouped into a time slot.]

3475

Entropy Coder

[Diagram: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R).]

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

3775

Bits of Coefficients

Coefficient  Value  b1 b2 b3 b4 b5 b6 b7  Sign
w0            45     0  1  0  1  1  0  1   +
w1           -74     1  0  0  1  0  1  0   -
w2            21     0  0  1  0  1  0  1   +
w3            14     0  0  0  1  1  1  0   +
w4            -4     0  0  0  0  1  0  0   -
w5           -18     0  0  1  0  0  1  0   -
w6             4     0  0  0  0  1  0  0   +
w7            -1     0  0  0  0  0  0  1   -
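
The sign-and-magnitude bit table for w0 to w7 can be reproduced with a short helper. This is a minimal sketch; the function name `to_bitplanes` and the 7-bit width parameter are assumptions for illustration.

```python
def to_bitplanes(coef, n_bits=7):
    """Split an integer coefficient into magnitude bits b1..b7 and a sign.

    b1 is the most significant bit, matching the column order of the table.
    """
    magnitude = abs(coef)
    bits = [(magnitude >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]
    sign = "-" if coef < 0 else "+"
    return bits, sign


# The eight example coefficients w0..w7 from the slide.
coefficients = [45, -74, 21, 14, -4, -18, 4, -1]
table = [to_bitplanes(w) for w in coefficients]
```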

3875

Conventional Coding

[Figure: the bit table (b1 to b7 and sign) for w0 to w7, with the rows of w0, w1, and w2 marked First, Second, and Third: each coefficient is coded completely before the next one.]

Coded so far:
w0   0 1 0 1 1 0 1  +
w1   1 0 0 1 0 1 0  -
w2   0 0 1 0 1 0 1  +

Decoded values: w0 = 46, w1 = -74, w2 = 22, w3 to w7 = 0

3975

Embedded Coding

Bit planes are coded in order across all coefficients (a sign follows the first 1 bit of a coefficient):
First (b1):   0  1-  0  0  0  0  0  0
Second (b2):  1+ 0   0  0  0  0  0  0
Third (b3):   0  0   1+ 0  0  1-  0  0

Coefficient  Value  Range
w0            40    32 to 47
w1           -72    -79 to -64
w2            24    16 to 31
w3             0    -31 to 31
w4             0    -31 to 31
w5           -24    -31 to 31
w6             0    -31 to 31
w7             0    -31 to 31
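
The effect of truncating an embedded bitstream after a given number of bit planes can be simulated directly. The sketch below assumes mid-point reconstruction of the still-unknown low bits, which reproduces the decoded values 40, -72, 24, and -24 shown on this slide; the function name and parameters are hypothetical.

```python
def decode_after_planes(coef, planes, n_bits=7):
    """Value a decoder would reconstruct after `planes` bit planes.

    The top `planes` magnitude bits are known exactly; the remaining low
    bits are replaced by the middle of their possible range. A coefficient
    that is still insignificant (all known bits zero) decodes to 0.
    """
    shift = n_bits - planes
    known = (abs(coef) >> shift) << shift
    if known == 0:
        return 0
    half = (1 << shift) >> 1
    value = known + half
    return -value if coef < 0 else value


coefficients = [45, -74, 21, 14, -4, -18, 4, -1]
after_three = [decode_after_planes(w, 3) for w in coefficients]
```

Every additional bit plane halves the uncertainty of each significant coefficient, which is why the bitstream can be cut at any point and still decode to a coarser version of the same signal.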

4075

Audio Masking

[Figure: masking around a signal, plotted against frequency, for a critical band and its neighboring band. Labels: Signal, Masking Threshold, Maximum Mask, Noise Level, Signal-to-mask ratio, Noise-to-mask ratio.]

4175

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches)
  • Calculate the mask
  • Transmit the mask
  • Modify transform coefficients (or coding approach) according to the masking
  • Encode the transform coefficients

Note: the mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table (sign, b1 to b7) after the first pass. Legend: Coefficient, Significant, Insignificant, Mask.]

First: 0  1-  0  0  0  0  0  0

Coefficient  Value  Range
w0             0    -63 to 63
w1           -96    -127 to -64
w2             0    -63 to 63
w3             0    -63 to 63
w4             0    -63 to 63
w5             0    -63 to 63
w6             0    -127 to 127
w7             0    -127 to 127

4475

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table (sign, b1 to b7) after the second pass. Legend: Coefficient, Significant, Insignificant.]

First:   0  1-  0  0  0  0  0  0
Second:  1+ 0   0  0  0  0  0  0

Coefficient  Value  Range
w0            48    32 to 63
w1           -96    -127 to -64
w2             0    -31 to 31
w3             0    -31 to 31
w4             0    -31 to 31
w5             0    -31 to 31
w6             0    -63 to 63
w7             0    -63 to 63

4575

Context Modeling

Context
  • Zero coding: significance statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  • Sign: neighbor signs

4675

After Implicit Psychoacoustic Masking & Context Modeling

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

Bit (to be encoded):            0 1 1 0 0 0 0 0 0 1 0  0 0 0 0 0 0 0 0 ...
Ctx (automatically generated):  0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

4775

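Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Diagram: the interval from 0 to 1 is subdivided step by step, with probabilities P0 and 1-P0, then P1 and 1-P1, then P2 and 1-P2, as the symbols S0 = 0, S1 = 1, S2 = 0 are coded. The final interval is A.]

Coding result: 0100 (the shortest binary bitstream that ensures that the interval from B = 0100 0000000... to C = 0100 1111111... satisfies (B, C) ⊂ A)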
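
The interval narrowing in this illustration can be reproduced with exact fractions. This is an idealized sketch of arithmetic coding in general, not of the QM coder named in the slide title (which uses finite-precision approximations); the function names and the per-symbol probability list `p_zero` are assumptions.

```python
import math
from fractions import Fraction


def narrow(symbols, p_zero):
    """Return the final interval [low, high) after coding binary symbols.

    p_zero[k] is the probability assigned to symbol 0 at step k; the lower
    part of the current interval is given to 0, the upper part to 1.
    """
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0
        else:
            low += width * p0
            width *= 1 - p0
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b such that every number 0.b... lies in [low, high)."""
    n = 0
    while True:
        n += 1
        scale = 2 ** n
        k = math.ceil(low * scale)            # first n-bit grid point >= low
        if Fraction(k + 1, scale) <= high:    # its whole cell fits inside
            return format(k, "0{}b".format(n))
```

With equal probabilities the three symbols 0, 1, 0 narrow the interval to [1/4, 3/8) and the code is simply the three symbols themselves; skewed probabilities change the interval widths and therefore the code length.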

4875

Entropy Coder (Summary)

[Diagram: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R).]

4975

Speed Up Issues

Context modeling
  • Use stored context
  • Update context when a coefficient becomes significant

Implicit masking
  • Fast calculation of energy in a critical band
  • Lookup table: convert energy to mask

R-D curve calculation
  • Lookup table: calculation of distortion

Context entropy coder
  • QM coder
  • Run-length Rice coder

5075

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Diagram, bitstream layout: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]

5275

Companion File

[Diagram, companion file layout: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1 through D4 versus R4), with the truncation rates r1, r2, r3, and r4 marked.]
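
A standard way to pick the truncation rates r1 to r4 from several R-D curves is to spend the byte budget on whichever bitstream currently offers the steepest distortion drop per byte. The sketch below shows that generic greedy rule; it is an assumption that EAC's assembler works this way in detail, and the function name and the curve representation (lists of `(rate, distortion)` points with increasing rate and convex shape) are hypothetical.

```python
def allocate_by_slope(curves, budget):
    """Choose a truncation rate on each R-D curve under a total byte budget.

    curves: one list per bitstream of (rate_bytes, distortion) points,
            with strictly increasing rate and decreasing, convex distortion.
    Returns the chosen rate for each bitstream.
    """
    cut = [0] * len(curves)                  # index of the chosen point
    spent = sum(c[0][0] for c in curves)
    while True:
        best, best_slope = None, 0.0
        for i, c in enumerate(curves):
            j = cut[i]
            if j + 1 == len(c):
                continue                     # bitstream already complete
            extra = c[j + 1][0] - c[j][0]
            gain = c[j][1] - c[j + 1][1]
            # Keep the affordable step with the steepest R-D slope.
            if spent + extra <= budget and gain / extra > best_slope:
                best, best_slope = i, gain / extra
        if best is None:
            return [c[j][0] for c, j in zip(curves, cut)]
        spent += curves[best][cut[best] + 1][0] - curves[best][cut[best]][0]
        cut[best] += 1
```

For convex curves this ends with all bitstreams cut at roughly the same slope, which is the usual condition for minimizing total distortion at a fixed total rate.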

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: Buffer-Occupancy Curve. Buffer occupancy (bytes) versus time (timeslots), bounded above and below by illegal regions.]

5575

Allocated Bytes per Timeslot

Allocated bytes for a certain timeslot:

B_i = Buf_{i-1} - Buf_i + Rate_trans × Time

where
  • B_i: allocated bytes for timeslot i
  • Buf_i: buffer occupancy level at timeslot i
  • Rate_trans: coding (network) rate per second
  • Time: time duration of the timeslot
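
As a worked example of the allocation formula, the sketch below evaluates it for one timeslot. The function and parameter names are hypothetical, and the rate is assumed to be expressed in bytes per second so that all terms share the same unit.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """Bytes available to timeslot i.

    buf_prev:   buffer occupancy at timeslot i-1, in bytes
    buf_cur:    buffer occupancy at timeslot i, in bytes
    rate_trans: coding (network) rate, assumed here to be in bytes per second
    time:       duration of the timeslot, in seconds
    """
    return buf_prev - buf_cur + rate_trans * time


# Buffer drops from 4000 to 3000 bytes while 2000 bytes/s arrive for 0.5 s.
example = allocated_bytes(4000, 3000, 2000, 0.5)
```

Here the timeslot may use the 1000 bytes that arrive during the slot plus the 1000 bytes by which the buffer is allowed to drain, 2000 bytes in total.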

5675

Optimization

Given
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

5775

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), with illegal regions above and below. Labels: Underflow (too many bytes), Overflow (too few bytes), Waste bytes.]

5875

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded above and below by illegal regions.]

5975

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: Buffer-Occupancy Curve. Buffer occupancy (bytes) versus time (timeslots), bounded above and below by illegal regions.]

6075

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded above and below by illegal regions.]

6175

Modular Software Design

[Diagram: the L+R (or mono) and L-R channels each pass through MLT(SW) -> Quantizer -> Entropy coder -> Bitstream Assembly, turning audio into a bitstream.]

6275

Modular Software Design

Highly modularized pipeline design
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>)
  • No long delay
  • No need for large memory

Memory and computation efficient
  • Working memory preallocated

6375

Experimental Results

6475

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.

Codec        8 kbps  16 kbps  32 kbps  48 kbps
MP4 TwinVQ   7.48    7.00     5.71     4.48
WMA          8.47    5.56     3.25     0.40
EAC          6.69    5.68     2.80     -2.2

6575

EAC - Lossless

Results based on the average of 16 MPEG-4 test clips

Codec            Compression Ratio
WinZip           1.32
Monkey's Audio   2.72
EAC              2.72

6675

EAC (Versatile)

Versatile
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames
  • Ignoring encoding/decoding delay
  • Network transmission time (with a modem line, delay = 3 frames)
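
A quick calculation shows why the low delay mode reduces the frame size. The sketch below converts a delay counted in frames into milliseconds; the function name is hypothetical, and the 256-sample frame in the usage line is only an illustrative value, since the slides do not state the frame size used in low delay mode.

```python
def delay_ms(frame_size, sample_rate, frames=2):
    """Delay in milliseconds for a delay of `frames` frames.

    frame_size is in samples per frame, sample_rate in samples per second.
    Encoding, decoding and network time are ignored, as on the slide.
    """
    return 1000.0 * frames * frame_size / sample_rate


# Regular 2048-sample frames versus a hypothetical 256-sample frame at 44.1 kHz.
regular = delay_ms(2048, 44100)
reduced = delay_ms(256, 44100)
```

With 2048-sample frames at 44.1 kHz, two frames already amount to roughly 93 ms, which is long for two-way conversation; an eight times smaller frame cuts the delay by the same factor.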

6875

EAC (Low Delay Mode)

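[Diagram: timeline of frames i-1, i, i+1. Encoder: start encoding frame i -> MLT -> Quantizer -> Entropy -> bitstream -> network. Decoder: start decoding frame i -> Entropy -> Quantizer; the point where the frame can be played is marked "Playable here".]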

6975

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax
  • Parser may reassemble the bitstream at 1000x real time
  • Change: bit rate, number of audio channels, audio sampling rate

7075

EAC - Software

Software
  • Encoder: 8x real time (stereo, 44.1 kHz sampling)
  • Decoder: 20x real time
  • Parser: 1000x real time

7175

EAC - Encoder

[Diagram: Audio -> Encoder -> Stereo 128 kbps bitstream and Companion file]

7275

EAC - Parser

[Diagram, on the server: Stereo 128 kbps bitstream and Companion file -> Parser -> one of: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling]

7375

EAC - Decoder

[Diagram: the Decoder plays each parsed bitstream: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling]

7475

Comparison

[Audio demos: Original, MP3, MP4 TwinVQ, WMA, EAC]

7575

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: real-time execution, small memory footprint


675

Media vs File Compression

File compression Every bit is important has to be compressed

losslessly

Media compression Exact bitvalue is not important distortion is

tolerable Amount of media is huge high compression ratio is

required Media needs adaptation

775

Key Features of EAC

Not only good compression performance

But also flexible bitstream syntax The compressed bitstream may be manipulated for

Different bitrate Different of audio channels Different audio sampling rate

Versatile Lossless Low delay Streamingstorage application

875

EAC Encoder

Encoder

Master Bitstream

Companion File

975

Parser

Except header application bitstream is a subset of the master bitstream (parsing is fast)

May be changed according to the required bitrate of audio channels and audio sampling rate

Parser

Master Bitstream

Companion File

Application Bitstream

1075

EAC Decoder

Encoder

Bitstream

Speaker (Direct Sound)

wav file

1175

Embedded Audio Coder- Algorithm Description

1275

Frame Work - Encoder

Transform Entropy coder

BitstreamAssembly

TransformEntropy

coderBitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

1375

Audio Transform

Input audio sample

Output transform coefficient

Goal convert audio from space domain to frequency domain Compact energy Better match with psychoacoustic

characteristics Enable audio sampling rate change

1475

Lossy vs Lossless Mode

MLT(SW)Audio

Quantization

Lossy mode

Reversible MLT(SW)

Audio

Lossless mode

1575

Lossy (Float) Pass

1675

MLT - Modulated Lapped Transforms

0 100 200 300 400 500 600 700 800 900 1000-1

-08

-06

-04

-02

0

02

04

06

08

1

Spatial Response

0 01 02 03 04 05 06 07 08 09 1-100

-80

-60

-40

-20

0

20

40

Frequency (pi)G

ain

(dB

)

Frequency Domain

1775

MLT with Window Switching

Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and

only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta

There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

775

Key Features of EAC

Not only good compression performance

But also flexible bitstream syntax The compressed bitstream may be manipulated for

Different bitrate Different of audio channels Different audio sampling rate

Versatile Lossless Low delay Streamingstorage application

875

EAC Encoder

Encoder

Master Bitstream

Companion File

975

Parser

Except header application bitstream is a subset of the master bitstream (parsing is fast)

May be changed according to the required bitrate of audio channels and audio sampling rate

Parser

Master Bitstream

Companion File

Application Bitstream

1075

EAC Decoder

Encoder

Bitstream

Speaker (Direct Sound)

wav file

1175

Embedded Audio Coder- Algorithm Description

1275

Frame Work - Encoder

Transform Entropy coder

BitstreamAssembly

TransformEntropy

coderBitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

1375

Audio Transform

Input audio sample

Output transform coefficient

Goal convert audio from space domain to frequency domain Compact energy Better match with psychoacoustic

characteristics Enable audio sampling rate change

1475

Lossy vs Lossless Mode

MLT(SW)Audio

Quantization

Lossy mode

Reversible MLT(SW)

Audio

Lossless mode

1575

Lossy (Float) Pass

1675

MLT - Modulated Lapped Transforms

0 100 200 300 400 500 600 700 800 900 1000-1

-08

-06

-04

-02

0

02

04

06

08

1

Spatial Response

0 01 02 03 04 05 06 07 08 09 1-100

-80

-60

-40

-20

0

20

40

Frequency (pi)G

ain

(dB

)

Frequency Domain

1775

MLT with Window Switching

Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and

only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta

There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

[Figure: masking along the frequency axis. Labels: critical band, neighboring band, signal, noise level, masking threshold, maximum mask, signal-to-mask ratio, noise-to-mask ratio.]

4175

Psychoacoustic Masking

Traditional approach (explicit masking, used by all existing approaches): calculate the mask; transmit the mask; modify the transform coefficients (or the coding approach) according to the mask; encode the transform coefficients.

Note: the mask modifies the coding content.

4275

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same.

Implicit masking: calculate the static mask (Fletcher-Munson threshold); encode the MSB of the transform coefficients; calculate the mask based on the MSB of the coefficients; modify the coding order; encode the next most important part of the coefficients; repeat the process.

4375

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table after the first pass. Legend: significant coefficient, insignificant coefficient, mask.]

Decoded after the first pass:

Coefficient  Value  Range
w0             0    -63 to 63
w1           -96    -127 to -64
w2             0    -63 to 63
w3             0    -63 to 63
w4             0    -63 to 63
w5             0    -63 to 63
w6             0    -127 to 127
w7             0    -127 to 127

4475

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table after the first and second passes. Legend: significant coefficient, insignificant coefficient.]

Decoded after the second pass:

Coefficient  Value  Range
w0            48    32 to 63
w1           -96    -127 to -64
w2             0    -31 to 31
w3             0    -31 to 31
w4             0    -31 to 31
w5             0    -31 to 31
w6             0    -63 to 63
w7             0    -63 to 63
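The effect of a mask on coding order, as opposed to coding content, can be sketched with a per-coefficient delay measured in bit planes. Everything here is illustrative: the delay values are made up so that w6 and w7 lag one plane behind the other coefficients, and the mask is fixed, whereas the slides describe a mask that is recalculated from the bits already coded, which this sketch does not model.

```python
COEFFS = [45, -74, 21, 14, -4, -18, 4, -1]  # w0..w7 from the slides
NBITS = 7
# Hypothetical mask, in bit planes: w6 and w7 are treated as less urgent.
DELAY = [0, 0, 0, 0, 0, 0, 1, 1]


def planes_coded(passes, delay=DELAY, nbits=NBITS):
    """Bit planes each coefficient has had coded after `passes` passes."""
    return [max(0, min(nbits, passes - d)) for d in delay]


def magnitude_range(coeff, planes, nbits=NBITS):
    """(low, high) bounds on |coeff| known to a decoder after `planes` planes."""
    shift = nbits - planes
    low = (abs(coeff) >> shift) << shift
    return low, low + (1 << shift) - 1


if __name__ == "__main__":
    for passes in (1, 2):
        known = planes_coded(passes)
        print(passes, [magnitude_range(c, k) for c, k in zip(COEFFS, known)])
```

All bits are eventually sent unchanged; the delay only postpones when the masked coefficients get refined, so their uncertainty interval stays wider at any given truncation point.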

4575

Context Modeling

Context

Zero coding: significance statuses of neighboring coefficients

Refinement: whether it is the first refinement pass; significance statuses of neighboring coefficients

Sign: neighbor signs

4675

After Implicit Psychoacoustic Masking & Context Modeling

 45   0   0   0  -74  -13   0   0
 21   0   4   0   14    0  23  23
  0   0   0   0    3    0   4   0
  0   3   5   0    0    0   0   0
  0   1  -1   0   -4   33   0  -1
  0   0   1   0    0    0   0   0
 -4   5   0   0  -18    0   0  19
  4   0  23   0   -1    0   0   0

Bit (to be encoded): 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ……

Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……

4775

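Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is split successively with probabilities P0 and 1-P0, P1 and 1-P1, P2 and 1-P2 for the symbols S0=0, S1=1, S2=0, giving the final interval A.]

Coding result: 0100 (the shortest binary bitstream that ensures the interval from B = 0100 0000000 to C = 0100 1111111 lies within A, i.e. (B, C) ⊂ A)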
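The interval picture can be reproduced with exact fractions. This is an idealized sketch of binary arithmetic coding, not the QM coder named in the slide title: `narrow` shrinks the interval once per symbol, and `shortest_code` finds the shortest bit string whose every continuation stays inside the final interval. The function names and the probabilities are assumptions chosen for illustration, since the slide's values of P0, P1 and P2 are not given.

```python
import math
from fractions import Fraction


def narrow(symbols, p_zero):
    """Return the final interval [low, high) for a binary symbol sequence.

    p_zero[i] is the assumed probability that symbol i equals 0; the lower
    part of the current interval is assigned to symbol 0.
    """
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0
        else:
            low += width * p0
            width *= 1 - p0
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b with [0.b, 0.b + 2**-len(b)) inside [low, high)."""
    n = 1
    while True:
        scale = 1 << n
        k = math.ceil(low * scale)           # smallest n-bit value >= low
        if Fraction(k + 1, scale) <= high:   # all continuations stay inside
            return format(k, "0{}b".format(n))
        n += 1


if __name__ == "__main__":
    low, high = narrow([0, 1, 0], [Fraction(1, 2)] * 3)
    print(low, high, shortest_code(low, high))  # 1/4 3/8 010
```

With equal probabilities the three symbols cost three bits; skewing the probabilities changes the width of the final interval and therefore the length of the code.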

4875

Entropy Coder (Summary)

[Figure: the entropy coder produces a bitstream together with its R-D curve (axes: rate R, distortion D).]

4975

Speed Up Issues

Context modeling: use stored contexts; update the context when a coefficient becomes significant.

Implicit masking: fast calculation of the energy in a critical band; a lookup table converts energy to mask.

R-D curve calculation: lookup-table calculation of distortion.

Context entropy coder: QM coder; run-length Rice coder.

5075

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header: whether a certain channel exists (1 bit); length of the channel bitstream (1-2 bytes).

[Figure: bitstream layout: EAC marker, Global Header, then a sequence of Timeslots, each with a Head and a Body.]

5275

Companion File

[Figure: companion file layout: Global Header, then for each Timeslot a Head and an R-D curve.]

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1 through D4 versus R4), shown twice, with the points r1, r2, r3, r4 marked.]
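A standard way to share a byte budget among several embedded bitstreams is to keep extending whichever stream currently offers the steepest R-D slope, that is, the largest distortion reduction per byte. The sketch below shows that generic greedy rule on made-up curves. The transcript does not spell out EAC's exact assembly rule, so the function name, the data layout and the numbers are all assumptions.

```python
def allocate(curves, budget):
    """Pick a truncation point on each R-D curve under a total byte budget.

    curves: one list per stream of (rate_bytes, distortion) points, with
            rate increasing and distortion decreasing (assumed convex).
    Returns (indices, spent): the chosen point per stream and the bytes used.
    """
    idx = [0] * len(curves)
    spent = sum(c[0][0] for c in curves)
    while True:
        best, best_slope = None, 0.0
        for k, c in enumerate(curves):
            i = idx[k]
            if i + 1 < len(c):
                d_rate = c[i + 1][0] - c[i][0]
                d_dist = c[i][1] - c[i + 1][1]
                slope = d_dist / d_rate
                if spent + d_rate <= budget and slope > best_slope:
                    best, best_slope = k, slope
        if best is None:          # nothing affordable improves distortion
            return idx, spent
        spent += curves[best][idx[best] + 1][0] - curves[best][idx[best]][0]
        idx[best] += 1


if __name__ == "__main__":
    demo = [
        [(0, 100.0), (10, 40.0), (20, 20.0)],   # hypothetical stream 1
        [(0, 50.0), (10, 35.0), (20, 30.0)],    # hypothetical stream 2
    ]
    print(allocate(demo, 30))  # ([2, 1], 30)
```

On convex curves this ends with all streams truncated at roughly the same slope, which is the usual optimality condition for this kind of allocation.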

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve. Vertical axis: Buffer Occupancy (Bytes); horizontal axis: Time (timeslots); two regions are marked Illegal Region.]

5575

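Allocated Bytes per Timeslot

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$

where $B_i$ is the number of bytes allocated to timeslot $i$,

$\mathrm{Buf}_i$ is the buffer occupancy level at timeslot $i$,

$\mathrm{Rate}_{\mathrm{trans}}$ is the coding (network) rate per second, and $\mathrm{Time}$ is the duration of the timeslot.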
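As a quick numeric check of the allocation formula, the sketch below turns a sequence of buffer occupancy levels into per-timeslot byte allocations. The function name, the sample numbers and the choice of bytes per second as the rate unit are assumptions for illustration.

```python
def allocated_bytes(buf_levels, rate_trans, slot_time):
    """Bytes allocated to timeslots 1..n from buffer levels Buf_0..Buf_n.

    Implements B_i = Buf_{i-1} - Buf_i + rate_trans * slot_time, with
    rate_trans assumed to be in bytes per second and slot_time in seconds.
    """
    return [
        buf_levels[i - 1] - buf_levels[i] + rate_trans * slot_time
        for i in range(1, len(buf_levels))
    ]


if __name__ == "__main__":
    # Hypothetical levels: the buffer drains by 1000 bytes, then refills by 500.
    print(allocated_bytes([4000, 3000, 3500], rate_trans=2000, slot_time=0.5))
    # [2000.0, 500.0]
```

A timeslot that lets the buffer level drop gets more bytes than the channel delivers during that slot, and a timeslot that raises the level gets fewer.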

5675

Optimization

Given: the initial buffer occupancy level; the final buffer occupancy level (or an intermediate level with a sliding window); the buffer occupancy constraint.

Search for the number of bytes allocated to the current timeslot.

5775

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two regions marked Illegal Region. Annotations: Underflow (too many bytes), Overflow (too few bytes), Waste bytes.]

5875

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots) for constant bitrate, with two regions marked Illegal Region.]

5975

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve for Internet streaming with slow start. Vertical axis: Buffer Occupancy (Bytes); horizontal axis: Time (timeslots); two regions are marked Illegal Region.]

6075

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots) for normal Internet streaming, with two regions marked Illegal Region.]

6175

Modular Software Design

[Figure: two parallel pipelines, one for the L+R (or mono) channel and one for the L-R channel. In each, Audio passes through MLT(SW), Quantizer and Entropy coder into Bitstream Assembly, which outputs the Bitstream.]

6275

Modular Software Design

Highly modularized pipeline design: the quantizer and entropy coder can be used for image/video compression as well; probes and data inputs can be inserted into any part of the program.

Data-flow driven (with the necessary memory regulator <buffer>): no long delay; no need for large memory.

Memory and computation efficient: working memory is preallocated.

6375

Experimental Results

6475

EAC – Highly Efficient (NMR)

Results based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.

Codec        8 kbps  16 kbps  32 kbps  48 kbps
EAC          6.69    5.68     2.80     -2.2
WMA          8.47    5.56     3.25     0.40
MP4 TwinVQ   7.48    7.00     5.71     4.48

6575

EAC – Lossless

Results based on the average of 16 MPEG-4 test clips.

Codec            Compression Ratio
WinZip           1.32
Monkey's Audio   2.72
EAC              2.72

6675

EAC (Versatile)

Versatile: real-time 2-way communication (low delay mode); storage devices (Pocket PC, Xbox); Internet streaming.

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames (ignoring encoding/decoding delay)

Network transmission time (with a modem line, delay = 3 frames)

6875

EAC (Low Delay Mode)

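[Figure: low-delay timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT, Quantizer, Entropy); the bitstream crosses the network; the decoder starts decoding frame i (Entropy, Quantizer); a point on the decoder side is marked "Playable here".]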

6975

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax: the parser may reassemble the bitstream at 1000x real time, changing the bit rate, the number of audio channels, and the audio sampling rate.

7075

EAC – Software

Software: encoder 8x real time (stereo, 44.1 kHz sampling); decoder 20x real time; parser 1000x real time.

7175

EAC - Encoder

[Figure: Audio enters the Encoder, which outputs a stereo 128 kbps bitstream and a companion file.]

7275

EAC - Parser

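[Figure: on the server, the Parser takes the stereo 128 kbps bitstream and the companion file and produces four bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]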

7375

EAC - Decoder

[Figure: the Decoder takes each parsed bitstream: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]

7475

Comparison

Original; MP3; MP4 TwinVQ; WMA; EAC

7575

Conclusions

An embedded audio coder has been developed.

Highly efficient.

Versatile: low delay, constant bitrate, streaming.

Flexible bitstream: parsing for bitrate, number of audio channels, and audio sampling rate.

Good prototype available: real-time execution, small memory footprint.

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction – Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking & Context Modeling
  • Arithmetic Coding – Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots – Constant Bitrate
  • Multiple Timeslots – Internet Streaming (Slow Start)
  • Multiple Timeslots – Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC – Highly Efficient (NMR)
  • EAC – Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC – Flexible Bitstream Syntax
  • EAC – Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions


delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

EAC Decoder

[Diagram: the bitstream is decoded and played through the speaker (Direct Sound) or written to a wav file.]

Embedded Audio Coder - Algorithm Description

Framework - Encoder

[Diagram: the audio is split into an L+R (or mono) channel and an L-R channel. Each channel passes through Transform, Entropy coder and Bitstream Assembly to form the bitstream.]
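The L+R / L-R channel split in the encoder framework can be sketched as follows. This is only an illustration: the function names and the 0.5 scaling are assumptions, since the slides name the two channels but do not give a formula.

```python
def to_sum_diff(left, right):
    """Split stereo samples into a sum (L+R) and a difference (L-R) channel.

    The 0.5 scaling is an assumption made here so that the sum channel
    stays in the same range as the inputs; the slides do not state one.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    total = [0.5 * (l + r) for l, r in zip(left, right)]
    diff = [0.5 * (l - r) for l, r in zip(left, right)]
    return total, diff


def from_sum_diff(total, diff):
    """Invert to_sum_diff: L = sum + diff, R = sum - diff."""
    left = [s + d for s, d in zip(total, diff)]
    right = [s - d for s, d in zip(total, diff)]
    return left, right
```

For a mono input the difference channel is all zeros, which is why the diagram labels the first channel "L+R (or mono)".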

Audio Transform

Input: audio samples

Output: transform coefficients

Goal: convert audio from the time domain to the frequency domain
  • Compact energy
  • Better match with psychoacoustic characteristics
  • Enable audio sampling rate change

Lossy vs Lossless Mode

[Diagram: in lossy mode, the audio passes through MLT(SW) followed by quantization. In lossless mode, the audio passes through a reversible MLT(SW).]

Lossy (Float) Pass

MLT - Modulated Lapped Transform

[Figure: two panels. "Spatial Response": amplitude from -1 to 1 over samples 0 to 1000. "Frequency Domain": gain (dB) from -100 to 40 versus frequency (pi) from 0 to 1.]

MLT with Window Switching

Features
  • Basic window size: 2048
  • Short window size: 256
  • Switching criterion: a frame (2048 samples) is switched to the short window if and only if
      • its energy is greater than a certain threshold,
      • the energy of its 8 subframes (256 samples each) differs by more than Ta, and
      • there are at least two neighboring subframes where the energy of the former exceeds that of the latter by Tb.
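The switching criterion above can be sketched as a small decision function. Several details are assumptions, because the slide does not define them: "differs by more than Ta" is read as the spread between the largest and smallest subframe energy, "by Tb" is read as a difference rather than a ratio, and "at least two neighboring subframes" is read as at least one adjacent pair. The names `energy_min`, `ta` and `tb` are hypothetical.

```python
def subframe_energies(frame, n_sub=8):
    """Energy (sum of squares) of each of n_sub equal-length subframes."""
    size = len(frame) // n_sub
    return [sum(x * x for x in frame[i * size:(i + 1) * size])
            for i in range(n_sub)]


def use_short_window(frame, energy_min, ta, tb, n_sub=8):
    """Return True if the frame should be coded with short windows."""
    energies = subframe_energies(frame, n_sub)
    # Condition 1: the frame energy exceeds a threshold.
    if sum(energies) <= energy_min:
        return False
    # Condition 2 (assumed reading): subframe energies spread by more than Ta.
    if max(energies) - min(energies) <= ta:
        return False
    # Condition 3 (assumed reading): some subframe exceeds its successor by Tb.
    return any(prev - nxt > tb for prev, nxt in zip(energies, energies[1:]))
```

With the slide's sizes, `frame` holds 2048 samples and each subframe holds 256.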

Band Separation

[Diagram: audio (44.1 kHz sampling), MLT with window switching, band separation.]

Synthesis (Half Sampling)

[Diagram: audio (22.05 kHz sampling), MLT with window switching, band separation.]

Synthesis (Quarter Sampling)

[Diagram: audio (11.025 kHz sampling), MLT with window switching, band separation.]

Quantizer

Input: coefficients

Output: quantized coefficients

Goal: convert coefficients from float to integer
  • Reduce signal levels
  • Fast implementation of entropy coding

Scalar quantizer with a deadzone

[Equation and figure: the quantizer maps each coefficient to a quantized magnitude and a sign; inputs inside the deadzone around 0 are quantized to 0.]
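A scalar quantizer with a deadzone can be sketched as below. The slide's formula is not legible in this transcript, so this uses one common form as an assumption: the magnitude is the floor of |coefficient| / step, which makes the zero bin twice as wide as the others. The midpoint reconstruction in `dequantize` is likewise an assumption.

```python
import math


def quantize(coeff, step):
    """Return (magnitude, sign) for one coefficient; sign is +1 or -1."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Truncation toward zero creates the deadzone: |coeff| < step maps to 0.
    magnitude = int(math.floor(abs(coeff) / step))
    sign = -1 if coeff < 0 else 1
    return magnitude, sign


def dequantize(magnitude, sign, step):
    """Reconstruct at the middle of the quantization bin (assumed rule)."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + 0.5) * step
```

Keeping magnitude and sign separate matches the "Quantized Magnitude" and "Sign" labels on the slide and suits the bit-plane entropy coder described later.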

Lossless (Integer) Pass

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of a reversible transform
  • Integer input, integer output
  • The transform should have a determinant of 1 (do not expand data volume)

MLT Framework

[Diagram: the forward MLT is a window operation followed by a DCT-IV (together, the lapped transform); the DCT-IV is computed as pre-rotation, complex FFT and post-rotation. The inverse MLT consists of the inverses of the same stages: inverse pre-rotation, inverse complex FFT, inverse post-rotation and inverse window.]

Window Operation

[Diagram: each sample pair x(n), x(-n-1) passes through a complex rotation whose angle depends on n and N.]

Pre-Rotation

[Diagram: the windowed samples xw(0) to xw(7) pass through four complex rotations, labelled -π/32, -5π/32, -9π/32 and -13π/32, giving xp(0) to xp(7).]

FFT (4 Point Complex)

[Diagram: xp(0) to xp(7) are taken as four complex values xc(0) to xc(3). A 4-point complex FFT, built from butterflies and the twiddle factor e^(-jπ/2), gives yc(0) to yc(3), i.e. yp(0) to yp(7).]

Post-Rotation

[Diagram: yp(0) to yp(7) pass through four conjugate rotations, labelled -0, -π/8, -2π/8 and -3π/8, giving y(0) to y(7).]

Reversible MLT

Make the following operations reversible
  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

Reversible Unit Transform

[Equations: the 2 x 2 unit transform of the pair (a, b) is computed in elementary steps using constant coefficients c and an intermediate value t.]
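One standard way to make a rotation reversible on integers is to factor it into three lifting steps and round after each step. The sketch below shows that general technique; it is not confirmed to be the exact factorization on the slide, whose equations are not legible in this transcript. The function names are hypothetical, and sin(theta) must be nonzero.

```python
import math


def _lift_coeffs(theta):
    """Lifting coefficients for a rotation by theta (sin(theta) != 0)."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    return c0, c1


def rotate_forward(a, b, theta):
    """Integer-to-integer approximation of a rotation of (a, b) by theta."""
    c0, c1 = _lift_coeffs(theta)
    a = a + round(c0 * b)   # step 1: update a from b
    b = b + round(c1 * a)   # step 2: update b from the new a
    a = a + round(c0 * b)   # step 3: update a from the new b
    return a, b


def rotate_inverse(a, b, theta):
    """Exact inverse: undo the three steps in reverse order."""
    c0, c1 = _lift_coeffs(theta)
    a = a - round(c0 * b)
    b = b - round(c1 * a)
    a = a - round(c0 * b)
    return a, b
```

Each step changes only one of the two values by a rounded function of the other, so the decoder can subtract exactly the same rounded amount. Each step also has determinant 1, in line with the definition of a reversible transform given earlier.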

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal
  • Compression
  • Embedded bitstream for future manipulation

Frame Grouping

[Figure: frames numbered 1 to 8 are grouped into time slots.]

Entropy Coder

[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

Entropy Coder
  • Embedded coding
  • Implicit psychoacoustic masking
  • Context modeling
  • Arithmetic coding
  • Implementation concerns

A block of coefficients

45 0 0 0     -74 -13 0 0
21 0 4 0     14 0 23 23
0 0 0 0      3 0 4 0
0 3 5 0      0 0 0 0
0 1 -1 0     -4 33 0 -1
0 0 1 0      0 0 0 0
-4 5 0 0     -18 0 0 19
4 0 23 0     -1 0 0 0

Bits of Coefficients

Coefficient  Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0            45     0  1  0  1  1  0  1     +
w1           -74     1  0  0  1  0  1  0     -
w2            21     0  0  1  0  1  0  1     +
w3            14     0  0  0  1  1  1  0     +
w4            -4     0  0  0  0  1  0  0     -
w5           -18     0  0  1  0  0  1  0     -
w6             4     0  0  0  0  1  0  0     +
w7            -1     0  0  0  0  0  0  1     -
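The sign and magnitude bits b1 to b7 listed for w0 to w7 can be reproduced with a short sketch. The function name and the 7-bit default are illustrative choices matching this example, not part of the coder's interface.

```python
def bit_planes(coeffs, n_bits=7):
    """Return (bits, sign) per coefficient, with bits listed MSB (b1) first."""
    rows = []
    for c in coeffs:
        mag = abs(c)
        if mag >= (1 << n_bits):
            raise ValueError("coefficient does not fit in n_bits")
        # Bit k of the row is bit (n_bits - 1 - k) of the magnitude.
        bits = [(mag >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]
        rows.append((bits, "-" if c < 0 else "+"))
    return rows


if __name__ == "__main__":
    for name, (bits, sign) in zip(
            ["w0", "w1", "w2", "w3", "w4", "w5", "w6", "w7"],
            bit_planes([45, -74, 21, 14, -4, -18, 4, -1])):
        print(name, " ".join(str(b) for b in bits), sign)
```

Reading the resulting table by rows gives conventional coding; reading it by columns, most significant bit first, gives embedded coding.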

Conventional Coding

[Figure: the bit table of w0 to w7 is coded coefficient by coefficient (first w0, second w1, third w2). With only the first three coefficients received, the decoded values shown are 46, -74, 22, 0, 0, 0, 0, 0.]

Embedded Coding

[Figure: the same bit table is coded bit plane by bit plane (first b1, second b2, third b3), with the sign sent when a coefficient becomes significant:
  b1: 0 1- 0 0 0 0 0 0
  b2: 1+ 0 0 0 0 0 0 0
  b3: 0 0 1+ 0 0 1- 0 0
The slide also lists the decoded value and range of each coefficient at this point, for example w0 = 40 in [32, 47], w1 = -72 in [-79, -64] and w2 = 24 in [16, 31].]
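How a decoder turns a partially received coefficient into a value and a range can be sketched as follows. The reconstruction rule (lower bound plus half of the remaining interval) is an assumption chosen because it reproduces the values on the slide; the function name is hypothetical.

```python
def decode_prefix(bits, sign, n_bits=7):
    """Decode a coefficient from its known leading magnitude bits.

    bits: known bits, most significant first; sign: '+', '-', or None
    while the coefficient is still insignificant (all known bits are 0).
    Returns (value, range_low, range_high).
    """
    if len(bits) > n_bits:
        raise ValueError("more bits than the coefficient has")
    remaining = n_bits - len(bits)
    low = 0
    for b in bits:
        low = (low << 1) | b
    low <<= remaining                      # unknown bits all 0
    high = low + (1 << remaining) - 1      # unknown bits all 1
    if low == 0:
        # Sign unknown: the magnitude may be anything up to high.
        return 0, -high, high
    mid = low + (1 << remaining) // 2      # assumed reconstruction point
    if sign == "-":
        return -mid, -high, -low
    return mid, low, high
```

Every additional bit plane halves the range, which is what makes any prefix of the embedded bitstream decodable to a coarser version of the same signal.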

Audio Masking

[Figure: signal, masking threshold and noise level plotted against frequency for a critical band and its neighboring band. Labels: maximum mask, signal-to-mask ratio, noise-to-mask ratio.]

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches)
  • Calculate the mask
  • Transmit the mask
  • Modify the transform coefficients (or the coding approach) according to the mask
  • Encode the transform coefficients

Note: the mask modifies the coding content

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table with mask after the first pass. w1 is significant (value -96, range -127 to -64); w0 and w2 to w5 are insignificant (value 0, range -63 to 63); w6 and w7 remain at value 0 with range -127 to 127.]

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table after the second pass. w0 is significant (value 48, range 32 to 63); w1 is unchanged (value -96, range -127 to -64); w2 to w5 have range -31 to 31 and w6 and w7 have range -63 to 63, all with value 0.]

Context Modeling

Context
  • Zero coding: significance statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  • Sign: neighbor signs

After Implicit Psychoacoustic Masking & Context Modeling

[Figure: the block of coefficients shown earlier is turned into a sequence of bits, each with a context.]

Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...... (to be encoded)
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...... (automatically generated)

Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is split according to P0 and 1-P0; the chosen sub-interval is split again according to P1 and 1-P1, and again according to P2 and 1-P2, for the symbols S0=0, S1=1, S2=0. The final interval is A.]

Coding result: 0100 (the shortest binary bitstream ensuring that the interval from B = 0100 0000000... to C = 0100 1111111... lies within A)
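The interval narrowing and the "shortest bitstream whose interval (B, C) lies inside A" rule can be sketched with exact fractions. This is an idealized illustration, not the QM coder itself, which uses fixed-precision approximations. Two assumptions: symbol 0 takes the lower part of the interval, and `p_zero[i]` (a hypothetical parameter) is the probability that symbol i is 0, strictly between 0 and 1.

```python
from fractions import Fraction


def narrow(symbols, p_zero):
    """Return (low, width) of the interval A after coding binary symbols."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width = width * p0                 # keep the lower part
        else:
            low = low + width * p0             # skip the lower part
            width = width * (1 - p0)
    return low, width


def shortest_code(low, width):
    """Shortest bit string b with [0.b000..., 0.b111...] inside [low, low+width)."""
    high = low + width
    n = 0
    while True:
        n += 1
        scale = 1 << n
        # Smallest integer k with k / 2^n >= low (ceiling division).
        k = -((-low.numerator * scale) // low.denominator)
        if Fraction(k + 1, scale) <= high:
            return format(k, "0{}b".format(n))
```

For the slide's symbol sequence 0, 1, 0 with all probabilities set to 1/2 (an arbitrary choice for illustration), `narrow` returns the interval starting at 1/4 with width 1/8 and `shortest_code` returns `010`. The slide's result 0100 corresponds to its own probabilities, which are not given in the transcript.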

Entropy Coder (Summary)

[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

Speed Up Issues

Context modeling
  • Use stored context
  • Update context when a coefficient becomes significant

Implicit masking
  • Fast calculation of energy in a critical band
  • Lookup table to convert energy to mask

R-D curve calculation
  • Lookup table calculation of distortion

Context entropy coder
  • QM coder
  • Run-length Rice coder
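For readers unfamiliar with the Rice coder named above, a minimal Golomb-Rice code for a non-negative integer, such as a run length, looks like this. How EAC combines it with run lengths and contexts is not described on the slide, so the parameter `k` and the unary convention (ones terminated by a zero) are assumptions.

```python
def rice_encode(value, k):
    """Encode a non-negative integer as a unary quotient plus a k-bit remainder."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k
    bits = "1" * quotient + "0"          # unary part, terminated by a 0
    if k:
        bits += format(value & ((1 << k) - 1), "0{}b".format(k))
    return bits


def rice_decode(bits, k):
    """Decode one value produced by rice_encode."""
    quotient = bits.index("0")           # number of leading 1s
    start = quotient + 1
    remainder = int(bits[start:start + k], 2) if k else 0
    return (quotient << k) | remainder
```

The code needs only shifts and masks, which fits the slide's speed-up theme.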

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

[Diagram: a "Bitstream assembling" block connecting these inputs and outputs.]

EAC Bitstream Syntax

[Diagram: EAC marker | Global header | Timeslot (head, body) | Timeslot (head, body) | Timeslot (head, body)]

Timeslot header
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

Companion File

[Diagram: Global header | Timeslot (head, R-D curve) | Timeslot (head, R-D curve) | Timeslot (head, R-D curve)]

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1 to D4 versus R4), shown twice; in the second set the rates r1, r2, r3 and r4 are marked on the curves.]
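One common way to pick the rates r1 to r4 from several R-D curves under a byte budget is to spend bytes wherever the distortion drops fastest, which is the equal-slope principle. The sketch below shows that general idea under assumptions; the slide does not spell out the algorithm. Each curve is a hypothetical list of (rate, distortion) points with strictly increasing rate and convex, decreasing distortion.

```python
def allocate(curves, budget):
    """Greedy R-D slope allocation; returns the chosen rate for each curve."""
    index = [0] * len(curves)        # current operating point per curve
    spent = 0
    while True:
        best, best_slope = None, 0.0
        for i, curve in enumerate(curves):
            j = index[i]
            if j + 1 >= len(curve):
                continue             # curve already at its last point
            d_rate = curve[j + 1][0] - curve[j][0]
            d_dist = curve[j][1] - curve[j + 1][1]
            if spent + d_rate > budget:
                continue             # next segment does not fit
            slope = d_dist / d_rate  # distortion removed per byte
            if slope > best_slope:
                best, best_slope = i, slope
        if best is None:
            break
        j = index[best]
        spent += curves[best][j + 1][0] - curves[best][j][0]
        index[best] += 1
    return [curve[j][0] for curve, j in zip(curves, index)]
```

Because the embedded bitstream of each channel can be truncated at any of these points, the assembler can apply the allocation by simply cutting each channel bitstream at its chosen rate.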

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{trans} \times \mathrm{Time}$

where
  • $B_i$: allocated bytes for timeslot i
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot i
  • $\mathrm{Rate}_{trans}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
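The allocation formula above, B_i = Buf_{i-1} - Buf_i + Rate_trans x Time, translates directly into code. The function and argument names are illustrative, and the rate is assumed to be given in bytes per second so that the result is in bytes.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """Bytes for timeslot i given the buffer levels before and after it."""
    return buf_prev - buf_cur + rate_trans * time


def allocate_over_path(buffer_levels, rate_trans, time):
    """Apply the formula along a whole buffer-occupancy curve.

    buffer_levels[i] is the occupancy at timeslot i; the result has one
    entry per timeslot transition.
    """
    return [allocated_bytes(prev, cur, rate_trans, time)
            for prev, cur in zip(buffer_levels, buffer_levels[1:])]
```

For example, at 2000 bytes per second with 0.5 second timeslots, letting the buffer level go from 3000 to 2500 bytes allocates 500 + 1000 = 1500 bytes to that timeslot, while a constant buffer level allocates exactly 1000 bytes per timeslot.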

Optimization

Given
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions. Annotations: underflow (too many bytes), overflow (too few bytes), waste bytes.]

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

Modular Software Design

[Diagram: the audio is split into an L+R (or mono) channel and an L-R channel. Each channel passes through MLT(SW), Quantizer, Entropy coder and Bitstream Assembly to form the bitstream.]

Modular Software Design

Highly modularized pipeline design
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>)
  • No long delay
  • No need for large memory

Memory and computation efficient
  • Working memory preallocated

Experimental Results

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec         8kbps   16kbps   32kbps   48kbps
MP4 TwinVQ    748     700      571      448
WMA           847     556      325      040
EAC           669     568      280      -22

(Digits as extracted; the decimal points were lost.)

EAC - Lossless

Results based on the average of 16 MPEG4 test clips

Codec            Compression Ratio
EAC              272
Monkey's Audio   272
WinZip           132

(Digits as extracted; the decimal points were lost.)

EAC (Versatile)

Versatile
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames
  • Ignoring encoding/decoding delay
  • Network transmission time (if modem line, delay = 3 frames)

EAC (Low Delay Mode)

[Diagram: timeline of frames i-1, i, i+1. The encoder starts encoding frame i (MLT, Quantizer, Entropy) and sends the bitstream over the network; the decoder starts decoding frame i (Entropy, Quantizer), and the point where the frame becomes playable is marked.]

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax
  • Parser may reassemble the bitstream at 1000x real time
  • Change the bit rate, the number of audio channels and the audio sampling rate

EAC - Software

Software
  • Encoder: 8x realtime (stereo, 44.1 kHz sampling)
  • Decoder: 20x realtime
  • Parser: 1000x realtime

EAC - Encoder

[Diagram: Audio -> Encoder -> stereo 128 kbps bitstream and companion file.]

EAC - Parser

[Diagram: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces four bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]

EAC - Decoder

[Diagram: the decoder takes any of the parsed bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]

Comparison

[Demo: Original, MP3, MP4 TwinVQ, WMA, EAC.]

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: realtime execution, small memory footprint


1175

Embedded Audio Coder- Algorithm Description

1275

Frame Work - Encoder

Transform Entropy coder

BitstreamAssembly

TransformEntropy

coderBitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

1375

Audio Transform

Input audio sample

Output transform coefficient

Goal convert audio from space domain to frequency domain Compact energy Better match with psychoacoustic

characteristics Enable audio sampling rate change

1475

Lossy vs Lossless Mode

MLT(SW)Audio

Quantization

Lossy mode

Reversible MLT(SW)

Audio

Lossless mode

1575

Lossy (Float) Pass

1675

MLT - Modulated Lapped Transforms

0 100 200 300 400 500 600 700 800 900 1000-1

-08

-06

-04

-02

0

02

04

06

08

1

Spatial Response

0 01 02 03 04 05 06 07 08 09 1-100

-80

-60

-40

-20

0

20

40

Frequency (pi)G

ain

(dB

)

Frequency Domain

1775

MLT with Window Switching

Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and

only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta

There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking & Context Modeling
  • Arithmetic Coding - Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots - Constant Bitrate
  • Multiple Timeslots - Internet Streaming (Slow Start)
  • Multiple Timeslots - Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC - Highly Efficient (NMR)
  • EAC - Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC - Flexible Bitstream Syntax
  • EAC - Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

12/75

Framework - Encoder

[Block diagram: the audio is split into an L+R (or mono) channel and an L-R channel. Each channel passes through Transform -> Entropy coder -> Bitstream Assembly, producing the bitstream.]

13/75

Audio Transform

Input: audio samples

Output: transform coefficients

Goal: convert audio from the time domain to the frequency domain
  • Compact energy
  • Better match with psychoacoustic characteristics
  • Enable audio sampling rate change

14/75

Lossy vs Lossless Mode

Lossy mode: Audio -> MLT(SW) -> Quantization

Lossless mode: Audio -> Reversible MLT(SW)

15/75

Lossy (Float) Pass

16/75

MLT - Modulated Lapped Transforms

[Figure, two panels. "Spatial Response": amplitude (-1 to 1) versus sample index (0 to 1000). "Frequency Domain": gain in dB (-100 to 40) versus frequency (0 to 1, in units of pi).]
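As a rough illustration of what a modulated lapped transform does, here is a minimal pure-Python sketch. It assumes the common textbook MLT definition (sine window, orthonormal scaling, 50% overlap); the slides do not give the exact formulation used in EAC, and the function names are made up for this example. The direct O(M^2) sums are for clarity only; a practical coder would use a fast algorithm.

```python
import math

def mlt_window(M):
    """Sine window h[n] = sin((n + 1/2) * pi / (2M)) for n = 0..2M-1."""
    return [math.sin((n + 0.5) * math.pi / (2 * M)) for n in range(2 * M)]

def mlt_forward(block, M):
    """Map 2M input samples to M transform coefficients."""
    h = mlt_window(M)
    scale = math.sqrt(2.0 / M)
    return [
        scale * sum(
            block[n] * h[n]
            * math.cos((n + (M + 1) / 2) * (k + 0.5) * math.pi / M)
            for n in range(2 * M)
        )
        for k in range(M)
    ]

def mlt_inverse(coeffs, M):
    """Map M coefficients back to 2M windowed samples (to be overlap-added)."""
    h = mlt_window(M)
    scale = math.sqrt(2.0 / M)
    return [
        scale * h[n] * sum(
            coeffs[k] * math.cos((n + (M + 1) / 2) * (k + 0.5) * math.pi / M)
            for k in range(M)
        )
        for n in range(2 * M)
    ]

def analyze_synthesize(signal, M):
    """Transform overlapping blocks (hop size M) and overlap-add the inverses."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * M + 1, M):
        coeffs = mlt_forward(signal[start:start + 2 * M], M)
        for n, value in enumerate(mlt_inverse(coeffs, M)):
            out[start + n] += value
    return out
```

Every sample covered by two overlapping blocks is reconstructed exactly (up to floating-point error); only the first and last M samples, which are covered by a single block, are not.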

17/75

MLT with Window Switching

Features
  • Basic window size: 2048
  • Short window size: 256
  • Switching criterion: a frame (2048 samples) is switched to the short window if and only if
      • its energy is bigger than a certain threshold,
      • the energy within its 8 subframes (256 samples each) differs by more than Ta, and
      • there are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter by Tb.
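The switching criterion above can be sketched as follows. This is one possible reading, not the author's code: it assumes energy is the sum of squared samples, that "differs by more than Ta" means the spread between the largest and smallest subframe energy, that Tb is compared against an energy difference, and that the last condition asks for at least one adjacent pair of subframes. The function and parameter names are hypothetical.

```python
def use_short_window(frame, energy_threshold, ta, tb, num_subframes=8):
    """Decide whether a frame should be coded with the short window.

    frame: list of samples (2048 in the slides, split into 8 subframes of 256).
    energy_threshold, ta, tb: tuning thresholds (values are not given in the slides).
    """
    if len(frame) % num_subframes != 0:
        raise ValueError("frame length must be a multiple of num_subframes")
    sub_len = len(frame) // num_subframes

    # Energy of each subframe, taken here as the sum of squared samples.
    energies = [
        sum(x * x for x in frame[i * sub_len:(i + 1) * sub_len])
        for i in range(num_subframes)
    ]

    loud_enough = sum(energies) > energy_threshold
    varies = max(energies) - min(energies) > ta
    # Literal reading of the slide: the former subframe exceeds the latter by Tb.
    neighbor_jump = any(
        energies[i] - energies[i + 1] > tb for i in range(num_subframes - 1)
    )
    return loud_enough and varies and neighbor_jump
```

For example, a frame whose first subframe is loud and whose remaining subframes are silent satisfies all three conditions, while a frame with constant energy does not.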

18/75

Band Separation

[Diagram: Audio (44.1 kHz sampling) -> MLT with window switching -> Band separation]

19/75

Synthesis (Half Sampling)

[Diagram with the blocks: Audio (22.05 kHz sampling), MLT with window switching, Band separation]

20/75

Synthesis (Quarter Sampling)

[Diagram with the blocks: Audio (11.025 kHz sampling), MLT with window switching, Band separation]

21/75

Quantizer

Input: coefficients

Output: quantized coefficients

Goal: convert coefficients from float to integer
  • Reduce signal levels
  • Fast implementation of entropy coding

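22/75

Quantizer

Scalar quantizer with a deadzone

[Equation and figure not recoverable from this transcript: each coefficient is mapped to a quantized magnitude and a sign, with a dead zone around 0.]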
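Since the formula on this slide did not survive, the following is only a generic sketch of a scalar quantizer with a deadzone, not necessarily the one used in EAC. It assumes a uniform step size, truncation of the magnitude toward zero (which makes the zero bin twice as wide as the others), and a mid-bin reconstruction offset; `step` and `delta` are hypothetical parameters.

```python
import math

def quantize(coeff, step):
    """Return (magnitude, sign) for one coefficient.

    Magnitudes below `step` map to 0, so the zero bin spans (-step, step):
    this is the dead zone.
    """
    magnitude = int(math.floor(abs(coeff) / step))
    sign = -1 if coeff < 0 else 1
    return magnitude, sign

def dequantize(magnitude, sign, step, delta=0.5):
    """Reconstruct a coefficient; non-zero bins are placed `delta` into the bin."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + delta) * step
```

Splitting the result into a magnitude and a sign matches the "Quantized Magnitude / Sign" labels on the slide and is the form the bit-plane entropy coder works on.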

23/75

Lossless (Integer) Pass

24/75

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of a reversible transform:
  • Integer input, integer output
  • The transform should have a determinant of 1 (does not expand the data volume)

25/75

MLT Framework

[Block diagram. Forward MLT: Window (lapped transform) followed by a DCT-IV, where the DCT-IV consists of Pre-Rotation -> Complex FFT -> Post-Rotation. Inverse MLT: inverse Post-Rotation, inverse Complex FFT, inverse Pre-Rotation and inverse Window.]

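26/75

Window Operation

[Diagram: the sample pair x(n), x(-n-1) passes through a Complex Rotate block. The rotation angle depends on n and N; its exact expression is not recoverable from this transcript.]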

27/75

Pre-Rotation

[Diagram: the windowed inputs xw(0)..xw(7) pass through four Complex Rotate blocks with angles -π/32, -5π/32, -9π/32 and -13π/32, giving xp(0)..xp(7).]

28/75

FFT (4 Point Complex)

[Diagram: xp(0)..xp(7) form the complex values xc(0)..xc(3). A 4-point complex FFT, drawn as butterflies with a twiddle factor e^{-jπ/2}, gives yc(0)..yc(3), which are written out as yp(0)..yp(7).]

29/75

Post-Rotation

[Diagram: yp(0)..yp(7) pass through four Conjugate Rotate blocks with angles -0, -π/8, -2π/8 and -3π/8, giving y(0)..y(7).]

30/75

Reversible MLT

Make the following operations reversible:
  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

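31/75

Reversible Unit Transform

[Equations not recoverable from this transcript: a 2x2 matrix acting on the pair (a, b), and a sequence of steps written in terms of a, b, an intermediate value t and coefficients c.]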
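The equations on this slide are lost, so the sketch below shows a standard way to make a rotation reversible on integers rather than the author's exact formulation: factor the rotation into three lifting (shear) steps and round inside each step. Each step can then be undone exactly by subtracting the same rounded value. It assumes sin(theta) is not zero; the function names are made up for the example.

```python
import math

def _lifting_params(theta):
    """Coefficients of the three-shear factorization of a rotation by theta."""
    s = math.sin(theta)
    p = (math.cos(theta) - 1.0) / s  # equals -tan(theta / 2); needs sin(theta) != 0
    return p, s

def rotate_forward(x, y, theta):
    """Integer-to-integer approximation of (x, y) rotated by theta."""
    p, s = _lifting_params(theta)
    x = x + round(p * y)   # shear 1
    y = y + round(s * x)   # shear 2
    x = x + round(p * y)   # shear 3
    return x, y

def rotate_inverse(x, y, theta):
    """Exact inverse of rotate_forward: undo the shears in reverse order."""
    p, s = _lifting_params(theta)
    x = x - round(p * y)
    y = y - round(s * x)
    x = x - round(p * y)
    return x, y
```

Each shear has determinant 1, so the whole transform does too, which is the property required by the definition of a reversible transform earlier in the deck. The output differs from the exact floating-point rotation only by the accumulated rounding errors.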

32/75

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal:
  • Compression
  • Embedded bitstream for future manipulation

33/75

Frame Grouping

[Diagram: a sequence of frames numbered 1 to 8, grouped into time slots.]

34/75

Entropy Coder

[Diagram: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

35/75

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

36/75

A block of coefficients

 45   0   0   0  -74 -13   0   0
 21   0   4   0   14   0  23  23
  0   0   0   0    3   0   4   0
  0   3   5   0    0   0   0   0
  0   1  -1   0   -4  33   0  -1
  0   0   1   0    0   0   0   0
 -4   5   0   0  -18   0   0  19
  4   0  23   0   -1   0   0   0

37/75

Bits of Coefficients

Coefficient  Value  b1 b2 b3 b4 b5 b6 b7  Sign
w0             45    0  1  0  1  1  0  1    +
w1            -74    1  0  0  1  0  1  0    -
w2             21    0  0  1  0  1  0  1    +
w3             14    0  0  0  1  1  1  0    +
w4             -4    0  0  0  0  1  0  0    -
w5            -18    0  0  1  0  0  1  0    -
w6              4    0  0  0  0  1  0  0    +
w7             -1    0  0  0  0  0  0  1    -

38/75

Conventional Coding

[Figure: the bit table of coefficients w0..w7 (bits b1..b7 and sign), with the rows of w0, w1 and w2 marked First, Second and Third: coefficients are coded one after another.]

Decoded values shown on the slide:

Coefficient  Value
w0             46
w1            -74
w2             22
w3              0
w4              0
w5              0
w6              0
w7              0

39/75

Embedded Coding

Bits are coded plane by plane across all coefficients w0..w7; the sign follows the first 1 bit of a coefficient:

  First (b1):   0  1-  0  0  0  0  0  0
  Second (b2):  1+  0  0  0  0  0  0  0
  Third (b3):   0  0  1+  0  0  1-  0  0

Decoded values and ranges shown on the slide:

Coefficient  Value  Range
w0             40    32..47
w1            -72   -79..-64
w2             24    16..31
w3              0   -31..31
w4              0   -31..31
w5            -24   -31..31
w6              0   -31..31
w7              0   -31..31
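The idea of embedded (bit-plane) coding can be sketched on the eight example coefficients 45, -74, 21, 14, -4, -18, 4, -1. This is a simplified illustration, not the EAC coder: it ignores entropy coding and masking, passes all signs to the decoder up front, and assumes the decoder places a partially known coefficient in the middle of its remaining uncertainty interval. The function names are hypothetical.

```python
def to_bit_planes(coeffs, num_bits=7):
    """Split coefficients into sign flags and magnitude bit planes (MSB first)."""
    signs = [-1 if c < 0 else 1 for c in coeffs]
    planes = [
        [(abs(c) >> (num_bits - 1 - b)) & 1 for c in coeffs]
        for b in range(num_bits)
    ]
    return planes, signs

def reconstruct(planes, signs, num_bits=7):
    """Decode from the first len(planes) bit planes (a truncated bitstream)."""
    remaining = num_bits - len(planes)
    values = []
    for i, sign in enumerate(signs):
        prefix = 0
        for plane in planes:
            prefix = (prefix << 1) | plane[i]
        magnitude = prefix << remaining
        if prefix and remaining:
            # Place the value in the middle of the still-unknown range.
            magnitude += 1 << (remaining - 1)
        values.append(sign * magnitude)
    return values

coeffs = [45, -74, 21, 14, -4, -18, 4, -1]
planes, signs = to_bit_planes(coeffs)
after_three = reconstruct(planes[:3], signs)   # [40, -72, 24, 0, 0, -24, 0, 0]
lossless = reconstruct(planes, signs)          # identical to coeffs
```

Truncating the stream after any bit plane still gives a usable approximation, and keeping every plane reproduces the coefficients exactly; this is what lets a parser cut the bitstream to a lower rate without re-encoding.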

40/75

Audio Masking

[Figure: level versus frequency, showing the critical band and the neighboring band, the signal, the masking threshold, the maximum mask, the noise level, the signal-to-mask ratio and the noise-to-mask ratio.]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches):
  • Calculate the mask
  • Transmit the mask
  • Modify the transform coefficients (or the coding approach) according to the mask
  • Encode the transform coefficients

Note: the mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking:
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

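43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table (sign, b1..b7) after the first coding pass, with a legend for significant coefficients, insignificant coefficients and the mask.]

Decoded values and ranges shown on the slide:

Coefficient  Value  Range
w0              0    -63..63
w1            -96   -127..-64
w2              0    -63..63
w3              0    -63..63
w4              0    -63..63
w5              0    -63..63
w6              0   -127..127
w7              0   -127..127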

44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table (sign, b1..b7) after the first and second coding passes, with a legend for significant and insignificant coefficients.]

Decoded values and ranges shown on the slide:

Coefficient  Value  Range
w0             48    32..63
w1            -96   -127..-64
w2              0    -31..31
w3              0    -31..31
w4              0    -31..31
w5              0    -31..31
w6              0    -63..63
w7              0    -63..63

45/75

Context Modeling

Context
  • Zero coding: significance statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  • Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

 45   0   0   0  -74 -13   0   0
 21   0   4   0   14   0  23  23
  0   0   0   0    3   0   4   0
  0   3   5   0    0   0   0   0
  0   1  -1   0   -4  33   0  -1
  0   0   1   0    0   0   0   0
 -4   5   0   0  -18   0   0  19
  4   0  23   0   -1   0   0   0

Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

[Labels on the slide: "Automatically generated", "To be encoded"]

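47/75

Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided step by step, using the probabilities P0 and 1-P0, P1 and 1-P1, P2 and 1-P2, as the symbols S0=0, S1=1, S2=0 are coded. The final interval is A.]

Coding result: 0100

(The shortest binary bitstream that ensures the interval from B = 0100 0000000... to C = 0100 1111111... is contained in A, i.e. (B, C) ⊂ A.)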
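The interval picture above can be sketched with exact fractions. This is an idealized arithmetic coder for illustration only, not the QM coder named in the slide title (which uses approximate, multiplication-free arithmetic and adaptive probabilities). It assumes each symbol comes with a known probability of being 0 and that symbol 0 takes the lower part of the interval; the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil

def narrow_interval(symbols, prob_zero):
    """Shrink [low, high) once per binary symbol; returns the final interval A."""
    low, high = Fraction(0), Fraction(1)
    for symbol, p in zip(symbols, prob_zero):
        split = low + (high - low) * p
        if symbol == 0:
            high = split      # symbol 0 keeps the lower part
        else:
            low = split       # symbol 1 keeps the upper part
    return low, high

def shortest_code(low, high):
    """Shortest bit string whose dyadic interval [B, C) lies inside [low, high)."""
    n = 1
    while True:
        k = ceil(low * 2 ** n)                 # smallest n-bit prefix not below low
        if Fraction(k + 1, 2 ** n) <= high:    # every continuation stays inside A
            return format(k, "0{}b".format(n))
        n += 1

half = Fraction(1, 2)
interval = narrow_interval([0, 1, 0], [half, half, half])
code = shortest_code(*interval)
```

With equal probabilities the three symbols cost three bits; when the coded symbols are more probable, the final interval is wider and the code gets shorter.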

48/75

Entropy Coder (Summary)

[Diagram: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

  • Context modeling
      • Use stored context
      • Update the context when a coefficient becomes significant
  • Implicit masking
      • Fast calculation of energy in a critical band
      • Lookup table: convert energy to mask
  • R-D curve calculation
      • Lookup table: calculation of distortion
  • Context entropy coder
      • QM coder
      • Run-length Rice coder

50/75

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

51/75

EAC Bitstream Syntax

Layout: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body) | ...

Timeslot header:
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

52/75

Companion File

Layout: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | ...

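53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1, D2 versus R2, D3 versus R3, D4 versus R4), shown before and after assembling, with the selected rates r1, r2, r3 and r4 marked.]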
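One common way to pick the rates r1..r4 for a single timeslot is a greedy search on the R-D slopes, sketched below. This is a generic technique under stated assumptions, not necessarily the search used in EAC: each curve is assumed to be a list of (rate in bytes, distortion) truncation points with increasing rate and convex, decreasing distortion, and the names `allocate_rates`, `curves` and `budget` are hypothetical.

```python
import heapq

def allocate_rates(curves, budget):
    """Choose one truncation point per R-D curve within a total byte budget.

    Returns (chosen index per curve, bytes used).
    """
    choice = [0] * len(curves)
    used = sum(curve[0][0] for curve in curves)
    heap = []

    def push_next_step(i):
        j = choice[i]
        if j + 1 < len(curves[i]):
            (r0, d0), (r1, d1) = curves[i][j], curves[i][j + 1]
            slope = (d0 - d1) / (r1 - r0)      # distortion saved per extra byte
            heapq.heappush(heap, (-slope, i))  # max-heap on the slope

    for i in range(len(curves)):
        push_next_step(i)

    while heap:
        _, i = heapq.heappop(heap)
        j = choice[i]
        extra = curves[i][j + 1][0] - curves[i][j][0]
        if used + extra > budget:
            break                              # the steepest remaining step no longer fits
        used += extra
        choice[i] = j + 1
        push_next_step(i)
    return choice, used

curve_a = [(0, 100), (10, 40), (20, 20), (30, 15)]
curve_b = [(0, 50), (10, 35), (20, 32)]
chosen, used_bytes = allocate_rates([curve_a, curve_b], budget=30)
```

Here the budget of 30 bytes goes to the two steepest steps of the first curve and the first step of the second, so all curves end up truncated at roughly the same R-D slope.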

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), running between an upper and a lower illegal region.]

55/75

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$B_i = Buf_{i-1} - Buf_i + Rate_{trans} \times Time$

where
  • $B_i$: allocated bytes for timeslot $i$
  • $Buf_i$: buffer occupancy level at timeslot $i$
  • $Rate_{trans}$: coding (network) rate per second
  • $Time$: time duration of the timeslot
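The formula can be applied directly to a planned buffer-occupancy curve. The sketch below assumes the rate is expressed in bytes per second and that all timeslots have the same duration; the function and argument names are made up for the example.

```python
def allocated_bytes(buffer_levels, rate_bytes_per_second, slot_seconds):
    """Bytes allocated to each timeslot i = 1..n from a buffer-occupancy curve.

    buffer_levels[i] is the buffer occupancy (bytes) at timeslot i,
    with buffer_levels[0] the initial level.
    """
    received_per_slot = rate_bytes_per_second * slot_seconds
    return [
        buffer_levels[i - 1] - buffer_levels[i] + received_per_slot
        for i in range(1, len(buffer_levels))
    ]

# Buffer drops from 1000 to 800 bytes, then rises to 900, at 2000 bytes/s
# with 0.5 s timeslots: the slots get 1200 and 900 bytes respectively.
example = allocated_bytes([1000, 800, 900], 2000, 0.5)
```

A timeslot that lets the buffer level fall gets more bytes than the channel delivers in that slot, and a timeslot that refills the buffer gets fewer.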

56/75

Optimization

Given:
  • Initial buffer occupancy level
  • Final buffer occupancy level (or an intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions. Labels: "Underflow (too many bytes)", "Overflow (too few bytes)", "Waste bytes".]

58/75

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

59/75

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

60/75

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

61/75

Modular Software Design

[Block diagram: the audio is split into an L+R (or mono) channel and an L-R channel. Each channel passes through MLT(SW) -> Quantizer -> Entropy coder -> Bitstream Assembly, producing the bitstream.]

62/75

Modular Software Design

  • Highly modularized pipeline design
      • Quantizer and entropy coder can be used for image/video compression as well
      • Probes and data inputs can be inserted into any part of the program
  • Data-flow driven (with the necessary memory regulator <buffer>)
      • No long delay
      • No need for large memory
  • Memory and computation efficient
      • Working memory preallocated

63/75

Experimental Results

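64/75

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec        8kbps  16kbps  32kbps  48kbps
EAC          669    568     280     -22
WMA          847    556     325     040
MP4 TwinVQ   748    700     571     448

[Decimal points in the NMR values were lost in this transcript; the digit groups are shown as extracted.]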

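65/75

EAC - Lossless

Results based on the average of 16 MPEG4 test clips.

Codec            Compression ratio
EAC              272
Monkey's Audio   272
WinZip           132

[Decimal points in the compression ratios were lost in this transcript; the digits are shown as extracted.]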

66/75

EAC (Versatile)

Versatile
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

  • Reducing frame size
  • Timeslot = 1 frame
  • Fixed-length timeslot bitstream
  • Delay = 2 frames (ignoring encoding/decoding delay)
  • Network transmission time (if modem line, delay = 3 frames)

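68/75

EAC (Low Delay Mode)

[Timing diagram over frames i-1, i and i+1: the encoder starts encoding frame i (MLT, Quantizer, Entropy coder), the bitstream crosses the network, the decoder starts decoding frame i (Entropy, Quantizer), and a marker shows where the frame becomes playable.]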

69/75

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax
  • Parser may reassemble the bitstream at 1000x real time
  • Change:
      • bit rate
      • number of audio channels
      • audio sampling rate

1475

Lossy vs Lossless Mode

MLT(SW)Audio

Quantization

Lossy mode

Reversible MLT(SW)

Audio

Lossless mode

1575

Lossy (Float) Pass

1675

MLT - Modulated Lapped Transforms

0 100 200 300 400 500 600 700 800 900 1000-1

-08

-06

-04

-02

0

02

04

06

08

1

Spatial Response

0 01 02 03 04 05 06 07 08 09 1-100

-80

-60

-40

-20

0

20

40

Frequency (pi)G

ain

(dB

)

Frequency Domain

1775

MLT with Window Switching

Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and

only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta

There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

[Figure: conventional coding. Coefficients are coded one at a time, all bits of w0 first, then w1 (second), then w2 (third), and so on. With only the first three coefficients coded, the reconstructed values w0 ... w7 are 46, -74, 22, 0, 0, 0, 0, 0.]

3975

Embedded Coding

Coefficients are coded bitplane by bitplane: b1 of all coefficients first, then b2 (second), then b3 (third). The sign is sent when a coefficient first becomes significant.

Symbols coded (w0 ... w7):
  First  (b1):  0   1-  0   0   0   0   0   0
  Second (b2):  1+  0   0   0   0   0   0   0
  Third  (b3):  0   0   1+  0   0   1-  0   0

After three bitplanes:

Coefficient  Range       Value
w0           32..47        40
w1           -79..-64     -72
w2           16..31        24
w3           -15..15        0
w4           -15..15        0
w5           -31..-16     -24
w6           -15..15        0
w7           -15..15        0
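The decoder-side values after a truncated embedded bitstream can be sketched as follows. The function simulates a decoder that has received only the top bitplanes and reconstructs each significant coefficient at the midpoint of its remaining uncertainty interval. The name decode_truncated and the midpoint rule are assumptions for illustration.

```python
def decode_truncated(coeffs, planes, nbits=7):
    """Values a decoder would reconstruct from the first `planes` bitplanes."""
    out = []
    remaining = nbits - planes          # bitplanes not yet received
    for w in coeffs:
        prefix = abs(w) >> remaining    # magnitude bits known so far
        if prefix == 0:
            out.append(0)               # still insignificant
            continue
        # lower end of the interval plus half of the unknown part
        value = (prefix << remaining) + ((1 << remaining) >> 1)
        out.append(-value if w < 0 else value)
    return out

if __name__ == "__main__":
    coeffs = [45, -74, 21, 14, -4, -18, 4, -1]
    print(decode_truncated(coeffs, 3))  # [40, -72, 24, 0, 0, -24, 0, 0]
```

Cutting the bitstream after any bitplane therefore still yields a coarse version of every coefficient, which is what allows the bitstream to be truncated later.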

4075

Audio Masking

[Figure: audio masking, level versus frequency. Labels: critical band, neighboring band, signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, noise-to-mask ratio.]

4175

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches):
  • Calculate the mask
  • Transmit the mask
  • Modify the transform coefficients (or the coding approach) according to the mask
  • Encode the transform coefficients

Note: the mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking:
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process
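The idea that a mask changes only the order of coding, not the content, can be illustrated with a toy model. In this sketch a fixed per-coefficient delay (shifts, a hypothetical stand-in) replaces the real mask, which in the scheme described above would be recomputed from the already coded bits after each pass.

```python
def coding_order(nbits, shifts):
    """Return (coefficient index, bitplane) pairs in coding order.

    shifts[i] is the number of passes by which coefficient i is delayed;
    it is a fixed, hypothetical substitute for a psychoacoustic mask.
    """
    order = []
    for p in range(nbits + max(shifts)):
        for i, s in enumerate(shifts):
            plane = p - s               # bitplane of coefficient i in pass p
            if 0 <= plane < nbits:
                order.append((i, plane))
    return order

if __name__ == "__main__":
    print(coding_order(2, [0, 1]))
    # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

With or without delays, the same set of (coefficient, bitplane) pairs is coded; only their position in the bitstream changes. Because the decoder can derive the same order from what it has already decoded, no mask needs to be transmitted.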

4375

Embedded Coding with Implicit Psychoacoustic Masking

Symbols coded in the first pass (w0 ... w7):  0  1-  0  0  0  0  0  0

Coefficient  b1  Sign  Range         Value
w0           0         -63..63          0
w1           1   -     -127..-64      -96
w2           0         -63..63          0
w3           0         -63..63          0
w4           0         -63..63          0
w5           0         -63..63          0
w6           0         -127..127        0
w7           0         -127..127        0

Legend: significant coefficient, insignificant coefficient, mask.

4475

Embedded Coding with Implicit Psychoacoustic Masking

Symbols coded (w0 ... w7):
  First pass:   0   1-  0  0  0  0  0  0
  Second pass:  1+  0   0  0  0  0  0  0

Coefficient  b1 b2  Sign  Range         Value
w0           0  1   +     32..63          48
w1           1  0   -     -127..-64      -96
w2           0  0         -31..31          0
w3           0  0         -31..31          0
w4           0  0         -31..31          0
w5           0  0         -31..31          0
w6           0  0         -63..63          0
w7           0  0         -63..63          0

Legend: significant coefficient, insignificant coefficient.

4575

Context Modeling

Context:
  • Zero coding: significance statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  • Sign: neighbor signs
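A zero-coding context built from the significance of neighboring coefficients might look like the sketch below. The neighborhood (one coefficient on each side) and the context numbering are assumptions made for illustration; the slides do not specify them.

```python
def zero_coding_context(significant, i):
    """Context index for coefficient i: number of significant neighbors.

    `significant` is a list of booleans, one per coefficient. Only the
    immediate left and right neighbors are used (hypothetical choice).
    """
    left = significant[i - 1] if i > 0 else False
    right = significant[i + 1] if i + 1 < len(significant) else False
    return int(left) + int(right)

if __name__ == "__main__":
    status = [False, True, False, False]
    print([zero_coding_context(status, i) for i in range(len(status))])
    # [1, 0, 1, 0]
```

The context index selects which adaptive probability estimate the entropy coder uses for the bit, so bits with similar statistics share one model.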

4675

After Implicit Psychoacoustic Masking & Context Modeling

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ......
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ......

Figure labels: automatically generated; to be encoded.

4775

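Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided successively for the symbols S0 = 0, S1 = 1, S2 = 0, using the probability splits P0 / 1-P0, P1 / 1-P1, and P2 / 1-P2. The final interval is A.]

Coding result: 0100

(The shortest binary bitstream ensures that the interval from B = 0100 0000000 to C = 0100 1111111 satisfies (B, C) ⊂ A.)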
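The interval narrowing and the choice of the shortest bitstream can be sketched with exact fractions. This is an idealized arithmetic coder, not the QM coder itself; the convention that symbol 0 takes the lower part of the interval, and the names narrow and shortest_code, are assumptions.

```python
from fractions import Fraction

def narrow(symbols, p_zero):
    """Return the final interval [low, high) after coding binary symbols.

    p_zero[k] is the probability of symbol 0 at step k; symbol 0 is
    assigned the lower part of the current interval (assumed convention).
    """
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        split = width * p
        if s == 0:
            width = split
        else:
            low += split
            width -= split
    return low, low + width

def shortest_code(low, high):
    """Shortest bit string b such that every continuation of b,
    read as a binary fraction 0.b..., lies inside [low, high)."""
    k = 0
    while True:
        k += 1
        step = Fraction(1, 2 ** k)
        n = -(-low // step)             # ceil(low / step)
        if n * step + step <= high:
            return format(n, f"0{k}b")

if __name__ == "__main__":
    half = Fraction(1, 2)
    low, high = narrow([0, 1, 0], [half, half, half])
    print(low, high, shortest_code(low, high))  # 1/4 3/8 010
```

With skewed probabilities the interval for a likely symbol sequence stays wide, so fewer bits are needed to land inside it; that is the source of the compression gain.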

4875

Entropy Coder (Summary)

[Figure: the entropy coder outputs a bitstream together with an R-D curve (distortion D versus rate R).]

4975

Speed Up Issues

Context modeling:
  • Use stored context
  • Update the context when a coefficient becomes significant

Implicit masking:
  • Fast calculation of energy in a critical band
  • Lookup table to convert energy to mask

R-D curve calculation:
  • Lookup table for the calculation of distortion

Context entropy coder:
  • QM coder
  • Run-length Rice coder
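For readers unfamiliar with Rice codes, the sketch below shows the basic code for a non-negative integer such as a run length: a unary quotient followed by a k-bit remainder. It illustrates the general technique only; how EAC combines it with run-length coding, and how k is chosen, is not stated in the slides.

```python
def rice_encode(n, k):
    """Rice code of n >= 0 with parameter k, as a string of '0'/'1'."""
    q, r = n >> k, n & ((1 << k) - 1)
    bits = "1" * q + "0"                # unary quotient, terminated by 0
    if k:
        bits += format(r, f"0{k}b")     # k-bit binary remainder
    return bits

def rice_decode(bits, k):
    """Decode one Rice codeword from the start of `bits`."""
    q = bits.index("0")
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r

if __name__ == "__main__":
    print(rice_encode(9, 2))            # 11001
    print(rice_decode("11001", 2))      # 9
```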

5075

Bitstream Assembly

Input:
  • Bitstream
  • R-D curve

Output:
  • Assembled bitstream
  • Companion file

[Figure: bitstream assembling.]

5175

EAC Bitstream Syntax

Timeslot header:
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout. EAC marker, global header, then a sequence of timeslots, each consisting of a head and a body.]

5275

Companion File

[Figure: companion file layout. Global header, then for each timeslot a head and an R-D curve.]

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1, D2 versus R2, D3 versus R3, D4 versus R4), with the rates r1, r2, r3, r4 marked on them.]
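A common way to split a byte budget across several R-D curves is to spend bytes where the distortion drops fastest, which leads to roughly equal R-D slopes at the chosen points. The greedy sketch below illustrates that general idea under the assumption that each curve is convex and given as a list of (rate, distortion) points; it is not claimed to be the exact EAC assembler, and the name allocate is hypothetical.

```python
import heapq

def allocate(curves, budget):
    """Greedy equal-slope allocation.

    curves: list of R-D curves, each a list of (rate_bytes, distortion)
            points sorted by rate, starting at rate 0, assumed convex.
    Returns (chosen point index per curve, bytes spent).
    """
    chosen = [0] * len(curves)
    spent = 0
    heap = []

    def push(i):
        j = chosen[i]
        if j + 1 < len(curves[i]):
            (r0, d0), (r1, d1) = curves[i][j], curves[i][j + 1]
            slope = (d0 - d1) / (r1 - r0)   # distortion drop per byte
            heapq.heappush(heap, (-slope, i))

    for i in range(len(curves)):
        push(i)
    while heap:
        _, i = heapq.heappop(heap)
        j = chosen[i]
        cost = curves[i][j + 1][0] - curves[i][j][0]
        if spent + cost > budget:
            continue                        # segment does not fit; skip this curve
        spent += cost
        chosen[i] = j + 1
        push(i)
    return chosen, spent

if __name__ == "__main__":
    curves = [[(0, 100), (10, 40), (20, 20)],
              [(0, 50), (10, 30), (20, 25)]]
    print(allocate(curves, 30))             # ([2, 1], 30)
```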

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve. Buffer occupancy (bytes) versus time (timeslots), bounded above and below by illegal regions.]

5575

Allocated Bytes Per Timeslots

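Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$

where
  • $B_i$: allocated bytes for timeslot i
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot i
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot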
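As a worked example of the allocation formula, the sketch below computes the bytes available to each timeslot from a sequence of buffer occupancy levels. The function name and the sample numbers are illustrative assumptions; the rate is taken to be in bytes per second so that the units agree.

```python
def allocated_bytes(buf_levels, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time for i = 1, 2, ...

    buf_levels: buffer occupancy at timeslots 0, 1, 2, ... (bytes)
    rate_trans: transmission rate (bytes per second, assumed unit)
    time:       duration of one timeslot (seconds)
    """
    return [buf_levels[i - 1] - buf_levels[i] + rate_trans * time
            for i in range(1, len(buf_levels))]

if __name__ == "__main__":
    # Buffer goes 1000 -> 800 -> 900 bytes; 2000 bytes/s; 0.5 s timeslots
    print(allocated_bytes([1000, 800, 900], 2000, 0.5))  # [1200.0, 900.0]
```

A timeslot that lets the buffer level drop may use more bytes than the channel delivers in that slot, and one that raises the level must use fewer.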

5675

Optimization

Given:
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

5775

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions. Labels: underflow (too many bytes), overflow (too few bytes), waste bytes.]

5875

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

5975

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve. Buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

6075

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

6175

Modular Software Design

[Figure: two parallel pipelines from audio to bitstream, one for the L+R (or mono) channel and one for the L-R channel. Each pipeline is MLT(SW), quantizer, entropy coder, bitstream assembly.]
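The split of a stereo signal into L+R and L-R channels can be sketched as below. The absence of any scaling is an assumption; the slides only name the two channels. With integer samples the inverse is exact, because the sum and difference of two integers are both even or both odd.

```python
def to_sum_diff(left, right):
    """Return the (L+R, L-R) channels for integer sample lists."""
    s = [l + r for l, r in zip(left, right)]
    d = [l - r for l, r in zip(left, right)]
    return s, d

def from_sum_diff(s, d):
    """Exact inverse of to_sum_diff for integer samples."""
    left = [(a + b) // 2 for a, b in zip(s, d)]    # (L+R) + (L-R) = 2L
    right = [(a - b) // 2 for a, b in zip(s, d)]   # (L+R) - (L-R) = 2R
    return left, right

if __name__ == "__main__":
    L, R = [10, -3, 7], [8, 5, 7]
    s, d = to_sum_diff(L, R)
    print(s, d, from_sum_diff(s, d))
```

When the two channels are similar, L-R is close to zero and costs few bits, and dropping it gives a mono signal.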

6275

Modular Software Design

Highly modularized pipeline design
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>)
  • No long delay
  • No need for large memory

Memory and computation efficient
  • Working memory preallocated

6375

Experimental Results

6475

EAC - Highly Efficient (NMR)

Results are based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.

Codec        8 kbps  16 kbps  32 kbps  48 kbps
EAC           6.69     5.68     2.80    -0.22
WMA           8.47     5.56     3.25     0.40
MP4 TwinVQ    7.48     7.00     5.71     4.48

6575

EAC - Lossless

Results are based on the average of 16 MPEG-4 test clips.

Codec            Compression ratio
EAC                    2.72
Monkey's Audio         2.72
WinZip                 1.32

6675

EAC (Versatile)

Versatile:
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames (ignoring encoding/decoding delay), plus network transmission time (over a modem line, delay = 3 frames)

6875

EAC (Low Delay Mode)

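[Figure: low delay timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT, quantizer, entropy coder) and sends the bitstream over the network; the decoder starts decoding frame i (entropy decoder, dequantizer), and the point where the frame becomes playable is marked.]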

6975

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax
  • The parser may reassemble the bitstream at 1000x real time
  • Change:
      bit rate
      number of audio channels
      audio sampling rate

7075

EAC - Software

Software
  • Encoder: 8x realtime (stereo, 44.1 kHz sampling)
  • Decoder: 20x realtime
  • Parser: 1000x realtime

7175

EAC - Encoder

[Figure: audio enters the encoder, which outputs a stereo 128 kbps bitstream and a companion file.]

7275

EAC - Parser

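[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces application bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]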

7375

EAC - Decoder

[Figure: the decoder plays the application bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]

7475

Comparison

[Demo: audio clips for comparison. Original, MP4 TwinVQ, WMA, EAC, MP3.]

7575

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: realtime execution, small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction - Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking & Context Modeling
  • Arithmetic Coding - Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots - Constant Bitrate
  • Multiple Timeslots - Internet Streaming (Slow Start)
  • Multiple Timeslots - Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC - Highly Efficient (NMR)
  • EAC - Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC - Flexible Bitstream Syntax
  • EAC - Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions


1675

MLT - Modulated Lapped Transforms

0 100 200 300 400 500 600 700 800 900 1000-1

-08

-06

-04

-02

0

02

04

06

08

1

Spatial Response

0 01 02 03 04 05 06 07 08 09 1-100

-80

-60

-40

-20

0

20

40

Frequency (pi)G

ain

(dB

)

Frequency Domain

1775

MLT with Window Switching

Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and

only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta

There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

62/75

Modular Software Design

  • Highly modularized pipeline design
      • The quantizer and entropy coder can be used for image/video compression as well
      • Probes and data inputs can be inserted into any part of the program
  • Data-flow driven (with the necessary memory regulator <buffer>)
      • No long delay
      • No need for large memory
  • Memory and computation efficient
      • Working memory preallocated

63/75

Experimental Results

64/75

EAC – Highly Efficient (NMR)

Results based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.

Codec         8 kbps   16 kbps   32 kbps   48 kbps
EAC           669      568       280       -22
WMA           847      556       325       040
MP4 TwinVQ    748      700       571       448

(The digits are reproduced as they appear in the transcript, which dropped the decimal points.)

65/75

EAC – Lossless

Results based on the average of 16 MPEG-4 test clips.

Codec             Compression ratio
EAC               2.72
Monkey's Audio    2.72
WinZip            1.32

66/75

EAC (Versatile)

Versatile:
  • Real-time two-way communication (low delay mode)
  • Storage devices (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

  • Reduce the frame size
  • Timeslot = 1 frame
  • Fixed-length timeslot bitstream
  • Delay = 2 frames
      • Ignoring encoding/decoding delay
      • Plus network transmission time (on a modem line, delay = 3 frames)
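The delay in frames translates into milliseconds once a frame size is fixed. The figures below are only a worked example: 2048 samples is the basic window size quoted elsewhere in the deck, while the 256-sample low-delay frame is a hypothetical choice, since the deck does not state the reduced frame size.

```python
def delay_ms(frames, frame_size, sample_rate=44100):
    """Delay of `frames` frames of `frame_size` samples, in milliseconds."""
    return 1000.0 * frames * frame_size / sample_rate


# 2-frame delay with 2048-sample frames at 44.1 kHz (about 92.9 ms).
print(round(delay_ms(2, 2048), 2))
# 3-frame delay with a hypothetical 256-sample frame (about 17.4 ms).
print(round(delay_ms(3, 256), 2))
```

The delay scales linearly with the frame size, which is why low delay mode starts by reducing it.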

68/75

EAC (Low Delay Mode)

[Figure: low-delay timeline over frames i-1, i and i+1. The encoder starts encoding frame i (MLT → Quantizer → Entropy), the bitstream crosses the network, and decoding of frame i starts (Entropy → Quantizer); the point at which the frame can be played is marked "Playable here".]

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax:
  • The parser may reassemble the bitstream at 1000x real time
  • It can change the
      • bitrate
      • # of audio channels
      • audio sampling rate

70/75

EAC – Software

Software:
  • Encoder: 8x real time (stereo, 44.1 kHz sampling)
  • Decoder: 20x real time
  • Parser: 1000x real time

71/75

EAC - Encoder

[Demo diagram: Audio → Encoder → stereo 128 kbps bitstream, plus a companion file.]

72/75

EAC - Parser

[Demo diagram: on the server, the Parser takes the stereo 128 kbps bitstream and the companion file and produces four bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps at 11 kHz sampling.]

73/75

EAC - Decoder

[Demo diagram: the Decoder plays each of the four bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps at 11 kHz sampling.]

74/75

Comparison

[Audio demo: Original, MP3, MP4 TwinVQ, WMA, EAC]

75/75

Conclusions

An embedded audio coder is developed:
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, # of audio channels, audio sampling rate
  • Good prototype available: real-time execution, small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction – Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Framework - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking & Context Modeling
  • Arithmetic Coding – Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes per Timeslot
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots – Constant Bitrate
  • Multiple Timeslots – Internet Streaming (Slow Start)
  • Multiple Timeslots – Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC – Highly Efficient (NMR)
  • EAC – Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC – Flexible Bitstream Syntax
  • EAC – Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

17/75

MLT with Window Switching

Features:
  • Basic window size: 2048
  • Short window size: 256
  • Switching criterion: a frame (2048 samples) is switched to the short window if and only if
      • Its energy is bigger than a certain threshold
      • The energy within the 8 subframes (256 samples each) differs by more than Ta
      • There are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter by Tb
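The switching criterion can be sketched as a small decision function. Several details are assumptions made for illustration rather than facts from the slide: the three conditions are treated as all required, "differs by more than Ta" is read as the spread between the largest and smallest subframe energy, and "greater by Tb" is read as a difference rather than a ratio. The function and parameter names are invented.

```python
def use_short_window(frame, energy_threshold, ta, tb, n_sub=8):
    """Decide whether a frame should be coded with the short window."""
    sub_len = len(frame) // n_sub
    energies = [
        sum(x * x for x in frame[k * sub_len:(k + 1) * sub_len])
        for k in range(n_sub)
    ]
    # Condition 1: the frame carries enough energy.
    if sum(energies) <= energy_threshold:
        return False
    # Condition 2: subframe energies differ by more than Ta.
    if max(energies) - min(energies) <= ta:
        return False
    # Condition 3: some subframe exceeds its successor by more than Tb.
    return any(energies[k] - energies[k + 1] > tb for k in range(n_sub - 1))


# A 2048-sample frame that goes silent halfway through, and a steady one.
transient = [1.0] * 1024 + [0.0] * 1024
steady = [1.0] * 2048
print(use_short_window(transient, 100.0, 100.0, 100.0))
print(use_short_window(steady, 100.0, 100.0, 100.0))
```

Only the frame with the abrupt energy change switches to the short window; the steady frame keeps the 2048-sample window.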

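18/75

Band Separation

[Figure: block diagram with the stages Audio (44.1 kHz sampling), MLT with window switching, and Band separation (frequency axis starting at 0).]

19/75

Synthesis (Half Sampling)

[Figure: block diagram with the stages Audio (22.05 kHz sampling), MLT with window switching, and Band separation (frequency axis starting at 0).]

20/75

Synthesis (Quarter Sampling)

[Figure: block diagram with the stages Audio (11.025 kHz sampling), MLT with window switching, and Band separation (frequency axis starting at 0).]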

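21/75

Quantizer

Input: coefficient

Output: quantized coefficient

Goal: convert the coefficient from float to integer
  • Reduce signal levels
  • Fast implementation of entropy coding

22/75

Quantizer

Scalar quantizer with a deadzone

[Equation and figure garbled in the transcript: only the step-size symbol s and the indices [m][n] survive. The quantizer output is a quantized magnitude and a sign; the axis is marked at 0.]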
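A scalar quantizer with a deadzone is commonly written as a truncating division of the magnitude by the step size, with the sign carried separately. The sketch below shows that generic form; it is an assumption and not necessarily the exact formula on the slide, and the mid-interval reconstruction in `dequantize` is likewise only one usual choice.

```python
def quantize(x, step):
    """Deadzone scalar quantizer: returns (quantized magnitude, sign).

    Truncation maps every input in (-step, step) to magnitude 0, so the
    zero bin is twice as wide as the others (the deadzone).
    """
    magnitude = int(abs(x) / step)
    sign = -1 if x < 0 else 1
    return magnitude, sign


def dequantize(magnitude, sign, step):
    """Reconstruct at the middle of the quantization interval."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + 0.5) * step


print(quantize(3.7, 1.0))    # magnitude 3, positive
print(quantize(-0.9, 1.0))   # falls in the deadzone, magnitude 0
print(dequantize(*quantize(-7.2, 2.0), 2.0))
```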

23/75

Lossless (Integer) Pass

24/75

Key to Achieve Lossless

  • Break the MLT into small steps
  • Make every step reversible

Definition of a reversible transform:
  • Integer input, integer output
  • The transform should have a determinant of 1 (do not expand the data volume)

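25/75

MLT Framework

[Figure: the forward MLT is a lapped transform consisting of a Window stage followed by a DCT-IV, which is built from Pre-Rotate, Complex FFT and Post-Rotation stages. The inverse MLT uses the inverses of the same stages: Pre-Rotate^-1, Complex FFT^-1, Post-Rotation^-1 and Inverse Window.]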
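The forward and inverse transform pair can be illustrated with a direct-form implementation. The sketch below uses the common MDCT formulation of the MLT with a sine window and evaluates the cosine sums directly, which is slow but short; it does not follow the pre-rotation, FFT and post-rotation factorization, and the function names are invented. It shows that overlapping windowed blocks by half and adding the inverse transforms reproduces the input.

```python
import math


def sine_window(n_coef):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_coef)) for n in range(2 * n_coef)]


def basis(n, k, n_coef):
    return math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))


def mlt_forward(block, window):
    """2N windowed samples -> N coefficients."""
    n_coef = len(block) // 2
    return [
        sum(window[n] * block[n] * basis(n, k, n_coef) for n in range(2 * n_coef))
        for k in range(n_coef)
    ]


def mlt_inverse(coefs, window):
    """N coefficients -> 2N windowed samples that still contain aliasing."""
    n_coef = len(coefs)
    return [
        2.0 / n_coef * window[n] * sum(coefs[k] * basis(n, k, n_coef) for k in range(n_coef))
        for n in range(2 * n_coef)
    ]


def round_trip(signal, n_coef):
    """Transform half-overlapping blocks and overlap-add the inverses.

    len(signal) must be a multiple of n_coef; the signal is padded with
    zeros so that every sample is covered by two blocks.
    """
    window = sine_window(n_coef)
    padded = [0.0] * n_coef + list(signal) + [0.0] * n_coef
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coef + 1, n_coef):
        block = padded[start:start + 2 * n_coef]
        rebuilt = mlt_inverse(mlt_forward(block, window), window)
        for n, value in enumerate(rebuilt):
            out[start + n] += value
    return out[n_coef:n_coef + len(signal)]


signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
error = max(abs(a - b) for a, b in zip(signal, round_trip(signal, 8)))
print(error)  # tiny floating-point residue
```

The aliasing introduced by each block cancels against its neighbors in the overlap-add, so the only error left is floating-point rounding.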

26/75

Window Operation

[Figure: the samples x(n) and x(-n-1) feed a block labeled "Complex Rotate"; the rotation-angle expression, written in terms of n and N, is garbled in the transcript.]

27/75

Pre-Rotation

[Figure: the windowed samples xw(0) to xw(7) pass through four Complex Rotate blocks, with angles -π/32, -5π/32, -9π/32 and -13π/32, to give xp(0) to xp(7).]

28/75

FFT (4 Point Complex)

[Figure: butterfly diagram of a 4-point complex FFT. The real values xp(0) to xp(7) form the complex inputs xc(0) to xc(3); butterflies with sign changes and one twiddle factor e^{-jπ/2} produce the complex outputs yc(0) to yc(3), which are unpacked into yp(0) to yp(7).]

29/75

Post-Rotation

[Figure: the values yp(0) to yp(7) pass through four Conjugate Rotate blocks, with angles -0, -π/8, -2π/8 and -3π/8, to give the outputs y(0) to y(7).]

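30/75

Reversible MLT

Make the following operations reversible:
  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

31/75

Reversible Unit Transform

[Equations garbled in the transcript. The surviving fragments show a pair (a, b) mapped to a new pair by a 2×2 matrix with unit-magnitude entries and a scale factor, and a sequence of update steps that combine a, b, a temporary value t and coefficients c0, c1, c2.]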
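One standard way to make a rotation reversible on integers is to factor it into three lifting (shear) steps and round after each one. The sketch below shows that well-known factorization; it may differ from the exact steps and coefficients on the slide, and the function names are invented. Each step has determinant 1 and changes only one of the two values, so it can be undone exactly by subtracting the same rounded quantity.

```python
import math


def lift_rotate(x, y, theta):
    """Integer-to-integer approximation of a rotation by theta.

    Uses [cos -sin; sin cos] = U(p) L(u) U(p) with p = -tan(theta / 2)
    and u = sin(theta), rounding inside every lifting step.
    """
    p = -math.tan(theta / 2)
    u = math.sin(theta)
    x += round(p * y)
    y += round(u * x)
    x += round(p * y)
    return x, y


def lift_unrotate(x, y, theta):
    """Exact inverse: the same steps in reverse order with subtraction."""
    p = -math.tan(theta / 2)
    u = math.sin(theta)
    x -= round(p * y)
    y -= round(u * x)
    x -= round(p * y)
    return x, y


print(lift_rotate(100, 0, math.pi / 2))        # close to a 90-degree rotation
a, b = lift_rotate(45, -74, 5 * math.pi / 32)
print((a, b), lift_unrotate(a, b, 5 * math.pi / 32))
```

The rotated values are only approximately equal to the real-valued rotation, but the round trip returns the original integers exactly, which is what lossless coding needs.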

32/75

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal:
  • Compression
  • Embedded bitstream for future manipulation

33/75

Frame Grouping

[Figure: a time slot made up of frames 1 2 3 4 5 6 7 8.]

34/75

Entropy Coder

[Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R).]

35/75

Entropy Coder

  • Embedded coding
  • Implicit psychoacoustic masking
  • Context modeling
  • Arithmetic coding
  • Implementation concerns

36/75

A block of coefficients

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

(Next viewgraph)

37/75

Bits of Coefficients

      Coefficient   b1 b2 b3 b4 b5 b6 b7   Sign
w0         45        0  1  0  1  1  0  1    +
w1        -74        1  0  0  1  0  1  0    -
w2         21        0  0  1  0  1  0  1    +
w3         14        0  0  0  1  1  1  0    +
w4         -4        0  0  0  0  1  0  0    -
w5        -18        0  0  1  0  0  1  0    -
w6          4        0  0  0  0  1  0  0    +
w7         -1        0  0  0  0  0  0  1    -
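The sign and bit-plane representation of the coefficients can be reproduced with a few lines of code. This is a small sketch with an invented function name; b1 is taken as the most significant of the 7 magnitude bits.

```python
def to_bitplanes(coefs, n_bits=7):
    """Split each coefficient into magnitude bits b1..b7 (MSB first) and a sign."""
    rows = []
    for c in coefs:
        magnitude = abs(c)
        bits = [(magnitude >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]
        rows.append((bits, "-" if c < 0 else "+"))
    return rows


for name, (bits, sign) in zip(range(8), to_bitplanes([45, -74, 21, 14, -4, -18, 4, -1])):
    print(f"w{name}", *bits, sign)
```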

38/75

Conventional Coding

[Figure: the bit table of the coefficients w0 to w7 (sign and bits b1 to b7), with the rows of w0, w1 and w2 marked "First", "Second" and "Third"; only these three rows are coded.]

Decoded values shown for w0 to w7: 46, -74, 22, 0, 0, 0, 0, 0

39/75

Embedded Coding

Bit planes are coded in order: first b1, second b2, third b3. A sign is sent when a coefficient's first 1 bit is coded.

       b1     b2     b3
w0     0      1 +    0
w1     1 -    0      0
w2     0      0      1 +
w3     0      0      0
w4     0      0      0
w5     0      0      1 -
w6     0      0      0
w7     0      0      0

Decoded value and range after the three bit planes:

       Value   Range
w0      40     32 to 47
w1     -72     -79 to -64
w2      24     16 to 31
w3       0     -31 to 31
w4       0     -31 to 31
w5     -24     -31 to -16
w6       0     -31 to 31
w7       0     -31 to 31

40/75

Audio Masking

[Figure: level versus frequency, showing the signal in a critical band, the neighboring band, the masking threshold, the maximum mask and the noise level, together with the signal-to-mask ratio and the noise-to-mask ratio.]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches):
  • Calculate the mask
  • Transmit the mask
  • Modify the transform coefficients (or the coding approach) according to the mask
  • Encode the transform coefficients

Note: the mask modifies the coding content.

42/75

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same.

Implicit masking:
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: first coding step over the bit table (sign, b1 to b7), with the mask drawn in and each coefficient marked as significant or insignificant.]

       Value   Range
w0       0     -63 to 63
w1     -96     -127 to -64
w2       0     -63 to 63
w3       0     -63 to 63
w4       0     -63 to 63
w5       0     -63 to 63
w6       0     -127 to 127
w7       0     -127 to 127

44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: first and second coding steps over the bit table (sign, b1 to b7), with each coefficient marked as significant or insignificant.]

       Value   Range
w0      48     32 to 63
w1     -96     -127 to -64
w2       0     -31 to 31
w3       0     -31 to 31
w4       0     -31 to 31
w5       0     -31 to 31
w6       0     -63 to 63
w7       0     -63 to 63

45/75

Context Modeling

Context:
  • Zero coding: significance statuses of the neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of the neighbor coefficients
  • Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ……
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……

The bits are to be encoded; the contexts are automatically generated.

47/75

Arithmetic Coding – Illustration (QM Coder Used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided once per symbol, using the probability pairs P0 and 1-P0, P1 and 1-P1, P2 and 1-P2, for the symbol sequence S0=0, S1=1, S2=0. The final interval is A.]

Coding result: 0100

(The shortest binary bitstream is chosen such that the interval from B = 0100 0000000… to C = 0100 1111111… satisfies (B, C) ⊂ A.)
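The interval narrowing and the choice of the shortest bitstream can be traced with exact fractions. This is a toy illustration of the idea only, not the QM coder named in the slide title. The convention that symbol 0 takes the lower part of the interval and the three probabilities are hypothetical, picked so that the result matches the 0100 shown above.

```python
import math
from fractions import Fraction


def final_interval(symbols, p_zero):
    """Narrow [0, 1) once per binary symbol.

    p_zero[i] is the probability of symbol 0 at step i; symbol 0 keeps the
    lower part of the current interval and symbol 1 the upper part.
    """
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0
        else:
            low += width * p0
            width *= 1 - p0
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string whose every continuation stays inside [low, high)."""
    n = 1
    while True:
        scale = 2 ** n
        k = math.ceil(low * scale)           # first n-bit value at or above low
        if Fraction(k + 1, scale) <= high:   # values k/2^n .. (k+1)/2^n fit
            return format(k, f"0{n}b")
        n += 1


probs = [Fraction(3, 5), Fraction(2, 5), Fraction(1, 4)]  # hypothetical P0, P1, P2
low, high = final_interval([0, 1, 0], probs)
print(low, high, shortest_code(low, high))
```

With these values the final interval A is [6/25, 33/100), and 0100 is the shortest string whose range of continuations, 0.25 up to 0.3125, lies inside it.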

48/75

Entropy Coder (Summary)

[Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

  • Context modeling
      • Use stored context
      • Update the context when a coefficient becomes significant
  • Implicit masking
      • Fast calculation of energy in a critical band
      • Lookup table: convert energy to mask
  • R-D curve calculation
      • Lookup table: calculation of distortion
  • Context entropy coder
      • QM coder
      • Run-length Rice coder

50/75

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

51/75

EAC Bitstream Syntax

Timeslot header:
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker, Global Header, then a sequence of timeslots, each consisting of a Head and a Body.]

52/75

Companion File

[Figure: companion file layout: Global Header, then one entry per timeslot, each consisting of a Head and an R-D curve.]

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

1875

Band Separation

Audio (441kHz sampling)

MLT with window switching

Band separation0

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

1975

Synthesis (Half Sampling)

Audio (2205kHz sampling)

MLT with window switching

Band separation

0

2075

Synthesis (Quarter Sampling)

Audio (11025kHz sampling)

MLT with window switching

Band separation

0

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

Next View graph

3775

Bits of Coefficients

Coefficient  Value  b1 b2 b3 b4 b5 b6 b7  Sign
w0              45   0  1  0  1  1  0  1   +
w1             -74   1  0  0  1  0  1  0   -
w2              21   0  0  1  0  1  0  1   +
w3              14   0  0  0  1  1  1  0   +
w4              -4   0  0  0  0  1  0  0   -
w5             -18   0  0  1  0  0  1  0   -
w6               4   0  0  0  0  1  0  0   +
w7              -1   0  0  0  0  0  0  1   -
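The sign-and-magnitude bit decomposition on this slide can be reproduced with a few lines of Python. This is a minimal sketch; the function name `to_bitplanes` and the default of 7 magnitude bits are assumptions chosen to match the slide's columns b1 to b7.

```python
def to_bitplanes(value, n_bits=7):
    """Split an integer into (sign, [b1, ..., bn]); b1 is the most significant bit."""
    sign = "-" if value < 0 else "+"
    magnitude = abs(value)
    bits = [(magnitude >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]
    return sign, bits

if __name__ == "__main__":
    coefficients = [45, -74, 21, 14, -4, -18, 4, -1]   # w0..w7 from the slide
    for i, w in enumerate(coefficients):
        sign, bits = to_bitplanes(w)
        print(f"w{i} {w:4d}  {' '.join(map(str, bits))}  {sign}")
```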

3875

Conventional Coding

Coefficients are sent one after another (first, second, third), each with all of its bits:

Order   Coefficient  b1 b2 b3 b4 b5 b6 b7  Sign
First   w0            0  1  0  1  1  0  1   +
Second  w1            1  0  0  1  0  1  0   -
Third   w2            0  0  1  0  1  0  1   +

Decoded values on the slide: w0 = 46, w1 = -74, w2 = 22, w3 to w7 = 0

3975

Embedded Coding

Bits are sent plane by plane, most significant plane first; a sign follows the first 1 bit of each coefficient:

Pass    Bit-plane  w0  w1  w2  w3  w4  w5  w6  w7
First   b1         0   1-  0   0   0   0   0   0
Second  b2         1+  0   0   0   0   0   0   0
Third   b3         0   0   1+  0   0   1-  0   0

Decoded state on the slide:

Coefficient  Range        Value
w0           32 to 47       40
w1           -79 to -64    -72
w2           16 to 31       24
w3           -31 to 31       0
w4           -31 to 31       0
w5           -31 to 31     -24
w6           -31 to 31       0
w7           -31 to 31       0
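To make the "Value" and "Range" figures concrete, the sketch below computes what a decoder knows about one coefficient after receiving only its leading bit-planes. The function name `decode_prefix` and the midpoint reconstruction rule are assumptions; they reproduce the slide's entries for significant coefficients such as w0 (32 to 47, value 40) but are not claimed to be EAC's exact rule.

```python
def decode_prefix(sign, bits_known, n_bits=7):
    """Return ((low, high), value) implied by the first len(bits_known) bit-planes."""
    prefix = 0
    for b in bits_known:
        prefix = (prefix << 1) | b
    remaining = n_bits - len(bits_known)     # bit-planes not yet received
    low = prefix << remaining                # smallest magnitude consistent with the prefix
    high = low + (1 << remaining) - 1        # largest magnitude consistent with the prefix
    if prefix == 0:
        # Still insignificant: the sign is unknown, so the range is symmetric.
        return (-high, high), 0
    mid = low + (1 << remaining) // 2        # midpoint reconstruction (assumption)
    if sign == "-":
        return (-high, -low), -mid
    return (low, high), mid

if __name__ == "__main__":
    print(decode_prefix("+", [0, 1, 0]))   # w0 = 45 after three bit-planes
    print(decode_prefix("-", [1, 0, 0]))   # w1 = -74 after three bit-planes
```

Truncating the embedded bitstream at any point therefore still yields a usable, coarser value for every coefficient, which is what allows the bitstream to be cut to a lower bitrate later.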

4075

Audio Masking

[Figure: masking around a signal, plotted against frequency. Labels: critical band, neighboring band, signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, noise-to-mask ratio.]

4175

Psychoacoustic Masking

Traditional approach (explicit masking; all existing approaches):
  • Calculate the mask
  • Transmit the mask
  • Modify transform coefficients (or coding approach) according to the masking
  • Encode the transform coefficients

Note: the mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking:
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

State after the first pass:

Coefficient  Range          Value
w0           -63 to 63        0
w1           -127 to -64    -96
w2           -63 to 63        0
w3           -63 to 63        0
w4           -63 to 63        0
w5           -63 to 63        0
w6           -127 to 127      0
w7           -127 to 127      0

[Figure legend: significant coefficient, insignificant coefficient, mask.]

4475

Embedded Coding with Implicit Psychoacoustic Masking

State after the first and second passes:

Coefficient  Range          Value
w0           32 to 63        48
w1           -127 to -64    -96
w2           -31 to 31        0
w3           -31 to 31        0
w4           -31 to 31        0
w5           -31 to 31        0
w6           -63 to 63        0
w7           -63 to 63        0

[Figure legend: significant coefficient, insignificant coefficient.]

4575

Context Modeling

Context:
  • Zero coding: significance statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  • Sign: neighbor signs

4675

After Implicit Psychoacoustic Masking & Context Modeling

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

Bit (to be encoded): 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ……

Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……

4775

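Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided successively according to the probabilities P0, P1, P2 (and 1-P0, 1-P1, 1-P2) as the symbols S0 = 0, S1 = 1, S2 = 0 are coded, leaving a final interval A.]

Coding result: 0100

(The shortest binary bitstream ensures that the interval from B = 0100 0000000… to C = 0100 1111111… satisfies (B, C) ⊂ A)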
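The interval-narrowing idea can be sketched with exact fractions. The slide does not give numeric probabilities, so the values below are assumptions picked so that the result matches the slide's codeword 0100; the convention that symbol 0 takes the lower sub-interval is also an assumption. The QM coder named in the title is a multiplication-free approximation of this idea and is not modeled here.

```python
import math
from fractions import Fraction

def final_interval(symbols, p_zero):
    """Narrow [0, 1) once per binary symbol; p_zero[i] is the probability of a 0 at step i."""
    low, high = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        split = low + (high - low) * p0
        if s == 0:
            high = split      # symbol 0 keeps the lower part
        else:
            low = split       # symbol 1 keeps the upper part
    return low, high

def shortest_codeword(low, high):
    """Shortest bit string b such that every number 0.b000... to 0.b111... lies in [low, high)."""
    n = 1
    while True:
        step = Fraction(1, 2 ** n)
        k = math.ceil(low / step)          # first n-bit grid point at or above low
        if (k + 1) * step <= high:         # the whole dyadic cell fits inside the interval
            return format(k, f"0{n}b")
        n += 1

if __name__ == "__main__":
    probs = [Fraction(1, 2), Fraction(1, 2), Fraction(3, 8)]   # illustrative values only
    low, high = final_interval([0, 1, 0], probs)
    print(low, high, shortest_codeword(low, high))
```

With these illustrative probabilities the final interval is [1/4, 11/32) and the shortest codeword whose whole dyadic cell fits inside it is 0100.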

4875

Entropy Coder (Summary)

[Figure: the entropy coder produces a bitstream and its R-D curve (distortion D versus rate R).]

4975

Speed Up Issues

Context modeling:
  • Use stored context
  • Update context when a coefficient becomes significant

Implicit masking:
  • Fast calculation of energy in a critical band
  • Lookup table to convert energy to mask

R-D curve calculation:
  • Lookup-table calculation of distortion

Context entropy coder:
  • QM coder
  • Run-length Rice coder

5075

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header:
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker, Global Header, then a sequence of Timeslots, each with a Head and a Body.]

5275

Companion File

[Figure: companion file layout: Global Header, then a sequence of Timeslots, each with a Head and an R-D curve.]

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1, D2 versus R2, D3 versus R3, D4 versus R4), with the allocated rates r1, r2, r3, r4 marked.]

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve. Vertical axis: buffer occupancy (bytes). Horizontal axis: time (timeslots). Two illegal regions bound the curve.]

5575

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$

Where:
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
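A small worked example of the allocation formula, as a sketch. The function name `allocated_bytes`, the unit choice (rate in bytes per second, time in seconds) and the sample numbers are assumptions for illustration only.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time.

    buf_prev, buf_cur: buffer occupancy (bytes) at timeslots i-1 and i
    rate_trans: transmission rate, assumed here to be in bytes per second
    time: duration of the timeslot in seconds
    """
    return buf_prev - buf_cur + rate_trans * time

if __name__ == "__main__":
    # Assumed numbers: 16 kbps = 2000 bytes/s, a 0.5 s timeslot,
    # and a buffer that is allowed to grow from 3000 to 3200 bytes.
    print(allocated_bytes(3000, 3200, 2000, 0.5))
```

Rearranged, the same relation reads Buf_i = Buf_{i-1} + Rate_trans · Time - B_i: the buffer gains what the channel delivers during the timeslot and loses the bytes spent on that timeslot.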

5675

Optimization

Given:
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the number of bytes allocated to the current timeslot

5775

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions. Labels: underflow (too many bytes), overflow (too few bytes), waste bytes.]

5875

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

5975

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve. Vertical axis: buffer occupancy (bytes). Horizontal axis: time (timeslots). Two illegal regions bound the curve.]

6075

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

6175

Modular Software Design

[Figure: audio enters two parallel pipelines, one for the L+R (or mono) channel and one for the L-R channel. Each pipeline runs MLT (SW), Quantizer, Entropy coder and Bitstream Assembly, and the output is the bitstream.]

6275

Modular Software Design

Highly modularized pipeline design:
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data-flow driven (with necessary memory regulator <buffer>):
  • No long delay
  • No need for large memory

Memory and computation efficient:
  • Working memory preallocated

6375

Experimental Results

6475

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec        8kbps  16kbps  32kbps  48kbps
EAC          669    568     280     -22
WMA          847    556     325     040
MP4 TwinVQ   748    700     571     448

(Decimal points were lost in this transcript.)

6575

EAC - Lossless

Results based on the average of 16 MPEG4 test clips

Codec            Compression Ratio
WinZip           132
Monkey's Audio   272
EAC              272

(Decimal points were lost in this transcript.)

6675

EAC (Versatile)

Versatile:
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames:
  • Ignoring encoding/decoding delay
  • Network transmission time (with a modem line, delay = 3 frames)

6875

EAC (Low Delay Mode)

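[Figure: low-delay timeline over frames i-1, i, i+1. Encoder side: start encoding frame i (MLT, Quantizer, Entropy), then the bitstream goes over the network. Decoder side: start decoding frame i (Entropy, Quantizer), followed by the point marked "playable here".]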

6975

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax:
  • Parser may reassemble the bitstream at 1000x real time
  • Change: bit rate, # of audio channels, audio sampling rate

7075

EAC - Software

Software:
  • Encoder: 8x realtime (stereo, 44.1 kHz sampling)
  • Decoder: 20x realtime
  • Parser: 1000x realtime

7175

EAC - Encoder

[Figure: Audio goes into the Encoder, which outputs a Stereo 128kbps bitstream and a companion file.]

7275

EAC - Parser

[Figure: on the server, the Parser takes the Stereo 128kbps bitstream and the companion file and produces: Stereo 16kbps; Mono 8kbps; Stereo 16kbps, slow start; Mono 8kbps, 11kHz sampling.]

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed:
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, # of audio channels, audio sampling rate
  • Good prototype available: realtime execution, small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction - Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking & Context Modeling
  • Arithmetic Coding - Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots - Constant Bitrate
  • Multiple Timeslots - Internet Streaming (Slow Start)
  • Multiple Timeslots - Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC - Highly Efficient (NMR)
  • EAC - Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC - Flexible Bitstream Syntax
  • EAC - Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

2075

Synthesis (Quarter Sampling)

[Figure: synthesis at quarter sampling rate. Labels: audio (11.025 kHz sampling), MLT with window switching, band separation.]

2175

Quantizer

Input: coefficient

Output: quantized coefficient

Goal: convert coefficient from float to integer
  • Reduce signal levels
  • Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

[Equation garbled in this transcript: the scalar quantizer maps each coefficient s[n] to a quantized magnitude m[n] and a sign.]
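Since the slide's formula is not recoverable, the sketch below shows one common form of a deadzone scalar quantizer: the magnitude is divided by the step size and truncated, and the sign is kept separately. The names `quantize` and `dequantize`, the step parameter and the midpoint reconstruction are assumptions, not the deck's definition.

```python
import math

def quantize(s, step):
    """Deadzone scalar quantizer: return (magnitude index, sign) for coefficient s."""
    m = int(math.floor(abs(s) / step))   # every |s| < step falls into the zero bin
    sign = -1 if s < 0 else 1
    return m, sign

def dequantize(m, sign, step):
    """Reconstruct at the middle of the bin; the zero bin reconstructs to 0."""
    if m == 0:
        return 0.0
    return sign * (m + 0.5) * step

if __name__ == "__main__":
    for s in (0.9, -0.9, 2.5, -7.3):
        m, sign = quantize(s, 1.0)
        print(s, (m, sign), dequantize(m, sign, 1.0))
```

Because truncation is applied to the magnitude, the zero bin covers (-step, step), twice the width of the other bins. That wider bin is the deadzone.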

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

2175

Quantizer

Input coefficient

Output quantized coefficient

Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

2275

Quantizer

Scalar quantizer with a deadzone

s

snmsnm

1

0n][m

][][

Quantized Magnitude Sign

0

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

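72/75

EAC - Parser

[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces application bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]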

73/75

EAC - Decoder

[Figure: the decoder plays each application bitstream: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]

74/75

Comparison

Original, MP3, MP4 TwinVQ, WMA, EAC

75/75

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: realtime execution, small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction – Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking & Context Modeling
  • Arithmetic Coding – Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots – Constant Bitrate
  • Multiple Timeslots – Internet Streaming (Slow Start)
  • Multiple Timeslots – Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC – Highly Efficient (NMR)
  • EAC – Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC – Flexible Bitstream Syntax
  • EAC – Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

22/75

Quantizer

Scalar quantizer with a deadzone

[Equation: each coefficient s, indexed by m and n, is mapped to a quantized magnitude and a sign.]
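A deadzone scalar quantizer of this kind can be sketched as follows. This is a minimal Python illustration, not the formula from the slide: the step size `step`, the midpoint reconstruction, and the function names are assumptions made for the example.

```python
import math


def deadzone_quantize(s, step):
    """Return (magnitude, sign); sign is 0 for non-negative input and 1 for negative."""
    magnitude = int(math.floor(abs(s) / step))  # inputs with |s| < step land in the dead zone (0)
    sign = 1 if s < 0 else 0
    return magnitude, sign


def dequantize(magnitude, sign, step):
    """Reconstruct at the middle of the quantization bin; the dead zone maps to 0."""
    if magnitude == 0:
        return 0.0
    value = (magnitude + 0.5) * step
    return -value if sign else value


for s in (0.9, -3.7, 12.2):
    q = deadzone_quantize(s, 1.0)
    print(s, q, dequantize(*q, 1.0))
```

The zero bin spans the open interval from -step to step, twice the width of every other bin, which is what makes it a dead zone.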

23/75

Lossless (Integer) Pass

24/75

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform
  • Integer input, integer output
  • The transform should have a determinant of 1 (do not expand data volume)

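25/75

MLT Framework

[Figure: the forward MLT is a lapped transform built from a window stage followed by a DCT-IV, where the DCT-IV consists of pre-rotation, complex FFT, and post-rotation. The inverse MLT uses the inverse of each block: post-rotation⁻¹, complex FFT⁻¹, pre-rotation⁻¹, and inverse window.]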

26/75

Window Operation

[Figure: the window operation is a complex rotation of the sample pair x(n), x(-n-1), with a rotation angle that depends on n and N.]

27/75

Pre-Rotation

[Figure: four complex rotations by -π/32, -5π/32, -9π/32, and -13π/32 map xw(0) to xw(7) onto xp(0) to xp(7).]

28/75

FFT (4 Point Complex)

[Figure: the eight values xp(0) to xp(7) are paired into complex inputs xc(0) to xc(3); a butterfly network with one twiddle factor e^{-jπ/2} produces yc(0) to yc(3), that is, yp(0) to yp(7).]

29/75

Post-Rotation

[Figure: four conjugate rotations by 0, -π/8, -2π/8, and -3π/8 map yp(0) to yp(7) onto y(0) to y(7).]

30/75

Reversible MLT

Make the following operations reversible
  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

31/75

Reversible Unit Transform

[Equation: a 2×2 unit transform on the pair (a, b), factored into steps that use an intermediate value t and constants c0 and c1.]
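One well-known way to make a rotation reversible on integers is to factor it into three lifting steps and round inside each step. The sketch below shows that general technique in Python; it is not a transcription of the slide's equations. The constants `c0` and `c1` follow the standard factorization of a rotation by `theta`, and the function names are hypothetical.

```python
import math


def lift_rotate(x, y, theta):
    """Integer-to-integer approximation of a rotation by theta (theta not a multiple of pi)."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    x += round(c0 * y)  # step 1: update x from y
    y += round(c1 * x)  # step 2: update y from the new x
    x += round(c0 * y)  # step 3: update x from the new y
    return x, y


def lift_unrotate(x, y, theta):
    """Exact inverse: undo the three steps in reverse order with the same rounding."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    x -= round(c0 * y)
    y -= round(c1 * x)
    x -= round(c0 * y)
    return x, y


theta = math.pi / 8
for pair in [(45, -74), (21, 14), (-4, -18)]:
    rotated = lift_rotate(*pair, theta)
    print(pair, rotated, lift_unrotate(*rotated, theta))
```

Each step changes only one variable by a rounded function of the other, so it can be undone exactly. Each step also has determinant 1, so the data volume does not grow.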

32/75

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal
  • Compression
  • Embedded bitstream for future manipulation

33/75

Frame Grouping

[Figure: frames 1 to 8 along the time axis, grouped into a time slot.]

34/75

Entropy Coder

[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

35/75

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

36/75

A block of coefficients

45 0 0 0 -74 -13 0 0
21 0 4 0 14 0 23 23
0 0 0 0 3 0 4 0
0 3 5 0 0 0 0 0
0 1 -1 0 -4 33 0 -1
0 0 1 0 0 0 0 0
-4 5 0 0 -18 0 0 19
4 0 23 0 -1 0 0 0

37/75

Bits of Coefficients

Coefficient  Value  b1 b2 b3 b4 b5 b6 b7  Sign
w0            45    0  1  0  1  1  0  1   +
w1           -74    1  0  0  1  0  1  0   -
w2            21    0  0  1  0  1  0  1   +
w3            14    0  0  0  1  1  1  0   +
w4            -4    0  0  0  0  1  0  0   -
w5           -18    0  0  1  0  0  1  0   -
w6             4    0  0  0  0  1  0  0   +
w7            -1    0  0  0  0  0  0  1   -
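The sign and magnitude bits b1 to b7 of the example coefficients can be reproduced with a few lines of Python. This is a small sketch for illustration; the function name and the fixed width of 7 bits are assumptions taken from the example.

```python
def to_bitplanes(value, num_bits=7):
    """Split an integer into its magnitude bits (b1 = most significant) and its sign."""
    magnitude = abs(value)
    bits = [(magnitude >> (num_bits - 1 - k)) & 1 for k in range(num_bits)]
    sign = '-' if value < 0 else '+'
    return bits, sign


coefficients = [45, -74, 21, 14, -4, -18, 4, -1]
for i, w in enumerate(coefficients):
    bits, sign = to_bitplanes(w)
    print(f"w{i} {w:4d}  {' '.join(map(str, bits))}  {sign}")
```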

38/75

Conventional Coding

[Figure: the coefficients are coded one after another (first w0, second w1, third w2), each with all of its bits b1 to b7 and its sign. After these three, the decoder holds 46, -74, 22 for w0 to w2 and 0 for w3 to w7.]

39/75

Embedded Coding

[Figure: the bit planes are coded one after another across all coefficients w0 to w7.]

Pass    Bit plane  Coded symbols (w0 to w7)
First   b1         0  1-  0  0  0  0  0  0
Second  b2         1+  0  0  0  0  0  0  0
Third   b3         0  0  1+  0  0  1-  0  0

Coefficient  Value  Range
w0            40    32 to 47
w1           -72    -79 to -64
w2            24    16 to 31
w3             0    -31 to 31
w4             0    -31 to 31
w5           -24    -31 to 31
w6             0    -31 to 31
w7             0    -31 to 31
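The decoded value and its uncertainty range after only the leading bit planes have arrived can be computed as below. This is a minimal Python sketch, assuming 7 magnitude bits and reconstruction at the middle of the remaining interval; the function name is hypothetical. It reproduces the values 40, -72, and 24 listed for w0, w1, and w2.

```python
def decode_partial(known_bits, sign, num_bits=7):
    """Decode a coefficient from its leading magnitude bits (MSB first).

    Returns (value, low, high), where [low, high] is the range that the
    true coefficient can still lie in.
    """
    remaining = num_bits - len(known_bits)
    step = 1 << remaining           # number of magnitudes still possible
    low = 0
    for bit in known_bits:
        low = (low << 1) | bit
    low <<= remaining
    if low == 0:
        # No 1 bit seen yet: the coefficient is still insignificant and its sign is unknown.
        return 0, -(step - 1), step - 1
    high = low + step - 1
    value = low + step // 2         # middle of the remaining interval
    if sign == '-':
        return -value, -high, -low
    return value, low, high


print(decode_partial([0, 1, 0], '+'))  # w0 after three bit planes
print(decode_partial([1, 0, 0], '-'))  # w1 after three bit planes
print(decode_partial([0, 0, 1], '+'))  # w2 after three bit planes
```

Truncating the bitstream anywhere still yields a usable approximation of every coefficient, which is the point of embedded coding.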

40/75

Audio Masking

[Figure: masking within a critical band and a neighboring band along the frequency axis, marking the signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, and noise-to-mask ratio.]

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches)
  • Calculate the mask
  • Transmit the mask
  • Modify transform coefficients (or coding approach) according to the masking
  • Encode the transform coefficients

Note: the mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same

Implicit masking
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: first pass of embedded coding with the mask applied; legend: coefficient, significant, insignificant, mask.]

Coefficient  Value  Range
w0             0    -63 to 63
w1           -96    -127 to -64
w2             0    -63 to 63
w3             0    -63 to 63
w4             0    -63 to 63
w5             0    -63 to 63
w6             0    -127 to 127
w7             0    -127 to 127

44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: first and second pass of embedded coding with the mask applied; legend: coefficient, significant, insignificant. Bits b1 b2 shown: w0 = 0 1 (+), w1 = 1 0 (-), w2 to w7 = 0 0.]

Coefficient  Value  Range
w0            48    32 to 63
w1           -96    -127 to -64
w2             0    -31 to 31
w3             0    -31 to 31
w4             0    -31 to 31
w5             0    -31 to 31
w6             0    -63 to 63
w7             0    -63 to 63

45/75

Context Modeling

Context
  • Zero coding: significance statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  • Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

45 0 0 0 -74 -13 0 0
21 0 4 0 14 0 23 23
0 0 0 0 3 0 4 0
0 3 5 0 0 0 0 0
0 1 -1 0 -4 33 0 -1
0 0 1 0 0 0 0 0
-4 5 0 0 -18 0 0 19
4 0 23 0 -1 0 0 0

Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

[Figure labels: "Automatically generated", "To be encoded".]

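47/75

Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided in turn by P0 and 1-P0, P1 and 1-P1, and P2 and 1-P2 for the symbols S0=0, S1=1, S2=0, giving the final interval A. Coding result: 0100.]

(The shortest binary bitstream ensures that the interval from B=0100 0000000 to C=0100 1111111 satisfies (B,C) ⊂ A.)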
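The interval subdivision and the "shortest bitstream whose interval lies inside A" rule can be illustrated with exact fractions. This Python sketch is a plain textbook arithmetic coder, not the QM coder that EAC uses (the QM coder is an adaptive, multiplication-free approximation). The convention that symbol 0 takes the lower part of the interval, and all function names, are assumptions for the example.

```python
from fractions import Fraction
import math


def encode_interval(symbols, p_zero):
    """Narrow [0, 1) once per binary symbol; p_zero[i] is the probability that symbol i is 0."""
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        if s == 0:
            width *= p                # symbol 0 keeps the lower part
        else:
            low += width * p          # symbol 1 keeps the upper part
            width *= 1 - p
    return low, low + width


def shortest_codeword(low, high):
    """Shortest bit string b such that every continuation of b lies in [low, high)."""
    k = 1
    while True:
        n = math.ceil(low * (1 << k))             # first k-bit prefix starting at or after low
        if Fraction(n + 1, 1 << k) <= high:
            return format(n, f'0{k}b')
        k += 1


half = Fraction(1, 2)
print(shortest_codeword(*encode_interval([0, 1, 0], [half] * 3)))
skewed = Fraction(3, 4)
print(shortest_codeword(*encode_interval([0, 0, 0], [skewed] * 3)))
```

With equal probabilities the codeword is as long as the input. With skewed probabilities the likely sequence gets a shorter codeword.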

48/75

Entropy Coder (Summary)

[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

Context modeling
  • Use stored context
  • Update context when a coefficient becomes significant

Implicit masking
  • Fast calculation of energy in a critical band
  • Lookup table: convert energy to mask

R-D curve calculation
  • Lookup table: calculation of distortion

Context entropy coder
  • QM coder
  • Run-length Rice coder
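For reference, a Rice code writes a non-negative integer (for example a run length) as a unary quotient followed by a fixed number of remainder bits. The Python sketch below shows the generic Rice code only; how EAC combines run lengths with Rice coding is not described on the slide. The parameter `k` and the function names are assumptions.

```python
def rice_encode(n, k):
    """Rice code of a non-negative integer: unary quotient, then k remainder bits."""
    q = n >> k
    code = '1' * q + '0'                               # q ones terminated by a zero
    if k:
        code += format(n & ((1 << k) - 1), f'0{k}b')   # low k bits of n
    return code


def rice_decode(code, k):
    """Inverse of rice_encode for a single codeword."""
    q = code.index('0')
    r = int(code[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r


for run_length in (0, 3, 9):
    word = rice_encode(run_length, 2)
    print(run_length, word, rice_decode(word, 2))
```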

50/75

Bitstream Assembly

Input
  • Bitstream
  • R-D curve

Output
  • Assembled bitstream
  • Companion file

Bitstream assembling

51/75

EAC Bitstream Syntax

Timeslot header
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker, global header, then a sequence of timeslots, each with a head and a body.]

52/75

Companion File

[Figure: companion file layout: global header, then a sequence of timeslots, each with a head and an R-D curve.]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 versus R1 through D4 versus R4), shown twice, with the rates r1, r2, r3, and r4 marked.]
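One common way to split a byte budget across several R-D curves is to keep taking the step with the steepest distortion drop per byte. The Python sketch below shows that greedy idea as an illustration only; it is not claimed to be the assembler used in EAC. It assumes each curve is a convex list of (rate in bytes, distortion) points with strictly increasing rate, and the function name and sample numbers are hypothetical.

```python
def allocate(curves, budget):
    """Pick one operating point per curve without exceeding the byte budget.

    curves: one list per channel of (rate_bytes, distortion) points, with rate
            strictly increasing and distortion decreasing (assumed convex).
    Returns the index of the chosen point on each curve.
    """
    choice = [0] * len(curves)
    used = sum(curve[0][0] for curve in curves)
    while True:
        best, best_slope = None, 0.0
        for ch, curve in enumerate(curves):
            i = choice[ch]
            if i + 1 < len(curve):
                d_rate = curve[i + 1][0] - curve[i][0]
                d_dist = curve[i][1] - curve[i + 1][1]
                # Keep the affordable step with the largest distortion drop per byte.
                if used + d_rate <= budget and d_dist / d_rate > best_slope:
                    best, best_slope = ch, d_dist / d_rate
        if best is None:
            return choice
        i = choice[best]
        used += curves[best][i + 1][0] - curves[best][i][0]
        choice[best] += 1


curves = [
    [(0, 100), (10, 40), (20, 30)],
    [(0, 50), (10, 30), (20, 20)],
]
print(allocate(curves, 20))
```

For convex curves this ends with all channels operating at roughly the same R-D slope.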

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve, plotting buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

55/75

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{trans} \cdot \mathrm{Time}$

where
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{trans}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
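As a worked example of the allocation formula, the Python sketch below evaluates it for one timeslot. The numbers are made up for illustration, and the rate is assumed to be given in bytes per second so that the result is in bytes.

```python
def allocated_bytes(buf_prev, buf_curr, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_curr + rate_trans * time


# Hypothetical numbers: a 16 kbps link (2000 bytes/s), half-second timeslots,
# and a buffer level that should rise from 3000 to 3200 bytes.
print(allocated_bytes(buf_prev=3000, buf_curr=3200, rate_trans=2000, time=0.5))
```

Here the link delivers 1000 bytes during the timeslot and 200 of them go into raising the buffer level, so 800 bytes are allocated to the timeslot.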

56/75

Optimization

Given
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions labeled "Underflow (too many bytes)" and "Overflow (too few bytes)", with wasted bytes marked.]

58/75

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve, plotting buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

2375

Lossless (Integer) Pass

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

2475

Key to Achieve Lossless

Break the MLT into small steps

Make every step reversible

Definition of reversible transform Integer input integer output The transform should have a determinant of 1

(donot expand data volume)

2575

MLT Framework

Pre-R

otate

Com

plex FF

T

Post R

otation

DCT IV

Window

Lapped Transform

Pre-R

otate-l

Com

plex FF

T-l

Post R

otation-l

Inv Window

-l

Forward MLT

Inverse MLT

2675

Window Operation

x(n)x(-n-1)

N

n

4

)21(

4

Complex Rotate

2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context modeling:
  • Use stored context
  • Update the context when a coefficient becomes significant

Implicit masking:
  • Fast calculation of the energy in a critical band
  • Lookup table to convert energy to mask

R-D curve calculation:
  • Lookup table for the calculation of distortion

Context entropy coder:
  • QM coder
  • Run-length Rice coder
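The slide names a run-length Rice coder without giving its details. As background, here is a minimal sketch of plain Golomb-Rice coding of a non-negative integer (for example, a run length of zero bits) with a fixed parameter k. How EAC forms the runs or adapts k is not stated in the slides; the functions `rice_encode` and `rice_decode` are hypothetical names.

```python
def rice_encode(value, k):
    """Golomb-Rice code of a non-negative integer: unary quotient, then k-bit remainder."""
    q = value >> k
    r = value & ((1 << k) - 1)
    bits = "1" * q + "0"                       # quotient in unary, terminated by 0
    if k > 0:
        bits += format(r, "0{}b".format(k))    # remainder in exactly k bits
    return bits


def rice_decode(bits, k):
    """Decode one codeword from the front of a bit string; return (value, bits_used)."""
    q = 0
    while bits[q] == "1":
        q += 1
    pos = q + 1                                # skip the terminating 0
    r = int(bits[pos:pos + k], 2) if k > 0 else 0
    return (q << k) | r, pos + k


code = rice_encode(9, 2)
print(code, rice_decode(code, 2))
```

Encoding and decoding need only shifts and masks, which is consistent with the slide listing it under speed-up measures.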

5075

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header:
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker, global header, then a sequence of timeslots, each with a head and a body.]

5275

Companion File

[Figure: companion file layout: global header, then a sequence of timeslots, each with a head and an R-D curve.]

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves with axes D1/R1, D2/R2, D3/R3, D4/R4, shown twice, with the selected rates r1, r2, r3, r4.]
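One common way to realize rate-distortion optimized assembly is to spend the byte budget where the R-D slope is steepest, so that all embedded bitstreams end up truncated at roughly the same slope. The sketch below illustrates that idea under stated assumptions and is not claimed to be the EAC implementation: each curve is a list of (rate, distortion) points with strictly increasing rate and decreasing distortion, assumed convex, and the function name `allocate` is hypothetical.

```python
def allocate(rd_curves, budget):
    """Pick a truncation rate for each embedded bitstream under a total byte budget.

    rd_curves: list of curves; each curve is a list of (rate, distortion) points
    starting at rate 0, with strictly increasing rate (assumed convex).
    Returns the chosen rate for each bitstream.
    """
    idx = [0] * len(rd_curves)     # current truncation point on each curve
    used = 0
    while True:
        best, best_slope = None, 0.0
        for c, curve in enumerate(rd_curves):
            i = idx[c]
            if i + 1 < len(curve):
                dr = curve[i + 1][0] - curve[i][0]     # extra bytes
                dd = curve[i][1] - curve[i + 1][1]     # distortion removed
                if used + dr <= budget and dd / dr > best_slope:
                    best, best_slope = c, dd / dr
        if best is None:
            break                                      # nothing affordable is left
        i = idx[best]
        used += rd_curves[best][i + 1][0] - rd_curves[best][i][0]
        idx[best] += 1
    return [curve[i][0] for curve, i in zip(rd_curves, idx)]


curves = [[(0, 100), (10, 40), (20, 20)],
          [(0, 50), (10, 20), (20, 15)]]
print(allocate(curves, 30))
```

Because the bitstreams are embedded, truncating each one at its chosen rate needs no re-encoding, only a cut.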

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve. Vertical axis: buffer occupancy (bytes); horizontal axis: time (timeslots); two illegal regions bound the curve.]

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{trans} \times \mathrm{Time}$

where
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{trans}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
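The allocation formula can be checked with a short calculation. This sketch assumes the rate is expressed in bytes per second and the duration in seconds, so that the result is in bytes; the function names are hypothetical.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, duration):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_cur + rate_trans * duration


def allocations(buffer_levels, rate_trans, duration):
    """Bytes per timeslot implied by a buffer trajectory Buf_0, Buf_1, ..., Buf_n."""
    return [allocated_bytes(buffer_levels[i - 1], buffer_levels[i], rate_trans, duration)
            for i in range(1, len(buffer_levels))]


# Buffer drops from 4000 to 3000 bytes, then rises to 3500 bytes,
# at 2000 bytes per second with 0.5 second timeslots.
print(allocations([4000, 3000, 3500], 2000, 0.5))
```

A timeslot that lets the buffer drain gets more than the transmitted bytes; a timeslot that refills the buffer gets fewer.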

5675

Optimization

Given:
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot.

5775

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions. Annotations: "Underflow (too many bytes)", "Overflow (too few bytes)", "Waste bytes".]

5875

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

5975

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve. Buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

6075

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

6175

Modular Software Design

[Figure: the audio is split into L+R (or mono) and L-R. Each path runs through MLT(SW), quantizer, entropy coder, and bitstream assembly to produce the bitstream.]

6275

Modular Software Design

Highly modularized pipeline design:
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>):
  • No long delay
  • No need for large memory

Memory and computation efficient:
  • Working memory preallocated

6375

Experimental Results

6475

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.

Codec         8 kbps   16 kbps   32 kbps   48 kbps
EAC           669      568       280       -22
WMA           847      556       325       040
MP4 TwinVQ    748      700       571       448

(Decimal points in the values were lost in this transcript.)

6575

EAC - Lossless

Results based on the average of 16 MPEG-4 test clips.

Codec             Compression ratio
WinZip            132
Monkey's Audio    272
EAC               272

(Decimal points in the values were lost in this transcript.)

6675

EAC (Versatile)

Versatile:
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames (ignoring encoding/decoding delay)

Network transmission time (over a modem line, delay = 3 frames)

6875

EAC (Low Delay Mode)

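[Figure: low-delay timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT, quantizer, entropy coder), the bitstream crosses the network, the decoder starts decoding frame i (entropy decoder, quantizer), and the point where the frame becomes playable is marked.]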

6975

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax:
  • The parser may reassemble the bitstream at 1000x real time
  • Change the bitrate, the number of audio channels, and the audio sampling rate

7075

EAC - Software

Software:
  • Encoder: 8x real time (stereo, 44.1 kHz sampling)
  • Decoder: 20x real time
  • Parser: 1000x real time

7175

EAC - Encoder

[Figure: the encoder takes the audio and produces a stereo 128 kbps bitstream and a companion file.]

7275

EAC - Parser

[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]

7375

EAC - Decoder

[Figure: the decoder plays each parsed bitstream: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed:
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: real-time execution, small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction - Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking & Context Modeling
  • Arithmetic Coding - Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots - Constant Bitrate
  • Multiple Timeslots - Internet Streaming (Slow Start)
  • Multiple Timeslots - Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC - Highly Efficient (NMR)
  • EAC - Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC - Flexible Bitstream Syntax
  • EAC - Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

2575

MLT Framework

[Figure: MLT framework. Forward MLT: window, then DCT-IV computed as pre-rotation, complex FFT, and post-rotation; window and DCT-IV together form the lapped transform. Inverse MLT: the inverses of pre-rotation, complex FFT, post-rotation, and window.]

2675

Window Operation

[Figure: the window operation applied to the sample pair x(n), x(-n-1), implemented as a complex rotation.]

2775

Pre-Rotation

[Figure: pre-rotation for an 8-point example. The windowed samples xw(0)..xw(7) pass through four complex rotations with angles -π/32, -5π/32, -9π/32, -13π/32, producing xp(0)..xp(7).]

2875

FFT (4 Point Complex)

[Figure: 4-point complex FFT. The pre-rotated values xp(0)..xp(7) are treated as four complex inputs xc(0)..xc(3); a butterfly network with sign changes and one twiddle factor e^{-jπ/2} produces yc(0)..yc(3), written out as yp(0)..yp(7).]

2975

Post-Rotation

[Figure: post-rotation for the 8-point example. The FFT outputs yp(0)..yp(7) pass through four conjugate rotations with angles -0, -π/8, -2π/8, -3π/8, producing y(0)..y(7).]
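The pre-rotation, complex FFT, and post-rotation stages together compute a DCT-IV. The sketch below shows one standard way to arrange them and checks it against the direct O(N^2) definition. It is an illustration under assumptions: the way samples are paired into complex values and the sign convention of the output are choices made here, not taken from the slides, and the half-length transform is written as a naive DFT so the block needs only the standard library. For N = 8 the rotation angles match the ones listed on the slides (π/32, 5π/32, 9π/32, 13π/32 before the FFT; 0, π/8, 2π/8, 3π/8 after it).

```python
import cmath
import math


def dct4_direct(x):
    """Reference DCT-IV: X[k] = sum_n x[n] * cos(pi/N * (n + 1/2) * (k + 1/2))."""
    n_len = len(x)
    return [sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * (k + 0.5))
                for n in range(n_len))
            for k in range(n_len)]


def dct4_rotations(x):
    """DCT-IV of even length N via pre-rotation, N/2-point complex DFT, post-rotation."""
    n_len = len(x)
    m_len = n_len // 2
    # Pair samples into complex values and pre-rotate by -(4m + 1) * pi / (4N).
    z = [complex(x[2 * m], x[n_len - 1 - 2 * m])
         * cmath.exp(-1j * math.pi * (4 * m + 1) / (4 * n_len))
         for m in range(m_len)]
    # N/2-point complex DFT (naive here; an FFT computes the same values faster).
    f = [sum(z[m] * cmath.exp(-2j * math.pi * m * k / m_len) for m in range(m_len))
         for k in range(m_len)]
    out = [0.0] * n_len
    for k in range(m_len):
        y = f[k] * cmath.exp(-1j * math.pi * k / n_len)   # post-rotate by -k * pi / N
        out[2 * k] = y.real
        out[n_len - 1 - 2 * k] = -y.imag
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
err = max(abs(a - b) for a, b in zip(dct4_direct(x), dct4_rotations(x)))
print(err < 1e-9)
```

Only rotations and butterflies appear in this structure, which is what makes a stage-by-stage reversible version possible.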

3075

Reversible MLT

Make the following operations reversible:
  • Butterfly operation
  • Complex rotation
  • Conjugate rotation

3175

Reversible Unit Transform

[Equations: a reversible transform of the pair (a, b), written with an intermediate value t and coefficients c.]
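A widely used way to make a rotation exactly reversible on integers is to factor it into three lifting (shear) steps and round inside each step. The sketch below shows that standard construction as background; it is not necessarily the decomposition used on the slide. It assumes the angle is not a multiple of π, since the factorization divides by sin(θ), and the function names are hypothetical.

```python
import math


def _round(v):
    """Round to nearest integer; the same rule must be used in both directions."""
    return math.floor(v + 0.5)


def lifting_rotate(x, y, theta):
    """Integer approximation of (x cos t - y sin t, x sin t + y cos t) in three lifting steps."""
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    x = x + _round(p * y)    # shear 1: changes x only
    y = y + _round(s * x)    # shear 2: changes y only
    x = x + _round(p * y)    # shear 3: changes x only
    return x, y


def lifting_unrotate(x, y, theta):
    """Exact inverse: undo the three steps in reverse order."""
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    x = x - _round(p * y)
    y = y - _round(s * x)
    x = x - _round(p * y)
    return x, y


a, b = lifting_rotate(45, -74, math.pi / 32)
print((a, b), lifting_unrotate(a, b, math.pi / 32))
```

Each step changes one variable by a rounded function of the other, which is untouched in that step, so the inverse can recompute the same rounded amount and subtract it.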

3275

Entropy Coder

Input: quantized coefficients

Output: embedded coded bitstream with R-D performance curve

Goal:
  • Compression
  • Embedded bitstream for future manipulation

3375

Frame Grouping

[Figure: frames 1 to 8 grouped into time slots.]

3475

Entropy Coder

[Figure: the entropy coder outputs the bitstream together with its R-D curve (distortion D versus rate R).]

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0 -74 -13 0 0
21 0 4 0 14 0 23 23
0 0 0 0 3 0 4 0
0 3 5 0 0 0 0 0
0 1 -1 0 -4 33 0 -1
0 0 1 0 0 0 0 0
-4 5 0 0 -18 0 0 19
4 0 23 0 -1 0 0 0

Next View graph

3775

Bits of Coefficients

Coefficient   Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0            45      0  1  0  1  1  0  1    +
w1            -74     1  0  0  1  0  1  0    -
w2            21      0  0  1  0  1  0  1    +
w3            14      0  0  0  1  1  1  0    +
w4            -4      0  0  0  0  1  0  0    -
w5            -18     0  0  1  0  0  1  0    -
w6            4       0  0  0  0  1  0  0    +
w7            -1      0  0  0  0  0  0  1    -

3875

Conventional Coding

[Figure: conventional coding codes the coefficients one after another (first w0, second w1, third w2), each with its sign and bits b1 to b7. Decoded values shown: w0 = 46, w1 = -74, w2 = 22, w3 to w7 = 0.]

3975

Embedded Coding

Bit-planes coded so far (sign and bits b1 to b7 per coefficient), for w0 to w7:
  • First (b1): 0, 1 (-), 0, 0, 0, 0, 0, 0
  • Second (b2): 1 (+), 0, 0, 0, 0, 0, 0, 0
  • Third (b3): 0, 0, 1 (+), 0, 0, 1 (-), 0, 0

Coefficient   Value   Range
w0            40      32 to 47
w1            -72     -79 to -64
w2            24      16 to 31
w3            0       -31 to 31
w4            0       -31 to 31
w5            -24     -31 to 31
w6            0       -31 to 31
w7            0       -31 to 31

4075

Audio Masking

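[Figure: audio masking along the frequency axis, showing the critical band and the neighboring band, the signal, the masking threshold, the maximum mask, the noise level, the signal-to-mask ratio, and the noise-to-mask ratio.]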

4175

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches):
  • Calculate the mask
  • Transmit the mask
  • Modify the transform coefficients (or coding approach) according to the masking
  • Encode the transform coefficients

Note: the mask modifies the coding content.





2775

Pre-Rotation

Complex Rotate ndash32 xw(0)

xw(1)

xw(2)

xw(3)

xw(4)

xw(5)

xw(6)

xw(7)

Complex Rotate ndash532

Complex Rotate ndash932

Complex Rotate ndash1332

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2875

FFT (4 Point Complex)

xp(0)

xp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

xc(0)

xc(1)

xc(2)

xc(3)

-

- e-j2

-

-

yc(0)

yc(1)

yc(2)

yc(3)

yp(0)

yp(1)

xp(2)

xp(3)

xp(4)

xp(5)

xp(6)

xp(7)

2975

Post-Rotation

Conjugate Rotate ndash0y(0)

y(1)

y(2)

y(3)

y(4)

y(5)

y(6)

y(7)

Conjugate Rotate ndash8

Conjugate Rotate ndash28

Conjugate Rotate ndash38

yp(0)

yp(1)

yp(2)

yp(3)

yp(4)

yp(5)

yp(6)

yp(7)

3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

38/75

Conventional Coding

Coefficients are coded one at a time, each with all of its bits (First: w0, Second: w1, Third: w2):

Order    Coefficient   b1 b2 b3 b4 b5 b6 b7   Sign
First    w0             0  1  0  1  1  0  1    +
Second   w1             1  0  0  1  0  1  0    -
Third    w2             0  0  1  0  1  0  1    +

Values shown for w0 … w7: 46, -74, 22, 0, 0, 0, 0, 0

39/75

Embedded Coding

Bits are coded one bit plane at a time across all coefficients (First: b1, Second: b2, Third: b3). A sign is sent with the first 1 bit of a coefficient:

Pass     Plane   w0   w1   w2   w3   w4   w5   w6   w7
First    b1      0    1-   0    0    0    0    0    0
Second   b2      1+   0    0    0    0    0    0    0
Third    b3      0    0    1+   0    0    1-   0    0

Coefficient  Value   Range
w0             40     32 … 47
w1            -72    -79 … -64
w2             24     16 … 31
w3              0    -31 … 31
w4              0    -31 … 31
w5            -24    -31 … 31
w6              0    -31 … 31
w7              0    -31 … 31
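What a decoder can infer from a truncated embedded bitstream can be sketched as follows. This is an illustration only: `decode_prefix` is a hypothetical name, 7 magnitude bits are assumed, and the reconstruction value is taken as the midpoint of the remaining interval. For coefficients that are still insignificant the sketch reports the tightest interval implied by the decoded planes, which may be narrower than the range drawn on the slide.

```python
def decode_prefix(coeffs, planes, num_bits=7):
    """Return (value, low, high) per coefficient after `planes` bit planes are known."""
    shift = num_bits - planes
    step = 1 << shift                       # size of the remaining uncertainty
    out = []
    for w in coeffs:
        known = (abs(w) >> shift) << shift  # magnitude bits decoded so far
        if known == 0:
            # Still insignificant: sign unknown, magnitude below `step`.
            out.append((0, -(step - 1), step - 1))
            continue
        low, high = known, known + step - 1
        mid = known + step // 2
        out.append((-mid, -high, -low) if w < 0 else (mid, low, high))
    return out

coeffs = [45, -74, 21, 14, -4, -18, 4, -1]
for i, (value, low, high) in enumerate(decode_prefix(coeffs, planes=3)):
    print(f"w{i}: value {value}, range {low} … {high}")
```

With three planes decoded, w0, w1 and w2 come out as 40 (32 … 47), -72 (-79 … -64) and 24 (16 … 31), matching the first three rows of the table.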

40/75

Audio Masking

Figure: masking within a critical band (horizontal axis: Frequency). Labels: Critical Band, Neighboring Band, Signal, Masking Threshold, Maximum Mask, Noise Level, Signal-to-mask ratio, Noise-to-mask ratio.

41/75

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches):

  • Calculate the mask
  • Transmit the mask
  • Modify transform coefficients (or coding approach) according to the masking
  • Encode the transform coefficients

Note: Mask modifies the coding content

42/75

Implicit Psychoacoustic Masking

Key: Mask modifies the coding order; the content is the same

Implicit masking:

  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify coding order
  • Encode the next most important part of the coefficients
  • Repeat the process
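The loop above can be illustrated with a toy model. Everything specific here is an assumption: the mask formula (a quarter of the strongest already-decoded neighbor), the scoring rule, and the function names are hypothetical, and signs and the static threshold are left out. The point is only that the coding order is derived from already-coded bits, so the decoder can repeat the same decisions without any mask being transmitted.

```python
def next_index(decoded, next_plane, num_planes):
    """Choose which coefficient gets its next bit, using decoder-visible state only."""
    best, best_score = None, None
    for i, plane in enumerate(next_plane):
        if plane >= num_planes:
            continue  # every bit of this coefficient is already coded
        weight = 1 << (num_planes - 1 - plane)
        # Hypothetical mask: a fraction of the strongest already-decoded neighbor.
        neighbors = decoded[max(0, i - 1):i] + decoded[i + 1:i + 2]
        mask = max(neighbors, default=0) // 4
        score = weight / (1 + mask)  # masked bits matter less, so they are coded later
        if best_score is None or score > best_score:
            best, best_score = i, score
    return best

def encode(mags, num_planes=7):
    """Emit magnitude bits in mask-dependent order (magnitudes only, no signs)."""
    decoded, next_plane, bits = [0] * len(mags), [0] * len(mags), []
    while (i := next_index(decoded, next_plane, num_planes)) is not None:
        shift = num_planes - 1 - next_plane[i]
        bit = (mags[i] >> shift) & 1
        bits.append(bit)
        decoded[i] |= bit << shift
        next_plane[i] += 1
    return bits

def decode(bits, count, num_planes=7):
    """Rebuild magnitudes from any prefix of the bitstream by replaying the same order."""
    decoded, next_plane = [0] * count, [0] * count
    for bit in bits:
        i = next_index(decoded, next_plane, num_planes)
        decoded[i] |= bit << (num_planes - 1 - next_plane[i])
        next_plane[i] += 1
    return decoded

mags = [45, 74, 21, 14, 4, 18, 4, 1]
bits = encode(mags)
print(decode(bits, len(mags)) == mags, decode(bits[:20], len(mags)))
```

Changing the mask rule changes only the order in which bits appear. The set of bits, and therefore the fully decoded content, stays the same.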

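43/75

Embedded Coding with Implicit Psychoacoustic Masking

Figure: state after the first coding pass, with the mask drawn over the bit planes (columns Sign, b1 … b7) and each coefficient marked significant or insignificant. Bit strings shown: "01 -000000" and "001 -000000".

Coefficient  Value   Range
w0              0    -63 … 63
w1            -96    -127 … -64
w2              0    -63 … 63
w3              0    -63 … 63
w4              0    -63 … 63
w5              0    -63 … 63
w6              0    -127 … 127
w7              0    -127 … 127

44/75

Embedded Coding with Implicit Psychoacoustic Masking

Figure: state after the first and second coding passes, with each coefficient marked significant or insignificant. Bit strings shown: "01 -000000" and "1 +0000000".

Coefficient  b1 b2   Sign   Value   Range
w0            0  1    +       48     32 … 63
w1            1  0    -      -96    -127 … -64
w2            0  0             0    -31 … 31
w3            0  0             0    -31 … 31
w4            0  0             0    -31 … 31
w5            0  0             0    -31 … 31
w6            0  0             0    -63 … 63
w7            0  0             0    -63 … 63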

45/75

Context Modeling

Context:

  • Zero coding: significant statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significant statuses of neighbor coefficients
  • Sign: neighbor signs

46/75

After Implicit Psychoacoustic Masking & Context Modeling

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

Bit (to be encoded):           0 1 1 0 0 0 0 0 0 1 0  0 0 0 0 0 0 0 0 ……
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……

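47/75

Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

Figure: the interval from 0 to 1 is subdivided once per symbol, in proportions P0 : 1-P0, then P1 : 1-P1, then P2 : 1-P2, for the symbols S0=0, S1=1, S2=0. The final interval is A.

Coding result: 0100

(The shortest binary bitstream ensures that the interval from B = 0100 0000000… to C = 0100 1111111… satisfies (B, C) ⊂ A.)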
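The interval picture can be made concrete with exact fractions. This is a sketch of idealized arithmetic coding, not of the QM coder itself, which uses approximate arithmetic and adaptive probabilities. The probabilities below are hypothetical values chosen so that the symbols 0, 1, 0 give the codeword 0100; the slide does not state them.

```python
import math
from fractions import Fraction

def interval(symbols, p_zero):
    """Narrow [low, high) once per binary symbol; p_zero[k] is P(symbol k == 0)."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0              # keep the lower part of the interval
        else:
            low += width * p0        # skip the part assigned to symbol 0
            width *= 1 - p0
    return low, low + width

def shortest_codeword(low, high):
    """Shortest bit string whose every continuation lies inside [low, high)."""
    n = 1
    while True:
        scale = 2 ** n
        k = math.ceil(low * scale)   # smallest n-bit prefix not below `low`
        if Fraction(k + 1, scale) <= high:
            return format(k, f"0{n}b")
        n += 1

# Hypothetical probabilities of a 0 for S0, S1, S2.
p_zero = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)]
low, high = interval([0, 1, 0], p_zero)
print(low, high, shortest_codeword(low, high))
```

Any bits appended to the codeword keep the value inside the final interval A, which is what lets the decoder recover the symbols from the codeword alone.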

48/75

Entropy Coder (Summary)

Figure: the entropy coder produces a bitstream together with its R-D curve (axes: R, D).

49/75

Speed Up Issues

Context modeling:

  • Use stored context
  • Update context when a coefficient becomes significant

Implicit masking:

  • Fast calculation of energy in a critical band
  • Lookup table: convert energy to mask

R-D curve calculation:

  • Lookup table: calculation of distortion

Context entropy coder:

  • QM coder
  • Run-length Rice coder
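The slide names a run-length Rice coder without giving details. As a hedged illustration of the Rice part only, the sketch below codes a non-negative integer (for example, the length of a run of zero bits) as a unary quotient followed by a k-bit remainder. The function names and the bit-string representation are assumptions for readability, not the EAC format.

```python
def rice_encode(value, k):
    """Golomb-Rice code: value >> k in unary ('1' * q + '0'), then k remainder bits."""
    q, r = value >> k, value & ((1 << k) - 1)
    bits = "1" * q + "0"
    if k:
        bits += format(r, f"0{k}b")
    return bits

def rice_decode(bits, k):
    """Decode one value from the front of `bits`; return (value, bits_consumed)."""
    q = 0
    while bits[q] == "1":
        q += 1
    pos = q + 1                                   # skip the terminating '0'
    r = int(bits[pos:pos + k], 2) if k else 0
    return (q << k) | r, pos + k

code = rice_encode(9, 2)
print(code, rice_decode(code, 2))
```

A small k suits short runs and a large k suits long runs, so practical coders adapt k to the data. That adaptation is not shown here.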

50/75

Bitstream Assembly

Input:

  • Bitstream
  • R-D curve

Output:

  • Assembled bitstream
  • Companion file

Bitstream assembling

51/75

EAC Bitstream Syntax

Timeslot header:

  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

Figure: bitstream layout, in order: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)

52/75

Companion File

Figure: companion file layout, in order: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

Figure: four R-D curves (D1 vs R1, D2 vs R2, D3 vs R3, D4 vs R4), shown twice; the truncation rates r1, r2, r3, r4 are marked.
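The slide shows only the curves and the truncation rates, so the rule used to pick r1 … r4 is not stated. A standard way to split a byte budget across several embedded bitstreams is to keep taking the segment with the steepest distortion drop per byte. The sketch below assumes that rule, and assumes each curve is a list of (rate, distortion) points with increasing rate and convexly decreasing distortion. The names `allocate` and `slope` are hypothetical.

```python
import heapq

def slope(points, i):
    """Distortion reduction per byte between point i and point i + 1."""
    (r0, d0), (r1, d1) = points[i], points[i + 1]
    return (d0 - d1) / (r1 - r0)

def allocate(curves, budget):
    """Greedy steepest-slope truncation; returns the chosen rate for each curve."""
    chosen = [0] * len(curves)                 # index of the selected point per curve
    used = sum(points[0][0] for points in curves)
    heap = [(-slope(points, 0), c) for c, points in enumerate(curves) if len(points) > 1]
    heapq.heapify(heap)
    while heap:
        _, c = heapq.heappop(heap)             # steepest remaining segment
        points, i = curves[c], chosen[c]
        extra = points[i + 1][0] - points[i][0]
        if used + extra > budget:
            continue                           # does not fit: this curve is truncated here
        used += extra
        chosen[c] = i + 1
        if i + 2 < len(points):
            heapq.heappush(heap, (-slope(points, i + 1), c))
    return [curves[c][chosen[c]][0] for c in range(len(curves))]

# Two hypothetical channels and a 30-byte budget.
curves = [
    [(0, 100), (10, 40), (20, 20)],
    [(0, 80), (10, 50), (20, 40)],
]
print(allocate(curves, 30))
```

Because the bitstreams are embedded, truncating one at the chosen rate needs no re-encoding, which is consistent with the fast parsing described elsewhere in the deck.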

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Figure: buffer-occupancy curve. Axes: Buffer Occupancy (Bytes) vs Time (timeslots). Two illegal regions bound the curve.

55/75

Allocated Bytes per Timeslot

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \times \mathrm{Time}$

where

  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
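A short worked example of the formula follows. The numbers are hypothetical (a rate of 2000 bytes per second and 0.5-second timeslots), and the rate is assumed to be expressed in bytes per second so that the result is in bytes.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_cur + rate_trans * time

# Hypothetical buffer-occupancy levels at the end of timeslots 0..3, in bytes.
levels = [4000, 3500, 3500, 4200]
rate_trans, slot = 2000, 0.5

for i in range(1, len(levels)):
    b = allocated_bytes(levels[i - 1], levels[i], rate_trans, slot)
    print(f"timeslot {i}: {b:.0f} bytes")
```

Letting the buffer level fall gives a timeslot more bytes than the transmission delivers during it, and letting the level rise gives it fewer.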

56/75

Optimization

Given:

  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated # of bytes for the current timeslot

57/75

Search (R-D slope)

Figure: buffer occupancy search. Axes: Buffer Occupancy (Bytes) vs Time (timeslots). Two illegal regions. Labels: Underflow (too many bytes), Overflow (too few bytes), Waste bytes.

58/75

Multiple Timeslots – Constant Bitrate

Figure: axes Buffer Occupancy (Bytes) vs Time (timeslots), with two illegal regions.

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

Figure: buffer-occupancy curve. Axes: Buffer Occupancy (Bytes) vs Time (timeslots), with two illegal regions.

60/75

Multiple Timeslots – Internet Streaming (Normal)

Figure: axes Buffer Occupancy (Bytes) vs Time (timeslots), with two illegal regions.

61/75

Modular Software Design

Figure: the audio input is split into L+R (or mono) and L-R; each passes through its own pipeline of MLT(SW), Quantizer, Entropy coder and Bitstream Assembly to produce the bitstream.

62/75

Modular Software Design

Highly modularized pipeline design:

  • Quantizer, entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>):

  • No long delay
  • No need for large memory

Memory and computation efficient:

  • Working memory preallocated

63/75

Experimental Results

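64/75

EAC – Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

(Digits are shown as they appear in the transcript; decimal points were lost in extraction.)

Codec         8kbps   16kbps   32kbps   48kbps
EAC            669     568      280      -22
WMA            847     556      325      040
MP4 TwinVQ     748     700      571      448

65/75

EAC – Lossless

Results based on the average of 16 MPEG4 test clips

(Digits are shown as they appear in the transcript; decimal points were lost in extraction.)

Codec            Compression Ratio
WinZip           132
Monkey's Audio   272
EAC              272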

66/75

EAC (Versatile)

Versatile:

  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

67/75

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed-length timeslot bitstream

Delay = 2 frames:

  • Ignoring encoding/decoding delay
  • Network transmission time (if modem line, delay = 3 frames)

68/75

EAC (Low Delay Mode)

Figure: low-delay timeline over frames i-1, i, i+1. Encoder side: Start Encoding Frame i, then MLT, Quantizer, Entropy, Bitstream, network. Decoder side: Start Decoding Frame i, then Entropy, Quantizer. Marker: Playable here.

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax:

  • Parser may reassemble the bitstream at 1000x real time
  • Change: bit rate, # of audio channels, audio sampling rate

70/75

EAC – Software

Software:

  • Encoder: 8x realtime (stereo, 44.1 kHz sampling)
  • Decoder: 20x realtime
  • Parser: 1000x realtime

71/75

EAC - Encoder

Figure: Audio goes into the Encoder, which outputs a Stereo 128kbps bitstream and a Companion file.

72/75

EAC - Parser

Figure: on the Server, the Parser takes the Stereo 128kbps bitstream and the Companion file and outputs:

  • Stereo 16kbps
  • Mono 8kbps
  • Stereo 16kbps, slow start
  • Mono 8kbps, 11kHz sampling

73/75

EAC - Decoder

Figure: the Decoder plays each parsed bitstream:

  • Stereo 16kbps
  • Mono 8kbps
  • Stereo 16kbps, slow start
  • Mono 8kbps, 11kHz sampling

74/75

Comparison

Original, MP4 TwinVQ, WMA, EAC, MP3

75/75

Conclusions

An embedded audio coder is developed:

  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, # of audio channels, audio sampling rate
  • Good prototype available: realtime execution, small memory footprint




3075

Reversible MLT

Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0 -74 -13 0 0

21 0 4 0 14 0 23 23

0 0 0 0 3 0 4 0

0 3 5 0 0 0 0 0

0 1 -1 0 -4 33 0 -1

0 0 1 0 0 0 0 0

-4 5 0 0 -18 0 0 19

4 0 23 0 -1 0 0 0

Next View graph

3775

Bits of Coefficients

Coefficient  Value  b1 b2 b3 b4 b5 b6 b7  Sign

w0              45   0  1  0  1  1  0  1   +
w1             -74   1  0  0  1  0  1  0   -
w2              21   0  0  1  0  1  0  1   +
w3              14   0  0  0  1  1  1  0   +
w4              -4   0  0  0  0  1  0  0   -
w5             -18   0  0  1  0  0  1  0   -
w6               4   0  0  0  0  1  0  0   +
w7              -1   0  0  0  0  0  0  1   -
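The sign-and-bits view of the coefficients w0 to w7 (45, -74, 21, 14, -4, -18, 4, -1) can be reproduced with a few lines of code. This is a minimal sketch, assuming 7 magnitude bits with b1 as the most significant bit; the function name is hypothetical.

```python
def to_bit_planes(coeffs, nbits=7):
    """Return one (sign, bits) pair per coefficient.

    bits[0] is b1, the most significant magnitude bit; bits[-1] is b7.
    """
    rows = []
    for w in coeffs:
        sign = "-" if w < 0 else "+"
        mag = abs(w)
        bits = [(mag >> (nbits - 1 - k)) & 1 for k in range(nbits)]
        rows.append((sign, bits))
    return rows

coeffs = [45, -74, 21, 14, -4, -18, 4, -1]
for i, (sign, bits) in enumerate(to_bit_planes(coeffs)):
    print(f"w{i} {coeffs[i]:4d}  {' '.join(map(str, bits))}  {sign}")
```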

3875

Conventional Coding

[Figure: conventional coding sends the coefficients one at a time (first w0, second w1, third w2), each with all of its bits b1 to b7 and its sign. After the first three coefficients, the decoded values of w0 to w7 are shown as 46, -74, 22, 0, 0, 0, 0, 0.]

3975

Embedded Coding

[Figure: embedded coding sends the bit planes in order of significance (first b1, second b2, third b3). The sign is sent when a coefficient becomes significant.]

Bit plane     w0  w1  w2  w3  w4  w5  w6  w7
First (b1)     0  1-   0   0   0   0   0   0
Second (b2)   1+   0   0   0   0   0   0   0
Third (b3)     0   0  1+   0   0  1-   0   0

After the third bit plane:

Coefficient  Value  Range
w0              40   32 to 47
w1             -72  -79 to -64
w2              24   16 to 31
w3               0  -31 to 31
w4               0  -31 to 31
w5             -24  -31 to 31
w6               0  -31 to 31
w7               0  -31 to 31
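The decoded values on the Embedded Coding slide (40, -72, 24, 0, 0, -24, 0, 0 after three bit planes) follow from a simple rule: a coefficient whose received bits are all zero decodes to 0, and any other coefficient decodes to the midpoint of the range still allowed by its received bits. The sketch below assumes that midpoint rule and 7 magnitude bits; the function name is hypothetical.

```python
def decode_after_planes(coeffs, planes, nbits=7):
    """Reconstruct each coefficient from only its first `planes` bit planes."""
    remaining = nbits - planes  # number of bits not yet received
    decoded = []
    for w in coeffs:
        prefix = abs(w) >> remaining  # magnitude bits received so far
        if prefix == 0:
            decoded.append(0)  # still insignificant
            continue
        low = prefix << remaining            # smallest magnitude in the range
        mid = low + (1 << remaining) // 2    # midpoint of the remaining range
        decoded.append(-mid if w < 0 else mid)
    return decoded

coeffs = [45, -74, 21, 14, -4, -18, 4, -1]
print(decode_after_planes(coeffs, 3))  # [40, -72, 24, 0, 0, -24, 0, 0]
```

Truncating the bitstream after any bit plane therefore still gives a usable, coarser version of every coefficient, which is what makes the bitstream embedded.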

4075

Audio Masking

[Figure: audio masking along the frequency axis. Labels: critical band, neighboring band, signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, noise-to-mask ratio.]

4175

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches):
  Calculate the mask.
  Transmit the mask.
  Modify the transform coefficients (or the coding approach) according to the mask.
  Encode the transform coefficients.

Note: the mask modifies the coding content.

4275

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same.

Implicit masking:
  Calculate the static masking (Fletcher-Munson threshold).
  Encode the MSB of the transform coefficients.
  Calculate the mask based on the MSB of the coefficients.
  Modify the coding order.
  Encode the next most important part of the coefficients.
  Repeat the process.

4375

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: state after the first pass. Legend: coefficient, significant, insignificant, mask.]

Coefficient  Value  Range
w0               0   -63 to 63
w1             -96  -127 to -64
w2               0   -63 to 63
w3               0   -63 to 63
w4               0   -63 to 63
w5               0   -63 to 63
w6               0  -127 to 127
w7               0  -127 to 127

4475

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: state after the second pass. Legend: coefficient, significant, insignificant.]

Coefficient  Value  Range
w0              48    32 to 63
w1             -96  -127 to -64
w2               0   -31 to 31
w3               0   -31 to 31
w4               0   -31 to 31
w5               0   -31 to 31
w6               0   -63 to 63
w7               0   -63 to 63

4575

Context Modeling

Context:
  Zero coding: significance statuses of neighbor coefficients
  Refinement: whether it is the first refinement pass; significance statuses of neighbor coefficients
  Sign: neighbor signs

4675

After Implicit Psychoacoustic Masking & Context Modeling

45 0 0 0 -74 -13 0 0

21 0 4 0 14 0 23 23

0 0 0 0 3 0 4 0

0 3 5 0 0 0 0 0

0 1 -1 0 -4 33 0 -1

0 0 1 0 0 0 0 0

-4 5 0 0 -18 0 0 19

4 0 23 0 -1 0 0 0

Bit (to be encoded): 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...

Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

4775

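Arithmetic Coding - Illustration (QM Coder Used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided once per symbol according to the probabilities P0 and 1-P0, P1 and 1-P1, P2 and 1-P2. For the symbols S0=0, S1=1, S2=0 the final interval is A.]

Coding result: 0100

(The shortest binary bitstream that ensures the interval from B = 0100 0000000... to C = 0100 1111111... satisfies (B, C) ⊂ A.)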
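The interval subdivision and the "shortest bitstream inside A" rule can be sketched with exact fractions. This is an illustration of plain binary arithmetic coding, not of the QM coder itself. The probabilities are made-up values (the slide does not give P0, P1, P2), chosen so that the symbols 0, 1, 0 yield the codeword 0100, and the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil

def final_interval(symbols, p_zero):
    """Narrow [0, 1) once per binary symbol; p_zero[k] is P(symbol k == 0)."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0          # keep the lower part of the interval
        else:
            low += width * p0    # skip the part reserved for symbol 0
            width *= 1 - p0
    return low, low + width

def shortest_codeword(low, high):
    """Shortest bit string b such that every continuation of b lies in [low, high)."""
    n = 1
    while True:
        c = ceil(low * 2**n)                 # first n-bit grid point >= low
        if Fraction(c + 1, 2**n) <= high:    # [c/2^n, (c+1)/2^n) fits inside
            return format(c, f"0{n}b")
        n += 1

# Assumed probabilities of a 0 for the three symbols.
p_zero = [Fraction(3, 5), Fraction(2, 5), Fraction(1, 4)]
low, high = final_interval([0, 1, 0], p_zero)   # A = [0.24, 0.33)
print(shortest_codeword(low, high))             # 0100
```

With these assumed probabilities, B = 0.0100 (binary) = 0.25 and C approaches 0.3125, so the whole range of continuations of 0100 lies inside A.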

4875

Entropy Coder (Summary)

[Figure: the entropy coder produces a bitstream and an R-D curve (distortion D versus rate R).]

4975

Speed Up Issues

Context modeling: use stored context; update the context when a coefficient becomes significant.

Implicit masking: fast calculation of the energy in a critical band; lookup table to convert energy to mask.

R-D curve calculation: lookup table for the calculation of distortion.

Context entropy coder: QM coder; run-length Rice coder.

5075

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header:
  Whether a certain channel exists (1 bit)
  Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]

5275

Companion File

[Figure: companion file layout: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves, D1 versus R1 through D4 versus R4, shown twice, with the rates r1, r2, r3, r4 marked.]
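A common way to pick the rates r1 to r4 from several R-D curves under one byte budget is to keep taking the curve segment with the steepest R-D slope. The sketch below shows that greedy rule under stated assumptions: each curve is a list of (rate, distortion) truncation points starting at rate 0, with convex, decreasing distortion. The function name and the data are hypothetical, and the slides do not spell out the exact search EAC uses.

```python
import heapq

def allocate(curves, budget):
    """Choose one truncation rate per curve so the total stays within budget.

    curves[k] is a list of (rate_bytes, distortion) points with increasing
    rate and convex, decreasing distortion, starting at (0, D0).
    """
    idx = [0] * len(curves)  # current truncation point of each curve
    heap = []                # entries: (-slope of next segment, curve index)

    def push_next_segment(k):
        i = idx[k]
        if i + 1 < len(curves[k]):
            (r0, d0), (r1, d1) = curves[k][i], curves[k][i + 1]
            slope = (d0 - d1) / (r1 - r0)  # distortion removed per byte
            heapq.heappush(heap, (-slope, k))

    for k in range(len(curves)):
        push_next_segment(k)

    spent = 0
    while heap:
        _, k = heapq.heappop(heap)  # steepest remaining segment
        i = idx[k]
        step = curves[k][i + 1][0] - curves[k][i][0]
        if spent + step > budget:
            continue  # segment does not fit; curve k stays where it is
        spent += step
        idx[k] = i + 1
        push_next_segment(k)

    return [curves[k][idx[k]][0] for k in range(len(curves))]

curves = [
    [(0, 100), (10, 40), (20, 20), (30, 15)],
    [(0, 50), (10, 30), (20, 25)],
]
print(allocate(curves, 30))  # [20, 10]
```

At the end, all chosen truncation points sit at roughly the same R-D slope, which is the usual optimality condition for this kind of allocation.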

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer occupancy (bytes) versus time (timeslots). The buffer-occupancy curve runs between two illegal regions.]

5575

Allocated Bytes per Timeslot

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$

where
  $B_i$: allocated bytes for timeslot $i$
  $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  $\mathrm{Time}$: time duration of the timeslot
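As a worked example of the allocation formula, the sketch below computes the bytes per timeslot from a planned buffer-occupancy curve. It assumes the rate is expressed in bytes per second and the occupancy in bytes; the function name and the numbers are illustrative only.

```python
def allocated_bytes(buf, rate_trans, time):
    """Bytes allocated to each timeslot, B_i = Buf_{i-1} - Buf_i + Rate_trans * Time.

    buf[0] is the initial buffer occupancy and buf[i] the occupancy at
    timeslot i (bytes); rate_trans is in bytes per second (an assumption),
    time is the timeslot duration in seconds.
    """
    return [buf[i - 1] - buf[i] + rate_trans * time for i in range(1, len(buf))]

# 2000 bytes/s over 0.5 s timeslots delivers 1000 bytes per timeslot.
print(allocated_bytes([4000, 4200, 3900, 4000], rate_trans=2000, time=0.5))
# [800.0, 1300.0, 900.0]
```

A timeslot that lets the buffer grow (4000 to 4200) gets fewer bytes than the network delivers, and a timeslot that drains the buffer (4200 to 3900) gets more.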

5675

Optimization

Given:
  Initial buffer occupancy level
  Final buffer occupancy level (or intermediate level with a sliding window)
  Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot.

5775

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions. Labels: underflow (too many bytes), overflow (too few bytes), waste bytes.]

5875

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

5975

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions and the buffer-occupancy curve.]

6075

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

6175

Modular Software Design

[Figure: two parallel pipelines from audio to bitstream, one for the L+R (or mono) channel and one for the L-R channel: MLT (SW) -> Quantizer -> Entropy coder -> Bitstream Assembly.]
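For the lossless mode, the L+R / L-R split has to be invertible on integers. The sketch below shows one widely used integer form (floor of the average plus the difference). It is an assumption for illustration: the slide only names the two channels and does not give the scaling or rounding EAC applies, and the function names are hypothetical.

```python
def to_mid_side(left, right):
    """Map integer samples (L, R) to a sum-like and a difference channel."""
    mid = (left + right) >> 1   # floor of the average, stands in for L+R
    side = left - right         # L-R
    return mid, side

def from_mid_side(mid, side):
    """Exact inverse of to_mid_side."""
    # L+R and L-R have the same parity, so the bit dropped by the floor
    # in `mid` can be recovered from `side`.
    left = mid + ((side + 1) >> 1)
    right = left - side
    return left, right

print(to_mid_side(5, 2))      # (3, 3)
print(from_mid_side(3, 3))    # (5, 2)
```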

6275

Modular Software Design

Highly modularized pipeline design:
  Quantizer and entropy coder can be used for image/video compression as well
  Probe and data input can be inserted into any part of the program

Data-flow driven (with necessary memory regulator <buffer>):
  No long delay
  No need for large memory

Memory and computation efficient:
  Working memory preallocated

6375

Experimental Results

6475

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec        8kbps  16kbps  32kbps  48kbps
MP4 TwinVQ     748     700     571     448
WMA            847     556     325     040
EAC            669     568     280     -22

6575

EAC - Lossless

Results based on the average of 16 MPEG4 test clips.

Codec            Compression ratio
WinZip           1.32
Monkey's Audio   2.72
EAC              2.72

6675

EAC (Versatile)

Versatile:
  Real-time 2-way communication (low delay mode)
  Storage device (Pocket PC, Xbox)
  Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames (ignoring encoding/decoding delay) + network transmission time (over a modem line, delay = 3 frames)

6875

EAC (Low Delay Mode)

[Figure: low-delay timeline over frames i-1, i, i+1. Labels: encoder starts encoding frame i (MLT, quantizer, entropy); bitstream; network; decoder starts decoding frame i (entropy, quantizer); playable here.]

6975

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax:
  Parser may reassemble the bitstream at 1000x real time
  Change bit rate, number of audio channels, audio sampling rate

7075

EAC - Software

Software:
  Encoder: 8x real time (stereo, 44.1 kHz sampling)
  Decoder: 20x real time
  Parser: 1000x real time

7175

EAC - Encoder

[Figure: Audio -> Encoder -> stereo 128 kbps bitstream and companion file.]

7275

EAC - Parser

[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces: stereo 16 kbps; mono 8 kbps; stereo 16 kbps, slow start; mono 8 kbps, 11 kHz sampling.]

7375

EAC - Decoder

[Figure: the decoder plays the parsed bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps, slow start; mono 8 kbps, 11 kHz sampling.]

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed:
  Highly efficient
  Versatile: low delay, constant bitrate, streaming
  Flexible bitstream: parsing for bit rate, number of audio channels, audio sampling rate
  Good prototype available: real-time execution, small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

3175

Reversible Unit Transform

b

a

b

a

11

11

2

1

2

1

0

actb

tcba

bcat

21

21

1

20

c

cc

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

3275

Entropy Coder

Input quantized coefficients

Output embedded coded bitstream with R-D

performance curve

Goal Compression Embedded bitstream for future manipulation

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

3375

Frame Grouping

Time slot

1 2 3 4 5 6 7 8

Fram

e

3475

Entropy Coder

D

R

Bitstream

R-D curve

3575

Entropy Coder

Embedded coding

Implicit psychoacoustic masking

Context modeling

Arithmetic coding

Implementation concerns

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

Coefficient  Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0             45     0  1  0  1  1  0  1    +
w1            -74     1  0  0  1  0  1  0    -
w2             21     0  0  1  0  1  0  1    +
w3             14     0  0  0  1  1  1  0    +
w4             -4     0  0  0  0  1  0  0    -
w5            -18     0  0  1  0  0  1  0    -
w6              4     0  0  0  0  1  0  0    +
w7             -1     0  0  0  0  0  0  1    -

3875

Conventional Coding

[Figure: the bit table of w0 to w7 (columns b1 to b7 and sign). Whole coefficients are coded one after another: w0 first, w1 second, w2 third. Only the rows of w0, w1, and w2 are retained.]

Coefficient     w0   w1   w2   w3   w4   w5   w6   w7
Decoded value   46  -74   22    0    0    0    0    0

3975

Embedded Coding

[Figure: the bit table of w0 to w7 (columns b1 to b7 and sign), coded bit-plane by bit-plane: b1 first, b2 second, b3 third.]

Coded bits per pass, w0 to w7 (a sign follows the first 1 of a coefficient):
First (b1):    0   1-  0   0   0   0   0   0
Second (b2):   1+  0   0   0   0   0   0   0
Third (b3):    0   0   1+  0   0   1-  0   0

Coefficient  Value  Range
w0             40   32 to 47
w1            -72   -79 to -64
w2             24   16 to 31
w3              0   -31 to 31
w4              0   -31 to 31
w5            -24   -31 to 31
w6              0   -31 to 31
w7              0   -31 to 31
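The decoded values on this slide can be reproduced with a short sketch. It assumes 7-bit magnitudes, that a coefficient whose known bits are all zero decodes to 0, and that the unknown low bits of a significant coefficient are replaced by the midpoint of the remaining interval. The function names are illustrative and do not come from the EAC software.

```python
def to_bits(magnitude, nbits=7):
    """Return the magnitude as a list of bits, most significant first."""
    return [(magnitude >> (nbits - 1 - k)) & 1 for k in range(nbits)]


def decode_after_planes(coeffs, planes, nbits=7):
    """Reconstruct each coefficient from its first `planes` bit-planes."""
    out = []
    for c in coeffs:
        prefix = 0
        for b in to_bits(abs(c), nbits)[:planes]:
            prefix = (prefix << 1) | b
        if prefix == 0:
            # Still insignificant: nothing but zeros has been seen so far.
            out.append(0)
            continue
        remaining = nbits - planes
        magnitude = prefix << remaining
        if remaining > 0:
            # Midpoint of the interval spanned by the unknown low bits.
            magnitude += 1 << (remaining - 1)
        out.append(magnitude if c > 0 else -magnitude)
    return out


coeffs = [45, -74, 21, 14, -4, -18, 4, -1]
print(decode_after_planes(coeffs, 3))  # [40, -72, 24, 0, 0, -24, 0, 0]
print(decode_after_planes(coeffs, 7))  # all bit-planes: the original values
```

Truncating the embedded bitstream after any bit-plane still yields a usable approximation of every coefficient, which is the property the parser relies on.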

4075

Audio Masking

[Figure: masking in a critical band and its neighboring band, plotted against frequency. Labels: signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, noise-to-mask ratio.]

4175

Psychoacoustic Masking

Traditional approach (explicit masking, all existing approaches):
  • Calculate the mask
  • Transmit the mask
  • Modify the transform coefficients (or the coding approach) according to the masking
  • Encode the transform coefficients

Note: the mask modifies the coding content.

4275

Implicit Psychoacoustic Masking

Key: the mask modifies the coding order; the content is the same.

Implicit masking:
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify the coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

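[Figure: bit table (columns b1 to b7 and sign) after the first coding pass, with each coefficient marked significant or insignificant and the mask overlaid.]

Coefficient  Value  Range
w0              0   -63 to 63
w1            -96   -127 to -64
w2              0   -63 to 63
w3              0   -63 to 63
w4              0   -63 to 63
w5              0   -63 to 63
w6              0   -127 to 127
w7              0   -127 to 127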

4475

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table (columns b1 to b7 and sign) after the first and second coding passes, with each coefficient marked significant or insignificant.]

Coefficient  Value  Range
w0             48   32 to 63
w1            -96   -127 to -64
w2              0   -31 to 31
w3              0   -31 to 31
w4              0   -31 to 31
w5              0   -31 to 31
w6              0   -63 to 63
w7              0   -63 to 63

4575

Context Modeling

Context:
  • Zero coding: significance statuses of neighboring coefficients
  • Refinement: whether it is the first refinement pass; significance statuses of neighboring coefficients
  • Sign: signs of neighboring coefficients

4675

After Implicit Psychoacoustic Masking & Context Modeling

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

Bit (to be encoded):            0  1  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  ...
Ctx (automatically generated):  0  0  9  0  0  0  0  0  0  7 10  0  0  0  0  0  0  0  0  ...

4775

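Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided step by step, with probabilities P0 and 1-P0, then P1 and 1-P1, then P2 and 1-P2, as the symbols S0=0, S1=1, S2=0 are coded. The final interval is A.]

Coding result: 0100

(The shortest binary bitstream is chosen such that the interval from B = 0100 0000000... to C = 0100 1111111... lies within A.)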
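The interval subdivision can be sketched with exact fractions. This is plain arithmetic coding under stated assumptions, not the QM coder itself (which approximates these operations and adapts its probabilities): symbol 0 is assumed to take the lower part of the current interval, and the three probabilities are made-up values chosen so that the result matches the 0100 of the illustration.

```python
from fractions import Fraction
from math import ceil


def encode_interval(symbols, p_zero):
    """Narrow [0, 1) once per symbol; p_zero[i] is the probability of a 0."""
    low, high = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        split = low + (high - low) * p
        if s == 0:
            high = split   # symbol 0 keeps the lower part (assumption)
        else:
            low = split    # symbol 1 keeps the upper part
    return low, high


def shortest_codeword(low, high):
    """Shortest bit string whose dyadic interval lies inside [low, high)."""
    n = 1
    while True:
        scale = 2 ** n
        k = ceil(low * scale)              # smallest k with k/scale >= low
        if Fraction(k + 1, scale) <= high:
            return format(k, "0{}b".format(n))
        n += 1


# Hypothetical probabilities of symbol 0 at each step.
probs = [Fraction(3, 5), Fraction(2, 5), Fraction(1, 4)]
low, high = encode_interval([0, 1, 0], probs)
print(low, high)                     # 6/25 33/100
print(shortest_codeword(low, high))  # 0100
```

Any continuation of the codeword 0100 stays between 0.25 and 0.3125, inside the final interval, so the decoder can identify the three symbols without knowing what follows.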

4875

Entropy Coder (Summary)

[Figure: the entropy coder produces a bitstream and an R-D curve (distortion D versus rate R).]

4975

Speed Up Issues

  • Context modeling: use stored context; update the context when a coefficient becomes significant
  • Implicit masking: fast calculation of the energy in a critical band; lookup table to convert energy to mask
  • R-D curve calculation: lookup table for the calculation of distortion
  • Context entropy coder: QM coder; run-length Rice coder
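The slides do not say which Rice variant the run-length coder uses. As a minimal sketch of ordinary Rice coding of a run length, assume a unary quotient written as 1s terminated by a 0, followed by a k-bit remainder; both function names are illustrative.

```python
def rice_encode(n, k):
    """Encode a non-negative run length n with Rice parameter k."""
    q, r = n >> k, n & ((1 << k) - 1)
    bits = "1" * q + "0"                      # unary quotient, closed by a 0
    if k > 0:
        bits += format(r, "0{}b".format(k))   # k-bit remainder
    return bits


def rice_decode(bits, k):
    """Decode one value; return (value, number of bits consumed)."""
    q = 0
    while bits[q] == "1":
        q += 1
    pos = q + 1                               # skip the terminating 0
    r = int(bits[pos:pos + k], 2) if k > 0 else 0
    return (q << k) | r, pos + k


print(rice_encode(9, 2))        # 11001
print(rice_decode("11001", 2))  # (9, 5)
```

Such a coder needs no multiplications or probability tables, which is why it is a common fast substitute for an arithmetic coder.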

5075

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

5175

EAC Bitstream Syntax

[Figure: bitstream layout: EAC marker | Global header | Timeslot (head, body) | Timeslot (head, body) | Timeslot (head, body)]

Timeslot header:
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

5275

Companion File

[Figure: companion file layout: Global header | Timeslot (head, R-D curve) | Timeslot (head, R-D curve) | Timeslot (head, R-D curve)]

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

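[Figure: four R-D curves (D1 versus R1 through D4 versus R4), shown twice; in the second set the truncation rates r1, r2, r3, and r4 are marked on the curves.]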
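One common way to pick the truncation rates r1 to r4 is to spend bytes where the R-D slope is steepest until the timeslot budget is used up. The sketch below is a generic greedy allocation under stated assumptions, not necessarily the EAC assembler: each curve is a list of candidate (rate, distortion) points that is convex, with rate increasing and distortion decreasing, and all numbers are hypothetical.

```python
import heapq


def allocate(curves, budget):
    """Choose one truncation point per curve under a total byte budget.

    Returns (index of the chosen point on each curve, bytes used).
    """
    chosen = [0] * len(curves)
    used = sum(curve[0][0] for curve in curves)
    heap = []

    def push_next_segment(i):
        j = chosen[i]
        if j + 1 < len(curves[i]):
            r0, d0 = curves[i][j]
            r1, d1 = curves[i][j + 1]
            slope = (d0 - d1) / (r1 - r0)      # distortion drop per byte
            heapq.heappush(heap, (-slope, i))  # max-heap via negation

    for i in range(len(curves)):
        push_next_segment(i)

    while heap:
        _, i = heapq.heappop(heap)
        j = chosen[i]
        extra = curves[i][j + 1][0] - curves[i][j][0]
        if used + extra > budget:
            break              # the steepest remaining segment does not fit
        used += extra
        chosen[i] = j + 1
        push_next_segment(i)
    return chosen, used


curves = [
    [(0, 100), (10, 40), (20, 20), (30, 15)],  # slopes 6, 2, 0.5
    [(0, 50), (10, 20), (20, 10)],             # slopes 3, 1
]
print(allocate(curves, 30))  # ([2, 1], 30)
```

With the companion file holding the R-D curves, an allocation of this kind only reads a few numbers per timeslot and never touches the coded data, which is consistent with the fast parsing described elsewhere in the deck.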

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve. Vertical axis: buffer occupancy (bytes); horizontal axis: time (timeslots). The curve runs between two illegal regions.]

5575

Allocated Bytes Per Timeslots

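Allocated bytes for a certain timeslot:

$$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \times \mathrm{Time}$$

where
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot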
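A small numeric sketch of this formula follows. The buffer levels, the 2000 bytes/s rate, and the 1-second timeslot are hypothetical values chosen for illustration, and the function names are not from the EAC software.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time, all in bytes."""
    return buf_prev - buf_cur + rate_trans * time


def allocations(buf_levels, rate_trans, time):
    """Allocated bytes for timeslots 1..n from buffer levels Buf_0..Buf_n."""
    return [
        allocated_bytes(buf_levels[i - 1], buf_levels[i], rate_trans, time)
        for i in range(1, len(buf_levels))
    ]


# Hypothetical buffer-occupancy curve Buf_0..Buf_3 (bytes).
levels = [4000, 4200, 3900, 4000]
alloc = allocations(levels, rate_trans=2000, time=1)
print(alloc)       # [1800, 2300, 1900]
print(sum(alloc))  # 6000
```

The sum telescopes to Buf_0 - Buf_n + n x Rate_trans x Time, so when the buffer ends at the level where it started, the timeslots together consume exactly what the channel delivers, even though individual timeslots receive different byte counts.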

5675

Optimization

Given:
  • Initial buffer occupancy level
  • Final buffer occupancy level (or an intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the number of bytes allocated to the current timeslot.

5775

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions. Annotations: underflow (too many bytes), overflow (too few bytes), wasted bytes.]

5875

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions, for constant-bitrate coding.]

5975

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions, for Internet streaming with slow start.]

6075

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions, for normal Internet streaming.]

6175

Modular Software Design

[Figure: encoder pipeline. The audio is split into an L+R (or mono) channel and an L-R channel; each channel passes through MLT (SW), quantizer, and entropy coder, followed by bitstream assembly, which produces the bitstream.]

6275

Modular Software Design

  • Highly modularized pipeline design: the quantizer and entropy coder can be used for image/video compression as well; probes and data input can be inserted into any part of the program
  • Data-flow driven (with the necessary memory regulator <buffer>): no long delay; no need for large memory
  • Memory and computation efficient: working memory is preallocated

6375

Experimental Results

6475

EAC - Highly Efficient (NMR)

Results are based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.

Codec        8kbps  16kbps  32kbps  48kbps
MP4 TwinVQ   748    700     571     448
WMA          847    556     325     040
EAC          669    568     280     -22

6575

EAC - Lossless

Results are based on the average of 16 MPEG-4 test clips.

Codec            Compression Ratio
EAC              272
Monkey's Audio   272
WinZip           132

6675

EAC (Versatile)

Versatile:
  • Real-time two-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

6775

EAC (Low Delay Mode)

  • Reduced frame size
  • Timeslot = 1 frame
  • Fixed-length timeslot bitstream
  • Delay = 2 frames (ignoring encoding/decoding delay)
  • Network transmission time (for a modem line, delay = 3 frames)

6875

EAC (Low Delay Mode)

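[Figure: low-delay timeline over frames i-1, i, and i+1. The encoder starts encoding frame i (MLT, quantizer, entropy coder) and sends the bitstream over the network; the decoder then starts decoding frame i (entropy and quantizer stages). The point at which the frame becomes playable is marked.]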

6975

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax:
  • The parser may reassemble the bitstream at 1000x real time
  • It can change the bit rate, the number of audio channels, and the audio sampling rate

7075

EAC - Software

Software:
  • Encoder: 8x real time (stereo, 44.1 kHz sampling)
  • Decoder: 20x real time
  • Parser: 1000x real time

7175

EAC - Encoder

[Figure: the encoder takes the audio and produces a stereo 128 kbps bitstream and a companion file.]

7275

EAC - Parser

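[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces four bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]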

7375

EAC - Decoder

[Figure: the decoder plays the four parsed bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]

7475

Comparison

[Audio demo clips: Original, MP4 TwinVQ, WMA, EAC, MP3]

7575

Conclusions

An embedded audio coder has been developed:
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, and audio sampling rate
  • Good prototype available: real-time execution, small memory footprint

3675

A block of coefficients

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Next View graph

3775

Bits of Coefficients

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

coef

fici

ent

45

-74

21

14-4

-18

4

-1

3875

Conventional Coding

First

Second

Third

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +

Signb1 b2 b3 b4 b5 b6 b7

w0

w1

w2

w3

w4

w5

w6

w7

46

-74

22

00

0

00

3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key: Mask modifies the coding order; the content is the same

Implicit masking
  • Calculate the static masking (Fletcher-Munson threshold)
  • Encode the MSB of the transform coefficients
  • Calculate the mask based on the MSB of the coefficients
  • Modify coding order
  • Encode the next most important part of the coefficients
  • Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

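[Figure: bit table of w0 to w7 (sign, b1 to b7) after the First pass, with the mask overlaid. First: 0 1- 0 0 0 0 0 0. Legend: coefficient significant / insignificant; mask.]

Coefficient  Value  Range
w0             0    -63 to 63
w1           -96    -127 to -64
w2             0    -63 to 63
w3             0    -63 to 63
w4             0    -63 to 63
w5             0    -63 to 63
w6             0    -127 to 127
w7             0    -127 to 127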

4475

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit table of w0 to w7 (sign, b1 to b7) after the First and Second passes. First: 0 1- 0 0 0 0 0 0. Second: 1+ 0 0 0 0 0 0 0. Legend: coefficient significant / insignificant.]

Coefficient  Value  Range
w0            48    32 to 63
w1           -96    -127 to -64
w2             0    -31 to 31
w3             0    -31 to 31
w4             0    -31 to 31
w5             0    -31 to 31
w6             0    -63 to 63
w7             0    -63 to 63

4575

Context Modeling

Context
  • Zero coding: significant statuses of neighbor coefficients
  • Refinement: whether it is the 1st refinement pass; significant statuses of neighbor coefficients
  • Sign: neighbor signs
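As a toy illustration of how a zero-coding context can be derived from the significance of neighbor coefficients, the sketch below forms a context index from the two adjacent coefficients. The neighborhood, the formula and the name `zero_coding_contexts` are assumptions chosen for brevity; the slides do not specify the neighborhood or the number of contexts that EAC actually uses.

```python
def zero_coding_contexts(significant):
    """Context index per coefficient from the significance of its two neighbors.

    ctx = (left neighbor significant) + 2 * (right neighbor significant),
    so ctx lies in 0..3. Neighbors outside the block count as insignificant.
    """
    n = len(significant)
    contexts = []
    for i in range(n):
        left = significant[i - 1] if i > 0 else False
        right = significant[i + 1] if i + 1 < n else False
        contexts.append(int(left) + 2 * int(right))
    return contexts


# Hypothetical significance map: only the second coefficient is significant.
print(zero_coding_contexts([False, True, False, False]))
```

Because the context depends only on information that has already been coded, the decoder can regenerate it without any side information.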

4675

After Implicit Psychoacoustic Masking & Context Modeling

 45   0   0   0  -74 -13   0   0
 21   0   4   0   14   0  23  23
  0   0   0   0    3   0   4   0
  0   3   5   0    0   0   0   0
  0   1  -1   0   -4  33   0  -1
  0   0   1   0    0   0   0   0
 -4   5   0   0  -18   0   0  19
  4   0  23   0   -1   0   0   0

Bit (to be encoded):           0 1 1 0 0 0 0 0 0 1 0  0 0 0 0 0 0 0 0 ……
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……

4775

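Arithmetic Coding – Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided successively by the probabilities P0 / 1-P0, P1 / 1-P1 and P2 / 1-P2 for the symbols S0=0, S1=1, S2=0, giving the final interval A.]

Coding result: 0100

(Shortest binary bitstream that ensures that the interval from B = 0100 0000000… to C = 0100 1111111… lies within A: (B, C) ⊂ A)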
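The interval-narrowing idea can be sketched with exact fractions. This is a textbook-style illustration and not the QM coder, which replaces the multiplications with table-driven approximations. The function names, the convention that symbol 0 takes the lower part of the interval, and the probabilities used in the example are all assumptions.

```python
from fractions import Fraction
from math import ceil


def encode_interval(symbols, p_zero):
    """Narrow [0, 1) once per binary symbol; p_zero[i] is P(symbol i == 0)."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0                # symbol 0 keeps the lower part
        else:
            low += width * p0          # symbol 1 keeps the upper part
            width *= 1 - p0
    return low, low + width


def shortest_codeword(low, high):
    """Shortest bit string b such that every number starting with 0.b lies in [low, high)."""
    n = 1
    while True:
        k = ceil(low * 2**n)           # first multiple of 2**-n not below low
        if Fraction(k + 1, 2**n) <= high:
            return format(k, f"0{n}b")
        n += 1


half = Fraction(1, 2)
low, high = encode_interval([0, 1, 0], [half, half, half])
print(low, high, shortest_codeword(low, high))
```

With equal probabilities each symbol costs exactly one bit, so the three symbols 0, 1, 0 map to the codeword 010; skewed probabilities would give shorter codewords for likely sequences.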

4875

Entropy Coder (Summary)

[Figure: the entropy coder produces the bitstream and its R-D curve (distortion D versus rate R).]

4975

Speed Up Issues

Context modeling
  • Use stored context
  • Update context when a coefficient becomes significant

Implicit masking
  • Fast calculation of energy in a critical band
  • Lookup table to convert energy to mask

R-D curve calculation
  • Lookup table calculation of distortion

Context entropy coder
  • QM coder
  • Run-length Rice coder

5075

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header
  • Whether a certain channel exists (1 bit)
  • Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker | Global header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]

5275

Companion File

[Figure: companion file layout: Global header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves with axes (R1, D1), (R2, D2), (R3, D3) and (R4, D4), shown twice; the truncation rates r1, r2, r3 and r4 are marked.]
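One common way to split a byte budget across several embedded bitstreams is to keep extending whichever stream currently offers the largest distortion reduction per byte, so that all streams end up truncated at roughly the same R-D slope. The sketch below shows that generic greedy allocation. It is not taken from the EAC software; the function name `allocate`, the data layout and the sample curves are assumptions, and it presumes convex R-D curves.

```python
import heapq


def allocate(curves, budget):
    """Choose a truncation point on each R-D curve under a total byte budget.

    curves[k] is a list of (rate_bytes, distortion) points with increasing
    rate and decreasing, convex distortion. Returns (cut, spent), where
    cut[k] is the index of the chosen point on curve k.
    """
    cut = [0] * len(curves)
    spent = sum(curve[0][0] for curve in curves)
    heap = []  # entries are (-slope, curve index); steepest slope pops first

    def push_next_segment(k):
        i = cut[k]
        if i + 1 < len(curves[k]):
            (r0, d0), (r1, d1) = curves[k][i], curves[k][i + 1]
            heapq.heappush(heap, (-(d0 - d1) / (r1 - r0), k))

    for k in range(len(curves)):
        push_next_segment(k)
    while heap:
        _, k = heapq.heappop(heap)
        step = curves[k][cut[k] + 1][0] - curves[k][cut[k]][0]
        if spent + step > budget:
            continue                   # this stream cannot be extended further
        spent += step
        cut[k] += 1
        push_next_segment(k)
    return cut, spent


# Hypothetical R-D curves for two bitstreams.
curves = [
    [(0, 100), (10, 40), (20, 20)],
    [(0, 50), (10, 20), (20, 15)],
]
print(allocate(curves, 30))
```

Because the curves are convex, taking segments in order of decreasing slope never skips a more valuable segment of the same stream.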

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer occupancy (bytes) versus time (timeslots); the buffer-occupancy curve runs between an upper and a lower illegal region.]

5575

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{trans} \cdot \mathrm{Time}$$

Where
  • $B_i$: allocated bytes for timeslot $i$
  • $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  • $\mathrm{Rate}_{trans}$: coding (network) rate per second
  • $\mathrm{Time}$: time duration of the timeslot
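The relation between allocated bytes and buffer occupancy can be checked numerically. In the sketch below the function names, the buffer limits and the sample numbers (2000 bytes per second, 0.5 s timeslots) are hypothetical; only the formula itself comes from the slide.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_cur + rate_trans * time


def buffer_trajectory(buf_start, allocations, rate_trans, time):
    """Buffer level after each timeslot: fill by rate * time, drain by B_i."""
    levels = []
    buf = buf_start
    for b in allocations:
        buf = buf + rate_trans * time - b
        levels.append(buf)
    return levels


def is_legal(levels, buf_min, buf_max):
    """True if the trajectory never leaves the allowed occupancy range."""
    return all(buf_min <= level <= buf_max for level in levels)


levels = buffer_trajectory(4000, [1500, 500], rate_trans=2000, time=0.5)
print(allocated_bytes(4000, 3500, 2000, 0.5), levels, is_legal(levels, 0, 8000))
```

Allocating more bytes than arrive in a timeslot lowers the buffer level, and allocating fewer raises it, which is why the allocation must keep the trajectory inside the legal range.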

5675

Optimization

Given
  • Initial buffer occupancy level
  • Final buffer occupancy level (or intermediate level with a sliding window)
  • Buffer occupancy constraint

Search for the allocated number of bytes for the current timeslot

5775

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions. Labels: underflow (too many bytes), overflow (too few bytes), waste bytes.]

5875

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

5975

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer occupancy (bytes) versus time (timeslots); the buffer-occupancy curve runs between two illegal regions.]

6075

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

6175

Modular Software Design

[Figure: the audio is split into an L+R (or mono) channel and an L-R channel; each channel passes through MLT (SW) → Quantizer → Entropy coder, and Bitstream Assembly produces the bitstream.]
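The L+R / L-R split in the diagram is a sum/difference transform of the stereo pair. A minimal integer sketch follows; the function names are hypothetical and the slides do not state whether EAC scales the sum and difference, so the unscaled form here is an assumption.

```python
def to_sum_diff(left, right):
    """Convert stereo samples to sum (L+R) and difference (L-R) channels."""
    s = [l + r for l, r in zip(left, right)]
    d = [l - r for l, r in zip(left, right)]
    return s, d


def from_sum_diff(s, d):
    """Invert to_sum_diff exactly: s + d = 2L and s - d = 2R are always even."""
    left = [(a + b) // 2 for a, b in zip(s, d)]
    right = [(a - b) // 2 for a, b in zip(s, d)]
    return left, right


left, right = [3, -5, 7], [1, 2, -7]
s, d = to_sum_diff(left, right)
print(s, d, from_sum_diff(s, d))
```

When the two channels are similar the difference channel is close to zero and cheap to code, and dropping it altogether leaves a mono signal.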

6275

Modular Software Design

Highly modularized pipeline design
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>)
  • No long delay
  • No need for large memory

Memory and computation efficient
  • Working memory preallocated

6375

Experimental Results

6475

EAC – Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec        8kbps  16kbps  32kbps  48kbps
EAC          669    568     280     -22
WMA          847    556     325     040
MP4 TwinVQ   748    700     571     448

6575

EAC – Lossless

Results based on the average of 16 MPEG4 test clips

Codec            Compression Ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32

6675

EAC (Versatile)

Versatile
  • Real-time 2-way communication (low delay mode)
  • Storage device (Pocket PC, Xbox)
  • Internet streaming

6775

EAC (Low Delay Mode)

  • Reducing frame size
  • Timeslot = 1 frame
  • Fixed length timeslot bitstream
  • Delay = 2 frames (ignoring encoding/decoding delay)
  • With network transmission time (if modem line): delay = 3 frames

6875

EAC (Low Delay Mode)

[Figure: timeline of frames i-1, i, i+1. Encoder: start encoding frame i, MLT → Quantizer → Entropy → bitstream → network. Decoder: start decoding frame i, Entropy → Quantizer; the point where frame i becomes playable is marked.]

6975

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax
  • Parser may reassemble the bitstream at 1000x real time
  • Change: bit rate, number of audio channels, audio sampling rate

7075

EAC – Software

Software
  • Encoder: 8x realtime (stereo, 44.1kHz sampling)
  • Decoder: 20x realtime
  • Parser: 1000x realtime

7175

EAC - Encoder

[Figure: Audio → Encoder → Stereo 128kbps bitstream and companion file.]

7275

EAC - Parser

[Figure: on the server, the Parser takes the Stereo 128kbps bitstream and the companion file and produces: Stereo 16kbps; Mono 8kbps; Stereo 16kbps, slow start; Mono 8kbps, 11kHz sampling.]

7375

EAC - Decoder

[Figure: the Decoder plays each parsed bitstream: Stereo 16kbps; Mono 8kbps; Stereo 16kbps, slow start; Mono 8kbps, 11kHz sampling.]

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed
  • Highly efficient
  • Versatile: low delay, constant bitrate, streaming
  • Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  • Good prototype available: realtime execution, small memory footprint






3975

Embedded Coding

01 -000000

1 +0000000

001 +001 -00

Signb1 b2 b3 b4 b5 b6 b7

0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -

First Second

Third

w0

w1

w2

w3

w4

w5

w6

w7

Value

40

Range

3247

-72 -79-64

163124

-31310

-31310

-3131-24

-31310

-31310

4075

Audio Masking

FrequencyCriticalBand

NeighboringBand

Noise Level

Signal

Masking Threshold

Maximum Mask

Signal-tomask ratio

Noise-tomask ratio

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$

where

$B_i$: allocated bytes for timeslot $i$

$\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$

$\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second

$\mathrm{Time}$: time duration of the timeslot
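A small worked example of this relation follows. The numbers are made up, and the rate is assumed to be expressed in bytes per second so that all terms share the same unit.

```python
def allocated_bytes(buf, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time for i = 1 .. len(buf) - 1.

    buf[0] is the initial buffer occupancy, buf[i] the level at timeslot i.
    """
    return [buf[i - 1] - buf[i] + rate_trans * time
            for i in range(1, len(buf))]


# Hypothetical occupancy curve (bytes), 2000 bytes/s, 0.5 s timeslots.
levels = [4000, 4500, 4200, 4200]
print(allocated_bytes(levels, rate_trans=2000, time=0.5))
```

A rising occupancy curve leaves fewer bytes for the timeslot than the channel delivers, a falling one leaves more, and a flat one spends exactly Rate_trans times Time.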

Optimization

Given: the initial buffer occupancy level; the final buffer occupancy level (or an intermediate level with a sliding window); the buffer occupancy constraint

Search for: the allocated number of bytes for the current timeslot

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots) with two illegal regions; the candidate curves are labeled "Underflow (too many bytes)", "Overflow (too few bytes)" and "Waste bytes".]
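The search can be pictured as a bisection over a single R-D slope shared by all timeslots: a smaller slope threshold spends more bytes and pushes the occupancy curve down, a larger one spends fewer bytes and pushes it up. The sketch below only targets a final occupancy level and does not check the illegal regions along the way. The curve format, the function names and the bisection bounds are assumptions, not the EAC search itself.

```python
def bytes_at_slope(curve, lam):
    """Largest rate on a convex R-D curve whose segments all have slope >= lam."""
    rate = curve[0][0]
    for (r0, d0), (r1, d1) in zip(curve, curve[1:]):
        if (d0 - d1) / (r1 - r0) >= lam:
            rate = r1
        else:
            break
    return rate


def simulate(curves, lam, buf0, rate_trans, time):
    """Buffer occupancy after each timeslot when every slot is cut at slope lam."""
    buf, levels = buf0, []
    for curve in curves:
        buf = buf + rate_trans * time - bytes_at_slope(curve, lam)
        levels.append(buf)
    return levels


def search_slope(curves, buf0, buf_final, rate_trans, time,
                 lo=0.0, hi=1e6, iters=50):
    """Bisection for the smallest slope whose final occupancy is >= buf_final."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if simulate(curves, mid, buf0, rate_trans, time)[-1] < buf_final:
            lo = mid      # too many bytes spent: raise the slope threshold
        else:
            hi = mid      # target met: try to spend more bytes
    return hi


slot = [(0, 100), (500, 40), (1000, 20), (1500, 15)]
lam = search_slope([slot, slot], buf0=2000, buf_final=2000,
                   rate_trans=2000, time=0.5)
print(bytes_at_slope(slot, lam), simulate([slot, slot], lam, 2000, 2000, 0.5))
```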

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve; buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

Modular Software Design

[Figure: the audio is split into L+R (or mono) and L-R components; each component passes through MLT (SW), quantizer and entropy coder, followed by bitstream assembly, which produces the bitstream.]
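The L+R / L-R split in the figure can be sketched as follows. The slide does not state how the sum and difference are scaled, so the unscaled integer form used here, and the function names, are assumptions.

```python
def to_sum_diff(left, right):
    """Convert left/right samples into sum (L+R) and difference (L-R) signals."""
    s = [l + r for l, r in zip(left, right)]
    d = [l - r for l, r in zip(left, right)]
    return s, d


def from_sum_diff(s, d):
    """Invert to_sum_diff exactly; s + d and s - d are always even."""
    left = [(a + b) // 2 for a, b in zip(s, d)]
    right = [(a - b) // 2 for a, b in zip(s, d)]
    return left, right


s, d = to_sum_diff([3, -2, 5], [1, 4, 5])
print(s, d, from_sum_diff(s, d))
```

For strongly correlated channels the difference signal stays close to zero, so most of the bits go to the sum signal; a mono source only needs the first path.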

Modular Software Design

Highly modularized pipeline design: the quantizer and entropy coder can be used for image/video compression as well; probes and data inputs can be inserted into any part of the program

Data-flow driven (with the necessary memory regulator <buffer>): no long delay; no need for large memory

Memory and computation efficient: working memory is preallocated


Experimental Results

EAC – Highly Efficient (NMR)

Results based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.

Codec         8 kbps   16 kbps   32 kbps   48 kbps
MP4 TwinVQ    7.48     7.00      5.71      4.48
WMA           8.47     5.56      3.25      0.40
EAC           6.69     5.68      2.80      -2.2

EAC – Lossless

Results based on the average of 16 MPEG-4 test clips.

Codec            Compression ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32

EAC (Versatile)

Real-time 2-way communication (low delay mode)

Storage devices (Pocket PC, Xbox)

Internet streaming

EAC (Low Delay Mode)

Reduced frame size

Timeslot = 1 frame

Fixed-length timeslot bitstream

Delay = 2 frames (ignoring encoding/decoding delay)

Network transmission time adds to this (over a modem line, delay = 3 frames)
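To turn a delay counted in frames into milliseconds, one needs the frame length and the sampling rate. The slides do not give the low-delay frame length, so the 512-sample frame below is purely hypothetical; only the 2-frame and 3-frame figures come from the slide.

```python
def delay_ms(frames, frame_samples, sample_rate_hz):
    """Delay in milliseconds for a given number of frames."""
    return 1000.0 * frames * frame_samples / sample_rate_hz


FRAME_SAMPLES = 512       # hypothetical frame length, not stated in the slides
SAMPLE_RATE_HZ = 44100

print(round(delay_ms(2, FRAME_SAMPLES, SAMPLE_RATE_HZ), 2))   # codec only
print(round(delay_ms(3, FRAME_SAMPLES, SAMPLE_RATE_HZ), 2))   # with modem line
```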

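EAC (Low Delay Mode)

[Figure: timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT, quantizer, entropy coder); the bitstream crosses the network; the decoder starts decoding frame i (entropy decoder, quantizer); the point "Playable here" is marked.]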

EAC – Flexible Bitstream Syntax

The parser may reassemble the bitstream at 1000x real time, changing:

bit rate

number of audio channels

audio sampling rate

EAC – Software

Encoder: 8x real time (stereo, 44.1 kHz sampling)

Decoder: 20x real time

Parser: 1000x real time

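EAC - Encoder

[Figure: audio enters the encoder, which outputs a stereo 128 kbps bitstream and a companion file.]

EAC - Parser

[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps at 11 kHz sampling.]

EAC - Decoder

[Figure: the decoder plays each parsed bitstream: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps at 11 kHz sampling.]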

Comparison

[Demo: audio clips for comparison: original, MP4 TwinVQ, WMA, EAC, MP3.]

Conclusions

An embedded audio coder is developed:

Highly efficient

Versatile: low delay, constant bitrate, streaming

Flexible bitstream: parsing for bit rate, number of audio channels and audio sampling rate

Good prototype available: real-time execution, small memory footprint


4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

4175

Psychoacoustic Masking

Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding

approach) according to the masking Encode the transform coefficients

Note Mask modifies the coding content

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

4275

Implicit Psychoacoustic Masking

Key Mask modifies the coding order the content is the same

Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process

4375

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

Signb1 b2 b3 b4 b5 b6 b7

001 -000000

First

w0

w1

w2

w3

w4

w5

w6

w7

Value

0

Range

-6363

-96 -127-64

-63630

-63630

-63630

-63630

-1271270

-1271270

Coefficient SignificantInsignificant

Mask

4475

Embedded Coding with Implicit Psychoacoustic Masking

01 -000000

1 +0000000

Signb1 b2 b3 b4 b5 b6 b7

0 10 1 +1 0 -0 00 00 00 00 00 0

First Second

w0

w1

w2

w3

w4

w5

w6

w7

Value

48

Range

3263

-96 -127-64

-31310

-31310

-31310

-31310

-63630

-63630

Coefficient SignificantInsignificant

4575

Context Modeling

Context Zero coding

Significant statuses of neighbor coefficients Refinement

Whether it is the 1st refinement pass Significant statuses of neighbor coefficients

Sign Neighbor signs

4675

After Implicit Psychoacoustic Masking amp Context Modeling

45 0 0 0-74 -13 0 0

21 0 4 014 0 23 23

0 0 0 03 0 4 0

0 3 5 00 0 0 0

0 1 -1 0-4 33 0 -1

0 0 1 00 0 0 0

-4 5 0 0-18 0 0 19

4 0 23 0-1 0 0 0

Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip

Automatically generated

To be encoded

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

43/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit planes (sign, b1 to b7) of coefficients w0 to w7 after the first pass, with each coefficient marked significant or insignificant and the mask indicated.]

Coefficient   Value   Range
w0            0       -63 to 63
w1            -96     -127 to -64
w2            0       -63 to 63
w3            0       -63 to 63
w4            0       -63 to 63
w5            0       -63 to 63
w6            0       -127 to 127
w7            0       -127 to 127

44/75

Embedded Coding with Implicit Psychoacoustic Masking

[Figure: bit planes (sign, b1 to b7) of coefficients w0 to w7 after the first and second passes, with each coefficient marked significant or insignificant.]

Coefficient   Value   Range
w0            48      32 to 63
w1            -96     -127 to -64
w2            0       -31 to 31
w3            0       -31 to 31
w4            0       -31 to 31
w5            0       -31 to 31
w6            0       -63 to 63
w7            0       -63 to 63
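The value and range figures on these slides follow from how many magnitude bit planes of a coefficient have been coded so far. A minimal sketch, assuming 7 magnitude bit planes (b1 to b7, as labeled on the slide) and a hypothetical helper name `range_after_planes`; the implicit masking rule itself is not modeled here, only its effect that a masked coefficient has had fewer planes coded:

```python
def range_after_planes(value, planes_coded, total_planes=7):
    """Interval (low, high) a decoder can infer for `value` once the top
    `planes_coded` magnitude bit planes have been received.
    Illustrative only: 7 planes and this function name are assumptions."""
    if not 0 <= planes_coded <= total_planes:
        raise ValueError("planes_coded must lie in 0..total_planes")
    shift = total_planes - planes_coded
    known = (abs(value) >> shift) << shift   # magnitude bits received so far
    if known == 0:
        # Still insignificant: the sign is unknown, so the interval is symmetric.
        bound = (1 << shift) - 1
        return (-bound, bound)
    low, high = known, known + (1 << shift) - 1
    return (low, high) if value >= 0 else (-high, -low)


print(range_after_planes(-96, 1))  # (-127, -64)
print(range_after_planes(48, 2))   # (32, 63)
print(range_after_planes(0, 2))    # (-31, 31)
print(range_after_planes(0, 1))    # (-63, 63)
```

Each additional plane halves the width of the interval, which is why a truncated embedded bitstream still decodes to a coarser version of every coefficient.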

45/75

Context Modeling

Context
  Zero coding: significance statuses of neighbor coefficients
  Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  Sign: neighbor signs
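To make the three context classes concrete, here is a small sketch of how a context index could be derived from neighbor state. The 1-D left/right neighborhood, the index numbering, and all function names are assumptions for illustration; the slides do not say which neighbors are used or how many contexts exist.

```python
def zero_coding_context(significant, i):
    """Context (0..3) for coding whether coefficient i becomes significant,
    built from the significance of its left and right neighbors."""
    left = i > 0 and significant[i - 1]
    right = i + 1 < len(significant) and significant[i + 1]
    return int(left) + 2 * int(right)


def refinement_context(significant, i, first_refinement):
    """Context (0..3) for a refinement bit: first-refinement flag plus
    whether any neighbor is already significant."""
    any_neighbor = zero_coding_context(significant, i) > 0
    return 2 * int(first_refinement) + int(any_neighbor)


def sign_context(signs, i):
    """Context (0..8) for a sign bit; signs[j] is +1, -1, or 0 (unknown)."""
    left = signs[i - 1] if i > 0 else 0
    right = signs[i + 1] if i + 1 < len(signs) else 0
    return (left + 1) * 3 + (right + 1)


significant = [True, False, False, True]
print(zero_coding_context(significant, 1))       # 1
print(refinement_context(significant, 0, True))  # 2
print(sign_context([1, 0, -1], 1))               # 6
```

Because the context depends only on already-coded information, the decoder can recompute it without any side information.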

46/75

After Implicit Psychoacoustic Masking & Context Modeling

 45    0    0    0  -74  -13    0    0
 21    0    4    0   14    0   23   23
  0    0    0    0    3    0    4    0
  0    3    5    0    0    0    0    0
  0    1   -1    0   -4   33    0   -1
  0    0    1    0    0    0    0    0
 -4    5    0    0  -18    0    0   19
  4    0   23    0   -1    0    0    0

Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...

[Figure labels: "Automatically generated", "To be encoded"]

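47/75

Arithmetic Coding - Illustration (QM Coder used)

What is arithmetic coding?

[Figure: the interval from 0 to 1 is subdivided once per coded symbol (S0=0, S1=1, S2=0) using the probabilities P0 and 1-P0, P1 and 1-P1, P2 and 1-P2, giving the final interval A.]

Coding result: 0100

(The shortest binary bitstream ensures that the interval from B = 0100 0000000 to C = 0100 1111111 lies inside A, i.e. (B, C) ⊂ A.)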
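The interval subdivision on this slide can be sketched with exact fractions. This is an idealized arithmetic coder for illustration, not the QM coder the slide names (the QM coder uses fixed-precision approximations); the function name, the choice of giving symbol 0 the lower sub-interval, and the example probabilities are assumptions.

```python
import math
from fractions import Fraction


def arithmetic_encode(symbols, p_zero):
    """Return the shortest bit string b such that every number whose binary
    expansion starts with 0.b lies inside the final coding interval.
    p_zero[i] is the probability that symbols[i] equals 0."""
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        p = Fraction(p)
        if not 0 < p < 1:
            raise ValueError("probabilities must be strictly between 0 and 1")
        if s == 0:
            width *= p            # symbol 0 keeps the lower part
        else:
            low += width * p      # symbol 1 keeps the upper part
            width *= 1 - p
    high = low + width
    n = 0
    while True:
        n += 1
        scale = 1 << n
        k = math.ceil(low * scale)            # smallest n-bit prefix >= low
        if Fraction(k + 1, scale) <= high:    # whole prefix interval fits
            return format(k, f"0{n}b")


half = Fraction(1, 2)
print(arithmetic_encode([0, 1, 0], [half, half, half]))  # 010
```

With skewed probabilities the likely symbols shrink the interval less, so the emitted prefix gets shorter than one bit per symbol on average.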

48/75

Entropy Coder (Summary)

[Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R).]

49/75

Speed Up Issues

Context modeling
  Use stored context
  Update context when a coefficient becomes significant

Implicit masking
  Fast calculation of energy in a critical band
  Lookup table to convert energy to mask

R-D curve calculation
  Lookup table calculation of distortion

Context entropy coder
  QM coder
  Run-length Rice coder
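The slide lists a run-length Rice coder alongside the QM coder. As background, a plain Rice code for one non-negative integer is sketched below; the run-length modeling and any parameter adaptation used in EAC are not described in the slides and are omitted. The unary convention (ones terminated by a zero) and the function names are assumptions.

```python
def rice_encode(n, k):
    """Rice code of non-negative integer n with parameter k:
    quotient n >> k in unary, then the k low-order bits of n."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    bits = "1" * (n >> k) + "0"
    if k:
        bits += format(n & ((1 << k) - 1), f"0{k}b")
    return bits


def rice_decode(bits, k):
    """Inverse of rice_encode for a single codeword."""
    q = bits.index("0")                        # length of the unary prefix
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r


print(rice_encode(9, 2))        # 11001
print(rice_decode("11001", 2))  # 9
```

A Rice code needs only shifts and masks, which is why it is attractive as a speed-up over a full arithmetic coder, at some cost in compression efficiency.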

50/75

Bitstream Assembly

Input: bitstream, R-D curve

Output: assembled bitstream, companion file

[Figure: bitstream assembling]

51/75

EAC Bitstream Syntax

Timeslot header
  Whether a certain channel exists (1 bit)
  Length of the channel bitstream (1-2 bytes)

[Figure: bitstream layout: EAC marker, global header, then a sequence of timeslots, each with a head and a body.]

52/75

Companion File

[Figure: companion file layout: global header, then a sequence of timeslots, each with a head and an R-D curve.]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (distortion D1 to D4 versus rate R1 to R4), shown twice, with the rates r1 to r4 marked.]
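One common way to split a byte budget across several embedded bitstreams is to keep extending whichever bitstream currently offers the steepest R-D slope. The sketch below illustrates that idea under stated assumptions: each R-D curve is a list of hypothetical `(rate_bytes, distortion)` truncation points with increasing rate, decreasing distortion, and convex shape. The slides do not confirm that EAC uses exactly this greedy procedure.

```python
def allocate(curves, budget):
    """Pick one truncation point per R-D curve so the total rate stays within
    `budget`, always spending the next bytes where distortion drops fastest.
    Returns the chosen point index for each curve."""
    chosen = [0] * len(curves)
    used = sum(curve[0][0] for curve in curves)
    while True:
        best, best_slope = None, 0.0
        for i, curve in enumerate(curves):
            j = chosen[i]
            if j + 1 == len(curve):
                continue                          # curve already fully sent
            d_rate = curve[j + 1][0] - curve[j][0]
            d_dist = curve[j][1] - curve[j + 1][1]
            if used + d_rate <= budget and d_dist / d_rate > best_slope:
                best, best_slope = i, d_dist / d_rate
        if best is None:
            return chosen
        j = chosen[best]
        used += curves[best][j + 1][0] - curves[best][j][0]
        chosen[best] += 1


curves = [
    [(0, 100), (10, 40), (20, 20)],   # slopes 6, then 2
    [(0, 50), (10, 20), (20, 15)],    # slopes 3, then 0.5
]
print(allocate(curves, 30))  # [2, 1]
```

At the end, all bitstreams are cut at roughly the same slope, which is the usual optimality condition for this kind of allocation.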

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]

55/75

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot:

$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{trans} \times \mathrm{Time}$

where
  $B_i$: allocated bytes for timeslot i
  $\mathrm{Buf}_i$: buffer occupancy level at timeslot i
  $\mathrm{Rate}_{trans}$: coding (network) rate per second
  $\mathrm{Time}$: time duration of the timeslot
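As a worked example of the relation B_i = Buf_(i-1) - Buf_i + Rate_trans x Time, the sketch below turns a planned buffer-occupancy trajectory into per-timeslot byte allocations. The function name and the sample numbers are illustrative, and the rate is assumed to be expressed in bytes per second.

```python
def allocated_bytes(buf, rate_trans, slot_time):
    """buf[i] is the buffer occupancy (bytes) at timeslot i, with buf[0] the
    initial level. Returns B_1..B_n, the bytes allocated to each timeslot."""
    inflow = rate_trans * slot_time   # bytes delivered during one timeslot
    return [buf[i - 1] - buf[i] + inflow for i in range(1, len(buf))]


# 2000 bytes/s network rate, 0.5 s timeslots, occupancy 4000 -> 3000 -> 3500
print(allocated_bytes([4000, 3000, 3500], 2000, 0.5))  # [2000.0, 500.0]
```

Letting the occupancy fall during a timeslot buys that timeslot more bytes than the network delivers; letting it rise saves bytes for later.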

56/75

Optimization

Given
  Initial buffer occupancy level
  Final buffer occupancy level (or intermediate level with a sliding window)
  Buffer occupancy constraint

Search for the number of bytes allocated to the current timeslot

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions. Labels: underflow (too many bytes), overflow (too few bytes), waste bytes.]
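The underflow and overflow labels translate into an upper and a lower bound on the bytes a timeslot may receive. The sketch below derives both bounds from B_i = Buf_(i-1) - Buf_i + Rate_trans x Time; the legal occupancy limits `buf_min` and `buf_max` and the function names are hypothetical. One plausible reading of "waste bytes" is padding sent when a timeslot cannot usefully fill its lower bound, but the slides do not spell this out.

```python
def legal_allocation_range(buf_prev, rate_trans, slot_time, buf_min, buf_max):
    """Range of B_i that keeps the next occupancy inside [buf_min, buf_max]."""
    inflow = rate_trans * slot_time
    b_min = max(0, buf_prev - buf_max + inflow)  # fewer bytes -> overflow
    b_max = buf_prev - buf_min + inflow          # more bytes -> underflow
    return b_min, b_max


def clamp_allocation(wanted, buf_prev, rate_trans, slot_time, buf_min, buf_max):
    """Clip a desired allocation (e.g. from an R-D slope search) to the legal range."""
    b_min, b_max = legal_allocation_range(
        buf_prev, rate_trans, slot_time, buf_min, buf_max)
    return min(max(wanted, b_min), b_max)


print(legal_allocation_range(3000, 2000, 0.5, 0, 3500))  # (500.0, 4000.0)
print(clamp_allocation(5000, 3000, 2000, 0.5, 0, 3500))  # 4000.0
```

A search over the R-D slope can then be restricted to allocations inside this range for each timeslot.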

58/75

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

59/75

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

60/75

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]

61/75

Modular Software Design

[Figure: the audio is split into L+R (or mono) and L-R; each path runs through MLT(SW), quantizer, and entropy coder, followed by bitstream assembly, which produces the bitstream.]

62/75

Modular Software Design

Highly modularized pipeline design
  Quantizer and entropy coder can be used for image/video compression as well
  Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>)
  No long delay
  No need for large memory

Memory and computation efficient
  Working memory preallocated

63/75

Experimental Results

64/75

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec        8kbps   16kbps   32kbps   48kbps
EAC          669     568      280      -22
WMA          847     556      325      040
MP4 TwinVQ   748     700      571      448

65/75

EAC - Lossless

Results based on the average of 16 MPEG4 test clips

Codec            Compression Ratio
WinZip           132
Monkey's Audio   272
EAC              272

66/75

EAC (Versatile)

Versatile
  Real time 2-way communication (low delay mode)
  Storage device (Pocket PC, Xbox)
  Internet streaming

67/75

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames
  (Ignoring encoding/decoding delay)
  Network transmission time (if modem line, delay = 3 frames)

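68/75

EAC (Low Delay Mode)

[Figure: low delay timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT, quantizer, entropy coder) and emits the bitstream over the network; the decoder starts decoding frame i (entropy decoder, quantizer); the point where the frame becomes playable is marked.]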

69/75

EAC - Flexible Bitstream Syntax

Flexible bitstream syntax
  Parser may reassemble the bitstream at 1000x real time
  Change
    bit rate
    number of audio channels
    audio sampling rate

70/75

EAC - Software

Software
  Encoder: 8x real time (stereo, 44.1 kHz sampling)
  Decoder: 20x real time
  Parser: 1000x real time

71/75

EAC - Encoder

[Figure: audio enters the encoder, which outputs a stereo 128 kbps bitstream and a companion file.]

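72/75

EAC - Parser

[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces application bitstreams such as:]
  Stereo 16 kbps
  Mono 8 kbps
  Stereo 16 kbps, slow start
  Mono 8 kbps, 11 kHz sampling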

73/75

EAC - Decoder

[Figure: the decoder plays back each parsed bitstream:]
  Stereo 16 kbps
  Mono 8 kbps
  Stereo 16 kbps, slow start
  Mono 8 kbps, 11 kHz sampling

74/75

Comparison

[Demo clips: Original, MP3, MP4 TwinVQ, WMA, EAC]

75/75

Conclusions

An embedded audio coder is developed
  Highly efficient
  Versatile
    Low delay, constant bitrate, streaming
  Flexible bitstream
    Parsing for bit rate, number of audio channels, audio sampling rate

Good prototype available
  Real time execution, small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

4775

Arithmetic Coding ndash Illustration (QM Coder used)

What is arithmetic coding

0

1

1-P0

P0

1-P1

P1 1-P2

P2

S0=0 S1=1 S2=0

0100

Coding result

(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)

AB

C

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

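52/75

Companion File

[Figure: companion file layout: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: four R-D curves (D1 against R1, D2 against R2, D3 against R3, D4 against R4), shown twice; in the second set the rates r1, r2, r3, r4 are marked on the curves.]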
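
One common way to choose the rates r1 to r4 within a single timeslot is to spend the byte budget on whichever curve currently offers the steepest R-D slope, so that all curves end up truncated at roughly the same slope. The sketch below implements that greedy rule on sampled curves. It is an assumption about how the assembling could work, not a transcription of the EAC code; the function name, the sampled-curve representation, the convexity assumption, and the numbers are all hypothetical.

```python
import heapq


def allocate(curves, budget):
    """Pick a truncation rate on each R-D curve under a total byte budget.

    curves[c] is a list of (rate, distortion) points with increasing rate and
    decreasing distortion, assumed convex (slopes get flatter as rate grows).
    Returns the chosen rate for each curve.
    """
    index = [0] * len(curves)
    used = sum(curve[0][0] for curve in curves)
    heap = []

    def push_next(c):
        i = index[c]
        if i + 1 < len(curves[c]):
            (r0, d0), (r1, d1) = curves[c][i], curves[c][i + 1]
            slope = (d0 - d1) / (r1 - r0)   # distortion removed per extra byte
            heapq.heappush(heap, (-slope, c))

    for c in range(len(curves)):
        push_next(c)
    while heap:
        _, c = heapq.heappop(heap)          # steepest remaining segment
        extra = curves[c][index[c] + 1][0] - curves[c][index[c]][0]
        if used + extra > budget:
            break                           # stop at the first segment that does not fit
        used += extra
        index[c] += 1
        push_next(c)
    return [curves[c][index[c]][0] for c in range(len(curves))]


# Four hypothetical R-D curves, one per channel bitstream.
curves = [
    [(0, 100), (10, 40), (20, 20), (30, 15)],
    [(0, 50), (10, 35), (20, 27)],
    [(0, 30), (10, 20)],
    [(0, 80), (10, 30), (20, 5)],
]
print(allocate(curves, 50))  # [20, 10, 0, 20]
```

Stopping at the first segment that does not fit keeps the selection on a single slope threshold; a variant could instead keep trying flatter segments to use up leftover bytes.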

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer-occupancy curve; horizontal axis: time (timeslots); vertical axis: buffer occupancy (bytes); two regions are marked "Illegal Region".]

55/75

Allocated Bytes Per Timeslot

Allocated bytes for a certain timeslot:

$$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{trans} \times \mathrm{Time}$$

where

$B_i$: allocated bytes for timeslot $i$

$\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$

$\mathrm{Rate}_{trans}$: coding (network) rate per second

$\mathrm{Time}$: time duration of the timeslot
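
As a quick numeric check of the allocation formula, the sketch below evaluates it for a short buffer trajectory. The function name, the choice of bytes per second as the rate unit, and all numbers are assumptions made for the example.

```python
def allocated_bytes(buf_levels, rate_trans, slot_time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time for i = 1 .. len(buf_levels) - 1."""
    fill = rate_trans * slot_time  # bytes delivered during one timeslot
    return [prev - cur + fill for prev, cur in zip(buf_levels, buf_levels[1:])]


# Hypothetical trajectory: buffer levels in bytes, 2000 bytes/s, 0.5 s timeslots.
print(allocated_bytes([4000, 3000, 3500], rate_trans=2000, slot_time=0.5))  # [2000.0, 500.0]
```

In the first timeslot the buffer level drops by 1000 bytes, so the timeslot may use those 1000 bytes plus the 1000 bytes that arrive while it plays; in the second the level rises by 500 bytes, leaving only 500 bytes for the timeslot.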

56/75

Optimization

Given: the initial buffer occupancy level; the final buffer occupancy level (or an intermediate level with a sliding window); the buffer occupancy constraint.

Search for: the allocated number of bytes for the current timeslot.

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) against time (timeslots), with two regions marked "Illegal Region". One is labelled "Underflow (too many bytes)", the other "Overflow (too few bytes)"; waste bytes are also marked.]
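
One plausible reading of the R-D slope search is a bisection on a single slope threshold shared by all timeslots: a higher threshold keeps fewer bytes per timeslot and so leaves the buffer fuller. The sketch below bisects for the smallest threshold whose final buffer level reaches the required level, then checks the trajectory against the legal range. It is an illustration under assumptions, not the EAC search itself: the function names, the sampled convex R-D curves, the buffer size of 40 bytes, and the other numbers are hypothetical, and the search assumes each curve has at least two points and that the final level is reachable.

```python
def rate_at_slope(curve, lam):
    """Largest rate on a convex R-D curve reached by segments with slope >= lam."""
    rate = curve[0][0]
    for (r0, d0), (r1, d1) in zip(curve, curve[1:]):
        if (d0 - d1) / (r1 - r0) < lam:
            break
        rate = r1
    return rate


def buffer_levels(curves, lam, buf_start, fill):
    """Buffer trajectory when every timeslot is cut at the same slope lam.

    fill is Rate_trans * Time, the bytes arriving during one timeslot.
    """
    levels = [buf_start]
    for curve in curves:
        levels.append(levels[-1] + fill - rate_at_slope(curve, lam))
    return levels


def search_slope(curves, buf_start, buf_final, fill, iterations=40):
    """Bisect for the smallest slope whose final buffer level is >= buf_final."""
    low = 0.0
    high = 1.0 + max((c[0][1] - c[1][1]) / (c[1][0] - c[0][0]) for c in curves)
    for _ in range(iterations):
        mid = (low + high) / 2
        if buffer_levels(curves, mid, buf_start, fill)[-1] >= buf_final:
            high = mid   # final level reached; try spending more bytes
        else:
            low = mid    # too many bytes spent; raise the threshold
    return high


# Two hypothetical timeslots, 15 bytes arriving per timeslot.
curves = [
    [(0, 100), (10, 40), (20, 20), (30, 15)],
    [(0, 50), (10, 35), (20, 27)],
]
lam = search_slope(curves, buf_start=20, buf_final=20, fill=15)
levels = buffer_levels(curves, lam, 20, 15)
legal = all(0 <= level <= 40 for level in levels)  # 40: hypothetical buffer size
print(levels, legal)  # [20, 15, 20] True
```

A level below zero would correspond to the underflow region (too many bytes allocated) and a level above the buffer size to the overflow region (too few bytes allocated).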

58/75

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) against time (timeslots), with two regions marked "Illegal Region".]

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve; buffer occupancy (bytes) against time (timeslots), with two regions marked "Illegal Region".]

60/75

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) against time (timeslots), with two regions marked "Illegal Region".]

61/75

Modular Software Design

[Figure: encoder block diagram. The audio is split into L+R (or mono) and L-R; each passes through its own chain of MLT (SW), quantizer, and entropy coder, and bitstream assembly produces the bitstream.]
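
The L+R and L-R signals in the diagram are a sum/difference representation of the stereo pair. A minimal sketch on integer samples is shown below. The slide gives no scaling factor, so the unscaled sum and difference are used here as an assumption, and the function names and sample values are hypothetical. With integer samples the pair is exactly invertible, because (L+R)+(L-R) = 2L is always even.

```python
def to_sum_diff(left, right):
    """Return the L+R and L-R signals for two equal-length integer sample lists."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def from_sum_diff(total, diff):
    """Recover L and R exactly; (L+R) + (L-R) = 2L, so the halving is exact."""
    left = [(s + d) // 2 for s, d in zip(total, diff)]
    right = [(s - d) // 2 for s, d in zip(total, diff)]
    return left, right


total, diff = to_sum_diff([3, -2, 7], [1, -2, -5])
print(total, diff)  # [4, -4, 2] [2, 0, 12]
```

When the two channels are similar, the L-R signal is close to zero and cheap to code, which is the usual motivation for this split.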

62/75

Modular Software Design

Highly modularized pipeline design: the quantizer and entropy coder can be used for image/video compression as well; probe and data input can be inserted into any part of the program.

Data-flow driven (with the necessary memory regulator <buffer>): no long delay; no need for large memory.

Memory and computation efficient: working memory is preallocated.

63/75

Experimental Results

64/75

EAC – Highly Efficient (NMR)

Results are based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec        8kbps   16kbps   32kbps   48kbps
MP4 TwinVQ   7.48    7.00     5.71     4.48
WMA          8.47    5.56     3.25     0.40
EAC          6.69    5.68     2.80     -2.2

65/75

EAC – Lossless

Results are based on the average of 16 MPEG4 test clips.

Codec            Compression Ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32

66/75

EAC (Versatile)

Versatile: real-time two-way communication (low delay mode); storage devices (Pocket PC, Xbox); Internet streaming.

67/75

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed-length timeslot bitstream

Delay = 2 frames (ignoring encoding/decoding delay), plus network transmission time (on a modem line, delay = 3 frames).

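68/75

EAC (Low Delay Mode)

[Figure: low-delay timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT, quantizer, entropy coder) and sends the bitstream over the network; the decoder starts decoding frame i (entropy, quantizer); the point where the frame becomes playable is marked "Playable here".]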

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax: the parser may reassemble the bitstream at 1000x real time, changing the bit rate, the number of audio channels, and the audio sampling rate.

70/75

EAC – Software

Software: encoder 8x real time (stereo, 44.1 kHz sampling); decoder 20x real time; parser 1000x real time.

71/75

EAC - Encoder

[Figure: the encoder takes the audio and produces a stereo 128 kbps bitstream and a companion file.]

72/75

EAC - Parser

[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]

73/75

EAC - Decoder

[Figure: the decoder takes the parsed bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]

74/75

Comparison

[Audio demo: Original, MP3, MP4 TwinVQ, WMA, EAC]

75/75

Conclusions

An embedded audio coder is developed:

Highly efficient

Versatile: low delay; constant bitrate; streaming

Flexible bitstream: parsing for bitrate, number of audio channels, and audio sampling rate

Good prototype available: real-time execution; small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

4875

Entropy Coder (Summary)

D

R

Bitstream

R-D curve

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

4975

Speed Up Issues

Context Modeling Use stored context Update context when a coefficient becomes significant

Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask

R-D curve calculation Lookup table calculation of distortion

Context entropy coder QM coder Run-length Rice coder

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

5075

Bitstream Assembly

Input Bitstream R-D curve

Output Assembled bitstream Companion file

Bitstream assembling

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

5175

EAC Bitstream Syntax

Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)

EA

C m

arke

rG

loba

l H

eade

r Timeslot

Head Body

Timeslot

Head Body

Timeslot

Head Body

5275

Companion FileG

loba

l H

eade

r Timeslot

Head R-D curve

Timeslot

Head R-D curve

Timeslot

Head R-D curve

5375

Rate-Distortion Optimized Assembling (Single Timeslot)

D1

R1

D2

R2

D3

R3

D4

R4

D1

R1

D2

R2

D3

R3

D4

R4

r1 r2

r3 r4

5475

Rate-Distortion Optimized Assembling (Multiple Timeslots)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

5575

Allocated Bytes Per Timeslots

Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time

Where Bi allocated bytes for timeslot i

Bufi buffer occupancy level at timeslot i

Ratetrans coding (network) rate per second Time time duration of the timeslot

5675

Optimization

Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate

level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the

current timeslot

5775

Search (R-D slope)B

uffe

r O

ccup

ancy

(B

ytes

)

Time (timeslots)

Illegal Region

Illegal Region

Underflow (too many bytes)

Overflow (too few bytes)

Wastebytes

5875

Multiple Timeslots ndash Constant Bitrate

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

5975

Multiple Timeslots ndash Internet Streaming (Slow Start)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

Buffer-Occupancy Curve

6075

Multiple Timeslots ndash Internet Streaming (Normal)

Buf

fer

Occ

upan

cy (

Byt

es)

Time (timeslots)

Illegal Region

Illegal Region

6175

Modular Software Design

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

MLT(SW) Quantizer Entropy coder

BitstreamAssembly

Audio

Bitstream

L+R(or mono)

L-R

6275

Modular Software Design

Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo

compression as well Probe and data input can be inserted into any part of the

program

Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory

Memory and computation efficient Working memory preallocated

6375

Experimental Results

6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC ndash Highly Efficient (NMR)
  • EAC ndash Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC ndash Flexible Bitstream Syntax
  • EAC ndash Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

52/75

Companion File

[Figure: companion file layout. A Global Header followed by repeated timeslot entries, each labeled "Timeslot Head" and "R-D curve".]

53/75

Rate-Distortion Optimized Assembling (Single Timeslot)

[Figure: rate-distortion curves with points labeled D1/R1, D2/R2, D3/R3, D4/R4 (shown twice) and the quantities r1, r2, r3, r4.]
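The slide does not spell out the assembling procedure. One common way to read a figure like this is equal-slope allocation: every embedded bitstream is truncated at the point where its R-D slope falls below a shared threshold, and the threshold is lowered until the byte budget is used up. The sketch below assumes convex R-D curves given as lists of (rate, distortion) points; the function names, the curves, and the budget are hypothetical and not taken from the slides.

```python
def truncate_at_slope(curve, slope):
    """Index of the last point whose incremental slope dD/dR is at least `slope`.

    curve: list of (rate_bytes, distortion) points with increasing rate and
    decreasing distortion, assumed convex. Point 0 means "send nothing more".
    """
    chosen = 0
    for k in range(1, len(curve)):
        d_rate = curve[k][0] - curve[k - 1][0]
        d_dist = curve[k - 1][1] - curve[k][1]
        if d_dist / d_rate >= slope:
            chosen = k
        else:
            break
    return chosen


def assemble(curves, budget):
    """Pick one truncation point per curve so the total rate fits the budget."""
    # Every segment slope is a candidate threshold; try them from steep to flat.
    slopes = sorted(
        {
            (c[k - 1][1] - c[k][1]) / (c[k][0] - c[k - 1][0])
            for c in curves
            for k in range(1, len(c))
        },
        reverse=True,
    )
    best = [0] * len(curves)  # assumes the zero-rate choice always fits
    for s in slopes:
        picks = [truncate_at_slope(c, s) for c in curves]
        total = sum(c[p][0] for c, p in zip(curves, picks))
        if total > budget:
            break
        best = picks
    return best


# Hypothetical R-D curves for two embedded bitstreams.
curve_a = [(0, 100), (10, 60), (20, 40), (30, 35)]
curve_b = [(0, 80), (10, 50), (20, 42), (30, 40)]
print(assemble([curve_a, curve_b], budget=40))
```

Because each bitstream is embedded, truncating it at the chosen point only requires copying a prefix, which is consistent with the parser being described as fast.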

54/75

Rate-Distortion Optimized Assembling (Multiple Timeslots)

[Figure: buffer occupancy (bytes) versus time (timeslots). Labels: Buffer-Occupancy Curve, Illegal Region (two areas).]

55/75

Allocated Bytes Per Timeslot

Allocated bytes for a given timeslot:

$B_i = Buf_{i-1} - Buf_i + Rate_{trans} \cdot Time$

where
  $B_i$: allocated bytes for timeslot i
  $Buf_i$: buffer occupancy level at timeslot i
  $Rate_{trans}$: coding (network) rate per second
  $Time$: time duration of the timeslot
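As a small worked illustration of this formula, the sketch below computes the allocation for each timeslot from a planned sequence of buffer occupancy levels. The function name, the sample numbers, and the choice of bytes per second as the rate unit are assumptions made for the example.

```python
def allocated_bytes(buf_levels, rate_trans, slot_time):
    """Return B_i = Buf_{i-1} - Buf_i + Rate_trans * Time for i = 1..n.

    buf_levels: planned buffer occupancy in bytes; buf_levels[0] is the
                initial level.
    rate_trans: coding (network) rate, assumed here to be bytes per second.
    slot_time:  duration of one timeslot in seconds.
    """
    return [
        buf_levels[i - 1] - buf_levels[i] + rate_trans * slot_time
        for i in range(1, len(buf_levels))
    ]


# Hypothetical plan: 2000 bytes/s channel, 1-second timeslots.
plan = [4000, 4500, 3000, 3000]
print(allocated_bytes(plan, rate_trans=2000, slot_time=1.0))
```

Letting the buffer level rise leaves fewer bytes for the timeslot, and letting it fall frees up more than the channel delivers in that slot.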

56/75

Optimization

Given
  Initial buffer occupancy level
  Final buffer occupancy level (or intermediate level with a sliding window)
  Buffer occupancy constraint
Search for the allocated number of bytes for the current timeslot

57/75

Search (R-D slope)

[Figure: buffer occupancy (bytes) versus time (timeslots). Labels: Illegal Region (two areas), Underflow (too many bytes), Overflow (too few bytes), Waste bytes.]
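The underflow and overflow labels follow from rearranging the allocation formula given earlier: Buf_i = Buf_{i-1} + Rate_trans * Time - B_i, so allocating too many bytes drains the buffer and allocating too few lets it grow. The sketch below tracks the occupancy for a candidate allocation and reports the first violation. The bounds `buf_min` and `buf_max` stand in for the edges of the illegal regions and, like the function name and numbers, are assumptions.

```python
def check_allocation(buf_start, allocations, rate_trans, slot_time,
                     buf_min, buf_max):
    """Return (timeslot, kind) for the first violation, or None if legal."""
    buf = buf_start
    for i, b in enumerate(allocations, start=1):
        # Rearranged allocation formula: network fills, timeslot i drains.
        buf = buf + rate_trans * slot_time - b
        if buf < buf_min:
            return (i, "underflow")  # too many bytes allocated
        if buf > buf_max:
            return (i, "overflow")   # too few bytes allocated
    return None


# Hypothetical 2000 bytes/s channel, 1-second timeslots, 0..8000 byte buffer.
print(check_allocation(4000, [1500, 3500, 2000], 2000, 1, 0, 8000))
print(check_allocation(1000, [4000], 2000, 1, 0, 8000))
print(check_allocation(7000, [500, 500], 2000, 1, 0, 8000))
```

A search over the R-D slope can call a check like this for each candidate slope and keep only the slopes whose allocations stay legal.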

58/75

Multiple Timeslots – Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots). Labels: Illegal Region (two areas).]

59/75

Multiple Timeslots – Internet Streaming (Slow Start)

[Figure: buffer occupancy (bytes) versus time (timeslots). Labels: Buffer-Occupancy Curve, Illegal Region (two areas).]

60/75

Multiple Timeslots – Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots). Labels: Illegal Region (two areas).]

61/75

Modular Software Design

[Figure: encoder block diagram. Input: Audio. Two chains, labeled L+R (or mono) and L-R, each with the blocks MLT (SW), Quantizer, Entropy coder, Bitstream Assembly. Output: Bitstream.]
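The diagram labels the two processing chains L+R and L-R but gives no detail on scaling or rounding. As a minimal sketch, assuming integer PCM samples and no scaling, the sum/difference split and its exact inverse could look like this (function names are hypothetical):

```python
def to_sum_diff(left, right):
    """Split stereo samples into sum (L+R) and difference (L-R) channels."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def from_sum_diff(total, diff):
    """Recover left and right; (L+R)+(L-R) = 2L and (L+R)-(L-R) = 2R."""
    left = [(s + d) // 2 for s, d in zip(total, diff)]
    right = [(s - d) // 2 for s, d in zip(total, diff)]
    return left, right


left = [100, -20, 7]
right = [98, -25, 0]
total, diff = to_sum_diff(left, right)
print(total, diff)
print(from_sum_diff(total, diff) == (left, right))
```

For integer input the sums 2L and 2R are always even, so the integer division loses nothing and the round trip is exact.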

62/75

Modular Software Design

Highly modularized pipeline design
  Quantizer and entropy coder can be used for image/video compression as well
  Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator [buffer])
  No long delay
  No need for large memory

Memory and computation efficient
  Working memory preallocated

63/75

Experimental Results

64/75

EAC – Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clips
The smaller the NMR, the better

Codec         8 kbps   16 kbps   32 kbps   48 kbps
EAC           6.69     5.68      2.80      -2.2
WMA           8.47     5.56      3.25      0.40
MP4 TwinVQ    7.48     7.00      5.71      4.48

65/75

EAC – Lossless

Results based on the average of 16 MPEG4 test clips

Codec             Compression Ratio
EAC               2.72
Monkey's Audio    2.72
WinZip            1.32

66/75

EAC (Versatile)

Versatile
  Real-time 2-way communication (low delay mode)
  Storage device (Pocket PC, Xbox)
  Internet streaming

67/75

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames (ignoring encoding/decoding delay)
  Network transmission time (if modem line, delay = 3 frames)
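To put the frame counts in time units, the short sketch below converts a delay expressed in frames into milliseconds. The frame size of 441 samples and the 44.1 kHz sampling rate are hypothetical values chosen to keep the arithmetic simple; the slides do not state the low delay frame size.

```python
def delay_ms(frame_samples, sample_rate_hz, frames_of_delay=2):
    """Delay in milliseconds for a given number of frames of latency."""
    return 1000.0 * frames_of_delay * frame_samples / sample_rate_hz


# Hypothetical frame of 441 samples at 44.1 kHz (10 ms per frame).
print(delay_ms(441, 44100))                      # 2-frame delay
print(delay_ms(441, 44100, frames_of_delay=3))   # modem line, 3-frame delay
```

Since the delay scales linearly with the frame length, reducing the frame size is the direct lever for lowering latency.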

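68/75

EAC (Low Delay Mode)

[Figure: low delay timeline. Encoder side: frames i-1, i, i+1; "Start Encoding Frame i" through MLT, Quantizer, Entropy to the Bitstream. The bitstream crosses the network. Decoder side: "Start Decoding Frame i" through Entropy, Quantizer; a point marked "Playable here".]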

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax
  Parser may reassemble the bitstream at 1000x real time
  Change
    bit rate
    number of audio channels
    audio sampling rate

70/75

EAC – Software

Software
  Encoder: 8x realtime (stereo, 44.1 kHz sampling)
  Decoder: 20x realtime
  Parser: 1000x realtime

71/75

EAC - Encoder

[Figure: Audio goes into the Encoder. Outputs: stereo 128 kbps bitstream and Companion file.]

72/75

EAC - Parser

[Figure: Parser on the server. Inputs: stereo 128 kbps bitstream and Companion file. Outputs: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling.]

73/75

EAC - Decoder

[Figure: Decoder. Inputs: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling.]

74/75

Comparison

Original, MP4 TwinVQ, WMA, EAC, MP3

75/75

Conclusions

An embedded audio coder is developed
  Highly efficient
  Versatile
    Low delay, constant bitrate, streaming
  Flexible bitstream
    Parsing for bitrate, number of audio channels, audio sampling rate
  Good prototype available
    Realtime execution, small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking & Context Modeling
  • Arithmetic Coding - Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots - Constant Bitrate
  • Multiple Timeslots - Internet Streaming (Slow Start)
  • Multiple Timeslots - Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC - Highly Efficient (NMR)
  • EAC - Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC - Flexible Bitstream Syntax
  • EAC - Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

Multiple Timeslots - Constant Bitrate

[Figure: buffer occupancy (bytes) versus time (timeslots), with two areas marked "Illegal Region"]
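One plausible reading of the buffer-occupancy plot is sketched below. It assumes a simple model that the slides do not spell out: each timeslot adds its allocated bytes to a buffer, the channel drains a constant number of bytes per timeslot, and the two illegal regions correspond to the buffer going below zero or above its capacity. All names and numbers are hypothetical.

```python
from typing import List


def occupancy_curve(alloc: List[int], rate: int, start: int = 0) -> List[int]:
    """Buffer level after each timeslot: add allocated bytes, drain `rate`."""
    curve = []
    level = start
    for allocated in alloc:
        level += allocated - rate
        curve.append(level)
    return curve


def is_legal(curve: List[int], capacity: int) -> bool:
    """True if the curve never leaves the band [0, capacity]."""
    return all(0 <= level <= capacity for level in curve)


# Constant channel rate of 100 bytes per timeslot, buffer capacity of 50 bytes.
smooth = occupancy_curve([120, 100, 80, 100], rate=100)
bursty = occupancy_curve([200, 100], rate=100)
```

In this model, an allocation is acceptable only if `is_legal` holds, so the assembler may give more bytes to some timeslots as long as the running surplus stays inside the buffer.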

Multiple Timeslots - Internet Streaming (Slow Start)

[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), with two areas marked "Illegal Region"]

Multiple Timeslots - Internet Streaming (Normal)

[Figure: buffer occupancy (bytes) versus time (timeslots), with two areas marked "Illegal Region"]

Modular Software Design

[Diagram: Audio is split into L+R (or mono) and L-R. Each signal passes through its own chain of MLT(SW) -> Quantizer -> Entropy coder, followed by Bitstream Assembly, which outputs the Bitstream.]
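The L+R and L-R signals in the diagram can be illustrated with a sum/difference transform of the two stereo channels. This is a floating-point sketch only; the scaling by 1/2 and the function names are assumptions, and the deck's lossless (integer) pass would need a reversible integer variant instead.

```python
from typing import List, Tuple


def to_sum_diff(left: List[float], right: List[float]) -> Tuple[List[float], List[float]]:
    """Map stereo channels to a sum channel (L+R)/2 and a difference channel (L-R)/2."""
    total = [(l + r) / 2 for l, r in zip(left, right)]
    diff = [(l - r) / 2 for l, r in zip(left, right)]
    return total, diff


def from_sum_diff(total: List[float], diff: List[float]) -> Tuple[List[float], List[float]]:
    """Invert to_sum_diff: L = sum + diff, R = sum - diff."""
    left = [s + d for s, d in zip(total, diff)]
    right = [s - d for s, d in zip(total, diff)]
    return left, right


left_in = [1.0, 0.5, -2.0]
right_in = [3.0, 0.5, 4.0]
total, diff = to_sum_diff(left_in, right_in)
left_out, right_out = from_sum_diff(total, diff)
```

A mono bitstream can then be formed from the sum channel alone, which matches the "(or mono)" label on the L+R path.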

Modular Software Design

Highly modularized pipeline design
  • Quantizer and entropy coder can be used for image/video compression as well
  • Probe and data input can be inserted into any part of the program

Data flow driven (with necessary memory regulator <buffer>)
  • No long delay
  • No need for large memory

Memory and computation efficient
  • Working memory preallocated
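A data-flow-driven pipeline with an insertable probe can be sketched with Python generators. This is an illustration of the design idea only, not the EAC code: the stage functions below are trivial stand-ins (a scaling step in place of the transform, rounding in place of the quantizer), and all names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]
Stage = Callable[[Iterator[Frame]], Iterator[Frame]]


def stage(fn: Callable[[Frame], Frame]) -> Stage:
    """Wrap a per-frame function as a streaming stage."""
    def run(frames: Iterator[Frame]) -> Iterator[Frame]:
        for frame in frames:
            yield fn(frame)
    return run


def probe(log: List[Frame]) -> Stage:
    """Pass frames through unchanged while recording a copy of each."""
    def run(frames: Iterator[Frame]) -> Iterator[Frame]:
        for frame in frames:
            log.append(list(frame))
            yield frame
    return run


def pipeline(source: Iterable[Frame], *stages: Stage) -> Iterator[Frame]:
    """Chain stages; frames flow through one at a time, on demand."""
    stream = iter(source)
    for s in stages:
        stream = s(stream)
    return stream


seen: List[Frame] = []
scale = stage(lambda f: [2 * x for x in f])           # stand-in for a transform
quantize = stage(lambda f: [float(round(x)) for x in f])  # stand-in for a quantizer
result = list(pipeline([[0.26, 1.74], [2.5, -0.6]], scale, probe(seen), quantize))
```

Each frame is pulled through all stages before the next one is read, so only one frame per stage is held at a time, which mirrors the "no long delay, no need for large memory" points above.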

Experimental Results

EAC - Highly Efficient (NMR)

Results based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.

Codec         8kbps   16kbps   32kbps   48kbps
MP4 TwinVQ    748     700      571      448
WMA           847     556      325      040
EAC           669     568      280      -22

EAC - Lossless

Results based on the average of 16 MPEG-4 test clips.

Codec            Compression Ratio
EAC              272
Monkey's Audio   272
WinZip           132







6475

EAC ndash Highly Efficient (NMR)

Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better

669568280-22EAC

847556325040WMA

748700571448MP4TwinVQ

8kbps16kbps32kbps48kbpsCodec

6575

EAC ndash Lossless

Results based on the average of 16 MPEG4 test clips

132WinZip

272Monkeyrsquos Audio

272EAC

Compression RatioCodec

6675

EAC (Versatile)

Versatile Real time 2-way communication (Low delay

mode) Storage device (Pocket PC Xbox) Internet streaming

6775

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line

delay = 3 frames )

6875

EAC (Low Delay Mode)

Encoder

Frame = i-1 i i+1Start Encoding Frame i

MLT Quantizer Entropy

Bitstream

Start Decoding Frame iEntropy Quantizer

network

Playable here

6975

EAC ndash Flexible Bitstream Syntax

Flexible bitstream syntax Parser may reassemble the bitstream 1000x real

time Change

bit rate of audio channels audio sampling rate

7075

EAC ndash Software

Software Encoder 8x realtime (Stereo 441kHz

sampling) Decoder 20x realtime Parser 1000x realtime

7175

EAC - Encoder

Audio

EncoderStereo128kbps

Companion file

7275

EAC - Parser

Parser

Companion file

Stereo128kbps

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

Server

7375

EAC - Decoder

Decoder

Stereo 16kbps

Mono 8kbps

Stereo 16kbps Slow start

Mono 8kbps 11kHz sampling

7475

Comparison

Original MP4 TwinVQ WMA EAC

MP3

7575

Conclusions

An embedded audio coder is developed Highly efficient Versatile

Low delay constant bitrate streaming Flexible bitstream

Parsing for bitrate of audio channels audio sampling rate

Good prototype available realtime execution small memory footprint

  • Embedded Audio Coder
  • Outline
  • Introduction
  • Introduction ndash Audio Compression
  • EAC vs Other Compression
  • Media vs File Compression
  • Key Features of EAC
  • EAC Encoder
  • Parser
  • EAC Decoder
  • Embedded Audio Coder - Algorithm Description
  • Frame Work - Encoder
  • Audio Transform
  • Lossy vs Lossless Mode
  • Lossy (Float) Pass
  • MLT - Modulated Lapped Transforms
  • MLT with Window Switching
  • Band Separation
  • Synthesis (Half Sampling)
  • Synthesis (Quarter Sampling)
  • Quantizer
  • Quantizer
  • Lossless (Integer) Pass
  • Key to Achieve Lossless
  • MLT Framework
  • Window Operation
  • Pre-Rotation
  • FFT (4 Point Complex)
  • Post-Rotation
  • Reversible MLT
  • Reversible Unit Transform
  • Entropy Coder
  • Frame Grouping
  • Slide 34
  • Slide 35
  • A block of coefficients
  • Bits of Coefficients
  • Conventional Coding
  • Embedded Coding
  • Audio Masking
  • Psychoacoustic Masking
  • Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Embedded Coding with Implicit Psychoacoustic Masking
  • Context Modeling
  • After Implicit Psychoacoustic Masking amp Context Modeling
  • Arithmetic Coding ndash Illustration (QM Coder used)
  • Entropy Coder (Summary)
  • Speed Up Issues
  • Bitstream Assembly
  • EAC Bitstream Syntax
  • Companion File
  • Rate-Distortion Optimized Assembling (Single Timeslot)
  • Rate-Distortion Optimized Assembling (Multiple Timeslots)
  • Allocated Bytes Per Timeslots
  • Optimization
  • Search (R-D slope)
  • Multiple Timeslots ndash Constant Bitrate
  • Multiple Timeslots ndash Internet Streaming (Slow Start)
  • Multiple Timeslots ndash Internet Streaming (Normal)
  • Modular Software Design
  • Slide 62
  • Experimental Results
  • EAC – Highly Efficient (NMR)
  • EAC – Lossless
  • EAC (Versatile)
  • EAC (Low Delay Mode)
  • Slide 68
  • EAC – Flexible Bitstream Syntax
  • EAC – Software
  • EAC - Encoder
  • EAC - Parser
  • EAC - Decoder
  • Comparison
  • Conclusions

65/75

EAC – Lossless

Results based on the average of 16 MPEG4 test clips

Codec            Compression Ratio
WinZip           1.32
Monkey's Audio   2.72
EAC              2.72

66/75

EAC (Versatile)

Versatile

Real-time 2-way communication (low delay mode)

Storage device (Pocket PC, Xbox)

Internet streaming

67/75

EAC (Low Delay Mode)

Reducing frame size

Timeslot = 1 frame

Fixed length timeslot bitstream

Delay = 2 frames (ignoring encoding/decoding delay), plus network transmission time (over a modem line, delay = 3 frames)
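As a rough illustration of the frame-count delay figures on this slide, the sketch below converts a delay counted in frames into milliseconds. The frame length of 512 samples and the 44.1 kHz sampling rate are illustrative assumptions, not values given for the low delay mode; only the 2-frame and 3-frame counts come from the slide, and the helper name `delay_ms` is hypothetical.

```python
def delay_ms(frames: int, frame_samples: int, sample_rate_hz: float) -> float:
    """Convert a delay expressed in frames into milliseconds."""
    if frame_samples <= 0 or sample_rate_hz <= 0:
        raise ValueError("frame_samples and sample_rate_hz must be positive")
    return 1000.0 * frames * frame_samples / sample_rate_hz


# Assumed (hypothetical) parameters: the slide does not state the frame size.
FRAME_SAMPLES = 512
SAMPLE_RATE_HZ = 44100.0

# 2 frames: codec delay, ignoring encoding/decoding time.
codec_delay = delay_ms(2, FRAME_SAMPLES, SAMPLE_RATE_HZ)
# 3 frames: the figure quoted for a modem line.
modem_delay = delay_ms(3, FRAME_SAMPLES, SAMPLE_RATE_HZ)

print(f"codec: {codec_delay:.1f} ms, modem line: {modem_delay:.1f} ms")
```

Because the delay scales linearly with the frame length, halving the assumed frame size halves both figures, which is the motivation for "reducing frame size" in this mode.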

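68/75

EAC (Low Delay Mode)

[Timing diagram. Frames i-1, i, i+1. Encoder: start encoding frame i (MLT, Quantizer, Entropy), producing the bitstream. The bitstream crosses the network. Decoder side: start decoding frame i (Entropy, Quantizer). A marker labeled "Playable here" shows when the decoded audio can be played.]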

69/75

EAC – Flexible Bitstream Syntax

Flexible bitstream syntax

Parser may reassemble the bitstream at 1000x real time

Change: bit rate, number of audio channels, audio sampling rate

70/75

EAC – Software

Software

Encoder: 8x real time (stereo, 44.1 kHz sampling)

Decoder: 20x real time

Parser: 1000x real time

71/75

EAC - Encoder

[Diagram: the audio input goes into the Encoder, which outputs a stereo 128 kbps bitstream and a companion file.]

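72/75

EAC - Parser

[Diagram: on the server, the Parser takes the stereo 128 kbps bitstream and the companion file and produces the following bitstreams.]

Stereo, 16 kbps

Mono, 8 kbps

Stereo, 16 kbps, slow start

Mono, 8 kbps, 11 kHz sampling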
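The parser step above can be sketched in a few lines, under strong simplifying assumptions: each timeslot of the master bitstream is treated as an embedded byte string whose prefix is still decodable, and every timeslot gets the same byte budget. The function names, the one-second timeslot, and the flat budget are hypothetical; they are not the EAC bitstream syntax, and the sketch ignores the header, the companion file, and any rate-distortion optimized allocation.

```python
from typing import List


def bytes_per_timeslot(bitrate_bps: int, timeslot_seconds: float) -> int:
    """Byte budget for one timeslot at a constant target bitrate."""
    return int(bitrate_bps * timeslot_seconds / 8)


def parse_to_bitrate(master: List[bytes], bitrate_bps: int,
                     timeslot_seconds: float) -> List[bytes]:
    """Keep only a prefix of each timeslot's embedded bitstream.

    No re-encoding takes place: the output is a subset of the input bytes,
    which is why this kind of parsing can be very fast.
    """
    budget = bytes_per_timeslot(bitrate_bps, timeslot_seconds)
    return [slot[:budget] for slot in master]


# Toy master bitstream: 3 timeslots of 1 second each at 128 kbps (16000 bytes).
master = [bytes(16000) for _ in range(3)]

# Derive a 16 kbps application bitstream by truncating every timeslot.
app_16k = parse_to_bitrate(master, 16_000, 1.0)
print([len(slot) for slot in app_16k])
```

A slow-start variant would use a per-timeslot budget that changes over time instead of a flat one; that is left out here to keep the sketch short.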

73/75

EAC - Decoder

[Diagram: the Decoder plays each of the following bitstreams.]

Stereo, 16 kbps

Mono, 8 kbps

Stereo, 16 kbps, slow start

Mono, 8 kbps, 11 kHz sampling

74/75

Comparison

Original, MP4 TwinVQ, WMA, EAC, MP3

75/75

Conclusions

An embedded audio coder is developed

Highly efficient

Versatile: low delay, constant bitrate, streaming

Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate

Good prototype available: real-time execution, small memory footprint
