
Voice Over Packet Networks
Getting the most from your voice codec

Philippe Gournay
VoiceAge Corp.
750 Lucerne Road, Suite 250
Montreal (Quebec) H3R 2H6 Canada

Phone: +1.514.737.4940
Fax: +1.514.908.2037

www.voipdeveloper.com
August 8-10, 2006
Santa Clara, California
Hyatt Regency Santa Clara

Choosing a codec for VoIP

• VoIP over a converged voice and data network reduces service delivery costs and enables innovative value-added applications (e.g., multimedia conferencing)

• Delivering acceptable QoS and PSTN-grade voice quality is challenging in some real-world networks (especially over wireless links)

• Being able to analyze your network for acceptable QoS / QoE is essential, but choosing the appropriate speech codec and processing technologies also helps ensure that your network consistently performs within your specification



Different Voice Coding Technologies, Different Compromises

• Three factors are decisive when choosing a codec for VoIP:
– Bit rate, delay and robustness to packet losses
– Quality in clear channel and complexity are also important factors
– One can trade off between these factors (e.g., increase bit rate, delay or complexity to increase robustness)

• Sample-based codecs
– PCM (G.711), ADPCM
– Higher quality, lower delay and complexity
– Higher bit rate (64 kbit/s for PCM)

• Frame-based (CELP) codecs
– G.729, GSM codecs
– Lower bit rate (typically 6-24 kbit/s)
– Higher delay and complexity
– Quality highly dependent on bit rate and input conditions



There’s more than just PCM or G.729

Although this is a very successful family, the list of standard-based voice codecs is not limited to ACELP…



Wideband Speech Communication

• Substantially increases captured speech information
– Delivers double the audio signal bandwidth

• Enables digital end-to-end packet-based services to deliver much better speech communication quality than traditional PSTN circuit-switched telephony
– VoIP quality differentiator



VoIP Presents Very Specific Challenges

• Variable transmission time
– Jitter
– Delay spikes
– Clock drift between sender and receiver

• Transmission impairments
– Out-of-sequence packets
– Late packets
– Lost packets

The receiver in general, and the codec in particular, must be prepared to handle those impairments



Various Delay and Error Profiles

1. Low-amplitude, static jitter

2. High-amplitude, static jitter

3. Low/high amplitude changing jitter

4. Varying jitter situation with high packet loss rate

5. Moderate basic static jitter characteristics with occasional moderate delay spikes and significant packet loss rate

6. Moderate basic static jitter characteristics with frequent and severe delay spikes

Proposed by 3GPP for jitter buffer characterization



Jitter Buffering

• Voice communication over packet networks (VoIP) is characterized by a variable transmission time (jitter)
• VoIP receivers generally use a jitter buffer to control the effect of the jitter
• The jitter buffer works by introducing an additional playout delay
• The playout delay is chosen to minimize the number of late packets while keeping the total end-to-end delay within acceptable limits
• Packets that arrive before their playout time are temporarily stored in a reception buffer
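
The trade-off between late packets and playout delay can be illustrated with a small sketch. It is not taken from the talk: the 20 ms packet interval, the function name `late_packets` and the arrival times are assumptions chosen for illustration. Playout of packet i is scheduled one packet interval after packet i-1, starting `playout_delay_ms` after the first packet arrives.

```python
def late_packets(arrival_ms, playout_delay_ms, frame_ms=20):
    """Return the indices of packets that arrive after their playout time.

    arrival_ms[i] is the arrival time of packet i (None if it never arrives).
    Playout starts playout_delay_ms after the first packet arrives, then
    proceeds at one packet every frame_ms (assumed packet interval).
    """
    start = arrival_ms[0] + playout_delay_ms
    late = []
    for i, arrival in enumerate(arrival_ms):
        if arrival is None:
            continue  # lost packet, not a late one
        if arrival > start + i * frame_ms:
            late.append(i)
    return late


# Hypothetical arrival times: packets sent every 20 ms, packet 3 hit by a delay spike.
arrivals = [40, 65, 82, 150, 121, 160]
print(late_packets(arrivals, playout_delay_ms=20))  # [3]
print(late_packets(arrivals, playout_delay_ms=60))  # []
```

With the short playout delay, packet 3 misses its slot; the longer delay absorbs the spike at the cost of 40 ms of extra end-to-end delay.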



Jitter Buffering Strategies

• Fixed jitter buffer
– The playout delay is chosen at the beginning of the conversation

• Variable jitter buffer
– Under the basic talk-spurt based strategy, the playout delay is changed at the beginning of each silence period
– For quickly varying networks, better results are obtained when the playout delay is also adapted during active speech



Adaptive Jitter Buffering

• Requires a means for deciding when and how to change the playout delay
– Decision based either on a prediction of the jitter, or on a histogram of its past values
– Jitter spikes, clock drift, and lost & out-of-order packets must be handled properly

• Also requires a means of time scaling of speech
– Playing out a longer frame increases the playout delay, while playing out a shorter frame decreases it
– Can be done either outside the decoder in the PCM domain, or inside the decoder
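
A minimal sketch of the histogram-based decision, under stated assumptions: the class `HistogramPlayoutEstimator`, its window size, the target late-packet rate and the 5 ms limit on frame stretching are all hypothetical choices, not values from the talk. The estimator keeps recent network delays and picks the smallest playout delay that would have left at most the target fraction of those packets late. `next_frame_duration` only computes how long the next frame should last; the time scaling of the speech signal itself is not shown.

```python
import math
from collections import deque


class HistogramPlayoutEstimator:
    """Choose a playout delay from the distribution of recent packet delays."""

    def __init__(self, window=200, target_late_rate=0.01):
        self.delays = deque(maxlen=window)  # most recent delays only
        self.target_late_rate = target_late_rate

    def observe(self, delay_ms):
        self.delays.append(delay_ms)

    def playout_delay(self):
        if not self.delays:
            raise ValueError("no delay observed yet")
        ordered = sorted(self.delays)
        # Number of recent packets we accept to see arrive late.
        allowed_late = math.floor(self.target_late_rate * len(ordered))
        return ordered[len(ordered) - 1 - allowed_late]


def next_frame_duration(current_ms, target_ms, frame_ms=20, max_change_ms=5):
    """Duration of the next played frame so the playout delay moves toward target_ms.

    A longer frame increases the playout delay, a shorter one decreases it.
    """
    change = max(-max_change_ms, min(max_change_ms, target_ms - current_ms))
    return frame_ms + change


estimator = HistogramPlayoutEstimator(window=8, target_late_rate=0.25)
for delay in [10, 12, 11, 50, 13, 12, 90, 11]:
    estimator.observe(delay)
print(estimator.playout_delay())    # 13: the two spikes (50, 90) are accepted as late
print(next_frame_duration(40, 60))  # 25: stretch the frame to add delay
```

Only delay variations matter here, so the delays can be relative values; the sender and receiver clocks do not need to be synchronized for this sketch.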



Packet Loss Concealment (PLC)

• What to do when a packet is missing?
– Compute a replacement speech frame

• Sample-based codecs
– Require an external PLC
– G.711 Appendix I: "A high quality low-complexity algorithm for packet loss concealment with G.711"

• Predictive codecs (G.729, AMR-WB/G.722.2 & VMR-WB)
– PLC generally built into their standards
– Based on prediction / extrapolation of speech
– Interpolation is more effective but increases delay
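
To make the idea of a replacement frame concrete, here is a deliberately simple external concealment sketch in the PCM domain. It is not the G.711 Appendix I algorithm and not the built-in PLC of any codec named above: it just repeats the last good frame with a decaying gain, which is a crude form of extrapolation from the past signal. The function name `conceal` and the `fade` factor are assumptions.

```python
def conceal(frames, frame_len, fade=0.5):
    """Replace each missing frame (None) by an attenuated copy of the last good frame.

    Consecutive losses are attenuated further, so the output fades out
    during a long burst of losses instead of repeating at full level.
    """
    out = []
    last_good = [0.0] * frame_len  # silence until a first frame is received
    gain = 1.0
    for frame in frames:
        if frame is None:
            gain *= fade
            out.append([gain * sample for sample in last_good])
        else:
            gain = 1.0
            last_good = list(frame)
            out.append(list(frame))
    return out


received = [[1.0, 2.0], None, None, [3.0, 4.0]]
print(conceal(received, frame_len=2))
# [[1.0, 2.0], [0.5, 1.0], [0.25, 0.5], [3.0, 4.0]]
```

An interpolating variant would also use the frame that follows the gap, which is why it needs additional delay.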



Packet Loss Recovery

• What to do when packets are received again?
– Just smoothing for sample-based or non-predictive codecs
– Resynchronization to limit error propagation for CELP codecs

• There are a number of proprietary solutions for resynchronization
– Using side information or not
– Interoperability with standards may be an issue



Late Packet Processing

• Late packets are most often considered as lost and discarded

• The concealment procedure does not correctly update the internal state of predictive voice decoders
– Error propagates well after the packet loss

• Using information in late packets instead of simply discarding them substantially improves the recovery of the voice decoder
– Enables updating of the internal state of the decoder

• In a VoIP environment, late packet processing:
– Enhances robustness against jitter without increasing the delay
– Lowers the buffering delay without degrading quality
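
The effect of a late packet on decoder state can be shown with a toy predictive decoder. This is only an illustration under strong assumptions: the "codec" is a first-order predictor with coefficient `A`, its whole internal state is the last output sample, and the late packet is assumed to arrive before the next frame is decoded. A real CELP decoder has far more state, and the names `decode` and `play_out` are hypothetical.

```python
A = 0.5  # assumed predictor coefficient of the toy codec


def decode(residuals, state):
    """Toy predictive decoder: each output sample is A * previous sample + residual."""
    out = []
    for r in residuals:
        state = A * state + r
        out.append(state)
    return out, state


def play_out(frames, late, use_late_packets):
    """Decode a sequence of frames; frames whose index is in `late` miss their playout time."""
    state = 0.0
    out = []
    for i, residuals in enumerate(frames):
        if i in late:
            # Concealment: extrapolate with a zero residual.
            concealed, concealed_state = decode([0.0] * len(residuals), state)
            out.append(concealed)
            if use_late_packets:
                # The packet arrives after its playout time: decode it silently,
                # only to restore the correct internal state.
                _, state = decode(residuals, state)
            else:
                state = concealed_state  # wrong state, the error propagates
        else:
            decoded, state = decode(residuals, state)
            out.append(decoded)
    return out


frames = [[4.0, 2.0], [2.0, 2.0], [1.0, 1.0]]
print(play_out(frames, late=set(), use_late_packets=False)[2])  # [3.0, 2.5] (no loss)
print(play_out(frames, late={1}, use_late_packets=False)[2])    # [1.5, 1.75] (error propagates)
print(play_out(frames, late={1}, use_late_packets=True)[2])     # [3.0, 2.5] (state recovered)
```

In both lossy cases frame 1 itself is played out as a concealed frame; the difference is whether the frame after it is decoded from the right state.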



Forward Error Correction (FEC)

• Not all missing speech frames can be concealed, especially when concealment uses only the past signal
– Onsets, transients (i.e., abrupt/quick changes in the speech signal)

• The concealment error can propagate over several frames, even frames received correctly
– Because of the use of (long-term) prediction

• Add redundancy at the packet level to decrease frame loss probability
– Increases the robustness of the codec
– Increases the bit rate, and possibly the delay
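
A short sketch of why packet-level redundancy lowers the frame loss probability. The scheme is an assumed generic one, not a specific payload format: packet n carries frame n plus copies of frames n - o for each offset o, so a frame is lost only if every packet carrying it is lost. The function name `unrecoverable_frames` and the loss pattern are hypothetical.

```python
def unrecoverable_frames(num_packets, lost_packets, offsets=(0,)):
    """Return the frames that no received packet carries.

    Packet n carries frame n - o for each o in offsets:
    (0,) means no redundancy, (0, 1) adds a copy of the previous frame,
    (0, 2) adds a copy of the frame sent two packets earlier.
    """
    lost = set(lost_packets)
    missing = []
    for frame in range(num_packets):
        carriers = [frame + o for o in offsets if frame + o < num_packets]
        if all(packet in lost for packet in carriers):
            missing.append(frame)
    return missing


lost = {2, 5, 6}  # one isolated loss and one burst of two packets
print(unrecoverable_frames(10, lost, offsets=(0,)))    # [2, 5, 6]
print(unrecoverable_frames(10, lost, offsets=(0, 1)))  # [5]
print(unrecoverable_frames(10, lost, offsets=(0, 2)))  # []
```

Each redundant copy adds to the bit rate, and the receiver has to wait up to max(offsets) extra packets before declaring a frame lost, which is the delay cost mentioned above. If packet losses were independent with probability p, a frame carried by two packets would be lost with probability p², but real losses tend to come in bursts, which is why the larger offset does better in this example.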



Add Redundancy to G.729

• G.729-0: basic configuration
– R = 8 kbit/s, D = 25 ms

• G.729-1: delay and redundancy
– R = 12 kbit/s, D = 35 ms

• G.729-2 / 3: partial redundancy, with more or less delay
– R = 14.1 kbit/s, D = 25 ms / 45 ms

• G.729-4: redundancy + delay
– R = 16 kbit/s, D = 45 ms

[Figure: packet layouts for the four configurations, showing 10 ms codec frames (labelled F and F') grouped into 20 ms network packets (labelled P)]

20 ms packet (network)
10 ms frames (codec)

*The G.729 RTP payload already supports solutions G.729-1 and G.729-4
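
The bit rates quoted for these configurations are consistent with simple payload arithmetic. The sketch below assumes that a G.729 frame is 80 bits (10 ms at 8 kbit/s), that each 20 ms packet carries two new frames, and that the quoted rates count payload bits only. Reading G.729-1 as one extra full frame per packet and G.729-4 as two is an interpretation of the slide, not something it states; the function names are hypothetical.

```python
FRAME_BITS = 80  # one G.729 frame: 10 ms at 8 kbit/s
PACKET_MS = 20   # one network packet every 20 ms


def payload_rate_kbps(full_frames, extra_bits=0):
    """Payload bit rate when each packet carries full_frames frames plus extra_bits."""
    return (full_frames * FRAME_BITS + extra_bits) / PACKET_MS  # bits per ms = kbit/s


def redundancy_bits(rate_kbps, new_frames=2):
    """Bits per packet left for redundancy at a given payload rate."""
    return round(rate_kbps * PACKET_MS - new_frames * FRAME_BITS)


print(payload_rate_kbps(2))   # 8.0  -> basic configuration
print(payload_rate_kbps(3))   # 12.0 -> two new frames + one redundant frame
print(payload_rate_kbps(4))   # 16.0 -> two new frames + two redundant frames
print(redundancy_bits(14.1))  # 122  -> about 61 bits of partial redundancy per frame
```

Under these assumptions, the 14.1 kbit/s partial-redundancy configurations would spend roughly three quarters of a full frame on each redundant copy.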



FEC Listening Test Results

[Figure: MOS (1 to 5) versus frame erasure rate (0 to 25 %) for ORIGINAL, G.729E (0% FER), iLBC, and G.729-0 to G.729-4]

Codec      R (kbit/s)   D (ms)
G.729E     11.8         15
G.729-4    16           45
G.729-3    14.1         45
G.729-1    12           35
G.729-2    14.1         25
G.729-0    8            25



Conclusion

• Packet switching is a very challenging environment for voice codecs:
– Jitter, packet losses

• These performance problems can be mitigated by the choice of the appropriate voice codec and processing technologies:
– Jitter buffering, late packet processing
– Packet loss concealment and recovery
– Forward error correction
– Use wideband speech codecs?

• The overall VoIP quality of service is a combination of:
– Pure voice quality (MOS score) especially in the presence of packet losses (robustness)
– Bit rate, Delay, Complexity

• … but one can trade off between these factors to provide the best possible quality to as many users as possible.






Fixed Transmission Time (no jitter)

[Figure: timelines of packets 0, 1, 2, …, n, n+1 at the sender, at the receiver and at playout, showing the transmission delay and the playout delay]

A fixed playout delay is enough to produce a sustained flow of speech to the listener



Variable Transmission Time (some jitter), Fixed Playout Delay

[Figure: timelines of packets 0, 1, 2, …, n-1, n, n+1 at the sender, at the receiver and at playout, showing a variable transmission delay and a fixed playout delay]

Some packets (n+1) arrive too late to be decoded
