Thesis Defense
Olufunke Olaleye
Symbiotic Audio Communication on
Interactive Transport
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
2
Technology Overview
Figure 1.1 Bandwidth Limited Network
Today's Internet traffic contains both audio and data packets. Sensitive traffic (audio) shares bandwidth with other non-sensitive traffic as shown fig 1.1
VoIP Audio Chat Online Radio Real time internet lecture Real time Internet conference Online Music service Internet News
Audio chat
Internet
Internet
Internet
Telephone
Telephone
Router
Router
PBX
Internetconference
Telephone
Telephone
Audio chat
Online radio
Online MusicService
Router
Internetconference
Laptop computer
Laptop computer
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
3
Growth of Audio
GLOBAL VoIP GROWTH
2005 2006 2007 2008 2009
Total Subscribers
24,043,303
47,346,874
81,618,331
111,209,271
133,633,938
Growth % 67 97 72 36 20Net New Subscribers 9,682,349 23,303,571 34,271,457 29,590,940 22,424,668
Source: Infonetics Research, February 2006
While, the technology has moved many businesses online, the use of audio traffic over the Internet has growth exponentially.
However, audio perception is highly susceptible to disturbance in temporal quality. Packet loss, delay and jitter affects quality of service during congestion.
To maximize the audio/voice quality, some algorithms proposed to adapt:
Size of the playout buffer Coding rate Packet path diversity [25]
Challenge Receiving feedback from the network about congestion.
Global VOIP Growth
0
50000000
100000000
150000000
2005 2006 2007 2008 2009
Year
# of
Sub
scrib
ers
Subscribers
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
4
Research Goal of Thesis A well-engineered, end-to-end network is necessary to transmit audio over the Internet.
Goal of this thesis:
Reduce the delay and jitter faced by audio traffic in the network during congestion.
An efficient solution: Sender’s end (encoder) ability to sense the state of the network and react accordingly.
R &D:
Developed symbiotic perceptual audio streaming mechanism that is capable of detecting the underlying bandwidth of the network and modify the target bit rate to suit the network condition.
Combines the quantization technique to accurately represent the audio signals without distortion.
Proposed TCP Interactive (iTCP)
--- Operationally state equivalent to the conventional TCP except applications.
--- Optionally subscribe and receive selected local end-point protocol events in real-time.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
5
The Human Auditory Perception
Figure 2.2 Threshold in quiet and masking threshold
Human ear perceives signal within the range 20 and 20000 Hz.
High sensitivity is between 2.5 and 5 kHz and decreases beyond and below this frequency band.
Two Principle Perception is based on
Threshold in quiet --- Sensitivity.
Masking threshold ---
Temporal
Simultaneous masking
Human auditory system perceives signal in a non-equal width sub-band called critical bands, it’s unit is barks
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
6
Impact of Threshold in Quiet
Quantized values with 25% adaptation
Quantized values without adaptation
Threshold in Quiet
Signal strength below the threshold in quiet are inaudible to the human ear.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
7
Impact of Masking Threshold
Quantizes value with a single masker – 37.5%
Final Output
Quantized values with multiple maskers ---50%
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
8
Effect of Simultaneous & Temporal Masking
25 75 125 175 225 275 325 375
Signal A
Signal C
Signal E 0
100
200
Sound Presure (dB)
Time in ms
Effect of Multiple Signals
Signal A Signal B Signal C Signal D Signal E
25
75 125 17
5 225 27
5 325 37
5
Signal A
Signal C
Signal E 0
200
Sound Presure (dB)
Time in ms
Effect of Multiple Signals
Signal A Signal B Signal C Signal D Signal E
Masking threshold ---
Temporal
Simultaneous masking
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
9
Audio signal simultaneously passes though the hybrid filter bank and psychoacoustic model.
The hybrid filter bank
Divides the input signal into chucks of 576 samples called granule and sub bands of frequency.
Provides a specified mapping in time and frequency.
Audio Adaptation
The psychoacoustic model behaves like the human auditory system.
Computes a just noticeable noise level in each subband.
Determines the block (window) type --- (short/long).
Computes the energy in each partitions band (threshold calculation partition).
Convolves the partitioned energy by applying the spreading function (frequency spread of masking).
Apply pre-echo control using some constants (2 or 16).
Compares the threshold with the last threshold and quiet threshold and takes the maximum.
Converts threshold calculation partition to scalefactor bands and calculates the signal to mask ratio.
Figure 2.5 Block Diagram of the encoder
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
10
Tables for Threshold calculation partitions
no. FFT-lines minval qthr norm bval0 1 24.5 4.532 0.951 01 1 24.5 4.532 0.7 0.4312 1 24.5 4.532 0.681 0.861- - - - - -- - - - - -5 1 20 0.09 0.665 2.1536 1 20 0.09 0.664 2.5847 1 20 0.029 0.664 3.015- - - - - -
12 1 18 0.009 0.578 5.057- - - - - -
15 2 12 0.018 0.856 6.42216 2 6 0.018 0.846 7.026- - - - - -
60 36 0 32.554 0.483 23.897
Threshold calculation partitions is computed with following parameters: width, minval, threshold in quiet, norm and bval:
(44.1kHz sampling rate---long) (44.1kHz sampling rate---short)no. FFT-lines qthr norm SNR (db) bval0 1 4.532 0.952 -8.24 01 1 0.904 0.7 -8.24 1.7232 1 0.029 0.681 -8.24 3.445- - - - - -- - - - - -5 1 0.009 0.665 -8.24 7.6096 1 0.009 0.664 -8.24 8.71- - - - - -- - - - - -
12 1 0.009 0.578 -7.447 13.2113 1 0.009 0.541 -7.447 13.74814 1 0.009 0.575 -7.447 14.241- - - - - -- - - - - -
37 7 6.33 0.57 -5.229 23.828Parameters for computing the SNR, for short window, it is read from a table.Norm is normaling constant for each sub bandBval is bark value.For low freq, the strength of the masking is limited by minval
bark: a non-linear frequency scale.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
11
no. sb cbw bu bo w1 w20 2 0 3 1 0.1671 2 3 5 0.833 0.8332 3 5 8 0.167 0.5- - - - - -- - - - - -5 5 15 20 0.833 0.256 3 20 23 0.75 0.583- - - - - -- - - - - -9 3 30 33 0.625 0.310 3 33 36 0.7 0.16711 2 36 38 0.833 1
no. sb cbw bu bo w1 w20 3 0 4 1 0.0561 3 4 7 0.944 0.6112 4 7 11 0.389 0.167- - - - - -- - - - - -5 1 17 18 0.861 0.9176 3 18 21 0.083 0.583- - - - - -- - - - - -
12 4 36 40 0.18 0.113 3 40 43 0.9 0.46814 3 43 46 0.532 0.623- - - - - -- - - - - -
20 2 59 61 0.278 0.96
There are 21 bands at each sampling frequency for long windows and 12 bands each for short windows.
(44.1kHz sampling rate---long) (44.1kHz sampling rate---short)
Tables for converting threshold calculation partitions to scalefactor
bands
The number of partitions (cbw) converted to one scalefactor band (excluding the first and the last partition).
The threshold calculation partitions are converted directly to scalefactor bands. The first partition which is added to the scalefactor band is weighted with w1, the last with w2
bo is the first index value of cbw bu is the last index value of cbw
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
12
Audio Adaptation
It uses two nested iteration loops.
Distortion control loop (outer loop)
Rate control loop – (inner loop)
The inner loop quantizes the input signal and increases the quantizer step size until the output can be coded with the available amount of bit.
After completion of the inner loop, an outer loop checks the distortion of each scalefactor band, if the allowed distortion is exceeded, it amplifies the scalefactor band and calls the inner loop again.
The noise allocation block
Uses the output of the psychoacoustic model, “signal to mask” ratio.
Used the noise level in noise allocation to determine the actual quantizers and quantizer levels.
Determines how to allocate the number of code available for quantization of subband signals. The spectrum (frequencies) are broken into "scalefactor bands".
Thes bands are determined by the MPEG ISO spec. In the noise shaping/quantization code, we allocate bits among the partition bands to achieve the best possible quality
quantize in an iterative process
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
13
Audio Adaptation
If the overall bit sum is less than the available bits to encode a frame. The best quantized values are coded by Huffman coding to further reduce their space requirement.
The bitstream formatting block assembles and formats quantized subband signal using Huffman code and other side information into bitstream.
This is then passed into the network for transmission.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
14
TCP Congestion Control TCP provides a connection-oriented, reliable delivery
of data streams between two applications or hosts
TCP uses two mechanism to detect network congestion
i. Retransmission timer time out.
ii. Duplicate ACKs.
Congestion Control Algorithms
Slow-start and congestion- avoidance.
Fast-retransmit and fast-recovery.
Figure 3.1 Slow Start/Congestion Avoidance (SSCA) mechanism.
Figure 3.2 Fast Retransmit/Fast Recovery (FRFR) mechanism
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
15
TCP Congestion Control Internal Events
Table 3.1. TCP Congestion Control Internal Events
SubscribableEvents
Event Denotation Explanation SSCA FRFR Sub
1 Retransmission time out Congested network / Lost segment.
2 New ACK received Increment snd_cwnd exponential or linearly.
3 snd_cwnd reached the slow start threshold ssthresh
Switch snd_cwnd increment from exponential to linear.
4 Third duplicate ACK received Lost segment, execute fast retransmit.
5 Fourth (or more) duplicate ACK received
A segment left the network; transmit a new segment.
6 New ACK received
Retransmitted segment arrived at the receiver and all out of order segments buffered at the receiver are acknowledged
In this thesis, for simplicity, we use event 1, retransmission timer time out.
}
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
16
iTCP: interactive TCP
TCP Kernel
SocketAPI
TCP Connection
SubscriptionAPI
Event Monitor
Probing API
Connection State
Signal Handler
User Space
Application T-ware[ 1 ]
T-ware[ 2 ]
T-ware[ n ]
1
2
3a
3b4b
5 6a
6b
7
System
Event Information
4a
Figure 4. The iTCP extension and API.
An application that has subscribed to the kernel also binds a T-ware to a selected TCP event through the subscription API as shown by line 1 and 2 of figure 4.
When the TCP state changes as a result of congestion in the network, it also causes an event to occur in the TCP kernel.
The event monitor is aware of the changes that occurred; thus, it responds instantly by sending a signal in (3a) to the signal handler and also stores the event information in (3b).
The signal handler catches the signal from the kernel and requests the event type (4a, 4b) from the kernel via the probing API.
The appropriate T- Transientware Modules (or T-ware) (5) is triggered to serve the particular event.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
17
Symbiosis Throttling Model
Figure 5.1 Symbiosis throttling Model (Input Rate)
Figure 5.2 Symbiosis throttling Model (Output Rate)
During congestion, detected by the time-out event ( ζ = 1 ), the model reduces the target bit rate to considerable lesser or minimum rate.
bmax, target bitrate at a normal state bmin, the minimum acceptable rate
Reduction ratio = rate retraction ratio = ρ = b min/ bmax (5.1)
“Running generation threshold” function = link between the underlying TCP and the model.
(5.2)
otherwisetg
whentgtg
T
TT
)1(
1)1(2
1)(
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
18
Symbiosis Throttling Model Running control generation function” b(t) =
(5.3)
Rs = reservoir size Estimated Target Buffer Fullness (5.4)
(5.5)
A = Actual number of bits per granule
)1()1(&1]1)1(,min[
)1(2
1)(&1)1(.2
1.)(
max
max
tgtbwhentbb
tgtbwhentb
whenbtb
T
T
ATRs
sRtTETBF )(*9.0Figure 5.1 Symbiosis throttling Model (Input Rate)
Figure 5.2 Symbiosis throttling Model (Output Rate)
12/))/1152..((
12/))/1152.(()(
max
max
whenZfreqb
whenZfreqbtT
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
19
Code Information
Quant Compare
Compute Allowed Distortion
Amplify violated masking threshold
Perceptual Model
(xmin – xfsf) of scalefactor band j < 0
xmin(sb)
ratio(sb)
Output audio stream
Noise Allocation
Compute Distortion in Quantization
Compute Distortion in Quantization
xfsf(sb)
xfsf(sb)
xr (i) = xr (i) * ifqstep
Reservoir
Quantization of amplified energy
xr ix
Quantization of actual energy
xr ix
Estimated Target Buffer Fullness
Rs
Rs
ix
TT
T (t), xr
0.9 * T 0.9 * T
0.9 * T (t), Rs
ix
T-ware
bb
b (t), ρ
Input Source audio
xr xr
Event
Symbiosis Throttling Model
, Rs
The best quantization with allowed distortion to sharp the noise.
ifqstep = sqrt(2) ^ ((1 + scalefac_scale) * ifq(scalefactor band))
violatedok
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
20
Symbiosis Mechanism: The T-ware
The mechanism use to reduce delay is called T-ware.
It gives the transport layer the ability to communicate with the application layer (encoder).
The key element is the loss event handler that generates a signal.
The signal information is used to probe iTCP service.
Couple with the retraction ratio
ρ , the rate is reduced to achieve the objective of the thesis.
Figure 6a & 6b Pseudo code of the Signal Handler and the Recovery
handler
Signal handler
• Catch the signal from the kernel
and invoke the appropriate T ware.
• Encoder subscribe with the
“retransmit timer out" event only.
• T-ware calculates a frugal
state target bitrate base
on the retraction ratio
• Stores the reduced rate
in the “rate.par” file.
Recovery-T-ware
• The recovery T-ware kicks in at the
Trecovery time
• Returns the encoder bit rate to
normal rate.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
21
Experiment Set Up
Figure 7.1 Experiment setup
TCPFreeBSD Player
Internet
iTCPFreeBSD Encoder
CongestionInjector
Triggers 1 to 2 retransmit timer out events
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
22
Experiment (Test Samples)TEST ( 3 types of audio sound quality) High quality music (HighQmusic), Low or Poor quality music (LowQmusic) Speech mixed with music (SpeechMusic)
3 Running Mode iEXP iOFF Classic
Table 7.1 Experiment control flags and running modes
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
23
Experiment and Performance Analysis
The parameters used in the experiments:
(i) Predetermined rate retraction ratio ( ρ = 0.50)
(ii) Bit rate
To compute b(t), the rate retraction ratio is multiplied by
the current bit rate. (Bit rate * ρ )
Frames information are collected for the first 800 up to 1200 audio frames of the encoder and the player.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
24
Performance Analysis
Figure 7.2. Frame arrival delay on the three audio qualities
Performance of iEXP is better than the classic TCP.
The delay buildup in Classic and iOFF are much higher than that experienced by the iEXP (iTCP) as indicated by the step jump.
The step jump of iEXP is much smaller
as a result of the rate retraction ρ iEXP was able to recover from delay buildup in few seconds compare to the Classic and iOFF.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
25
Performance Analysis
Figure 8.3. Referential Jitter on the three audio qualities
Assuming a packet arrives at the receiver end at time tj
Expected arrival time for the packet is ej.
Referential jitter (refJitter(j)) = (tj - ej)
refjitter(j) is negative if packet j arrives early at the receiver end.
It can be buffered and played at the actual time.
refjitter(j) is positive, if packet j has arrived late at the receiver end.
The player will pause and wait for packet that arrive late.
The higher the delay, the higher the jitter experience by the frames.
The step jumps in the iEXP were much smaller than those in TCP-classic and iOFF.
Furthermore, this indicates that the interactive TCP reduces jitter.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
26
Performance Analysis
Figure 7.4. Symbiotic Rate Reduction on the three audio qualities
Symbiotic rate reduction that occurred as a result of the rate modification between the rate controller of the encoder and the symbiosis unit
Target bits and the actual bits generated by the encoder for each frame.
The rate retraction ratio of the symbiosis kicks in when a time out event is triggered and reported by iTCP during congestion.
The effect is observed on the plots as number of bits drop in accordance with the rate retraction ratio.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
27
Performance Analysis
Table 7.2 Average Frame Delay and acceptance ratio Comparison of frame delay and
acceptance ratio
A delay tolerance of d = 2, 4, and 6 seconds are introduced to measure the average frame delay and acceptance ratio.
iEXP mode experience a low delay and high acceptance ratio.
iTCP’s T-ware mechanism allowed the application to use sophisticated techniques to control the temporal qualities of its traffic.
Low delay & High acceptance ratio.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
28
Performance Analysis
Table 7.3 Percentage of total bits delivered for each mode
Overall stream compression
The overall delivery bits in the iEXP mode reduce to 80 – 90% of the original bits
iOFF and Classic cases shows no adaptation.
The file size of iEXP mode reduce significantly compared to the other modes.
The audio quality between the sending end and the receiving end is achieved perceptually by the temporal and spectral resolution.
iTCP provides a means of tradeoff of terrible frame delay for a satisfactory reduction of quality.
Table 7.4 Sizes of the audio files
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
29
Performance Analysis Aim of the iOFF mode --- Study the overhead introduced by the event notification service.
Increase total transmission time for all modes.
iOFF mode is higher than the classic TCP mode due to event notification service enabled.
iEXP mode is much smaller than the other modes
Indicates the application level performance outweighs the overhead.
0
2
4
6
8
10
12
14
16
18
HighQmusic Low Qmusic SpeechMusic
iOFF iExp classic
Figure 7.5 Overhead of the interactivity service
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
30
ConclusionsAim: Reduce delay and jitter of audio traffic.
Solution: Design an interactive and friendly application that receives the state feedback from the network – Symbiotic encoder.
Achievements: Dynamic reduction of jitter and delay of time sensitive traffic during congestion.
Network congestion reduction by reducing the traffic at the source.
Trade off quality for delay and jitter.
The approach is simple and does not alter any network dynamic to be optimal (e.g fair queuing).
Its effect is entirely on the application layer.
It also further validates interactive transport control protocol (iTCP) usefulness and the efficiency through the idea of the event notification.
Technically, this scheme is not applicable to non-elastic traffic such as simple file transfer.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
31
Future WorkOptimization: The parameters in the scheme are user defined but can be optimized with the
symbiotic throttling model in [13].
The experiment can be performed using the other subsribable events(i.e. duplicate ACK).
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
32
BIBLOGRPHY
[1] Andersen D., Bansal D., Curtis D., Seshan S., and Balakrishnan H., "System Support for Bandwidth Management and Content Adaptation in Internet Applications," Proc. of OSDI'OO, Oct. 2000, San Diego, CA
[2] Balakrishnan H., Rahul H., and Seshan S., "An Integrated Congestion Management Architecture for Internet Hosts," Proc. of ACM SIGCOMM, Cambridge, MA, Sep 1999. pp.115-187.
[3] Catherine Boutremans, Jean-Yve L Boudec, “Adaptive Joint Playout Buffer and FEC adjustment for the Internet Telephony,” 2003.
[4] Eitan Altman, Chadi Barakat, Victor Ramos, “Queuing Analysis of Simple FEC schemes for IP Telephony,” 2001.
[5] Floyd S., Handley M., Padhye J., and Widmer J.,“Equation-Based Congestion Control for Unicast Applications,” August 2000. SIGCOMM 2000
[6] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, “RTP: A transport protocol for real-time applications,” RFC 1889, 1996
[7] Handley M., Floyd S., Pahdye J., and Widmer J., “TCP Friendly Rate Control” (TFRC): Protocol[8] ISO/IEC 11172-3. Information technology -- Coding of moving pictures and associated audio for digital
storage media at up to about 1,5 Mbit/s -- Part 3[9] J. Mahdavi and S. Floyd, “TCP-Friendly unicast rate-based flow control,” in Draft posted on end2end
mailing list, January 1997, http://www.psc.edu/networking/papers/tcp%20friendly.html.[10] Jacobson V. and Michael J. Karels, "Congestion Avoidance and Control," Computer Communication
Review, vol. 18, no. 4, pp. 314-329, Aug. 1988.[11] Jacobson V., "Modified TCP Congestion Avoidance Algorithm," end2end-interest mailing list, April 1990[12] Johnny Matta, Christine Pepin, Khosrow Lashkari, Ravi Jain, “A source and channel rate adaptation
algorithm for AMR in VoIP using E-model,” June 2003.[13] Khan J. and Zaghal R., "Symbotic Rate adaptation for Time sensitive Elastic Traffic with Interactive
Transport," Journal of Computer Networks, Elsevier Science, March. 2006.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
33
BIBLOGRPHY(cont’d)
[14] Khan J., Zaghal R., and Gu Q., "Rate Control in an MPEG-2 Video Rate Transcoder for Transport Feedback based Quality-Rate Tradeoff;" PV2002, Pittsburgh, P A, April 2002.[15] Khan J., Zaghal R., and Gu Q., "Dynamic QoS Adaptation for Time Sensitive Traffic with Transientware,"
IASTED WOC'03, Banff; Canada, July 2003.[16] Khan J. and Zaghal R., "Event Model and Application Programming Interface of TCP Interactive," Technical Report 'TR2003-02-02', Feb. 2003.[17] N. Shacham and P.M Kenney, “Packet recovery in high-speed networks using coding and buffer management,” in Proc. IEEE Infocom 1999 .[18] Pradhan P., Chiueh T. and Neogi A., “Aggregate TCP Congestion Control Using Multiple Network Probing,” Proc. Of the 20th International Conference on Distributing Systems, ICDCS 2000.[19] Rishi Sinha, Christos Paradopoulos, Chris Kyriakakis, “Loss Concealment for Multi-channel streaming
audio,” June 2003.[20] Sisalem D. and Wolisz A., “Towards TCP-Friendly Adaptive Multimedia Applications Based on RTP,”
Proc. of the 4th IEEE Symposium on Computers and Communications, 1998.[21] Stevens W. R., “TCP/IP Illustrated, Volume 1: The Protocols,” Addison-Wesley, 1994.[22] Wang R., Yamada K., Sanadidi M. Y., and Gerla M.,“TCP with sender-side intelligence to handle dynamic,
large, leaky pipes,” IEEE Journal on Selected Areas in Communications, 23(2):235-248, 2005.[23] Wen-Tsai Liao, Janet J.C.Chen, Ming-Syan Chen Chen, “Adaptive Recovery Techniques for Real Time
Audio Stream”[24] Wenyu Jiang, Hening Schulzrinne, “VoIP: Comparision and optimatization of packet loss repair methods in
VoIP perceived quality under bursty loss,” May 2002[25] Yi J. Liang, Eckehard G. Steinbach, Bernd Girod, “Voice over IP: Real-time voice communication over the
nternet using packet path diversity,” October 2001.[26] TR2007-02-01 Test Audio Set: Symbiosis Audio Streaming with iTCP, Olufunke Olaleye and Javed I
Khan, February 2007.
March 21, 2007 Symbiotic Audio Communication on Interactive Transport
34
Questions and Comments
Top Related