8/6/2019 Application Layer Optimization for Efficient Video
APPLICATION LAYER OPTIMIZATION FOR EFFICIENT VIDEO
STREAMING OVER IEEE 802.11 BASED WIRELESS NETWORKS
Azfar Moid and Abraham O. Fapojuwo
Department of Electrical and Computer Engineering
University of Calgary, AB, Canada, T2N 1N4
{[email protected], [email protected]}
Abstract: Most existing video streaming systems employ
worst-case analysis in application layer buffer size
dimensioning. Although worst-case buffer size dimensioning
provides the deterministic quality of service (QoS)
guarantees that are desirable in multimedia transmission,
it also over-provisions scarce memory resources. In this
paper, we propose a dynamic technique for buffer and rate
allocation under two scenarios: 1) when the channel
conditions are known a-priori, and 2) when the channel
conditions are unknown. Simulation results show up to an
order of magnitude savings in the application layer buffer
requirements for the two scenarios considered. Furthermore,
a-priori knowledge of the channel conditions at the
application layer yields an improved video quality.
Index Terms: Buffer dimensioning, Video streaming,
Wireless networks, Rate-control.
I. INTRODUCTION
The H.264 video format is the latest state-of-the-art
international video coding standard developed by the Joint
Video Team (JVT) of ITU-T and ISO/IEC [1], which is also
useful for video streaming purposes. For video streaming
over wireless networks, high efficiency can be achieved by
making the mobile devices adapt to the network conditions
(e.g., the real-time channel conditions, available
transmission bandwidth, traffic load, desired spatial or
temporal resolution, delay allowance, error resilience, and
so forth).
The dynamic content of video frames makes the bit-rate
of the encoded video variable in nature, necessitating buffer
management at the application layers of the transcoder and
decoder. To avoid long delays in real-time video streaming,
the transcoder and decoder buffer sizes are usually limited.
However, with smaller buffer sizes, there is an inherent risk
of packet-dropping at the application layer. Moreover, on
client devices, the memory is an important contributor to the
overall power budget [2]. Hence, an optimized solution is
required at the application layer to balance the tradeoff
between packet-delay and packet-dropping. The key to
application layer buffer management is the rate-control
schemes employed. A rate control scheme determines the
optimum encoding rate, which is used during the video
compression process for adjusting the coding parameters,
e.g., the quantization parameter (QP), to prevent the application
layer buffers from overflow or underflow.
The motivation for this work comes from the fact that the
dynamic buffer management for video coding is not very
well studied in the literature, mainly because of the variable
sizes of video frames. The fixed group of pictures (GOP)
size causes the periodic inclusion of intra (I-) frames for
MPEG-4 video sequences, thus making the video frames
variable in size. In the case of the H.264 video encoding
scheme, where only a single or a very few I-frames are used
to encode the video sequence [1], the generated bits per
frame are similar for successive frames, unlike
those for the MPEG-4 scheme. Therefore, the transcoder and
decoder buffer dynamics need to be revisited
to obtain optimized sizes of the application layer
buffers, under the constraints of avoiding buffer overflow
and underflow. For example, Reference [3] employed the
transcoding ratio constraints to avoid overflow and
underflow of the transcoder and decoder buffers, regardless
of the video encoding scheme but assuming fixed buffer
sizes. It is shown in this paper that dynamic buffer sizes,
used in conjunction with controlling the encoding rate,
help prevent buffer overflow and underflow. The advantage of
dynamic buffer sizes over fixed buffer sizes is the
application layer memory savings.
The main contribution of this paper is the proposal of a
technique for dynamic buffer and rate control management,
studied with and without a-priori knowledge of channel
information at the application layer. Aside from the memory
savings, it is also shown that the a-priori knowledge of
channel information at the application layer enhances the
video quality. The problem is formulated as an optimization
problem, where the goal is to minimize the distortion
without violating the buffer constraints. Throughout the
paper, a packet refers to an IEEE 802.11 data-link layer
protocol data unit, whereas a frame denotes a video frame at
the application layer.
The paper is organized as follows. In Section II, the
preliminaries of the analysis are first discussed, followed by
a formal definition of the problem. Section III contains the
proposed solution scenarios for dynamically controlling the
transcoder and decoder buffers. In Section IV, simulation
results are presented, and the paper concludes in Section V.
978-1-4244-3508-1/09/$25.00 ©2009 IEEE
II. PROBLEM FORMULATION
A. Preliminaries
1) Model Assumptions
The system model assumptions are as follows:
A1. The maximum size of transcoder and decoder buffers
is limited and denoted by $B_t^{\max}$ and $B_d^{\max}$
(in bits), respectively.
A2. The decoder waits for D video frames in its buffer
before starting the decoding process. It is necessary to keep
a certain minimum threshold number of frames in the
decoder buffer to provide a cushion against any blackout
periods, in the case of buffer underflow.
A3. The decoder buffer is considered empty when there are
only D frames in the buffer. The deadline time, during
which the next frame should arrive, is assumed to meet the
criteria of maintaining the threshold of D frames in the
decoder buffer.
A4. Transcoder and decoder buffers are empty at the
startup time t=0, i.e., $B_t(t=0) = 0$ and $B_d(t=0) = 0$,
respectively. Here, $B_t(t)$ and $B_d(t)$ are respectively the
transcoder and decoder buffer size at time t.
2) Video Distortion
Video distortion is a measure of the pixel quality of the
received video as compared to the transmitted video. For a
given frame y, it is usually estimated as the mean-square
error (MSE) of the difference between the pixel value
($f$) of the transmitted frame and the pixel value ($\hat{f}$) of the
received frame, as given in (1):
$$\mathrm{MSE}(y) = \sum_{z=1}^{N_{SL}} \sum_{s=1}^{N_{MB}} \sum_{g=1}^{N_{PX}} E\left\{ \left( f^{(y)}_{z,s,g} - \hat{f}^{(y)}_{z,s,g} \right)^{2} \right\}. \tag{1}$$
In (1), $N_{SL}$ is the number of slices in frame y, $N_{MB}$ is the
number of macro-blocks in a slice, and $N_{PX}$ represents the
number of pixels in a macro-block.
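As an illustration of (1), the following minimal Python sketch accumulates the squared pixel differences over slices, macro-blocks and pixels. It rests on assumptions that are not in the paper: the expectation is approximated by the single observed frame, frames are stored as nested lists indexed as [slice][macro-block][pixel], and the function name `frame_distortion` and the toy pixel values are hypothetical.

```python
def frame_distortion(sent, received):
    """Return (sum, mean) of squared pixel differences for one frame.

    Both frames are nested lists indexed as [slice][macro_block][pixel].
    Assumption: the expectation in (1) is replaced by the observed values.
    """
    total = 0.0
    count = 0
    for sent_slice, recv_slice in zip(sent, received):
        for sent_mb, recv_mb in zip(sent_slice, recv_slice):
            for f, f_hat in zip(sent_mb, recv_mb):
                total += (f - f_hat) ** 2
                count += 1
    mean = total / count if count else 0.0
    return total, mean


# Toy frame: 1 slice, 2 macro-blocks, 2 pixels each (illustrative values only).
sent = [[[10, 20], [30, 40]]]
received = [[[10, 22], [29, 40]]]
total, mean = frame_distortion(sent, received)
```

The first returned value follows the triple sum in (1) directly; the second divides by the pixel count, which is the normalization commonly applied before converting to PSNR.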
3) Transcoder Buffer
Let r(t) denote the incoming video bit-rate (in bits/sec) at
the transcoder input, r'(t) the bit-rate (bits/sec) of
the transcoded video, and $R_c(t)$ the channel bit-rate
(bits/sec). The transcoded video bit-rate can be written as
$r'(t) = \alpha(t)\, r(t)$, where $\alpha(t)$ is a scaling function. After a
video frame y is processed at the transcoder, the total
number of bits generated at the buffer, $R_{bg}^{(y)}(T)$, during a
video frame interval time T, is calculated by:
$$R_{bg}^{(y)}(T) = \int_{(y-1)T}^{yT} r'(t)\, dt, \tag{2}$$
where y is the video frame index and T is the frame
inter-arrival time.
Similarly, the number of bits transmitted from the
transcoder buffer, $R_{bt}^{(y)}(T)$, during the interval (y-1)T to yT, is:
$$R_{bt}^{(y)}(T) = \int_{(y-1)T}^{yT} R_c(t)\, dt. \tag{3}$$
By assumption A4, the instantaneous transcoder buffer
occupancy at any time t can be calculated as:
$$B_t(t) = \int_{0}^{t} \left( r'(h) - R_c(h) \right) dh. \tag{4}$$
Specifically, the transcoder buffer occupancy after
transcoding y frames is given as:
$$B_t(yT) = \int_{0}^{yT} \left( r'(h) - R_c(h) \right) dh. \tag{5}$$
This can also be written in discrete form as:
$$B_t(yT) = \sum_{j=1}^{y} \left( R_{bg}^{(j)}(T) - R_{bt}^{(j)}(T) \right), \tag{6}$$
where j is the frame index.
The expression in (6) shows that the buffer occupancy
after transcoding the yth frame is just the summation of all
the accumulated bits during the interval 0 to yT at the
transcoder buffer. Equation (6) can also be written in a
recursive manner, where the current buffer occupancy after
transcoding the yth frame is written in terms of the buffer
occupancy after transcoding the (y-1)th frame:
$$\begin{aligned}
B_t(yT) &= \sum_{j=1}^{y-1} \left( R_{bg}^{(j)}(T) - R_{bt}^{(j)}(T) \right) + \left( R_{bg}^{(y)}(T) - R_{bt}^{(y)}(T) \right), \\
        &= B_t\big((y-1)T\big) + R_{bg}^{(y)}(T) - R_{bt}^{(y)}(T).
\end{aligned} \tag{7}$$
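The recursion in (7) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-frame bit counts are made-up values, and the function name `transcoder_occupancy` is hypothetical. The buffer starts empty, as in assumption A4, and the result is cross-checked against the direct summation in (6).

```python
def transcoder_occupancy(generated_bits, transmitted_bits):
    """Return B_t(yT) for y = 1..N using the recursion in (7).

    generated_bits[y-1]   plays the role of R_bg^(y)(T),
    transmitted_bits[y-1] plays the role of R_bt^(y)(T).
    """
    occupancy = 0  # assumption A4: the buffer is empty at t = 0
    history = []
    for r_bg, r_bt in zip(generated_bits, transmitted_bits):
        occupancy = occupancy + r_bg - r_bt
        history.append(occupancy)
    return history


# Illustrative per-frame bit counts (not taken from the paper).
generated = [4000, 3500, 4200, 3900]
transmitted = [3000, 3600, 4000, 4100]

recursive = transcoder_occupancy(generated, transmitted)

# Direct summation as in (6), used here only as a cross-check.
direct = [
    sum(generated[:y]) - sum(transmitted[:y])
    for y in range(1, len(generated) + 1)
]
```

The recursive form needs only the previous occupancy and the current frame's bit counts, which is what makes a per-frame update practical at the transcoder.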
4) Decoder Buffer
Let r''(t) denote the rate (in bits/sec) of rendering the
video sequence to the user terminal. The number of bits
rendered to the video terminal, $R_{br}^{(y)}(T)$, during the interval
(y-1)T to yT, is given as:
$$R_{br}^{(y)}(T) = \int_{(y-1)T}^{yT} r''(t)\, dt. \tag{8}$$
According to assumption A2, the decoder waits for D
frames before starting the decoding process; this
corresponds to a delay of DT seconds. The initial decoder
buffer occupancy at t=DT is calculated by:
$$B_d(DT) = \sum_{j=1}^{D} R_{bt}^{(j)}(T). \tag{9}$$
In general, the decoder buffer occupancy after decoding the
yth frame is given by:
$$B_d(yT) = B_d(DT) + \sum_{j=1}^{y} \left( R_{bt}^{(D+j)}(T) - R_{br}^{(j)}(T) \right). \tag{10}$$
The expression given in (10) shows that the instantaneous
decoder buffer occupancy is a function of the initial buffer
occupancy and accumulated bits at the decoder buffer.
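A minimal Python sketch of (9) and (10) follows, assuming made-up per-frame bit counts and a hypothetical function name `decoder_occupancy`. The initial occupancy is the sum of the first D received frames, and each decoded frame j then adds the bits of arriving frame D+j and removes the bits rendered for frame j.

```python
def decoder_occupancy(received_bits, rendered_bits, startup_frames):
    """Return (B_d(DT), [B_d(yT) for y = 1, 2, ...]) following (9) and (10).

    received_bits[j-1] plays the role of R_bt^(j)(T),
    rendered_bits[j-1] plays the role of R_br^(j)(T),
    startup_frames is D from assumption A2.
    """
    initial = sum(received_bits[:startup_frames])  # equation (9)
    occupancy = initial
    history = []
    for j, rendered in enumerate(rendered_bits, start=1):
        arrival_index = startup_frames + j  # 1-based frame index D + j
        if arrival_index > len(received_bits):
            break  # no more arrivals available in this toy trace
        occupancy += received_bits[arrival_index - 1] - rendered
        history.append(occupancy)
    return initial, history


# Illustrative values only: D = 2 start-up frames.
received = [3000, 3600, 4000, 4100, 3800]
rendered = [3500, 3700, 3900]
initial, history = decoder_occupancy(received, rendered, startup_frames=2)
```

In this toy trace the occupancy stays above zero throughout, so the start-up cushion of D frames absorbs the one frame interval in which more bits are rendered than received.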
5) Channel Estimation
As given in [4] and [5], for an IEEE 802.11 wireless
channel, the channel information can be estimated at the
data-link layer using the number of transmission attempts.
Each transmission attempt at the data-link layer costs a
round-trip time (RTT), which is a measure of the delay on
the network. Because of the RTT cost, the maximum
number of transmission attempts (Rmax) is limited for
time-sensitive applications, such as video streaming. According
to [5], if the number of transmission attempts reaches Rmax,
this indicates a bad network condition. The typical Rmax
value for an IEEE 802.11 based wireless network is 4 [6]. In
this paper, we introduce three thresholds L1, L2 and L3 on
the number of packet transmission attempts for defining the
state of the channel. We assume that the threshold L1 = 1
transmission attempt indicates a good channel. The threshold
L2 = 2 packet transmission attempts indicates a moderate
channel condition. Finally, the threshold L3 = 3 or 4 packet
transmission attempts denotes a bad channel; this setting is
consistent with [4] and [5]. The channel information (i.e.,
good, moderate or bad channel condition) is available after a
successful transmission of each data-link layer packet and
this information is used for encoding the next video frame.
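The threshold rule above maps directly to a small lookup. The Python sketch below is illustrative only: the function name `channel_state`, the string labels and the error handling are assumptions, while the thresholds (1, 2, and 3 or 4 attempts) and Rmax = 4 are the values stated in the text.

```python
R_MAX = 4  # typical maximum number of transmission attempts, as stated in the text


def channel_state(attempts):
    """Classify the channel from the number of data-link transmission attempts.

    1 attempt      -> "good"      (threshold L1)
    2 attempts     -> "moderate"  (threshold L2)
    3 or 4 attempts -> "bad"      (threshold L3)
    """
    if not 1 <= attempts <= R_MAX:
        raise ValueError(f"attempts must be between 1 and {R_MAX}")
    if attempts == 1:
        return "good"
    if attempts == 2:
        return "moderate"
    return "bad"


states = [channel_state(n) for n in range(1, R_MAX + 1)]
```

Rejecting values outside 1 to Rmax is a design choice of this sketch; the text does not say how an out-of-range count would be handled.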
B. The Optimization Problem
Define a vector G, which denotes the application layer
parameters:
$$\mathbf{G} = \left\{ B_t(yT),\; B_d(yT),\; R_{bg}^{(y)}(T) \right\}, \tag{11}$$
where $B_t(yT)$, $B_d(yT)$ and $R_{bg}^{(y)}(T)$ are given by eqns. (7),
(10) and (2), respectively.
Problem P1:
$$\begin{aligned}
&\arg\min_{\mathbf{G}} \left\{ \mathrm{MSE}(y) \right\}, \\
&\text{subject to:} \\
&\quad 1)\;\; 0 < B_d(yT) \le B_d^{\max}, \\
&\quad 2)\;\; 0 < B_t(yT) \le B_t^{\max},
\end{aligned} \tag{12}$$
where MSE(y) is given by eqn. (1). According to (12), the
goal is to find the application layer parameters vector G for
which the video distortion is minimized without violating
the buffer constraints.
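To make the structure of problem P1 concrete, the sketch below performs an exhaustive search over a handful of candidate parameter sets, discards those that violate the buffer constraints, and keeps the one with the lowest distortion. This is not the paper's solution method: the candidate values, the per-candidate MSE figures, the function name `solve_p1`, and the choice of a strict lower bound with an inclusive upper bound are all assumptions made for illustration.

```python
def solve_p1(candidates, bt_max, bd_max):
    """Pick the feasible candidate with the smallest distortion.

    Each candidate is a dict with hypothetical keys:
      "Bt"  - transcoder buffer occupancy B_t(yT), in bits
      "Bd"  - decoder buffer occupancy B_d(yT), in bits
      "Rbg" - bits generated for the frame
      "mse" - distortion that this choice would produce
    Returns None when no candidate satisfies both constraints.
    """
    feasible = [
        c for c in candidates
        # Assumed bound types: strictly above zero, at most the maximum size.
        if 0 < c["Bd"] <= bd_max and 0 < c["Bt"] <= bt_max
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c["mse"])


# Illustrative candidates only; none of these numbers come from the paper.
candidates = [
    {"Rbg": 3000, "Bt": 500, "Bd": 6000, "mse": 40.0},
    {"Rbg": 4000, "Bt": 1500, "Bd": 7000, "mse": 30.0},
    {"Rbg": 5000, "Bt": 2500, "Bd": 9000, "mse": 22.0},  # violates both limits
]
best = solve_p1(candidates, bt_max=2000, bd_max=8000)
```

The third candidate has the lowest distortion but exceeds both buffer limits, so the search settles on the second one, which shows how the constraints trade video quality against buffer size.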
III. SOLUTION SCENARIOS
The problem P1 is solved by considering two scenarios.
A. Scenario 1: Without Knowledge of Channel Information
When the channel information is not known at the
application layer, the transcoding rate cannot be adapted to
the channel. The problem P1 is then solved to determine the
optimum values for the transcoder and decoder buffer,
subject to non-occurrence of buffer underflow and overflow.
For the decoder buffer, it is important to note that both
underflow and overflow are critical: the former leads
to terminal screen blackout due to packet starvation, while
the latter causes packet-dropping, eventually
leading to video jerks. In the case of the transcoder, buffer
overflow is more critical than underflow because
overflow leads to packet-dropping, hence resulting in
quality loss. Conversely, transcoder buffer underflow would
not cause much harm, as the decoder still carries a cushion of
packets (assumption A2) to be displayed at the terminal.
The decoder buffer underflow can be avoided if
$0 < B_d(yT)$. Applying (10) and rearranging the terms,
the buffer underflow constraint becomes:
$$\sum_{j=1}^{y} \left( R_{br}^{(j)}(T) - R_{bt}^{(j+D)}(T) \right) < B_d(DT).$$
2) Step 2:
Find the optimal buffer sizes for which the distortion can
be minimized, as given in section III.A. This sets an upper
bound on the size of transcoder and decoder buffer for
which the optimization is achieved.
3) Step 3:
For the given frame, after capping the transcoder and
decoder buffer sizes to a fixed value determined in step 2,
the new transcoding rates are calculated to ensure that the
constraints are not violated. It is proposed here to further
reduce the video transcoding rate if the channel condition is
bad. This not only helps reduce the load on the
network, but also smooths out the transcoded video stream.
For the moderate channel, it is proposed to use the
calculated video bit-rate as is, to take full advantage of
the current channel state. When the channel condition is
good, the target bit-rate is increased to exploit the good
channel condition for higher video quality.
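The adjustment described in this step can be sketched as a lookup from the number of transmission attempts to a multiplication factor. The factors 1.2, 1.0 and 0.8 are the ones that appear in Algorithm I of this paper; the function name `refine_target_bits`, the dictionary layout and the error handling are assumptions of this sketch.

```python
# Multiplication factor per number of transmission attempts:
# 1 attempt (good), 2 attempts (moderate), 3 or 4 attempts (bad).
RATE_FACTORS = {1: 1.2, 2: 1.0, 3: 0.8, 4: 0.8}


def refine_target_bits(target_bits, attempts):
    """Scale the calculated per-frame bit budget according to the channel state."""
    if attempts not in RATE_FACTORS:
        raise ValueError("attempts must be 1, 2, 3 or 4")
    return RATE_FACTORS[attempts] * target_bits


# Illustrative bit budget of 10000 bits for the next frame.
refined = {n: refine_target_bits(10000, n) for n in (1, 2, 3, 4)}
```

Because the channel state only becomes known after a packet has been transmitted, the refined budget applies to the next video frame rather than the current one.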
When error correction mechanisms, e.g., joint forward
error correction (FEC) and automatic repeat request (ARQ) [7],
are used for video streaming over wireless networks, the
packet transmission information is readily available at the
data-link layer. In this paper, we use the cross-layer
signaling strategy to convey the transmission and hence
channel condition information to the application layer,
where the transcoder utilizes this information for video
transcoding. An algorithm for refining the calculated target
transcoding rate is given as follows:
Algorithm I: Refining the Calculated Target Transcoding Rate
Input: number of transmission attempts = L, $R_{bg}^{(y)}(T)$
Output: $R_{bg}^{(y)}(T)$
Begin
    if L = L1
    {   /* channel state = Good */
        $R_{bg}^{(y)}(T) = 1.2\, R_{bg}^{(y)}(T)$
    }
    else if L = L2
    {   /* channel state = Moderate */
        $R_{bg}^{(y)}(T) = R_{bg}^{(y)}(T)$
    }
    else if L = L3
    {   /* channel state = Bad */
        $R_{bg}^{(y)}(T) = 0.8\, R_{bg}^{(y)}(T)$
    }
End
where L1, L2 and L3 are given in section II.A.5. Note that the
multiplication factors in Algorithm I are empirically
determined values that best suit the channel conditions. For
the good channel condition, a larger value (>1.2) of the
multiplication factor would disturb the pre-calculated
bit-budget allocation in the H.264 encoder [1], and hence
should be avoided. Also, a lower value (
packet-dropping at the data-link layer, but the rate reduction
mechanism proposed in this paper reduces the video bit-rate,
thereby lowering the probability of packet-dropping and
hence increasing the PSNR. For the good channel condition
(i.e., =$10^{-4}$), a slight increase of about 0.1 dB to 0.2 dB can
be seen in all the three video sequences when the rate control
mechanism is used, due to the increase in video encoding
bit-rate as given by Algorithm I.
B. Dynamic Buffer Stabilization
From Fig. 3, it is seen that, under the moderate channel
condition (=$10^{-2}$), the average buffer requirement drops by
almost an order of magnitude when either the buffer control
or the buffer plus rate control scheme is employed. The reduction in
buffer size for the adaptive schemes is due to the fact that
buffer sizes are now calculated in real-time for each video
frame instead of being held at a fixed value, as is the case when
there is no control. Comparing only the two adaptive
schemes, there is a small increase of about 20 bits for the
case where the buffer plus rate control scheme is employed.
This is attributed to the fact that, when channel information
is available, the transcoder gets another chance of
increasing or decreasing the encoding rate, based on good or
bad channel conditions, respectively. Under the bad channel
condition, the encoding rate drops, thus giving fewer
bits per frame; however, the packet error probability
increases, hence negating the effect of the lower bit-rate for the
decoder buffer. Under the good channel condition, the increase
in encoding bit-rate translates to a higher buffer
requirement, but this increase is not very significant when
compared to the fixed buffer case.
V. CONCLUSION
In this paper, we have presented a technique for
dynamically optimizing the application layer parameters and
compared it against the case where no such scheme is
implemented. It is shown in this paper that when the channel
information is available at the application layer, the video
quality improves by up to 1 dB, which translates to a better
viewing experience. An additional saving of approximately
an order of magnitude in the application layer buffer size is also
achieved, thereby decreasing the memory requirement.
ACKNOWLEDGMENT
The authors acknowledge the support of the University of
Calgary, TRLabs and the Natural Sciences and Engineering
Research Council (NSERC) of Canada for this research.
REFERENCES
[1] ITU-T and ISO/IEC JTC1, "Advanced video coding for generic audiovisual services," ITU-T Recommendation H.264, ISO/IEC 14496 AVC, 2003.
[2] M. Yokotsuka, "Memory motivates cell-phone growth," Wireless Systems Design, vol. 9, no. 3, 2004, pp. 27-30.
[3] Z. Lei and N. D. Georganas, "Adaptive video transcoding and streaming over wireless channels," Journal of Systems and Software, March 2004, pp. 253-270.
[4] M. van der Schaar and D. S. Turaga, "Cross-layer packetization and retransmission strategies for delay-sensitive wireless multimedia transmission," IEEE Transactions on Multimedia, vol. 9, no. 1, Jan. 2007, pp. 185-197.
[5] J. Lee and M. Kang, "Design of a dynamic bandwidth reallocation scheme for hot-spot video stream transmission over the IEEE 802.11 WLAN," 2006 TENCON, IEEE Region 10 Conference, Nov. 2006.
[6] V. Sgardoni, P. Ferre, A. Doufexi, A. Nix and D. Bull, "Frame delay and loss analysis for video transmission over time-correlated 802.11a/g channels," IEEE 18th International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC 2007, 3-7 Sept. 2007.
[7] A. Moid and A. O. Fapojuwo, "An analytical model for optimum byte-level and packet-level FEC assignment using buffer dynamics," Research Letters in Communications, Article ID 546184, 2008.
[8] JM Reference Software, Available at: http://iphome.hhi.de/suehring/tml, Accessed on June 20, 2008.
[9] Network Simulator (NS2), Available at: http://www.isi.edu/nsnam/ns, Accessed on May 03, 2007.
Figure 2: Comparison of the average PSNR values
Figure 3: Comparison of decoder buffer sizes (x-axis: video sequence, Foreman, Container and Akiyo; y-axis: decoder buffer size in bits, 0 to 20000; series: No Control, Buffer Control (w/o channel info), Buffer+Rate Control (w/ channel info))