Application Layer Optimization for Efficient Video


APPLICATION LAYER OPTIMIZATION FOR EFFICIENT VIDEO STREAMING OVER IEEE 802.11 BASED WIRELESS NETWORKS

Azfar Moid and Abraham O. Fapojuwo
Department of Electrical and Computer Engineering
University of Calgary, AB, Canada, T2N 1N4


Abstract: Most existing video streaming systems use worst-case analysis to dimension application layer buffers. Worst-case buffer dimensioning provides the deterministic quality of service (QoS) guarantees that multimedia transmission requires, but it also over-provisions scarce memory resources. In this paper, we propose a dynamic technique for buffer and rate allocation under two scenarios: 1) when the channel conditions are known a priori, and 2) when the channel conditions are unknown. Simulation results show application layer buffer savings of up to an order of magnitude in both scenarios. Furthermore, a priori knowledge of the channel conditions at the application layer improves video quality.

Index Terms: Buffer dimensioning, video streaming, wireless networks, rate control.

I. INTRODUCTION

The H.264 video format is the latest state-of-the-art international video coding standard. It was developed by the Joint Video Team (JVT) of ITU-T and ISO/IEC [1] and is well suited to video streaming. For video streaming over wireless networks, mobile devices can achieve high efficiency by adapting to network conditions. These include the real-time channel conditions, available transmission bandwidth, traffic load, desired spatial or temporal resolution, delay allowance and error resilience.

Because video frame content changes over time, the bit-rate of the encoded video is variable. Buffer management is therefore needed at the application layers of the transcoder and decoder. To avoid long delays in real-time video streaming, the transcoder and decoder buffer sizes are usually limited. However, smaller buffers carry an inherent risk of packet-dropping at the application layer. Moreover, on client devices, memory is an important contributor to the overall power budget [2]. An optimized solution is therefore required at the application layer to balance the tradeoff between packet-delay and packet-dropping.

The key to application layer buffer management is the rate-control scheme employed. A rate-control scheme determines the optimum encoding rate. During video compression, the encoder uses this rate to adjust the coding parameters, e.g., the quantization parameter (QP), so that the application layer buffers neither overflow nor underflow.
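The sketch below illustrates how a rate-control scheme can steer the quantization parameter using buffer feedback. It raises QP (coarser quantization, fewer bits) when the buffer is nearly full and lowers it when the buffer is nearly empty. This is a generic heuristic, not the rate-control algorithm of H.264 or of this paper. The function name `adjust_qp`, the watermarks `low = 0.2` and `high = 0.8`, and the unit step are hypothetical tuning choices. Only the 0 to 51 QP range is taken from H.264 (8-bit video).

```python
# Generic buffer-feedback QP adjustment: an illustrative sketch, not the H.264 JM rate control.
QP_MIN, QP_MAX = 0, 51  # valid QP range for 8-bit H.264 video


def adjust_qp(qp: int, fullness: float, low: float = 0.2, high: float = 0.8, step: int = 1) -> int:
    """Return the QP for the next frame given the buffer fullness B / B_max in [0, 1].

    The watermarks `low` and `high` and the step size are hypothetical tuning values.
    """
    if fullness > high:
        qp += step  # nearly full: quantize more coarsely to generate fewer bits (avoid overflow)
    elif fullness < low:
        qp -= step  # nearly empty: quantize more finely to generate more bits (avoid underflow)
    return max(QP_MIN, min(QP_MAX, qp))  # clamp to the legal QP range
```

For example, `adjust_qp(30, 0.9)` returns 31, and `adjust_qp(51, 0.95)` stays at 51 because of the clamp.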

This work is motivated by the fact that dynamic buffer management for video coding has received little study in the literature, mainly because video frame sizes vary. In MPEG-4 video sequences, a fixed group of pictures (GOP) size causes intra (I-) frames to be inserted periodically, which makes frame sizes vary. H.264 encodes a video sequence with only one or a very few I-frames [1]. As a result, successive frames generate roughly the same number of bits, unlike in MPEG-4.

The transcoder and decoder buffer dynamics therefore need to be revisited to obtain optimized application layer buffer sizes without buffer overflow or underflow. For example, [3] used transcoding ratio constraints to avoid overflow and underflow of the transcoder and decoder buffers, regardless of the video encoding scheme, but assumed fixed buffer sizes. This paper shows that dynamic buffer sizes, combined with encoding rate control, help prevent buffer overflow and underflow. The advantage of dynamic buffer sizes over fixed buffer sizes is the saving in application layer memory.

The main contribution of this paper is a technique for dynamic buffer and rate control management, studied with and without a priori knowledge of channel information at the application layer. Besides the memory savings, we show that a priori knowledge of channel information at the application layer improves video quality. The problem is formulated as an optimization problem whose goal is to minimize distortion without violating the buffer constraints. Throughout the paper, a packet refers to an IEEE 802.11 data-link layer protocol data unit, whereas a frame denotes a video frame at the application layer.

The paper is organized as follows. Section II discusses the preliminaries of the analysis and formally defines the problem. Section III presents the proposed solution scenarios for dynamically controlling the transcoder and decoder buffers. Section IV presents simulation results, and Section V concludes the paper.

978-1-4244-3508-1/09/$25.00 2009 IEEE 789



II. PROBLEM FORMULATION

A. Preliminaries

1) Model Assumptions

The system model assumptions are as follows:

A1. The maximum sizes of the transcoder and decoder buffers are limited and denoted by $B_t^{\max}$ and $B_d^{\max}$ (in bits), respectively.

A2. The decoder waits until D video frames have accumulated in its buffer before it starts decoding. A minimum threshold number of frames must be kept in the decoder buffer as a cushion against blackout periods if the buffer underflows.

A3. The decoder buffer is considered empty when only D frames remain in it. The deadline by which the next frame must arrive is assumed to be set so that the threshold of D frames in the decoder buffer is maintained.

A4. The transcoder and decoder buffers are empty at startup time t = 0, i.e., $B_t(0) = 0$ and $B_d(0) = 0$. Here, $B_t(t)$ and $B_d(t)$ denote the transcoder and decoder buffer occupancy at time t, respectively.

2) Video Distortion

Video distortion measures the pixel quality of the received video relative to the transmitted video. For a given frame y, it is usually estimated as the mean-square error (MSE) between the pixel values $f$ of the transmitted frame and the pixel values $\tilde{f}$ of the received frame, as given in (1):

$$\mathrm{MSE}(y) = E\left\{ \sum_{z=1}^{N_{SL}} \sum_{s=1}^{N_{MB}} \sum_{g=1}^{N_{PX}} \left( f_{z,s,g}^{(y)} - \tilde{f}_{z,s,g}^{(y)} \right)^{2} \right\} \qquad (1)$$

In (1), $N_{SL}$ is the number of slices in frame y, $N_{MB}$ is the number of macro-blocks in a slice, and $N_{PX}$ is the number of pixels in a macro-block.
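The sketch below computes two things from the distortion definition.

- **The bracketed sum in (1)** for a single received realization of frame y. The expectation in (1) would be approximated by averaging this value over many channel realizations.
- **PSNR**, the quality metric reported in the simulation results, computed from the per-pixel average.

Several choices are assumptions for illustration only: the nested-list frame layout `[slice][macro-block][pixel]`, the function names, and the 8-bit peak value of 255. Python's `zip` silently truncates on a length mismatch, so both frames are assumed to have identical structure.

```python
import math
from typing import Sequence

# A frame modelled as nested lists indexed [z][s][g]: slice z, macro-block s, pixel g.
Frame = Sequence[Sequence[Sequence[float]]]


def squared_error_sum(tx: Frame, rx: Frame) -> float:
    """Sum of (f - f_tilde)^2 over slices, macro-blocks and pixels: the term inside E{.} in (1)."""
    total = 0.0
    for tx_slice, rx_slice in zip(tx, rx):
        for tx_mb, rx_mb in zip(tx_slice, rx_slice):
            for f, f_tilde in zip(tx_mb, rx_mb):
                total += (f - f_tilde) ** 2
    return total


def psnr_db(tx: Frame, rx: Frame, peak: float = 255.0) -> float:
    """PSNR of one frame from the per-pixel mean squared error (8-bit peak assumed by default)."""
    n_pixels = sum(len(mb) for sl in tx for mb in sl)
    mse = squared_error_sum(tx, rx) / n_pixels
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

Consider a toy frame with one slice of two 2-pixel macro-blocks: `tx = [[[10, 20], [30, 40]]]` and `rx = [[[11, 20], [30, 38]]]`. Its squared-error sum is 5, and its per-pixel MSE is 1.25.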

3) Transcoder Buffer

The following rates are used throughout, all in bits/s:

- r(t) is the incoming video bit-rate at the transcoder input.
- r'(t) is the bit-rate of the transcoded video.
- $R_c(t)$ is the channel bit-rate.

The transcoded video bit-rate can be written as $r'(t) = \alpha(t)\, r(t)$, where $\alpha(t)$ is a scaling function. After video frame y is processed at the transcoder, the total number of bits $R_{bg}^{(y)}(T)$ generated at the buffer during one video frame interval is

$$R_{bg}^{(y)}(T) = \int_{(y-1)T}^{yT} r'(t)\, dt, \qquad (2)$$

where y is the video frame index and T is the frame inter-arrival time.

Similarly, the number of bits $R_{bt}^{(y)}(T)$ transmitted from the transcoder buffer during the interval (y-1)T to yT is

$$R_{bt}^{(y)}(T) = \int_{(y-1)T}^{yT} R_c(t)\, dt. \qquad (3)$$
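Equations (2) and (3) integrate a rate over one frame interval. When the rates are only available as functions that can be sampled, numerical quadrature can evaluate them. The sketch below uses the trapezoidal rule, which is our choice rather than something the paper specifies. The helper names `bits_in_interval` and `transcoded_rate`, the step count, and the constant and linear rate profiles used in the example are hypothetical.

```python
from typing import Callable

Rate = Callable[[float], float]  # bit-rate in bits/s as a function of time t in seconds


def bits_in_interval(rate: Rate, y: int, T: float, steps: int = 1000) -> float:
    """Trapezoidal approximation of the integral of rate(t) over [(y-1)T, yT], as in (2) and (3)."""
    a, b = (y - 1) * T, y * T
    h = (b - a) / steps
    total = 0.5 * (rate(a) + rate(b))
    for k in range(1, steps):
        total += rate(a + k * h)
    return total * h


def transcoded_rate(r: Rate, alpha: Rate) -> Rate:
    """r'(t) = alpha(t) * r(t): the incoming rate multiplied by the transcoder scaling function."""
    return lambda t: alpha(t) * r(t)
```

Consider a constant 2 Mbit/s input, a constant scaling of 0.75 and T = 0.04 s (25 frames/s). Then `bits_in_interval(transcoded_rate(lambda t: 2e6, lambda t: 0.75), 3, 0.04)` gives about 60000 bits for frame 3. Passing a channel rate function instead yields $R_{bt}^{(y)}(T)$ from (3). For constant or linear rates the trapezoidal rule is exact up to rounding.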

By assumption A4, the instantaneous transcoder buffer occupancy at any time t can be calculated as

$$B_t(t) = \int_{0}^{t} \left( r'(h) - R_c(h) \right) dh. \qquad (4)$$

Specifically, the transcoder buffer occupancy after transcoding y frames is

$$B_t(yT) = \int_{0}^{yT} \left( r'(h) - R_c(h) \right) dh. \qquad (5)$$

This can also be written in discrete form as:

$$B_t(yT) = \sum_{j=1}^{y} \left( R_{bg}^{(j)}(T) - R_{bt}^{(j)}(T) \right), \qquad (6)$$

where j is the frame index.

Equation (6) shows that the buffer occupancy after transcoding the yth frame is simply the sum of all bits accumulated at the transcoder buffer during the interval 0 to yT. Equation (6) can also be written recursively, expressing the occupancy after transcoding the yth frame in terms of the occupancy after transcoding the (y-1)th frame:

$$\begin{aligned} B_t(yT) &= \sum_{j=1}^{y-1} \left( R_{bg}^{(j)}(T) - R_{bt}^{(j)}(T) \right) + R_{bg}^{(y)}(T) - R_{bt}^{(y)}(T) \\ &= B_t\big((y-1)T\big) + R_{bg}^{(y)}(T) - R_{bt}^{(y)}(T). \end{aligned} \qquad (7)$$
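The sketch below steps recursion (7) frame by frame and cross-checks it against the running sum in (6). It assumes the per-frame bit counts $R_{bg}^{(j)}(T)$ and $R_{bt}^{(j)}(T)$ are already known, for example from (2) and (3). Following the equations, the occupancy is not clipped at zero. The function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def transcoder_occupancy(generated: Sequence[float], transmitted: Sequence[float]) -> List[float]:
    """B_t(yT) for y = 1..n via recursion (7), starting from B_t(0) = 0 (assumption A4).

    generated[y-1] is R_bg^(y)(T) and transmitted[y-1] is R_bt^(y)(T), both in bits.
    As in (6) and (7), the occupancy is not clipped at zero.
    """
    if len(generated) != len(transmitted):
        raise ValueError("need one generated and one transmitted value per frame")
    occupancy: List[float] = []
    b_t = 0.0
    for r_bg, r_bt in zip(generated, transmitted):
        b_t = b_t + r_bg - r_bt  # B_t(yT) = B_t((y-1)T) + R_bg^(y)(T) - R_bt^(y)(T)
        occupancy.append(b_t)
    return occupancy


def transcoder_occupancy_direct(generated: Sequence[float], transmitted: Sequence[float]) -> List[float]:
    """The same quantity computed as the running sum in (6), for cross-checking."""
    return list(accumulate(g - t for g, t in zip(generated, transmitted)))
```

For example, frames that generate 1000, 1200 and 800 bits while the channel drains 900, 1000 and 1000 bits give occupancies of 100, 300 and 100 bits from either function.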

4) Decoder Buffer

Let r''(t) denote the rate (in bits/s) at which the video sequence is rendered to the user terminal. The number of bits $R_{br}^{(y)}(T)$ rendered to the video terminal during the interval (y-1)T to yT is

$$R_{br}^{(y)}(T) = \int_{(y-1)T}^{yT} r''(t)\, dt. \qquad (8)$$

According to assumption A2, the decoder waits for D frames before starting the decoding process, which corresponds to a delay of DT seconds. The initial decoder buffer occupancy at t = DT is

$$B_d(DT) = \sum_{j=1}^{D} R_{bt}^{(j)}(T). \qquad (9)$$

In general, the decoder buffer occupancy after decoding the yth frame is

$$B_d(yT) = B_d(DT) + \sum_{j=1}^{y} \left( R_{bt}^{(j+D)}(T) - R_{br}^{(j)}(T) \right). \qquad (10)$$

Equation (10) shows that the instantaneous decoder buffer occupancy depends on the initial buffer occupancy and the bits accumulated at the decoder buffer.
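The sketch below evaluates (9) and (10) from per-frame bit counts. The index arithmetic is the main point: while frame j is rendered, frame j + D arrives. Evaluating up to frame n therefore requires n + D transmitted frames. The function name and the 0-based list convention are illustrative assumptions.

```python
from typing import List, Sequence


def decoder_occupancy(transmitted: Sequence[float], rendered: Sequence[float], D: int) -> List[float]:
    """Return [B_d(DT), B_d(1T), ..., B_d(nT)] following (9) and (10).

    transmitted[j-1] is R_bt^(j)(T) and rendered[j-1] is R_br^(j)(T), in bits.
    Evaluating (10) up to y = n = len(rendered) needs R_bt for frames 1 .. n + D.
    """
    n = len(rendered)
    if len(transmitted) < n + D:
        raise ValueError("need R_bt for frames 1 .. n + D")
    b_d = sum(transmitted[:D])  # (9): bits of the first D frames fill the buffer before decoding
    occupancy = [b_d]
    for j in range(1, n + 1):
        # (10): frame j + D arrives while frame j is rendered (Python lists are 0-based)
        b_d += transmitted[j + D - 1] - rendered[j - 1]
        occupancy.append(b_d)
    return occupancy
```

For example, with D = 2, a channel delivering 1000 bits per frame, and frames rendered at 1000, 1500 and 1200 bits, the sequence is [2000, 2000, 1500, 1300].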

5) Channel Estimation

As shown in [4] and [5], for an IEEE 802.11 wireless channel, the channel information can be estimated at the data-link layer from the number of transmission attempts. Each transmission attempt at the data-link layer costs one round-trip time (RTT), a measure of the delay on the network. Because of this cost, the maximum number of transmission attempts ($R_{\max}$) is limited for time-sensitive applications such as video streaming. According to [5], reaching $R_{\max}$ transmission attempts indicates a bad network condition. The typical $R_{\max}$ value for IEEE 802.11 based wireless networks is 4 [6].

In this paper, we introduce three thresholds, $L_1$, $L_2$ and $L_3$, on the number of packet transmission attempts to define the channel state:

- $L_1 = 1$ transmission attempt indicates a good channel.
- $L_2 = 2$ attempts indicate a moderate channel.
- $L_3 = 3$ or 4 attempts denote a bad channel.

This setting is consistent with [4] and [5]. The channel information (good, moderate or bad) becomes available after each data-link layer packet is successfully transmitted, and it is used to encode the next video frame.
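The sketch below maps the number of attempts a packet needed before it succeeded to one of the three channel states defined above. The thresholds and $R_{\max} = 4$ come from the text. The enum, the function name and the choice to reject counts outside 1 to $R_{\max}$ are our assumptions.

```python
from enum import Enum

R_MAX = 4                          # typical IEEE 802.11 attempt limit quoted in the text
L1, L2 = 1, 2                      # attempts indicating good and moderate channels
L3 = tuple(range(3, R_MAX + 1))    # "L3 = 3 or 4" attempts indicate a bad channel


class ChannelState(Enum):
    GOOD = "good"
    MODERATE = "moderate"
    BAD = "bad"


def channel_state(attempts: int) -> ChannelState:
    """Classify the channel from the attempts used by the last successfully sent packet."""
    if attempts == L1:
        return ChannelState.GOOD
    if attempts == L2:
        return ChannelState.MODERATE
    if attempts in L3:
        return ChannelState.BAD
    raise ValueError(f"attempts must be in 1..{R_MAX}, got {attempts}")
```

For example, `channel_state(3)` and `channel_state(4)` both return `ChannelState.BAD`.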

B. The Optimization Problem

Define a vector $\mathbf{G}$ that denotes the application layer parameters:

$$\mathbf{G} = \left\{ B_t(yT),\ B_d(yT),\ R_{bg}^{(y)}(T) \right\}, \qquad (11)$$

where $B_t(yT)$, $B_d(yT)$ and $R_{bg}^{(y)}(T)$ are given by (7), (10) and (2), respectively.

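Problem P1:

$$\begin{aligned} & \arg\min_{\mathbf{G}} \left\{ \mathrm{MSE}(y) \right\}, \\ & \text{subject to: } \\ & \quad \text{C1: } 0 < B_d(yT) \le B_d^{\max}, \\ & \quad \text{C2: } 0 < B_t(yT) \le B_t^{\max}. \end{aligned} \qquad (12)$$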

Here, MSE(y) is given by (1). According to (12), the goal is to find the application layer parameter vector $\mathbf{G}$ that minimizes video distortion without violating the buffer constraints.
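This section does not fix a search method for Problem P1. A per-frame greedy search is one minimal way to see how the buffer constraints interact with the decision variable $R_{bg}^{(y)}(T)$. For each candidate bit budget, the search updates $B_t$ with recursion (7), discards candidates that violate the decoder or transcoder buffer bounds (coded here as $0 < B \le B^{\max}$), and keeps the feasible candidate with the lowest distortion.

Several parts of this sketch are hypothetical placeholders rather than the paper's model or procedure:

- the distortion model `1e6 / r`, which decreases with the bit budget;
- the candidate list;
- the greedy strategy itself.

```python
from typing import Callable, Iterable, Optional


def satisfies_constraints(b_t: float, b_d: float, bt_max: float, bd_max: float) -> bool:
    """Buffer constraints of P1: both occupancies positive and within their maximum sizes."""
    return 0 < b_d <= bd_max and 0 < b_t <= bt_max


def choose_bit_budget(
    prev_bt: float,                        # B_t((y-1)T), transcoder occupancy before frame y
    r_bt: float,                           # R_bt^(y)(T), bits drained by the channel this interval
    b_d: float,                            # B_d(yT), decoder occupancy for this frame
    candidates: Iterable[float],           # hypothetical candidate values of R_bg^(y)(T)
    distortion: Callable[[float], float],  # hypothetical model mapping bit budget -> MSE
    bt_max: float,
    bd_max: float,
) -> Optional[float]:
    """Greedy per-frame choice of R_bg^(y)(T); returns None if no candidate is feasible."""
    best: Optional[float] = None
    for r_bg in candidates:
        b_t = prev_bt + r_bg - r_bt  # recursion (7)
        if not satisfies_constraints(b_t, b_d, bt_max, bd_max):
            continue
        if best is None or distortion(r_bg) < distortion(best):
            best = r_bg
    return best
```

In this simplified form, the decoder occupancy does not depend on the current frame's budget. The decoder constraint therefore only gates whether any choice is admissible, while the transcoder constraint bounds the budget from above.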

III. SOLUTION SCENARIOS

Problem P1 is solved by considering two scenarios.

A. Scenario 1: Without Knowledge of Channel Information

When the channel information is not known at the application layer, the transcoding rate cannot be adapted to the channel. Problem P1 is then solved to determine the optimum transcoder and decoder buffer sizes, subject to the requirement that neither buffer underflows or overflows.

For the decoder buffer, both underflow and overflow are critical. Underflow blacks out the terminal screen through packet starvation, while overflow causes packet-dropping that eventually produces jerky video. For the transcoder, buffer overflow is more critical than underflow, because overflow leads to packet-dropping and hence quality loss. Conversely, transcoder buffer underflow does little harm, as the decoder still holds a cushion of D frames (assumption A2) to be displayed at the terminal.

The decoder buffer underflow can be avoided if $0 < B_d(yT)$. Applying (10) and rearranging the terms, the buffer underflow constraint becomes

$$\sum_{j=1}^{y} \left( R_{br}^{(j)}(T) - R_{bt}^{(j+D)}(T) \right) < B_d(DT).$$
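The rearranged underflow condition states that the cumulative deficit of rendered over arriving bits must stay below the initial cushion. In symbols, $\sum_{j=1}^{y} ( R_{br}^{(j)}(T) - R_{bt}^{(j+D)}(T) ) < B_d(DT)$ must hold for every y. The sketch below checks this condition frame by frame and reports the first violation. The function name and the convention of returning `None` when no frame underflows are our own.

```python
from typing import Optional, Sequence


def first_underflow(transmitted: Sequence[float], rendered: Sequence[float], D: int) -> Optional[int]:
    """Return the first frame y with sum_{j<=y} (R_br^(j) - R_bt^(j+D)) >= B_d(DT), else None.

    transmitted[j-1] is R_bt^(j)(T) and rendered[j-1] is R_br^(j)(T), in bits (0-based lists).
    """
    if len(transmitted) < len(rendered) + D:
        raise ValueError("need R_bt for frames 1 .. len(rendered) + D")
    initial = sum(transmitted[:D])  # B_d(DT) from (9)
    deficit = 0.0
    for y in range(1, len(rendered) + 1):
        deficit += rendered[y - 1] - transmitted[y + D - 1]
        if not deficit < initial:  # underflow constraint violated at frame y
            return y
    return None
```

With D = 2 and 1000 bits delivered per frame, the initial cushion is 2000 bits. Rendering 2000 bits in each of the first two frames exhausts the cushion at frame 2.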



2) Step 2:

Find the optimal buffer sizes that minimize the distortion, as given in Section III.A. These sizes set an upper bound on the transcoder and decoder buffer sizes for which the optimization is achieved.

3) Step 3:

For the given frame, the transcoder and decoder buffer sizes are first capped at the fixed values determined in Step 2. New transcoding rates are then calculated so that the constraints are not violated. The target bit-rate is then adjusted according to the channel condition:

- **Bad channel:** we propose to reduce the video transcoding rate further. This reduces the load on the network and smooths out the transcoded video stream.
- **Moderate channel:** we propose to use the calculated video bit-rate as is, taking full advantage of the current channel state.
- **Good channel:** the target bit-rate is increased, exploiting the good channel condition for higher video quality.
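Recursion (7) offers one way to read the first part of Step 3. Suppose the transcoder buffer is capped at a size $B_t^{cap}$ from Step 2. Requiring $B_t(yT) \le B_t^{cap}$ then gives the largest admissible budget, $R_{bg}^{(y)}(T) \le B_t^{cap} - B_t((y-1)T) + R_{bt}^{(y)}(T)$. The sketch below clips a calculated target to that bound. The names `bt_cap`, `max_bits_under_cap` and `cap_target_bits`, and the clipping rule itself, illustrate the constraint; they are not the paper's stated procedure.

```python
def max_bits_under_cap(bt_cap: float, prev_bt: float, r_bt: float) -> float:
    """Largest R_bg^(y)(T) keeping B_t(yT) = prev_bt + R_bg - r_bt at or below bt_cap, from (7)."""
    return bt_cap - prev_bt + r_bt


def cap_target_bits(target: float, bt_cap: float, prev_bt: float, r_bt: float) -> float:
    """Clip a calculated target bit budget so the capped transcoder buffer cannot overflow.

    Returns 0.0 when even an empty frame would leave the buffer above the cap.
    """
    return max(0.0, min(target, max_bits_under_cap(bt_cap, prev_bt, r_bt)))
```

For example, take a 2000-bit cap, a prior occupancy of 1200 bits and 1000 bits drained by the channel. These admit at most 1800 bits, so a 2500-bit target is clipped to 1800 bits while a 1500-bit target passes unchanged.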

When error correction mechanisms such as joint forward error correction (FEC) and automatic repeat request (ARQ) [7] are used for video streaming over wireless networks, the packet transmission information is readily available at the data-link layer. In this paper, a cross-layer signaling strategy conveys this transmission information, and hence the channel condition, to the application layer. The transcoder then uses it for video transcoding. The following algorithm refines the calculated target transcoding rate:

Algorithm I: Refining the Calculated Target Transcoding Rate

Input: number of transmission attempts = L, $R_{bg}^{(y)}(T)$
Output: $R_{bg}^{(y)}(T)$
Begin
  if ($L = L_1$) { /* channel state = Good */
    $R_{bg}^{(y)}(T) = 1.2\, R_{bg}^{(y)}(T)$
  }
  else if ($L = L_2$) { /* channel state = Moderate */
    $R_{bg}^{(y)}(T) = R_{bg}^{(y)}(T)$
  }
  else if ($L = L_3$) { /* channel state = Bad */
    $R_{bg}^{(y)}(T) = 0.8\, R_{bg}^{(y)}(T)$
  }
End
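Algorithm I can be transcribed into runnable form as follows. The factors 1.2, 1.0 and 0.8 and the thresholds come from the algorithm and Section II.A.5, with $L_3$ covering 3 or 4 attempts. Algorithm I leaves attempt counts outside these thresholds unspecified; raising `ValueError` for them is our assumption. The function name is hypothetical.

```python
GOOD_FACTOR, MODERATE_FACTOR, BAD_FACTOR = 1.2, 1.0, 0.8  # empirical factors from Algorithm I
L1, L2 = 1, 2          # attempts indicating good and moderate channels
L3 = (3, 4)            # "L3 = 3 or 4" attempts indicate a bad channel


def refine_target_bits(attempts: int, r_bg: float) -> float:
    """Scale the calculated target R_bg^(y)(T) according to the channel state (Algorithm I).

    `attempts` is the number of transmission attempts L used by the last successful packet.
    """
    if attempts == L1:        # channel state = Good: raise the budget
        return GOOD_FACTOR * r_bg
    if attempts == L2:        # channel state = Moderate: keep the budget
        return MODERATE_FACTOR * r_bg
    if attempts in L3:        # channel state = Bad: lower the budget
        return BAD_FACTOR * r_bg
    raise ValueError(f"unexpected number of transmission attempts: {attempts}")
```

For example, a 1000-bit target becomes about 1200 bits after one attempt, stays at 1000 bits after two, and drops to about 800 bits after three or four.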

Here, $L_1$, $L_2$ and $L_3$ are given in Section II.A.5. The multiplication factors in Algorithm I are empirically determined values that best suit the channel conditions. For the good channel condition, a factor larger than 1.2 would disturb the pre-calculated bit-budget allocation in the H.264 encoder [1] and should therefore be avoided. Also, a lower value (



packet-dropping at the data-link layer, but the rate reduction mechanism proposed in this paper reduces the video bit-rate. This lowers the probability of packet-dropping and hence increases the PSNR. For the good channel condition (i.e., $=10^{-4}$), the rate control mechanism gives a slight PSNR increase of about 0.1 dB to 0.2 dB in all three video sequences. This increase comes from the higher video encoding bit-rate given by Algorithm I.

B. Dynamic Buffer Stabilization

Fig. 3 shows that, under the moderate channel condition ($=10^{-2}$), the average buffer requirement drops by almost an order of magnitude with either the buffer control scheme or the buffer plus rate control scheme. The adaptive schemes need smaller buffers because they calculate buffer sizes in real time for each video frame. Without control, the buffer size is held at a fixed value.

Comparing only the two adaptive schemes, the buffer plus rate control scheme requires about 20 bits more. This is because, when channel information is available, the transcoder gets an additional opportunity to raise the encoding rate when the channel is good and lower it when the channel is bad:

- **Bad channel:** the encoding rate drops, giving fewer bits per frame. However, the packet error probability increases, which negates the effect of the lower bit-rate on the decoder buffer.
- **Good channel:** the higher encoding bit-rate translates into a higher buffer requirement. This increase is not significant compared with the fixed buffer case.
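A toy calculation makes the contrast between fixed and dynamic buffer sizing concrete. A fixed scheme reserves its worst-case size for every frame, while a dynamic scheme reserves only what each frame's occupancy requires. The occupancy values and worst-case size in the example are made-up numbers for illustration, not the simulation data behind Fig. 3. The optional `headroom` safety margin and the function names are hypothetical.

```python
from typing import List, Sequence


def fixed_allocation(occupancy: Sequence[float], worst_case: float) -> List[float]:
    """Fixed sizing: the worst-case buffer size is reserved for every frame."""
    if max(occupancy) > worst_case:
        raise ValueError("occupancy exceeds the fixed buffer size (overflow)")
    return [worst_case] * len(occupancy)


def dynamic_allocation(occupancy: Sequence[float], headroom: float = 0.0) -> List[float]:
    """Dynamic sizing: each frame reserves its own occupancy plus an optional safety margin."""
    return [b + headroom for b in occupancy]


def mean(values: Sequence[float]) -> float:
    """Average reserved buffer size per frame."""
    return sum(values) / len(values)
```

As an example, take made-up occupancies of 1200, 1500, 900 and 1400 bits and a 16000-bit worst-case size. The fixed scheme reserves 16000 bits per frame, while the dynamic scheme reserves 1250 bits on average. The ratio depends entirely on how pessimistic the worst-case size is.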

V. CONCLUSION

In this paper, we have presented a technique for dynamically optimizing the application layer parameters and compared it against the case where no such scheme is implemented. When channel information is available at the application layer, video quality improves by up to 1 dB, which translates into a better viewing experience. The technique also reduces the application layer buffer requirement by approximately an order of magnitude, lowering memory requirements.

ACKNOWLEDGMENT

The authors acknowledge the support of the University of Calgary, TRLabs and the Natural Sciences and Engineering Research Council (NSERC) of Canada for this research.

REFERENCES

[1] ITU-T and ISO/IEC JTC1, "Advanced video coding for generic audiovisual services," ITU-T Recommendation H.264 ISO/IEC 14496 AVC, 2003.

[2] M. Yokotsuka, "Memory motivates cell-phone growth," Wireless Systems Design, vol. 9, no. 3, 2004, pp. 27-30.

[3] Z. Lei and N. D. Georganas, "Adaptive video transcoding and streaming over wireless channels," Journal of Systems and Software, March 2004, pp. 253-270.

[4] M. van der Schaar and D. S. Turaga, "Cross-layer packetization and retransmission strategies for delay-sensitive wireless multimedia transmission," IEEE Transactions on Multimedia, vol. 9, no. 1, Jan. 2007, pp. 185-197.

[5] J. Lee and M. Kang, "Design of a dynamic bandwidth reallocation scheme for hot-spot video stream transmission over the IEEE 802.11 WLAN," 2006 IEEE Region 10 Conference (TENCON), Nov. 2006.

[6] V. Sgardoni, P. Ferre, A. Doufexi, A. Nix and D. Bull, "Frame delay and loss analysis for video transmission over time-correlated 802.11a/g channels," IEEE 18th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 2007), 3-7 Sept. 2007.

[7] A. Moid and A. O. Fapojuwo, "An analytical model for optimum byte-level and packet-level FEC assignment using buffer dynamics," Research Letters in Communications, Article ID 546184, 2008.

[8] JM Reference Software, available at: http://iphome.hhi.de/suehring/tml, accessed on June 20, 2008.

[9] Network Simulator (NS2), available at: http://www.isi.edu/nsnam/ns, accessed on May 03, 2007.

Figure 2: Comparison of the average PSNR values

Figure 3: Comparison of decoder buffer sizes. [Figure: decoder buffer size (bits) for the Foreman, Container and Akiyo video sequences under No Control, Buffer Control (w/o channel info) and Buffer+Rate Control (w/ channel info).]
