Application Layer Optimization for Efficient Video


  • 8/6/2019 Application Layer Optimization for Efficient Video


    APPLICATION LAYER OPTIMIZATION FOR EFFICIENT VIDEO STREAMING OVER IEEE 802.11 BASED WIRELESS NETWORKS

    Azfar Moid and Abraham O. Fapojuwo
    Department of Electrical and Computer Engineering
    University of Calgary, AB, Canada, T2N 1N4


    Abstract: Most existing video streaming systems use worst-case analysis to dimension application layer buffers. Although worst-case buffer dimensioning provides the deterministic quality of service (QoS) guarantees that are desirable in multimedia transmission, it also over-provisions scarce memory resources. In this paper, we propose a dynamic technique for buffer and rate allocation under two scenarios: 1) when the channel conditions are known a priori, and 2) when the channel conditions are unknown. Simulation results show up to an order of magnitude savings in the application layer buffer requirements for both scenarios. Furthermore, a priori knowledge of the channel conditions at the application layer yields improved video quality.

    Index Terms: Buffer dimensioning, Video streaming, Wireless networks, Rate control.

    I. INTRODUCTION

    The H.264 video format is the latest state-of-the-art international video coding standard developed by the Joint Video Team (JVT) of ITU-T and ISO/IEC [1], and it is also well suited to video streaming. For video streaming over wireless networks, high efficiency can be achieved by making mobile devices adapt to the network conditions (e.g., the real-time channel conditions, available transmission bandwidth, traffic load, desired spatial or temporal resolution, delay allowance, and error resilience).

    The dynamic content of video frames makes the bit-rate of the encoded video variable, which necessitates buffer management at the application layers of the transcoder and decoder. To avoid long delays in real-time video streaming, the transcoder and decoder buffer sizes are usually limited. However, smaller buffers carry an inherent risk of packet dropping at the application layer. Moreover, on client devices, memory is an important contributor to the overall power budget [2]. Hence, an optimized solution is required at the application layer to balance the tradeoff between packet delay and packet dropping. The key to application layer buffer management is the rate-control scheme employed. A rate-control scheme determines the optimum encoding rate, which is used during video compression to adjust the coding parameters, e.g., the quantization parameter (QP), so that the application layer buffers neither overflow nor underflow.

    The motivation for this work is that dynamic buffer management for video coding has not been well studied in the literature, mainly because of the variable sizes of video frames. In MPEG-4 video sequences, a fixed group of pictures (GOP) size causes intra (I-) frames to be inserted periodically, which makes the frame sizes variable. In the H.264 video encoding scheme, where only one or very few I-frames are used to encode the video sequence [1], the number of bits generated per frame is nearly the same for successive frames, unlike in the MPEG-4 scheme. Therefore, the transcoder and decoder buffer dynamics need to be revisited to obtain optimized application layer buffer sizes under the constraints of avoiding buffer overflow and underflow. For example, Reference [3] employed transcoding ratio constraints to avoid overflow and underflow of the transcoder and decoder buffers, regardless of the video encoding scheme but assuming fixed buffer sizes. This paper shows that dynamic buffer sizes, used in conjunction with encoding-rate control, help prevent buffer overflow and underflow. The advantage of dynamic buffer sizes over fixed buffer sizes is the saving in application layer memory.

    The main contribution of this paper is a technique for dynamic buffer and rate-control management, studied with and without a priori knowledge of channel information at the application layer. Aside from the memory savings, it is also shown that a priori knowledge of channel information at the application layer enhances the video quality. The problem is formulated as an optimization problem whose goal is to minimize the distortion without violating the buffer constraints. Throughout the paper, a packet refers to an IEEE 802.11 data-link layer protocol data unit, whereas a frame denotes a video frame at the application layer.

    The paper is organized as follows. In Section II, the preliminaries of the analysis are discussed first, followed by a formal definition of the problem. Section III presents the proposed solution scenarios for dynamically controlling the transcoder and decoder buffers. Section IV presents simulation results, and Section V concludes the paper.

    978-1-4244-3508-1/09/$25.00 2009 IEEE 789


    II. PROBLEM FORMULATION

    A. Preliminaries

    1) Model Assumptions

    The system model assumptions are as follows:

    A1. The maximum sizes of the transcoder and decoder buffers are limited and denoted by $B_t^{\max}$ and $B_d^{\max}$ (in bits), respectively.

    A2. The decoder waits for $D$ video frames in its buffer before starting the decoding process. A minimum threshold number of frames must be kept in the decoder buffer to provide a cushion against blackout periods in the event of buffer underflow.

    A3. The decoder buffer is considered empty when there are only $D$ frames in the buffer. The deadline by which the next frame should arrive is assumed to be set so that the threshold of $D$ frames in the decoder buffer is maintained.

    A4. The transcoder and decoder buffers are empty at startup time $t = 0$, i.e., $B_t(0) = 0$ and $B_d(0) = 0$. Here, $B_t(t)$ and $B_d(t)$ are the transcoder and decoder buffer occupancies (in bits) at time $t$.

    2) Video Distortion

    Video distortion measures the pixel quality of the received video relative to the transmitted video. For a given frame $y$, it is usually estimated as the mean-square error (MSE) between the pixel values $f$ of the transmitted frame and the pixel values $\tilde{f}$ of the received frame, as given in (1):

    $$\mathrm{MSE}(y) = E\left\{ \sum_{z=1}^{N_{SL}} \sum_{s=1}^{N_{MB}} \sum_{g=1}^{N_{PX}} \left( f_{z,s,g}^{(y)} - \tilde{f}_{z,s,g}^{(y)} \right)^{2} \right\} \tag{1}$$

    In (1), $N_{SL}$ is the number of slices in frame $y$, $N_{MB}$ is the number of macro-blocks in a slice, and $N_{PX}$ is the number of pixels in a macro-block; $z$, $s$ and $g$ index the slice, macro-block and pixel, respectively.
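The triple sum in (1) can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: it assumes each frame is given as nested lists (slices, then macro-blocks, then pixel values), and it approximates the expectation $E\{\cdot\}$ by averaging over a caller-supplied list of received-frame realizations, for example from repeated channel simulations. The names `squared_error` and `expected_distortion` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical layout: frame[z][s][g] is pixel g of macro-block s in slice z.
Frame = Sequence[Sequence[Sequence[float]]]


def squared_error(sent: Frame, received: Frame) -> float:
    """Triple sum inside E{.} in eq. (1) for one received realization."""
    if len(sent) != len(received):
        raise ValueError("frames differ in number of slices")
    total = 0.0
    for slice_tx, slice_rx in zip(sent, received):        # z = 1..N_SL
        if len(slice_tx) != len(slice_rx):
            raise ValueError("slices differ in number of macro-blocks")
        for mb_tx, mb_rx in zip(slice_tx, slice_rx):      # s = 1..N_MB
            if len(mb_tx) != len(mb_rx):
                raise ValueError("macro-blocks differ in number of pixels")
            # g = 1..N_PX: squared pixel differences within the macro-block
            total += sum((f - f_rx) ** 2 for f, f_rx in zip(mb_tx, mb_rx))
    return total


def expected_distortion(sent: Frame, realizations: List[Frame]) -> float:
    """Sample-mean estimate of MSE(y): averages squared_error over realizations."""
    if not realizations:
        raise ValueError("need at least one received realization")
    return sum(squared_error(sent, rx) for rx in realizations) / len(realizations)


# One slice with two 2-pixel macro-blocks; the second realization is corrupted.
sent = [[[10, 20], [30, 40]]]
clean = [[[10, 20], [30, 40]]]
noisy = [[[12, 20], [30, 37]]]
print(expected_distortion(sent, [clean, noisy]))  # (0 + 13) / 2 = 6.5
```

Note that (1) sums rather than averages over pixels; dividing by $N_{SL} N_{MB} N_{PX}$ would give a per-pixel figure if one is needed.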

    3) Transcoder Buffer

    Let $r(t)$ denote the incoming video bit-rate (in bits/s) at the transcoder input, $r'(t)$ the bit-rate (bits/s) of the transcoded video, and $R_c(t)$ the channel bit-rate (bits/s). The transcoded video bit-rate can be written as $r'(t) = \alpha(t)\, r(t)$, where $\alpha(t)$ is a scaling function. After video frame $y$ is processed at the transcoder, the total number of bits $R_{bg}^{(y)}(T)$ generated at the buffer during a video frame interval $T$ is:

    $$R_{bg}^{(y)}(T) = \int_{(y-1)T}^{yT} r'(t)\, dt, \tag{2}$$

    where $y$ is the video frame index and $T$ is the frame inter-arrival time.

    Similarly, the number of bits $R_{bt}^{(y)}(T)$ transmitted from the transcoder buffer during the interval $(y-1)T$ to $yT$ is:

    $$R_{bt}^{(y)}(T) = \int_{(y-1)T}^{yT} R_c(t)\, dt. \tag{3}$$
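Equations (2) and (3) integrate a rate over one frame interval. A minimal numerical sketch, assuming the rates are available as Python callables of time (in seconds) and using the trapezoidal rule as a stand-in for the exact integral; `alpha` stands for the scaling function that maps $r(t)$ to $r'(t)$, and all helper names and the constant-rate example are hypothetical:

```python
from typing import Callable, List

Rate = Callable[[float], float]  # bits/s as a function of time t (s)


def integrate_rate(rate: Rate, start: float, end: float, steps: int = 1000) -> float:
    """Trapezoidal approximation of the integral of rate(t) over [start, end], in bits."""
    h = (end - start) / steps
    total = 0.5 * (rate(start) + rate(end))
    for k in range(1, steps):
        total += rate(start + k * h)
    return total * h


def per_frame_bits(rate: Rate, num_frames: int, frame_interval: float) -> List[float]:
    """Bits in each interval [(y-1)T, yT] for y = 1..num_frames, as in eqs. (2) and (3)."""
    return [integrate_rate(rate, (y - 1) * frame_interval, y * frame_interval)
            for y in range(1, num_frames + 1)]


def transcoded_rate(alpha: Rate, r: Rate) -> Rate:
    """r'(t) = alpha(t) * r(t), with alpha the (hypothetically named) scaling function."""
    return lambda t: alpha(t) * r(t)


T = 1 / 30                                            # 30 frames/s
r_in = lambda t: 400_000.0                            # constant 400 kbit/s input
r_out = transcoded_rate(lambda t: 0.5, r_in)          # halve the rate
generated = per_frame_bits(r_out, 3, T)               # R_bg^(y)(T), eq. (2)
channel = per_frame_bits(lambda t: 150_000.0, 3, T)   # R_bt^(y)(T), eq. (3), R_c = 150 kbit/s
```

With these assumed rates each frame generates about 6667 bits while the channel drains 5000 bits per frame; the trapezoidal rule is exact for constant and linear rates.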

    By assumption A4, the instantaneous transcoder buffer occupancy at any time $t$ is:

    $$B_t(t) = \int_{0}^{t} \left[ r'(h) - R_c(h) \right] dh. \tag{4}$$

    Specifically, the transcoder buffer occupancy after transcoding $y$ frames is:

    $$B_t(yT) = \int_{0}^{yT} \left[ r'(h) - R_c(h) \right] dh. \tag{5}$$

    This can also be written in discrete form as:

    $$B_t(yT) = \sum_{j=1}^{y} \left[ R_{bg}^{(j)}(T) - R_{bt}^{(j)}(T) \right], \tag{6}$$

    where $j$ is the frame index.

    Expression (6) shows that the buffer occupancy after transcoding the $y$th frame is simply the sum of all bits accumulated at the transcoder buffer during the interval $0$ to $yT$. Equation (6) can also be written recursively, expressing the buffer occupancy after transcoding the $y$th frame in terms of the occupancy after transcoding the $(y-1)$th frame:

    $$\begin{aligned} B_t(yT) &= \sum_{j=1}^{y-1} \left[ R_{bg}^{(j)}(T) - R_{bt}^{(j)}(T) \right] + R_{bg}^{(y)}(T) - R_{bt}^{(y)}(T) \\ &= B_t\big((y-1)T\big) + R_{bg}^{(y)}(T) - R_{bt}^{(y)}(T). \end{aligned} \tag{7}$$
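The recursion (7) maps directly to a per-frame loop. The sketch below assumes the per-frame bit counts from (2) and (3) are already known and computes $B_t(yT)$ both recursively and as the cumulative sum (6), so the two forms can be checked against each other. Like (4) to (7), it does not clip the occupancy at zero, so a negative value would flag transcoder underflow. The function names are illustrative.

```python
from itertools import accumulate
from typing import List, Sequence


def transcoder_occupancy(generated: Sequence[float], transmitted: Sequence[float]) -> List[float]:
    """B_t(yT) for y = 1..N via the recursion (7), starting from B_t(0) = 0 (assumption A4)."""
    if len(generated) != len(transmitted):
        raise ValueError("need one R_bg and one R_bt value per frame")
    occupancy: List[float] = []
    b_t = 0.0
    for r_bg, r_bt in zip(generated, transmitted):
        b_t = b_t + r_bg - r_bt  # B_t(yT) = B_t((y-1)T) + R_bg^(y) - R_bt^(y)
        occupancy.append(b_t)
    return occupancy


def transcoder_occupancy_sum(generated: Sequence[float], transmitted: Sequence[float]) -> List[float]:
    """Same quantity via the cumulative sum (6)."""
    return list(accumulate(g - t for g, t in zip(generated, transmitted)))


gen = [5000.0, 6000.0, 4000.0]
tx = [4500.0, 5500.0, 5000.0]
print(transcoder_occupancy(gen, tx))      # [500.0, 1000.0, 0.0]
print(transcoder_occupancy_sum(gen, tx))  # [500.0, 1000.0, 0.0]
```

The recursive form is the natural one for an online transcoder, since it needs only the previous occupancy rather than the full history.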

    4) Decoder Buffer

    Let $r''(t)$ denote the rate (in bits/s) at which the video sequence is rendered to the user terminal. The number of bits $R_{br}^{(y)}(T)$ rendered to the video terminal during the interval $(y-1)T$ to $yT$ is:

    $$R_{br}^{(y)}(T) = \int_{(y-1)T}^{yT} r''(t)\, dt. \tag{8}$$

    According to assumption A2, the decoder waits for $D$ frames before starting the decoding process, which corresponds to a delay of $DT$ seconds. The initial decoder buffer occupancy at $t = DT$ is:

    $$B_d(DT) = \sum_{j=1}^{D} R_{bt}^{(j)}(T). \tag{9}$$

    In general, the decoder buffer occupancy after decoding the $y$th frame is:

    $$B_d(yT) = B_d(DT) + \sum_{j=1}^{y} \left[ R_{bt}^{(D+j)}(T) - R_{br}^{(j)}(T) \right]. \tag{10}$$

    Expression (10) shows that the instantaneous decoder buffer occupancy is a function of the initial buffer occupancy and the bits accumulated at the decoder buffer.
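Equations (9) and (10) can be traced frame by frame in the same way. In this hypothetical sketch, `transmitted[j-1]` holds $R_{bt}^{(j)}(T)$ and `rendered[j-1]` holds $R_{br}^{(j)}(T)$; the first $D$ transmitted frames fill the buffer before rendering starts, and frame $D+j$ arrives while frame $j$ is rendered, matching the indices in (10).

```python
from typing import List, Sequence


def decoder_occupancy(transmitted: Sequence[float], rendered: Sequence[float], d: int) -> List[float]:
    """Return [B_d(DT), B_d(1T), ..., B_d(yT)] using eqs. (9) and (10).

    transmitted[j-1] = R_bt^(j)(T) and rendered[j-1] = R_br^(j)(T).
    """
    if d < 1:
        raise ValueError("start-up delay D must be at least one frame")
    if len(transmitted) < d + len(rendered):
        raise ValueError("need R_bt for frames 1..D+y")
    b_d = sum(transmitted[:d])  # eq. (9): frames 1..D fill the buffer
    trajectory = [b_d]
    for j, r_br in enumerate(rendered, start=1):
        b_d += transmitted[d + j - 1] - r_br  # adds R_bt^(D+j) - R_br^(j), eq. (10)
        trajectory.append(b_d)
    return trajectory


tx = [1000, 1000, 1000, 800, 1200]  # bits delivered per frame interval
rendered = [1000, 1000, 1000]       # bits consumed per rendered frame
print(decoder_occupancy(tx, rendered, d=2))  # [2000, 2000, 1800, 2000]
```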

    5) Channel Estimation

    As described in [4] and [5], for an IEEE 802.11 wireless channel, the channel information can be estimated at the data-link layer from the number of transmission attempts. Each transmission attempt at the data-link layer costs a round-trip time (RTT), which is a measure of the delay on the network. Because of this RTT cost, the maximum number of transmission attempts ($R_{max}$) is limited for time-sensitive applications such as video streaming. According to [5], a number of transmission attempts that reaches $R_{max}$ indicates a bad network condition. The typical $R_{max}$ value for IEEE 802.11 based wireless networks is 4 [6]. In this paper, we introduce three thresholds, $L_1$, $L_2$ and $L_3$, on the number of packet transmission attempts to define the state of the channel. We assume that $L_1 = 1$ transmission attempt indicates a good channel, $L_2 = 2$ packet transmission attempts indicate a moderate channel, and $L_3 = 3$ or 4 packet transmission attempts denote a bad channel; this setting is consistent with [4] and [5]. The channel information (i.e., good, moderate or bad channel condition) is available after each data-link layer packet is transmitted successfully, and this information is used to encode the next video frame.
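The threshold mapping above is simple enough to state as code. A minimal sketch, assuming $R_{max} = 4$ as in [6] and treating any count from 3 up to $R_{max}$ as a bad channel; the enum and function names are hypothetical, and `state_for_next_frame` merely illustrates one reading of the text, in which the most recent successful packet selects the state used for the next video frame.

```python
from enum import Enum
from typing import Sequence

R_MAX = 4  # typical IEEE 802.11 retry limit, per [6]


class ChannelState(Enum):
    GOOD = "good"          # L1 = 1 attempt
    MODERATE = "moderate"  # L2 = 2 attempts
    BAD = "bad"            # L3 = 3 or 4 attempts


def classify_channel(attempts: int, r_max: int = R_MAX) -> ChannelState:
    """Map the transmission attempts of one successfully sent packet to a channel state."""
    if not 1 <= attempts <= r_max:
        raise ValueError(f"attempts must be in 1..{r_max}, got {attempts}")
    if attempts == 1:
        return ChannelState.GOOD
    if attempts == 2:
        return ChannelState.MODERATE
    return ChannelState.BAD


def state_for_next_frame(attempt_log: Sequence[int]) -> ChannelState:
    """Illustrative policy: use the most recent successful packet's attempt count."""
    if not attempt_log:
        raise ValueError("no packet has been transmitted successfully yet")
    return classify_channel(attempt_log[-1])


print(state_for_next_frame([1, 1, 3]))  # ChannelState.BAD
```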

    B. The Optimization Problem

    Define a vector $\mathbf{G}$ of the application layer parameters:

    $$\mathbf{G} = \left\{ B_t(yT),\; B_d(yT),\; R_{bg}^{(y)}(T) \right\}, \tag{11}$$

    where $B_t(yT)$, $B_d(yT)$ and $R_{bg}^{(y)}(T)$ are given by (7), (10) and (2), respectively.

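    Problem P1:

    $$\begin{aligned} &\arg\min_{\mathbf{G}} \left\{ \mathrm{MSE}(y) \right\}, \\ &\text{subject to:} \\ &\quad C1:\; 0 < B_d(yT) \le B_d^{\max}, \\ &\quad C2:\; 0 < B_t(yT) \le B_t^{\max}. \end{aligned} \tag{12}$$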

    where $\mathrm{MSE}(y)$ is given by (1). According to (12), the goal is to find the application layer parameter vector $\mathbf{G}$ that minimizes the video distortion without violating the buffer constraints.
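This section does not spell out a solver, so the following is only an illustrative sketch: it checks the two buffer constraints of (12) for a finite set of candidate vectors $\mathbf{G}$ and returns the feasible candidate with the smallest distortion. The exhaustive search over candidates, the caller-supplied `distortion` function (standing in for an MSE estimate from (1)), and the inequality forms $0 < B \le B^{\max}$ are all assumptions, as are the class and function names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class AppParams:
    """One candidate for the vector G in eq. (11) at frame y."""
    b_t: float   # B_t(yT), bits
    b_d: float   # B_d(yT), bits
    r_bg: float  # R_bg^(y)(T), bits


def feasible(g: AppParams, bt_max: float, bd_max: float) -> bool:
    """Constraints C1 and C2, written here as 0 < B <= B^max (assumed inequality forms)."""
    return 0 < g.b_d <= bd_max and 0 < g.b_t <= bt_max


def solve_p1(candidates: Iterable[AppParams],
             distortion: Callable[[AppParams], float],
             bt_max: float, bd_max: float) -> Optional[AppParams]:
    """Exhaustive arg-min of the distortion over feasible candidates; None if none is feasible."""
    feasible_set = [g for g in candidates if feasible(g, bt_max, bd_max)]
    if not feasible_set:
        return None
    return min(feasible_set, key=distortion)


# Toy distortion proxy (hypothetical): more bits for the frame -> lower distortion.
candidates = [AppParams(100, 500, 1000), AppParams(200, 600, 2000), AppParams(5000, 700, 4000)]
best = solve_p1(candidates, lambda g: 1e6 / g.r_bg, bt_max=1000, bd_max=1000)
print(best)  # AppParams(b_t=200, b_d=600, r_bg=2000)
```

The third candidate would give the lowest distortion but violates the transcoder bound, which is exactly the tradeoff P1 encodes.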

    III. SOLUTION SCENARIOS

    Problem P1 is solved by considering two scenarios.

    A. Scenario 1: Without Knowledge of Channel Information

    When the channel information is not known at the application layer, the transcoding rate cannot be adapted to the channel. Problem P1 is then solved to determine the optimum transcoder and decoder buffer sizes, subject to the condition that neither buffer underflows nor overflows.

    For the decoder buffer, both underflow and overflow are critical: the former leads to terminal screen blackout due to packet starvation, while the latter causes packet dropping that eventually leads to video jerks. For the transcoder, buffer overflow is more critical than underflow because overflow leads to packet dropping and hence quality loss. Conversely, transcoder buffer underflow does little harm, as the decoder still holds a cushion of packets (assumption A2) to be displayed at the terminal.

    Decoder buffer underflow is avoided if $0 < B_d(yT)$. Applying (10) and rearranging the terms, the buffer underflow constraint becomes:

    $$\sum_{j=1}^{y} \left[ R_{br}^{(j)}(T) - R_{bt}^{(D+j)}(T) \right] < B_d(DT).$$
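The rearranged underflow constraint can be checked frame by frame, and the same trace gives the peak decoder occupancy, i.e., the smallest decoder buffer that holds every occupancy on that trace. This is a hedged sketch of trace-based dimensioning, not the paper's dynamic allocation procedure; it assumes `transmitted[j-1]` is $R_{bt}^{(j)}(T)$ and `rendered[j-1]` is $R_{br}^{(j)}(T)$, and the function names are hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def _check_lengths(transmitted: Sequence[float], rendered: Sequence[float], d: int) -> None:
    if d < 1 or len(transmitted) < d + len(rendered):
        raise ValueError("need D >= 1 and R_bt values for frames 1..D+y")


def underflow_free(transmitted: Sequence[float], rendered: Sequence[float], d: int) -> bool:
    """True if sum_{j<=y} (R_br^(j) - R_bt^(D+j)) < B_d(DT) holds for every y."""
    _check_lengths(transmitted, rendered, d)
    b0 = sum(transmitted[:d])  # B_d(DT), eq. (9)
    # 0-based k: rendered[k] is R_br^(k+1), transmitted[d + k] is R_bt^(D+k+1)
    deficits = accumulate(rendered[k] - transmitted[d + k] for k in range(len(rendered)))
    return all(deficit < b0 for deficit in deficits)


def peak_decoder_occupancy(transmitted: Sequence[float], rendered: Sequence[float], d: int) -> float:
    """max over y of B_d(yT), including B_d(DT): the buffer size this trace needs."""
    _check_lengths(transmitted, rendered, d)
    b0 = sum(transmitted[:d])
    surpluses = accumulate(transmitted[d + k] - rendered[k] for k in range(len(rendered)))
    return max([b0] + [b0 + s for s in surpluses])


tx = [1000, 1000, 1500, 1500, 1000]
rendered = [1000, 1000, 1000]
print(underflow_free(tx, rendered, d=2))          # True
print(peak_decoder_occupancy(tx, rendered, d=2))  # 3000
```

Recomputing such a bound per frame, rather than fixing it once from a worst case, is the kind of saving the paper targets.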

    =


    2) Step 2:

    Find the optimal buffer sizes for which the distortion can be minimized, as given in Section III.A. This sets an upper bound on the sizes of the transcoder and decoder buffers for which the optimization is achieved.

    3) Step 3:

    For the given frame, after capping the transcoder and decoder buffer sizes at the fixed values determined in Step 2, the new transcoding rates are calculated to ensure that the constraints are not violated. We propose to further reduce the video transcoding rate when the channel condition is bad. This not only reduces the load on the network but also smooths out the transcoded video stream. For the moderate channel, we propose to use the calculated video bit-rate as is, to take full advantage of the current channel state. When the channel condition is good, the target bit-rate is increased to exploit the good channel for higher video quality.
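Under the recursion for the transcoder buffer, $B_t(yT) = B_t\big((y-1)T\big) + R_{bg}^{(y)}(T) - R_{bt}^{(y)}(T)$, keeping the transcoder within its capped size $B_t^{\max}$ bounds the bits that frame $y$ may generate:

$$R_{bg}^{(y)}(T) \le B_t^{\max} - B_t\big((y-1)T\big) + R_{bt}^{(y)}(T).$$

A minimal sketch of applying this cap to a target bit budget follows. It assumes that an estimate of $R_{bt}^{(y)}(T)$, the bits the channel will drain during frame $y$, is available before encoding; how that estimate is obtained is outside this section, and the function name is hypothetical.

```python
def cap_target_bits(target_bits: float, prev_occupancy: float,
                    expected_drain: float, bt_max: float) -> float:
    """Clamp R_bg^(y)(T) so that B_t(yT) stays within bt_max.

    prev_occupancy: B_t((y-1)T); expected_drain: an assumed estimate of R_bt^(y)(T).
    """
    ceiling = bt_max - prev_occupancy + expected_drain
    # Never return a negative bit budget; 0 means the frame gets no new bits.
    return max(0.0, min(target_bits, ceiling))


print(cap_target_bits(8000, prev_occupancy=3000, expected_drain=5000, bt_max=4000))  # 6000
```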

    When error correction mechanisms, e.g., joint forward error correction (FEC) and automatic repeat request (ARQ) [7], are used for video streaming over wireless networks, the packet transmission information is readily available at the data-link layer. In this paper, we use a cross-layer signaling strategy to convey the transmission information, and hence the channel condition, to the application layer, where the transcoder uses this information for video transcoding. The algorithm for refining the calculated target transcoding rate is as follows:

    Algorithm I: Refining the Calculated Target Transcoding Rate

    Input: number of transmission attempts $L$, $R_{bg}^{(y)}(T)$
    Output: $R_{bg}^{(y)}(T)$
    Begin
        if ($L = L_1$) {            /* channel state = Good */
            $R_{bg}^{(y)}(T) = 1.2\, R_{bg}^{(y)}(T)$
        }
        else if ($L = L_2$) {       /* channel state = Moderate */
            $R_{bg}^{(y)}(T) = R_{bg}^{(y)}(T)$
        }
        else if ($L = L_3$) {       /* channel state = Bad */
            $R_{bg}^{(y)}(T) = 0.8\, R_{bg}^{(y)}(T)$
        }
    End
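Algorithm I translates directly into a small function. The sketch below is a hedged Python rendering, assuming the thresholds of Section II.A.5 ($L_1 = 1$, $L_2 = 2$, $L_3 \in \{3, 4\}$) and the empirical factors 1.2 and 0.8; it raises an error for attempt counts outside these thresholds, a case the algorithm does not specify. The constant and function names are illustrative.

```python
L1 = 1        # good channel: one transmission attempt
L2 = 2        # moderate channel: two attempts
L3 = (3, 4)   # bad channel: three or four attempts (R_max = 4)

GOOD_FACTOR = 1.2  # empirical factor for a good channel
BAD_FACTOR = 0.8   # empirical factor for a bad channel


def refine_target_bits(attempts: int, target_bits: float) -> float:
    """Algorithm I: scale the calculated R_bg^(y)(T) according to the observed channel state."""
    if attempts == L1:    # channel state = Good: raise the target
        return GOOD_FACTOR * target_bits
    if attempts == L2:    # channel state = Moderate: keep the calculated target
        return target_bits
    if attempts in L3:    # channel state = Bad: lower the target
        return BAD_FACTOR * target_bits
    raise ValueError(f"attempt count {attempts} is outside the thresholds L1, L2, L3")


for attempts in (1, 2, 4):
    print(attempts, refine_target_bits(attempts, 6000.0))
```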

    where $L_1$, $L_2$ and $L_3$ are given in Section II.A.5. Note that the multiplication factors in Algorithm I are empirically determined values that best suit the channel conditions. For the good channel condition, a multiplication factor larger than 1.2 would disturb the pre-calculated bit-budget allocation in the H.264 encoder [1] and should therefore be avoided. Also, a lower value (


    packet-dropping at the data-link layer, but the rate reduction mechanism proposed in this paper reduces the video bit-rate, thereby lowering the probability of packet dropping and hence increasing the PSNR. For the good channel condition (i.e., $10^{-4}$), a slight PSNR increase of about 0.1 dB to 0.2 dB can be seen in all three video sequences when the rate-control mechanism is used, owing to the increase in video encoding bit-rate given by Algorithm I.

    B. Dynamic Buffer Stabilization

    Fig. 3 shows that, under the moderate channel condition (i.e., $10^{-2}$), the average buffer requirement drops by almost an order of magnitude when either the buffer control or the buffer plus rate control scheme is employed. The adaptive schemes need smaller buffers because the buffer sizes are calculated in real time for each video frame instead of being held at a fixed value, as is the case when there is no control. Comparing only the two adaptive schemes, the buffer plus rate control scheme requires about 20 bits more. This is because, when channel information is available, the transcoder has an additional opportunity to increase or decrease the encoding rate under good or bad channel conditions, respectively. Under the bad channel condition, the encoding rate drops, giving fewer bits per frame; however, the packet error probability increases, which negates the effect of the lower bit-rate on the decoder buffer. Under the good channel condition, the increase in encoding bit-rate translates into a higher buffer requirement, but this increase is not significant compared with the fixed buffer case.

    V. CONCLUSION

    In this paper, we have presented a technique for dynamically optimizing the application layer parameters and compared it against the case where no such scheme is implemented. It is shown that when the channel information is available at the application layer, the video quality improves by up to 1 dB, which translates into a better viewing experience. In addition, the application layer buffer requirement is reduced by approximately an order of magnitude, thereby decreasing the memory requirement.

    ACKNOWLEDGMENT

    The authors acknowledge the support of the University of Calgary, TRLabs and the Natural Sciences and Engineering Research Council (NSERC) of Canada for this research.

    REFERENCES

    [1] ITU-T and ISO/IEC JTC1, "Advanced video coding for generic audiovisual services," ITU-T Recommendation H.264 / ISO/IEC 14496 AVC, 2003.

    [2] M. Yokotsuka, "Memory motivates cell-phone growth," Wireless Systems Design, vol. 9, no. 3, 2004, pp. 27-30.

    [3] Z. Lei and N. D. Georganas, "Adaptive video transcoding and streaming over wireless channels," Journal of Systems and Software, March 2004, pp. 253-270.

    [4] M. van der Schaar and D. S. Turaga, "Cross-layer packetization and retransmission strategies for delay-sensitive wireless multimedia transmission," IEEE Transactions on Multimedia, vol. 9, no. 1, Jan. 2007, pp. 185-197.

    [5] J. Lee and M. Kang, "Design of a dynamic bandwidth reallocation scheme for hot-spot video stream transmission over the IEEE 802.11 WLAN," 2006 IEEE Region 10 Conference (TENCON), Nov. 2006.

    [6] V. Sgardoni, P. Ferre, A. Doufexi, A. Nix and D. Bull, "Frame delay and loss analysis for video transmission over time-correlated 802.11a/g channels," IEEE 18th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 2007), 3-7 Sept. 2007.

    [7] A. Moid and A. O. Fapojuwo, "An analytical model for optimum byte-level and packet-level FEC assignment using buffer dynamics," Research Letters in Communications, Article ID 546184, 2008.

    [8] JM Reference Software, available at: http://iphome.hhi.de/suehring/tml, accessed on June 20, 2008.

    [9] Network Simulator (NS2), available at: http://www.isi.edu/nsnam/ns, accessed on May 03, 2007.

    Figure 2: Comparison of the average PSNR values

    [Figure 3 plot: decoder buffer size (bits, axis 0 to 20000) for the Foreman, Container and Akiyo video sequences under No Control, Buffer Control (w/o channel info) and Buffer+Rate Control (w/ channel info).]

    Figure 3: Comparison of decoder buffer sizes