1362 IEEE TRANSACTIONS ON ... - antoniosargyriou.netantoniosargyriou.net/papers/jnl_2009_tmm.pdf ·...

1362 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 11, NO. 7, NOVEMBER 2009

Joint Source Coding and Network-SupportedDistributed Error Control for Video Streaming

in Wireless Multihop NetworksHaiyan Luo, Student Member, IEEE, Antonios Argyriou, Member, IEEE, Dalei Wu, Student Member, IEEE, and

Song Ci, Senior Member, IEEE

Abstract—Real-time video communication over wireless mul-tihop networks has gained significant interest in the last few years.In this paper, we focus our attentions on the problem of sourcecoding and link adaptation for packetized video streaming in wire-less multihop networks when network nodes are media-aware.We consider a system where source coding is employed at thevideo encoder by selecting the encoding mode of each individualmacro-block, while error control is exercised through applica-tion-layer retransmissions at each media-aware network node.For this system model, the contribution of each communicationlink on the end-to-end video distortion is considered separately inorder to achieve globally optimal source coding and ARQ errorcontrol. To reach the globally optimal solution, we formulate theproblem of joint source and distributed error control (JSDEC)and devise a low-complexity solution algorithm based on dynamicprogramming. Extensive experiments have been carried out onthe basis of H.264/AVC codec to demonstrate the effectiveness ofthe proposed algorithm over the existing joint source and channelcoding (JSCC) algorithm in terms of PSNR perceived at thedecoder under time-varying multihop wireless links.

Index Terms—Automatic repeat request (ARQ), joint source andchannel coding (JSCC), joint source and distributed error control(JSDEC), proxy, rate-distortion (R-D) model, real-time encoding,video streaming.

I. INTRODUCTION

R EAL-TIME video communication services, such as videotelephony, video conferencing, video gaming, and mobile

TV broadcasting, are considered very important applicationsfor wireless multihop networks. However, the dissemination ofpre-compressed or real-time video over wireless networks ischaracterized by several problems [1]. Most of the problems formedia streaming applications can be identified as the lack ofbandwidth guarantees, random packet losses, and delay jitter.

Manuscript received September 12, 2008; revised May 05, 2009. Firstpublished August 18, 2009; current version published October 16, 2009.This work was supported in part by the NSF under Grants CCF-0830493 andCNS-0707944 and in part by the Layman Foundation. The associate editorcoordinating the review of this manuscript and approving it for publication wasDr. S.-H. Gary Chan.

H. Luo, D. Wu, and S. Ci are with the Department of Computer andElectronics Engineering, University of Nebraska-Lincoln, Omaha, NE68182 USA (e-mail: [email protected]; [email protected];[email protected]).

A. Argyriou is with Philips Research, Eindhoven 5656 AE, The Netherlands(e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMM.2009.2030639

Fig. 1. General scenario addressed by this paper consists of multiple wirelessstations (WSTs) that generate real-time encoded unicast bitstreams and they aresubsequently forwarded by the WSTs in the end-to-end wireless path.

Even though bandwidth is increasingly available for end users,high-quality multimedia streaming applications are still band-width-demanding. Furthermore, delay jitter and packet lossesmake the problem of video delivery more challenging, espe-cially in multihop wireless mesh networks. Most of the real-timevideo encoding and streaming applications (e.g., video confer-encing) usually employ a wide range of intelligent techniques atthe endpoints in order to enhance the user-perceived quality [2].

Nonetheless, a particular characteristic of existing andemerging networks, usually overlooked by video streamingapplications, is that an end-to-end path consists of severalinterconnected physical networks which are heterogeneous andhave asymmetric properties in terms of throughput, delay, andpacket loss. Congested last-mile wireline links (e.g., DSL links)or wireless access networks (e.g., WiFi) are some of the real-lifeexamples. In such networks, employing measurements or con-gestion control in the end-to-end fashion might not provideend-systems with a correct status of the network characteristics[3], [4]. For example, some wireless hops in the end-to-endpath as shown in Fig. 1 may face bad channel conditions. For avideo streaming application, a pure end-system implementationmight unnecessarily limit the choices of the video encoder orthe streaming algorithms with respect to the optimal streamingrate and the error control strategy along the end-to-end path.Therefore, proxy-based solutions are usually proposed to dealwith this situation. However, for real-time video encodingsystems, the majority of the current proxy-based streamingapproaches have mainly focused on wireless networks only atthe last hop [2], [5]. Moreover, even for the wireline networks,

1520-9210/$26.00 © 2009 IEEE

Authorized licensed use limited to: Philips. Downloaded on November 12, 2009 at 08:38 from IEEE Xplore. Restrictions apply.

LUO et al.: JOINT SOURCE CODING AND NETWORK-SUPPORTED DISTRIBUTED ERROR CONTROL 1363

most of the current research only makes use of a single proxy[6], [7].

To address the aforementioned problems in a systematicfashion, in this work we propose a novel mechanism to in-tegrate into a joint optimization framework the parametersthat affect the encoding and transmission of a video stream.Toward this end, we focus on generalizing the approach onthe basis of joint source and channel coding (JSCC) whichcan integrate both network/transport parameters and the sourcecharacteristics for improving the system performance [8]–[10].The objective of JSCC is to distribute the available channel ratebetween source and channel bits so that the decoder distortionis minimized. For JSCC to be optimal, it is imperative that thesender has an accurate estimation of the channel characteristics,such as the available bandwidth, round-trip-time (RTT), packetloss, burst/random errors, congestion, and so on. The moreaccurate this information is, the more efficient the resourceallocation with JSCC will be. However, the situation that wedescribed will essentially translate into sub-optimal JSCCallocation over several physical channels [4], [11].

We have recognized the importance of this situation in [3],where we showed that for streaming fine-granularity scalable(FGS) video, the problem of distributed source/channel codingactually corresponds to a flow control problem. In this paper,we generalize this problem since we focus on the more fun-damental issues of real-time video encoding and transmissionover multiple tandem-connected network elements. We use theterm proxy to refer to such network nodes that are able to per-form media-aware tasks, like error control. We examine the on-line adaptation of the encoding mode of individual macroblocks(source coding) and the level of error control. To emphasize thedifference with JSCC, our proposed algorithm is called jointsource and distributed error control (JSDEC). In this paper, weregard channel error control in terms of retransmission as oneform of channel coding. This is reasonable because, in gen-eral, retransmission is also a mechanism for error correction.To the best of our knowledge, this is the first work that con-siders the problem of exercising error resilient source codingand distributed error control in a joint optimization framework.Our target application scenario is video-conferencing, as Fig. 1indicates. Note that we are not concerned with trans-rating ortrans-coding techniques, which have been heavily studied in theliterature.

The rest of this paper is organized as follows. First, we ex-plain in detail the motivation behind our idea in Section II. Next,we formulate the end-to-end video distortion estimation modelin Section III. The detailed behavior and performance analysisof an individual ARQ proxy is presented in Section IV. We for-mulate a joint rate-distortion (R-D) optimization framework andpresent a solution algorithm in Section V. Section VI presentsin detail our comprehensive experimental results. A thoroughanalysis of related works can be found in Section VII. Finally,Section VIII concludes our work. A summary of the notationused in this paper can be found in Table I.

II. JOINT SOURCE AND DISTRIBUTED ERROR CONTROL FOR

VIDEO STREAMING APPLICATIONS

To illustrate more clearly the main concept behind thiswork, we present in Fig. 2 the R-D performance of a JSCC

TABLE IPAPER NOTATION

Fig. 2. Convex hull of the feasible R-D points for encoded video transmissionfor each hop individually, and connected in tandem.

video streaming system under three different multihop wirelesschannel realizations. The curves represent the convex-hull[12] of all feasible (rate, distortion) points for the same videosequence. More specifically, the two lower curves correspondto video transmission over two different channels that haveavailable channel rates and , respectively. When eachof these channels is considered separately, the tuples of theoptimal source-channel rates that are derived from the JSCCalgorithm are and ,respectively. However, when the channels are connected in



tandem, the form of the operational joint rate-distortion curveis different. The most important implication is that the globallyoptimal source-channel bit allocation is not the same anymore.If JSCC is still employed at the encoder, and we allow thesender to probe the end-to-end path, the bottleneck link in thiscase would determine which is the “optimal” source-channelrate [4], [11]. In case there is spare bandwidth at the second hop,error protection can be increased, and this can be applied at thediscretion of existing proxies that reside between the two links.Although simple, this solution is still sub-optimal, as is demon-strated in the figure, because the server can only select asthe optimal streaming rate. Therefore, an optimal streamingresource allocation algorithm should be proposed to be able toselect the globally optimal R-D operating point. Thus, JSDECis developed in this paper to find this point by achieving thereal “optimal” source-to-channel ratio for any given availablechannel rate. To calculate the solution to this problem, theencoder must have knowledge of both the available channelrate on the next hop (i.e., ) and packet erasure rate (PER)(i.e., ) of that hop. Another way to rephrase the problem athand is the following: The encoder should calculate sourceencoding options, enabling globally optimal rate allocationbetween source and channel bits for all the hops considered inthe end-to-end path.

III. VIDEO ENCODER

Both the distortion estimation and resource allocation algo-rithms are implemented at the streaming server and encoder.The encoder executes the algorithm for joint source andchannel allocation by striving to minimize the global distortion.To achieve this, the encoder calculates for each macroblock theestimated source distortion and channel distortion. The averagechannel distortion is calculated for each packet erasure channel

, based on the reported feedback from the correspondingproxy (packet erasure rate ). By using these estimates, theencoder calculates the optimal encoding mode for each indi-vidual macroblock (source coding), while the optimal numberof retransmissions for the corresponding transport packet(channel coding) is calculated locally by each proxy. There-fore, source-channel coding is applied jointly, i.e., the encoderdecides the encoding mode while it also indirectly determinesthe optimal rate dedicated to channel coding with ARQ. Animportant advantage of the joint source and distributed errorcontrol algorithm is that the proxy does not require the notifica-tion of the optimal rate for channel coding. Furthermore, eachproxy decides individually the maximum number of allowedretransmissions for a particular video packet, since it can de-duce the optimal channel coding rate. The relationships amongencoder, proxy, and ARQ in our proposed JSDEC system areillustrated in Fig. 3.

A. End-to-End Distortion Estimation

The first challenging task is to derive a model for the videodistortion at the decoder that considers the effect of intermediateerror control proxies. Many research efforts have been devel-oped to deal with distortion estimation for hybrid motion-com-pensated video coding and transmission over lossy channels

Fig. 3. Process of video encoding and transmission at the sender/encoder andan intermediate ARQ proxy.

[13]–[16]. In this type of video coders, each video frame is rep-resented in block-shaped units of the associated luminance andchrominance samples (16 16 pixel region) called macroblocks(MBs). In the H.264 codec, macroblocks can be both intra-coded or inter-coded from samples of previous frames [17].Intra-coding is performed in the spatial domain, by referring toneighboring samples of previously coded blocks which are tothe left and/or above the block to be predicted. Inter-coding isperformed with temporal prediction from samples of previousframes.

Many coding options exist for a single macroblock, and eachof them provides different rate-distortion characteristics. In thiswork, only pre-defined macroblock encoding modes are consid-ered, since error resilient source coding is applied by selectingthe encoding mode of each particular macroblock. It is crucialto allow the encoder to trade off bit rate with error resiliency atthe macroblock level. The recursive optimal per-pixel estimate(ROPE) algorithm has been adopted to calculate distortion re-cursively across frames [18], [19], meaning that the estimationof the expected distortion for a frame currently being encodedis derived by considering the total distortion introduced in pre-vious frames.

We consider a -frame video clip .During encoding, each video frame is divided into 16 16macroblocks (MBs), which are numbered in the scan order.In our implementation, the packets are constructed such thateach packet consists of a row of MBs and is independentlydecodable. The terms of row and packet sometimes are inter-changeably used in this paper. When a packet is lost duringtransmission in the network, we use the temporal-replacementerror concealment strategy. Therefore, the motion vector of amissing MB is estimated as the median of motion vectors ofthe nearest three MBs in the preceding row. If the previousrow is lost, too, the estimated motion vector is set to zero. Thepixels in the previous frame, which are pointed by the estimatedmotion vector, are used to replace the missing pixels in thecurrent frame.

Let be the total number of packets in one video frame andthe total number of pixels in one packet. Let us denote

the original value of pixel of the th packet in frame ,the corresponding encoder reconstructed pixel value, the re-constructed pixel value at the decoder, and the expecteddistortion at the receiver for pixel of the th packet in frame

. We use the expected mean-squared error (MSE) as distortionmetric, which is commonly used in the literature [28]. Then thetotal expected distortion for the entire video sequence can



be calculated by summing up the expected distortion of all thepixels

(1)

where

(2)

Since is unknown to the encoder, it can be thought as arandom variable. To compute , the first and second mo-ments of are needed. Readers interested in the calculationof and of the ROPE method can find moredetails in [28]. It is worth mentioning that the calculation of

and depends on the used error concealmentstrategy as well as the packetization scheme. Therefore, the ex-pected distortion can be accurately calculated by ROPE underinstantaneous network conditions, which then becomes the ob-jective function in the proposed optimization framework. Theonly parameter that these formulas require is the residual packeterror rate . This parameter is updated after each packet is en-coded, since the individual contribution of each path is also con-tinuously updated.

B. Computational Complexity Considerations

By adopting the ROPE algorithm, the expected video dis-tortion can be precisely computed for every pixel. However,this advantage of precise distortion approximation comes at thecost of a modicum increase of computational complexity. First,most of the computational overhead comes from the fact thatthe distortion and two moments of for both intra-mode andinter-mode of every pixel need to be calculated. As we discussedearlier, computational complexity is reduced due to the iden-tical error concealment regardless of the macroblock encodingmode. For each pixel in an inter-coded MB, we need 16 ad-dition/multiplication operations for calculating the moments of

, while for intra-coded MB, 11 addition/multiplication oper-ations are necessary for the same operation. Thus, this compu-tational complexity of ROPE is comparable to that of DCT op-erations [17]. Furthermore, the error concealment algorithm hasto be implemented at the encoder for every block, which mightbring additional computational overhead depending on how so-phisticated the concealment algorithm is. In addition, we alsoneed to store two moments as two floating-point numbers forevery pixel. However, this additional storage complexity is neg-ligible considering modern high-performance computers.

IV. ERROR CONTROL PROXY

Despite the importance of the distortion estimation loop, thebasic component of our architecture is the error control proxy.The functionality realized at the proxy is an application-layerdelay-constrained ARQ error control algorithm for videopackets. ARQ is exercised on a local scope only between two

Fig. 4. Two-state Markov chain channel model for a single network hop.

successive nodes (i.e., proxy, sender, or receiver). A negativeacknowledgment scheme is used for this purpose. In practice,the proxy is a network node that terminates RTP sessions [21].This is necessary since it has to monitor the sequence numberspace of each RTP flow. It is worth mentioning that RTP/RTCPwas originally designed for Internet applications, assuminga much lesser varying channel environment. Therefore, theRTCP reports are usually transmitted at relatively long inter-vals, leading to performance degradation when used in wirelessnetworks due to the much unstable channel quality.

Part of the available bandwidth at the proxy is allocatedto forward incoming video packets. The remaining of theavailable bandwidth is used for retransmitting packets locallystored at the proxy. The simplicity of the delay-constrainedARQ algorithm means that the computational overhead isvery low, which is a critical requirement in practice, since weenvision that the ARQ proxy can be implemented as part ofexisting media-aware network nodes [21]. Note that other moreadvanced coding schemes could be adopted at the intermediatenodes (e.g., Raptor codes [32]).

A. Packet Loss Rate

Based on the above design, we can calculate the impact of theARQ algorithm at each proxy on the aggregate packet loss rateand the latency experienced by the transmitted video packets.

The communications network considered in this work ispacket-switched, and the adopted packet loss model is an “era-sure channel”. We consider a Markov chain for characterizingtransitions from good to bad states of each individual channelas shown in Fig. 4. For this channel model, if the packet erasurerate of hop at the network layer is and the average numberof retransmissions for a particular video transport packetis , then the resulting residual packet loss rate is .Sometimes, packet loss is not caused by packet erasurerather the excessive delay for the hop . If the transmissiondelay for hop is , the above probability can be expressed as

. Then the overall packet loss rate due to packeterasure and packet delay is

(3)

An important parameter of this formula is the actual distributionof retransmissions for a given permissible range. This value canbe calculated using the adopted channel model. The probabilityof retransmissions for a successful delivery of packet is givenby

(4)



where is the maximum transmissions for source packeton link . Given and , we can also calculate the averagenumber of retransmissions per packet for hop as

(5)

Equation (4) accounts for the two possible reasons of packetloss in the network, namely, excessive delay and channel era-sure. In the literature, this method of calculating the residualPER was first introduced in the seminal work of [20]. Regarding(5), it captures the channel erasure effect as Bernoulli distribu-tion as well as the effect of truncated ARQ by , which is alsowidely used in the literature [35], [36]. Therefore, the averagepacket transmission delay for each hop is given by

(6)

where and are the forward and the round trip delayof hop , respectively. Thus, within the proposed framework,error control (i.e., channel coding) is exercised locally at eachproxy by enforcing the maximum number of retransmissions foreach specific packet and hop .

Therefore, we can express the overall end-to-end packet lossrate for tandem proxies as

(7)

B. Network Delay Model

The only parameter which needs to be estimated now is theforward delay of each of the tandem-connected links. In thiswork, we model the one-way network delay as a Gamma distri-bution, since this distribution captures the main reason of net-work packet delays, which is due to buffering in the wirelessnodes [22]. The probability density function of the Gamma dis-tribution with rightward shift , parameters and is given by

for

(8)where is the number of end-to-end hops, is the totalend-to-end processing time, and is the waiting time at arouter following the exponential distribution and can be mod-eled as an M/M/1 queue. From the networking aspect, thismodel can be perceived as if the packet traverses a network of

M/M/1 queues, where each of them has a mean processingtime plus waiting time and variance .Therefore, the forward trip delay has a meanand variance . Both and are calculated byperiodically estimating the mean and variance of the forwardand backward trip delays at the receiver and transmitting theirvalues to the sender [20].

V. OPTIMIZATION FRAMEWORK FOR SOURCE CODING

AND DISTRIBUTED CHANNEL CODING

In this section, we jointly optimize the source coding and dis-tributed error control parameters within our proposed frame-work. The goal is to minimize the perceived video distortion

for given available link capacity and packet error rate. First,we formulate the problem as a constrained minimum distor-tion problem. Then, we give the optimal solution using dynamicprogramming.

A. Problem Formulation

The implemented ROPE algorithm can accurately estimatethe expected distortion. Meanwhile, our rate-distortion opti-mization framework takes into account the expected distortiondue to compression at the source, error propagation acrossframes, and error concealment induced by packet losses atthe erasure channel to which each proxy is attached. We in-corporate the proposed distortion model into a rate-distortionoptimization framework, which will be used at the encoder fordetermining the encoding mode of each individual MB and thecorresponding optimal protection with ARQ. In this work, weconsider the bandwidth constraint and the packet loss rate ofeach link. Therefore, the distortion minimization problem hasto satisfy multiple channel rate constraints. Also, a group ofMBs can be coded and packed into the same transport packet,which means that this group of MBs will be subject to the sameerror protection decisions throughout the existing proxies in theend-to-end path.

In the following, we will define the rate-distortion optimiza-tion problem. Let and denote the vectors of sourcecoding and channel error control parameters for the th frame,respectively. Then, the end-to-end distortion minimizationproblem is

(9)

where and are the entire sets of available vectors ofand , respectively, and is the number of tandem-con-nected nodes. , denote the rates allocated to sourcecoding and error control, respectively. is the total availablechannel rate. Furthermore, the available channel rate canbe denoted in terms of channel bandwidth :

(10)

where is the time-varying channel SNR. Therefore, the fluc-tuating channel data rate is reflected. The rate constraint forretransmitted packets must be satisfied by every proxy in theend-to-end path. Recall that the average number of retransmis-sions at each proxy is and the residual network packet lossrate is . Therefore, the bit rate constraint that needs to besatisfied for frame at each proxy is

(11)

where is the size parity bits of packet . The first term in(11) denotes the source coding rate for frame , and the secondterm is the channel rate estimation for the retransmitted packetsof frame . What we have achieved in (11) is to decouple thetransmission rate constraints for the paths connected in tandem.To facilitate further algebraic manipulations, we denote the left-



hand side of (11) as that represents the total bit rate of sourceand channel coding (i.e., ). Therefore, theobjective function and the corresponding constraint of (9) canbe denoted as

(12)

B. Optimal Solution for the Minimum Distortion Problem

For simplicity, let us denote the parameter vector ofpacket in video frame as , where

is the index of the packet of the wholevideo clip. Clearly, in (12), any selected parameter vectorresulting in the total bit rate of source and channel coding tobe greater than is not in the optimal parameter vector

. Therefore, we can make use of thisobservation to refine the distortion expression as follows:

. (13)

In other words, the average distortion of a packet with bit ratelarger than the maximum allowable bit rate is set to infinity,meaning that a feasible solution, as defined in (12), will notresult in any bit rate greater than . Therefore, the minimumdistortion problem with rate constraint can be transformed intoan unconstrained optimization problem.

Furthermore, most decoder concealment strategies introducedependencies among packets. For example, if the concealmentalgorithm uses the motion vector of the MB received earlier toconceal the lost MB, then it would cause the calculation of theexpected distortion of the current packet to depend on its pre-vious packet. Without losing generality, we assume that the cur-rent packet will depend on the latest packets , dueto the concealment strategy. To solve the optimization problem,we define a cost function which represents theminimum average distortion up to and including the packet ,given that are the decision vectors for the packets

. Let be the total packet number of the videoclip, and we have . Therefore,represents the minimum total distortion of the whole video clip.Thus, solving (12) is equivalent to solving

(14)

The key observation for deriving an efficient algorithm is thefact that given control vectors forthe packets , and the cost function

, the selection of the next controlvector is independent of the selection of the previous controlvectors . This means that the cost functioncan be expressed recursively as

(15)

This recursive representation of the cost function makesthe future step of the optimization process independent ofits past steps, which consists of the foundation for dynamicprogramming.

The convexity of is shown in Section II as well as inthe literature [12]. Therefore, the problem can now be convertedinto a graph theory problem of finding the shortest path in a di-rected acyclic graph (DAG) [31]. The computational complexityof the algorithm is (where is the cardinalityof ), which depends directly on the value of . For most cases,

is a small number, so the algorithm is much more efficientthan an exhaustive search algorithm with exponential computa-tional complexity. The exact value of is decided by the en-coder behavior. If resources allow, the proposed algorithm canalways achieve the optimal solution. However, with the increaseof , the computational complexity increases exponentially. Inthis case, it is advisable to decrease the value to lower the com-plexity, which may lead to the achievement of a sub-optimalsolution.

VI. EXPERIMENTAL RESULTS

In this section, we design experiments to evaluate the perfor-mance enhancement of the proposed JSDEC system under time-varying network conditions in terms of both available channelrate and packet loss rate.

A topology similar to the one shown in Fig. 1 is usedfor experiments in this paper. The QCIF (176 144) se-quence “Foreman” is adopted for real-time encoding with theH.264/AVC JM12.2 codec [23]. The network topology is sim-ulated using NS-2. All frames except the first are encoded asP frames. The target frame rate is set to 30 frames per second.In this paper, the channel is assumed to be frequency-flat andtime-invariant during the transmission of a packet but may varyfrom packet to packet. The average SNR of each hop in a givenpath is , then the random link quality experienced by eachpacket is generated according to Rayleigh distribution, whoseprobability density function (PDF) is

(16)

where is the average received SNR.The propagation delay of each link is set to 10 . In the sim-

ulations, quantization step size (QP) is considered for sourcecoding parameter. PSNR is adopted to evaluate the distortionbetween the original sequence and the reconstructed sequenceat the receiver. Since the duration of the given video sequence isshort (300 frames), the initial startup delay at the decoder play-back buffer is set to five frames. In the case of a buffer underflow,re-buffering is performed until two more frames are received.We compare the proposed JSDEC system with a typical JSCCsystem where each proxy employs error control individually,without coordination with the encoder. In the adopted JSCCsystem, channel quality information (packet loss rate) is fedback to the source node for the optimization of the encoder be-havior by choosing the optimal source coding parameters. How-ever, this existing JSCC system is not able to control the ARQbehavior. Both systems adopt the same packetization schemeunder the same network environment to ensure fairness. In all



Fig. 5. Frame PSNR comparison between the proposed JSDEC system andthe JSCC system with constant packet loss rates. �� .

Fig. 6. Frame PSNR comparison between the proposed JSDEC system andthe JSCC system with constant packet loss rates. �� .

experiments, we set the number of frames stored at each proxyequal to four.

First, we present the experimental results for the proposedJSDEC system comparing with the existing JSCC system, whenthe packet loss rate is constant. The results for two tandem con-nected hops (i.e., one ARQ proxy) when and

are shown in Fig. 5 and Fig. 6, respectively. Wecompare the performance difference of the JSCC and JSDECsystems while the feedback sent from the proxy back to the en-coder is subject to a delay equal to the transmission delay of oneframe. The two links attached to the same proxy are configuredwith the same packet erasure rate of . The y-axiscorresponds to PSNR (dB) while the x-axis corresponds to theavailable channel bandwidth on the first hop of the path (i.e.,

).

In Fig. 5, the available channel bandwidth of the second hopis set to 0.3 MHz, (i.e., ) while the availablechannel bandwidth of the first hop changes from 0.2 MHzto 0.4 MHz at a step size of 50 KHz. Our simulation results in-dicate that the proposed JSDEC system achieves the largest per-formance gain (about 6 dB) at around .When is either increased or decreased from 0.3 MHz, theperformance gain is lower, but still significant compared withthe JSCC system. However, if the bandwidth difference betweenthe two available channels become even larger, the performancegain will decrease significantly. For instance, when is ateither 0.2 MHz or 0.4 MHz, there is only 1–2 dB PSNR perfor-mance increase. To explain this, when is fixed at 0.3 MHz,the much lower value of constrains the upper bound of thesource encoding rate. This indirectly allows most of the avail-able rate on the second channel to be used for error controland thus leads to minimal channel distortion. Similarly, if thevalue of is much higher than 0.3 MHz, the channel on thefirst hop introduces little channel distortion because of the sig-nificant spare bandwidth available for error control. Neverthe-less, when the network does not operate in such highly asym-metric as far as channel rates are concerned, the performanceimprovement of decoded video quality of JSDEC over JSCC ismore considerable because the sender can calculate an optimalsource rate from a wider range, not limited by any of the twochannels. That is to say, our proposed system is more proactivein allocating part of the available channel rate for error controlsince it can lead to improvement in the overall video quality.Therefore, the resources used by source coding and error con-trol are globally optimized in a joint and distributed way toachieve the best user-perceived quality. This conclusion is alsosupported by the simulation results that we will provide in thefollowing paragraphs.

Similarly, in Fig. 6, is set to 0.6 MHz, while variesfrom 0.2 MHz to 0.4 MHz at a step size of 50 KHz. In this figure,the performance gain achieved by using JSDEC over JSCC ismuch lower compared with that shown in Fig. 5. This is becausein Fig. 6, the available bandwidth of the channel of the secondhop is always much higher than that of the first one. Thus, thereis always enough rate on the channel of the second hop to beallocated for error control, which mitigates channel distortion.

To examine the system performance under different parame-ters, we then present in Figs. 7 and 8 experimental results for aconfiguration where the packet loss rates vary within the rangeof 1%–5%, while the channel bandwidths are kept constant at0.3 MHz. Similarly, we also assume that the two links in tandemare attached through a single proxy. In both figures, we set theaverage PER of the first hop to 2% and 5%, respectively,while changes from 1% to 5%.

Based on the results shown in these two figures, we can con-clude that if either or is low, the performance does not differmuch. To explain, low PER corresponds to few packet lossesand thus little channel distortion. However, when either of thesepacket erasure rates increases, channel distortion will also in-crease. More interestingly, when the discrepancies in the packeterasure rates are considerable, and especially under high PER,the performance gain achieved by using the proposed system ismore significant when compared with the JSCC system. This



Fig. 7. Frame PSNR comparison between the proposed JSDEC system and theJSCC system with constant bandwidth. �� .

Fig. 8. Frame PSNR comparison between the proposed JSDEC system and theJSCC system with constant bandwidth. �� .

result essentially means that it is more crucial to use JSDECwhen the two hops undergo significant difference in link quality,which also explains that the proposed system is particularlyuseful for wireless multihop scenarios.

Now we demonstrate another set of simulation results inFig. 9. In this case, and are both set to 0.3 MHz.We measure the bit rates allocated for error control under theJSDEC and JSCC systems when and onaverage, respectively, while changes from 1% to 5% at a stepsize of 1% for both cases. It can be observed from the figurethat with the increase of packet erasure rate, the resourcesallocated for error control always increase. More importantly,even under the same PER value, the proposed JSDEC alwaysallocates more resources for error control, which explains theperformance gains observed in Figs. 5–8. This more proactive

Fig. 9. Frame allocated channel coding rate comparison between the proposedJSDEC system and the JSCC system with constant channel rates. �� .

resource allocation strategy is used when PER differs signif-icantly across the tandem links, which is necessary given theneed for more proactive error control when increasing packetlosses lead to increasing channel distortion. However, the JSCCsystem is unable to account for these situations on the secondhop, and as a result, it cannot select a source coding rate thatenables the optimal error control on the second hop.

Furthermore, we consider a more complicated scenario,where multiple proxies (up to six) are connected in tandem.The available bandwidths in the network topology are set inthis way—each link connected in tandem is set to 0.3 MHz or0.4 MHz in a continuous and alternate way. Therefore, link 1,link 2, link 3, link 4, link 5, and link 6 are connected by proxy1, proxy 2, proxy 3, proxy 4, and proxy 5 to consist of a chainnetwork topology. The available bandwidths of link 1, link 2,link 3, link 4, link 5, and link 6 are set to 0.3 MHz, 0.4 MHz,0.3 MHz, 0.4 MHz, 0.3 MHz, and 0.4 MHz, respectively. Theaverage PER is set to 2% in the first round of simulation and3% in the second round. The PSNR performance enhancementis demonstrated in Fig. 10. The increasing number of proxiesincreases the end-to-end hops and packet loss rate as well,which in turn decreases the measured PSNR, due to the factthat an increasing number of erasure channels are used in thesimulations. Nevertheless, the proposed JSDEC system demon-strates the ability to minimize the perceived video distortionby jointly optimizing the resources allocated for source codingand error control. Moreover, as shown in this figure, evenwhen PER is low, the proposed system is still very effectiveon improving user-perceived video quality. Another importantobservation is that the JSDEC system can even achieve betterperformance than the JSCC system with a slightly lower PER.This observation especially verifies the effectiveness of theproposed system. To explain, in Fig. 10, when the number ofproxies is equal or less than four, the proposed JSDEC systemwith achieves a better performance gain than theJSCC system with , due to the fact that JSDEC ismore proactive to allocate resources to error control.



Fig. 10. Frame PSNR comparison between the proposed JSDEC system andthe JSCC system with multiple proxies connected in tandem.

Fig. 11. Quality comparison between the proposed JSDEC system and theJSCC system and standard H.264 codec. (a) Original. (b) JSCC. (c) JSCC.(d) Standard H.264.

To summarize, all these experimental results are geared to-wards demonstrating the performance enhancement of the pro-posed JSDEC system compared with the existing end-to-endJSCC system. Overall, our system is especially suited to an en-vironment of highly asymmetrical link qualities with availablechannel rate differences falling into a relatively small range.This is almost always true in wireless multihop environment,where the wireless bandwidth is usually the same for any givenwireless application, while the channel quality such as packeterasure rates are highly varying due to the unstable radio com-munication.

Finally, to illustrate the visual quality superiority of the pro-posed system, the 27th frame of the reconstructed sequence of“Foreman” is demonstrated in Fig. 11. The figure shows the dif-ferent reconstructed video frames achieved by using different

systems under the same simulation environment with that usedin Fig. 5. We can observe that the proposed JSDEC system pro-vides a much better user-perceived video quality.

VII. RELATED WORKS

Our work brings techniques of error resilient source codingand error control together, but in a distributed framework.Therefore, we will review mechanisms that deal with enhancingrobustness of video streaming with joint source/channel coding,and network-supported error control.

One of the most important characteristics of the proposedscheme is that the expected distortion is accurately calculatedby ROPE. This is different from the use of a distortion metric inthe literature. For example, in [29], the fixed distortion deduc-tion obtained by successfully decoding a video unit is adoptedas the objective function. Also, in [30], the resulting distortionfrom losing one or multiple video descriptions as the optimiza-tion metric is employed but the amount of the distortion is presetand fixed, which means that the codec cannot adapt to the instan-taneous network conditions. This is extremely important for theclass of applications in which we are interested.

Regarding the error control functionality, the most moderntechniques employ both application layer ARQ and forwarderror correction (FEC) for achieving optimal error control [24],[25]. Joint source and channel coding at the application layer hasalso been studied in an integrated framework when a real-timevideo encoder is employed [10], [26]. In these works that makeuse of hybrid ARQ and FEC error control mechanisms, it hasbeen shown that retransmissions are suitable for networks withlow delay, low packet loss probability, and low transmissionrate, while FEC is more suitable otherwise. Therefore, the afore-mentioned methods attempt to identify the best combination ofthe two. However, all these mechanisms consider end-to-end re-transmissions, which practically introduces unacceptable delayfor real-time interactive video communications.

Several research efforts moved this to the next level, andproposed the use of a proxy node for performing error controlacross asymmetric networks. For example, in [7], the authorspropose a method for pre-encoded video streaming through aproxy in a rate-distortion optimized way, with the assumptionthat all the interdependencies between the video data units areavailable. Clearly, this approach is computationally expensivewhen it is employed at several nodes in the end-to-end path.Another interesting work that considers multiple proxies canbe found in [34]. In that paper, the authors propose a schemethat employs FEC at intermediate nodes in the network. Thedisadvantage of that approach is that the constant overheadrequired by FEC results in the uncoordinated and thereforesub-optimal selection of the FEC parameters. In [27], theauthors provide an analysis that supports their claim that itis viable to use intermediate proxies for ARQ in real-timeInternet video streaming applications. We should also mentionanother recent trend in the area of channel coding for videostreaming applications, which is the use of rateless codes wherethe streaming server already proactively encodes the bitstream[32], [33]. However, even in these systems, the implication ofreal-time encoding is not considered.



VIII. CONCLUSIONS

In this paper, we have presented a new system for errorresilient source coding and distributed application-layer errorcontrol (JSDEC) suitable for video streaming in wireless mul-tihop networks. The proposed system has employed distributederror control by using a lightweight application-layer ARQscheme residing in the intermediate nodes. We have demon-strated that in order to achieve globally optimal joint source andchannel coding decisions for packet video transmission overwireless multihop networks, the end-to-end video distortionestimate must consider the contribution of each communica-tions link. We have derived a model for a path that consistsof multiple packet erasure channels connected in tandem.The optimization problem has been formulated to select theappropriate encoding mode for each individual macro-block, aswell as the number of retransmissions at each proxy for errorcontrol. The formulated problem has then been solved basedon dynamic programming, incurring moderate complexity atthe streaming server.

Experiments carried out with NS-2 and H.264/AVC have val-idated the efficacy of the proposed scheme for video quality im-provement. We have evaluated the proposed JSDEC system fortime-varying channels by using packet-level feedback amongthe nodes. The proposed system has been verified to be espe-cially suitable for scenarios where there are highly asymmetricpacket loss rates for the tandem links. Future work includes theinvestigation of the performance of the proposed system withmore advanced channel coding schemes and its scalability withrespect to multiple media sources.

REFERENCES

[1] Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and Com-munications. Englewood Cliffs, NJ: Prentice-Hall, 2002.

[2] M. van der Schaar and P. Chou, Multimedia Over IP and Wireless Net-works. New York: Academic, 2007.

[3] A. Argyriou, “Distributed resource allocation for network-supportedFGS video streaming,” in Proc. Packet Video Workshop, Lausanne,Switzerland, Nov. 2007.

[4] J. Yan, K. Katrinis, M. May, and B. Plattner, “Media- and TCP-friendlycongestion control for scalable video streams,” IEEE Trans. Multi-media, vol. 8, no. 2, pp. 196–206, Apr. 2006.

[5] M. Chen and A. Zakhor, “Transmission protocols for streaming videoover wireless,” in Proc. IEEE Int. Conf. Image Processing (ICIP), 2004.

[6] L. Gao, Z.-L. Zhang, and D. Towsley, “Proxy-assisted techniques fordelivering continuous multimedia streams,” IEEE/ACM Trans. Netw.,vol. 11, no. 6, pp. 884–894, Dec. 2003.

[7] J. Chakareski and P. Chou, “Radio edge: Rate-distortion optimizedproxy-driven streaming from the network edge,” IEEE/ACM Trans.Netw., vol. 14, no. 6, pp. 1302–1312, Dec. 2006.

[8] G. Cheung and A. Zakhor, “Bit allocation for joint source/channelcoding of scalable video,” IEEE Trans. Image Process., vol. 9, no. 3,pp. 340–356, Mar. 2000.

[9] L. Kondi and A. Katsaggelos, “Joint source-channel coding for scal-able video using models of rate-distortion functions,” in Proc. ICASSP,2001.

[10] F. Zhai, Y. Eisenberg, T. N. Pappas, R. Berry, and A. K. Katsaggelos,“Rate-distortion optimized hybrid error control for real-time packe-tized video transmission,” IEEE Trans. Image Process., vol. 15, no. 1,pp. 40–53, Jan. 2006.

[11] M. Handley, S. Floyd, J. Pahdye, and J. Widmer, TCP Friendly RateControl (TFRC): Protocol Specification, RFC 3448, Jan. 2003.

[12] Q. Zhang, Z. Ji, W. Zhu, and Y.-Q. Zhang, “Power-minimized bit allo-cation for video communication over wireless channels,” IEEE Trans.Circuits Syst. Video Technol., vol. 12, no. 6, pp. 398–410, Jun. 2002.

[13] R. Zhang, S. L. Regunathan, and K. Rose, “Video coding with optimalinter/intra-mode switching for packet loss resilience,” IEEE J. Select.Areas Commun., vol. 18, no. 6, pp. 966–976, Jun. 2000.

[14] D. Wu, T. Hou, W. Zhu, H.-J. Lee, T. Chiang, Y.-Q. Zhang, and H.J. Chao, “On end-to-end architecture for transporting MPEG-4 videoover the internet,” IEEE Trans. Circuits Syst. Video Technol., vol. 10,no. 6, pp. 923–941, Sep. 2000.

[15] G. Cote, S. Shirani, and F. Kossentini, “Optimal mode selection andsynchronization for robust video communications over error prone net-works,” IEEE J. Select. Areas Commun., vol. 18, no. 6, pp. 952–965,Jun. 2000.

[16] A. Argyriou, “Real-time and rate-distortion optimized video streamingwith TCP,” Elsevier Signal Process.: Image Commun., vol. 22, no. 4,pp. 374–388, Apr. 2007.

[17] Draft ITU-T Recommendation and Final Draft, International Standardof Joint Video Specification (ITU-T Rec. H.264—ISO/IEC 14496-10AVC), May 2003. [Online]. Available: ftp://ftp.imtc-files.org/jvt-ex-perts/2003-03-Pattaya/JVT-G50r1.zip.

[18] Z. He, J. Cai, and C. W. Chen, “Joint source channel rate-distortionanalysis for adaptive mode selection and rate control in wireless videocoding,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, pp.511–523, Jun. 2002.

[19] Y. Zhang, W. Gao, Y. Lu, Q. Huang, and D. Zhao, “Joint source-channel rate-distortion optimization for H.264 video coding over error-prone networks,” IEEE Trans. Multimedia, vol. 9, no. 3, pp. 445–454,Apr. 2007.

[20] P. A. Chou and Z. Miao, Rate-Distortion Optimized Streaming of Pack-etized Media, Microsoft Research Tech. Rep. MSR-TR-2001-35, 2001.

[21] M. Westerlund and S. Wenger, RTP Topologies, draft-ietf-avt-topolo-gies-05.txt, Jul. 2007.

[22] A. Mukherjee, On the Dynamics and Significance of Low FrequencyComponents of Internet Load, Univ. Pennsylvania, Tech. Rep.MS-CIS-92-83, 1992.

[23] JVT Reference Software. [Online]. Available: http://bs.hhi.de/suehring/tml/download/.

[24] F. Hartanto and H. R. Sirisena, “Hybrid error control mechanism forvideo transmission in the wireless IP networks,” in Proc. IEEE 10thWorkshop Local and Metropolitan Area Networks (LANMAN), 1999.

[25] H. Seferoglu, Y. Altunbasak, O. Gurbuz, and O. Ercetin, “Rate distor-tion optimized joint ARQ-FEC scheme for real-time wireless multi-media,” in Proc. ICIP, 2005.

[26] F. Zhai, “Optimal cross-layer resource allocation for real-time videotransmission over packet lossy networks,” Ph.D. dissertation, North-western Univ., Evanston, IL, 2004.

[27] A. C. Begen and Y. Altunbasak, “An adaptive media-aware retrans-mission timeout estimation method for low-delay packet video,” IEEETrans. Multimedia, vol. 9, no. 2, pp. 332–347, Feb. 2007.

[28] R. Zhang, S. L. Regunathan, and K. Rose, “Video coding with optimalinter/intra-mode switching for packet loss resilience,” IEEE J. Select.Areas Commun., vol. 18, no. 6, pp. 966–976, Jul. 2003.

[29] Y. Andreopoulos, N. Mastronade, and M. van der Schaar, “Cross-layeroptimized video streaming over wireless multi-hop mesh networks,”IEEE J. Select. Areas Commun., vol. 24, no. 11, pp. 2104–2115, Nov.2006.

[30] S. Kompella, S. Mao, Y. Hou, and H. Sherali, “Cross-layer optimizedmultipath routing for video communications in wireless networks,”IEEE J. Select. Areas Commun., vol. 25, no. 4, pp. 831–840, May2007.

[31] G. M. Schuster and A. K. Katsaggelos, Rate-Distortion BasedVideo Compression: Optimal Video Frame Compression and ObjectBoundary Encoding. Norwell, MA: Kluwer, 1997.

[32] N. Thomos and P. Frossard, “Collaborative video streaming withRaptor network coding,” in Proc. IEEE Int. Conf. Multimedia andExpo (ICME) 2008, Hannover, Germany, accepted for publication.

[33] D. Nguyen, T. Nguyen, and X. Yang, “Multimedia wireless transmis-sion with network coding,” in Proc. Packet Video 2007, Lausanne,Switzerland, Nov. 2007.

[34] Y. Shan, I. Bajic, S. Kalyanaraman, and J. Woods, “Overlay multi-hopFEC scheme for video streaming over peer-to-peer networks,” in Proc.Int. Conf. Image Processing (ICIP), Oct. 2004.

[35] D. Bertseka and R. Gallager, Data Networks. Upper Saddle River,NJ: Prentice-Hall, 1987.

[36] O. B. Akan and I. F. Akyildiz, “ARC: The analytical rate controlscheme for real-time traffic in wireless networks,” IEEE/ACM Trans.Netw., vol. 12, no. 4, pp. 634–644, Aug. 2004.



Haiyan Luo (S’09) received the B.S. degree fromDalian Jiaotong University (formerly Dalian RailwayInstitute), Dalian, China, and the M.S. degree fromDalian University of Technology. He is currently pur-suing the Ph.D. degree in the Computer and Elec-tronics Engineering Department at the University ofNebraska-Lincoln.

His research interests focus on networking and dis-tributed system. Previously, Mr. Luo worked for BellLaboratories, Lucent Technologies as a Member ofTechnical Staff.

Mr. Luo has served as a referee for IEEE TRANSACTIONS ON WIRELESS

COMMUNICATION, IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, andseveral conferences such as IEEE GLOBECOM and IEEE ICC.

Antonios Argyriou (S’00–M’07) received theDiploma in electrical and computer engineeringfrom Democritus University of Thrace, Thrace,Greece, in 2001 and the M.S. and Ph.D. degrees inelectrical and computer engineering as a Fulbrightscholar from the Georgia Institute of Technology,Atlanta, in 2003 and 2005, respectively.

Currently, he is a Research Scientist at PhilipsResearch, Eindhoven, The Netherlands. From 2004until 2005, he was also a Research Engineer at Soft.Networks, Atlanta, and was involved in the design

of MPEG-2 and H.264 video compression algorithms. His current researchinterests include wireless networks and networked multimedia systems. He haspublished more than 30 papers in international journals and conferences.

Dalei Wu (S’05) received the B.S. and M.Eng. de-grees in electrical engineering from Shandong Uni-versity, Jinan, China, in 2001 and 2004, respectively.He is currently pursuing the Ph.D. degree in the De-partment of Computer and Electronics Engineeringat the University of Nebraska-Lincoln.

His research interests include cross-layer designand optimization over wireless networks, wirelessmultimedia communications, large-scale systemmodeling and analysis, and approximate dynamicprogramming.

Song Ci (S’98–M’02–SM’06) is an Assistant Pro-fessor of computer and electronics engineering at theUniversity of Nebraska-Lincoln. He is the Directorof the Intelligent Ubiquitous Computing Lab (iU-biComp Lab) and holds a courtesy appointment ofUNL Ph.D. in the Biomedical Engineering Program.He is also affiliated with Nebraska BiomechanicsCore Facility at the University of Nebraska at Omahaand Center for Advanced Surgical Technology(CAST) at University of Nebraska Medical Center,Omaha, NE. His research interests include dynamic

complex system modeling and optimization, green computing and powermanagement, dynamically reconfigurable embedded system, content-awarequality-driven cross-layer optimized multimedia over wireless, cognitivenetwork management and service-oriented architecture, and cyber-enablee-healthcare.

Prof. Ci is an Associate Editor of the IEEE TRANSACTIONS ON VEHICULAR

TECHNOLOGY (TVT) and serves as a Guest Editor of the IEEE TRANSACTIONS

ON MULTIMEDIA Special Issue on Quality-Driven Cross-Layer Design for Mul-timedia Communications; Guest Editor of the IEEE Network Magazine SpecialIssue on Wireless Mesh Networks: Applications, Architectures and Protocols;and Associate Editor of four other international journals. He serves as the TPCco-chair of IEEE ICCCN 2009 and as the TPC member for numerous confer-ences. He is the Vice Chair of the Computer Society of the IEEE Nebraska Sec-tion and is a member of the ACM and the ASHRAE. He is the SIG chair of theQuality-Driven Cross-Layer Design of the Multimedia Communications Tech-nical Committee (MMTC) of the IEEE Communications Society (COMSOC).


1362 IEEE TRANSACTIONS ON ... - antoniosargyriou.netantoniosargyriou.net/papers/jnl_2009_tmm.pdf ·...

Documents

Transcript of 1362 IEEE TRANSACTIONS ON ... - antoniosargyriou.netantoniosargyriou.net/papers/jnl_2009_tmm.pdf ·...