2012 18th IEEE International Conference on Networks (ICON), Singapore, 12-14 December 2012. 978-1-4673-4523-1/12/$31.00 ©2012 IEEE.


Efficient Rate-Quantization Model for Frame Level Rate Control in Spatially Scalable Video Coding

Xuan Jing, Jo Yew Tham, Yu Wang, Kwong Huang Goh, and Wei Siong Lee Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore

E-mail: {xjing, jytham, yuwang, khgoh, wslee}@i2r.a-star.edu.sg

Abstract— This paper addresses the quantization parameter (QP) selection problem in H.264 spatially scalable video coding (SVC). For frame level rate control in SVC, it is important to have an accurate QP selection scheme such that the target bit rate of each coding layer will be achieved. In this paper, we present an adaptive rate-quantization (R-Q) model to select the appropriate QP for each inter frame in spatial enhancement layers according to the target bit rate. The proposed algorithm introduces an efficient coding complexity estimation method by taking into consideration the inter-layer dependency between different spatial layers. Based on the coding complexity information, the R-Q model parameters can be adaptively updated. Experimental results demonstrate that compared to the traditional method, the proposed method provides better estimation accuracy for bit rate in terms of target bits mismatch error and thus it is very desirable for H.264/SVC rate control applications.

Keywords: Rate control, rate-quantization model, scalable video coding (SVC)

I. INTRODUCTION

As an extension of the H.264/AVC standard, H.264 Scalable Video Coding (H.264/SVC) [1] achieves improved bitstream adaptability and coding efficiency compared with previous scalable video coding standards. By providing scalability in temporal, spatial and quality layers, the H.264/SVC standard allows the encoder to encode the input video once while enabling the decoder to extract and decode sub-streams from the high-quality bitstream. Therefore, the diverse demands on a video communication system can be met, such as varying network conditions, various display resolutions and different computational complexity requirements. For spatially scalable video coding, the spatial layer with the smallest resolution is called the base layer (BL) and can be encoded into an H.264/AVC-compatible bitstream. For the enhancement layers (EL) with larger resolutions, instead of independently encoding each enhancement layer with an H.264/AVC encoder, the H.264/SVC encoder employs inter-layer prediction techniques to improve coding efficiency. In particular, the prediction residual, motion and mode information from the base layer can be adaptively used in the enhancement layer prediction. As a result, better compression is achieved by exploiting both the inter-layer redundancies between neighboring spatial layers and the intra-layer redundancies within the same spatial layer.

Rate control plays an important role in compressed video communications [2]. It regulates the output rate of the coded bitstream to achieve the best tradeoff between video quality and the available channel bandwidth. Generally, according to the buffer status and the available bandwidth constraints, the rate control module first allocates a certain number of target bits to the current coding unit, e.g., a frame or a macroblock (MB). Then, based on the R-Q model, a proper quantization parameter is determined for coding this frame or MB. Based on the assumption that the video signal is Laplacian distributed, the quadratic R-Q model was proposed by Chiang et al. [3] and has been widely used in MPEG-4 and H.264/AVC video coding. He et al. [4] proposed a linear ρ-domain source model to determine the quantization parameter. However, since this model requires the actual statistics of the DCT coefficients, it is not suitable for single-pass rate control in H.264/AVC because of the chicken-and-egg dilemma [6][7]. Li et al. in JVT-G012 [6] proposed an adaptive rate control framework for H.264/AVC. Specifically, a single-pass rate control method based on the classic quadratic R-Q model is used, and a linear model for mean absolute difference (MAD) prediction is employed to solve the dilemma. This scheme has also been adopted in JVT-W043 [5] for H.264/SVC base layer rate control. Although the rate control techniques for H.264/AVC can be directly used in the H.264/SVC base layer, their efficiency suffers when they are directly applied to enhancement layers, because the inter-layer dependency is not considered in the original schemes. Rate control for H.264/SVC enhancement layers has not been extensively investigated, and only a few works can be found in the literature [9][10][11]. Liu et al. [9] extended the base layer rate control algorithm [5] to the enhancement layer by taking inter-layer dependency into consideration and using a more adaptive MAD prediction method for the R-Q model. Hu et al. [10] proposed a two-stage rate control scheme which requires pre-encoding each frame to obtain the actual bits information for the second-stage R-Q model determination and final quantization. Similarly, Liu et al. [11] introduced a multi-pass method to estimate the parameters of a Cauchy distribution based R-Q model [8] for coding each frame. Due to its high computational complexity, it is more suitable for offline video coding applications.

In this paper, we propose an adaptive R-Q model for spatial enhancement layer rate control in H.264/SVC. Our goal is to select a proper QP for each frame in the spatial enhancement layer such that the target bit rate can be accurately met. Considering that the R-Q characteristics of the spatial enhancement layer are highly related to the base layer quality and the quality of the reference frames in the same layer, a novel coding complexity factor is introduced without any pre-encoding process. We further model the R-Q relationship based on Cauchy distribution rate-distortion analysis. The most critical part for model accuracy is the model parameter estimation. Instead of training the model parameters by coding each frame multiple times, we propose to use a fixed initial parameter for the R-Q model together with an adaptive parameter updating scheme based on the coding complexity factor. Since our method is single-pass with low computational complexity, it is well suited to real-time applications. The remainder of this paper is organized as follows. Section II presents the background and motivation of this work. The development of the proposed adaptive R-Q model is described in Section III. Section IV provides simulation results showing the performance of the proposed algorithm. Finally, we draw conclusions in Section V.

II. BACKGROUND

H.264/AVC provides a significant coding efficiency improvement over all previous standards and is now widely used in industry products and various applications. As an extension of this standard, H.264/SVC reuses many key features of H.264/AVC. One of the main features is rate-distortion-optimized (RDO) motion estimation and mode decision. In particular, the best coding mode for each MB is selected by minimizing the overall Lagrangian cost function:

J = D + λ·R (1)

where D and R are the distortion and the number of bits for coding this MB, respectively. Note that the QP is required when calculating the Lagrangian multiplier λ for the current MB before it is actually encoded. However, the residual information for R-Q modeling is not available without actual encoding. This is a typical chicken-and-egg dilemma in rate control. One possible solution is to use multi-pass schemes [11], but at the cost of high computational complexity. Another, single-pass, solution is to estimate the R-Q model parameters from previous coding results. Our model belongs to the latter category.
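The RDO mode decision in (1) can be illustrated with a minimal sketch, not JSVM code: per-mode distortion and bit counts are assumed to be supplied by the encoder, and the H.264 high-complexity-mode multiplier λ = 0.85·2^((QP-12)/3) from the JM reference model is used as one common choice.

```python
def rdo_mode_decision(candidates, qp):
    """Pick the MB coding mode minimizing J = D + lambda * R (Eq. 1).

    `candidates` is a list of (mode, distortion, bits) tuples, assumed
    to come from trial coding; lambda follows the JM reference model
    choice 0.85 * 2**((QP - 12) / 3).
    """
    lam = 0.85 * 2 ** ((qp - 12) / 3.0)
    best_mode, best_cost = None, float("inf")
    for mode, distortion, bits in candidates:
        cost = distortion + lam * bits  # Lagrangian cost J of this mode
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost
```

The chicken-and-egg problem is visible here: `lam` (and hence the mode choice) needs the QP, while the R-Q model that yields the QP would need the residual statistics of the not-yet-encoded MB.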

In spatially scalable coding, to improve coding efficiency compared with encoding each spatial layer independently, inter-layer prediction mechanisms are adopted in H.264/SVC. Fig. 1 illustrates an example of inter-layer prediction in spatial scalable coding. The main purpose of the inter-layer prediction tools is to use as much lower-layer information as possible so that the redundancies between consecutive spatial layers can be fully exploited [1]. Because the inter-layer prediction mechanisms are switchable, the H.264/SVC encoder can adaptively choose between inter-layer and intra-layer prediction based on the rate-distortion optimization framework. In other words, when designing an R-Q model for the spatial enhancement layer, the previous coding results from both the lower layer and the same layer should be taken into consideration to achieve better estimation accuracy. In the next section, we discuss in detail the development of our proposed R-Q model, which is motivated by this idea.

III. THE PROPOSED ADAPTIVE R-Q MODEL

In rate control, the R-Q relationship plays a key role in the determination of the QP for coding a picture. To investigate the impact of reference picture quality on the R-Q characteristics of the spatial enhancement layer, a set of experiments was conducted using the H.264/SVC JSVM reference software [12]. In the experiments, test videos with resolution pairs QCIF-CIF and CIF-4CIF were used for the two-spatial-layer coding case. For both the base layer and the enhancement layer, three target peak signal-to-noise ratios (PSNR) for the reconstructed picture quality were set, i.e., PSNR = 28, 34 and 40 dB, indicating low, medium and high video quality respectively. Therefore, there are a number of combinations in coding each video

Fig. 1 Inter-layer prediction structure in spatial scalable coding.

Fig. 2 R-Q relationship of the 5th frame in the spatial EL of Soccer (4CIF) under various target qualities of BL and EL. (a) Target BL quality fixed at PSNR = 34 dB, with curves for Ref-EL-PSNR = 28, 34, 40 dB; (b) target EL quality fixed at PSNR = 34 dB, with curves for Ref-BL-PSNR = 28, 34, 40 dB.

sequence. To achieve the target PSNR for each spatial layer, we used a multi-pass approach: each picture was exhaustively encoded with all possible QPs and finally coded with the QP that met the target PSNR. Fig. 2 shows the R-Q curves of the 5th frame of Soccer in the spatial enhancement layer (4CIF). Specifically, Fig. 2(a) illustrates the case where the reference frame's quality in the lower (base) layer is fixed and the reference frame's quality in the same (enhancement) layer varies. In contrast, the R-Q results at different base layer qualities but with the same enhancement layer quality are shown in Fig. 2(b). As can be seen from these figures, the quality of the reference pictures in either the base layer or the enhancement layer has a significant impact on the R-Q behavior of a frame in the spatial enhancement layer. Generally speaking, the better the reference picture quality in either spatial layer, the smaller the output bit rate of the spatial enhancement layer for a constant QP. This observation is consistent with the preceding analysis of the inter-layer prediction mechanism: when the reference picture quality in either spatial layer is high, a better prediction of the current frame in the enhancement layer can usually be achieved through adaptive inter-layer prediction. As a result, the energy of the residual frame is smaller, leading to a smaller output bit rate for the same quantization, and vice versa.

Based on the above observations, we propose to use a combined reference quality factor (qf) as a coding complexity measure for the spatial enhancement layer, defined by:

qf_i = PSNR_BL(i) + PSNR_EL(i-1) (2)

where qf_i is the reference quality factor for the ith frame in the spatial enhancement layer, PSNR_BL(i) is the reconstructed quality of the base layer picture used for inter-layer prediction, and PSNR_EL(i-1) is the reconstructed quality of the reference picture in the same layer. When there are multiple enhancement layers, (2) can be easily extended as:

qf_i^k = PSNR_i^(k-1) + PSNR_(i-1)^k (3)

where qf_i^k denotes the reference quality factor for the ith frame in the kth spatial layer. Fig. 2 implies that there is a high correlation between the output bit rate and the qf of a given frame in the spatial enhancement layer. To further illustrate this relationship, Fig. 3 shows the scatter plots of bit rate versus qf for Soccer and City. Clearly, there is a strong negative correlation between these two variables. From this result, we can assume that for a fixed QP, the output bit rate of

one frame in the spatial enhancement layer is approximately inversely proportional to its reference quality factor qf.
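The reference quality factor can be sketched as follows. This is a hypothetical reading of (2) and (3), assuming qf simply sums the reconstructed PSNRs of the two reference pictures, which is consistent with the qf range of roughly 55 to 85 observed in Fig. 3 when each PSNR lies between 28 and 40 dB:

```python
def reference_quality_factor(psnr_lower_layer, psnr_prev_same_layer):
    """Combined reference quality factor qf for one EL frame (Eq. 2/3 sketch).

    psnr_lower_layer: PSNR (dB) of the lower-layer picture used for
    inter-layer prediction. psnr_prev_same_layer: PSNR (dB) of the
    previous reconstructed frame in the same spatial layer. The
    additive combination is an assumption, not taken from JSVM.
    """
    return psnr_lower_layer + psnr_prev_same_layer
```

For example, a base layer and an enhancement layer both reconstructed at 34 dB give qf = 68, the operating point shown in Fig. 5.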

In previous video coding standards, the QP is directly used as a scale factor controlling the output bit rate and the picture quality, because the quantization step size has a linear relation with QP, e.g., Qstep = 2·QP. However, this linear relation no longer holds in the H.264/AVC-compatible coding framework. Instead, the nonlinear relation between QP and Qstep in H.264/SVC is formulated as Qstep = 2^((QP-4)/6). As Qstep is more directly related to the bit rate, we model the output bit rate as a function of Qstep in our proposed scheme. Through extensive experiments, we observed that the bit rate of a frame in the spatial enhancement layer can be modeled by:

R(Qstep) = a · Qstep^γ · S (4)

where a > 0 and γ < 0 are model parameters that depend on the picture content, and S is a coding complexity scale factor which can be determined by:

S = (1/qf)^c (5)

where qf is the reference quality factor and c > 0 is a constant. In [8], Kamaci et al. empirically showed that the Cauchy density models the distribution of DCT coefficients better than the traditional Laplacian density, and they used a simplified power function to approximate the Cauchy entropy. We also adopt this power function as part of our R-Q model, combined with S as an important scale factor. In our model (5), c was empirically fixed to 3 based on our experimental observations. Fig. 4 shows the relationship between the bit rate and the coding complexity scale factor; as can be seen, the output bit rate is proportional to the scale factor. Thus our R-Q model can adaptively estimate the bit rate of a frame in the spatial enhancement layer based on its coding complexity, given the corresponding Cauchy distribution parameters. Fig. 5 illustrates an example of a curve-fitting result for City; it shows that the proposed R-Q model can accurately estimate the actual bit rate. However, we also observed that it is difficult to use one model with fixed parameters for all video sequences. Ideally, one can determine the two parameters (a and γ) by solving a set of equations based on the Cauchy

Fig. 3 Scatter plot of bit rate versus reference quality factor qf at QP = 32. (a) Soccer and (b) City.

density distribution property. Because of the chicken-and-egg dilemma, however, we do not have access to the actual characteristics of the transform coefficients in advance. Therefore, it is desirable to estimate and update the parameters based on previous coding results. For H.264/AVC single layer coding, the model parameter γ was specified to be in the range of -1.2 to -1.6 for P frames [8]. Based on our extensive experiments, we have found that typical values of γ for the spatial enhancement layer are in the range of -2.6 to -2.9, and γ = -2.8 is a moderate value which works well for all the test sequences. For simplicity, in our R-Q model the value of γ is fixed to -2.8 and we only have to update the other parameter a during our model updating procedure using

a_(i+1)^k = R_i^k / ((Qstep_i^k)^γ · S_i^k) (6)

where R_i^k, S_i^k and Qstep_i^k are the actual output bit rate, the coding complexity scale factor and the quantization step size for the ith frame in the kth spatial layer. By solving (4)-(6), the quantization parameter for each frame in the spatial enhancement layer can be adaptively determined without any pre-encoding process. Because the computational complexity of our R-Q model is much lower than that of traditional multi-pass schemes, it is more desirable for real-time H.264/SVC video coding applications.
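The single-pass loop implied by (4)-(6) can be sketched as follows: invert the R-Q model to get the frame Qstep for the allocated target bits, map Qstep to a QP, encode, and then refresh the content parameter a from the actual bits. The values γ = -2.8 and c = 3 follow the text; the QP clipping range and rounding are illustrative assumptions, not taken from JSVM.

```python
import math

GAMMA = -2.8   # fixed model exponent for spatial EL P-frames (from the text)
C = 3          # exponent of the complexity scale factor S = (1/qf)**C

def qstep_from_qp(qp):
    """H.264 quantization step size: Qstep = 2**((QP - 4) / 6)."""
    return 2.0 ** ((qp - 4) / 6.0)

def select_qp(target_bits, a, qf):
    """Invert R = a * Qstep**GAMMA * S (Eqs. 4-5) to pick the frame QP."""
    s = (1.0 / qf) ** C
    qstep = (target_bits / (a * s)) ** (1.0 / GAMMA)
    qp = round(4 + 6 * math.log2(qstep))  # invert the QP-to-Qstep mapping
    return min(max(qp, 0), 51)            # clip to the H.264 QP range

def update_a(actual_bits, qp, qf):
    """Post-encode parameter update a = R / (Qstep**GAMMA * S) (Eq. 6)."""
    s = (1.0 / qf) ** C
    return actual_bits / (qstep_from_qp(qp) ** GAMMA * s)
```

Note the round trip: the a estimated from one frame's actual bits reproduces that frame's QP when the next target equals the actual output, and a larger target yields a smaller QP, as the negative exponent γ requires.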

IV. SIMULATION RESULTS

The proposed algorithm has been implemented in the H.264/SVC reference software JSVM 9.19 [12] for spatial enhancement layer rate control. For base layer rate control, the default JVT-W043 [5] scheme was adopted. For performance comparison, the benchmark method in [9] was implemented for spatial enhancement layer coding in JSVM. The first 150 frames of seven benchmark video sequences with various motion characteristics were used in the test, as listed in Table I. In the experiments, we enabled two spatial layers, i.e., one base layer and one enhancement layer, with a resolution ratio of 2 between the two layers. Only the first frame in each spatial layer was intra coded; all other frames in both layers were coded as P-frames. In addition, we used one reference frame, CABAC entropy coding, and adaptive inter-layer prediction for the enhancement layer, while all other encoding parameters were set to the JSVM software defaults.

To test the rate control capabilities of both algorithms, various target bit rates were set for the spatial layers with different resolutions. Note that we adopted the default JSVM rate control scheme for the base layer in both algorithms, hence we only compare the R-Q model accuracy for the enhancement layer. We set the constant target bits for each frame in the enhancement layer to B_target = R/F, where R and F are the target bit rate and frame rate for the enhancement layer, respectively. To measure the accuracy of bit achievement, we use the frame bits mismatch error as the criterion, defined as follows:

M = [ Σ_(i=1)^N |B_target(i) - B_actual(i)| / Σ_(i=1)^N B_target(i) ] × 100% (7)

where N is the total number of frames and B_target(i) and B_actual(i) are the number of target bits and the number of actual output bits of the ith frame. Table I tabulates the average value of M for the different test video sequences. From this table, we can see that the bit achievement accuracy has been

Fig. 4 Scatter plot of bit rate versus the coding complexity scale factor S, i.e., 1/qf^3, at QP = 32, with linear fits. (a) Soccer and (b) City.

Fig. 5 R-Q estimation for the spatial EL of City (4CIF) when qf = 68 (actual rate in bpp versus Qstep, with the fitted curve).

consistently improved for all test videos. For some tests, e.g., Foreman EL at 256 kbps, our model greatly reduces the bits mismatch error compared to the benchmark method, from 23.5% to 15.3%. On average, the bits mismatch error is reduced from 20.3% to 16.2%. This is because our R-Q model is able to select more accurate QPs for the same target bit rate. Fig. 6 shows the frame-by-frame bit achievement results for Foreman. As can be seen, the output bits of the proposed model are closer to the target bit rate, with smaller fluctuations, than those of the traditional method. Therefore, the proposed model is more robust and efficient for H.264/SVC rate control applications.
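The bits mismatch criterion of (7) reduces to the total absolute deviation of actual from target bits, normalized by the total target bits and expressed in percent. With a constant per-frame target, as used here, this equals the mean per-frame relative error. A minimal sketch, assuming that reconstructed form:

```python
def bits_mismatch_error(target_bits, actual_bits):
    """Average frame bits mismatch error M in percent (Eq. 7 sketch).

    target_bits, actual_bits: per-frame bit counts of equal length.
    """
    total_deviation = sum(
        abs(t - a) for t, a in zip(target_bits, actual_bits)
    )
    return 100.0 * total_deviation / sum(target_bits)
```

For instance, two frames with targets of 100 and 200 bits and outputs of 90 and 220 bits deviate by 30 bits out of 300, i.e., M = 10%.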

V. CONCLUSIONS

This paper presented an adaptive R-Q model for spatial enhancement layer rate control in H.264/SVC. We investigated the relationship between the R-Q characteristics of the enhancement layer and the quality of the base layer. By exploiting this inter-layer dependency, a novel coding complexity factor was introduced which takes into consideration the quality of the reference frames in both the base layer and the enhancement layer. In addition, the R-Q model parameters can be efficiently updated based on the coding complexity information. The computational complexity overhead is negligible because, unlike traditional multi-pass schemes, no pre-encoding process is required. Simulation results show that the proposed R-Q model achieves better bit rate estimation accuracy in terms of target bits mismatch error. In other words, the proposed R-Q model selects more accurate QPs and is thus well suited for H.264/SVC rate control applications. Future work will focus on optimizing the bit allocation among different spatial layers based on the proposed model, in order to achieve better R-D performance of the H.264/SVC encoder.

REFERENCES

[1] H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the scalable video coding extension of the H.264/AVC standard," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103-1120, Sep. 2007.

[2] Z. Chen and K.N. Ngan, “Recent advances in rate control for video coding,” Signal Process.: Image Commun., vol. 22, pp. 19-38, Jan. 2007.

[3] T. Chiang and Y.Q. Zhang, “A new rate control scheme using quadratic rate distortion model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 2, pp. 287-311, Apr. 1997.

[4] Z. He, Y. Kim, and S.K. Mitra, "Low-delay rate control for DCT video coding via ρ-domain source modeling," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 8, pp. 928-940, Aug. 2001.

[5] A. Leontaris and A.M. Tourapis, Rate Control for the Joint Scalable Video Model (JSVM), document JVT-W043, Joint Video Team, San Jose, CA, Apr. 2007.

[6] Z.G. Li, F. Pan, K.P. Lim, G.N. Feng, X. Lin, and S. Rahardja, “Adaptive basic unit layer rate control for JVT,” JVT-G012, Pattaya, Thailand, Mar. 2003.

[7] S. Ma, W. Gao, and Y. Lu, “Rate-distortion analysis for H.264/AVC video coding and its application to rate control,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 12, pp. 1533-1544, Dec. 2005.

[8] N. Kamaci, Y. Altunbasak, and R.M. Mersereau, “Frame bit allocation for the H.264/AVC video coder via Cauchy-Density-Based rate and distortion models,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 8, pp. 994-1006, Aug. 2005.

[9] Y. Liu, Z.G. Li, and Y.C. Soh, “Rate control of H.264/AVC scalable extension,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 1, pp. 116-121, Jan. 2008.

[10] S. Hu, H. Wang, S. Kwong, and C.-C. J. Kuo, “Novel rate-quantization model-based rate control with adaptive initialization for spatial scalable video coding,” IEEE Trans. Industrial Electronics, vol. 59, no. 3, pp. 1673-1684, Mar. 2012.

[11] J. Liu, Y. Cho, Z. Guo, and C.-C. J. Kuo, “Bit allocation for spatial scalability coding of H.264/SVC with dependent rate-distortion analysis,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 7, pp. 967-981, Jul. 2010.

[12] Joint Scalable Video Model JSVM 9.19 Software Package, CVS server for JSVM software, Jan. 2010.

TABLE I
OVERALL R-Q MODEL ACCURACY FOR SPATIAL ENHANCEMENT LAYER RATE CONTROL
(target bit rates in kbps at 30 fps; EL frame bits mismatch error M in %)

Sequence     EL res.   BL rate   EL rate   [9]     Proposed
Foreman      CIF       64        256       23.5    15.3
Foreman      CIF       128       512       19.1    14.6
News         CIF       64        256       32.5    30.7
News         CIF       128       512       29.7    29.6
Coastguard   CIF       64        256       24.0    22.2
Coastguard   CIF       128       512       24.4    14.7
City         4CIF      256       1024      19.8    19.2
City         4CIF      512       2048      34.5    14.7
Harbour      4CIF      256       1024      14.7    13.9
Harbour      4CIF      512       2048      9.4     7.4
Ice          4CIF      256       1024      8.9     7.6
Ice          4CIF      512       2048      8.8     7.1
Soccer       4CIF      256       1024      17.7    16.4
Soccer       4CIF      512       2048      16.9    13.5
Average                                    20.3    16.2

Fig. 6 Frame-by-frame bit achievement results for Foreman EL at 256 kbps (curves: target, proposed, and [9]).
