Study and Optimization of the Deblocking Filter in H.265


Valay Shah

Supervising Professor: Dr. K. R. Rao

Project Report for EE-5359


Acronyms (Alphabetical Order)

AVC – Advanced video coding

AMP – Asymmetric motion partitioning

CB – Coding block

CTB – Coding tree block

CTU – Coding tree unit

CU – Coding unit

dB – Decibel

DBF – Deblocking filter

GBIM – Generalized block edge impairment metric

HM – HEVC test model

HEVC – High efficiency video coding

ITU-T – International telecommunication union – telecommunication standardization sector

ISO/IEC – International organization for standardization/International electrotechnical commission

kbps – kilobits per second

LCU – Largest coding unit

MSE – Mean square error

PB – Prediction block

PSNR – Peak signal to noise ratio

QP – Quantization parameter

SAO – Sample adaptive offset

SNR – Signal to noise ratio

SPS – Sequence parameter set

TB – Transform block


Abstract: H.265, or High Efficiency Video Coding (HEVC), was developed with the goal of achieving significant compression gains relative to existing standards, in the range of a 50% bit-rate reduction for the same perceptual video quality [1]. HEVC is designed to address nearly all existing applications of H.264/AVC and, in addition, to focus on two key issues: (a) increased video resolution and (b) increased use of parallel processing architectures. The significant reduction in bit rate comes at the cost of increased coding algorithm complexity and hence increased processing time and higher hardware cost [1]. The objective of this project is twofold: (a) to study the working of the in-loop deblocking filter and (b) to propose modifications that can improve the performance of the HEVC codec.

Overview of HEVC

The HEVC standard is designed to achieve multiple goals, including coding efficiency, ease of transport system integration, data loss resilience, and implementability on parallel processing architectures. Exploiting parallel processing requires a thorough understanding of the HEVC design and its capabilities [1]. The block diagram of a typical HEVC encoder is shown in figure 1. The following subsections describe the key elements of picture partitioning, which is more powerful than in HEVC's predecessors but is also among the most complex and time-consuming parts of the encoder.

a) Division of Picture into Coding Tree Units

A picture is partitioned into coding tree units (CTUs), each of which contains a luma coding tree block (CTB) and the corresponding chroma CTBs. A luma CTB covers a square picture area of L×L samples of the luma component, and the corresponding chroma CTBs each cover L/2×L/2 samples of one of the two chroma components (for 4:2:0 sampling) [1]. The value of L may be 16, 32, or 64, as determined by a syntax element in the sequence parameter set (SPS). Compared with the traditional macroblock, which uses a fixed array size of 16×16 luma samples in all previous ITU-T and ISO/IEC JTC 1 video coding standards since H.261 [15], HEVC supports variable-size CTBs selected according to the encoder's memory and computational requirements. The larger CTBs are particularly beneficial when encoding high-resolution video content. The luma CTB and the two chroma CTBs, together with the associated syntax, form a CTU. The CTU is the basic processing unit used in the standard to specify the decoding process [1].
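To make the effect of the CTB size concrete, the following Python sketch counts how many CTUs are needed to cover a picture for each allowed value of L. The function name is an assumption made for this illustration; the 832×480 resolution is that of the test sequences used later in this report.

```python
def ctu_grid(width, height, ctb_size):
    """Return (columns, rows, total) of CTUs needed to cover a picture."""
    if ctb_size not in (16, 32, 64):
        raise ValueError("luma CTB size L must be 16, 32, or 64")
    # Ceiling division: a partially covered CTU at the right or bottom
    # edge still counts as a full CTU.
    cols = -(-width // ctb_size)
    rows = -(-height // ctb_size)
    return cols, rows, cols * rows


for size in (16, 32, 64):
    print(size, ctu_grid(832, 480, size))
# prints:
# 16 (52, 30, 1560)
# 32 (26, 15, 390)
# 64 (13, 8, 104)
```

With L = 64 the picture height is not a multiple of the CTB size, so the bottom row of CTUs extends beyond the picture.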


Fig. 1: Typical HEVC video encoder with decoder modeling elements shaded in light gray [1].

b) Division of Coding Tree Block into Coding Block

The blocks specified as luma and chroma CTBs can be used directly as coding blocks (CBs) or can be further partitioned into multiple CBs. Partitioning is achieved using tree structures [1]. The tree partitioning in HEVC is generally applied simultaneously to both luma and chroma, although exceptions apply when certain minimum sizes are reached for chroma. The CTU contains a quadtree syntax that allows the CBs to be split to an appropriate size based on the signal characteristics of the region covered by the CTB [1]. The quadtree splitting process can be repeated until the size of a luma CB reaches the minimum allowed luma CB size, which is selected by the encoder using syntax in the SPS and is always 8×8 or larger (in units of luma samples) [1]. The boundaries of the picture are defined in units of the minimum allowed luma CB size. As a result, at the right and bottom edges of the picture, some CTUs may cover regions that are partly outside the picture boundaries. This condition is detected by the decoder, and the CTU quadtree is implicitly split as necessary to reduce the CB size to the point where the entire CB fits into the picture [1].
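The implicit split at the picture boundary can be sketched as a small recursive routine. This is a minimal illustration and not the HM implementation: the function name, the (x, y, size) tuple representation and the want_split callback (standing in for the encoder's signaled split decisions) are assumptions. It also assumes that the picture dimensions are multiples of the minimum CB size, as stated above.

```python
def split_ctb(x, y, size, pic_w, pic_h, min_cb,
              want_split=lambda x, y, size: False):
    """Yield (x, y, size) for each luma CB of one CTB, in z-scan order."""
    if x >= pic_w or y >= pic_h:
        return  # block lies entirely outside the picture: nothing to code
    crosses = x + size > pic_w or y + size > pic_h
    # A block that crosses the picture boundary is split implicitly;
    # otherwise the (hypothetical) encoder decision is followed.
    if size > min_cb and (crosses or want_split(x, y, size)):
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                yield from split_ctb(x + dx, y + dy, half,
                                     pic_w, pic_h, min_cb, want_split)
    else:
        yield (x, y, size)


# Bottom-right 64x64 CTB of an 832x480 picture: only its top 32 rows
# are inside the picture, so it is implicitly split into 32x32 CBs.
print(list(split_ctb(768, 448, 64, 832, 480, min_cb=8)))
# prints [(768, 448, 32), (800, 448, 32)]
```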

c) Prediction Blocks and Prediction Units

The prediction mode for the coding unit (CU) is signaled as either intra or inter, depending on whether it uses intrapicture (spatial) prediction or interpicture (temporal) prediction. When the prediction mode is signaled as intra, the prediction block (PB) size, which is the block size at which the intrapicture prediction mode is established, is the same as the CB size for all block sizes except the smallest CB size allowed in the bitstream. For that smallest size, a flag indicates whether the CB is split into four PB quadrants that each have their own intrapicture prediction mode [1]. The reason for allowing this split is to enable distinct intrapicture prediction mode selections for blocks as small as 4×4 in size. When the luma intrapicture prediction operates with 4×4 blocks, the chroma intrapicture prediction also uses 4×4 blocks [1]. The actual region size at which the intrapicture prediction operates depends on the residual coding partitioning.

When the prediction mode is signaled as inter, it is specified whether the luma and chroma CBs are split into one, two, or four PBs. The splitting into four PBs is allowed only when the CB size is equal to the minimum allowed CB size, using an equivalent type of splitting as could otherwise be performed at the CB level of the design rather than at the PB level. When a CB is split into four PBs, each PB covers a quadrant of the CB. When a CB is split into two PBs, six types of this splitting are possible. The partitioning possibilities for interpicture-predicted CBs are depicted in figure 2. The upper partitions illustrate the cases of not splitting the CB of size M×M, of splitting the CB into two PBs of size M×M/2 or M/2×M, or of splitting it into four PBs of size M/2×M/2. The lower four partition types in figure 2 are referred to as asymmetric motion partitioning (AMP) and are only allowed when M is 16 or larger for luma [1]. One PB of the asymmetric partition has the height or width M/4 and width or height M, respectively, and the other PB fills the rest of the CB by having a height or width of 3M/4 and width or height M. Each interpicture-predicted PB is assigned one or two motion vectors and reference picture indices. To minimize worst-case memory bandwidth, PBs of luma size 4×4 are not allowed for interpicture prediction, and PBs of luma sizes 4×8 and 8×4 are restricted to unipredictive coding. The luma and chroma PBs, together with the associated prediction syntax, form the PU [1].

Fig. 2: Modes of splitting a coding block (CB) into prediction blocks (PBs), subject to certain

rules. For intrapicture prediction CBs, only M×M and M/2×M/2 are supported [1].
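The eight inter partition modes can be tabulated with a short Python sketch. The mode labels (2Nx2N, 2NxnU and so on) are the names conventionally used for these partitions in HEVC and are not taken from this report; the function name and the (width, height) representation are assumptions for illustration. The sketch enforces only the AMP size rule described above.

```python
AMP_MODES = {"2NxnU", "2NxnD", "nLx2N", "nRx2N"}


def pb_sizes(mode, m):
    """Return the (width, height) of each luma PB of an M x M inter CB."""
    if mode in AMP_MODES and m < 16:
        raise ValueError("AMP is only allowed when M is 16 or larger")
    half, quarter = m // 2, m // 4
    table = {
        "2Nx2N": [(m, m)],                              # no split
        "2NxN": [(m, half)] * 2,                        # two M x M/2
        "Nx2N": [(half, m)] * 2,                        # two M/2 x M
        "NxN": [(half, half)] * 4,                      # four quadrants
        "2NxnU": [(m, quarter), (m, 3 * quarter)],      # small PB on top
        "2NxnD": [(m, 3 * quarter), (m, quarter)],      # small PB at bottom
        "nLx2N": [(quarter, m), (3 * quarter, m)],      # small PB on left
        "nRx2N": [(3 * quarter, m), (quarter, m)],      # small PB on right
    }
    return table[mode]


print(pb_sizes("2NxnU", 32))  # prints [(32, 8), (32, 24)]
```

In every mode the PB areas add up to M×M, which is a quick sanity check on the table.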


Fig. 3: Subdivision of a CTB into coding blocks (CBs) and transform blocks (TBs). Solid lines indicate CB boundaries and dotted lines indicate TB boundaries. (a) CTB with its partitioning and (b) the corresponding quadtree [1].

d) Video Coding Layer

The video coding layer of HEVC employs the same hybrid approach of inter-/intrapicture prediction and 2-D transform coding that has been used in all video compression standards since H.261 [15]. Figure 1 depicts the block diagram of a hybrid video encoder that can create a bitstream conforming to the HEVC standard [1]. To generate the bitstream, each picture is first split into block-shaped regions, with the exact block partitioning being conveyed to the decoder.

The first picture of a video sequence, and the first picture at each clean random access point in a video sequence, is coded using only intrapicture prediction. For all remaining pictures of the sequence, interpicture prediction modes are typically used for most blocks. Picture partitioning in HEVC is done using coding tree units (CTUs) and coding tree blocks (CTBs), coding units (CUs) and coding blocks (CBs), prediction units (PUs) and prediction blocks (PBs), and transform units (TUs) and transform blocks (TBs). The core of the coding layer in previous standards was the macroblock, containing a 16×16 block of luma samples and, in the case of 4:2:0 color sampling, two corresponding 8×8 blocks of chroma samples. The analogous structure in HEVC is the CTU, whose luma CTB has a size of L×L samples, where L = 16, 32, or 64, with the larger sizes typically enabling better compression. A CTB can be further divided into smaller blocks using a tree structure and quadtree-like signaling.

These partitioning structures increase the complexity of the encoder algorithm and hence require more processing time. Some researchers have suggested that optimizing the deblocking filter algorithm can reduce processing time [2]. Two ways in which this algorithm can be modified are explained in the following sections.


I. HEVC Deblocking Filter Design

The deblocking filter of HEVC is similar to that of H.264/AVC [17] and operates within the interpicture prediction loop. However, its decision-making and filtering processes are simplified, which makes parallel processing easier. In HEVC, the deblocking filter (DBF), followed by an SAO filter, is applied to the reconstructed samples before they are written into the decoded picture buffer in the decoder loop. The DBF is intended to reduce the blocking artifacts caused by block-based coding, and it is applied only to samples located at block boundaries [1]. The detailed deblocking filtering process of HEVC is explained in the next subsection.

a. Working of a Deblocking Filter

The deblocking filter is applied to all samples adjacent to a prediction unit (PU) or transform unit (TU) boundary, except when the boundary is also a picture boundary or when deblocking is disabled across slice or tile boundaries, an option that is signaled by the encoder. This is shown pictorially in figure 4.

Fig. 4: Schematic showing the edges of PU, TU and picture boundary [2].

Both PU and TU boundaries are included because PU boundaries are not always aligned with TU boundaries in some cases of interpicture-predicted coding blocks (CBs). The syntax elements that control deblocking filtering across slice and tile boundaries are located in the SPS and slice headers. In HEVC, the deblocking filter is applied to edges that are aligned on an 8×8 sample grid, for both luma and chroma samples, instead of the 4×4 sample grid used in H.264/AVC. This restriction reduces the worst-case computational complexity without noticeable degradation of visual quality. It also improves parallel processing by preventing cascading interactions between nearby filtering operations [1].
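As a rough illustration of the 8×8 grid restriction, the following Python sketch keeps only those block boundaries that lie on the grid and are not picture boundaries. The function name and the list-of-coordinates representation are assumptions made for this example and do not come from the HM code.

```python
def deblock_edge_positions(boundaries, picture_size, grid=8):
    """Return the sorted PU/TU boundary coordinates eligible for deblocking.

    boundaries   -- x (or y) coordinates of PU/TU boundaries, in samples
    picture_size -- picture width (or height), in samples
    """
    return sorted(
        b for b in set(boundaries)
        # Skip picture boundaries and edges that are off the 8x8 grid.
        if 0 < b < picture_size and b % grid == 0
    )


# Boundaries at x = 4 and x = 12 are off the grid; x = 0 and x = 64 are
# picture boundaries of a 64-sample-wide picture.
print(deblock_edge_positions([0, 4, 8, 12, 16, 32, 64], picture_size=64))
# prints [8, 16, 32]
```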

Similar to the H.264/AVC scheme, the strength of the deblocking filter is controlled by the values of several syntax elements. However, only three filter strengths (0, 1 and 2) are used instead of five. For example, as shown in figure 5, given that P and Q are two adjacent blocks with a common 8×8 grid boundary, a filter strength of 2 is assigned when one of the blocks is predicted using intrapicture prediction. Otherwise, a filter strength of 1 is assigned when any of the following conditions is met [1].

(i) P or Q has at least one nonzero transform coefficient.

(ii) The reference indices of P and Q are not equal.

(iii) The motion vectors of P and Q are not equal.

(iv) The difference between a motion vector component of P and Q is greater than or equal

to one integer sample.

A filter strength of 0 is assigned if none of the above conditions is met; in other words, the deblocking process is not applied. Figure 5 depicts an example filtering decision for a vertical edge and its pixel samples.

Fig. 5: Filtering decision example for HEVC [2].
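The strength assignment listed above can be written as a short decision function. This is a minimal sketch under simplifying assumptions: each block carries a single motion vector and reference index (uniprediction), motion vectors are stored in quarter-sample units so that one integer sample equals 4, and the Block class and its field names are hypothetical. As the conditions are worded in the list, (iii) already implies (iv); both checks are kept so that each line can be traced back to the text.

```python
from dataclasses import dataclass


@dataclass
class Block:
    is_intra: bool = False
    has_nonzero_coeff: bool = False
    ref_idx: int = 0
    mv: tuple = (0, 0)  # (mvx, mvy) in quarter-sample units (assumption)


def boundary_strength(p, q):
    """Return the filter strength (0, 1 or 2) for the edge between P and Q."""
    if p.is_intra or q.is_intra:
        return 2
    if p.has_nonzero_coeff or q.has_nonzero_coeff:      # condition (i)
        return 1
    if p.ref_idx != q.ref_idx:                          # condition (ii)
        return 1
    if p.mv != q.mv:                                    # condition (iii)
        return 1
    if any(abs(a - b) >= 4 for a, b in zip(p.mv, q.mv)):  # condition (iv)
        return 1
    return 0  # no deblocking across this edge


print(boundary_strength(Block(is_intra=True), Block()))  # prints 2
print(boundary_strength(Block(mv=(8, 0)), Block()))      # prints 1
print(boundary_strength(Block(), Block()))               # prints 0
```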

According to the filter strength and the average quantization parameter of P and Q, two thresholds, tC and β, are determined from predefined tables. For luma samples, one of three cases (no filtering, strong filtering or weak filtering) is chosen based on the β value. The computational complexity is reduced by sharing this decision across four luma rows or columns, using only the first and the last rows or columns. For chroma samples there are only two cases: no filtering and normal filtering. Normal filtering is applied only when the filter strength is greater than 1. The filtering process is then performed using the control variables tC and β [1].
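The shared luma on/off decision can be sketched as follows. The report does not spell out the formula; the local-activity measure used here (second differences on lines 0 and 3 of a four-line segment, summed and compared with β) follows the description in [4] and should be read as an illustration, not as the HM code. The function names and the (x0, x1, x2) sample layout, ordered from the edge outward, are assumptions.

```python
def second_difference(side):
    """|x2 - 2*x1 + x0| for the three samples nearest the edge on one side."""
    x0, x1, x2 = side
    return abs(x2 - 2 * x1 + x0)


def edge_filter_enabled(p_lines, q_lines, beta):
    """Decide once for a four-line segment whether the edge is filtered.

    p_lines[i] = (p0, p1, p2) and q_lines[i] = (q0, q1, q2) for line i.
    Only the first and last lines (0 and 3) enter the decision.
    """
    activity = sum(second_difference(lines[i])
                   for lines in (p_lines, q_lines)
                   for i in (0, 3))
    return activity < beta


flat_p = [(100, 100, 100)] * 4   # smooth block: activity 0
flat_q = [(110, 110, 110)] * 4
busy_p = [(100, 120, 90)] * 4    # textured block: activity 50 per line

print(edge_filter_enabled(flat_p, flat_q, beta=8))  # prints True
print(edge_filter_enabled(busy_p, flat_q, beta=8))  # prints False
```

A step between two smooth blocks is treated as a blocking artifact and filtered, while an edge next to a highly textured block is left alone because the discontinuity is likely to be real picture content.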

In HEVC, horizontal filtering of the vertical edges is performed for the entire picture first, followed by vertical filtering of the horizontal edges. This is why HEVC deblocking is also called parallel deblocking. This specific order enables multiple horizontal or vertical filtering processes to be run in parallel threads; alternatively, the filter can still be implemented on a CTB-by-CTB basis with only a small processing latency [1]. The detailed filtering process is illustrated in figure 6 [2].


Fig. 6: Detailed explanation of deblocking filtering procedure in HEVC [2].

As per the basic ordering principle of HEVC, the rightmost horizontal edges in the current LCU cannot be processed before the leftmost vertical edges of the next LCU have been processed. For example, in figure 6 the filtering of edges 21 and 22 is done after edges 17 through 20 are completed. From the time slots it is easy to see that the filtering for LCUs #n+1, #n and #n-1 is not sequential but interleaved, which introduces three drawbacks:

(i) The control of the filtering is complex and the hardware cost of the control part is large. Usually, the control part costs more than the filtering computation itself, so the control complexity is critical for the hardware design.

(ii) The filtering of one LCU involves data from the left, right and upper neighboring LCUs, which increases the cost of buffers or memory accesses.

(iii) There is latency in processing the current LCU. In other words, the filtering of the current LCU cannot be completed before the data of the next LCU is available, which decreases the throughput of the whole decoding system.


b. Filtering Operations

(i) Normal Filtering Operations

The filter is active when a picture contains an inclined surface (or linear ramp) that crosses a block boundary. In such cases, the normal deblocking filtering operations should leave the signal unmodified. In the normal filtering mode, for a segment of four lines as shown in figure 7, the filtering operations are applied to each line [4]. The detailed math for calculating the filtered pixel values across the block boundary is described in [4].

Fig. 7: Four-pixel long vertical block boundary formed by the adjacent blocks P and Q. Deblocking decisions are based on lines marked with the dashed line (lines 0 and 3) [4].
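The ramp-preserving property can be checked numerically. The sketch below uses the normal-filter offset for the two samples nearest the edge as given in [4], with the offset clipped to ±tC and filtering skipped when the offset reaches 10·tC; the formula is quoted from that reference, not derived in this report, and the function names are assumptions. Only p0 and q0 are handled; the optional modification of p1 and q1 is omitted.

```python
def clip3(lo, hi, value):
    """Clamp value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def normal_filter_p0_q0(p1, p0, q0, q1, tc, bit_depth=8):
    """Apply the normal deblocking filter to the two samples at the edge."""
    delta = (9 * (q0 - p0) - 3 * (q1 - p1) + 8) >> 4
    if abs(delta) >= 10 * tc:
        return p0, q0  # step too large: treated as a natural edge
    delta = clip3(-tc, tc, delta)
    max_val = (1 << bit_depth) - 1
    return clip3(0, max_val, p0 + delta), clip3(0, max_val, q0 - delta)


# Linear ramp across the boundary: the offset is zero, nothing changes.
print(normal_filter_p0_q0(100, 102, 104, 106, tc=4))  # prints (102, 104)
# Step of 10 between two flat blocks: the step is reduced by 2 * tC.
print(normal_filter_p0_q0(100, 100, 110, 110, tc=4))  # prints (104, 106)
```

For a ramp with constant slope s, q0 − p0 = s and q1 − p1 = 3s, so the two terms of the offset cancel exactly.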

c. Sequence and Picture Level Adaptivity

Because different video sequences generally have different characteristics, deblocking strength can be adjusted on a sequence and even on a picture basis. As mentioned earlier, the main sources of blocking artifacts are block transforms and quantization. Blocking artifact severity therefore depends, to a large extent, on the quantization parameter QP, and the QP value is taken into account in the deblocking filtering decisions. The thresholds β and tC depend on the average QP value of the two neighboring blocks with a common block edge [16] and are typically stored in corresponding tables. The dependence of these parameters on QP is shown in figures 8 and 9 [4].

The parameter β controls which edges are filtered, controls the selection between the normal and strong filter, and controls how many pixels from the block boundary are modified in the normal filtering operation. The value of β increases with QP. Deblocking is therefore enabled more frequently at high QP values, which correspond to coarse quantization, than at low QP values, which correspond to fine quantization. The deblocking operation is effectively disabled for low QP values by setting one or both of β and tC to zero [4].

The parameter tC controls the selection between the normal and strong filter and determines the maximum absolute value of the modifications allowed to the pixel values at a given QP, for both normal and strong filtering operations. This adaptively limits the amount of blurriness introduced by deblocking filtering.

The deblocking parameters tC and β provide adaptivity according to the QP and prediction type. However, different sequences, or parts of the same sequence, may have different characteristics, and it may be important for content providers to change the amount of deblocking filtering on a sequence or even on a slice or picture basis. Deblocking adjustment parameters can therefore be sent in the slice header or picture parameter set (PPS) to control the amount of deblocking filtering applied. The corresponding parameters are tc_offset_div2 and beta_offset_div2 [15]. These parameters specify the offsets (divided by two) that are added to the QP value before determining the β and tC values. The parameter beta_offset_div2 adjusts the number of pixels to which deblocking filtering is applied, whereas tc_offset_div2 adjusts the amount of filtering that can be applied to those pixels, as well as the detection of natural edges [4].

Fig. 8: Dependence of β on QP [4].

Fig. 9: Dependence of tC on QP [4].
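The way the two offsets shift the table lookups can be sketched as below. The sketch computes only the table indices and does not reproduce the β and tC tables themselves. The clipping ranges (0 to 51 for β, 0 to 53 for tC) and the extra offset of 2 for a filter strength of 2 are taken from the H.265 specification, not from this report, and should be checked against the standard; the function names are assumptions.

```python
def clip3(lo, hi, value):
    """Clamp value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def table_indices(qp_p, qp_q, strength, beta_offset_div2=0, tc_offset_div2=0):
    """Return (beta_index, tc_index) used to look up the two thresholds."""
    qp_avg = (qp_p + qp_q + 1) >> 1  # rounded average QP of blocks P and Q
    beta_index = clip3(0, 51, qp_avg + 2 * beta_offset_div2)
    tc_index = clip3(0, 53, qp_avg + 2 * (strength - 1) + 2 * tc_offset_div2)
    return beta_index, tc_index


print(table_indices(30, 33, strength=2))
# prints (32, 34)
print(table_indices(30, 33, strength=2, beta_offset_div2=-2, tc_offset_div2=1))
# prints (28, 36)
```

A negative beta_offset_div2 moves the β lookup toward lower QP, so fewer edges are filtered, while a positive tc_offset_div2 allows larger sample modifications on the edges that are filtered.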


II. Methods to Optimize Deblocking Operation

Several authors have suggested ways to decrease either the complexity of deblocking filtering or the time it takes to filter out the artifacts introduced at coding unit boundaries. Some of them are discussed in this section.

a. Unified Cross-Based Approach

Li et al. propose a novel processing order in which blocks are chosen and combined to form a processing unit, shown in figure 10 [2]. This unit is termed the unified-cross unit and is different from the LCU. The unit is symmetric, and the edges that need to be filtered are arranged in several crosses. The benefit of this approach is that the unified-cross units are independent of each other. The processing order for the unified-cross unit is shown in figure 11 [2].

Fig. 10: Blocks are chosen and combined to form a processing unit called the unified-cross unit, which differs from the LCU-based approach [2].


Fig. 11: Unified-cross based processing [2].

The advantage of unified-cross based processing is that it enables truly parallel processing, which results in decreased computing time and lower hardware requirements. This method appears efficient, but since it requires new hardware to be built in order to implement the algorithm, it is out of scope for the current project.

b. Low Complexity Deblocking Filter Perceptual Optimization For The HEVC Codec Approach

Another technique for reducing the complexity and time consumption of the deblocking filter is suggested by Naccari et al. [3]. The deblocking filter in HEVC provides two offsets to vary the amount of filtering for each image area. The perceptual optimization is performed by varying these two offsets to minimize the generalized block edge impairment metric (GBIM). A low-complexity perceptual optimization of the deblocking filter offsets is proposed to improve the GBIM quality while significantly reducing the computational resources that would be required by a brute-force approach, in which all possible offset values are exhaustively tested [3]. The proposed GBIM extension comprises two terms: (i) the perceptually weighted block edge pixel difference, Mh (Mv), which represents the norm of the horizontal (or vertical) block edge pixel differences, weighted by the perceptual weight wp; and (ii) the perceptually weighted non-block-edge average difference, Eh (Ev), which represents the norm of the average for those pixels between horizontal (or vertical) block edges. The frame-level GBIMf is calculated using (1) [3].


$$\mathrm{GBIM}_f = 0.5\left(\frac{M_h}{E_h}\right) + 0.5\left(\frac{M_v}{E_v}\right) \qquad (1)$$

III. Scope Of This Project

The main objective of this project is to reduce the processing time by modifying the deblocking algorithm. This can be achieved by implementing various algorithms suggested in the literature on HEVC [2]-[5]. The output can be compared using the test sequences suggested and made available by the HEVC standard development committee. Some of the approaches to be implemented in the HEVC test codec, also known as the HM (HEVC Test Model) code, are listed below:

(i) To understand and implement the unified-cross based processing in the deblocking filtering unit of the HM codec [2].

(ii) To understand and implement a low-complexity perceptual optimization of the offsets for the deblocking filtering unit of the HM codec [3].

(iii) To understand and implement the skipping mode technique in order to decrease edge processing, thereby reducing the power consumption of the deblocking filtering unit in the HM codec.

(iv) To compare the HEVC performance after implementing (i)-(iii) in the HM codec with H.264/AVC codec performance using processor clock cycles, mean square error (MSE) and signal to noise ratio (SNR).

IV. Simulation Setup and Test Sequences Details

System description that was used to simulate the test sequences:

System type: 64-bit Windows operating system

Processor: Intel Core i3-2330M CPU @ 2.20 GHz

System memory (RAM): 4.00 GB (3.90 GB usable)

The standard test sequences provided by the HEVC JCT-VC [19] were encoded using the encoder_intra_main.cfg configuration file in two different ways: (a) with the deblocking filter disabled and (b) with the deblocking parameters varied to obtain the best results in terms of PSNR in dB, total encoding time in seconds and bit rate in kbps. These results were obtained by running the four test sequences listed in table 1; pictures of these test sequences are shown in figures 12 through 15.


Sequence #   Test sequence       Resolution (pixels)   Frame rate (Hz)

1            BasketBall.yuv      832×480               50

2            BQMall.yuv          832×480               60

3            RaceHorses.yuv      416×240               30

4            Kirsten&Sara.yuv    1280×720              60

Table 1: List of test sequences used to generate results with description.

Fig. 12: Picture of test sequence Basketball.yuv with resolution 832×480 with the frame rate =

50 frames/sec.

Fig. 13: Picture of test sequence BQmall.yuv with resolution 832×480 with the frame rate = 60

frames/sec.


Fig. 14: Picture of test sequence Racehorses.yuv with resolution 416×240 with the frame rate =

30 frames/sec.

Fig. 15: Picture of test sequence Kirsten&Sara.yuv with resolution 1280×720 with the frame rate

= 60 frames/sec.

Equation (2) is used to calculate the overall PSNR of the image from the values of Y-PSNR, U-PSNR and V-PSNR.

$$\mathrm{PSNR} = \frac{1}{8}\left(6\,\mathrm{PSNR}_Y + \mathrm{PSNR}_U + \mathrm{PSNR}_V\right) \qquad (2)$$
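The weighted combination of the three component PSNR values can be computed with a few lines of Python. This sketch assumes the weights read from the equation above (6/8 for luma and 1/8 for each chroma component) and 8-bit video for the per-component PSNR; the function names are chosen for this illustration, and the sample values are made up to show the arithmetic, not measured results.

```python
import math


def psnr_from_mse(mse, bit_depth=8):
    """Per-component PSNR in dB from the mean square error."""
    if mse == 0:
        return math.inf  # identical pictures
    peak = (1 << bit_depth) - 1
    return 10 * math.log10(peak * peak / mse)


def combined_psnr(psnr_y, psnr_u, psnr_v):
    """Overall PSNR as the 6:1:1 weighted average of Y, U and V PSNR."""
    return (6 * psnr_y + psnr_u + psnr_v) / 8


# Illustrative values only.
print(combined_psnr(36.0, 40.0, 44.0))  # prints 37.5
```

Because luma carries six of the eight weight units, the overall figure stays close to Y-PSNR even when the chroma components score much higher.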


V. Results

Fig. 16: Total encoding time taken by sequences 1 (Basketball.yuv), 2 (BQmall.yuv) and 3

(Racehorses.yuv) using encoder_intramain.cfg for 10 frames.

Fig. 17: Bit-rate in kbps for the sequences 1 (Basketball.yuv), 2 (BQmall.yuv) and 3

(Racehorses.yuv) when using encoder_intramain.cfg for 10 frames.


Fig. 18: PSNR in dB for the sequences 1 (Basketball.yuv), 2 (BQmall.yuv) and 3 (Racehorses.yuv)

when using encoder_intramain.cfg for 10 frames.

Fig. 19: Total encoding time in seconds taken by sequences 1 (Basketball.yuv), 2 (BQmall.yuv)

and 3 (Racehorses.yuv) using encoder_lowdelay.cfg for 10 frames.


Fig. 20: Bit-rate in kbps for the sequences 1 (Basketball.yuv), 2 (BQmall.yuv) and 3

(Racehorses.yuv) when using encoder_lowdelay.cfg for 10 frames.

Fig. 21: PSNR in dB for the sequences 1 (Basketball.yuv), 2 (BQmall.yuv) and 3 (Racehorses.yuv)

when using encoder_lowdelay.cfg for 10 frames.


Fig. 22: Total encoding time in seconds taken by sequences 1 (Basketball.yuv), 2 (BQmall.yuv)

and 3 (Racehorses.yuv) using encoder_randomaccess.cfg for 10 frames.

Fig. 23: Bit-rate in kbps for the sequences 1 (Basketball.yuv), 2 (BQmall.yuv) and 3

(Racehorses.yuv) when using encoder_randomaccess.cfg for 10 frames.


Fig. 24: PSNR in dB for the sequences 1 (Basketball.yuv), 2 (BQmall.yuv) and 3 (Racehorses.yuv)

when using encoder_randomaccess.cfg for 10 frames.

VI. Conclusions

It is apparent from the results shown in figures 16 through 24 that optimizing the deblocking filter parameters decreased the total encoding time without any significant change in the bit rate or PSNR of the image. Hence, there are two benefits of applying the deblocking filter: (i) it helps remove blocking artifacts from the reconstructed image and (ii) it helps decrease the total encoding time. The performance could be improved further by including the effect of the quantization parameter (QP) in the optimization of the deblocking filter parameters.

Lastly, it was observed that the simulation results depend on the system on which they are performed. Hence, any system with computational power different from that of the one used in this work will yield different numbers, but the underlying principle will remain the same.

VII. Future Work

Future work comprises performing simulations with the rest of the configuration (.cfg) files and with other test sequences. The quantization parameter (QP) is another key parameter that can affect the encoding time and the quality of the image.


VIII. References

[1] G. J. Sullivan et al., “Overview of the High Efficiency Video Coding (HEVC) Standard”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, Dec. 2012.

[2] M. Li et al., “De-blocking Filter Design for HEVC and H.264/AVC”, Advances in Multimedia Information Processing – PCM 2012, LNCS, vol. 7674, pp. 273-284, Dec. 2012.

[3] M. Naccari et al., “Low Complexity Deblocking Filter Perceptual Optimization For The HEVC Codec”, 18th IEEE International Conference on Image Processing (ICIP), pp. 737-740, Sep. 2011.

[4] A. Norkin et al., “HEVC Deblocking Filter”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1746-1754, Dec. 2012.

[5] A. J. Honrubia, J. L. Martínez and P. Cuenca, “HEVC: A Review, Trends and Challenges”, Instituto de Investigación en Informática de Albacete, Spain.

[6] J.-R. Ohm et al., “Comparison of the Coding Efficiency of Video Coding Standards – Including High Efficiency Video Coding (HEVC)”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1669-1684, Dec. 2012.

[7] C. Man-Yau and S. Wan-Chi, “Computationally-Scalable Motion Estimation Algorithm for H.264/AVC Video Coding”, IEEE Transactions on Consumer Electronics, vol. 56, no. 2, pp. 895-903, May 2010.

[8] J. Ren, N. Kehtarnavaz, and M. Budagavi, “Computationally Efficient Mode Selection in H.264/AVC Video Coding”, IEEE Transactions on Consumer Electronics, vol. 54, no. 2, pp. 877-886, May 2008.

[9] K. R. Rao, “High Efficiency Video Coding”, Chapter 5 – soon to be published.

[10] P. List et al., “Adaptive deblocking filter”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 614-619, July 2003.

[11] K. Xu and C. S. Choy, “A Five-Stage Pipeline, 204 Cycles/MB, Single-Port SRAM-Based Deblocking Filter for H.264/AVC”, IEEE Transactions on Circuits and Systems, vol. 18, no. 3, pp. 363-374, March 2008.

[12] F. Tobajas et al., “An Efficient Double-Filter Hardware Architecture for H.264/AVC De-blocking Filtering”, IEEE Transactions on Consumer Electronics, vol. 54, no. 1, pp. 131-139, Feb. 2008.


[13] Y. C. Lin et al., “A Two-Result-Per-Cycle De-Blocking Filter Architecture for QFHD H.264/AVC Decoder”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 6, pp. 838-843, June 2009.

[14] D. Zhou et al., “A 48 Cycles/MB H.264/AVC De-blocking Filter Architecture for Ultra High Definition Applications”, IEICE Transactions Fundamentals, vol. E92-A, no. 12, pp. 3203-3210, Dec. 2009.

[15] T. Wiegand et al., “Overview of the H.264/AVC Video Coding Standard”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.

[16] JM software download for H.264/AVC: http://iphome.hhi.de/suehring/tml/

[17] HM codec download for H.265: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/branches/

[18] HEVC standard test video sequences: ftp://ftp.tnt.uni-hannover.de/testsequences