
IT 13 019
Examensarbete 30 hp
Mars 2013

Research and optimization of an H.264/AVC motion estimation algorithm based on a 3G network

Ou Yu

Institutionen för informationsteknologi
Department of Information Technology

Teknisk-naturvetenskaplig fakultet, UTH-enheten
Besöksadress: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0
Postadress: Box 536, 751 21 Uppsala
Telefon: 018 – 471 30 03
Telefax: 018 – 471 30 00
Hemsida: http://www.teknat.uu.se/student

Abstract

Research and optimization of an H.264/AVC motion estimation algorithm based on a 3G network

Ou Yu

The new video codec standard H.264/AVC was jointly developed by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) [1][2]. It has higher coding efficiency than MPEG-4 and can therefore be applied to high-definition applications in low bit-rate wireless environments [3]. However, H.264/AVC places harsh requirements on the hardware, mainly due to the complexity of the algorithms it uses, and end devices such as smart phones usually do not have sufficient computing capability and are further restricted by limited battery power. As a result, it is crucial to reduce the computational complexity of the H.264/AVC codec while keeping the video quality unharmed.

An analysis of the H.264/AVC coding algorithm shows that motion estimation (ME) consumes the largest part of the computing power, so in order to adopt H.264/AVC for real-time, low bit-rate video applications it is very important to optimize the ME algorithm. This thesis first introduces the basic knowledge and key technologies of H.264/AVC. It then systematically illustrates the existing block-matching ME algorithms: their algorithm flow, the different techniques involved, and the pros and cons of each. Next, UMHexagonS, a well-known algorithm now adopted by ITU-T, is introduced in detail, and the author explains from several aspects why this algorithm is more efficient than the others. Based on this analysis, the author proposes improvements to UMHexagonS that incorporate ideas from some classic ME algorithms. In the last phase of the thesis, both a subjective and an objective quality assessment experiment are used to examine the performance of the improved algorithm. The experiments show that the improved ME algorithm requires less computing power than UMHexagonS while keeping the video quality at the same level. The improved algorithm could be used in a wireless environment such as a 3G network.

Tryckt av: Reprocentralen ITC
IT 13 019
Examinator: Ivan Christoff
Ämnesgranskare: Ivan Christoff
Handledare: Huijuan Zhang

Table of Contents

1 Introduction ...................................................................................................................... 1

1.1 Background ............................................................................................................ 1

1.2 Related Works ........................................................................................................ 2

1.3 Thesis Outline ......................................................................................................... 4

2 Principle of Block-Based Motion Estimation Algorithm ................................................. 5

2.1 Introduction of the H.264/AVC Standard ............................................................... 5

2.2 Encoder and Decoder of the H.264/AVC Standard................................................ 6

2.3 Motion Estimation Theory ..................................................................................... 7

2.3.1 Basic concept on Motion Estimation ........................................................... 8

2.3.2 Key Principle of Motion Estimation ........................................................... 11

2.3.3 The Matching Criterion .............................................................................. 12

2.4 Summary .............................................................................................................. 13

3 Analysis of the Classic Motion Estimation Algorithm .................................................... 15

3.1 Full Search............................................................................................................ 15

3.2 Three Step Search ................................................................................................. 16

3.3 Four Step Search .................................................................................................. 17

3.4 Block Based Gradient Descent Search ................................................................. 18

3.5 Diamond Search ................................................................................................... 18

3.6 Motion Vector Field Adaptive Search ................................................................... 19

3.7 Summary .............................................................................................................. 21

4 Analysis and Optimization of Unsymmetrical Multi Hexagon Search ........................... 23

4.1 Analysis of Unsymmetrical Multi Hexagon Search ............................................. 23

4.2 Optimization of Unsymmetrical Multi Hexagon Search ..................................... 27

4.3 Optimization Details of Unsymmetrical Multi Hexagon Search ......................... 28

4.3.1 Optimization on Early Termination ............................................................ 28

4.3.2 Adoption of Movement Intensity ................................................................ 30

4.4 Implementation of the Algorithm ......................................................................... 32

4.4.1 Starting Point Prediction ............................................................................. 32

4.4.2 Search Pattern ............................................................................................. 33

4.4.3 Search Process ............................................................................................ 34

4.5 Summary .............................................................................................................. 34

5 Experiment Proof of Improved Algorithm ..................................................................... 37

5.1 Objective Quality Assessment based on JM Model ............................................. 37

5.1.1 Design of the Experiment ........................................................................... 37

5.1.2 Analysis of the Result ................................................................................. 38

5.2 Subjective Quality Assessment Using Double Stimulus Impairment Scale ........ 40

5.2.1 Design of Experiment ................................................................................. 40

5.2.2 Analysis of the Result ................................................................................. 41

5.3 Summary .............................................................................................................. 42

6 Conclusion ...................................................................................................................... 43

References ......................................................................................................................... 44

Chapter 1 Introduction

1.1 Background

As more and more Telecommunication Service Providers (TSPs) start to promote their 3G network business, the coverage of 3G networks is growing rapidly in China [4], and it is expected that two years from now nine cellphones out of ten will use a 3G network. By that time streaming video, one of the most distinctive features of 3G networks, will be the battlefield on which TSPs compete for maximum profit.

Traditional online streaming media usually uses early coding methods. The media files are quite small and, correspondingly, the image quality is quite poor. The new generation of high-definition coding standard, H.264/AVC, can provide better video quality at the same bit rate. In addition, a network abstraction layer is included in the standard, which makes it more convenient to build Internet streaming applications. The technology is therefore well suited to 3G mobile multimedia usage.

However, to apply the H.264/AVC coding standard efficiently on a 3G platform, four main obstacles need to be overcome: power-consumption control, error control, transmission rate control and compression efficiency.

The main power control problem to be solved is how to reduce the power that streaming media tasks require, so as to extend battery life. In a mobile wireless video transmission system, the mobile terminal not only needs to decode in order to receive video, but also needs to encode in order to send video, so the power control problem can be divided into two aspects [5].

Fault-tolerant technology is an essential part of wireless video transmission. Given the QoS (Quality of Service) of a 3G wireless channel, fault tolerance is crucial to ensure the accuracy and completeness of data transmission. It usually includes lost data recovery and error concealment.

Video encoding is also crucial for video communication, because the wireless channel bandwidth is limited. There should be a balance between video content size and output quality, while good and stable quality is ensured on the receiving end.

Compression efficiency is an important video encoding parameter. Better compression efficiency means better video quality for the same file size, but it is gained at the cost of a more complex coding algorithm, which leads to higher computing power consumption. For now, the CPU, memory and other hardware of an ordinary 3G phone cannot compete with a mainstream personal computer. So if we do not adjust the video coding algorithm, it will certainly not suit mobile video platforms in 3G use, not to mention real-time encoding applications such as video calls.

This thesis focuses on compression efficiency, trying to find a balanced way to simplify the complexity of the algorithm while not harming the video quality significantly.

1.2 Related Works

The data compression of video signals is carried out by reducing redundant signals. Video signals contain two types of redundancy: statistical redundancy (also known as spatial redundancy) and human visual redundancy. Spatial redundancy, or geometric redundancy, is caused by the correlation between adjacent pixels. This kind of redundancy can be removed by changing the mapping rule of the relevant pixels; for example, if the background of the video has only one color, there will be a lot of spatial redundancy. The other kind of redundancy, psychological redundancy, is caused by the nature of the human visual system, which is not sensitive to a number of frequency components in the video; for example, humans cannot notice slight color changes. For both kinds of redundancy, the greater the amount of redundancy, the higher the possibility of compression.

At present, the bandwidth of a 3G network is still a bottleneck for high quality video transmission. Improving the compression efficiency of video coding is therefore crucial, and many research papers focus on how to reduce the complexity of coding while ensuring the coding quality; most of them address fast motion estimation algorithms. During video compression, more than half of the time is spent on motion estimation (ME). The basic idea of motion estimation is first to divide each frame into non-overlapping macroblocks; each macroblock then finds its reference macroblock in certain frames. By doing this, a large amount of residual can be removed.

Block-matching motion estimation has an advantage over other ME approaches, including recursive estimation, Bayesian estimation and optical flow methods, because its concept is more straightforward and it is easy to implement. Many researchers therefore put their energy into block-matching motion estimation, trying to make a breakthrough in video compression efficiency. There are several classic block-based motion estimation algorithms. Theoretically, the full search algorithm (FS) [6] is the most accurate block-matching algorithm, because it examines all the candidate blocks pixel by pixel to get the best motion vector (MV), but limited by its high computational complexity, full search is not the ideal method for real-time usage. Later, fast search algorithms were designed to meet real-time requirements. Three-Step Search (TSS) [7] reduces the amount of computation by reducing the number of search points, but its relatively large initial search step impairs the performance. Many other algorithms, such as New Three-Step Search (NTSS) [8], New Four-Step Search (NFSS) [9] and Block-Based Gradient Descent Search (BBGDS) [10][11], take advantage of the motion vector distribution offset and greatly improve the speed and efficiency for low-complexity video.

In October 1999, the Diamond Search (DS) [12] algorithm was adopted by the MPEG-4 verification model. Although the diamond method has better overall performance than the other algorithms, and the application of DS was a big success, it still falls short in some particular cases: it does not provide a flexible way to deal with different video content, and it easily falls into local optima, which in turn impacts the search performance and coding efficiency.

Hexagon Search (HEXBS) [13] is another advanced algorithm. It uses a relatively large search pattern and a fast-moving search module to reduce the number of search steps to far fewer than Diamond Search. But HEXBS does not consider the motion vector correlation either, and thus cannot handle video with intense movement very well. Motion Vector Field Adaptive Search (MVFAST) and Predictive Motion Vector Field Adaptive Search (PMVFAST) [14] were included in the MPEG-4 video standard in 2001. Both algorithms use motion correlation and movement-related features of the content to choose different search modes; in addition, they use prediction vectors and other new concepts, such as early termination, to improve both search speed and video quality. But like the other algorithms mentioned above, these two algorithms still cannot handle video with intense movement very well.

In Chapter 3, all the algorithms mentioned above will be analyzed in detail. Based on this analysis, a new hybrid algorithm that is more suitable for the 3G mobile platform will be introduced.

1.3 Thesis Outline

Chapter 1 is the introduction. It covers the difficulties of applying video coding technology to 3G network usage, summarizes the current research focus, and from this derives the focus of this thesis.

Chapter 2 focuses on the basic principles and framework of H.264/AVC, especially motion estimation.

Chapter 3 introduces several classic motion estimation algorithms and analyzes the pros and cons of each.

In Chapter 4, the motion estimation algorithm UMHexagonS is analyzed, and based on this analysis the thesis proposes some improvements to it. The chapter explains the whole improvement process in detail.

Chapter 5 is the experimental proof. The thesis uses both objective and subjective methods to show that the improved algorithm reduces the complexity of motion estimation while keeping the video quality unharmed.

Chapter 6 is the conclusion.

Chapter 2 Principle of Block-Based Motion Estimation Algorithm

2.1 Introduction of the H.264/AVC Standard

MPEG (Moving Picture Experts Group) and VCEG (Video Coding Experts Group) have jointly developed AVC (Advanced Video Coding), which performs better than earlier video codecs such as MPEG and H.263. This video codec is also known as ITU-T Rec. H.264 and the MPEG-4 Part 10 standard; here, in short, we call it H.264/AVC or H.264. The international standard was adopted by ITU-T and officially promulgated in March 2003. It is widely believed that the promulgation of H.264 is a major event in the development of video compression coding, and its superior compression performance will play an important role in all aspects of digital television broadcasting, video storage, real-time communication, network video streaming delivery and multimedia messaging. Specifically, compared with other video coding technologies, H.264/AVC has the following advantages:

1. Higher coding efficiency: compared with H.263, it can save approximately 50% of the bit rate while providing the same video quality.

2. Better video quality: H.264 can provide high-quality video images on low bit-rate channels, such as a 3G network.

3. Improved network adaptability: H.264 can work in real-time, low-latency communication applications (such as video conferencing) and can also be used for video storage or video streaming servers, where delay is not critical.

4. Hybrid coding structure: like H.263, H.264 uses DCT transform coding plus a DPCM coding structure. It also uses several advanced techniques, such as multi-mode motion estimation, intra prediction, multi-frame prediction, content-based variable length coding and a new 4×4 two-dimensional integer transform, to improve the coding efficiency.

5. Fewer encoding options: H.263 often requires quite a lot of options to be set, which increases the difficulty of encoding. H.264 tries to be brief, "back to basics", and reduce the encoding complexity.

6. Adaptability to different occasions: H.264 can use different transmission and playback rates depending on the environment, and also provides a wealth of error-handling tools, so packet loss and bit errors can be controlled or eliminated.

7. Error recovery: H.264 provides tools to deal with packet loss during network transmission, especially in wireless networks, which have high bit error rates.

8. Higher complexity: H.264 improves its performance by increasing the complexity. It is estimated that the computational complexity of H.264 encoding is roughly three times that of H.263, and the decoding complexity roughly two times.

2.2 Encoder and Decoder of the H.264/AVC Standard

H.264/AVC does not give a specific implementation instruction for the encoder and decoder, but only provides a set of semantics and rules. Different encoders and decoders from different providers can work under this predefined framework, which encourages positive competition among providers.

The composition of the H.264/AVC encoder and decoder is illustrated below:

Chart 2.1 H.264 encoder

Chart 2.2 H.264 decoder

From the charts above, we can see that the main function modules of H.264/AVC are similar to those of former standards, e.g. H.261, H.263, MPEG-1 and MPEG-4; the main difference lies in the details of each function module.

The encoder uses a hybrid coding method of transform and prediction. The input field or frame Fn is processed in units of macroblocks, and two kinds of prediction are possible at this stage: intra-frame prediction and inter-frame prediction. With intra-frame prediction, the predicted value PRED (represented as P) is calculated from previously coded samples within the current frame. The other way is inter-frame prediction, which is more precise and more compression efficient: here P is obtained through motion compensation (MC) from a reference frame F'n-1, which can be either a past frame or a future frame. Subtracting PRED from the current block gives a residual block Dn; through transform and quantization, the transform coefficients X are calculated. After entropy coding, and after adding side information such as motion vectors and quantization parameters, the final stream can be passed through the NAL for transmission or storage. In order to provide the reference image for further prediction, the encoder also needs to be able to rebuild the image: the residual is inverse quantized and inverse transformed to D'n, the unfiltered frame uF'n is obtained by adding D'n to P, and after filtering we get the rebuilt frame F'n.

Chart 2.2 shows the reverse of the process in Chart 2.1: the input is the H.264/AVC stream, and after the reverse process the current frame F'n can be extracted and output as the final image signal.

2.3 Motion Estimation Theory

Motion estimation is the process of predicting the movement trend of the image between frames. In this way, a smaller amount of information (the image change) can be used to describe the entire image. Currently, motion estimation algorithms are divided into three categories:

1. Pixel-based motion estimation

This type of motion estimation uses pixels as the basic unit to describe the state of motion of each pixel [15]. It is highly precise, but its computational complexity is so high that real-time encoding is difficult to achieve, so it cannot be used in real-time scenarios.

2. Object-based motion estimation

Object-based motion estimation usually segments the video images into a number of objects and then tracks and matches these objects. This approach depends heavily on video object segmentation algorithms, which are not considered mature enough, so progress in object-based motion estimation research is quite slow. In addition, the segmented objects may have different sizes without any common rules, which leads to higher algorithm complexity. Such a design cannot achieve practical purposes.

3. Block-based motion estimation

Block-based motion estimation uses blocks as the basic unit. Each block contains a number of pixels, and all pixels within a block are assumed to have a consistent state of motion. Because it uses the block as a unit, the computational complexity of motion estimation can be greatly reduced.

In addition, there are also studies on how to improve the encoding quality according to the characteristics of the human eye. For example, there is a coding method that skips visually insignificant video content in order to allocate more bits to the content the viewer is sensitive to and so obtain better overall video quality.

This thesis focuses on block-based motion estimation, and in the next section the basic principle of block-matching motion estimation is introduced.

2.3.1 Basic concept on Motion Estimation

Like any other video coding technology, H.264/AVC is built upon several basic concepts, which are introduced first.

Field and Frame

A field or frame of the video can be used to generate an encoded image. Typically, video frames are of two types: progressive or interlaced. In traditional television, a frame is divided into two interlaced fields in order to reduce the flickering of the video image. Usually, video content with little motion should adopt frame coding mode, while content with fierce movement should use field coding mode.

Macroblock

A coded image can be divided into several macroblocks, and each macroblock consists of a 16×16 array of luminance (Y) pixels and some chroma pixels (Cr, Cb), depending on the indicator in the sequence header. Several macroblocks form a slice. An I slice contains only I macroblocks; a P slice can contain P macroblocks and I macroblocks; and a B slice can contain B macroblocks and I macroblocks.

An I macroblock can only use coded pixels in the current slice to perform intra-frame prediction.

A P macroblock can use previously coded images for inter-frame prediction. The macroblock can be partitioned into 16×16, 16×8, 8×16 or 8×8 blocks, and these can be divided further; for example, if the 8×8 mode is chosen, the sub-block size can be 8×8, 8×4, 4×8 or 4×4.

A B macroblock is similar to a P macroblock, but it can also use future images for inter-frame prediction.

Motion Vector

In inter-frame prediction, every MB is predicted from an equally sized MB in a reference frame, and the vector between these two is called the motion vector (MV). It has 1/4-pixel accuracy for the luminance component and 1/8-pixel accuracy for the chroma components. As a result, the reference pixels might not actually exist in the reference frame (if the MV is not an integer); they are instead obtained by interpolation.

The transmission of each MV requires a certain number of bits, especially for small block sizes. In order to reduce the bit rate, adjacent MVs can be used to predict the current MV, because adjacent MVs are highly correlated; then only the difference (MVD) between the current MV and the predicted MV (MVP) needs to be transmitted instead of the whole MV. By doing this, a large amount of bit rate can be saved.

Chart 2.3 Current MB and adjacent MBs (in same size)

As shown above, E is the current macroblock or sub-macroblock, and A, B, C are the adjacent macroblocks on the left, top and top right. If there is more than one macroblock on the left (Chart 2.4), the topmost of them, A, is used as the reference, and B, the leftmost of the top adjacent macroblocks, is chosen as the reference block on that side.

Chart 2.4 Current MB and adjacent MBs (in different sizes)

In Chart 2.4:

If the block size is not 16×8 or 8×16, the MVP is the median of the MVs from A, B and C.

If the block size is 16×8, the MVP of the upper part comes from B, and that of the lower part from A.

If the block size is 8×16, the MVP of the left part comes from A, and that of the right part from C.

A sketch of this selection logic is given below.
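As a concrete illustration of these rules, the following sketch shows how the MVP could be selected. It is not code from the standard or from the JM software; the MV struct, the median3 helper and the predict_mvp function are illustrative assumptions.

#include <algorithm>   // std::min, std::max

struct MV { int dx, dy; };

// Median of three integers.
static int median3(int a, int b, int c)
{
    return std::max(std::min(a, b), std::min(c, std::max(a, b)));
}

// MVP selection for the current block, given the MVs of the neighbours
// A (left), B (top) and C (top right), following the rules above.
// part = 0 selects the upper/left partition, part = 1 the lower/right one.
MV predict_mvp(const MV &A, const MV &B, const MV &C, int blockW, int blockH, int part)
{
    if (blockW == 16 && blockH == 8)      // 16x8: upper part from B, lower part from A
        return (part == 0) ? B : A;
    if (blockW == 8 && blockH == 16)      // 8x16: left part from A, right part from C
        return (part == 0) ? A : C;

    MV mvp;                               // all other sizes: component-wise median of A, B, C
    mvp.dx = median3(A.dx, B.dx, C.dx);
    mvp.dy = median3(A.dy, B.dy, C.dy);
    return mvp;
}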

Motion Compensation

Motion compensation describes the process of turning the reference picture into the current picture. The segmentation of macroblocks mentioned above increases the correlation within each macroblock or sub-macroblock, thus making more efficient motion compensation possible; this is called tree-structured motion compensation. Each macroblock or sub-macroblock has its own motion compensation, and every MV has to be encoded, transferred and integrated into the output stream. For a large MB, the MV and segmentation type take only a small proportion of the whole stream, while the motion-compensated residual takes the biggest part, because the details of a large MB are more complex than those of a small one. In contrast, for small MBs the MV and segmentation type take the biggest part. So small MBs are suitable for image regions with more detail, and large MBs are suitable for regions with little or no detail.

As shown in Chart 2.5, the H.264/AVC encoder chooses the segmentation type for each block of a residual frame which has not yet been through motion compensation. For the grey background it uses large blocks such as 16×16, but in the detailed parts, for example the face and hair, it uses small blocks to gain better encoding efficiency.

2.3.2 Key Principle of Motion Estimation

In normal cases, adjacent frames within a video are correlated and thus contain redundancy, as described by Shannon information theory. As a matter of fact, there is redundancy in almost every video, and this is what makes video compression and video encoding technology possible.

Chart 2.5 Residual frame

Inter-frame prediction technology can remove the redundancy created by frame correlation. For still image regions, the same pixel in the previous frame is a useful reference for the current frame; for regions that are moving, the motion vector must be taken into consideration. So we need to find the matching pixel or MB in a previous or future frame that can serve as a reference for the current frame, and the process of finding this match is called motion estimation. Motion estimation can remove a large amount of redundancy, which lowers the amount of information to encode and also the time needed for encoding.

As shown in Chart 2.6, motion estimation first divides the image into individual MBs, assuming that all pixels within an MB share the same MV. It then searches for the matching MB in previous or future frames based on a pre-defined matching criterion; the vector between the current MB and the matched MB is the motion vector. The next step is to calculate the residual between the current MB and its motion-compensated prediction, which is then transformed, quantized, encoded and transmitted.

Chart 2.6 Illustration of motion estimation

2.3.3 The Matching Criterion

There are four common matching criteria for block matching [16]: the Mean Square Error (MSE), the Mean Absolute Difference (MAD), the Normalized Cross-Correlation Function (NCCF), and the Sum of Absolute Differences (SAD).

Minimum Mean Square Error (MSE)

MSE(i,j) = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} [f_k(m,n) - f_{k-1}(m+i, n+j)]^2    (2-1)

Here (i, j) is the candidate motion vector, f_k and f_{k-1} are the grey values of the pixels in the current and the previous frame, and M×N is the size of the block. The MB with the lowest MSE is selected as the matched MB.

Minimum Absolute Difference (MAD)

MAD(i,j) = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} |f_k(m,n) - f_{k-1}(m+i, n+j)|    (2-2)

The MB with the lowest MAD is the matched MB.

Normalized Cross-Correlation Function (NCCF)

NCCF(i,j) = \frac{\sum_{m=1}^{M} \sum_{n=1}^{N} f_k(m,n)\, f_{k-1}(m+i, n+j)}{\left[\sum_{m=1}^{M} \sum_{n=1}^{N} f_k^2(m,n)\right]^{1/2} \left[\sum_{m=1}^{M} \sum_{n=1}^{N} f_{k-1}^2(m+i, n+j)\right]^{1/2}}    (2-3)

The MB with the highest NCCF is the matched MB.

Sum of Absolute Differences (SAD)

SAD(i,j) = \sum_{m=1}^{M} \sum_{n=1}^{N} |f_k(m,n) - f_{k-1}(m+i, n+j)|    (2-4)

The MB with the lowest SAD is selected as the matched MB. In practice, the choice of matching criterion does not play a vital role in the precision of the matching process. Because SAD is easier to implement and requires little computing power, it
is often chosen for the matching criterion.
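As a concrete illustration, a minimal SAD computation over a 16x16 luminance block could look like the sketch below; the row-major frame layout, the width parameter and the function name are assumptions for illustration, not code taken from the JM reference software.

#include <cstdint>
#include <cstdlib>

// SAD between the 16x16 block of the current frame at (x, y) and the block of
// the reference frame displaced by the candidate motion vector (dx, dy).
// Both frames are stored as row-major 8-bit luminance planes of width 'width'.
// Boundary checks are omitted for brevity.
int sad_16x16(const uint8_t *cur, const uint8_t *ref, int width,
              int x, int y, int dx, int dy)
{
    int sad = 0;
    for (int m = 0; m < 16; ++m) {
        const uint8_t *c = cur + (y + m) * width + x;
        const uint8_t *r = ref + (y + dy + m) * width + (x + dx);
        for (int n = 0; n < 16; ++n)
            sad += std::abs(int(c[n]) - int(r[n]));   // |f_k - f_{k-1}| per pixel
    }
    return sad;
}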

2.4 Summary

This chapter introduced the basic concepts of H.264/AVC video encoding technology, and some of the key technologies, such as motion compensation and motion estimation, were described in detail.


Chapter 3 Analysis of the Classic Motion Estimation Algorithm

Since the birth of video codec technology, motion estimation has been one of its most important elements, and the efficiency of the motion estimation algorithm largely determines the success or failure of a video encoding technology. During the development of block-matching motion estimation, many innovative, highly efficient algorithms have appeared; many of them have since been replaced by more efficient ones, but their underlying ideas have become classics. This chapter analyzes some of these classic motion estimation algorithms.

3.1 Full Search

Chart 3.1 Full search process

Chart 3.1 illustrates how the best motion vector is found in the Full Search algorithm. The algorithm starts the search from the origin of the search area (generally the upper left corner); within a pre-defined search box, every possible candidate block is compared with the current block, and the most appropriate prediction block is chosen. The offset between the two blocks is the motion vector MV(u, v). Because full search examines all possible blocks, its accuracy is the best, but the search volume is too large and it is difficult to meet the needs of real-time encoding.

3.2 Three Step Search

The process of the three-step search algorithm is shown in Chart 3.2. The center point (i, j) of a 16×16 search box is set as the origin (0, 0), and the initial search step size is set to half of the maximum step. The MAD values of point (i, j) and its eight neighboring points are calculated, and the point with the smallest MAD becomes the new search center. The search step is then halved and the process is repeated until the point with the smallest MAD is found; its offset is the motion vector.

The three-step search algorithm is highly efficient, but because its search process is so simple, it cannot be guaranteed that the optimal motion vector will be found. A sketch of the basic loop is given after Chart 3.2.

Chart 3.2 Three step search process
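A minimal sketch of the three-step search loop is shown below. It reuses the hypothetical sad_16x16 routine from Section 2.3.3 and assumes a ±7 pixel search range with step sizes 4, 2 and 1; boundary checks are omitted.

#include <cstdint>

int sad_16x16(const uint8_t *cur, const uint8_t *ref, int width,
              int x, int y, int dx, int dy);   // assumed available (see Section 2.3.3)

// Three-step search: at each step the eight neighbours of the current best
// point are tested at the current step size, the best one becomes the new
// centre, and the step size is then halved.
void three_step_search(const uint8_t *cur, const uint8_t *ref, int width,
                       int x, int y, int &best_dx, int &best_dy)
{
    best_dx = 0;
    best_dy = 0;
    int best_sad = sad_16x16(cur, ref, width, x, y, 0, 0);

    for (int step = 4; step >= 1; step /= 2) {
        int cdx = best_dx, cdy = best_dy;              // centre of this step
        for (int j = -1; j <= 1; ++j)
            for (int i = -1; i <= 1; ++i) {
                if (i == 0 && j == 0) continue;        // centre already evaluated
                int dx = cdx + i * step, dy = cdy + j * step;
                int s = sad_16x16(cur, ref, width, x, y, dx, dy);
                if (s < best_sad) { best_sad = s; best_dx = dx; best_dy = dy; }
            }
    }
}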


3.3 Four Step Search

The four-step search method improves on the three-step search method. Chart 3.3 shows the search process.

1. Set the center of the search box as the origin, then check the nine points of a 5×5 pattern within the 15×15 search window. If the point with the minimum SAD is at the center of the pattern, go to step 4; otherwise go to step 2.

2. Set the best matching point from step 1 as the new search center, still using the 5×5 pattern. The next step is divided into two cases: if the optimal point from step 1 lies in the middle of an edge of the pattern, only the 6 unsearched points need to be compared by SAD; if it lies on a corner of the pattern, only the 5 unsearched points need to be searched. After the comparison, if the optimal point is at the center, go to step 4; otherwise go to step 3.

3. Repeat step 2, then go to step 4.

4. Shrink the pattern to a small 3×3 box and compare its nine detection points; the point with the minimum matching error is the required best match point.

Chart 3.3 Four step search process

3.4 Block Based Gradient Descent Search

The Block-Based Gradient Descent Search (BBGDS) uses a special search mode that concentrates on the central location: it checks the nine points of a 3×3 box in each round, as shown in Chart 3.4, and from the second round on only three or five new points have to be checked. Once the point with the smallest block distortion measure (BDM) is exactly at the center of the box, or lies on the boundary of the search window, the algorithm stops. BBGDS is very suitable for scenarios with little image movement, but for those with fierce movement this algorithm is far from satisfactory.

Chart 3.4 BBGDS process

3.5 Diamond Search

The Diamond Search (DS) was introduced in 1997 by Shan Zhu and Kai-Kuang Ma. It is one of the classic motion estimation algorithms and still has a very wide range of applications today. The main factors that affect the speed and effectiveness of such an algorithm are the shape and size of the search pattern, so the diamond search method uses two search patterns, a big diamond and a small diamond. The big diamond pattern covers the center point and eight surrounding points, while the small diamond covers only the center point along with four adjacent points. The detailed search process, illustrated in Chart 3.5, is as follows (a sketch of the loop follows the chart):

1. Starting from the center of the search window, the algorithm first uses the big diamond pattern and evaluates its nine points. If the point with the smallest SAD is at the center of the pattern, jump to step 3; if not, continue to step 2.

2. Starting from the point with the smallest SAD found in step 1, repeat the search with the big diamond pattern and compare the SAD of the points not yet searched with that of the center point. If the optimal point is at the center, jump to step 3; otherwise repeat this step.

3. Change the search pattern to the small diamond and check the points that have not been searched. The point with the smallest SAD is the optimal matching point.

The total number of search points is therefore 9 + n(3, 5) + 4, where each of the n big-diamond repetitions adds 3 or 5 new points.

Chart 3.5 DS process
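A compact sketch of this two-pattern loop is given below; the pattern offsets and the reuse of the hypothetical sad_16x16 routine are illustrative assumptions, and already-evaluated points are simply re-tested for brevity.

#include <cstdint>

int sad_16x16(const uint8_t *cur, const uint8_t *ref, int width,
              int x, int y, int dx, int dy);   // assumed available (see Section 2.3.3)

// Diamond search: big diamond pattern (LDSP) until the best point stays at
// the centre, then one refinement pass with the small diamond pattern (SDSP).
void diamond_search(const uint8_t *cur, const uint8_t *ref, int width,
                    int x, int y, int &best_dx, int &best_dy)
{
    static const int ldsp[8][2] = { {0,-2},{-1,-1},{1,-1},{-2,0},{2,0},{-1,1},{1,1},{0,2} };
    static const int sdsp[4][2] = { {0,-1},{-1,0},{1,0},{0,1} };

    best_dx = 0; best_dy = 0;
    int best_sad = sad_16x16(cur, ref, width, x, y, 0, 0);

    for (;;) {                                          // steps 1 and 2: big diamond
        int cdx = best_dx, cdy = best_dy;
        for (int k = 0; k < 8; ++k) {
            int dx = cdx + ldsp[k][0], dy = cdy + ldsp[k][1];
            int s = sad_16x16(cur, ref, width, x, y, dx, dy);
            if (s < best_sad) { best_sad = s; best_dx = dx; best_dy = dy; }
        }
        if (best_dx == cdx && best_dy == cdy) break;    // best point stayed at the centre
    }

    int cdx = best_dx, cdy = best_dy;                   // step 3: small diamond refinement
    for (int k = 0; k < 4; ++k) {
        int dx = cdx + sdsp[k][0], dy = cdy + sdsp[k][1];
        int s = sad_16x16(cur, ref, width, x, y, dx, dy);
        if (s < best_sad) { best_sad = s; best_dx = dx; best_dy = dy; }
    }
}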

3.6 Motion Vector Field Adaptive Search

All the block matching methods mentioned above use fixed search patterns and search strategies regardless of the nature of the video content. As a result, many improved algorithms have appeared that take advantage of the temporal and spatial correlation of the image sequence and of human visual characteristics. There are two common methods among them. One is rapid detection of the background. In most video sequences the background occupies the biggest part of the picture, and if we can detect it quickly we can reduce the computing time by a big margin. For example, we can directly calculate the SAD of the zero vector (the starting point), and if this SAD is less than a certain threshold T, the search terminates immediately and the zero vector is the final motion vector. In this way a single check can locate the optimal point, which improves the efficiency of the algorithm.

The other common method is based on a prediction of the complexity of the current block's movement, and different search patterns are applied for different movement complexity. If the motion vectors of the adjacent blocks are comparatively large, the current block is considered to be in fierce movement; in this case a big search pattern, such as DS or hexagon search, is applied, otherwise only a small search pattern is used to complete the search. This is the basic idea of the Motion Vector Field Adaptive Search (MVFAST) algorithm, which was quite a breakthrough among fast motion estimation algorithms.

1. Still MB detection

Most still MBs in a video sequence have the MV (0, 0), and they can be detected using SAD: if the SAD at (0, 0) is smaller than a threshold T, the MB is considered still and the search can come to an end, which is called early termination; (0, 0) is then the motion vector. In MVFAST this threshold is set to 512 and is configurable; if it is set to 0, the early termination step is skipped.

2. Movement intensity

In MVFAST the movement intensity of an MB is defined by the MVs of its Region of Support (ROS), i.e. the adjacent MBs on the left, top and top right. Let V_i = (x_i, y_i) be the MV of MB1, MB2, MB3, and let L_i = |x_i| + |y_i| and L = \max(L_i). The movement intensity of the current MB is then defined as

\text{Movement intensity} = \begin{cases} \text{low}, & L \le L_1 \\ \text{medium}, & L_1 < L \le L_2 \\ \text{high}, & L > L_2 \end{cases}    (3.1)

where L_1 and L_2 can be pre-defined.

Chart 3.4 Region of support

3. Starting search point

The starting search point depends on the movement intensity of the current MB. If the intensity is medium or low, (0, 0) is used as the starting point. If the movement intensity is high, the SADs of the positions that the MVs of the three adjacent MBs point to are calculated, and the one with the lowest SAD is chosen as the starting search point.

4. Search pattern

Two types of search pattern are used in MVFAST: the big diamond and the small diamond. As explained above, an MB with high movement intensity uses the big diamond as the search pattern; otherwise the small diamond search is applied.

MVFAST takes the adjacent MBs into consideration to determine the starting search point, and it uses different search patterns for different situations. As a result, it strikes a balance between speed and quality. A simplified sketch of this decision logic is given below.
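The following sketch is an illustration only; the types and helper names are assumptions and not MPEG-4 reference code. It summarizes the MVFAST decision flow described above.

#include <algorithm>
#include <cstdlib>

struct MV { int x, y; };

enum class Intensity { Low, Medium, High };

// Movement intensity from the Region of Support (left, top and top-right
// neighbours), following Eq. (3.1): L_i = |x_i| + |y_i|, L = max(L_i).
Intensity movement_intensity(const MV ros[3], int L1, int L2)
{
    int L = 0;
    for (int i = 0; i < 3; ++i)
        L = std::max(L, std::abs(ros[i].x) + std::abs(ros[i].y));
    if (L <= L1) return Intensity::Low;
    if (L <= L2) return Intensity::Medium;
    return Intensity::High;
}

// Still-MB detection: sad_zero is the SAD of the (0, 0) candidate.
// Returns true if the search can stop immediately (early termination).
bool mvfast_early_stop(int sad_zero, int still_threshold = 512)
{
    return still_threshold != 0 && sad_zero < still_threshold;
}

// If there is no early stop: low or medium intensity -> small diamond search
// starting from (0, 0); high intensity -> start from the neighbour MV with the
// lowest SAD and use the big diamond pattern (see Section 3.5).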

3.7 Summary

This chapter first introduced several classic fast block-matching motion estimation algorithms, focusing on their search patterns, motion estimation strategies and detailed search processes, and then analyzed the strengths and weaknesses of each. It can be concluded that the FS algorithm has the highest accuracy, but also the highest amount of computation. The MVFAST algorithm proposed a valuable idea, "stop when it is good enough", with which the total search volume can be reduced by a large amount. This analysis lays a good foundation for the introduction of the UMHexagonS motion estimation algorithm in the next chapter.

Chapter 4 Analysis and Optimization of Unsymmetrical Multi Hexagon Search

4.1 Analysis of Unsymmetrical Multi Hexagon Search

The motion estimation algorithms mentioned above, such as TSS, FSS, DS and HS, all aim to reduce the search volume by limiting the number of search points, and they can be quite efficient when the video frames are small. But when dealing with larger images and a larger search range, these fast search algorithms tend to fall into local optima, which seriously affects the coding efficiency. Therefore, this chapter focuses on the Unsymmetrical Multi Hexagon Search (UMHexagonS) [17] algorithm. This algorithm can save up to 90% of the computational complexity compared to Full Search, and it uses a multi-level combination of differently shaped search patterns, which prevents it from falling into local optima.

The UMHexagonS algorithm also uses SAD as its matching criterion, and it uses an early termination mechanism. In most cases the best matching point is very close to the initial prediction point, which means that the remaining motion estimation search is often superfluous. In this early termination mechanism, the threshold value is mainly determined by two factors: the adjustment factors (\beta_1, \beta_2) and the predicted motion cost (mcost_{pred}). The threshold values are defined as follows:

Threshold_A = mcost_{pred} \times (1 + \beta_1)    (4.1)

Threshold_B = mcost_{pred} \times (1 + \beta_2)    (4.2)

If Threshold_B is met, the algorithm skips Step 3 and jumps directly to Step 4_2; if only the requirement for Threshold_A is met, Step 4_1 has to be performed before Step 4_2. The whole process is illustrated in Chart 4.1, and a small decision sketch follows the chart.


Chart 4.1 Flow chart of UMHexagonS
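As an illustration of how the two thresholds steer the flow, a minimal sketch follows; the step names, the enum and the assumption that \beta_2 < \beta_1 (so that Threshold_B is the stricter bound) are taken from the description above, not from the JM source code.

// Which part of the UMHexagonS flow to continue with after the starting-point
// prediction, based on the two early-termination thresholds of Eq. (4.1)/(4.2).
enum class NextStep {
    SmallCrossOnly,    // very good prediction: jump directly to Step 4_2
    ExtendedHexagon,   // acceptable prediction: Step 4_1, then Step 4_2
    FullSearchFlow     // no early termination: continue with Step 2 and Step 3
};

NextStep early_termination(double mcost, double mcost_pred, double beta1, double beta2)
{
    double thresholdA = mcost_pred * (1.0 + beta1);   // Eq. (4.1)
    double thresholdB = mcost_pred * (1.0 + beta2);   // Eq. (4.2), assumed smaller than A

    if (mcost < thresholdB) return NextStep::SmallCrossOnly;
    if (mcost < thresholdA) return NextStep::ExtendedHexagon;
    return NextStep::FullSearchFlow;
}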

There are mainly four steps.

1. Starting search point prediction

Starting search point prediction takes advantage of the correlation of the adjacent MBs with the current MB. It has four prediction modes.

Median Prediction (MP)

MP uses the median of the MVs of the MBs on the left (A), top (B) and top right (C) as the predicted MV:

MV_{predMP} = \mathrm{median}[MV_A, MV_B, MV_C]    (4.3)

Chart 4.2 Median Prediction

Uplayer Prediction (UP)

In H.264/AVC there are 7 modes of MB segmentation, from 16×16 down to 4×4. UP uses the MV already found for the partition one level larger as the predicted MV; for example, if the current block is 8×16, the MV of the corresponding 16×16 partition is used as its reference:

MV_{predUP} = MV_{uplayer}    (4.4)

Chart 4.3 Uplayer Prediction

Corresponding-block Prediction (CP)

CP uses the MV of the MB at the same position in the previous frame as its reference. This is more suitable for images with little movement:

MV_{predCP} = MV_{CP}    (4.5)

Chart 4.4 Corresponding-block Prediction

Neighboring Reference-frame Prediction (NRP)

NRP uses the MV already found for one reference frame to predict the MV for a neighboring reference frame. Assuming the current frame is at time t and the matching MB is to be found in frame t', the MV found for frame t'+1 can be taken as a reference:

MV_{predNRP} = \frac{t - t'}{t - t' - 1} MV_{NR}    (4.6)

Chart 4.5 Neighboring Reference-frame Prediction

2. Asymmetric cross search

Horizontal movement in video content is more common than vertical movement, so an asymmetric cross search gains better accuracy and efficiency. As shown in step 2 of Chart 4.6, the horizontal search range is double the vertical search range.

3. Uneven multi-level hexagon search

This step has two sub-steps. First, a 5×5 square search from (-2, -2) to (2, 2) is performed around the current best point, and the best point found becomes the center for the next sub-step. Then the 16-point uneven multi-level hexagon search is performed: each time one hexagon search is finished, the search range is doubled and a bigger hexagon is searched. This process reduces the possibility of falling into a local optimum.

4. Extended hexagon search

There are also two sub-steps in this step. First, the best matching point from step 3 is set as the center of the search, and the search pattern is changed into a small hexagon of 6 points. Once the best matching point of the hexagon is located, a small cross search is performed around it to locate the final best match. By doing this, the best matching MB and its MV are found.


Chart 4.6 UMHexagonS process

UMHexagonS uses multiple layers of different search patterns to improve the accuracy of the search while still keeping the required computing power at a low level. It can save about 90 percent of the computing power compared to full search, and it has been officially adopted by the H.264/AVC group.

4.2 Optimization of Unsymmetrical Multi Hexagon Search

UMHexagonS has several advantages over other algorithms. First of all, it has a complete starting point prediction system, which helps to save quite a lot of computing power, although the starting point prediction itself also consumes some computing power. It also has a variety of search patterns to handle different scenarios: the uneven multi-level hexagon search handles large-scale search, and the hexagon pattern handles the detailed search; the combination of the two reduces unnecessary search steps while maintaining search accuracy. Finally, the introduction of early termination is quite a success, and its two thresholds further save computing power.

Because of these advantages, this thesis uses UMHexagonS as the basis for further optimization. A good algorithm is not necessarily perfect in every case, and after some analysis this algorithm can be optimized in the following ways.

First, UMHexagonS is not specially made for mobile usage. Because the quantization parameter (QP) in mobile usage is usually high, the tolerance for error is comparatively high, so some parts of UMHexagonS are over-designed. For example, after early termination it still needs to perform a cross search to locate the matching block. Here the MVFAST concept "stop when good enough" can be used to optimize the algorithm.

Secondly, UMHexagonS uses fixed values for the threshold settings. Video is by nature not static and can change beyond anyone's guess, so it is not reasonable to use one fixed number for all kinds of video.

Last, when the condition for early termination is met, in other words when the SAD is smaller than the threshold, the matching block is assumed to be nearby and the hexagon search pattern is applied. But in some cases, for example when the background is a gradient from black to dark grey, the left side of the image may already meet the condition for early termination while the matching block is on the right side of the image. The hexagon pattern then has to search from left to right, which of course wastes a lot of computing power. It is therefore doubtful whether SAD alone can be used to determine that the matching block is nearby.

Summing up these ideas, it is possible to further optimize UMHexagonS.

4.3 Optimization Details of Unsymmetrical Multi Hexagon Search

4.3.1 Optimization on Early Termination

In UMHexagonS, two thresholds are applied for early termination: when the predicted value is under the lower threshold, the system considers this a perfect starting point and then performs the small cross search, and when the value is under the higher threshold, it considers this an acceptable point and performs the hexagon search.

However, for a fast motion estimation algorithm this process is too complex, because even when a point meets the criterion for early termination, the hexagon search and the small cross search, which are quite time consuming, still need to be performed.

Taking the core idea of MVFAST, stop when it is good enough, into consideration, we change the details of the early termination of UMHexagonS. The higher threshold is kept untouched, while the lower threshold is changed into an acceptance threshold: if the predicted value is lower than this threshold, the whole search comes to an end and the MV is obtained immediately.

We also change the fixed threshold into an adaptive threshold:

T = \min(SAD_{above}, SAD_{aboveright}, SAD_{left}, SAD_{front}, 512)    (4.7)

where SAD_{above}, SAD_{aboveright}, SAD_{left} and SAD_{front} are the SADs of the blocks above, above right and to the left of the current block, and of the co-located block in the previous frame. The reason for an adaptive threshold is that video content is flexible and a fixed threshold cannot handle all situations; with the adaptive threshold, the algorithm can adjust itself in reaction to different video content. The improved algorithm flow is shown in Chart 4.7, followed by a small sketch of the threshold computation.


Chart 4.7 Optimized process (1)
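A minimal sketch of this adaptive threshold is shown below, assuming the neighbouring SAD values are available from the encoder's bookkeeping; the struct and field names are illustrative.

#include <algorithm>
#include <cstdint>

// Hypothetical container for the SADs of the already coded neighbours of the
// current MB and of the co-located MB in the previous frame.
struct NeighbourSad {
    uint32_t above;        // MB above
    uint32_t above_right;  // MB above and to the right
    uint32_t left;         // MB to the left
    uint32_t front;        // co-located MB in the previous frame
};

// Adaptive early-termination threshold of Eq. (4.7): never larger than the
// fixed value 512, and tightened by whatever the neighbours achieved.
uint32_t adaptive_threshold(const NeighbourSad &n)
{
    uint32_t t = 512;
    t = std::min(t, n.above);
    t = std::min(t, n.above_right);
    t = std::min(t, n.left);
    t = std::min(t, n.front);
    return t;
}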

4.3.2 Adoption of Movement Intensity

In most cases the matching MB is very close to the starting search point; in other words, many of the later steps are often not actually needed. This is the reason early termination is adopted in UMHexagonS. But this raises another question: how to choose the criterion for early termination. If a comparatively high threshold is used, the computing time is shortened but the precision is harmed; if a low threshold is used, more computing time is required. Moreover, by the nature of video content, the movement of a block is not strongly related to the value of the SAD. So here we use another term to describe the movement of the video, the movement intensity. When the movement intensity of the current MB is low, the matching MB is considered to be nearby, and this criterion is more reasonable than SAD. Different from MVFAST, we use only two levels of movement intensity, high and low, determined by the adjacent MBs MB1, MB2 and MB3. Let V_i = (x_i, y_i) be the MV of MB1, MB2, MB3, and let L_i = |x_i| + |y_i| and L = \max(L_i). The movement intensity of the current MB is then

\text{Movement intensity} = \begin{cases} \text{low}, & L \le L_1 \\ \text{high}, & L > L_1 \end{cases}    (4.8)

where L_1 is set to 5.

Chart 4.8 Region of Support

If L \le L_1, the current MB is in low movement and we can assume that the matching MB is nearby, so the hexagon search is enough to find the result. If not, the target is farther away, and the steps mentioned above have to be followed. The final process is illustrated in Chart 4.9.

Chart 4.9 Optimized process (2)

4.4 Implementation of the Algorithm

The improved process of the algorithm was explained in detail above. Based on it, the implementation is as follows.

4.4.1 Starting Point Prediction

The first step is starting point prediction, for which there are four modes in UMHexagonS. Taking median prediction as an example, the algorithm first gets the MVs of the three adjacent MBs and then takes their median to obtain the result. The implementation based on the UMHexagonS source code is as follows:

void Get_MVp(const int x, const int y, MV *pre_mv, int &mvx, int &mvy, uint32 *sad = NULL)
{
    uint32 num[10];
    if (sad == NULL) sad = num;

    if (y > 0) {                              // MV and SAD of the MB above
        pre_mv[0] = frame_info.mv[x][y-1];
        sad[0]    = frame_info.sad[x][y-1];
    } else {
        pre_mv[0].dx = pre_mv[0].dy = 0;
        sad[0] = 0;
    }

    if (x > 0) {                              // MV and SAD of the MB to the left
        pre_mv[1] = frame_info.mv[x-1][y];
        sad[1]    = frame_info.sad[x-1][y];
    } else {
        pre_mv[1].dx = pre_mv[1].dy = 0;
        sad[1] = 0;
    }

    if (x > 0 && y < Y-1) {                   // MV and SAD of the third (diagonal) predictor
        pre_mv[2] = frame_info.mv[x-1][y-1];
        sad[2]    = frame_info.sad[x-1][y-1];
    } else {
        pre_mv[2].dx = pre_mv[2].dy = 0;
        sad[2] = 0;
    }

    if (x == 0) {                             // no left neighbour: fall back to the MV above
        mvx = pre_mv[0].dx;
        mvy = pre_mv[0].dy;
        return;
    }

    mvx = x264_median(pre_mv[0].dx, pre_mv[1].dx, pre_mv[2].dx);   // component-wise median
    mvy = x264_median(pre_mv[0].dy, pre_mv[1].dy, pre_mv[2].dy);   // of the three candidates
}

4.4.2 Search Pattern

We use the same search patterns as UMHexagonS, including the asymmetric cross search, the multi-level hexagon search, the hexagon search and the small cross search. These search patterns ensure the precision of the algorithm and are also quite efficient; their implementation can be taken directly from UMHexagonS.

4.4.3 Search Process

Based on the idea of early termination, we improved the search steps of UMHexagonS in order to keep the precision of the algorithm and at the same time reduce its complexity. Some key code for the early termination is listed below.

if (current_sad < T) goto END;               // early termination with the adaptive threshold T

{
    Get_MVp(x, y, pre_mv, mvx, mvy);         // get the predicted MV
    int L = Get_Mv_Activity(x, y, pre_mv);   // movement intensity of the current MB
    if (L <= L1) goto SMALL_SEARCH;          // L1 equals 5 by default
    else         goto BIGCROSS_SEARCH;
}

int Get_Mv_Activity(int x, int y, MV *pre_mv, int mvx = 0, int mvy = 0)  // motion intensity
{
    if (x == 0 && y == 0)
        return 6;   // corner MB: the activity cannot be predicted, so return a value bigger than 5

    int L = 0, num;
    num = abs(pre_mv[0].dx - mvx) + abs(pre_mv[0].dy - mvy);
    L += num;
    num = abs(pre_mv[1].dx - mvx) + abs(pre_mv[1].dy - mvy);
    L += num;
    num = abs(pre_mv[2].dx - mvx) + abs(pre_mv[2].dy - mvy);
    L += num;
    return L;       // accumulated absolute MV difference of the three neighbouring MBs
}

4.5 Summary

In this chapter, the well-known fast motion estimation algorithm UMHexagonS was analyzed in detail. Based on its shortcomings, together with ideas from other algorithms, this thesis provided some improvements and a full implementation.


Chapter 5 Experiment Proof of Improved Algorithm

Currently there are two types of methodology for evaluating video quality [18][19]: Subjective Quality Assessment (SQA) and Objective Quality Assessment (OQA). SQA lets a group of users watch the video and give their subjective ratings. In contrast, OQA uses a mathematical model to analyze the video and outputs metrics that can be used to evaluate its quality. In this thesis, both methodologies are used to evaluate whether the optimized algorithm maintains the video quality and coding efficiency.

5.1 Objective Quality Assessment based on JM Model

5.1.1 Design of the Experiment

The JM model [20] is developed by the JVT team as the reference model for H.264/AVC. It implements most of the key algorithms of H.264/AVC and is widely used for scientific experiments.

The computer used in this experiment has a Core 2 1.8 GHz CPU, 2 GB of memory and a 7200 rpm hard disk. The OS is Windows 7, the JM model version is 17.2 and the IDE is Visual Studio 2008.

Chart 5.1 IDE used in the experiment

Three test video sequences are used in this experiment: mobile, foreman and akiyo. They represent videos with high, medium and low complexity, respectively. All have a resolution of 176×144 at 30 frames per second.

Chart 5.2 Video sequences used in the experiment, from left to right: mobile, foreman and akiyo

Four search algorithms are used in this experiment: FS, UMHexagonS, MVFAST and the improved algorithm. All algorithms are run with the same configuration for all three video sequences, which is listed below:

GOP structure: IPPPP
QP: 18
Number of reference frames: 2
Search range: 32
RDOptimization: on
Symbol mode: CAVLC

5.1.2 Analysis of the Result

The metrics used in this experiment are the peak signal-to-noise ratio (PSNR), the motion estimation time and the bit rate of the output video sequence. A higher PSNR means higher video quality of the output sequence. A higher ME time means the algorithm requires more time for motion estimation and is thus more complex, so a good algorithm should have a low ME time. Finally, the bit rate of the output video represents the coding efficiency of the algorithm; a desirable algorithm should have a low output bit rate.
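For reference, PSNR is computed in the usual way from the mean squared error between the original frame f and the reconstructed frame \hat{f} (for 8-bit samples with peak value 255):

PSNR = 10 \log_{10} \frac{255^2}{MSE}, \qquad MSE = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} \left[ f(m,n) - \hat{f}(m,n) \right]^2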

Algorithm      PSNR(Y) (dB)   PSNR(U) (dB)   PSNR(V) (dB)   ME time (ms)   Bit rate (kbps)
FS             37.22          41.24          42.04          280.23         115.73
MVFAST         37.11          41.16          41.88          46.344         121.43
UMHexagonS     37.20          41.22          42.01          57.432         117.23
Mine           37.15          41.22          41.98          49.345         119.23

Chart 5.3 Result for Akiyo

Algorithm      PSNR(Y) (dB)   PSNR(U) (dB)   PSNR(V) (dB)   ME time (ms)   Bit rate (kbps)
FS             37.23          40.98          41.36          438.22         213.32
MVFAST         37.01          40.78          41.22          116.32         225.33
UMHexagonS     37.12          40.88          41.33          104.22         210.78
Mine           37.08          40.87          41.37          89.23          224.13

Chart 5.4 Result for foreman

Algorithm      PSNR(Y) (dB)   PSNR(U) (dB)   PSNR(V) (dB)   ME time (ms)   Bit rate (kbps)
FS             37.12          41.11          41.35          723.33         350.78
MVFAST         36.98          40.78          41.08          202.11         401.38
UMHexagonS     37.08          41.08          41.25          187.78         387.48
Mine           37.00          40.87          41.24          178.87         388.87

Chart 5.5 Result for mobile

Algorithm      PSNR       ME time     Bit rate
MVFAST         -0.14%     -74.70%     10.05%
UMHexagonS     -0.05%     -75.76%     5.25%
Mine           -0.10%     -77.98%     7.71%

Chart 5.6 Results compared to FS

40

From the results we can see that MVFAST, UMHexagonS and the improved algorithm all have a significant advantage over FS, especially for the Akiyo sequence, which has the lowest complexity: the three fast algorithms need only about one fifth of the time FS requires for motion estimation. As the video complexity increases the gap narrows, but even for the most complex sequence, Mobile, the ratio is still roughly one to three or better. Comparing the three fast algorithms with each other, the improved algorithm has the lowest ME time, with roughly a 10 percent advantage over UMHexagonS.

In terms of PSNR, FS is unsurprisingly the best, because every candidate block in the search window is evaluated. Compared with FS, the other algorithms lose at most about 0.1 dB to 0.2 dB. The PSNR of the improved algorithm is slightly lower than that of UMHexagonS and better than that of MVFAST. This result is reasonable, since the improved algorithm skips some of the search steps of UMHexagonS.

The results are similar for the output bit rate: all three fast algorithms produce a slightly higher bit rate than FS, but the difference (about 5% to 10% on average, cf. Chart 5.6) is not significant for these video sequences.

Chart 5.6 shows even more clearly that the improved algorithm has the lowest ME time, saving 77.98% of the ME time required by FS. Its PSNR is better than that of MVFAST and only slightly lower than that of UMHexagonS. Put another way, compared with UMHexagonS the improved algorithm saves a further 2.2% of the computational complexity while sacrificing only about 0.5% of the video quality.
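The ME-time and bit-rate columns of Chart 5.6 can be reproduced from Charts 5.3-5.5 by averaging each metric over the three sequences and expressing the change relative to FS; the sketch below does exactly that. The PSNR column of Chart 5.6 uses a different normalisation, so only the absolute average Y-PSNR difference in dB is printed here.

    # (PSNR-Y [dB], ME time [ms], bit rate [kbps]) per sequence (Akiyo, Foreman, Mobile),
    # copied from Charts 5.3-5.5.
    results = {
        "FS":         [(37.22, 280.23, 115.73), (37.23, 438.22, 213.32), (37.12, 723.33, 350.78)],
        "MVFAST":     [(37.11, 46.344, 121.43), (37.01, 116.32, 225.33), (36.98, 202.11, 401.38)],
        "UMHexagonS": [(37.20, 57.432, 117.23), (37.12, 104.22, 210.78), (37.08, 187.78, 387.48)],
        "Mine":       [(37.15, 49.345, 119.23), (37.08, 89.23, 224.13), (37.00, 178.87, 388.87)],
    }

    def mean_columns(rows):
        """Average each column (metric) over the three sequences."""
        return [sum(col) / len(col) for col in zip(*rows)]

    psnr_fs, me_fs, rate_fs = mean_columns(results["FS"])
    for name in ("MVFAST", "UMHexagonS", "Mine"):
        psnr, me, rate = mean_columns(results[name])
        print(f"{name:11s}  dPSNR-Y {psnr - psnr_fs:+.2f} dB   "
              f"ME time {100 * (me - me_fs) / me_fs:+.2f}%   "
              f"bit rate {100 * (rate - rate_fs) / rate_fs:+.2f}%")

Running this reproduces the -74.70%, -75.76% and -77.98% ME-time savings and the +10.05%, +5.25% and +7.71% bit-rate increases listed in Chart 5.6.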

5.2 Subjective Quality Assessment Using the Double Stimulus Impairment Scale

5.2.1 Design of the Experiment

The Double Stimulus Impairment Scale (DSIS) method is used in this experiment: each participant watches a pair of sequences consisting of the original video sequence and the corresponding coded output sequence, and then rates the difference between the two. Five grades are used:


Rating   Meaning
1        Cannot notice the difference
2        Can notice the difference, but it is not annoying
3        A little bit annoying
4        Annoying
5        Very annoying

Chart 5.7 Rating scale

Due to limited resources, the experiment is held in a classroom and a laptop is used as the display. Ten participants are recruited, none of whom has been trained in video assessment before.

Each participant rates three video sequences, the same sequences as in the JM-based test. For each video sequence there are three sequence pairs, generated by FS, UMHexagonS and the improved algorithm. The pairs are shown in random order to reduce bias in the ratings (a sketch of how such an order can be generated is given below).
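The following sketch is not the test harness actually used; it only illustrates how a randomized presentation order of the nine sequence pairs can be generated for each participant. The seed argument is a hypothetical detail that makes a participant's order reproducible.

    import random

    SEQUENCES = ["mobile", "foreman", "akiyo"]
    CODECS = ["FS", "UMHexagonS", "Mine"]      # each produces one impaired sequence per source

    def session_order(seed=None):
        """Return a randomized presentation order of (sequence, codec) pairs for one participant."""
        rng = random.Random(seed)
        pairs = [(seq, codec) for seq in SEQUENCES for codec in CODECS]
        rng.shuffle(pairs)
        return pairs

    # Example: the order shown to participant 3
    for sequence, codec in session_order(seed=3):
        print(f"show reference '{sequence}', then the {codec} reconstruction, then collect a 1-5 rating")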

5.2.2 Analysis of the Results

Algorithm      Rating 1   Rating 2   Rating 3   Rating 4   Rating 5
FS             1          4          5          0          0
UMHexagonS     0          5          4          1          0
Mine           1          3          5          1          0

Chart 5.8 Results for Mobile (number of participants giving each rating)

Algorithm      Rating 1   Rating 2   Rating 3   Rating 4   Rating 5
FS             1          6          3          0          0
UMHexagonS     2          4          4          0          0
Mine           1          5          3          1          0

Chart 5.9 Results for Foreman (number of participants giving each rating)


Algorithm      Rating 1   Rating 2   Rating 3   Rating 4   Rating 5
FS             5          4          1          0          0
UMHexagonS     5          2          2          1          0
Mine           6          3          1          0          0

Chart 5.10 Results for Akiyo (number of participants giving each rating)

Algorithm      Mobile   Foreman   Akiyo   All
FS             2.4      2.2       1.6     2.07
UMHexagonS     2.6      2.2       1.9     2.23
Mine           2.9      2.4       1.5     2.27

Chart 5.11 Average rating per sequence and overall
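The average ratings in Chart 5.11 follow directly from the rating counts. As an example, the minimal sketch below recomputes the Akiyo column from the histogram in Chart 5.10.

    # Number of participants giving each rating (1-5) for the Akiyo pairs, from Chart 5.10.
    akiyo_counts = {
        "FS":         [5, 4, 1, 0, 0],
        "UMHexagonS": [5, 2, 2, 1, 0],
        "Mine":       [6, 3, 1, 0, 0],
    }

    def mean_score(counts):
        """Mean opinion score from a histogram of 1-5 ratings."""
        return sum(score * n for score, n in enumerate(counts, start=1)) / sum(counts)

    for algorithm, counts in akiyo_counts.items():
        print(f"{algorithm:11s} average rating {mean_score(counts):.1f}")
    # prints 1.6, 1.9 and 1.5, i.e. the Akiyo column of Chart 5.11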

The results show that the improved algorithm does not cause a significant reduction in video quality compared with the most complex algorithm, FS; it even surpasses FS for the Akiyo sequence (although some subjective deviation must be expected). The overall average score of the improved algorithm is 2.27, only 0.04 behind UMHexagonS. It can therefore be concluded that the improved algorithm shows no obvious quality reduction compared with UMHexagonS, especially for video with low complexity. Applications such as video chat, in which the video complexity is typically low, can therefore benefit the most from this algorithm.

5.3 Summary

In this chapter, both Objective Quality Assessment and Subjective Quality Assessment are used to evaluate the performance of the improved algorithm. In the OQA part, the JM model is used as the test platform to measure three major video-coding metrics. In the SQA part, the Double Stimulus Impairment Scale (DSIS) is adopted. Both experiments show that the improved algorithm reduces the complexity of the motion estimation while keeping the video quality essentially unharmed, which makes it suitable for use over a 3G network.


Chapter 6 Conclusion

Video applications over a 3G network rely heavily on the video codec, since the codec determines the clarity, the file size and the real-time behaviour of the video content. Motion estimation is the most resource-consuming part of the coding process and is considered the bottleneck of the whole encoder. This thesis therefore focuses on motion estimation, more precisely on block-based motion estimation algorithms. It investigates the core technology and the framework of H.264/AVC, compares the pros and cons of several classic motion estimation algorithms, and selects the most promising of them, UMHexagonS, as the target for further optimization. Through detailed analysis, some shortcomings of UMHexagonS are identified and addressed, and ideas from the classic algorithms are incorporated into the new, improved algorithm. Finally, both OQA and SQA are used to evaluate the coding performance of the improved algorithm. The results show that the improved algorithm reduces the complexity of the motion estimation while keeping the video quality essentially unharmed, which makes it well suited for use over a 3G network.


References

[1] I. E. Richardson, The H.264 Advanced Video Compression Standard. Wiley, 2011.
[2] I. E. G. Richardson, "H.264/MPEG-4 Part 10: Transform & Quantization," white paper, http://www.vcodex.com, 2003.
[3] I. E. G. Richardson, "H.264/MPEG-4 Part 10 White Paper: Overview of H.264," Oct. 2002, pp. 1-3.
[4] B. Li et al., "Recent advances on TD-SCDMA in China," IEEE Communications Magazine, vol. 43, no. 1, pp. 30-37.
[5] P. Agrawal et al., "Battery power sensitive video processing in wireless networks," Proc. Ninth IEEE Int. Symp. on Personal, Indoor and Mobile Radio Communications (PIMRC), 1998, vol. 1, pp. 116-120.
[6] J.-C. Tuan, T.-S. Chang, and C.-W. Jen, "On the data reuse and memory bandwidth analysis for full-search block-matching VLSI architecture," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 1, pp. 61-72, 2002.
[7] H.-M. Jong, L.-G. Chen, and T.-D. Chiueh, "Accuracy improvement and cost reduction of 3-step search block matching algorithm for video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 4, no. 1, p. 88, 1994.
[8] R. Li, B. Zeng, and M. L. Liou, "A new three-step search algorithm for block motion estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 4, no. 4, p. 438, 1994.
[9] L.-M. Po and W.-C. Ma, "A novel four-step search algorithm for fast block motion estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 313-316, 1996.
[10] L.-K. Liu and E. Feig, "A block-based gradient descent search algorithm for block motion estimation in video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 4, pp. 419-421, 1996.
[11] O. T.-C. Chen, "Motion estimation using a one-dimensional gradient descent search," IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 4, pp. 608-616, 2000.
[12] S. Zhu and K.-K. Ma, "A new diamond search algorithm for fast block-matching motion estimation," IEEE Transactions on Image Processing, vol. 9, no. 2, pp. 287-290, 2000.
[13] C. Zhu, X. Lin, and L.-P. Chau, "Hexagon-based search pattern for fast block motion estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 5, pp. 349-355, 2002.
[14] A. M. Tourapis, O. C. Au, and M. L. Liou, "Predictive motion vector field adaptive search technique (PMVFAST): enhancing block-based motion estimation," Proceedings of SPIE, vol. 4310, pp. 883-892, 2001.
[15] H. Nicolas and C. Labit, "Region-based motion estimation using deterministic relaxation schemes for image sequence coding," Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP-92), vol. 3, 1992.
[16] M.-J. Chen et al., "A new block-matching criterion for motion estimation and its implementation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 5, no. 3, pp. 231-236, 1995.
[17] C. A. Rahman and W. Badawy, "UMHexagonS algorithm based motion estimation architecture for H.264/AVC," Proc. IEEE Int. Workshop on System-on-Chip for Real-Time Applications, 2005.
[18] Z. Wang et al., "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.
[19] Q. Huynh-Thu and M. Ghanbari, "Scope of validity of PSNR in image/video quality assessment," Electronics Letters, vol. 44, no. 13, pp. 800-801, 2008.
[20] H.264/AVC reference software JM 17.2, http://iphome.hhi.de/suehring/tml, 2010.