
EE 5359 MULTIMEDIA PROCESSING

FINAL REPORT

PERFORMANCE ANALYSIS OF AVS-M AND ITS

APPLICATION IN MOBILE ENVIRONMENT

Under the guidance of

DR. K R RAO, DEPARTMENT OF ELECTRICAL ENGINEERING

UNIVERSITY OF TEXAS AT ARLINGTON

Vidur K. Vajani (1000679332) Email id: [email protected]


Acknowledgement:

I would like to acknowledge the helpful discussions I had with Dr. K. R Rao. I sincerely appreciate the

help, guidance, support and motivation by Dr. Rao during the preparation of my project.

I would also like to thank my fellow classmates and seniors for their guidance and advice.


List of Acronyms:

AU Access Unit

AVS Audio Video Standard

AVS-M Audio Video Standard for mobile

B-Frame Interpolated Frame

CAVLC Context Adaptive Variable Length Coding

CBP Coded Block Pattern

CIF Common Intermediate Format

DIP Direct Intra Prediction

DPB Decoded Picture Buffer

EOB End of Block

HD High Definition

HHR Horizontal High Resolution

ICT Integer Cosine Transform

IDR Instantaneous Decoding Refresh

I-Frame Intra Frame

IMS IP Multimedia Subsystem

ITU-T International Telecommunication Union, Telecommunication Standardization Sector

MB Macroblocks

MPEG Moving Picture Experts Group

MPM Most Probable Mode

MV Motion Vector

NAL Network Abstraction Layer

P-Frame Predicted Frame

PIT Prescaled Integer Transform

PPS Picture Parameter Set

QCIF Quarter Common Intermediate Format


QP Quantization Parameter

RD Cost Rate Distortion Cost

SAD Sum of Absolute Differences

SD Standard Definition

SEI Supplemental Enhancement Information

SPS Sequence Parameter Set

VLC Variable Length Coding


List of figures:

Figure 1: History of audio video coding standards

Figure 2: Evolution of AVS China

Figure 3: Common Intermediate Format (CIF) 4:2:0 chroma sampling

Figure 4: Quarter Common Intermediate Format (QCIF) 4:2:0 chroma sampling

Figure 5: Layered structure

Figure 6: Current picture predicted from previous P pictures

Figure 7: Slice layer example

Figure 8: Macroblock in (a) 4:2:0 and (b) 4:2:2 formats

Figure 9: AVS-M encoder

Figure 10: Intra_4x4 Prediction including current block and its surrounding coded pixels for prediction

Figure 11: Eight Directional Prediction Modes in AVS-P7

Figure 12: Nine Intra_4x4 prediction Modes in AVS-P7

Figure 13: The position of integer, half and quarter pixel samples

Figure 14: luma and chroma block edges

Figure 15: Horizontal or Vertical Edge of 4×4 Block

Figure 16: Adaptive sliding window based reference picture marking process

Figure 17: Block diagram of an AVS-M decoder

Figure 18: Inverse DCT matrix of AVS-M

Figure 19: Subpixel locations around integer pixel “A”

Figure 20: The flow chart of main()

Figure 21: The flow chart of Encode_I_Frame()

Figure 22: The flow chart of Encode_P_Frame()

Figure 23: Video quality at various QP values for miss_america_qcif

Figure 24: PSNR (dB) vs. Bitrate (Kbps) for miss_america_qcif

Figure 25: SSIM vs. Bitrate (Kbps) for miss_america_qcif


Figure 26: Video quality at various QP values for mother_daughter_qcif

Figure 27: PSNR (dB) vs. Bitrate (Kbps) for mother_daughter_qcif

Figure 28: SSIM vs. Bitrate (Kbps) for mother_daughter_qcif

Figure 29: Video quality at various QP values for stefan_cif

Figure 30: PSNR (dB) vs. Bitrate (Kbps) for stefan_cif

Figure 31: SSIM vs. Bitrate (Kbps) for stefan_cif

Figure 32: Video quality at various QP values for silent_cif

Figure 33: PSNR (dB) vs. Bitrate (Kbps) for silent_cif

Figure 34: SSIM vs. Bitrate (Kbps) for silent_cif


List of Tables:

Table 1: History of AVS China

Table 2: Different parts of AVS China

Table 3: Comparison between different AVS profiles

Table 4: AVS profiles and their applications

Table 5: Macroblock typres of P picture

Table 6: Submacroblock types of P picture

Table 7: Context-based Most Probable Intra Mode Decision Table

Table 8: Kth Order Golomb Code

Table 9: NAL unit types

Table 10: Interpolation filter coefficient

Table 11: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for miss_america_qcif sequence

Table 12: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for mother_daughter_qcif sequence

Table 13: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for stefan_cif sequence

Table 14: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for silent_cif sequence


Abstract:

The modes of digital representation of information such as audio and video signals have undergone a rapid transformation. Real-time mobile video communication requires a balance between performance and complexity. The Audio Video coding Standard (AVS) is established by the Working Group of China of the same name [4]. To date, there are two separate parts in this standard targeting different video compression applications: AVS Part 2 for high-definition digital video broadcasting and high-density storage media, and AVS Part 7 for low-complexity, low-picture-resolution mobility applications [7].

The primary focus of this project is to study and analyze the performance of AVS-M. In this project, the major AVS-M video coding tools, their performance and their complexity are analyzed. The project provides an insight into the AVS-M video standard, the architecture of the AVS-M codec, the features it offers and the various data formats it supports. A study is made of key techniques such as transform and quantization, intra prediction, quarter-pixel interpolation, motion compensation modes, entropy coding and the in-loop de-blocking filter. The AVS-M video codec is analyzed using quality measures such as bit rate, PSNR and SSIM.


Contents:

Acknowledgement 2

List of Acronyms 3

List of figures 5

List of tables 7

Abstract 8

1.0 Introduction 10

2.0 AVS Standard 12

2.1 Introduction to AVS-M 14

2.2 Data Formats 16

2.3 Picture Format 16

2.4 Layered Structure 16

3.0 AVS-M encoder 19

3.1 Intra Prediction 20

3.1.1 Intra_4x4 20

3.1.2 Content based Most Probable Intra Mode Decision 21

3.1.3 Direct Intra Prediction 21

3.2 Interprediction Mode 22

3.3 Deblocking filter 24

3.4 Entropy Coding 24

4.0 AVS Tools 25

4.1 High level tools similar to H.264/AVC 25

4.2 High Level Tools / Features Different Than H.264/AVC 28

5.0 AVS-M decoder 31

5.1 Error concealment tools in AVS-M decoder 32

6.0 Main program flow analysis for encoder 32

7.0 Performance Analysis 36

7.1 Simulation results of sequence miss-america_qcif 36

7.2 Simulation results of sequence mother-daughter_qcif 38

7.3 Simulation results of sequence stefan_cif 41

7.4 Simulation results of sequence silent_cif 43

8.0 Conclusions 46

References 46


1. Introduction:

Over the past 20 years, analog communication around the world has largely been supplanted by digital communication. As the demand for audio and video signals has increased enormously, the need for worldwide standards for audio, video and images has also grown tremendously. The modes of digital representation of information such as audio and video signals have undergone a rapid transformation. Many successful audio-video coding standards have been released, and they have plenty of applications in today's digital media.

Figure 1 shows the evolution of audio-video coding standards.

Figure 1: History of audio video coding standards [5]

The Moving Picture Experts Group (MPEG) [3] was the first body to come up with a format for transferring information in digital form. Soon after its release, this format became the standard for audio and video file compression and transmission. MPEG-2 and MPEG-4 were released subsequently. MPEG-4 Part 2 uses advanced coding tools with additional complexity to achieve higher compression factors than MPEG-2; it is very efficient in terms of coding, producing files almost 1/4th the size of MPEG-1. These standards had an almost complete monopoly in the market in the 1990's.

The AVS video standard is developed by the Audio Video Coding Standard Working Group of China (AVS working group for short), which was approved by the Science and Technology Department of the Ministry of Information Industry of China in June 2002 [3]. This audio and video standard was initiated by the Chinese government in order to counter the monopoly of the MPEG standards, which were costing it dearly. The mandate of the AVS working group is to establish China's national standards for


compression, manipulation and digital rights management in digital audio and video multimedia

equipment and systems.

AVS Mission [5]:

• To develop a second-generation video coding standard with the same or better coding performance than others

• To avoid licensing risk, based on a clear analysis of related patents from the last 50 years

• To help DTV, IPTV and new media operators in China and also outside China

Three main characteristics of AVS China [4]:

• Advanced: China coordinates the formulation of a technically advanced, second-generation source coding standard

• Independent: a patent pool management system and completed working group legal documents

• Open: the formulation process is open and internationalized

This standard is applied in fields like high-resolution digital broadcasting, wireless communication media, and internet broadcast media.

Short history of AVS China: [18]

Mar 18-21, 2002: 178th Xiangshan Science Conference, Beijing, "Broad-band Network and Security Stream Media Technology"

June 11, 2002: The Science and Technology Department of MII released a bulletin on China Electronics about setting up the "Audio Video Coding Standard Workgroup"

June 21, 2002: The "Audio Video Coding Standard Working Group" was set up in Beijing

Aug 23-24, 2002: First meeting of AVS; AVS united with MPEG-China; the AVS website was formally opened to the members

Dec 9, 2002: The Department of Science and Technology of the Ministry of Information Industry issued the notice of setting up the "Audio Video Coding Standard Working Group" and assigned the task of the group

Dec 19, 2003: At the 7th AVS meeting, AVS-video (1.0) and AVS-system (1.0) were finalized

Mar 29, 2004: Industry forum on AVS video coding technology towards 3G, Shenzhen, sponsored together with universities and companies of Hong Kong

Mar 30-31, 2004: Start of the video coding standardization for the new generation of mobile communication

Table 1: History of AVS China [18]

Audio Video Standard for Mobile (AVS-M) is the seventh part of the video coding standard developed by the AVS workgroup of China; it targets mobile systems and devices with limited processing capability and power consumption.


Figure 2 shows the evolution of the AVS video coding standards.

Figure 2: Evolution of AVS China [9]

2. AVS Standard:

AVS is a complete standard system covering system, video, audio and media copyright management. AVS comprises 10 parts; the different parts of AVS China are listed in Table 2.

Table 2: Different parts of AVS China [5]


As can be seen from Table 2, AVS has vast applications in various digital media. According to the application requirements, a trade-off between encoding efficiency and encoder/decoder implementation complexity is selected. Considering the different requirements, AVS is subdivided into four profiles.

1. Jizhun Profile: The Jizhun profile is defined as the first profile in the national standard AVS-Part 2, approved as a national standard in 2006. It mainly focuses on digital video applications like commercial broadcasting and storage media, including high-definition applications. Typically, it aims for high coding efficiency on video sequences of higher resolutions, at the expense of moderate computational complexity.

2. Jiben Profile: The Jiben profile is defined in AVS-Part 7 and targets mobile video applications featuring smaller picture resolutions. Thus, computational complexity becomes a critical issue. In addition, error resilience is needed due to the wireless transport environment.

3. Shenzhan Profile: The AVS-Shenzhan profile focuses exclusively on standardizing video surveillance applications. Surveillance sequences have special characteristics: random noise appearing in the pictures, a relatively low affordable encoding complexity, and a required friendliness to event detection and searching.

4. Jiaqiang Profile: To fulfill the needs of multimedia entertainment, one of the major concerns of the Jiaqiang profile is movie compression for high-density storage. Relatively higher computational complexity can be tolerated at the encoder side to provide higher video quality, with compatibility to AVS China-Part 2 as well.

Table 3 shows the comparison between different AVS profiles.

Table 3: Comparison between different AVS profiles [10]


According to their configuration, different profiles have different applications; some key applications of each profile are shown in Table 4.

Table 4: AVS profiles and their applications [6]

2.1 Introduction to AVS-M:

The seventh part of AVS, the Jiben profile of AVS China, aims at mobile systems and devices with limited processing capability and power consumption. It is also called AVS-M, where the letter M stands for "mobile". However, the target applications of AVS-M are not limited to mobile applications, as the name might imply. AVS-M has been developed to meet the needs of video compression in applications such as digital storage media, networked media streaming and multimedia communications. The standard is applicable to the following applications:

Interactive storage media

Wide-band video services

Real-time telecommunication services

Remote video surveillance

The short history of AVS P7 is as follows: [13]

2004.8: WD (working draft)

2004.9: 10th AVS meeting, WD 2.0

2004.11: CD (committee draft)

2005.9: 14th AVS meeting, FCD (final committee draft)

2006.1: FD (final draft)

2006.3: GB (national standard)


The most common test sequences are in Common Intermediate Format (CIF) and Quarter Common Intermediate Format (QCIF); their formats are shown in figures 3 and 4, respectively. A CIF sequence has a fixed dimension of 352 (width) x 288 (height) pixels, whereas a QCIF sequence has a fixed dimension of 176 (width) x 144 (height) pixels. Figures 3 and 4 show the CIF and QCIF structures with respect to 4:2:0 chroma sampling, where Y is the luminance (brightness) component and Cb and Cr are the chrominance (color) components.

Fig 3: Common Intermediate Format (CIF) 4:2:0 chroma sampling [24]

Fig 4: Quarter Common Intermediate Format (QCIF) 4:2:0 chroma sampling [24]

According to the supported bit rate and sequence format, AVS-M (the Jiben Profile) has 9 different levels, shown below [6]:

1.0: up to QCIF and 64 kbps and resolution is 176 × 144

1.1: up to QCIF and 128 kbps and resolution is 176 × 144

1.2: up to CIF and 384 kbps and resolution is 352 × 288

1.3: up to CIF and 768 kbps and resolution is 352 × 288

2.0: up to CIF and 2 Mbps and resolution is 352 × 288

2.1: up to HHR and 4 Mbps and resolution is 704×480

2.2: up to SD and 4 Mbps and resolution is 720×576

3.0: up to SD and 6 Mbps and resolution is 720×576

3.1: up to SD and 8 Mbps and resolution is 720×576
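As a rough illustration, the level constraints above can be captured in a small lookup table; the structure and function names below are ours, not from the standard or its reference software:

```python
# Sketch of the AVS-M (Jiben Profile) level limits listed above.
# Names and structure are illustrative assumptions.
AVS_M_LEVELS = {
    "1.0": {"format": "QCIF", "max_kbps": 64,   "resolution": (176, 144)},
    "1.1": {"format": "QCIF", "max_kbps": 128,  "resolution": (176, 144)},
    "1.2": {"format": "CIF",  "max_kbps": 384,  "resolution": (352, 288)},
    "1.3": {"format": "CIF",  "max_kbps": 768,  "resolution": (352, 288)},
    "2.0": {"format": "CIF",  "max_kbps": 2000, "resolution": (352, 288)},
    "2.1": {"format": "HHR",  "max_kbps": 4000, "resolution": (704, 480)},
    "2.2": {"format": "SD",   "max_kbps": 4000, "resolution": (720, 576)},
    "3.0": {"format": "SD",   "max_kbps": 6000, "resolution": (720, 576)},
    "3.1": {"format": "SD",   "max_kbps": 8000, "resolution": (720, 576)},
}

def minimum_level(width, height, bitrate_kbps):
    """Return the lowest level whose resolution and bit-rate limits cover the input."""
    for name, lim in AVS_M_LEVELS.items():  # dict preserves insertion order
        w, h = lim["resolution"]
        if width <= w and height <= h and bitrate_kbps <= lim["max_kbps"]:
            return name
    return None
```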


2.2 Data Formats [7]:

1. Progressive Scan:

This format is directly compatible with all content that originates in film, and can accept inputs

directly from progressive telecine machines. It is also directly compatible with the emerging standard for

digital production – the so-called “24p” standard. In the next few years, most movie production and much

TV production will be converted to this new standard.

A significant benefit of progressive format is the efficiency with which motion estimation

operates. Progressive content can be encoded at significantly lower bitrates than interlaced content with

the same perceptual quality. Furthermore, motion compensated coding of progressive format data is

significantly less complex than coding of interlaced data.

2. Interlaced Scan:

AVS also provides coding tools for interlaced scan format. These tools offer coding of legacy

interlaced format video.

2.3 Picture Format:

AVS is a generic standard and can code pictures with a rectangular format up to 16K x 16K pixels in size. Pixels are coded in luminance-chrominance format (YCbCr) and each component has a precision of 8 bits. AVS supports a range of commonly used frame rates and pixel aspect ratios, and supports the 4:2:0 and 4:2:2 chroma formats. Chromaticity is defined by international standards.

2.4 Layered Structure [5]:

AVS is built on a layered data structure representing traditional video data. This structure is

mirrored in the coded video bit stream. Figure 5 illustrates this layered structure.

Figure 5: Layered structure [7]


At the highest layer, sets of frames of continuous video are organized into a sequence. The

sequence provides an opportunity to download parameter sets to decoders. Pictures can optionally be

subdivided into rectangular regions called Slices. Slices are further subdivided into square regions of

pixels called macroblocks. These are the fundamental coding units used by AVS and comprise a set of

luminance and chrominance blocks of pixels covering the same square region of the picture.

Sequence: The sequence layer comprises a set of mandatory and optional downloaded system parameters and provides an entry point into the coded video. For example, sequence headers should be placed at the start of each chapter on a DVD to facilitate random access, or every half second in broadcast TV to facilitate channel changing.

Picture: The picture layer provides the coded representation of a video frame. It comprises a header with

mandatory and optional parameters and optionally with user data. Three types of picture are defined by

AVS:

Intra pictures (I-pictures)

Predicted pictures (P-pictures)

Interpolated pictures (B-pictures)

AVS uses adaptive modes for motion compensation at the picture layer and macroblock layer. At the

picture layer, the modes are [7]

Forward prediction from the most recent reference frame

Forward prediction from the second most recent reference frame

Interpolative prediction between the most recent reference frame and a future reference frame.

Intra coding

Figure 6 illustrates how current picture is predicted from the previous reference pictures.

Figure 6: Current picture predicted from previous P pictures [7]

Slice: The slice structure provides the lowest-layer mechanism for resynchronizing the bitstream in case of

transmission error. Slices comprise an arbitrary number of raster-ordered rows of macroblocks as

illustrated in the example of figure 7.


Figure 7: Slice layer example [7]

Macroblock: A macroblock includes the luminance and chrominance component pixels that collectively represent a

16x16 region of the picture. In 4:2:0 mode, the chrominance pixels are subsampled by a factor of two in

each dimension; therefore each chrominance component contains only one 8x8 block. In 4:2:2 mode, the

chrominance pixels are subsampled by a factor of two in the horizontal dimension; therefore each

chrominance component contains two 8x8 blocks. This is illustrated in Figure 8.

(a)

(b)

Figure 8: Macroblock in (a) 4:2:0 and (b) 4:2:2 formats [7]
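The chroma block counts described above follow directly from the subsampling factors; a small sketch (the function name is ours):

```python
def chroma_8x8_blocks_per_component(chroma_format):
    """Number of 8x8 blocks each chroma component contributes to a 16x16 macroblock.

    4:2:0 subsamples chroma by 2 in both dimensions (one 8x8 block per component);
    4:2:2 subsamples only horizontally (two 8x8 blocks per component).
    """
    subsampling = {"4:2:0": (2, 2), "4:2:2": (2, 1)}  # (horizontal, vertical) factors
    sx, sy = subsampling[chroma_format]
    return (16 // sx // 8) * (16 // sy // 8)
```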

At the macroblock layer, the modes depend on the picture mode [7]

• In intra pictures, all macroblocks are intra coded.

• In predicted pictures, macroblocks may be forward predicted or intra coded.

• In interpolated pictures, macroblocks may be forward predicted, backward predicted, interpolated or intra coded.


Block: The block is the smallest coded unit and contains the transform coefficient data for the prediction

errors. In the case of intra-coded blocks, Intra prediction is performed from neighboring blocks.

There are two MB types for I-pictures specified by AVS-M: if mb_type is 1, the type of the current MB is I_4x4; otherwise, the type is I_Direct. The MB types for P-pictures are shown in Tables 5 and 6. If the skip mode flag is 1, then MbTypeIndex is equal to mb_type plus 1; otherwise, MbTypeIndex is equal to mb_type. If MbTypeIndex is greater than or equal to 5, MbTypeIndex is set to 5.

Table 5: Macroblock types of P picture [6]

Table 6: Submacroblock types of P picture [6]

3.0 AVS-M encoder:

AVS-M encoder is shown in figure 9.

Figure 9: AVS-M encoder [10]

There are 2 modes of prediction:

1. Intra prediction

2. Inter prediction


3.1 Intra Prediction [16]

Two types of intra prediction modes are adopted in AVS-P7: Intra_4x4 and Direct Intra Prediction (DIP). AVS-P7's intra coding brings a significant complexity reduction while maintaining comparable performance.

3.1.1 Intra_4x4

When using the Intra_4x4 mode, each 4x4 block is predicted from spatially neighboring samples as illustrated in Fig. 10. The 16 samples of the 4x4 block, labeled a-p, are predicted using prior decoded samples in adjacent blocks, labeled A-D, E-H and X. The up-right pixels used for prediction are obtained by extending pixel sample D; similarly, the down-left pixels are obtained by extending H. Compared with the reference pixel locations used by Intra_4x4 in H.264/AVC, AVS-P7 reduces data fetching and on-chip memory consumption while maintaining comparable performance.

Fig. 10 Intra_4x4 Prediction including current block and its surrounding coded pixels for prediction [16]

For each 4x4 block, one of nine prediction modes can be utilized to exploit spatial correlation: eight directional prediction modes (such as Down Left, Vertical, etc.) and one non-directional prediction mode (DC). In the DC prediction mode, all 16 pixels in the 4x4 block are predicted by the average of the surrounding available pixels; the eight directional prediction modes are specified as shown in Fig. 11.

Fig. 11 Eight directional prediction modes in AVS-P7 [16]


All the modes adopted by AVS-P7 are utilized to improve intra coding efficiency in heterogeneous areas, e.g. multiple objects in one macroblock or blocks with different motion tendencies. All nine modes are presented in Fig. 12.

Fig. 12 Nine intra_4x4 prediction modes in AVS-P7 [16]
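As a concrete illustration of the non-directional mode, here is a minimal sketch of DC prediction for a 4x4 block, assuming a conventional rounded average; the exact rounding and unavailable-neighbor rules of AVS-P7 may differ:

```python
def predict_dc_4x4(top, left):
    """DC mode sketch: every pixel of the 4x4 block is set to the average of
    the available neighboring samples (top row and/or left column).
    Falls back to 128 (mid-gray) when no neighbors are available."""
    neighbors = (top or []) + (left or [])
    if not neighbors:
        return [[128] * 4 for _ in range(4)]
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)  # rounded mean
    return [[dc] * 4 for _ in range(4)]
```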

3.1.2 Content based Most Probable Intra Mode Decision

A statistical model is used to determine the most probable intra mode of current block based on

video characteristics and content correlation. A look up table is used to predict the most probable intra

mode decision of current block. Irrespective of whether Intra_4x4 or DIP is used, the most probable mode

decision method is described as follows:

Get the intra mode of up block and left block. If the up (or left) block is not available for intra

mode prediction, the mode up (or left) block is defined as -1.

Use the up intra mode and left intra mode to find the most probable mode in the table.

If the current MB is coded as Intra_4x4 mode, the intra prediction mode is coded as follows:

If the best mode equals to the most probable mode, 1 bit of flag is transmitted to each block to

indicate the mode of current block is its most probable mode.
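The decision procedure above amounts to a table lookup on the pair of neighbor modes. A sketch follows; the real mapping lives in Table 7, so MPM_TABLE and DC_MODE below are placeholder assumptions:

```python
# Sketch of the most-probable-mode decision described above.
DC_MODE = 2     # assumed index of the DC mode (illustrative)
MPM_TABLE = {}  # would be filled from Table 7: {(up_mode, left_mode): mpm, ...}

def most_probable_mode(up_mode, left_mode):
    """Look up the MPM from the neighbor modes; unavailable neighbors are -1."""
    up = up_mode if up_mode is not None else -1
    left = left_mode if left_mode is not None else -1
    return MPM_TABLE.get((up, left), DC_MODE)  # fall back to DC when unknown
```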

Table 7: Context-based most probable intra mode decision table [16]

3.1.3 Direct Intra Prediction: When direct intra prediction is used, a new method is used to code the intra prediction mode information. As analyzed before, when Intra_4x4 is used, at least 1 bit per block is needed to represent the mode information. This means that for a macroblock, even when the intra prediction modes of all 16 blocks are their most probable modes, 16 bits are needed to indicate the mode information.


A rate-distortion based direct intra prediction mainly contains 5 steps:

Step 1: All 16 4×4 blocks in a MB use their MPMs to do Intra_4×4 prediction; calculate RDCost(DIP) of this MB.

Step 2: Perform the mode search of Intra_4×4, find the best intra prediction mode of each block, and calculate RDCost(Intra_4x4).

Step 3: Compare RDCost(DIP) and RDCost(Intra_4x4). If RDCost(DIP) is less than RDCost(Intra_4x4), set the DIP flag to 1 and go to step 4; else set the DIP flag to 0 and go to step 5.

Step 4: Encode the MB using DIP and finish encoding of this MB.

Step 5: Encode the MB using ordinary Intra_4×4 and finish encoding of this MB.
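The five steps can be condensed into a short sketch; rd_cost stands in for the encoder's actual rate-distortion measurement, which is not shown here:

```python
def encode_intra_mb(mb, rd_cost):
    """Sketch of the 5-step DIP decision. `rd_cost(mb, mode)` is assumed to
    return the RD cost of encoding `mb` in the given mode."""
    cost_dip = rd_cost(mb, mode="DIP")         # Step 1: MPM-only prediction
    cost_i4 = rd_cost(mb, mode="Intra_4x4")    # Step 2: full mode search
    dip_flag = 1 if cost_dip < cost_i4 else 0  # Step 3: compare RD costs
    mode = "DIP" if dip_flag else "Intra_4x4"  # Steps 4/5: encode accordingly
    return dip_flag, mode
```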

3.2 Interprediction Mode:

AVS-M defines I picture and P picture. P pictures use forward motion compensated prediction.

The maximum number of reference pictures used by a P picture is two. To improve the error resilience

capability, one of the two reference pictures can be a I/P pictures far away from current picture. AVS-M

also specifies nonreference P pictures. If the nal_ref_idc of a P picture is equal to 0, the P picture shall not

be used as a reference picture. The nonreference P pictures can be used for temporal scalability. The

reference pictures are identified by the reference picture number, which is 0 for IDR picture. The

reference picture number of a non-IDR reference picture is calculated as given in equation 1.

refnum_cur = refnum + num − num_prev, if num_prev ≤ num (1)

refnum_cur = refnum + num − num_prev + 32, otherwise

where num is the frame_num value of the current picture, num_prev is the frame_num value of the previous reference picture, and refnum is the reference picture number of the previous reference picture.
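Equation 1 can be read as a modulo-32 wraparound correction on frame_num; a direct transcription:

```python
def reference_picture_number(refnum, num, num_prev):
    """Reference picture number of a non-IDR reference picture (equation 1).

    `refnum` is the reference picture number of the previous reference picture,
    `num` / `num_prev` are the frame_num values of the current and previous
    reference pictures. frame_num wraps modulo 32, hence the +32 correction.
    """
    if num_prev <= num:
        return refnum + num - num_prev
    return refnum + num - num_prev + 32
```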

After decoding the current picture, if nal_ref_idc of the current picture is not equal to 0, then the current picture is marked as "used for reference". If the current picture is an IDR picture, all reference pictures except the current picture in the decoded picture buffer (DPB) shall be marked as "unused for reference". Otherwise, if nal_unit_type of the current picture is not equal to 0 and the total number of reference pictures excluding the current picture is equal to num_ref_frames, the following applies:

1. If num_ref_frames is 1, the reference picture excluding the current picture in the DPB shall be marked as "unused for reference."

2. If num_ref_frames is 2 and the sliding window size is 2, the reference picture excluding the current picture in the DPB with the smaller reference picture number shall be marked as "unused for reference."

3. Otherwise, if num_ref_frames is 2 and the sliding window size is 1, the reference picture excluding the current picture in the DPB with the larger reference picture number shall be marked as "unused for reference."
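The marking rules can be sketched as follows; the picture representation (dictionaries with refnum and used_for_reference fields) is ours, not the reference decoder's:

```python
def mark_after_decoding(dpb, current, num_ref_frames, sliding_window_size):
    """Sketch of the adaptive sliding-window reference marking rules above.

    `dpb` is a list of picture dicts; `current` is the just-decoded picture.
    Field names are illustrative assumptions.
    """
    if current["nal_ref_idc"] == 0:
        return  # non-reference picture: nothing to mark
    if current.get("is_idr"):
        for pic in dpb:  # IDR: flush all other reference pictures
            pic["used_for_reference"] = False
    else:
        refs = [p for p in dpb if p["used_for_reference"]]
        if len(refs) == num_ref_frames:
            if num_ref_frames == 1:
                refs[0]["used_for_reference"] = False
            elif sliding_window_size == 2:
                # drop the reference with the smaller reference picture number
                min(refs, key=lambda p: p["refnum"])["used_for_reference"] = False
            else:
                # num_ref_frames == 2, window size 1: drop the larger refnum
                max(refs, key=lambda p: p["refnum"])["used_for_reference"] = False
    current["used_for_reference"] = True
```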

The size of the motion compensation block can be 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 or 4×4. If half_pixel_mv_flag is equal to '1', the precision of the motion vector is up to ½ pixel; otherwise the precision of the motion vector is up to ¼ pixel [18]. When half_pixel_mv_flag is not present in the bitstream, it shall be inferred to be '1'.


The positions of integer, half and quarter pixel samples are depicted in figure 13. Capital letters

indicate integer sample positions, while small letters indicate half and quarter sample positions. The

interpolated values at half sample positions can be obtained using 8-tap filter F1 =

(−1,4,−12,41,41,−12,4,−1) and 4-tap filter F2 = (−1,5,5,−1). The interpolated values at quarter sample

position can be obtained by linear interpolation.

Figure 13: The position of integer, half and quarter pixel samples [1]

According to Figure 13, half sample b is calculated as follows:

b′ = −C + 4D − 12E + 41F + 41G − 12H + 4I − J

b = (b′ + 32) >> 6

And half sample h is calculated as follows:

h′ = −A + 5F + 5G − S

h = (h′ + 4) >> 3

Both b and h should be clipped to the range [0, 255].

Quarter sample a is calculated as:

a = (F + b + 1) >> 1
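A direct transcription of these filters, clipping included (function names are ours):

```python
def clip255(x):
    """Clip a sample value to the 8-bit range [0, 255]."""
    return max(0, min(255, x))

def half_pel_b(C, D, E, F, G, H, I, J):
    """Horizontal half sample between F and G, 8-tap filter F1 = (-1,4,-12,41,41,-12,4,-1)."""
    b1 = -C + 4*D - 12*E + 41*F + 41*G - 12*H + 4*I - J
    return clip255((b1 + 32) >> 6)  # filter gain is 64, hence the >> 6

def half_pel_h(A, F, G, S):
    """Vertical half sample, 4-tap filter F2 = (-1,5,5,-1)."""
    h1 = -A + 5*F + 5*G - S
    return clip255((h1 + 4) >> 3)   # filter gain is 8, hence the >> 3

def quarter_pel_a(F, b):
    """Quarter sample: linear interpolation between integer F and half sample b."""
    return (F + b + 1) >> 1
```

On a flat region every filter reproduces the input value, which is a quick sanity check that the tap sums (64 and 8) match the shifts.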

Interpolation of chroma sample values is shown in Figure 14. A, B, C and D are the integer sample values around the interpolated sample; dx and dy are the horizontal and vertical distances from the predicted sample to A, respectively. The predicted sample pred_xy is calculated as given by equation 2:

pred_xy = ((8 − dx)(8 − dy)A + dx(8 − dy)B + (8 − dx)dy·C + dx·dy·D + 32) >> 6 (2)

pred_xy should be clipped to the range [0, 255].
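Equation (2) in code form, as a bilinear blend of the four surrounding integer samples:

```python
def chroma_pred(A, B, C, D, dx, dy):
    """Chroma sample interpolation of equation (2); dx, dy are eighths in [0, 8)."""
    v = ((8 - dx) * (8 - dy) * A + dx * (8 - dy) * B
         + (8 - dx) * dy * C + dx * dy * D + 32) >> 6
    return max(0, min(255, v))  # clip to the 8-bit range
```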


3.3 Deblocking filter [3]

AVS Part 7 makes use of a simplified deblocking filter, wherein the boundary strength is decided at the MB level [3]. Filtering is applied to the boundaries of luma and chroma blocks except for the boundaries of the picture or slice. In Figures 14 and 15 the dotted lines indicate the boundaries which will be filtered. An intra predicted MB usually has more and larger residuals than an inter predicted MB, which leads to stronger blocking artifacts at the same QP. Therefore, a stronger filter is applied to intra predicted MBs and a weak filter is applied to inter predicted MBs. When the MB type is P_Skip, there is no coded residual; when the QP is not very large, the distortion caused by quantization is then relatively small, hence no filtering is required.

Figure 14: Luma and chroma block edges [3]

Figure 14 shows the pixels used for the sample-level deblocking filter. Different filtering processes are applied to each sample-level boundary under different filter modes, and the values of some pixels are updated.

Figure 15: Horizontal or vertical edge of 4×4 Block [2]
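The MB-level strength decision described above can be sketched as a simple classifier; the QP threshold below is an illustrative assumption, not a value from the standard:

```python
def filter_strength(mb_is_intra, mb_is_p_skip, qp, qp_threshold=24):
    """Sketch of the MB-level boundary-strength decision described above.

    Intra MBs get a strong filter; P_Skip MBs at small QP need no filtering
    (no coded residual, small quantization distortion); everything else is weak.
    `qp_threshold` is an assumed cutoff for "QP is not very large".
    """
    if mb_is_intra:
        return "strong"
    if mb_is_p_skip and qp <= qp_threshold:
        return "none"
    return "weak"
```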

3.4 Entropy Coding:

In entropy coding, the basic concept is a mapping from the video signal, after prediction and transformation, to a variable length coded bitstream, generally using one of two entropy coding methods: variable length coding or arithmetic coding. To achieve higher coding efficiency, context-based adaptive entropy coding techniques have been developed and are favored by current coding standards. AVS-M uses Exp-Golomb codes, as shown in Table 8, to encode syntax elements such as quantized coefficients, macroblock coding type, and motion vectors. Eighteen coding tables are used in quantized coefficient encoding; the encoder uses the run and the absolute value of the current coefficient to select the table.


Table 8: Kth Order Golomb Code [5]
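For illustration, here is a minimal k-th order Exp-Golomb encoder producing bit strings rather than a packed bitstream; the codewords can be compared against Table 8:

```python
def exp_golomb(value, k=0):
    """k-th order Exp-Golomb codeword for a non-negative integer, as a bit string.

    Classic construction: a prefix of zeros, a 1, then the info bits.
    For k = 0 this gives 0 -> "1", 1 -> "010", 2 -> "011", ...
    """
    value += 1 << k                 # shift into the k-th order range
    num_bits = value.bit_length()
    return "0" * (num_bits - k - 1) + format(value, "b")
```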

4.0 AVS Tools [19]:

4.1 High level tools similar to H.264/AVC:

1. Network abstraction layer (NAL) unit – structure

2. Parameter sets.

3. Instantaneous decoding refresh (IDR) picture

4. Gradual decoding refresh (GDR) or gradual random access

5. Flexible slice coding

6. Reference picture numbering

7. Non-reference P picture

8. Constrained intra prediction

9. Loop filter disabling at slice boundaries

10. Byte-stream format.

1. NAL Unit Structure: [19]

Video coding standards earlier than H.264/AVC use a start code based bitstream structure, wherein the bitstream consists of several layers, typically including several of the following: a sequence layer, a picture layer, a slice layer, a macroblock layer, and a block layer. The bitstream for each layer typically consists of a header and associated data, and each header of a slice or higher layer starts with a start code for resynchronization and identification. The NAL unit structure was first introduced in H.264/AVC. In this structure, the coded video data is organized into NAL units, each of which contains an NAL unit header and a payload. The NAL unit header indicates the type of data contained in the NAL unit. The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems. A series of NAL units generated by an encoder is referred to as an NAL unit stream.

NAL units are of the following types: [18]

picture_header_rbsp( ) of non-IDR picture

picture_header_rbsp( ) of IDR picture

slices_layer_rbsp( )

seq_parameter_set_rbsp( )

pic_parameter_set_rbsp( )


sei_rbsp( )

random_access_point_indicator_rbsp( )

Table 9: NAL unit types [6]

The advantages of the NAL unit structure over the start code based structure include the

following.

- The NAL unit structure provides convenient conveyance of video data for different transport layers.

- For packet-based systems, such as RTP, the transport layer can identify NAL unit boundaries without use of start codes. Therefore, those overhead bits can be saved.

- The NAL unit structure provides flexible extension capability using new NAL unit types.

- If needed, start code prefixes can be used to transform an NAL unit stream to the start code based structure.
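The start code scan and NAL-unit-type parsing described above can be sketched as follows. This is an illustrative Python sketch, not decoder code from the standard: the 5-bit type field in the low bits of the first header byte follows the H.264/AVC-style layout and is an assumption here, and start code emulation prevention is ignored.

```python
# Illustrative sketch: split a byte-stream-format bitstream into NAL units by
# scanning for 3-byte start code prefixes (0x000001), then read the NAL unit
# type from the first payload byte (H.264-style layout, assumed).

START_CODE = b"\x00\x00\x01"

def split_nal_units(stream: bytes):
    """Return the bytes of each NAL unit found after a start code prefix."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        units.append(stream[begin:nxt] if nxt != -1 else stream[begin:])
        pos = nxt
    return units

def nal_unit_type(nal_unit: bytes) -> int:
    return nal_unit[0] & 0x1F  # low 5 bits of the first header byte (assumed)

# Two dummy NAL units: header bytes 0x67 (type 7) and 0x41 (type 1).
stream = START_CODE + bytes([0x67, 0xAA]) + START_CODE + bytes([0x41, 0xBB])
units = split_nal_units(stream)
```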

2. Parameter Sets:

Parameter sets contain the sequence-level header information and the infrequently changing

picture-level header information. With parameter sets, the infrequently changing information need not be repeated for each sequence or picture; hence, coding efficiency is improved. Furthermore, the use of

parameter sets enables out-of-band transmission of the important header information, such that improved

error resilience is achieved. In “out-of-band” transmission, parameter set NAL units are transmitted in a

channel different than the one for transmission of other NAL units.


3. IDR Picture:

Support of random access is important for any video codec. In video coding standards earlier than

H.264/AVC, intra coded pictures are used as random access points. However, because of the use of multiple reference pictures, a picture following an intra picture may have an inter prediction reference from a picture preceding the intra picture in decoding order. This may make an intra picture not randomly accessible.

Consequently, to perform random access, it is required to check for each intra picture whether any

subsequent pictures have a reference dependency on pictures prior to the intra picture in decoding order.

This significantly increases the complexity for random access operations.

To solve the above problem and better meet the requirement, explicit signaling of IDR pictures

is specified such that random accessible intra pictures can be easily identified. In both AVS-M and

H.264/AVC, IDR pictures are signaled by a unique NAL unit type.

4. Gradual Decoding Refresh (GDR) or Gradual Random Access

GDR enables gradual random access, wherein the decoded video is completely refreshed after a

number of decoded pictures, rather than the instantaneous decoding refresh by random accessing at an

IDR picture. A nice feature of GDR is that random access can be performed at non-IDR pictures or even

non-intra pictures. Furthermore, GDR can enable gradual random access and improved error resilience

simultaneously. In both, AVS-M and H.264/AVC, GDR or gradual random access points are signaled

using supplemental enhancement information (SEI) messages.

5. Flexible Slice Coding

Slice coding is an efficient tool to improve video error resilience. In many packet-oriented

transport systems, the optimal size of a packet and a slice in bytes is a function of the expected packet loss

rate and the underlying maximum transmission unit (MTU) size. It is beneficial to encapsulate one slice

into one packet, because if a slice is split into multiple packets, the loss of one of these packets may

prevent decoding of the entire slice. If slices had to consist of whole MB rows and had to

start from the first MB of an MB row, the sizes of the slices would likely be non-optimal. In other words,

such a slice design would prevent optimal transport.
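The MTU-driven slice sizing argument above can be illustrated with a small sketch: consecutive macroblocks are grouped into slices so that each slice, plus an assumed slice/packet header, fits into one packet. The MB sizes, header cost, and MTU payload budget below are hypothetical numbers, not values from the standard.

```python
# Sketch of MTU-aware slice sizing: accumulate MBs into a slice until adding
# the next MB would exceed the per-packet payload budget.

def pack_slices(mb_sizes_bytes, mtu_payload=1400, header=10):
    """Group consecutive MB indices into slices that each fit one packet."""
    slices, current, used = [], [], header
    for i, size in enumerate(mb_sizes_bytes):
        if current and used + size > mtu_payload:
            slices.append(current)
            current, used = [], header
        current.append(i)
        used += size
    if current:
        slices.append(current)
    return slices

slices = pack_slices([400, 500, 600, 300, 900, 200])  # -> [[0, 1], [2, 3], [4, 5]]
```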

6. Reference Picture Numbering

In AVS-M, reference pictures are labeled by the 5-bit syntax element frame_num, which is

incremented by one for each reference picture. For non-reference pictures, the value is incremented by

one in relation to the value of the previous reference picture in decoding order. This frame number enables

decoders to detect loss of reference pictures. If there is no loss of reference pictures, the decoder can go

on decoding. Otherwise, a proper action should be taken.
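The loss detection enabled by frame_num can be sketched directly from the description above: each reference picture increments the 5-bit counter modulo 32, so any received value other than the expected increment signals a lost reference picture.

```python
# Sketch of reference picture loss detection with the 5-bit frame_num.

FRAME_NUM_BITS = 5
MOD = 1 << FRAME_NUM_BITS  # 32

def reference_lost(prev_frame_num: int, curr_frame_num: int) -> bool:
    return curr_frame_num != (prev_frame_num + 1) % MOD

ok = not reference_lost(31, 0)  # wrap-around 31 -> 0 is a normal increment
lost = reference_lost(3, 5)     # frame_num 4 is missing: a reference was lost
```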

7. Non-Reference P Picture

In standards earlier than H.264/AVC, I and P pictures are always reference pictures, while B

pictures are always non-reference pictures. In H.264/AVC, pictures of any picture type, including B

pictures, may be reference pictures, and P pictures may be reference or non-reference pictures. The latest

AVS-M specification does not support B pictures. To enable temporal scalability with only I and P

pictures, a non-reference picture is signaled with the corresponding syntax element set equal to 0.


8. Constrained Intra Prediction

Intra prediction utilizes available neighboring samples in the same picture for prediction of the

coded samples to improve the efficiency of intra coding. In the constrained intra prediction mode,

samples from inter coded blocks are not used for intra prediction. Use of this mode can improve error

resilience. If errors happen in the reference picture, the error may propagate to the inter coded blocks of

the current picture. If constrained intra prediction were not used, then the error may also propagate to the

intra coded blocks due to intra prediction. Consequently, the intra coded blocks would not have efficiently

stopped error propagation.

9. Loop filter Disabling at Slice Boundaries

Similar to H.264/AVC, AVS-M includes a deblocking loop filter. Loop filtering is applied to

each 4×4 block boundary. Loop filtering is the only decoding operation that makes slices in the same

picture not completely independent of each other. To enable completely independent slices in the same

picture, turning off loop filtering at slice boundaries is supported, both in H.264/AVC and AVS-M. AVS-M further supports turning off loop filtering completely. Having completely independent slices can

achieve perfect gradual random access based on gradual decoding refresh.

10. Byte Stream Format

The byte stream format is specified to enable transport of the video in byte or bit oriented

transport systems, where start codes are needed to detect the start of each coded slice for

resynchronization purpose. The method to transform an NAL unit stream to a byte stream is to prefix a

start code to each NAL unit. Start code emulation shall not occur in the bits of each NAL unit. Even

though start code emulation is not a problem for packet based transport systems, start code emulation

prevention is always used, because otherwise transforming NAL unit streams to byte streams would

become much more complex.

4.2 High Level Tools / Features Different Than H.264/AVC [19]

This subsection discusses the following AVS-M high-level tools or features that are different than

H.264/AVC:

1. Picture ordering and timing

2. Random access point indicator

3. Picture header

4. Signaling of scalable points

5. Reference picture management

6. Hypothetical reference decoder (HRD)

1. Picture order and timing

The picture timing mechanism of AVS-M is similar to H.263. The syntax element

picture_distance in AVS-M is similar to the temporal reference (TR) in H.263. Picture_distance is an

indication of picture order. The value of picture_distance increases by one in relation to the previous

coded picture plus the number of skipped pictures after the previous coded picture and is then modulo

divided by 256. A difference between picture_distance and TR is that picture_distance resets to zero at


IDR pictures. Together with the syntax element that specifies the time duration corresponding to a picture_distance difference of 1, picture_distance also indicates the picture timing.
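The picture_distance update just described can be sketched as a small function: the counter advances by one plus the number of skipped pictures, wraps modulo 256, and, unlike the H.263 TR, resets to zero at IDR pictures.

```python
# Sketch of the picture_distance counter update described above.

def next_picture_distance(prev: int, skipped: int, is_idr: bool) -> int:
    if is_idr:
        return 0  # unlike H.263 TR, picture_distance resets at IDR pictures
    return (prev + 1 + skipped) % 256

d = next_picture_distance(254, 3, is_idr=False)  # (254 + 1 + 3) % 256 = 2
r = next_picture_distance(200, 0, is_idr=True)   # IDR resets to 0
```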

2. Random access point indicator

AVS-M specifies the random access point indicator (RAPI) NAL unit type. An RAPI NAL unit,

if present, shall be the first NAL unit of an access unit in decoding order. An RAPI NAL unit indicates

that the access unit contains an IDR picture or a gradual random access SEI message, and the decoding of

any access unit after the RAPI NAL unit in decoding order does not need any sequence parameter set

NAL unit, picture parameter set NAL unit or SEI NAL unit prior to the RAPI NAL unit in decoding order.

The RAPI NAL unit enables easier random access operation. In order to know that an access unit

is a random access point, it is no longer necessary to buffer and parse all the NAL units in the beginning

of an access unit until the first coded slice of an IDR picture or the first SEI NAL unit containing a

gradual random access SEI message. This feature is particularly useful in broadcast or multicast

applications for a client to tune in an ongoing session.

3. Picture header

AVS-M uses picture header and picture parameter set simultaneously. The main reason to

reintroduce a picture header to AVS-M was for coding efficiency purposes. A conservative estimate of

the overhead is as follows. If the picture header were not used and those picture header syntax elements

were put in the slice header, and further assume that for the case of CIF 30fps, 256Kbps, each MB row

was coded as one slice, the additional overhead would be 4.8% of the total bitrate. For mobile video

telephony, it is reasonable to have a QCIF resolution at 15fps, 100 bytes per slice, 64Kbps, which results

in 80 slices per second. In this case, the additional overhead would be roughly 2.4% of the total bit rate.
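The two overhead estimates above can be checked with back-of-the-envelope arithmetic, assuming the duplicated picture-header syntax costs roughly 20 bits per slice header. That per-slice figure is an assumption, not a value from the standard; the report's 4.8% and 2.4% imply a value in this neighborhood.

```python
# Rough check of the picture-header overhead estimates in the text.

DUP_BITS_PER_SLICE = 20  # assumed duplicated picture-header bits per slice

def header_overhead(slices_per_sec, bitrate_bps):
    return slices_per_sec * DUP_BITS_PER_SLICE / bitrate_bps

# CIF, 30 fps, 256 Kbps, one slice per MB row: 288/16 = 18 rows -> 540 slices/s
cif_overhead = header_overhead(18 * 30, 256_000)   # on the order of 4-5%
# QCIF, 15 fps, ~100-byte slices, 64 Kbps -> 80 slices/s
qcif_overhead = header_overhead(80, 64_000)        # on the order of 2-3%
```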

4. Signalling of scalable points

Temporal scalability can be achieved by using the non-reference pictures supported in AVS-M. With such functionality, one AVS-M bitstream may be efficiently served to decoding devices with different decoding capabilities or connected over links of different transmission bandwidths.

For example, an originating terminal creates an AVS-M bitstream that complies with a profile at a certain

level and is scalably coded in such a way that a base layer bitstream containing only the reference pictures

of the entire bitstream is compliant with a lower level. If the receiver is only capable of decoding the

lower level, the server should adapt the video bitstream accordingly. To make sure that the adapted

bitstream complies with the lower level, the server should analyze the bitstream, make necessary

transcoding, and check that the adapted bitstream conforms to requirements of the lower level. These

processing steps require a lot of computations in the server.

The above problem can be solved by signaling the profile and level of a scalable bitstream layer. This signaling allows creation or modification of the compressed bitstream in such a way that the created or modified bitstream is guaranteed to conform to a certain pair of profile and level. The

signaling greatly simplifies and speeds up multimedia adaptation and computational scalability in local

playback or streaming applications.
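The temporal scalability mechanism above can be sketched in a few lines: a server extracts a lower-level base layer simply by dropping non-reference pictures, since no later picture depends on them. The picture records here are hypothetical, purely for illustration.

```python
# Sketch of base-layer extraction for temporal scalability with I/P pictures.

def extract_base_layer(pictures):
    """Keep only reference pictures (the base layer of the scalable stream)."""
    return [p for p in pictures if p["is_reference"]]

stream = [
    {"id": 0, "is_reference": True},
    {"id": 1, "is_reference": False},  # droppable for a lower-level decoder
    {"id": 2, "is_reference": True},
    {"id": 3, "is_reference": False},
]
base = extract_base_layer(stream)  # keeps pictures 0 and 2
```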

5. Reference picture management

Reference picture management is one important aspect of decoded picture buffer (DPB)

management. Decoded pictures used for predicting subsequent coded pictures and for future output are

buffered in the DPB. To efficiently utilize the buffer memory, the DPB management processes, including

the storage process of decoded pictures into the DPB and the marking and removal process of decoded pictures from the DPB, shall be specified. This is particularly important when multiple reference pictures are used and

when picture output order is not decoding order.

Figure 16: Adaptive sliding window based reference picture marking process [19]
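The sliding-window marking process of Figure 16 can be sketched as follows: when storing a new reference picture would overflow the DPB, the oldest reference picture in decoding order is marked "unused for reference" and evicted. The DPB size and picture ids here are hypothetical.

```python
# Sketch of sliding-window reference picture marking in the DPB.

from collections import deque

def store_reference(dpb, pic_id, dpb_size):
    if len(dpb) >= dpb_size:
        dpb.popleft()  # slide the window: evict the oldest reference picture
    dpb.append(pic_id)

dpb = deque()
for pic in range(5):
    store_reference(dpb, pic, dpb_size=3)
# dpb now holds the three most recent reference pictures
```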

6. Hypothetical reference decoder (HRD)

In video coding standards, a compliant bit stream must be able to be decoded by a hypothetical

decoder that is conceptually connected to the output of an encoder and consists of at least a pre-decoder

buffer, a decoder, and an output/display unit. This virtual decoder is known as the hypothetical reference

decoder (HRD).

The HRD supports multiple operation points, each of which is characterized by the following

five parameters:

- BitRate, indicating the maximum input bit rate of the corresponding operation point;

- CpbSize, indicating the size of the CPB of the corresponding operation point;

- DpbSize, indicating the size of the DPB of the corresponding operation point;

- InitCpbDelay, indicating the delay for the corresponding operation point between the time of arrival in the CPB of the first bit of the first picture and the time of removal from the CPB of the first bit of the first picture;

- InitDpbDelay, indicating the delay for the corresponding operation point between the time of arrival in the DPB of the first decoded picture and the time of output from the DPB of the first decoded picture.

The instances of the five parameters are signaled in the HRD buffering parameters SEI message. The

operation of each HRD operation point is as follows.

1. The buffers are initially empty.

2. Bits of coded access units enter the CPB at the rate equal to BitRate.

3. The decoding timer starts from a negative value equal to 0 − InitCpbDelay when the first bit enters the

CPB. Data is not removed from the CPB if the value of the decoding timer is smaller

than 0.

4. Removal of the first coded access unit from the CPB occurs when the value of the

decoding timer is equal to 0. Removal of any other coded access unit occurs when the

value of the decoding timer is equal to the relative output time of the access unit.

5. When a coded access unit has been removed from the CPB, the corresponding decoded

picture is immediately placed into the DPB. The output timer starts from a negative

value equal to 0 − InitDpbDelay when the first decoded picture enters the DPB. Data is not outputted from the DPB if the value of the output timer is smaller than 0.

6. A decoded picture is outputted from the DPB when the value of the output timer is equal

to the relative output time of the decoded picture. If the decoded picture is a non-reference picture, or if the decoded picture is a reference picture but marked as “unused for reference”, the data of the decoded picture is also removed from the DPB when it is

outputted. When outputting a decoded picture from the DPB, decoded pictures that are

not marked as “used for reference” and whose relative output times are earlier than the

value of the output timer are removed from the DPB.
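The CPB side of the HRD operation above can be sketched as a small schedule checker: bits arrive at BitRate, and each access unit (AU) removal must neither underflow the buffer (the AU must have fully arrived) nor follow a state where buffer fullness exceeded CpbSize. The AU sizes, removal times, and parameters are hypothetical, and overflow is checked only at removal instants for brevity.

```python
# Minimal sketch of CPB operation: arrival at BitRate, removal at decode times.

def check_cpb(au_sizes_bits, removal_times, bit_rate, cpb_size):
    total = sum(au_sizes_bits)
    removed = 0
    for size, t in zip(au_sizes_bits, removal_times):
        arrived = min(bit_rate * t, total)
        if arrived < removed + size:      # underflow: AU not fully in the CPB
            return False
        if arrived - removed > cpb_size:  # buffer fullness exceeded CpbSize
            return False
        removed += size
    return True

ok = check_cpb([8000, 4000, 4000], removal_times=[0.1, 0.2, 0.3],
               bit_rate=100_000, cpb_size=20_000)
```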

5.0 AVS-M decoder [12]:

The generalized block diagram of an AVS-M decoder is presented in Figure 17.

Figure 17: Block diagram of an AVS-M decoder [12]

The decoder complexity estimation is very important and instructive for target platform-

dependent codec realization. Storage complexity and computational complexity are two major concerns

of the decoder implementation.

A. Inverse Transform and Quantization:

Only an integer 4×4 block-based DCT is adopted in AVS-M to simplify the hardware design and decrease the processing complexity of residuals. The inverse transform matrix is illustrated in Figure 18. The range of the QP is extended from 0 to 63, with the quantization step size approximately doubling for every increase of 8 in QP.

Figure 18: Inverse DCT matrix of AVS-M [12]
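The separable application pattern of such a 4×4 inverse transform can be sketched generically as X = Tᵀ·Y·T. The matrix T below is an order-4 Hadamard matrix used only as a stand-in; the actual AVS-M inverse transform matrix is the one shown in Figure 18, and real decoders also apply scaling and rounding, which are omitted here.

```python
# Generic sketch of a separable 4x4 integer inverse transform, X = T^T * Y * T.
# T is a Hadamard stand-in, NOT the AVS-M matrix (see Figure 18).

T = [[1,  1,  1,  1],
     [1,  1, -1, -1],
     [1, -1, -1,  1],
     [1, -1,  1, -1]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def inverse_transform(Y):
    # Separable 2-D transform: columns first (T^T * Y), then rows (* T).
    return mat_mul(mat_mul(transpose(T), Y), T)

# A DC-only coefficient block maps to a flat pixel block, as expected.
Y = [[4, 0, 0, 0]] + [[0, 0, 0, 0] for _ in range(3)]
X = inverse_transform(Y)  # every entry equals 4
```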

B. Interpolation:

In AVS-M, the 1/2-pixels in the luma reference frame are divided into horizontal and vertical classes with different filters, which are illustrated in Table 10. For an integer pixel A, there are three 1/2-pixels, marked bh, bv, c, and twelve 1/4-pixels, denoted as d, e, f, g, h, i in Figure 19. Chroma interpolation is

omitted from this paper since it is easier to implement than luma, as it is performed by straightforward

bilinear interpolation operations.


Figure 19: Subpixel locations around integer pixel “A” [12]

Table 10: Interpolation filter coefficient [12]

C. In-loop deblocking:

Deblocking is performed across the edges of 4×4 blocks on the nearest two pixels. The filter mode is determined by the macroblock type and the QP of the current macroblock: if the macroblock is INTRA coded, the filter is in INTRA mode; and if the macroblock is not SKIP or the QP is not less than the pre-defined threshold, the filter is in INTER mode.
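The filter-mode decision just described can be sketched as a small function: INTRA-coded macroblocks get the INTRA filter mode; macroblocks that are not SKIP, or whose QP reaches the pre-defined threshold, get the INTER mode; otherwise no filtering is applied. The threshold value used here is an assumption.

```python
# Sketch of the deblocking filter-mode decision described in the text.

QP_THRESHOLD = 16  # hypothetical value; the standard defines the threshold

def filter_mode(mb_type: str, qp: int) -> str:
    if mb_type == "INTRA":
        return "INTRA"
    if mb_type != "SKIP" or qp >= QP_THRESHOLD:
        return "INTER"
    return "NONE"

m1 = filter_mode("INTRA", 10)  # INTRA-coded MB -> INTRA filter mode
m2 = filter_mode("SKIP", 8)    # SKIP MB below threshold -> no filtering
```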

5.1 Error concealment tools in AVS-M decoder [17]

Error concealment in video communication is a very challenging topic. Many techniques have

been proposed to deal with the transmission error problem. Generally, all these techniques can be

categorized into three kinds: forward error concealment, backward error concealment and interactive error

concealment.

Forward error concealment refers to techniques in which the encoder plays the primary role,

which partitions video data into more than one layer with different priority. The layer with higher priority

is delivered with a higher degree of error protection. Better quality can be achieved when more layers are

received at decoder side.

Backward error concealment refers to the concealment or estimation of lost information due to

transmission errors in which the decoder fulfills the error concealment task.

The decoder and encoder interactive techniques achieve the best reconstruction quality, but are

more difficult to implement. Generally speaking, a feedback channel is needed from decoder to encoder

and low time delay should be guaranteed.
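A common backward error concealment technique can be sketched briefly: a macroblock lost in transmission is concealed at the decoder by copying the co-located macroblock from the previous decoded frame (temporal copy). This is an illustrative sketch, not the specific AVS-M concealment algorithm; frames are modeled as flat lists of per-MB values, a hypothetical layout.

```python
# Sketch of temporal-copy concealment of lost macroblocks at the decoder.

def conceal_lost_mbs(curr_frame, prev_frame, lost_mb_indices):
    out = list(curr_frame)
    for i in lost_mb_indices:
        out[i] = prev_frame[i]  # temporal copy from the co-located MB
    return out

prev = [10, 20, 30, 40]
curr = [11, None, 31, None]  # MBs 1 and 3 were lost in transmission
fixed = conceal_lost_mbs(curr, prev, [1, 3])  # -> [11, 20, 31, 40]
```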

6.0 Main program flow analysis for encoder [15]:

In this section, the main program flow of three key functions, main(), Encode_I_Frame() and Encode_P_Frame(), is analyzed in detail, with flow diagrams [5]. main() is the AVS-M program's main function. It first allocates and initializes the parameters and buffers used by the entire program. Then, according to the parameter pgImage->type, it determines whether the current image is to be coded as an I frame or a P frame and enters the corresponding coding procedure. Finally, the reconstructed image returned to the main function is stored for motion compensation, which significantly reduces the amount of data to be processed.

The flow chart of main() is shown in Figure 20.

Figure 20: The flow chart of main() [15]


Flow chart of Encode_I_Frame is shown in figure 21.

Figure 21: The flow chart of Encode_I_Frame() [15]


The flow chart of Encode_P_Frame() is shown in figure 22.

Figure 22: The flow chart of Encode_P_Frame()[15]


7.0 Performance Analysis:

The AVS China Part 7 video codec was analyzed for quantization parameter (QP) values ranging from 0 to 63, and quality measures such as bit rate, PSNR and SSIM [23] were calculated. The test sequences used were in QCIF and CIF formats. PSNR and SSIM were plotted against the bit rate. In total, four sequences were simulated: two in QCIF format and two in CIF format.

1. Miss-america_qcif

2. Mother-daughter_qcif

3. Stefan_cif

4. Silent_cif
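The quality measures used below can be computed directly; for 8-bit video, luma PSNR is 10·log10(255² / MSE) between original and decoded samples. A tiny synthetic sketch (the sample values are made up for illustration):

```python
# PSNR for 8-bit samples: 10 * log10(255^2 / MSE).

import math

def psnr(original, decoded):
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(255 ** 2 / mse)

# Every sample differs by 5, so MSE = 25 and PSNR is about 34.15 dB.
value = psnr([100, 120, 140, 160], [105, 125, 145, 165])
```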

7.1 Simulation results of sequence miss-america_qcif: [22]

Input Sequence: miss-america_qcif.yuv

Total no. of frames: 30.

Original file size: 1114 Kb.

Width: 176.

Height: 144.

Frame rate: 30 fps.

Figure 23 illustrates video quality of miss_america_qcif sequence at various QP values.

(Panels: original file; QP = 10; QP = 50; QP = 63)

Fig 23: Video quality at various QP values for miss_america_qcif


Simulation Results for miss-america_qcif Sequence:

Table 11 shows the values of compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP

for miss_america_qcif sequence.

Table 11: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for miss-

america_qcif sequence

Results in Graphical Form:

Fig 24 shows the plot of PSNR (dB) vs. Bitrate (Kbps) for miss_america_qcif.

Fig 24: PSNR (dB) vs. Bitrate (Kbps) for miss_america_qcif

QP    Compressed file size [Kb]    Compression ratio    Bit rate [Kbps]    Y-PSNR [dB]    Y-SSIM
0     330                          3.3757:1             2698.85            54.2773        0.9972
5     201                          5.5422:1             1641.18            51.3345        0.9946
10    111                          10.036:1             902.83             49.1926        0.9916
20    23                           48.434:1             179.76             44.9110        0.9842
30    8                            139.25:1             56.80              40.7322        0.9716
40    4                            278.50:1             30.66              36.0983        0.9429
50    2.80                         397.85:1             22.21              30.7096        0.8869
55    2.55                         436.86:1             20.17              28.0461        0.8455
60    2.40                         464.16:1             18.98              25.1375        0.7999
63    2.33                         478.11:1             18.42              22.0501        0.7829
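The tabulated figures are related by simple arithmetic: compression ratio is original size over compressed size, and bit rate follows from compressed size and clip duration (30 frames at 30 fps is 1 s here). The numbers below are the QP = 0 row of Table 11; 1 Kb is treated as 1024 bytes, so small rounding differences from the table are expected.

```python
# Consistency check of Table 11 (QP = 0 row) for miss-america_qcif.

original_kb, compressed_kb = 1114, 330
frames, fps = 30, 30

ratio = original_kb / compressed_kb                          # ~3.3758:1
duration_s = frames / fps                                    # 1.0 s
bitrate_kbps = compressed_kb * 1024 * 8 / 1000 / duration_s  # ~2703 Kbps
```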


Figure 25 shows the plot of SSIM vs. bitrate (Kbps) for miss_america_qcif.

Fig 25: SSIM vs. Bitrate (Kbps) for miss_america_qcif

7.2 Simulation results of sequence mother-daughter_qcif: [21]

Input Sequence: mother-daughter_qcif.yuv

Total no. of frames: 30.

Original file size: 1139 Kb.

Width: 176.

Height: 144.

Frame rate: 30 fps

Figure 26 illustrates video quality of mother_daughter_qcif sequence at various QP values.

(Panels: original file; QP = 10; QP = 50; QP = 63)

Fig 26: Video quality at various QP values for mother_daughter_qcif

Results for mother-daughter_qcif Sequence:

Table 12 shows the values of compressed file size, compression ratio, bit rate, PSNR and SSIM at various

QP for mother-daughter_qcif sequence.

Table 12: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for mother-

daughter_qcif sequence

QP    Compressed file size [Kb]    Compression ratio    Bit rate [Kbps]    Y-PSNR [dB]    Y-SSIM
0     237                          4.8059:1             1937.45            53.97741       0.9981
5     127                          8.9685:1             1037.33            51.1885        0.9964
10    63                           18.0794:1            514.99             48.8490        0.9945
20    19                           59.9474:1            152.79             43.4243        0.9856
30    8                            142.3750:1           60.56              38.4661        0.9617
40    4                            284.7500:1           31.99              33.7337        0.9030
50    2.79                         408.2437:1           22.15              29.4328        0.8023
55    2.56                         444.9219:1           20.29              26.0557        0.6981
60    2.38                         478.5714:1           18.83              23.0817        0.6371
63    2.16                         527.3148:1           18.53              22.3413        0.6221


Results in Graphical Form:

Figure 27 shows the plot of PSNR (dB) vs. Bitrate (Kbps) for mother_daughter_qcif.

Fig 27: PSNR (dB) vs. Bitrate (Kbps) for mother_daughter_qcif

Figure 28 shows the plot of SSIM vs. Bitrate (Kbps) for mother_daughter_qcif

Fig 28: SSIM vs. Bitrate (Kbps) for mother_daughter_qcif


7.3 Simulation results of sequence stefan_cif: [21]

Input Sequence: stefan_cif.yuv

Total no. of frames: 15.

Original file size: 2227.5 Kb.

Width: 352

Height: 288.

Frame rate: 30 fps.

Figure 29 illustrates video quality of Stefan_cif sequence at various QP values

(Panels: original file; QP = 10; QP = 50; QP = 63)

Fig 29: Video quality at various QP values for stefan_cif


Simulation Results for stefan_cif Sequence

Table 13 shows the values of compressed file size, compression ratio, bit rate, PSNR and SSIM at various

QP for Stefan_cif sequence.

Table 13: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for stefan_cif

sequence

Results in Graphical Form:

Figure 30 shows the plot of PSNR (dB) vs. Bitrate (Kbps) for stefan_cif.

Fig 30: PSNR (dB) vs. Bitrate (Kbps) for stefan_cif

QP    Compressed file size [Kb]    Compression ratio    Bit rate [Kbps]    Y-PSNR [dB]    Y-SSIM
0     1082                         2.0587:1             17722.04           53.7192        0.9987
5     810                          2.7500:1             13257.49           50.5553        0.9973
10    588                          3.7883:1             9616.75            48.0813        0.9953
20    270                          8.2500:1             4419.03            41.8208        0.9884
30    107                          20.8178:1            1749.46            36.0297        0.9742
40    41                           54.3293:1            655.40             30.7737        0.9403
50    19                           117.2368:1           309.20             25.9537        0.8419
55    15                           148.5000:1           233.75             23.6506        0.7556
60    11                           202.5000:1           177.33             20.7062        0.5875
63    10                           222.7500:1           151.39             19.0242        0.4688


Figure 31 shows the plot of SSIM vs. Bitrate (Kbps) for stefan_cif.

Fig 31: SSIM vs. Bitrate (Kbps) for stefan_cif

7.4 Simulation results of sequence silent_cif: [21]

Input Sequence: silent_cif.yuv

Total no. of frames: 15.

Original file size: 2227.5 Kb.

Width: 352.

Height: 288.

Frame rate: 30 fps.

Fig 32 illustrates video quality of silent_cif sequence at various QP values.


(Panels: original file; QP = 10; QP = 50; QP = 63)

Fig 32: Video quality at various QP values for silent_cif

Simulation Results for silent_cif Sequence:

Table 14 shows the values of compressed file size, compression ratio, bit rate, PSNR and SSIM at various

QP for silent_cif sequence.

Table 14: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for silent_cif

QP    Compressed file size [Kb]    Compression ratio    Bit rate [Kbps]    Y-PSNR [dB]    Y-SSIM
0     592                          3.7627:1             9694.90            53.5225        0.9982
5     357                          6.2395:1             5836.70            50.6134        0.9965
10    199                          11.1935:1            3244.26            47.7497        0.9934
20    66                           33.7500:1            1076.57            41.8517        0.9769
30    28                           79.5536:1            445.79             36.5718        0.9329
40    13                           171.3462:1           206.23             32.0900        0.8498
50    8                            278.4375:1           121.74             28.1807        0.7315
55    7                            318.2143:1           101.13             25.9689        0.6612
60    5.54                         402.0758:1           89.21              23.8874        0.6125
63    5.25                         424.2857:1           84.36              22.1109        0.5366


Results in Graphical Form:

Figure 33 shows the plot of PSNR (dB) vs. Bitrate (Kbps) for silent_cif.

Fig 33: PSNR (dB) vs. Bitrate (Kbps) for silent_cif

Figure 34 shows the plot of SSIM vs. Bitrate (Kbps) for silent_cif.

Fig 34: SSIM vs. Bitrate (Kbps) for silent_cif


8.0 Conclusions:

The design of the AVS-M encoder, decoder and major AVS-M tools is described in detail. This design makes AVS-M applicable in the mobile environment. In this project, the AVS encoder and decoder are implemented using the AVS-M software. Tests are carried out on various QCIF and CIF sequences. The bit

rate, PSNR and SSIM values are tabulated. The performance of AVS-china is analyzed by varying the

quantization parameter (QP). The PSNR, bit rate and SSIM are calculated. It can be observed that at lower QP the quality is best, but the compressed file size is also large. As QP increases, both the quality and the size of the video decrease. As AVS-M mainly targets cell phone devices, the resolution and clarity of the picture are very low compared to AVS Part 2.

References:

[1] J.Ostermann et al, “Video coding with H.264/AVC: Tools, Performance, and Complexity”, IEEE

Circuits and Systems Magazine, vol. 4, Issue:1, pp. 7 – 28, Aug. 2004.

[2] B. Tang, Y. Chen and W. Ji, “AVS Encoder Performance and Complexity Analysis Based on Mobile

Video Communication”, DOI 10.1109/CMC.2009.171, 2009 International Conference on

Communications and Mobile Computing

[3] L. Fan, S. Ma and F. Wu, “Overview of AVS Video Standard”, IEEE International Conference on

Multimedia and Expo (ICME), pp 423-427, 2004

[4 ] AVS working group official website, http://www.avs.org.cn

[5] H. Tiejun “AVS – Technology, IPR and Applications”, available at, www.avs.org.cn/reference/AVS

进展(20101112.ppt

[6] S. Devaraju and K.R. Rao, “A Study on AVS-M Video Standard”, M.S. Thesis, Electrical Engineering

Department, University of Texas at Arlington, Arlington, TX, 2009.

[7] W. Gao et al, “AVS– the Chinese next-generation video coding standard,” National Association of

Broadcasters, Las Vegas, 2004.

[8] K. R. Rao and D.N. Kim, “Current Video Coding Standards: H.264/AVC, Dirac, AVS China and VC-

1”, IEEE 42nd Southeastern symposium on system theory (SSST), pp. 1-8,March 2010.

[9] W.Gao and T.Huang “AVS Standard -Status and Future Plan”, Workshop on Multimedia New

Technologies and Application, Shenzhen, China, Oct. 2007.

[10] L.Fan, “Mobile Multimedia Broadcasting Standards”, ISBN: 978-0-387-78263-8, Springer US,

2009.

[11] W. Gao, K.N. Ngan and L. Yu, “Special issue on AVS and its applications: Guest editorial”, Signal

Process: Image Commun, vol. 24, Issue 4, pp. 245-344, April 2009.


[12] Z. Ma et al, “Complexity analysis of AVS-M Jiben profile decoder”, Proceedings of 2005

International Symposium on Intelligent Signal Processing and Communication Systems, December 13-

16, 2005 Hong Kong

[13] M. Liu and Z. Wei, “A fast mode decision algorithm for intra prediction in AVS-M video coding”

Vol. 1, ICWAPR apos;07, Issue, 2-4, pp.326 –331, Nov. 2007.

[14] Y. Cheng et al, “Analysis and application of error concealment tools in AVS-M decoder”, Journal of

Zhejiang University –Science A, vol. 7, pp. 54-58, Jan 2006.

[15] S.Hu, X.Zhang and Z.Yang, “Efficient Implementation of Interpolation for AVS”, Congress on

Image and Signal Processing, 2008. Vol 3, pp133 –138, 27-30 May 2008

[16] R. Schafer and T. Sikora,“Digital video coding standards and their role in video communications”,

Proc. of the IEEE, vol. 83, pp. 907-924, June 1995.

[17] B. Lei et al, “Optimization and Implementation of AVS-M Decoder on ARM”, DOI

10.1109/ICIG.2007.166, IEEE 2007

[18] L. Yu, “AVS Project and AVS-Video Techniques”, http://www-

ee.uta.edu/dip/Courses/EE5351/ISPACSAVS.pdf, Dec.13, 2005 ISPACS 2005

[19] Y. Wang, “AVS-M: From Standards to Applications”, Journal of Computer Science and Technology, special section on China AVS standard, vol. 21, no. 3, pp. 332-344, May 2006.

[20] MSU video quality measurement tool,

http://www.softrecipe.com/Download/msu_video_quality_measurement_tool.html

[21] MATLAB software, http://www.mathworks.com/products/matlab/tryit.html

[22] Test video sequences http://trace.eas.asu.edu/yuv/

[23] Z. Wang et al, "Image quality assessment: From error visibility to structural similarity," IEEE Trans.

Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004

[24] S.Kwon, A. Tamhankar and K.R. Rao, “Overview of H.264 / MPEG-4 Part 10”, J. Visual

Communication and Image Representation, vol. 17, pp.186-216, April 2006.