Comparison and Performance Analysis of H.264, AVS-China, VC-1 and Dirac
EE 5359 MULTIMEDIA PROCESSING
FINAL REPORT
PERFORMANCE ANALYSIS OF AVS-M AND ITS
APPLICATION IN MOBILE ENVIRONMENT
Under the guidance of
DR. K. R. RAO
DEPARTMENT OF ELECTRICAL ENGINEERING
UNIVERSITY OF TEXAS AT ARLINGTON
Vidur K. Vajani (1000679332) Email id: [email protected]
Acknowledgement:
I would like to acknowledge the helpful discussions I had with Dr. K. R Rao. I sincerely appreciate the
help, guidance, support and motivation by Dr. Rao during the preparation of my project.
I would also like to thank my fellow classmates and seniors for their guidance and advice.
List of Acronyms:
AU Access Unit
AVS Audio Video Standard
AVS-M Audio Video Standard for mobile
B-Frame Interpolated Frame
CAVLC Context Adaptive Variable Length Coding
CBP Coded Block Pattern
CIF Common Intermediate Format
DIP Direct Intra Prediction
DPB Decoded Picture Buffer
EOB End of Block
HD High Definition
HHR Horizontal High Resolution
ICT Integer Cosine Transform
IDR Instantaneous Decoding Refresh
I-Frame Intra Frame
IMS IP Multimedia Subsystem
ITU-T International Telecommunication Union
MB Macroblocks
MPEG Moving Picture Experts Group
MPM Most Probable Mode
MV Motion Vector
NAL Network Abstraction Layer
P-Frame Predicted Frame
PIT Prescaled Integer Transform
PPS Picture Parameter Set
QCIF Quarter Common Intermediate Format
QP Quantization Parameter
RD Cost Rate Distortion Cost
SAD Sum of Absolute Differences
SD Standard Definition
SEI Supplemental Enhancement Information
SPS Sequence Parameter Set
VLC Variable Length Coding
List of figures:
Figure 1: History of audio video coding standards
Figure 2: Evolution of AVS China
Figure 3: Common Intermediate Format (CIF) 4:2:0 chroma sampling
Figure 4: Quarter Common Intermediate Format (QCIF) 4:2:0 chroma sampling
Figure 5: Layered structure
Figure 6: Current picture predicted from previous P pictures
Figure 7: Slice layer example
Figure 8: Macroblock in (a) 4:2:0 and (b) 4:2:2 formats
Figure 9: AVS-M encoder
Figure 10: Intra_4x4 Prediction including current block and its surrounding coded pixels for prediction
Figure 11: Eight Directional Prediction Modes in AVS-P7
Figure 12: Nine Intra_4x4 prediction Modes in AVS-P7
Figure 13: The position of integer, half and quarter pixel samples
Figure 14: luma and chroma block edges
Figure 15: Horizontal or Vertical Edge of 4×4 Block
Figure 16: Adaptive sliding window based reference picture marking process
Figure 17: Block diagram of an AVS-M decoder
Figure 18: Inverse DCT matrix of AVS-M
Figure 19: Subpixel locations around integer pixel “A”
Figure 20: The flow chart of main()
Figure 21: The flow chart of Encode_I_Frame()
Figure 22: The flow chart of Encode_P_Frame()
Figure 23: Video quality at various QP values for miss_america_qcif
Figure 24: PSNR (dB) vs. Bitrate (Kbps) for miss_america_qcif
Figure 25: SSIM vs. Bitrate (Kbps) for miss_america_qcif
Figure 26: Video quality at various QP values for mother_daughter_qcif
Figure 27: PSNR (dB) vs. Bitrate (Kbps) for mother_daughter_qcif
Figure 28: SSIM vs. Bitrate (Kbps) for mother_daughter_qcif
Figure 29: Video quality at various QP values for stefan_cif
Figure 30: PSNR (dB) vs. Bitrate (Kbps) for stefan_cif
Figure 31: SSIM vs. Bitrate (Kbps) for stefan_cif
Figure 32: Video quality at various QP values for silent_cif
Figure 33: PSNR (dB) vs. Bitrate (Kbps) for silent_cif
Figure 34: SSIM vs. Bitrate (Kbps) for silent_cif
List of Tables:
Table 1: History of AVS China
Table 2: Different parts of AVS China
Table 3: Comparison between different AVS profiles
Table 4: AVS profiles and their applications
Table 5: Macroblock types of P picture
Table 6: Submacroblock types of P picture
Table 7: Context-based Most Probable Intra Mode Decision Table
Table 8: Kth Order Golomb Code
Table 9: NAL unit types
Table 10: Interpolation filter coefficient
Table 11: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for miss-
america_qcif sequence
Table 12: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for mother-
daughter_qcif sequence
Table 13: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for Stefan_cif
sequence
Table 14: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for silent_cif
sequence
Abstract:
The modes of digital representation of information such as audio and video signals have
advanced by leaps and bounds. Real-time mobile video communication requires a
balance between performance and complexity. The Audio Video coding Standard (AVS) was established by
the Chinese working group of the same name [4]. To date, there are two separate parts in this
standard targeting different video compression applications: AVS Part 2 for high-definition digital
video broadcasting and high-density storage media, and AVS Part 7 for low-complexity, low-picture-
resolution mobile applications [7].
The primary focus of this project is to study and analyze the performance of AVS-M. In this project,
the major AVS-M video coding tools, their performance and their complexity are analyzed. The project
provides an insight into the AVS-M video standard, the architecture of the AVS-M codec, the features it offers and
the various data formats it supports. A study is done on the key techniques such as transform and
quantization, intra prediction, quarter-pixel interpolation, motion compensation modes, entropy coding
and the in-loop de-blocking filter. The AVS-M video codec is analyzed using quality measures such as bit rate,
PSNR and SSIM.
Contents:
Acknowledgement
List of Acronyms
List of figures
List of tables
Abstract
1.0 Introduction
2.0 AVS Standard
2.1 Introduction to AVS-M
2.2 Data Formats
2.3 Picture Format
2.4 Layered Structure
3.0 AVS-M encoder
3.1 Intra Prediction
3.1.1 Intra_4x4
3.1.2 Content based Most Probable Intra Mode Decision
3.1.3 Direct Intra Prediction
3.2 Interprediction Mode
3.3 Deblocking filter
3.4 Entropy Coding
4.0 AVS Tools
4.1 High level tools similar to H.264/AVC
4.2 High Level Tools / Features Different Than H.264/AVC
5.0 AVS-M decoder
5.1 Error concealment tools in AVS-M decoder
6.0 Main program flow analysis for encoder
7.0 Performance Analysis
7.1 Simulation results of sequence miss-america_qcif
7.2 Simulation results of sequence mother-daughter_qcif
7.3 Simulation results of sequence stefan_cif
7.4 Simulation results of sequence silent_cif
8.0 Conclusions
References
1. Introduction:
Over the past 20 years, analog communication around the world has largely been supplanted by
digital communication. As the demand for audio and video content has grown enormously, the need for
worldwide audio, video and image standards has also increased tremendously. The modes of digital
representation of information such as audio and video signals have advanced by leaps and bounds.
Many successful audio-video coding standards have been released, and they have
plenty of applications in today's digital media.
Figure 1 shows the evolution of coding standards.
Figure 1: History of audio video coding standards [5]
The Moving Picture Experts Group (MPEG) [3] was the first body to come up with a format for
transferring information digitally. Soon after its release, this format became the standard for audio
and video file compression and transmission. MPEG-2 and MPEG-4 were released
subsequently. MPEG-4 part 2 uses advanced coding tools, with additional complexity, to achieve higher
compression factors than MPEG-2; it is very efficient in terms of coding, producing files almost one
quarter the size of MPEG-1. These standards held a near monopoly in the market in the 1990s.
The AVS video standard was developed by the Audio Video Coding Standard Working Group of China
(AVS working group for short), which was approved by the Science and Technology Department
of the Chinese Ministry of Information Industry in June 2002 [3]. This audio and video standard was initiated by the
Chinese government in order to counter the monopoly of the MPEG standards, which were costing it
dearly. The mandate of the AVS working group is to establish China's national standards for
compression, manipulation and digital rights management in digital audio and video multimedia
equipment and systems.
AVS Mission [5]:
• To develop a second-generation video coding standard with the same or better coding performance than others
• To avoid licensing risk, based on a clear analysis of related patents from the last 50 years
• To help DTV, IPTV and new media operators in China and also outside China
Three main characteristics of AVS China [4]:
Advanced: China coordinates the formulation of a technically advanced, second-generation source
coding standard
Independent: a patent-pool management system and complete working-group legal documents
Open: the formulation process is open and internationalized
This standard is applied in fields such as high-resolution digital broadcasting, wireless
communication media, and internet broadcast media.
Short history of AVS China: [18]
Mar 18-21, 2002 178th Xiangshan Science Conference, Beijing, “Broad-band Network and Security
Stream Media Technology”
June 11, 2002 Science and Technology Department of MII released a bulletin about setting up
“Audio Video Coding Standard Workgroup” on China Electronics
June 21, 2002 “Audio Video Coding Standard Working Group” was set up in Beijing.
Aug 23-24, 2002 first meeting of AVS, AVS united with MPEG-China. Website of AVS opened to the
members formally.
Dec 9, 2002 Department of Science and Technology of Ministry of Information Industry issued
the notice of Setting up “Audio Video Coding Standard Working Group” and
assigned the task of the group
Dec 19, 2003 On the 7th AVS meeting, AVS-video (1.0) and AVS-system (1.0) was finalized
Mar 29, 2004 Industry forum of AVS video coding technology towards 3G, ShenZhen, sponsored
together with universities and companies of Hong Kong
Mar. 30-31, 2004 Start up the video coding standardization for new generation of mobile
communication
Table 1: History of AVS China [18]
Audio Video Standard for Mobile (AVS-M) is the seventh part of the video coding standard
developed by the AVS workgroup of China; it targets mobile systems and devices
with limited processing power and power consumption.
Figure 2 shows the evolution of AVS video coding standards.
Figure 2: Evolution of AVS China [9]
2. AVS Standard:
AVS is an integrated standard system that covers system, video, audio and media
copyright management. AVS comprises 10 parts. The different parts of AVS China are listed in Table 2.
Table 2: Different parts of AVS China [5]
As can be seen from Table 2, AVS has vast applications in various digital media.
According to the application requirement, the trade-off between the encoding efficiency and
encoder/decoder implementation complexity is selected. Considering different requirements AVS is
subdivided into four profiles.
1. Jizhun Profile: Jizhun profile is defined as the first profile in the national standard of
AVS-Part2, approved as national standard in 2006, which mainly focuses on digital video
applications like commercial broadcasting and storage media, including high-definition
applications. Typically, it is preferable for high coding efficiency on video sequences of
higher resolutions, at the expense of moderate computational complexity.
2. Jiben profile: The Jiben profile, defined in AVS-Part 7, targets mobile video applications
featuring smaller picture resolutions. Thus, computational complexity becomes a
critical issue. In addition, error resilience capability is needed due to the wireless
transport environment.
3. Shenzhan profile: The AVS-Shenzhan standard focuses exclusively on
standardizing video surveillance applications. Sequences from video surveillance have
special characteristics, i.e. random noise appearing in pictures, relatively
lower affordable encoding complexity, and the required friendliness to event detection
and searching.
4. Jiaqiang profile: To fulfill the needs of multimedia entertainment, one of the major
concerns of Jiaqiang profile is movie compression for high-density storage. Relatively
higher computational complexity can be tolerated at the encoder side to provide higher
video quality, with compatibility to AVS China-Part 2 as well.
Table 3 shows the comparison between different AVS profiles.
Table 3: Comparison between different AVS profiles [10]
According to the configuration, different profiles have different applications; some key
applications of each profile are shown in table 4.
Table 4: AVS profiles and their applications [6]
2.1 Introduction to AVS-M:
The seventh part of AVS, the Jiben profile of AVS China, targets mobile systems and devices
with limited processing power and power consumption. It is also called AVS-M, where the letter M stands for
“mobile”. However, the target applications of AVS-M are not limited to mobile applications as implied
by the name. AVS-M has been developed to meet the needs of video compression in the applications of
digital storage media, networked media streaming, multimedia communications and so on. The standard
is applicable to the following applications:
Interactive storage media
Wide-band video services
Real-time telecommunication services
Remote video surveillance
A short history of AVS-P7: [13]
2004.8,WD(working draft)
2004.9,10th AVS meeting, WD 2.0
2004.11,CD(committee draft)
2005.9,14th AVS meeting, FCD (Final CD)
2006.1, FD
2006.3, GB
The most common test sequences are in Common Intermediate Format (CIF) and Quarter Common
Intermediate Format (QCIF); their formats are shown in figures 3 and 4 respectively. A CIF sequence
has a fixed dimension of 352 (width) x 288 (height), whereas a QCIF sequence has a fixed dimension of
176 (width) x 144 (height). Figures 3 and 4 show the CIF and QCIF structure with respect to 4:2:0 chroma
sampling, where Y is the luminance (brightness) component and Cb and Cr are the chrominance (color)
components.
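As a concrete check of these dimensions, the raw frame size under 4:2:0 sampling can be computed directly (a minimal sketch; the function name is illustrative, not from any reference software):

```python
def yuv420_frame_bytes(width, height, bits=8):
    """Bytes per raw 4:2:0 frame: a full-resolution Y plane plus two
    chroma planes (Cb, Cr) subsampled by 2 in each dimension."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bits // 8

print(yuv420_frame_bytes(352, 288))  # CIF:  152064 bytes per frame
print(yuv420_frame_bytes(176, 144))  # QCIF:  38016 bytes per frame
```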
Fig 3: Common Intermediate Format (CIF) 4:2:0 chroma sampling [24]
Fig 4: Quarter Common Intermediate Format (QCIF) 4:2:0 chroma sampling [24]
According to encoding speed and sequence format, the AVS-M Jiben profile has 9 different levels.
These are shown below. [6]
1.0: up to QCIF and 64 kbps and resolution is 176 × 144
1.1: up to QCIF and 128 kbps and resolution is 176 × 144
1.2: up to CIF and 384 kbps and resolution is 352 × 288
1.3: up to CIF and 768 kbps and resolution is 352 × 288
2.0: up to CIF and 2 Mbps and resolution is 352 × 288
2.1: up to HHR and 4 Mbps and resolution is 704×480
2.2: up to SD and 4 Mbps and resolution is 720×576
3.0: up to SD and 6 Mbps and resolution is 720×576
3.1: up to SD and 8 Mbps and resolution is 720×576
2.2 Data Formats [7]:
1. Progressive Scan:
This format is directly compatible with all content that originates in film, and can accept inputs
directly from progressive telecine machines. It is also directly compatible with the emerging standard for
digital production – the so-called “24p” standard. In the next few years, most movie production and much
TV production will be converted to this new standard.
A significant benefit of progressive format is the efficiency with which motion estimation
operates. Progressive content can be encoded at significantly lower bitrates than interlaced content with
the same perceptual quality. Furthermore, motion compensated coding of progressive format data is
significantly less complex than coding of interlaced data.
2. Interlaced Scan:
AVS also provides coding tools for interlaced scan format. These tools offer coding of legacy
interlaced format video.
2.3 Picture Format:
AVS is a generic standard and can code pictures with a rectangular format up to 16K x 16K
pixels in size. Pixels are coded in luminance-chrominance format (YCrCb) and each component can have
a precision of 8 bits. AVS supports a range of commonly used frame rates and pixel aspect ratios. AVS
supports 4:2:0 and 4:2:2 chroma formats. Chromaticity is defined by international standards.
2.4 Layered Structure [5]:
AVS is built on a layered data structure representing traditional video data. This structure is
mirrored in the coded video bit stream. Figure 5 illustrates this layered structure.
Figure 5: Layered structure [7]
At the highest layer, sets of frames of continuous video are organized into a sequence. The
sequence provides an opportunity to download parameter sets to decoders. Pictures can optionally be
subdivided into rectangular regions called Slices. Slices are further subdivided into square regions of
pixels called macroblocks. These are the fundamental coding units used by AVS and comprise a set of
luminance and chrominance blocks of pixels covering the same square region of the picture.
Sequence: The sequence layer comprises a set of mandatory and optional downloaded system parameters. The
sequence layer provides an entry point into the coded video. For example, sequence headers should be placed at the
start of each chapter on a DVD to facilitate random access. Alternatively, they should be placed every half
second in broadcast TV to facilitate changing channels.
Picture: The picture layer provides the coded representation of a video frame. It comprises a header with
mandatory and optional parameters and optionally with user data. Three types of picture are defined by
AVS:
Intra pictures (I-pictures)
Predicted pictures (P-pictures)
Interpolated pictures (B-pictures)
AVS uses adaptive modes for motion compensation at the picture layer and macroblock layer. At the
picture layer, the modes are [7]
Forward prediction from the most recent reference frame
Forward prediction from the second most recent prediction frame
Interpolative prediction between the most recent reference frame and a future reference frame.
Intra coding
Figure 6 illustrates how current picture is predicted from the previous reference pictures.
Figure 6: Current picture predicted from previous P pictures [7]
Slice: The slice structure provides the lowest-layer mechanism for resynchronizing the bitstream in case of
transmission error. Slices comprise an arbitrary number of raster-ordered rows of macroblocks as
illustrated in the example of figure 7.
Figure 7: Slice layer example [7]
Macroblock: A macroblock includes the luminance and chrominance component pixels that collectively represent a
16x16 region of the picture. In 4:2:0 mode, the chrominance pixels are subsampled by a factor of two in
each dimension; therefore each chrominance component contains only one 8x8 block. In 4:2:2 mode, the
chrominance pixels are subsampled by a factor of two in the horizontal dimension; therefore each
chrominance component contains two 8x8 blocks. This is illustrated in Figure 8.
(a)
(b)
Figure 8: Macroblock in (a) 4:2:0 and (b) 4:2:2 formats [7]
At the macroblock layer, the modes depend on the picture mode [7]
• In intra pictures, all macroblocks are intra coded.
• In predicted pictures, macroblocks may be forward predicted or intra coded.
• In interpolated pictures, macroblocks may be forward predicted, backward predicted,
interpolated or intra coded.
Block: The block is the smallest coded unit and contains the transform coefficient data for the prediction
errors. In the case of intra-coded blocks, Intra prediction is performed from neighboring blocks.
There are two MB types for an I-picture specified by AVS-M. If mb_type is 1, the type of the current
MB is I_4x4; otherwise, the type is I_Direct. The MB types of a P-picture are shown in Table 5 and Table 6.
If the skip mode flag is 1, then MbTypeIndex is equal to mb_type plus 1; otherwise, MbTypeIndex is equal to
mb_type. If MbTypeIndex is greater than or equal to 5, MbTypeIndex is set to 5.
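The MbTypeIndex derivation just described can be sketched as follows (an illustration; the helper name is hypothetical, not from the reference software):

```python
def mb_type_index(mb_type, skip_mode_flag):
    """Derive MbTypeIndex for a P-picture MB: when the skip mode flag
    is 1 the parsed mb_type is shifted up by one, and the result is
    capped at 5, as described above."""
    idx = mb_type + 1 if skip_mode_flag else mb_type
    return min(idx, 5)
```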
Table 5: Macroblock types of P picture [6]
Table 6: Submacroblock types of P picture [6]
3.0 AVS-M encoder:
AVS-M encoder is shown in figure 9.
Figure 9: AVS-M encoder [10]
There are 2 modes of prediction:
1. Intra prediction
2. Inter prediction
3.1 Intra Prediction [16]
Two types of intra prediction modes are adopted in AVS-P7: Intra_4x4 and Direct Intra
Prediction (DIP). AVS-P7's intra coding brings a significant complexity reduction while maintaining
comparable performance.
3.1.1 Intra_4x4
When using the Intra_4x4 mode, each 4x4 block is predicted from spatially neighboring samples
as illustrated in Fig. 10. The 16 samples of the 4x4 block, labeled a-p, are predicted using
prior decoded samples in adjacent blocks, labeled A-D, E-H and X. The up-right pixels used for prediction are
expanded from pixel sample D. Similarly, the down-left pixels are expanded from H. Compared with the
reference pixel locations used by Intra_4x4 of H.264/AVC, AVS-P7 reduces data fetching and on-chip
memory consumption while retaining comparable performance.
Fig. 10 Intra_4x4 Prediction including current block and its surrounding coded pixels for prediction [16]
For each 4x4 block, one of nine prediction modes can be utilized to exploit spatial correlation,
including eight directional prediction modes (such as Down Left, Vertical, etc.) and a non-directional
prediction mode (DC). In the DC prediction mode, all 16 pixels in the 4x4 block are predicted by the
average of the surrounding available pixels; the eight directional prediction modes are specified as shown
in Fig. 11.
Fig. 11 Eight directional prediction modes in AVS-P7 [16]
All the modes adopted by AVS-P7 are utilized to improve intra coding efficiency in
heterogeneous areas, e.g. multiple objects in one macroblock or blocks with different motion tendencies.
All nine modes are shown in Fig. 12.
Fig. 12 Nine intra_4x4 prediction modes in AVS-P7 [16]
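As an illustration of the non-directional mode, DC prediction of a 4x4 block can be sketched as below (assumptions: rounding to nearest and a mid-range fallback of 128 when no neighbors are available; the exact rules are defined by the standard text, not this sketch):

```python
def predict_dc_4x4(top, left):
    """DC intra prediction sketch: every pixel of the 4x4 block is set
    to the average of the available neighboring reconstructed samples.
    `top` and `left` are lists of 4 samples, or None when the neighbor
    block is unavailable for prediction."""
    samples = []
    if top is not None:
        samples.extend(top)
    if left is not None:
        samples.extend(left)
    if not samples:
        dc = 128  # assumed mid-range default when nothing is available
    else:
        dc = (sum(samples) + len(samples) // 2) // len(samples)  # rounded mean
    return [[dc] * 4 for _ in range(4)]
```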
3.1.2 Content based Most Probable Intra Mode Decision
A statistical model is used to determine the most probable intra mode of current block based on
video characteristics and content correlation. A look up table is used to predict the most probable intra
mode decision of current block. Irrespective of whether Intra_4x4 or DIP is used, the most probable mode
decision method is described as follows:
Get the intra modes of the up block and the left block. If the up (or left) block is not available for intra
mode prediction, the mode of the up (or left) block is defined as -1.
Use the up intra mode and the left intra mode to find the most probable mode in the table.
If the current MB is coded in Intra_4x4 mode, the intra prediction mode is coded as follows:
If the best mode equals the most probable mode, a 1-bit flag is transmitted for each block to
indicate that the mode of the current block is its most probable mode.
Table 7: Context-based most probable intra mode decision table [16]
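The table lookup can be sketched as follows; the table entries shown are placeholders for illustration only, not the actual values of Table 7:

```python
# Placeholder fragment of the (up_mode, left_mode) -> MPM mapping; the
# real entries come from Table 7.  -1 marks an unavailable neighbor.
MPM_TABLE = {(-1, -1): 2, (0, 0): 0, (1, 1): 1}

def most_probable_mode(up_mode, left_mode, table=MPM_TABLE, default=2):
    """Look up the most probable intra mode of the current block from
    the intra modes of its up and left neighbors (illustrative only)."""
    return table.get((up_mode, left_mode), default)
```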
3.1.3 Direct Intra Prediction: When direct intra prediction is used, a new method is used to code the intra prediction mode
information. As analyzed above, when Intra_4x4 is used, at least 1 bit per block is needed to represent
the mode information. This means that, for a macroblock, even when the intra prediction modes of all 16 blocks are
their most probable modes, 16 bits are needed to indicate the mode information.
A rate-distortion based direct intra prediction mainly contains 5 steps:
Step 1: All 16 4×4 blocks in the MB use their MPMs to do Intra_4×4 prediction; calculate RDCost(DIP)
of this MB.
Step 2: Do a mode search for Intra_4×4, find the best intra prediction mode of each block, and calculate
RDCost(Intra_4x4).
Step 3: Compare RDCost(DIP) and RDCost(Intra_4x4). If RDCost(DIP) is less than RDCost(Intra_4x4),
set the DIP flag to 1 and go to step 4; otherwise set the DIP flag to 0 and go to step 5.
Step 4: Encode the MB using DIP and finish encoding of this MB.
Step 5: Encode the MB using ordinary Intra_4×4 and finish encoding of this MB.
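Steps 3-5 reduce to a simple rate-distortion comparison, sketched below (the function name is illustrative):

```python
def dip_decision(rdcost_dip, rdcost_intra4x4):
    """Choose between DIP and ordinary Intra_4x4 for a macroblock by
    comparing RD costs; returns the DIP flag and the chosen mode."""
    if rdcost_dip < rdcost_intra4x4:
        return 1, "DIP"        # step 4: encode the MB using DIP
    return 0, "Intra_4x4"      # step 5: encode using ordinary Intra_4x4
```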
3.2 Interprediction Mode:
AVS-M defines I pictures and P pictures. P pictures use forward motion compensated prediction.
The maximum number of reference pictures used by a P picture is two. To improve error resilience,
one of the two reference pictures can be an I or P picture far away from the current picture. AVS-M
also specifies nonreference P pictures. If the nal_ref_idc of a P picture is equal to 0, the P picture shall not
be used as a reference picture. Nonreference P pictures can be used for temporal scalability. The
reference pictures are identified by the reference picture number, which is 0 for an IDR picture. The
reference picture number of a non-IDR reference picture is calculated as given in equation 1.
refnum = refnum_prev + (num - num_prev), if num_prev ≤ num (1)
refnum = refnum_prev + (num - num_prev) + 32, otherwise
where num is the frame_num value of the current picture, num_prev is the frame_num value of the
previous reference picture, and refnum_prev is the reference picture number of the previous reference picture.
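Equation (1) adds 32 back whenever the current frame_num has wrapped around below the previous one; under that reading it can be sketched as:

```python
def ref_picture_number(refnum_prev, num_prev, num):
    """Reference picture number of a non-IDR reference picture: when
    the current frame_num has wrapped around below the previous one,
    32 is added back (sketch of equation (1))."""
    if num_prev <= num:
        return refnum_prev + (num - num_prev)
    return refnum_prev + (num - num_prev) + 32
```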
After decoding current picture, if nal_ref_idc of current picture is not equal to 0, then current
picture is marked as “used for reference”. If current picture is an IDR picture, all reference pictures except
current picture in decoded picture buffer (DPB) shall be marked as “unused for reference”. Otherwise, if
nal_unit_type of current picture is not equal to 0 and the total number of reference pictures excluding
current picture is equal to num ref frames, the following applies:
1. If num ref frames is 1, reference pictures excluding the current picture in the DPB shall be marked as
“unused for reference.”
2. If num ref frames is 2 and the sliding window size is 2, the reference picture (excluding the current
picture) in the DPB with the smaller reference picture number shall be marked as “unused for reference.”
3. Otherwise, if num ref frames is 2 and the sliding window size is 1, the reference picture (excluding
the current picture) in the DPB with the larger reference picture number shall be marked as “unused for
reference.”
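The three marking rules above can be sketched as a small helper that returns which reference picture numbers to mark “unused for reference” (an illustrative reading of the rules, not reference-decoder code):

```python
def refs_to_unmark(dpb_refnums, num_ref_frames, sliding_window_size):
    """Sliding-window reference marking sketch.  `dpb_refnums` lists the
    reference picture numbers in the DPB excluding the current picture."""
    if num_ref_frames == 1:
        return list(dpb_refnums)        # rule 1: drop all other references
    if num_ref_frames == 2 and sliding_window_size == 2:
        return [min(dpb_refnums)]       # rule 2: drop the smaller number
    if num_ref_frames == 2 and sliding_window_size == 1:
        return [max(dpb_refnums)]       # rule 3: drop the larger number
    return []
```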
The size of the motion compensation block can be 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 or 4×4. If
half_pixel_mv_flag is equal to '1', the precision of motion vectors is up to ½ pixel; otherwise the precision
of motion vectors is up to ¼ pixel [18]. When half_pixel_mv_flag is not present in the bitstream, it shall be
inferred to be '1'.
The positions of integer, half and quarter pixel samples are depicted in figure 13. Capital letters
indicate integer sample positions, while lowercase letters indicate half and quarter sample positions. The
interpolated values at half sample positions are obtained using the 8-tap filter F1 =
(-1, 4, -12, 41, 41, -12, 4, -1) and the 4-tap filter F2 = (-1, 5, 5, -1). The interpolated values at quarter sample
positions are obtained by linear interpolation.
Figure 13: The position of integer, half and quarter pixel samples [1]
According to Figure 13, half sample b is calculated as follows:
b' = -C + 4D - 12E + 41F + 41G - 12H + 4I - J
b = (b' + 32) >> 6
And half sample h is calculated using the 4-tap filter as follows:
h' = -A + 5F + 5S - T
h = (h' + 4) >> 3
Both b and h should be clipped to the range [0, 255].
Quarter sample a is calculated as
a = (F + b + 1) >> 1
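The horizontal half-sample and quarter-sample computations can be sketched directly (a minimal illustration using the filter F1 and the rounding and clipping rules given above):

```python
def clip255(x):
    """Clip a sample value to the 8-bit range [0, 255]."""
    return max(0, min(255, x))

def half_sample_b(C, D, E, F, G, H, I, J):
    """Half sample b from eight horizontal integer samples via the
    8-tap filter F1 = (-1, 4, -12, 41, 41, -12, 4, -1)."""
    b1 = -C + 4*D - 12*E + 41*F + 41*G - 12*H + 4*I - J
    return clip255((b1 + 32) >> 6)

def quarter_sample_a(F, b):
    """Quarter sample a by linear interpolation between F and b."""
    return (F + b + 1) >> 1

# On a flat area, the interpolated samples reproduce the input level:
print(half_sample_b(*[100] * 8))                         # 100
print(quarter_sample_a(100, half_sample_b(*[100] * 8)))  # 100
```

Note that the filter taps of F1 sum to 64, so the `(+ 32) >> 6` step is a rounded division by the filter gain.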
Interpolation of chroma sample values is shown in Figure 14. A, B, C and D are the integer
sample values around the interpolated sample. dx and dy are the horizontal and vertical distances from
predicted sample to A, respectively. The predicted sample predxy is calculated as given by equation 2.
predxy = ((8 - dx)(8 - dy)A + dx(8 - dy)B + (8 - dx)dy C + dx dy D + 32) >> 6 (2)
predxy should be clipped to the range [0, 255].
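Equation (2) is a fixed-point bilinear interpolation in eighth-pel units; a direct sketch:

```python
def chroma_pred(A, B, C, D, dx, dy):
    """Chroma sample prediction of equation (2): bilinear weights from
    the eighth-pel distances dx, dy (0..7) to the four surrounding
    integer samples A, B, C, D."""
    p = ((8 - dx) * (8 - dy) * A + dx * (8 - dy) * B
         + (8 - dx) * dy * C + dx * dy * D + 32) >> 6
    return max(0, min(255, p))  # clip to [0, 255]
```

The four weights always sum to 64, so the `(+ 32) >> 6` step is again a rounded division by the total weight.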
3.3 Deblocking filter [3]
AVS Part 7 makes use of a simplified deblocking filter, wherein boundary strength is decided at
MB level [3]. Filtering is applied to the boundaries of luma and chroma blocks except for the boundaries
of picture or slice. In Figure 15 and Figure 16 the dotted lines indicate the boundaries which will be
filtered. An intra-predicted MB usually has more and larger residuals than an inter-predicted MB,
which leads to stronger blocking artifacts at the same QP. Therefore, a stronger filter is applied to intra-
predicted MBs and a weaker filter to inter-predicted MBs. When the MB type is P_Skip, there is no
coded residual; when QP is not very large, the distortion caused by quantization is relatively small,
hence no filtering is required.
Figure 14: Luma and chroma block edges [3]
Figure 14 shows the pixels used for sample-level deblocking filter. Different filtering processes are
applied to each sample-level boundary under different filter modes and the values of some pixels are
updated.
Figure 15: Horizontal or vertical edge of 4×4 Block [2]
3.4 Entropy Coding:
In entropy coding, the basic concept is mapping a video signal, after prediction and
transformation, to a variable length coded bitstream, generally using one of two entropy coding methods:
variable length coding or arithmetic coding. To achieve higher coding efficiency, context-based
adaptive entropy coding techniques have been developed and are favored by current coding standards. AVS-M
uses Exp-Golomb codes, as shown in Table 8, to encode syntax elements such as quantized coefficients,
macroblock coding type, and motion vectors. Eighteen coding tables are used in encoding quantized
coefficients; the encoder uses the run and the absolute value of the current coefficient to select the table.
Table 8: Kth order Exp-Golomb code [5]
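For reference, the generic k-th order Exp-Golomb construction (write n + 2^k in binary, preceded by enough zeros to make the prefix decodable) can be sketched as below; this shows only the code structure, not the AVS-M table assignments:

```python
def exp_golomb(n, k=0):
    """Bit string of the k-th order Exp-Golomb code for n >= 0: write
    n + 2^k in binary and prefix it with (length - k - 1) zeros."""
    value = n + (1 << k)
    prefix_zeros = value.bit_length() - k - 1
    return "0" * prefix_zeros + format(value, "b")

# 0th order codes for n = 0..3: 1, 010, 011, 00100
print([exp_golomb(n) for n in range(4)])
```

A decoder counts the leading zeros to learn the code length, which is what makes the code uniquely decodable without a length field.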
4.0 AVS Tools [19]:
4.1 High level tools similar to H.264/AVC:
1. Network abstraction layer (NAL) unit – structure
2. Parameter sets.
3. Instantaneous decoding refresh (IDR) picture
4. Gradual decoding refresh (GDR) or gradual random access
5. Flexible slice coding
6. Reference picture numbering
7. Non-reference P picture
8. Constrained intra prediction
9. Loop filter disabling at slice boundaries
10. Byte-stream format.
1. NAL Unit Structure: [19]
Video coding standards earlier than H.264/AVC use a start code based bitstream structure,
wherein the bitstream consists of several layers, typically including several of the following: a sequence
layer, a picture layer, a slice layer, a macroblock layer, and a block layer. The bitstream for each layer
typically consists of a header and associated data, and each header of a slice or higher layer starts with a
start code for resynchronization and identification. The NAL unit structure was first introduced in
H.264/AVC. In this structure, the coded video data is organized into NAL units, each of which contains
an NAL unit header and a payload. The NAL unit header indicates the type of data contained in the NAL
unit. The NAL unit structure definition specifies a generic format for use in both packet-oriented and
bitstream-oriented transport systems. A series of NAL units generated by an encoder is referred to as an
NAL unit stream.
The NAL unit types are: [18]
picture_header_rbsp( ) of non-IDR picture
picture_header_rbsp( ) of IDR picture
slices_layer_rbsp( )
seq_parameter_set_rbsp( )
pic_parameter_set_rbsp( )
sei_rbsp( )
random_access_point_indicator_rbsp( )
Table 9: NAL unit types [6]
The advantages of the NAL unit structure over the start code based structure include the
following.
The NAL unit structure provides convenient conveyance of video data for different
transport layers.
For packet-based systems, such as RTP, the transport layer can identify NAL unit
boundaries without use of start codes. Therefore, those overhead bits can be saved.
The NAL unit structure provides flexible extension capability using new NAL unit type.
If needed, start code prefixes can be used to transform an NAL unit stream to the start
code based structure.
2. Parameter Sets:
Parameter sets contain the sequence-level header information and the infrequently changing
picture-level header information. With parameter sets, the infrequently changing information need not be
repeated for each sequence or picture, so coding efficiency is improved. Furthermore, the use of
parameter sets enables out-of-band transmission of important header information, so that improved
error resilience is achieved. In out-of-band transmission, parameter set NAL units are transmitted in a
channel different from the one used for transmission of other NAL units.
3. IDR Picture:
Support of random access is important for any video codec. In video coding standards earlier than
H.264/AVC, intra coded pictures are used as random access points. However, because of the use of
multiple reference pictures, a picture following an intra picture may use inter prediction from a picture
preceding the intra picture in decoding order. This may make an intra picture not randomly accessible.
Consequently, to perform random access, it is required to check for each intra picture whether any
subsequent pictures have a reference dependency on pictures prior to the intra picture in decoding order.
This significantly increases the complexity of random access operations.
To solve the above problem and better meet this requirement, explicit signaling of IDR pictures
is specified so that randomly accessible intra pictures can be easily identified. In both AVS-M and
H.264/AVC, IDR pictures are signaled by a unique NAL unit type.
4. Gradual Decoding Refresh (GDR) or Gradual Random Access
GDR enables gradual random access, wherein the decoded video is completely refreshed after a
number of decoded pictures, rather than the instantaneous decoding refresh by random accessing at an
IDR picture. A nice feature of GDR is that random access can be performed at non-IDR pictures or even
non-intra pictures. Furthermore, GDR can enable gradual random access and improved error resilience
simultaneously. In both AVS-M and H.264/AVC, GDR or gradual random access points are signaled
using supplemental enhancement information (SEI) messages.
5. Flexible Slice Coding
Slice coding is an efficient tool to improve video error resilience. In many packet-oriented
transport systems, the optimal size of a packet and a slice in bytes is a function of the expected packet loss
rate and the underlying maximum transmission unit (MTU) size. It is beneficial to encapsulate one slice
into one packet, because if a slice is split into multiple packets, the loss of one of these packets may
prevent decoding of the entire slice. If the slice size were restricted to the granularity of MB rows, with
each slice starting from the first MB of an MB row, the sizes of the slices would likely be non-optimal. In
other words, such a slice design would prevent optimal transport.
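The MTU-matching argument above can be sketched as a greedy packer that groups consecutive macroblocks into slices under a byte budget. The function and parameter names are hypothetical; real encoders also account for header overhead.

```python
def pack_slices(mb_sizes, max_slice_bytes):
    """Greedily group consecutive macroblock byte counts into slices
    whose payload stays within max_slice_bytes (hypothetical sketch)."""
    slices, current, current_bytes = [], [], 0
    for i, size in enumerate(mb_sizes):
        # Start a new slice when adding this MB would exceed the budget.
        if current and current_bytes + size > max_slice_bytes:
            slices.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += size
    if current:
        slices.append(current)
    return slices
```

With a byte budget matched to the MTU, slice boundaries fall wherever the payload fills up, rather than at fixed MB-row positions.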
6. Reference Picture Numbering
In AVS-M, reference pictures are labeled by the 5-bit syntax element frame_num, which is
incremented by one for each reference picture. For non-reference pictures, the value is incremented by
one in relation to the value of the previous reference picture in decoding order. This frame number enables
decoders to detect the loss of reference pictures. If there is no loss of reference pictures, the decoder can
continue decoding; otherwise, a proper action should be taken.
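The loss-detection rule described above amounts to a modulo comparison on the 5-bit counter, which can be sketched as:

```python
FRAME_NUM_BITS = 5
MOD = 1 << FRAME_NUM_BITS  # frame_num wraps modulo 32

def reference_lost(prev_frame_num: int, curr_frame_num: int) -> bool:
    """A gap in the 5-bit frame_num of consecutive reference pictures
    indicates that at least one reference picture was lost."""
    return curr_frame_num != (prev_frame_num + 1) % MOD
```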
7. Non-Reference P Picture
In standards earlier than H.264/AVC, I and P pictures are always reference pictures, while B
pictures are always non-reference pictures. In H.264/AVC, pictures of any picture type, including B
pictures, may be reference pictures, and P pictures may be reference or non-reference pictures. The latest
AVS-M specification does not support B pictures. To enable temporal scalability with only I and P
pictures, a non-reference picture is indicated by setting the corresponding syntax element to 0.
8. Constrained Intra Prediction
Intra prediction utilizes available neighboring samples in the same picture for prediction of the
coded samples to improve the efficiency of intra coding. In the constrained intra prediction mode,
samples from inter coded blocks are not used for intra prediction. Use of this mode can improve error
resilience. If errors occur in the reference picture, the errors may propagate to the inter coded blocks of
the current picture. If constrained intra prediction were not used, the errors could also propagate to the
intra coded blocks through intra prediction; consequently, the intra coded blocks would not efficiently
stop error propagation.
9. Loop filter Disabling at Slice Boundaries
Similar to H.264/AVC, AVS-M includes a deblocking loop filter. Loop filtering is applied to
each 4×4 block boundary. Loop filtering is the only decoding operation that makes slices in the same
picture not completely independent of each other. To enable completely independent slices in the same
picture, turning off loop filtering at slice boundaries is supported in both H.264/AVC and AVS-M.
AVS-M further supports turning off loop filtering completely. Having completely independent slices
allows perfect gradual random access based on gradual decoding refresh.
10. Byte Stream Format
The byte stream format is specified to enable transport of the video in byte or bit oriented
transport systems, where start codes are needed to detect the start of each coded slice for
resynchronization purposes. The method to transform an NAL unit stream into a byte stream is to prefix a
start code to each NAL unit. Start code emulation shall not occur within the bits of any NAL unit. Even
though start code emulation is not a problem for packet based transport systems, start code emulation
prevention is always applied, because otherwise transforming NAL unit streams into byte streams would
become much more complex.
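A sketch of this NAL-unit-stream-to-byte-stream transformation is given below, using the H.264/AVC convention (three-byte start code; insert an emulation prevention byte 0x03 after two zero bytes when the next byte is 0x00-0x03). AVS-M's exact rule may differ in detail.

```python
START_CODE = b"\x00\x00\x01"

def to_byte_stream(nal_units):
    """Prefix each NAL unit with a start code and apply emulation
    prevention, following the H.264/AVC convention (illustrative)."""
    out = bytearray()
    for nal in nal_units:
        out += START_CODE
        zeros = 0
        for b in nal:
            # Two zero bytes followed by 0x00..0x03 would emulate a
            # start code prefix, so an 0x03 byte is inserted first.
            if zeros >= 2 and b <= 0x03:
                out.append(0x03)
                zeros = 0
            out.append(b)
            zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```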
4.2 High Level Tools / Features Different Than H.264/AVC [19]
This subsection discusses the following AVS-M high-level tools or features that differ from
H.264/AVC:
1. Picture ordering and timing
2. Random access point indicator
3. Picture header
4. Signaling of scalable points
5. Reference picture management
6. Hypothetical reference decoder (HRD)
1. Picture order and timing
The picture timing mechanism of AVS-M is similar to that of H.263. The syntax element
picture_distance in AVS-M is similar to the temporal reference (TR) in H.263. Picture_distance is an
indication of picture order. The value of picture_distance increases by one in relation to the previous
coded picture, plus the number of skipped pictures after the previous coded picture, and is then taken
modulo 256. A difference between picture_distance and TR is that picture_distance resets to zero at
IDR pictures. Together with the syntax element indicating the time duration corresponding to a
picture_distance difference of 1, picture_distance also indicates the picture timing.
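The picture_distance update rule just described can be written directly as a small function:

```python
def next_picture_distance(prev: int, skipped: int, idr: bool = False) -> int:
    """picture_distance increases by one plus the number of skipped
    pictures, modulo 256, and resets to zero at IDR pictures."""
    if idr:
        return 0
    return (prev + 1 + skipped) % 256
```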
2. Random access point indicator
AVS-M specifies the random access point indicator (RAPI) NAL unit type. An RAPI NAL unit,
if present, shall be the first NAL unit of an access unit in decoding order. An RAPI NAL unit indicates
that the access unit contains an IDR picture or a gradual random access SEI message, and the decoding of
any access unit after the RAPI NAL unit in decoding order does not need any sequence parameter set
NAL unit, picture parameter set NAL unit or SEI NAL unit prior to the RAPI NAL unit in decoding order.
The RAPI NAL unit enables easier random access operation. To determine whether an access unit
is a random access point, it is no longer necessary to buffer and parse all the NAL units at the beginning
of an access unit until the first coded slice of an IDR picture, or the first SEI NAL unit containing a
gradual random access SEI message, is found. This feature is particularly useful in broadcast or multicast
applications for a client tuning in to an ongoing session.
3. Picture header
AVS-M uses the picture header and the picture parameter set simultaneously. The main reason to
reintroduce a picture header into AVS-M was coding efficiency. A conservative estimate of the
overhead is as follows. If the picture header were not used and its syntax elements were put in the slice
header, then for CIF at 30 fps and 256 kbps with each MB row coded as one slice, the additional
overhead would be 4.8% of the total bit rate. For mobile video telephony, it is reasonable to assume
QCIF resolution at 15 fps, 100 bytes per slice and 64 kbps, which results in 80 slices per second. In this
case, the additional overhead would be roughly 2.4% of the total bit rate.
4. Signaling of scalable points
Temporal scalability can be achieved by using the non-reference pictures supported in AVS-M. With
such functionality, one AVS-M bitstream may be efficiently served to decoding devices with different
decoding capabilities or connected through different transmission bandwidths.
For example, an originating terminal creates an AVS-M bitstream that complies with a profile at a
certain level and is scalably coded in such a way that a base layer bitstream, containing only the reference
pictures of the entire bitstream, is compliant with a lower level. If the receiver is only capable of decoding
the lower level, the server should adapt the video bitstream accordingly. To make sure that the adapted
bitstream complies with the lower level, the server should analyze the bitstream, perform the necessary
transcoding, and check that the adapted bitstream conforms to the requirements of the lower level. These
processing steps require a lot of computation in the server.
The above problem can be solved by signaling the profile and level of a scalable bitstream layer.
This signaling allows creation or modification of the compressed bitstream in such a way that the
created or modified bitstream is guaranteed to conform to a certain pair of profile and level. The
signaling greatly simplifies and speeds up multimedia adaptation and computational scalability in local
playback or streaming applications.
5. Reference picture management
Reference picture management is an important aspect of decoded picture buffer (DPB)
management. Decoded pictures used for predicting subsequent coded pictures and for future output are
buffered in the DPB. To efficiently utilize the buffer memory, the DPB management processes, including
the storage of decoded pictures into the DPB and the marking and removal of decoded pictures from the
DPB, shall be specified. This is particularly important when multiple reference pictures are used and
when the picture output order is not the decoding order.
Figure 16: Adaptive sliding window based reference picture marking process [19]
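A much simplified sketch of sliding-window reference marking (the process of Figure 16): when the reference buffer is full, the oldest reference picture is marked "unused for reference" before the new one is stored. Function names and the fixed window size are hypothetical; the adaptive variant also honors explicit marking commands.

```python
from collections import deque

def mark_sliding_window(dpb_refs: deque, new_ref, max_refs: int):
    """Sliding-window sketch: evict (mark unused) the oldest reference
    pictures until there is room, then store the new reference.
    Returns the list of pictures marked 'unused for reference'."""
    unused = []
    while len(dpb_refs) >= max_refs:
        unused.append(dpb_refs.popleft())  # oldest goes first
    dpb_refs.append(new_ref)
    return unused
```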
6. Hypothetical reference decoder (HRD)
In video coding standards, a compliant bit stream must be able to be decoded by a hypothetical
decoder that is conceptually connected to the output of an encoder and consists of at least a pre-decoder
buffer, a decoder, and an output/display unit. This virtual decoder is known as the hypothetical reference
decoder (HRD).
The HRD supports multiple operation points, each of which is characterized by the following
five parameters:
- BitRate, indicating the maximum input bit rate of the corresponding operation point;
- CpbSize, indicating the size of the CPB of the corresponding operation point;
- DpbSize, indicating the size of the DPB of the corresponding operation point;
- InitCpbDelay, indicating the delay, for the corresponding operation point, between the time of arrival in the CPB of the first bit of the first picture and the time of removal from the CPB of the first bit of the first picture;
- InitDpbDelay, indicating the delay, for the corresponding operation point, between the time of arrival in the DPB of the first decoded picture and the time of output from the DPB of the first decoded picture.
The instances of the five parameters are signaled in the HRD buffering parameters SEI message. The
operation of each HRD operation point is as follows.
1. The buffers are initially empty.
2. Bits of coded access units enter the CPB at a rate equal to BitRate.
3. The decoding timer starts from a negative value equal to 0 − InitCpbDelay when the first bit
enters the CPB. Data is not removed from the CPB while the value of the decoding timer is
smaller than 0.
4. Removal of the first coded access unit from the CPB occurs when the value of the
decoding timer is equal to 0. Removal of any other coded access unit occurs when the
value of the decoding timer is equal to the relative output time of that access unit.
5. When a coded access unit has been removed from the CPB, the corresponding decoded
picture is immediately placed into the DPB. The output timer starts from a negative
value equal to 0 − InitDpbDelay when the first decoded picture enters the DPB. Data is not
output from the DPB while the value of the output timer is smaller than 0.
6. A decoded picture is output from the DPB when the value of the output timer is equal
to the relative output time of the decoded picture. If the decoded picture is a non-reference
picture, or if it is a reference picture but marked as “unused for reference”, the data of the
decoded picture is also removed from the DPB when it is output. When outputting a decoded
picture from the DPB, decoded pictures that are not marked as “used for reference” and whose
relative output times are earlier than the value of the output timer are removed from the DPB.
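The CPB side of the steps above can be sketched as an underflow check: every access unit must have fully arrived, at BitRate, by its scheduled removal time. This is a simplification that ignores DPB behavior and overflow; the function and parameter names are hypothetical.

```python
def cpb_underflows(au_bits, bit_rate, init_cpb_delay, removal_times):
    """Check a hypothetical HRD operation point for CPB underflow.
    au_bits: size of each coded access unit in bits (decoding order).
    removal_times: removal time of each access unit, in seconds,
    relative to the removal of the first access unit (step 4 above)."""
    arrival_end = 0.0
    for bits, t_removal in zip(au_bits, removal_times):
        arrival_end += bits / bit_rate  # when the AU's last bit arrives
        # The AU is removed init_cpb_delay after the first bit arrived,
        # plus its relative removal time; it must be fully there by then.
        if arrival_end > init_cpb_delay + t_removal:
            return True
    return False
```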
5.0 AVS-M decoder [12]:
The generalized block diagram of an AVS-M decoder is presented in Figure 17.
Figure 17: Block diagram of an AVS-M decoder [12]
The decoder complexity estimation is very important and instructive for target platform-
dependent codec realization. Storage complexity and computational complexity are two major concerns
of the decoder implementation.
A. Inverse Transform and Quantization:
Only an integer 4×4 block-based DCT is adopted in AVS-M, to simplify the hardware design and
decrease the processing complexity of the residuals. The inverse transform matrix is illustrated in Figure
18. The range of the QP is extended to 0–63, with the quantization step size approximately doubling for
every increment of 8 in QP.
Figure 18: Inverse DCT matrix of AVS-M [12]
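Assuming the common AVS convention that the quantization step size roughly doubles for every increment of 8 in QP (an interpretation of the "8-step" remark above, not a normative formula), the QP-to-step mapping can be sketched as:

```python
def quant_step(qp: int, base_step: float = 1.0) -> float:
    """Illustrative QP-to-step mapping: the step size doubles every
    8 QP increments. base_step is a hypothetical normalization
    constant, not a value from the AVS-M specification."""
    assert 0 <= qp <= 63, "AVS-M QP range is 0..63"
    return base_step * 2 ** (qp / 8)
```

Under this model, QP 63 uses a quantization step roughly 234 times coarser than QP 0, which is consistent with the wide bit-rate range seen in the simulation tables later in this report.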
B. Interpolation:
In AVS-M, the 1/2-pixel positions in the luma reference frame are divided into horizontal and
vertical classes with different filters, which are listed in Table 10. For an integer pixel A, there are three
1/2-pixels, marked bh, bv and c, and twelve 1/4-pixels, denoted d, e, f, g, h, i in Figure 19. Chroma
interpolation is omitted here since it is easier to implement than luma interpolation, being performed by
straightforward bilinear interpolation operations.
Figure 19: Subpixel locations around integer pixel “A” [12]
Table 10: Interpolation filter coefficient [12]
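Since the text notes that chroma interpolation is plain bilinear, a sketch is given below using H.264-style 1/8-sample weights and rounding; AVS-M's exact sub-sample precision and rounding may differ.

```python
def bilinear_chroma(a: int, b: int, c: int, d: int, dx: int, dy: int) -> int:
    """Bilinear interpolation between four neighboring chroma samples
    a (top-left), b (top-right), c (bottom-left), d (bottom-right)
    at 1/8-sample offsets dx, dy in 0..7 (H.264-style illustration)."""
    return ((8 - dx) * (8 - dy) * a + dx * (8 - dy) * b
            + (8 - dx) * dy * c + dx * dy * d + 32) >> 6
```

The four weights always sum to 64, so an integer offset (dx = dy = 0) returns the sample a unchanged.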
C. In-loop deblocking:
Deblocking is performed across the edges of 4×4 blocks on the nearest two pixels. The filter mode
is determined by the macroblock type and the QP of the current macroblock: if the macroblock is INTRA
coded, the filter operates in INTRA mode; if the macroblock is not SKIP coded, or the QP is not less than
a pre-defined threshold, the filter operates in INTER mode.
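The mode decision just described can be written directly as a small function; the string labels and the threshold parameter are illustrative, not AVS-M syntax.

```python
def filter_mode(mb_type: str, qp: int, qp_threshold: int) -> str:
    """Deblocking filter mode decision as described above
    (a simplified sketch; qp_threshold is the pre-defined threshold)."""
    if mb_type == "INTRA":
        return "INTRA"
    if mb_type != "SKIP" or qp >= qp_threshold:
        return "INTER"
    return "NONE"  # SKIP macroblock at low QP: no filtering
```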
5.1 Error concealment tools in AVS-M decoder [17]
Error concealment in video communication is a very challenging topic. Many techniques have
been proposed to deal with the transmission error problem. Generally, all these techniques can be
categorized into three kinds: forward error concealment, backward error concealment and interactive error
concealment.
Forward error concealment refers to techniques in which the encoder plays the primary role: it
partitions the video data into more than one layer with different priorities. The layer with higher priority
is delivered with a higher degree of error protection. Better quality is achieved when more layers are
received at the decoder side.
Backward error concealment refers to the concealment or estimation of lost information due to
transmission errors in which the decoder fulfills the error concealment task.
Interactive techniques, in which the decoder and encoder cooperate, achieve the best
reconstruction quality, but are more difficult to implement. Generally speaking, a feedback channel is
needed from the decoder to the encoder, and low time delay must be guaranteed.
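As a minimal illustration of backward error concealment, a decoder might replace a lost block with the co-located block of the previous decoded frame. This is a deliberately naive sketch (real concealment also estimates motion and interpolates from neighbors); the function name is hypothetical.

```python
def conceal_lost_block(prev_frame, x, y, w, h):
    """Backward concealment sketch: copy the co-located w-by-h block
    at (x, y) from the previous decoded frame (frames as 2-D lists)."""
    return [row[x:x + w] for row in prev_frame[y:y + h]]
```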
6.0 Main program flow analysis for encoder [15]:
In this section, we analyze in detail the main program flow of three key functions, main( ),
Encode_I_Frame( ) and Encode_P_Frame( ), and give flow diagrams [5]. main( ) is the entry point of the
AVS-M program. It allocates and initializes the parameters and buffers used in the entire program. Then,
according to the parameter pgImage->type, it decides whether the current picture is to be coded as an I
frame or a P frame and enters the corresponding coding procedure. Finally, the motion-compensated
reconstruction of the picture is stored before returning; with motion compensation, the amount of data to
be coded is significantly reduced.
The flow chart of main( ) is shown in Figure 20.
Figure 20: The flow chart of main() [15]
Flow chart of Encode_I_Frame is shown in figure 21.
Figure 21: The flow chart of Encode_I_Frame() [15]
The flow chart of Encode_P_Frame() is shown in figure 22.
Figure 22: The flow chart of Encode_P_Frame()[15]
7.0 Performance Analysis:
The AVS China Part 7 (AVS-M) video codec was analyzed for quantization parameter (QP)
values ranging from 0 to 63, and quality measures such as bit rate, PSNR and SSIM [23] were calculated.
The test sequences used were in QCIF and CIF formats, and PSNR and SSIM were plotted against the bit
rate. In total, four sequences were simulated: two in QCIF format and two in CIF format.
1. Miss-america_qcif
2. Mother-daughter_qcif
3. Stefan_cif
4. Silent_cif
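The quality and compression figures reported in the tables that follow use standard definitions; a small sketch of the two computations (PSNR from MSE for 8-bit video, and compression ratio) is:

```python
import math

def psnr_from_mse(mse: float, peak: int = 255) -> float:
    """PSNR in dB for 8-bit video: 10 * log10(peak^2 / MSE)."""
    return 10 * math.log10(peak * peak / mse)

def compression_ratio(original_kb: float, compressed_kb: float) -> float:
    """Ratio of original to compressed file size (n:1)."""
    return original_kb / compressed_kb
```

For example, compression_ratio(1114, 330) reproduces the 3.3757:1 entry at QP 0 for the miss-america_qcif sequence in Table 11.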
7.1 Simulation results of sequence miss-america_qcif: [22]
Input sequence: miss-america_qcif.yuv
Total no. of frames: 30
Original file size: 1114 kB
Width: 176
Height: 144
Frame rate: 30 fps
Figure 23 illustrates video quality of miss_america_qcif sequence at various QP values.
Original file | QP = 10
QP = 50 | QP = 63
Fig 23: Video quality at various QP values for miss_america_qcif
Simulation Results for miss-america_qcif Sequence:
Table 11 shows the values of compressed file size, compression ratio, bit rate, PSNR and SSIM at various
QP for the miss-america_qcif sequence.
Table 11: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for miss-
america_qcif sequence
QP | Compressed file size [kB] | Compression ratio | Bit rate [kbps] | Y-PSNR [dB] | Y-SSIM
0  | 330  | 3.3757:1 | 2698.85 | 54.2773 | 0.9972
5  | 201  | 5.5422:1 | 1641.18 | 51.3345 | 0.9946
10 | 111  | 10.036:1 | 902.83  | 49.1926 | 0.9916
20 | 23   | 48.434:1 | 179.76  | 44.9110 | 0.9842
30 | 8    | 139.25:1 | 56.80   | 40.7322 | 0.9716
40 | 4    | 278.50:1 | 30.66   | 36.0983 | 0.9429
50 | 2.80 | 397.85:1 | 22.21   | 30.7096 | 0.8869
55 | 2.55 | 436.86:1 | 20.17   | 28.0461 | 0.8455
60 | 2.40 | 464.16:1 | 18.98   | 25.1375 | 0.7999
63 | 2.33 | 478.11:1 | 18.42   | 22.0501 | 0.7829
Results in Graphical Form:
Fig 24 shows the plot of PSNR (dB) vs. Bitrate (Kbps) for miss_america_qcif.
Fig 24: PSNR (dB) vs. Bitrate (Kbps) for miss_america_qcif
Figure 25 shows the plot of SSIM vs. bitrate (Kbps) for miss_america_qcif.
Fig 25: SSIM vs. Bitrate (Kbps) for miss_america_qcif
7.2 Simulation results of sequence mother-daughter_qcif [21]
Input sequence: mother-daughter_qcif.yuv
Total no. of frames: 30
Original file size: 1139 kB
Width: 176
Height: 144
Frame rate: 30 fps
Figure 26 illustrates video quality of mother_daughter_qcif sequence at various QP values.
Original File QP = 10
QP = 50 QP=63
Fig 26: Video quality at various QP values for mother_daughter_qcif
Results for mother-daughter_qcif Sequence:
Table 12 shows the values of compressed file size, compression ratio, bit rate, PSNR and SSIM at various
QP for mother-daughter_qcif sequence.
Table 12: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for mother-
daughter_qcif sequence
QP | Compressed file size [kB] | Compression ratio | Bit rate [kbps] | Y-PSNR [dB] | Y-SSIM
0  | 237  | 4.8059:1   | 1937.45 | 53.97741 | 0.9981
5  | 127  | 8.9685:1   | 1037.33 | 51.1885  | 0.9964
10 | 63   | 18.0794:1  | 514.99  | 48.8490  | 0.9945
20 | 19   | 59.9474:1  | 152.79  | 43.4243  | 0.9856
30 | 8    | 142.3750:1 | 60.56   | 38.4661  | 0.9617
40 | 4    | 284.7500:1 | 31.99   | 33.7337  | 0.9030
50 | 2.79 | 408.2437:1 | 22.15   | 29.4328  | 0.8023
55 | 2.56 | 444.9219:1 | 20.29   | 26.0557  | 0.6981
60 | 2.38 | 478.5714:1 | 18.83   | 23.0817  | 0.6371
63 | 2.16 | 527.3148:1 | 18.53   | 22.3413  | 0.6221
Results in Graphical Form:
Figure 27 shows the plot of PSNR (dB) vs. Bitrate (Kbps) for mother_daughter_qcif.
Fig 27: PSNR (dB) vs. Bitrate (Kbps) for mother_daughter_qcif
Figure 28 shows the plot of SSIM vs. Bitrate (Kbps) for mother_daughter_qcif
Fig 28: SSIM vs. Bitrate (Kbps) for mother_daughter_qcif
7.3 Simulation results of sequence stefan_cif: [21]
Input sequence: stefan_cif.yuv
Total no. of frames: 15
Original file size: 2227.5 kB
Width: 352
Height: 288
Frame rate: 30 fps
Figure 29 illustrates video quality of Stefan_cif sequence at various QP values
Original File QP = 10
QP = 50 | QP = 63
Fig 29: Video quality at various QP values for stefan_cif
Simulation Results for stefan_cif Sequence
Table 13 shows the values of compressed file size, compression ratio, bit rate, PSNR and SSIM at various
QP for Stefan_cif sequence.
Table 13: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for stefan_cif
sequence
QP | Compressed file size [kB] | Compression ratio | Bit rate [kbps] | Y-PSNR [dB] | Y-SSIM
0  | 1082 | 2.0587:1   | 17722.04 | 53.7192 | 0.9987
5  | 810  | 2.7500:1   | 13257.49 | 50.5553 | 0.9973
10 | 588  | 3.7883:1   | 9616.75  | 48.0813 | 0.9953
20 | 270  | 8.2500:1   | 4419.03  | 41.8208 | 0.9884
30 | 107  | 20.8178:1  | 1749.46  | 36.0297 | 0.9742
40 | 41   | 54.3293:1  | 655.40   | 30.7737 | 0.9403
50 | 19   | 117.2368:1 | 309.20   | 25.9537 | 0.8419
55 | 15   | 148.5000:1 | 233.75   | 23.6506 | 0.7556
60 | 11   | 202.5000:1 | 177.33   | 20.7062 | 0.5875
63 | 10   | 222.7500:1 | 151.39   | 19.0242 | 0.4688
Results in Graphical Form:
Figure 30 shows the plot of PSNR (dB) vs. Bitrate (Kbps) for stefan_cif.
Fig 30: PSNR (dB) vs. Bitrate (Kbps) for stefan_cif
Figure 31 shows the plot of SSIM vs. Bitrate (Kbps) for stefan_cif.
Fig 31: SSIM vs. Bitrate (Kbps) for stefan_cif
7.4 Simulation results of sequence silent_cif [21]
Input sequence: silent_cif.yuv
Total no. of frames: 15
Original file size: 2227.5 kB
Width: 352
Height: 288
Frame rate: 30 fps
Fig 32 illustrates video quality of silent_cif sequence at various QP values.
Original File QP = 10
QP = 50 | QP = 63
Fig 32: Video quality at various QP values for silent_cif
Simulation Results for silent_cif Sequence:
Table 14 shows the values of compressed file size, compression ratio, bit rate, PSNR and SSIM at various
QP for silent_cif sequence.
Table 14: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for silent_cif
QP | Compressed file size [kB] | Compression ratio | Bit rate [kbps] | Y-PSNR [dB] | Y-SSIM
0  | 592  | 3.7627:1   | 9694.90 | 53.5225 | 0.9982
5  | 357  | 6.2395:1   | 5836.70 | 50.6134 | 0.9965
10 | 199  | 11.1935:1  | 3244.26 | 47.7497 | 0.9934
20 | 66   | 33.7500:1  | 1076.57 | 41.8517 | 0.9769
30 | 28   | 79.5536:1  | 445.79  | 36.5718 | 0.9329
40 | 13   | 171.3462:1 | 206.23  | 32.0900 | 0.8498
50 | 8    | 278.4375:1 | 121.74  | 28.1807 | 0.7315
55 | 7    | 318.2143:1 | 101.13  | 25.9689 | 0.6612
60 | 5.54 | 402.0758:1 | 89.21   | 23.8874 | 0.6125
63 | 5.25 | 424.2857:1 | 84.36   | 22.1109 | 0.5366
Results in Graphical Form:
Figure 33 shows the plot of PSNR (dB) vs. Bitrate (Kbps) for silent_cif.
Fig 33: PSNR (dB) vs. Bitrate (Kbps) for silent_cif
Figure 34 shows the plot of SSIM vs. Bitrate (Kbps) for silent_cif.
Fig 34: SSIM vs. Bitrate (Kbps) for silent_cif
8.0 Conclusions:
The design of the AVS-M encoder, decoder and the major AVS-M tools has been described in
detail. This design makes AVS-M applicable in the mobile environment. In this project, the AVS-M
encoder and decoder were run using the AVS-M reference software, and tests were carried out on various
QCIF and CIF sequences. The bit rate, PSNR and SSIM values were tabulated, and the performance of
AVS China Part 7 was analyzed by varying the quantization parameter (QP). It can be observed that at
lower QP values the quality is best, but the compressed file size is also large; as QP increases, both the
quality and the size of the video decrease. As AVS-M mainly targets cell phone devices, the resolution
and clarity of the picture are low compared to AVS Part 2.
References:
[1] J. Ostermann et al., “Video coding with H.264/AVC: Tools, Performance, and Complexity”, IEEE Circuits and Systems Magazine, vol. 4, issue 1, pp. 7-28, Aug. 2004.
[2] B. Tang, Y. Chen and W. Ji, “AVS Encoder Performance and Complexity Analysis Based on Mobile Video Communication”, 2009 International Conference on Communications and Mobile Computing, DOI 10.1109/CMC.2009.171, 2009.
[3] L. Fan, S. Ma and F. Wu, “Overview of AVS Video Standard”, IEEE International Conference on Multimedia and Expo (ICME), pp. 423-427, 2004.
[4] AVS working group official website, http://www.avs.org.cn
[5] H. Tiejun, “AVS – Technology, IPR and Applications”, available at www.avs.org.cn/reference/AVS 进展(20101112.ppt
[6] S. Devaraju and K.R. Rao, “A Study on AVS-M Video Standard”, M.S. Thesis, Electrical Engineering Department, University of Texas at Arlington, Arlington, TX, 2009.
[7] W. Gao et al., “AVS – the Chinese next-generation video coding standard”, National Association of Broadcasters, Las Vegas, 2004.
[8] K.R. Rao and D.N. Kim, “Current Video Coding Standards: H.264/AVC, Dirac, AVS China and VC-1”, IEEE 42nd Southeastern Symposium on System Theory (SSST), pp. 1-8, March 2010.
[9] W. Gao and T. Huang, “AVS Standard - Status and Future Plan”, Workshop on Multimedia New Technologies and Application, Shenzhen, China, Oct. 2007.
[10] L. Fan, “Mobile Multimedia Broadcasting Standards”, ISBN: 978-0-387-78263-8, Springer US, 2009.
[11] W. Gao, K.N. Ngan and L. Yu, “Special issue on AVS and its applications: Guest editorial”, Signal Processing: Image Communication, vol. 24, issue 4, pp. 245-344, April 2009.
[12] Z. Ma et al., “Complexity analysis of AVS-M jiben profile decoder”, Proceedings of the 2005 International Symposium on Intelligent Signal Processing and Communication Systems, Hong Kong, Dec. 13-16, 2005.
[13] M. Liu and Z. Wei, “A fast mode decision algorithm for intra prediction in AVS-M video coding”, ICWAPR '07, vol. 1, pp. 326-331, Nov. 2007.
[14] Y. Cheng et al., “Analysis and application of error concealment tools in AVS-M decoder”, Journal of Zhejiang University - Science A, vol. 7, pp. 54-58, Jan. 2006.
[15] S. Hu, X. Zhang and Z. Yang, “Efficient Implementation of Interpolation for AVS”, Congress on Image and Signal Processing 2008, vol. 3, pp. 133-138, May 27-30, 2008.
[16] R. Schafer and T. Sikora, “Digital video coding standards and their role in video communications”, Proc. of the IEEE, vol. 83, pp. 907-924, June 1995.
[17] B. Lei et al., “Optimization and Implementation of AVS-M Decoder on ARM”, DOI 10.1109/ICIG.2007.166, IEEE, 2007.
[18] L. Yu, “AVS Project and AVS-Video Techniques”, ISPACS 2005, Dec. 13, 2005, http://www-ee.uta.edu/dip/Courses/EE5351/ISPACSAVS.pdf
[19] Y. Wang, “AVS-M: From Standards to Applications”, Journal of Computer Science and Technology - Special Section on China AVS Standard, vol. 21, no. 3, pp. 332-344, May 2006.
[20] MSU video quality measurement tool, http://www.softrecipe.com/Download/msu_video_quality_measurement_tool.html
[21] MATLAB software, http://www.mathworks.com/products/matlab/tryit.html
[22] Test video sequences, http://trace.eas.asu.edu/yuv/
[23] Z. Wang et al., “Image quality assessment: From error visibility to structural similarity”, IEEE Trans. Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
[24] S. Kwon, A. Tamhankar and K.R. Rao, “Overview of H.264 / MPEG-4 Part 10”, J. Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.