HEVC-SIM: A simulation and analysis tool for H.265 packet...

HEVC-SIM:

A simulation and analysis tool for H.265 packet

transmission

By

Mohammad Manjur Rashed Khan

B.Sc., Bangladesh University of Engineering and Technology (BUET), 2003

Project report Submitted in Partial Fulfillment of the

Requirements for the Degree of

Master of Engineering

In the

School of Engineering Science

Faculty of Applied Science

Mohammad Manjur Rashed Khan 2015

SIMON FRASER UNIVERSITY

Fall 2015

ii

Approval

Name: Mohammad Manjur Rashed Khan

Degree: Master of Engineering

Title: HEVC-SIM: A simulation and analysis tool for H.265 packet transmission

Supervisory Committee: Chair: Dr. Jie Liang Professor

Dr. Ivan V. Bajić Senior Supervisor Associate Professor

Thomas Pang Supervisor Senior Manager Software Development Broadcom Canada.

Date Defended/Approved:

October 9, 2015

iii

Abstract

Video compression and decompression are essential parts of multimedia

communications. Lowering the bit rate while maintaining good video quality remains a

considerable challenge with increasing video resolutions and frame rates. In particular,

the introduction of 4K and 8K video resolution has created a significant demand for

efficient video compression appropriate for these high resolutions. The International

Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG) have

introduced High-Efficiency Video Coding (HEVC, also known as H.265) that has better

compression efficiency at the same or higher video quality, compared to earlier video

coding standards. In this project, a video coding and simulation tool was developed based

on open-source implementations of HEVC encoders and decoders. The tool, named

HEVC-SIM, can be used to encode raw video, analyze the packet structure and simulate

packet loss. In addition, two error concealment methods were implemented to help the

video decoder recover from packet loss. Simulations were carried out to assess the

effectiveness of the various algorithms implemented in HEVC-SIM.

iv

Dedication

I would like to dedicate this report to my wife Shifa and my daughter Aarisha. Aarisha was

born at the time I was doing this project. She came into our life with joy and happiness.

My wife Shifa provided me positive support to complete this project. And I am also grateful

to my parents for giving me lots of encouragement.

I also dedicate this work to Thomas pang who always see everything positively and always

tries to find good from it. This is a perfect example for me to improve people skill.

I also dedicate this work to my professor Ivan V. Bajić who helped me by giving clear

direction on this project and providing his suggestion and advice.

Finally I dedicate this work to all Mighty who creates every single thing for a purpose.

v

Acknowledgements

I would like to thank Dr. Ivan V. Bajić, Associate Professor of SFU, for supervising this

project and providing his guidance on improving it. He made me aware of the HM

reference code for HEVC, as well as x265, an efficient implementation of HEVC. He also

explained the 2-state Markov chain model, as well various approaches for packet loss

concealment. I would also like to thank Thomas Pang, Sr. Manager of Broadcom who

provided his valuable feedback and suggestion.

I would like to thank my wife Sayeeda Sharmin Shifa to co-operate and support me to

complete this project when she had to manage our little daughter Aarisha. She also

provided suggestion to write this report.

vi

Table of Contents

Approval .............................................................................................................................ii Abstract ............................................................................................................................. iii Dedication .........................................................................................................................iv Acknowledgements ........................................................................................................... v Table of Contents ..............................................................................................................vi List of Tables .................................................................................................................... vii List of Figures.................................................................................................................. viii List of Acronyms ................................................................................................................ix

Chapter 1. Introduction ............................................................................................... 1

Chapter 2. HEVC-SIM Software Description.............................................................. 7

Chapter 3. Result and Analysis ................................................................................ 19

Chapter 4. Conclusion .............................................................................................. 30

References .................................................................................................................. 31

vii

List of Tables

Table 1: The PSNR values in dB for the four frames in the example .............................. 25

Table 2: HM and x265 encoder execution time comparison ........................................... 29

viii

List of Figures

Figure 1: The YUV420 format, from [3] ............................................................................. 3

Figure 2: Typical HEVC video encoder (with decoder modeling elements shaded in light gray) from [1]. ................................................................................. 4

Figure 3: HEVC bitstream structure. ................................................................................. 6

Figure 4: HM encoder user interface ................................................................................. 8

Figure 5: x265 encoder user interface .............................................................................. 9

Figure 6: Packet loss simulation user interface ............................................................... 10

Figure 7: Gilbert-Elliott (2-state Markov) model, from [10] .............................................. 11

Figure 8: User-supplied packet loss file for packet loss simulation ................................. 12

Figure 9: Decoder user interface ..................................................................................... 12

Figure 10: Packet loss concealment and PSNR calculation ........................................... 13

Figure 11: Illustration of “CTU MV copy” concealment ................................................... 14

Figure 12: Motion vector file format ................................................................................. 15

Figure 13: Video preview window ................................................................................... 16

Figure 14: List window displays frame type (e.g., I or P) ................................................ 17

Figure 15: Output window shows the contents of VPS NAL unit .................................... 17

Figure 16: Output window shows the contents of SPS NAL unit .................................... 18

Figure 17: Output window shows the contents of PPS NAL unit .................................... 18

Figure 18: Output window shows decoding progress of slice NAL units ......................... 18

Figure 19: Frame 3 decoded without any packet loss. .................................................... 21

Figure 20: Frame 3 with 8 CTUs lost and no concealment ............................................. 22

Figure 21: Frame 3 with 8 CTUs missing and "CTU copy" concealment ........................ 23

Figure 22: Frame 3 with 8 CTUs lost and "CTU MV copy" concealment ........................ 24

Figure 23: PSNR (dB) vs. packet loss percentage for Foreman ..................................... 26

Figure 24: PSNR (dB) near 2% packet loss percentage for Foreman ............................ 27

Figure 25: PSNR (dB) vs. packet loss percentage for Flower Garden ............................ 28

ix

List of Acronyms

HEVC High Efficiency Video Coding

AVC Advanced Video Coding

CTU Coding Tree Unit

CTB Coding Tree Block

CU Coding Unit

CB Coding Block

PU Prediction Unit

PB Prediction Block

TU Transform Unit

TB Transform Block

AMVP Advance Motion Vector Prediction

MV Motion Vector

CABAC Context Adaptive Binary Arithmetic Coding

NAL Network Abstraction Layer

SEI Supplemental Enhancement Information

VPS Video Parameter Set

SPS Sequence Parameter Set

PPS Picture Parameter Set

VCL Video Coding Layer

PSNR Peak signal to noise ratio

RGB Red, Green and Blue

YUV Y is for Luminance and UV is chrominance

1

Chapter 1. Introduction

1.1 Background

Internet users have significantly increased their demand for video content.

YouTube and Netflix are some of the prime examples that demonstrate how people are

using the Internet to stream videos, thus adding to the Internet traffic. It is estimated [14]

that by 2018, video will account for 75% of the Internet traffic. Moreover, the 4K (Ultra-HD)

and 8K resolution video will become the broadcasting standards of the future. For these

reasons, there will be additional pressure to develop video codecs with higher coding

efficiency that deliver lower bit rates and higher video quality. As the current answer to this

increase in video demand, High Efficiency Video Coding (HEVC, also known as H.265)

comes as a new coding standard with 50% lower bit rate at the same video quality

compared to the earlier standard H.264 [1].

The HEVC standard has been designed to significantly reduce the bit rate for the

same video quality, especially at high resolutions. It can support video resolutions from

320240 to 81924320 (widthheight in pixels). The International Telecommunication

Union (ITU) and the Moving Picture Experts Group (MPEG) finalized this standard on

January 25, 2013. The scalable coding extension, multi-view extensions, and 3D

extensions have also been developed based on HEVC. The main coding tools of HEVC

are still based on Inter/Intra picture prediction, motion vector prediction and coding, spatial

transform, loop and deblocking filters, as its predecessor H.264. The H.264 uses fixed

sized macro block (MB) for each frame, whereas the HEVC uses variable size block which

is renamed as coding tree unit (CTU). This variable size coding tree unit helps to improve

coding efficiency by allowing the coding block to be tailored to video content.

2

The primary objective of this project is to develop a HEVC simulation application

based on the HEVC open source code. This involves understanding the different parts of

the HEVC standard, its various coding tools, video formats and source code. An additional

goal was to implement packet loss concealment, which is not part of the standard. The

developed application is referred to as HEVC-SIM.

HEVC-SIM is a simulator application built upon the HM reference software for

HEVC encoding and decoding, as well as an efficient HEVC encoder implementation

called x265. The HM encoder/decoder is relatively slow since it was developed primarily

for research purposes and standardization record-keeping, and does not involve any

hardware acceleration. On the other hand, the x265 encoder uses x86 hardware-

accelerated encoding. HEVC-SIM enables users who are not familiar with video coding in

general, and HEVC in particular, to easily encode raw video, analyze the bitstream, and

then simulate packet loss. Several packet loss concealment methods are implemented so

that user can investigate their performance on various video sequences.

1.2 Video format

The input of the HEVC encoder is a sequence of raw video frames. In video coding

applications, the input video color space is usually YUV, where Y denotes the luminance

component, while U and V are chrominance components. Conversion formulae between

RGB and YUV color spaces are shown below [15].

From RGB to YUV:

Y 0.299R 0.587G 0.114B

U 0.492 B Y

V 0.877 R Y

(1)

This can also be represented as:

3

Y 0.299R 0.587G 0.114B

U 0.147R 0.289G 0.436B

V 0.615R 0.515G 0.100B

(2)

From YUV to RGB:

R Y 1.140V

G Y 0.395U 0.581V

B Y 2.032U

(3)

Depending on the sampling, the YUV format comes in several flavors such as

YUV444, YUV422 and YUV420. The default input format for the main profile of HEVC is

YUV420. This is indicated by the value 1 in the chroma_format_idc field of the Sequence

Parameter Set (SPS) packet. The resolution of U and V components in the YUV420 format

is one quarter of the Y component resolution, half vertically and half horizontally. The

following figure from [3] illustrates the YUV420 format.

Figure 1: The YUV420 format, from [3]

4

1.3 High Efficiency Video Coding (HEVC)

The HEVC video coding layer has a number of processing blocks, as depicted in

Figure 2 below. The raw video frame is split into blocks, and each block region is passed

through various elements of the encoder. The first picture is encoded using intra-picture

prediction, and is referred to as I-Frame. The remaining pictures in the same group can

be encoded using inter-picture prediction, where prediction can be causal (P-Frames) or

bi-directional (B-Frames). The inter-picture prediction employs selected reference pictures

and motion vectors (MVs) that indicate the motion of various blocks.

Figure 2: Typical HEVC video encoder (with decoder modeling elements shaded in

light gray) from [1].

The following is some important terminology of HEVC.

5

Coding tree unit (CTU) and coding tree block (CTB): The CTU is the basic block

of pixels, similar to the macroblock in H264. The frame is divided into CTUs of

size 6464, 3232 or 1616. The CTU has three Coding tree blocks (CTBs) -

one for luminance and two for chrominance components.

Coding unit (CU) and coding block (CB): CTU is divided into one luminance

coding block (CB) and two chrominance CBs, each formed as a quad tree.

These CBs together are called coding unit (CU).

Prediction unit (PU) and prediction block (PB): The decision whether inter-

picture or intra-picture prediction will be used is decided at the CU level. The

CU is partitioned into prediction units (PUs). The CBs are split prediction blocks

(PBs) for luminance and chrominance components. The size of PB varies from

6464 to 44.

Transform unit (TU) and transform block (TB): A transform unit (TU) tree has

root at the CU level and it performs a spatial transform. The luminance and

chrominance part of the TU are called transform blocks (TBs). The TB size can

be 44, 88, 1616, or 3232.

Motion vector signaling: The new algorithm called Advanced Motion Vector

Prediction (AMVP) is used on nearest PBs for MV prediction.

Entropy coding: The entropy coding algorithm used in HEVC is Context

Adaptive Binary Arithmetic Coding (CABAC), similar to H.264.

In-loop deblocking filter: The in-loop deblocking filter is used within the inter-

picture prediction loop.

The NAL units in the HEVC bitstream are separated by certain reserved bit

sequences, specifically 0x000001 (3 Bytes) or 0x00000001 (4 Bytes). The bitstream

contains various parameters, organized into parameter sets, and the encoded pixel values

organized into slices. The structure of the bitstream is illustrated in Figure 3.

6

Figure 3: HEVC bitstream structure.

The parameter sets are as follows:

Video parameter set (VPS) contains the information of various layers, for

example in scalable or 3D video coding extensions.

Sequence parameter set (SPS) contains the information of temporal sub-

layers, color bit depth, coding block size, and so on.

Picture parameter set (PPS) contains the information of the initial quantization

parameter (QP), resolution, number of tiles, and so on.

Slice header contains the frame index (a.k.a., picture order count), the address

of the first CTU, and so on.

These parameters enable the HEVC decoder to interpret and decode the video

bitstream.

VPS SPS PPS Slice header Slice data

7

Chapter 2. HEVC-SIM Software Description

The HEVC-SIM software application was built upon open-software

implementations of HEVC encoder and decoder. For the encoder, the source code was

taken from HM 16.3 [4] and the API was taken from the x265 project [6]. The decoder

source code was taken from HM 16.3. The purpose of HEVC-SIM is to allow the user to

explore various HEVC encoding options and compare the speed of the HM and the x265

video encoders. In addition, packet loss concealment was implemented to allow one to

study the effects of packet loss on the decoded video quality. The remainder of this chapter

describes the various features of HEVC-SIM.

2.1 HM encoder

The HM encoder is used to compress YUV420 video. The user interface was

developed for the HM encoder to allow the user to control the following.

Video resolution (width and height)

Target bitrate

Quantization profile for quantization parameters

Intra period of I-frames

Number of frames to be encoded

Maximum Coding Unit size

Maximum depth of Coding Unit

Slice size determined by CTU, bytes, or tiles.

Various options for motion estimation, such as full search, diamond, or

PMVFAST using a search range.

An illustration of the user interface for the HM encoder is given in Figure 4.

8

Figure 4: HM encoder user interface

2.2 x265 encoder

The x265 encoder is considerably faster than the HM encoder on x86 PCs,

because it uses hardware acceleration for heavy computations. The user interface

developed for the x265 encoder is illustrated in Figure 5 and allows the user to control the

following encoding options:

Video resolution (width and height)

Target bitrate

Quantization profile for quantization parameters

Number of frames to be encoded

9

Figure 5: x265 encoder user interface

2.3 Packet loss simulation

In practice, the encoded video is transmitted to the user through an imperfect

channel, often through a packet-switched network, where packets are prone to errors and

loss. To allow one to study the effects of packet loss on the decoded video, HEVC-SIM

provides packet loss simulation to emulate the channel, as well as packet loss

concealment for the video decoder.

To simulate packet loss, HEVC-SIM takes an encoded HEVC video file as input,

removes some slices from it, and writes the remaining slices into another file. The output

file has some slices missing, compared to the input file, which simulates the effects of

packet loss. The removal of the slices can be done with one of the following two options.

Randomly, using a Gilbert-Elliott [12] (2-state Markov) packet loss model.

With a user-supplied list of packets.

The user interface of the packet loss simulation module is shown in Figure 6 below.

10

Figure 6: Packet loss simulation user interface

The Gilbert-Elliott model is a 2-state Markov model, often used to simulate bursty

errors or losses [10]. It is shown in Figure 7. The two states are termed good (G) state and

bad (B) state. Transition from good to bad state implies packet loss and staying in the bad

state means the same. Transition from bad to good state implies no packet loss and

staying in the good state means the same.

If is the probability of transition from good to bad state, and is the probability of

transition from bad to good state, then the state transition matrix is given by

11

((4)

11

Figure 7: Gilbert-Elliott (2-state Markov) model, from [10]

Probabilities and can be set in the user interface (Figure 6). It is often useful to

translate the model's probabilities and to the parameters of a burst packet loss process.

Let P be the probability of packet loss and L be the average burst length. Then, following

[11], these can be related to the probabilities and as follows:

P (5)

and

L1 (6)

To simulate packet loss caused by a different, possibly more complicated loss

process, the user can also specify which slices/packets to drop. This can be accomplished

by providing packet indices (one per line) in a text file called pkt_drop_index.txt in

the same directory where the HEVC-SIM executable is located. When presented with such

a file, the simulator will read packet indices from the file and remove the corresponding

slices from the bitstream. An illustration of the user-supplied packet loss file is shown in

Figure 8.

G B

1 1

12

Figure 8: User-supplied packet loss file for packet loss simulation

2.4 HM decoder

The HEVC decoder for the HEVC-SIM application was built upon the HM decoder

[4]. In its basic form, it takes a loss-free HEVC file, decodes it, and generates a YUV420

formatted file that can be played using a YUV Player. The corresponding user interface is

shown in Figure 9. The more challenging case, when the encoded file has been subject

to packet loss, is described in the following section.

Figure 9: Decoder user interface

2.5 Packet loss concealment

13

The user interface for the packet loss concealment module is illustrated in Figure

10. Several options have been provided for decoding the video files that have been subject

to packet loss:

No concealment ("No copy" in the user interface)

Previous CTU repeat ("CTU copy" in the user interface)

CTU repeat with motion vector ("CTU MV copy" in the user interface)

Figure 10: Packet loss concealment and PSNR calculation

The “no concealment” option means that no attempt is made to reconstruct the

pixel values in the missing packets/slices. These pixel values are simply set to zero, which

will result in them being displayed as green color. The “previous CTU repeat” option means

that the pixel values in the lost part of the frame are recovered by copying them from the

same position in the previous frame. The “CTU repeat with motion vector” option means

that the pixel values in the lost part of the frame are recovered by copying the equal region

of CTU size that is offset from the current position by a suitably chosen MV. In the current

implementation, the MV is simply taken from the nearest available CTU in the vertical

direction, but other options could be examined in the future. Figure 11 shows how previous

CTU with MV is copied to the current frame. Here, and are the coordinates of

the upper-left corner of a CTU while and are the motion vector components.

14

Figure 11: Illustration of “CTU MV copy” concealment

After choosing the concealment option, user can decode the video file either using

the decoder tab or using the analyzer. Also, the Peak-Signal-to-Noise-Ratio (PSNR, see

Chapter 3) of the decoded video can be calculated.

2.6 Motion vector output

Current frame

Orange color represents lost CTUs

Arrow shows the copy CTU from previous frame

Previous frame

Coding Tree Unit

The MV is taken from the nearest available CTU in the vertical direction.

15

Motion vectors from the compressed bitstream can be useful in a number of video

processing tasks, such as global motion estimation [16], motion segmentation [17],

tracking [18], visual saliency estimation [19, 20], and so on. HEVC-SIM allows the motion

vector of each PU to be saved into the MotionVector.txt file in the output directory.

The format of the file is given in Figure 12 below. parameter m_ctuRsAddr shows the

CTU index, uiNumPartition shows the index of the partition within each CTU, and POC

is the picture order count. The vector format is , , where represent the

horizontal component and is the vertical component.

Figure 12: Motion vector file format

2.7 Preview using OpenGL

The video can be displayed frame by frame on the preview window within HEVC-

SIM's user interface, shown in Figure 13 below. This helps to visualize the effects of

compression, and possible packet loss and error concealment. From this window it is also

possible to initiate stream analysis that displays information about various parameter sets

such as VPS, PPS, SPS, as described next.

TComCUMvField: m_ctuRsAddr=9, m_uiNumPartition=256, POC=1

nTComCUMvField: ref pic number=0

TComCUMvField:dumpMV m_uiNumPartition=256:

(-2,3) (-2,3 )(-2,3)…

16

Figure 13: Video preview window

2.8 NAL unit description

The HEVC-SIM can parse VPS, SPS and PPS NAL units and display the content

of their various fields on the output window. It also displays the first and last CTU index

for each slice or frame. Figures 15 – 18 show the information from these NAL units.

17

Figure 14: List window displays frame type (e.g., I or P)

Figure 15: Output window shows the contents of VPS NAL unit

18

Figure 16: Output window shows the contents of SPS NAL unit

Figure 17: Output window shows the contents of PPS NAL unit

Figure 18: Output window shows decoding progress of slice NAL units

19

Chapter 3. Result and Analysis

This section presents the results and analysis of packet loss concealment in

HEVC-SIM. The tests were done on Intel Core i5 CPU running Windows 7. The packet

loss is introduced into the bitstream by the user-supplied list of lost packets, or the Gilbert-

Elliot packet loss model. The concealment is performed using one of the options described

in Section 2.5.

Peak Signal-to-Noise Ratio (PSNR) is commonly used to measure the objective

quality of video. It is calculated based on the mean square error (MSE). The definition of

PSNR in dB is given in equation (8). The and are the width and height of the frame,

respectively, is original pixel value, and is the reconstructed pixel value, where the

reconstruction is accomplished by decoding and/or error concealment. MAX is the

maximum possible pixel value, which is 255 for 8-bit images.

MSE 1

, ,

((7)

PSNR 10 ∙ logMAXMSE

((8)

If the decoded and original pixel values ( and ) are the same for each pixel, the

MSE will be zero and the PSNR will be infinite – this would indicate a perfect match

between the original and decoded video. Lower PSNR values indicate degradation, due

20

to quantization and possibly packer loss. The analysis is done by comparing the PSNR of

decoded video in different scenarios, for example without packet loss, with packet loss but

no concealment, with “CTU copy” concealment and “CTU MV copy” concealment.

3.1 Visual quality

The next test is performed on a CIF-resolution (352 288 Foreman sequence,

with frame rate of 30 frames per second (fps), encoded at 800 kbps. Four figures are

shown below to illustrate the test. In Figure 19, decoded frame 3 is shown without any

packet loss. Figure 20 shows the same frame, decoded without 8 CTUs, which have been

assumed lost. No concealment has been applied, so the pixels belonging to the lost CTUs

are shown as green.

Figure 21 shows the same frame, but with "CTU copy" applied to recover pixel

values in the missing CTUs. Finally, Figure 22 shows the same frame with "CTU MV copy"

concealment applied, where the MV is taken from the nearest available vertically

neighbouring CTU. In this particular example, due to low motion intensity in the sequence,

both concealment methods produce visually good results. It is hard to distinguish the

concealed frames (Figures 21-22) from the error-free frame in Figure 19.

21

Figure 19: Frame 3 decoded without any packet loss.

22

Figure 20: Frame 3 with 8 CTUs lost and no concealment

23

Figure 21: Frame 3 with 8 CTUs missing and "CTU copy" concealment

24

Figure 22: Frame 3 with 8 CTUs lost and "CTU MV copy" concealment

3.2 PSNR results

The following table shows the PSNR values of video frames from the example

above. Frame 3 suffers from packet loss, which is evident in the table as the PSNR of this

frame is much lower than that of frame 2. Note that concealment (both with and without

MV) gives much better PSNR than no concealment. Also note that frame 4, which did not

suffer from packet loss, still has considerably lower PSNR than frames 1 and 2. This effect

is known as error propagation and is caused by predictive encoding of frames using earlier

frames as the reference.

25

Frame Loss-free No concealment CTU copy CTU MV copy

1 49.60 49.60 49.60 49.60

2 42.62 42.62 42.62 42.62

3 38.20 9.94 31.34 32.62

4 40.07 11.03 31.20 33.69

Table 1: The PSNR values in dB for the four frames in the example

To get a better idea of the average behavior of the error concealment schemes,

Gilbert-Elliott packet loss model was applied to two video sequences, Foreman and Flower

Garden, both with CIF resolution at 30 fps. They were encoded at 800 kbps using the

IPPP… group of pictures (GOP) structure, with the first frame being intra-coded and all

subsequent frames being inter-coded as P-frames. The packet loss probability P was

taken in the range from 0 to 0.2 and the average burst length L was taken to be 5 packets.

Figures 23 and 25 show the average PSNR vs. packet loss percentage for these two

sequences. Error bars along the PSNR curves indicate the standard deviation of PSNR

for each packet loss percentage. This result was obtained by running 8 packet loss

realizations over 300 frames of each sequence.

The graphs show that PSNR is reducing as the packet los percentage increases,

as expected. It is also seen that concealment can give about 5-10 dB advantage in PSNR

over no concealment. Figure 23 shows the PSNR of CTU copy and CTU MV copy are

very close. Figure 24 zooms into the region near 2% packet loss, where it is seen that on

average, CTU MV copy has a 0.1 dB advantage over CTU copy. This indicates that a

simple copying of the MV from the nearest available vertically neighboring CTU is not a

particularly successful strategy for exploiting motion information for error concealment,

and that other strategies should be explored to further improve the performance. Another

factor that brings the PSNR down is the IPPP… GOP structure. In practice, the system

would likely need to refresh the coding loop with an I-frame at regular intervals, or when

packet loss is detected.

26

Figure 23: PSNR (dB) vs. packet loss percentage for Foreman

Packet loss in percent0 5 10 15 20 25

PS

NR

0

5

10

15

20

25

30

35

40

45Packet loss rate vs PSNR (foreman cif.yuv)

no errorNo concealmentCTU copyCTU copy with MV

27

Figure 24: PSNR (dB) near 2% packet loss percentage for Foreman

Figure 25 shows the corresponding PSNR vs. packet loss percentage curves for

Flower Garden. Similar conclusions can be drawn from this figure – concealment provides

significant improvement compared to no concealment, while CTU copy and CTU MV copy

offer similar performance.

1.9 1.95 2 2.05 2.1 2.15 2.2 2.25

35.8

36

36.2

36.4

36.6

36.8

Packet loss in percent

PS

NR

Packet loss rate vs PSNR (foreman cif.yuv)

no error

No concealmentCTU copy

CTU copy with MV

28

Figure 25: PSNR (dB) vs. packet loss percentage for Flower Garden

3.3 Encoder complexity

HM encoder is rather slow, as it does not have hardware acceleration. On the other

hand, x265 incorporates x86 hardware acceleration, which makes it much faster than the

HM encoder. Below is the side by side comparison between the HM encoder and the x265

encoder for encoding 10 frames (YUV 4:2:0, CIF resolution) of three popular video

sequences. The target bitrate was 800 kbps for all three sequences.

Packet loss in percent0 5 10 15 20 25

PS

NR

0

5

10

15

20

25

30Packet loss rate vs PSNR (flower cif.yuv)

no errorNo concealmentCTU copyCTU copy with MV

29

Sequence HM Encoder (seconds) x265 Encoder (seconds)

Foreman 926.466 0.811

Mobile Calendar 717.088 1.168

News 758.475 0.729

Table 2: HM and x265 encoder execution time comparison

From the above table, we see the encoding does not take exactly same time for

each frame - it varies depending on the motion and texture complexity of the sequence.

The results show that the x265 encoder is roughly three orders of magnitude faster than

the HM encoder, which is expected given its optimized implementation.

30

Chapter 4. Conclusion

HEVC-SIM is a software tool that can help to understand and explore HEVC

encoder and decoder capabilities. Its graphical interface allows users to easily change

various encoder parameters and analyze the encoded bitstream. The software can

emulate packet loss on the bitstream using the Gilbert-Elliott channel model and save the

results in an output file that can later be used in testing network-based video bitstream

transmission. Further, the software is able to accept any user-supplied packet loss

realization, which can come from more sophisticated channel models or even real

measured packet traces.

Beyond integrating open source HEVC encoders and decoders under a common

user interface, two procedures for concealing the effects of packet loss are implemented

in HEVC-SIM. Both are based on utilizing the available CTU data to estimate pixel values

in lost CTUs. The two concealment methods were compared on several test sequences,

and the results showed that they can bring about 5-10 dB gain in PSNR compared to no

concealment. Both of these methods represent simple temporal concealment. In future

work, more sophisticated concealment methods could be implemented in HEVC-SIM. For

example, smaller units could be employed in the concealment, which could allow better

handling of objects and structures that are smaller than CTU. Also, rather than just using

an existing MV from a neighboring unit, a MV predictor could be employed to estimate the

motion of the lost CTU. Finally, spatio-temporal concealment may be able to provide

further improvement.

In other experiments, execution time of HM and x265 encoders was compared and

it was found that x265 is about three orders of magnitude faster. Overall, HEVC-SIM is an

excellent learning tool for HEVC, allowing the users to test and experience various HEVC

features.

31

References

[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Trans Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012..

[2] Wikipedia contributors. High Efficiency Video Coding, [Online] Available: http://en.wikipedia.org/wiki/High_Efficiency_Video_Coding, 6 Mar, 2015.

[3] V. Sze and M. Budagavi, Design and Implementation of Next Generation Video Coding Systems (H.265/HEVC Tutorial), ISCAS Tutorial 2014, http://www.rle.mit.edu/eems/wp-content/uploads/2014/06/H.265-HEVC-Tutorial-2014-ISCAS.pdf.

[4] HM1.6 source code, https://hevc.hhi.fraunhofer.de/trac/hevc/browser/tags/HM-16.3, Jan, 2015.

[5] H. Hadizadeh, I. V. Bajić, and G. Cheung, “Video error concealment using a computation-efficient low saliency prior,” IEEE Trans. Multimedia, vol. 15, no. 8, pp. 2099-2113, Dec. 2013.

[6] x265 source code, https://bitbucket.org/multicoreware/x265, Apr, 2014.

[7] PSNR source code, https://code.google.com/p/webm/source/browse/src/psnr.c?repo=vpx-codec-comparison, Mar. 2015.

[8] Wikipedia contributors. Peak signal-to-noise ratio, [Online] Available: http://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio, Accessed Mar. 10, 2015

[9] OpenGL source code, http://trac.unfix.net/browser/yuvplayer/win/yuvplayer, Feb. 2014.

[10] G. Hasslinger and O. Hohlfeld, “The Gilbert-Elliott model for packet loss in real time services on the internet,” in Measuring, Modelling and Evaluation of Computer and Communication Systems (MMB), 2008 14th GI/ITG Conference -, 31 2008-April 2, pp. 1–15.

[11] K. Stuhlmüller, N. Farber, M. Link, and B. Girod, "Analysis of video transmission over lossy channels," IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 1012-32, June 2000.

[12] H. Hadizadeh and I. V. Bajić, "Burst loss resilient packetization of video," IEEE Trans. Image Processing, vol. 20, no. 11, pp. 3195-3206, Nov. 2011.

32

[13] H. Hadizadeh and I. V. Bajić, "NAL-SIM: An interactive simulator for H.264/AVC video coding and transmission," Proc. IEEE CCNC'10, Las Vegas, NV, Jan. 2010

[14] Cisco Forecast highlights, [Online] Available: http://www.cisco.com/web/solutions/sp/vni/vni_forecast_highlights/index.html, Accessed May 12, 2015

[15] Definition of YUV and RBG, [Online] Available: http://www.pcmag.com/encyclopedia/term/55166/yuv-rgb-conversion-formulas, Accessed Mar 14, 2015

[16] Y.-M. Chen and I. V. Bajić, "Motion vector outlier rejection cascade for global motion estimation," IEEE Signal Processing Letters, vol. 17, no. 2, pp. 197-200, Feb. 2010.

[17] Y.-M. Chen and I. V. Bajić, "A joint approach to global motion estimation and motion segmentation from a coarsely sampled motion vector field," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 9, pp. 1316-1328, Sep. 2011.

[18] S. H. Khatoonabadi and I. V. Bajić, "Video object tracking in the compressed domain using spatio-temporal Markov random fields," IEEE Trans. Image Processing, vol. 22, no. 1, pp. 300-313, Jan. 2013.

[19] S. H. Khatoonabadi, N. Vasconcelos, I. V. Bajić, and Y. Shan, “How many bits does it take for a stimulus to be salient?,” Proc. IEEE CVPR’15, pp. 5501-5510, Boston, MA, Jun. 2015.

[20] S. H. Khatoonabadi, I. V. Bajić, and Y. Shan, “Compressed-domain correlates of human fixations in dynamic scenes,” Multimedia Tools and Applications, 2015. http://dx.doi.org/10.1007/s11042-015-2802-3

HEVC-SIM: A simulation and analysis tool for H.265 packet...

Documents

Transcript of HEVC-SIM: A simulation and analysis tool for H.265 packet...