H.265/HEVC video transmission over 4G cellular networks


H.265/HEVC video transmission over 4G cellular networks

by

Aman Jassal

Dipl. Ing., École Supérieure d’Ingénieurs en Informatique et Génie des Télécommunications, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

The Faculty of Graduate and Postdoctoral Studies

(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

January 2016

© Aman Jassal 2016


Abstract

Long Term Evolution has been standardized by the 3GPP consortium since

2008, with 3GPP Release 12 being the latest iteration of LTE Advanced,

which was finalized in March 2015. High Efficiency Video Coding has been

standardized by the Moving Picture Experts Group since 2012 and is the

video compression technology targeted to deliver High-Definition video con-

tent to users. With video traffic projected to represent the lion’s share of

mobile data traffic in the next few years, providing video and non-video

users with high Quality of Experience is key to designing 4G systems and

future 5G systems.

In this thesis, we present a cross-layer scheduling framework which de-

livers video content to video users by exploiting encoding features used by

the High Efficiency Video Coding standard such as coding structures and

motion compensated prediction. We determine which frames are referenced

the most within the coded video bitstream in order to identify which frames have

higher utility for the High Efficiency Video Coding decoder located at the

user’s device, and evaluate the performance of best effort and video users

in 4G networks using finite buffer traffic models. We look into throughput

performance for best effort users and packet loss performance for video users

to assess Quality of Experience. Our results demonstrate that there is sig-


nificant potential to improve the Quality of Experience of best effort and

video users using our proposed Frame Reference Aware Proportional Fair

scheme compared to the baseline Proportional Fair scheme.


Preface

I hereby declare that I am the author of this thesis. This thesis is an original,

unpublished work under the supervision of Dr. Cyril Leung. In this work,

I played the primary role in designing and performing the research, doing

data analysis and preparing the manuscript under the supervision of Dr.

Cyril Leung.


Table of Contents

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv

Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii

List of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii

Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii

Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 Basics of H.265/HEVC . . . . . . . . . . . . . . . . . . . . . . 4

2.1 Syntax Structures and Syntax Elements . . . . . . . . . . . . 4

2.2 Coding Structures and Reference Picture Lists . . . . . . . . 7

2.2.1 Coding Structures . . . . . . . . . . . . . . . . . . . . 8

2.2.2 Reference Picture Lists . . . . . . . . . . . . . . . . . 10


2.3 Motion Compensated Prediction . . . . . . . . . . . . . . . . 13

2.4 Operation with Networking Layers . . . . . . . . . . . . . . . 15

3 Cross-Layer Frame Reference Aware Scheduling Framework 18

3.1 Mathematical Formulation of the Shared Resource Allocation

Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.2 Solution to the proposed Shared Resource Allocation Problem 25

4 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.1 H.265/HEVC Video Content Generation . . . . . . . . . . . 28

4.2 LTE-Advanced System Model . . . . . . . . . . . . . . . . . 30

4.2.1 Network Model . . . . . . . . . . . . . . . . . . . . . 30

4.2.2 Traffic Model . . . . . . . . . . . . . . . . . . . . . . . 34

4.2.3 Channel Model . . . . . . . . . . . . . . . . . . . . . 35

4.2.4 Feedback Model . . . . . . . . . . . . . . . . . . . . . 40

5 Simulation Results and Analysis . . . . . . . . . . . . . . . . 42

5.1 Simulation Assumptions . . . . . . . . . . . . . . . . . . . . . 43

5.2 Simulation Results and Discussion . . . . . . . . . . . . . . . 48

5.2.1 Results for video users . . . . . . . . . . . . . . . . . 49

5.2.2 Results for Best Effort users . . . . . . . . . . . . . . 54

6 Conclusions and Future Work . . . . . . . . . . . . . . . . . . 60

6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 60

6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64


List of Tables

2.1 Generic NAL unit syntax, adapted from [3] . . . . . . . . . . 5

2.2 Reference Picture Sets for the Hierarchical-B Coding Struc-

ture of GOP-size 8 . . . . . . . . . . . . . . . . . . . . . . . . 11

2.3 Reference Picture Lists for the Hierarchical-B Coding Struc-

ture of GOP-size 8 . . . . . . . . . . . . . . . . . . . . . . . . 13

4.1 H.265/HEVC Table of Video Test Sequences . . . . . . . . . 28

4.2 H.265/HEVC Parameters . . . . . . . . . . . . . . . . . . . . 30

4.3 FTP Traffic Model 1 . . . . . . . . . . . . . . . . . . . . . . . 33

4.4 H.265/HEVC Traffic Model . . . . . . . . . . . . . . . . . . . 35

5.1 LTE-Advanced Parameters . . . . . . . . . . . . . . . . . . . 46

5.2 Offered Load and corresponding Resource Utilization . . . . . 49


List of Figures

2.1 Frame dependencies in the reference coding structure. . . . . 9

2.2 Uni- and bi-predictive inter-prediction illustration from adja-

cent pictures, adapted from [4] . . . . . . . . . . . . . . . . . 14

2.3 RTP Single NAL unit packet structure . . . . . . . . . . . . . 16

2.4 H.265/HEVC system layer stack . . . . . . . . . . . . . . . . 17

4.1 Hexagonal Network Grid Layout . . . . . . . . . . . . . . . . 31

4.2 Wrap Around of Hexagonal Network . . . . . . . . . . . . . . 32

4.3 LTE Downlink PRB allocation illustration . . . . . . . . . . . 33

5.1 Video users’ active download time . . . . . . . . . . . . . . . 50

5.2 Satisfied Video User Percentage . . . . . . . . . . . . . . . . . 51

5.3 CRA LDU Loss Ratio . . . . . . . . . . . . . . . . . . . . . . 53

5.4 Average throughput for Best Effort users . . . . . . . . . . . 55

5.5 Coverage throughput for Best Effort users . . . . . . . . . . . 56

5.6 Illustration of the outer 10% of the coverage area . . . . . . . 57

5.7 Average BE user throughput in Cell-Edge region . . . . . . . 58


List of Acronyms

3GPP Third Generation Partnership Project.

ADT Active Download Time.

BE Best Effort.

CB Coding Block.

CDF Cumulative Distribution Function.

CQI Channel Quality Indicator.

CSI Channel State Information.

CVS Coded Video Sequence.

DASH Dynamic Adaptive Streaming over HTTP.

EESM Exponential Effective SNR Mapping.

FDD Frequency Division Duplex.

GOP Group of Pictures.


H.264/AVC Advanced Video Coding.

H.265/HEVC High Efficiency Video Coding.

HTTP Hypertext Transfer Protocol.

IETF Internet Engineering Task Force.

IP Internet Protocol.

ITU-R International Telecommunications Union Radiocommunications Sec-

tor.

JCT-VC Joint Collaborative Team on Video Coding.

KPIs Key Performance Indicators.

LDU Logical Data Unit.

LTE Long Term Evolution.

LTE-A LTE Advanced.

MANE Media Aware Network Element.

MIESM Mutual Information Effective SNR Metric.

MIMO Multiple Input Multiple Output.

MOS Mean Opinion Score.

MPEG Moving Picture Experts Group.


MU-MIMO Multi User Multiple Input Multiple Output.

NAL Network Abstraction Layer.

NGMN Next Generation Mobile Networks.

OFDMA Orthogonal Frequency Division Multiple Access.

OSI Open Systems Interconnection.

PB Prediction Block.

PLR Packet Loss Ratio.

PMI Precoding Matrix Indicator.

POC Picture Order Count.

PRB Physical Resource Block.

QAM Quadrature Amplitude Modulation.

QoE Quality of Experience.

QoS Quality of Service.

QPSK Quaternary Phase Shift Keying.

RBSP Raw Byte Sequence Payload.

RI Rank Indication.

RTP Real-time Transport Protocol.


RU Resource Utilization.

SINR Signal to Interference and Noise Ratio.

SNR Signal to Noise Ratio.

SRST Single RTP stream on a single media transport.

SU-MIMO Single User Multiple Input Multiple Output.

TCP Transmission Control Protocol.

UDP User Datagram Protocol.

UMTS Universal Mobile Telecommunications System.

VCL Video Coding Layer.

Wi-Fi Wireless Fidelity.


Acknowledgements

I would like to take this opportunity to express my utmost gratitude and

sincerest thanks to my supervisor, Dr. Cyril Leung, who has given me great

support, encouragement and guidance throughout my work and my M.A.Sc

program. My discussions with him were a constant source of inspiration

and his insights helped make this research work more valuable. Without his

invaluable knowledge and understanding in this research area, this thesis

would have never been possible.

I would also like to thank Dr. Ahmed Saadani for his guidance and

support throughout my engineering program and at Orange Labs where he

gave me the opportunity to do research work on 4G systems. My former

colleagues, Mr. Sebastien Jeux and Dr. Sofia Martinez Lopez, and more

generally all the research community involved in research and standardiza-

tion with the 3GPP, have had a great influence on me and without their

inspiration I would have never undertaken my program at the University of

British Columbia.

All of the work that has been done in this thesis was supported in part by

the Natural Sciences and Engineering Research Council (NSERC) of Canada

under Grant RGPIN 1731-2013.


Dedication

To my parents and my sister


Chapter 1

Introduction

With the emergence of Long Term Evolution (LTE) and its subsequent it-

erations standardized by the Third Generation Partnership Project (3GPP)

consortium, video services are fast becoming the dominant data services

in 4G mobile networks and mobile video traffic is projected to account for

72% of the total mobile data traffic by 2019 [1]. The transmission of video

services over cellular networks is challenging due to the large bandwidth

requirement, the low latency required due to protocol stack inter-operation

and the effect of error propagation within the video sequence in the event

of packet losses. The current dominant standard for video coding is Ad-

vanced Video Coding (H.264/AVC) [2], which is used to deliver a wide range

of video services. However, H.264/AVC requires extremely high bandwidth,

making the delivery of High-Definition (HD) video services impractical. Its

successor, High Efficiency Video Coding (H.265/HEVC) [3], was standard-

ized by the Moving Picture Experts Group (MPEG) in 2012 and is expected

to reduce the bit rate compared to H.264 High Profile by about 50% while

maintaining comparable subjective quality [4]. Therefore H.265/HEVC is a

more practical choice for delivering HD and Ultra High-Definition (UHD)

video content to consumers using wired and wireless networks.


As we move towards 5G, one of the key targets that we need to achieve

is to provide a more consistent user experience across the whole network

as well as higher Quality of Experience (QoE) [5]. Cross-layer QoE-aware

resource allocation schemes have been proposed for Orthogonal Frequency

Division Multiple Access (OFDMA) systems [6], where the scheduling al-

gorithm uses the Mean Opinion Score (MOS) as a way to provide QoE.

Other attributes that the research community has been focusing on in order

to improve the QoE of video users are the playback buffer status and the

rebuffering time [7]-[8]. One of the limitations in these works is the reliance

on video traces that were generated for low-definition video sequences en-

coded using H.264/AVC, which are not representative of the targets that

5G networks are supposed to satisfy, namely delivering HD or UHD video services anywhere, anytime. Other works have considered

H.265/HEVC video streaming over Wi-Fi wireless networks and shown that

the QoE of video sequences, reflected through the use of MOS, is very sen-

sitive to network impairments such as packet losses. Nightingale et al. [9]

assumed that packet losses are random; however in cellular networks this

assumption is rarely valid as the combination of traffic load, the characteris-

tics of the video sequence and the individual user’s link quality will dictate

the overall performance that can be achieved.

In this thesis, we focus on the use-case of H.265/HEVC video trans-

mission over 4G networks. Existing works have not used the compression

properties of H.265/HEVC, specifically in terms of exploiting the tempo-

ral inter-dependence between frames within coding structures, or evaluated

how well video services can be delivered in 4G/beyond-4G networks with


dynamic user arrivals. We use performance evaluation methodologies which

use Key Performance Indicators (KPIs) that have been recommended by the

Next Generation Mobile Networks (NGMN) Alliance for 5G networks [5].

The main novel contributions of this thesis are as follows:

1. The definition of a cross-layer scheduling framework exploiting frame

referencing to deliver video content

2. The evaluation of capacity for the delivery of H.265/HEVC video ser-

vices over beyond-4G networks

3. The joint-assessment of the QoE of video users and Best Effort users

The remainder of this thesis is organized as follows. Chapter 2 out-

lines the basics of the H.265/HEVC standard that are relevant to this work.

Chapter 3 presents the proposed cross-layer scheduling framework for video

content transmission. The simulation model is presented in Chapter 4. Sim-

ulation results, analysis and discussions are provided in Chapter 5. Conclu-

sions and future work are presented in Chapter 6.


Chapter 2

Basics of H.265/HEVC

In this chapter, we describe the features of the H.265/HEVC standard that

are directly relevant to this thesis and to the problem formulation that

will be presented and developed in Chapter 3. Specifically, we present the

high-level syntax used to represent the video data, the motion prediction

techniques used for video compression and the coding structures and refer-

ence picture lists used to perform the motion compensated prediction task in

H.265/HEVC [3]. The main point to understand is that the encoder knows

about the specifics of the coding structure and has to provide the decoder
with the information needed to reconstitute it. This is done by using

a given coding order (which is implicitly embedded in the way LDUs are

ordered) and through using Reference Picture Sets and Reference Picture

Lists (the former are explicitly transmitted and the latter are derived dur-

ing the decoding process). In this chapter we will explain how all of these

features work.

2.1 Syntax Structures and Syntax Elements

H.265/HEVC uses so-called syntax structures to represent the encoded video

data. An H.265/HEVC encoder generates syntax structures encapsulated


Table 2.1: Generic NAL unit syntax, adapted from [3]

nal_unit(NumBytesInNalUnit) {
    forbidden_zero_bit
    nal_unit_type
    nuh_layer_id
    nuh_temporal_id_plus1
    NumBytesInRbsp = 0
    for(i = 2; i < NumBytesInNalUnit; i++)
        if(i + 2 < NumBytesInNalUnit && next_bits(24) == 0x000003) {
            rbsp_byte[NumBytesInRbsp++]
            rbsp_byte[NumBytesInRbsp++]
            i += 2
            emulation_prevention_three_byte  /* equal to 0x03 */
        } else
            rbsp_byte[NumBytesInRbsp++]
}

inside logical data units called Network Abstraction Layer (NAL) units.

An H.265/HEVC decoder decapsulates NAL units and consumes syntax

structures to reconstitute a given picture1. The sequence of NAL units

can be viewed as a text written in a specific language with a syntax and

semantics that the decoder can read and understand. The syntax is the set

of words the decoder knows and the semantics tells the decoder how the

syntax is to be used. The information conveyed by the combination of the

syntax and the semantics is recovered through the decoding process, which

is fully specified in [3].

Table 2.1 illustrates the syntax structure of a generic NAL unit and the

syntax elements it carries; syntax elements are highlighted in bold. Syntax

elements have associated descriptors which are used for parsing purposes

but these are not covered in this thesis and the interested reader is in-

1In this thesis, we will interchangeably use the terms ”Picture” and ”Frame”.


vited to refer to [4] (Chapter 5) for more details. Every NAL unit carries

NumBytesInNalUnit bytes, which further breaks down into a 16-bit header

made of 4 syntax elements and a payload which is the Raw Byte Sequence

Payload (RBSP) data structure, carrying NumBytesInRbsp bytes. The

first syntax element is the forbidden zero bit (forbidden_zero_bit). The
second syntax element is nal_unit_type, which is written over 6 bits and

carries the type of the RBSP contained in the NAL unit. The values that

it can take are specified in Table 7-1 of [3]; NAL unit types belong either to

Video Coding Layer (VCL) or non-VCL. VCL types comprise all NAL units

that contain coded video data whereas non-VCL types contain parameter

information. The third syntax element is the layer identifier, nuh_layer_id,

which is written over 6 bits. Its value is always 0 although other values

can be specified by future recommendations of ITU-T that relate to future

scalable or 3D video coding extensions of [3]. The fourth and final syntax

element of the header is the temporal identifier, nuh_temporal_id_plus1,

which is written over 3 bits. Its value is typically 1, which means that there

is only one temporal layer. We assume that this is the case throughout the

thesis. The temporal identifier for the NAL unit, TemporalID, is obtained

as:

TemporalID = nuh_temporal_id_plus1 − 1    (2.1)
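For illustration, the following Python sketch (our own helper, not part of the standard or of HM) parses the 16-bit NAL unit header described above, using the field widths of Table 2.1 and Eq. (2.1):

def parse_nal_unit_header(nal_unit: bytes) -> dict:
    # The first two bytes of a NAL unit form its header:
    # forbidden_zero_bit (1 bit), nal_unit_type (6 bits),
    # nuh_layer_id (6 bits), nuh_temporal_id_plus1 (3 bits).
    header = int.from_bytes(nal_unit[:2], "big")
    nuh_temporal_id_plus1 = header & 0x7
    return {
        "forbidden_zero_bit": (header >> 15) & 0x1,
        "nal_unit_type": (header >> 9) & 0x3F,
        "nuh_layer_id": (header >> 3) & 0x3F,
        "TemporalID": nuh_temporal_id_plus1 - 1,  # Eq. (2.1)
    }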

The payload of NAL units is the RBSP, denoted by the rbsp_byte
syntax element, where rbsp_byte contains NumBytesInRbsp bytes and
rbsp_byte[i] is the i-th byte of the RBSP. Because there are various types of

NAL units, the RBSP itself can be viewed as a syntax structure carrying syn-


tax elements. For each nal_unit_type, the H.265/HEVC standard provides

the description of the associated syntax structure. For instance, the RBSP

of a Video Parameter Set has a dedicated syntax structure (Section 7.3.2.1

of [3]), the RBSP of a Clean Random Access NAL unit has a dedicated

syntax structure further broken into a slice segment header, a slice segment

data and trailing bits (Section 7.3.2.9 of [3]), etc. In order to guarantee

that every NAL unit has a unique start identifier byte, the H.265/HEVC

standard uses dedicated bytes called emulation_prevention_three_byte.

During the decoding process, this byte is usually discarded. In this thesis,

we assume that a bitstream is only made of generic VCL NAL units and

from this point onwards, a NAL unit will be referred to as Logical Data Unit

(LDU).
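As an illustrative reading of the loop in Table 2.1, the following sketch (our own) recovers the RBSP from a NAL unit by discarding each emulation_prevention_three_byte, i.e. every 0x03 byte that follows two zero bytes:

def extract_rbsp(nal_unit: bytes) -> bytes:
    # Skip the 2-byte NAL unit header, then copy payload bytes into the RBSP,
    # dropping every emulation_prevention_three_byte (0x03 after 0x00 0x00).
    rbsp = bytearray()
    i = 2
    while i < len(nal_unit):
        if i + 2 < len(nal_unit) and nal_unit[i:i + 3] == b"\x00\x00\x03":
            rbsp += nal_unit[i:i + 2]  # keep the two zero bytes
            i += 3                     # discard the emulation prevention byte
        else:
            rbsp.append(nal_unit[i])
            i += 1
    return bytes(rbsp)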

2.2 Coding Structures and Reference Picture

Lists

An H.265/HEVC bitstream is made up of several entities called Coded Video

Sequence (CVS). A CVS is the coded representation of a sequence of pictures

which can be decoded using pictures within that sequence. Similarly, a coded

picture is the coded representation of a picture, which typically consists of

multiple LDUs. A coded picture is embedded in a so-called access unit which

contains all the LDUs associated with that picture. In this section we will

present some of the tools used by the H.265/HEVC standard for motion

compensated prediction: coding structures and reference picture lists.


2.2.1 Coding Structures

H.265/HEVC relies on temporal coding structures to perform its video com-

pression task. A coding structure designates a set of consecutive pictures

with clearly defined dependencies between pictures and a given coding or-

der. The purpose of having pictures depend on others is for prediction, which

can be done from one picture or two pictures (called uni-prediction and bi-

prediction respectively). Coding structures define a coding order, which is

different from the output order: the coding order is the order in which pic-

tures are encoded while the output order is the order in which pictures are

displayed on the screen. Because of this difference, the H.265/HEVC stan-

dard uses a Picture Order Count (POC) to uniquely identify a given picture

in output order. From this point onwards and for the sake of convenience,

we will refer to the picture whose POC is equal to n as pocn.

The definition of a coding structure bears a strong similarity to that of a

Group of Pictures (GOP) in H.264/AVC. In earlier video compression stan-

dards such as H.264/AVC, a GOP designates a set of consecutive pictures

with clearly defined dependencies where the first picture is an intra-coded

picture (or equivalently an I-Frame). The difference between a GOP and

a coding structure is that the first picture in a coding structure does not

have to be an I-Frame. Basically, the pictures that belong to a coding struc-

ture only reference other pictures within the coding structure for prediction

purposes. In this case, the coding structure is called a closed GOP. The

H.265/HEVC standard also allows cases where a picture within a coding

structure references a picture from another coding structure, in which case


Figure 2.1: Frame dependencies in the reference coding structure.

the coding structure is called an open GOP. Throughout this chapter, we

will use the hierarchical-B coding structure that was used by the Joint Col-

laborative Team on Video Coding (JCT-VC) for the Main Profile Random

Access encoder configuration as described in [10]. All figures and tables

will refer to that specific coding structure. For simplicity, throughout the

remainder of this thesis, we will refer to this coding structure simply as the

reference coding structure.


Fig. 2.1 depicts four illustrations of frame dependencies in the reference

coding structure. Referenced pictures are denoted by a (*) and arrows point

from the referenced picture to denote all direct dependent pictures. Depen-

dent pictures can be either before or after the referenced picture in display

order. The reference coding structure is actually an open GOP coding struc-

ture and by design it operates with a GOP size of 8. We can see the open

side of the reference coding structure in Fig. 2.1 on the examples where poc0,

poc4 and poc6 are the referenced pictures. They are referenced by pictures be-

yond the GOP size: poc0, poc4 and poc6 are all referenced by poc16. The

reference coding structure uses I-Frames and B-Frames. The coding order

of this coding structure is defined as {poc_n, poc_{n−4}, poc_{n−6}, poc_{n−7}, poc_{n−5},
poc_{n−2}, poc_{n−3}, poc_{n−1}}. poc0 is a special case and constitutes a GOP on

its own since there are no pictures before poc0. Using this definition, we

can easily identify that after poc0, the next GOP is comprised of {poc8,

poc4, poc2, poc1, poc3, poc6, poc5, poc7}. The reference coding structure is

then applied periodically to the succeeding pictures throughout the video

sequence. The encoder can change the coding structure if it yields better

performance but we assume that it remains unchanged throughout the en-

coding of a video sequence. The decoder at the receiver side will extract the

information regarding the referenced pictures from Reference Picture Lists,

which we describe in the next section.
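As a small illustration of the ordering just described, the sketch below (our own helper, not part of HM) lists the coding order of the reference coding structure for successive GOPs of size 8:

# Coding-order offsets relative to the last picture poc_n of each GOP:
# {n, n-4, n-6, n-7, n-5, n-2, n-3, n-1}.
CODING_ORDER_OFFSETS = [0, -4, -6, -7, -5, -2, -3, -1]

def coding_order(num_frames: int, gop_size: int = 8):
    # Idealized illustration; assumes the sequence length is 1 plus a multiple of gop_size.
    order = [0]  # poc0 constitutes a GOP on its own
    for n in range(gop_size, num_frames, gop_size):
        order += [n + off for off in CODING_ORDER_OFFSETS]
    return order

# coding_order(17) -> [0, 8, 4, 2, 1, 3, 6, 5, 7, 16, 12, 10, 9, 11, 14, 13, 15]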

2.2.2 Reference Picture Lists

Coding structures specify the coding order and the dependencies between a

given set of pictures. The decoder does not have any knowledge about the


Table 2.2: Reference Picture Sets for the Hierarchical-B Coding Structure of GOP-size 8

Reference Picture Set | Reference POCs
0 | poc_{n−8}, poc_{n−10}, poc_{n−12}, poc_{n−16}
1 | poc_{n−4}, poc_{n−6}, poc_{n+4}
2 | poc_{n−2}, poc_{n−4}, poc_{n+2}, poc_{n+6}
3 | poc_{n−1}, poc_{n+1}, poc_{n+3}, poc_{n+7}
4 | poc_{n−1}, poc_{n−3}, poc_{n+1}, poc_{n+5}
5 | poc_{n−2}, poc_{n−4}, poc_{n−6}, poc_{n+2}
6 | poc_{n−1}, poc_{n−5}, poc_{n+1}, poc_{n+3}
7 | poc_{n−1}, poc_{n−3}, poc_{n−7}, poc_{n+1}

coding structure that was used by the encoder; it must derive this informa-

tion from the LDUs that carry the encoded video data. In this section, we

explain how the encoder transmits the information regarding the dependen-

cies between pictures.

At the receiver end, as a picture gets decoded, it is either displayed on

the screen or stored in the Decoded Picture Buffer until it is eventually

output. Any picture located in the Decoded Picture Buffer can be reused as

reference for prediction. Pictures that are available for inter prediction are

listed in a so-called Reference Picture Set. The Reference Picture Set is sent

in the Sequence Parameter Set and each picture indexed in there is explicitly

identified using its POC value. Table 2.2 lists the different Reference Picture

Sets defined for the reference coding structure that was used by the JCT-VC

for the Main Profile Random Access encoder configuration as described in

[10]. Eight Reference Picture Sets are defined and for a given picture pocn,

the corresponding referenced POCs are given. Since poc0 is the first POC

of a video sequence, there can be no negative POC; therefore, if poc_i with


i < 0 were to be in a Reference Picture Set, the picture would simply not

be included.

The LDUs of a given picture carry a header that specifies which Reference

Picture Set to activate. H.265/HEVC uses two Reference Picture Lists for

inter prediction, called List0 and List1. The decoder reconstructs these

lists from the Reference Picture Sets that were supplied in the Sequence

Parameter Set; this process is specified in Section 8.3.4 of [3]. The

main difference between a Reference Picture Set and a Reference Picture

List is that a Reference Picture List is a subset of the Reference Picture

Set which is actually used for inter prediction. For uni-predicted frames

(P-Frames) only List0 is activated while for bi-predicted frames (B-Frames)

both List0 and List1 are activated. Motion compensated prediction is then

performed using the activated lists. The resulting prediction can be either

made from one picture only or a combination of pictures. Using these lists,

the hierarchy between pictures can be recovered. Table 2.3 depicts the

hierarchical-B coding structure of size 8 that was used by the JCT-VC for

the Main Profile Random Access encoder configuration as described in [10].

This is the reference coding structure that we use throughout this thesis for

all our video sequences. For each picture, we provide the Reference Picture

Set that is used and the POCs of the pictures in the Reference Picture

Lists. The first picture of a coded video sequence is usually an I-Frame

and I-Frames do not use Inter Prediction. Therefore it does not have any

associated Reference Picture Set and its associated Reference Picture Lists

are empty. poc8 and poc16 both use the same Reference Picture Set; however,
for poc8, three of the pictures do not exist, so poc8 only references


Table 2.3: Reference Picture Lists for the Hierarchical-B Coding Structure of GOP-size 8

POC | RPS used | List0 POCs | List1 POCs
0   | -        | N/A        | N/A
8   | 0        | 0          | 0
4   | 1        | 0, 8       | 8, 0
2   | 2        | 0, 4       | 4, 8
1   | 3        | 0, 2       | 2, 4
3   | 4        | 2, 0       | 4, 8
6   | 5        | 4, 2       | 8, 4
5   | 6        | 4, 0       | 6, 8
7   | 7        | 6, 4       | 8, 6
16  | 0        | 8, 6, 4, 0 | 8, 6, 4, 0
12  | 1        | 8, 6       | 16, 8
10  | 2        | 8, 6       | 12, 16
9   | 3        | 8, 10      | 10, 12
... | ...      | ...        | ...

poc0. By combining the information in Table 2.2 and Table 2.3, one can

easily reconstitute the direct dependencies that we illustrated earlier in Fig.

2.1 for the reference coding structure.
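The interplay between Table 2.2 and Table 2.3 can be summarized in a few lines of Python; the offset table below transcribes Table 2.2, and the helper (our own) drops non-existent pictures with negative POCs, as described above:

# Reference Picture Set offsets from Table 2.2, relative to the current picture poc_n.
RPS_OFFSETS = {
    0: [-8, -10, -12, -16],
    1: [-4, -6, +4],
    2: [-2, -4, +2, +6],
    3: [-1, +1, +3, +7],
    4: [-1, -3, +1, +5],
    5: [-2, -4, -6, +2],
    6: [-1, -5, +1, +3],
    7: [-1, -3, -7, +1],
}

def reference_pocs(poc: int, rps_index: int):
    # Pictures with a negative POC do not exist and are simply not included.
    return [poc + off for off in RPS_OFFSETS[rps_index] if poc + off >= 0]

# reference_pocs(8, 0)  -> [0]            (poc8 only references poc0)
# reference_pocs(16, 0) -> [8, 6, 4, 0]   (matches the entries for poc16 in Table 2.3)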

2.3 Motion Compensated Prediction

There are two types of prediction used in video compression: Intra(-frame)

Prediction and Inter(-frame) Prediction. Intra prediction is used for intra-

coded frames (I-Frames) whereas inter prediction is used for all other frames,

which can be uni-predicted frames (P-Frames) or bi-predicted frames (B-

Frames). Inter prediction in H.265/HEVC relies on Motion Compensated

Prediction in order to perform efficient compression. The main idea be-

hind inter prediction is that a given picture uses another picture as ref-


Figure 2.2: Uni- and bi-predictive inter-prediction illustration from adjacent pictures, adapted from [4]

erence, searches for the block in that reference picture that best matches

the predicted area and encodes the information of the motion of that block

between both pictures. In H.265/HEVC, a given picture may use one or

two pictures as reference for inter prediction. Fig. 2.2 illustrates the con-

cept of uni-predictive and bi-predictive inter prediction. This is achieved

using the coding structures that we introduced in Section 2.2.1. Picture poc
uses uni-prediction from picture poc−2 and bi-prediction from its adjacent
pictures poc−1 and poc+1. Note that bi-prediction does not require the
reference pictures to be adjacent to poc; one CB from poc uses poc−2 and poc−1 for
bi-prediction.

The H.265/HEVC standard operates on a block-basis. The most basic

block used in H.265/HEVC is called a Coding Block (CB). Each picture is

partitioned into multiple CBs. Each CB is further partitioned into smaller

blocks called Prediction Blocks (PBs). After the picture has been partitioned

into PBs, the encoder will then perform prediction on a PB-basis from the

reference pictures whose POCs are given in the Reference Picture Lists.


The encoder searches the reference pictures, on a PB basis, for the area that
best matches the PB according to a rate-distortion criterion. Once
it finds the area with the lowest rate-distortion cost, it
encodes the shift as the tuple of the motion vector and
the reference picture’s POC. The motion vector is the shift between the
area corresponding to the PB and the best-matching area in the reference
picture. The basic idea behind rate-
distortion optimization is that the encoder looks for the coding
mode that best trades off the loss of video quality, i.e. the distortion, against the
bit rate required to encode that area, i.e. the rate. It is beyond the scope

of this thesis to delve into rate-distortion algorithms and their specifics and

the interested reader is invited to refer to [11] and to [4] (Chapter 2) for

more details on the application of rate-distortion in video compression.
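To make the search step concrete, the following sketch (our own) performs a naive full-search block matching for one PB, using the sum of absolute differences as a simple stand-in for the rate-distortion criterion discussed above:

import numpy as np

def full_search_motion_estimation(pb, ref_picture, pb_pos, search_range=8):
    # Find the motion vector (dy, dx) within +/- search_range that minimizes the
    # SAD between the prediction block and the shifted area in the reference picture.
    h, w = pb.shape
    y0, x0 = pb_pos
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + h > ref_picture.shape[0] or x + w > ref_picture.shape[1]:
                continue
            cost = np.abs(pb.astype(int) - ref_picture[y:y + h, x:x + w].astype(int)).sum()
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost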

2.4 Operation with Networking Layers

Video compression techniques such as H.264/AVC and H.265/HEVC operate

at the Application layer, which sits at the highest level in the Open Sys-

tems Interconnection (OSI) model [12]. The encoder generates LDUs which

are then sent to the lower layers for transmission over packetized networks

based on the Internet Protocol (IP). One of the commonly used solutions

for delivering video content over IP networks is to use the Real-time Transport Proto-
col (RTP). The Internet Engineering Task Force (IETF) has published
RFC 6184, which details the operation of RTP for delivering H.264/AVC
content [13]. Similarly, the IETF has formulated a draft RFC for the op-


eration of RTP for delivering H.265/HEVC content [14]. We will look into

the specifics of RTP operation for delivering H.265/HEVC content. In this

thesis, we assume that for all users we have a Single RTP stream on a

single media transport (SRST) and all LDUs are sent in RTP packets that

use the Single NAL unit packet structure. Fig. 2.3 shows the structure of

such an RTP packet. The PayloadHdr field is the bit-exact copy of the LDU

header; the DONL field is optional and carries the 16 least significant bits
of the Decoding Order Number. We assume that this field is not present.

The NAL unit payload data field is the payload of the LDU and the last

field is also optional and included for the purpose of padding. We assume

that all RTP packets have a padding field occupying 10 bytes. Given that

the RTP specification for H.265/HEVC is still at a draft-level at the time of

writing, we allow ourselves to make some modifications and introduce a new

field in the Single NAL unit packet structure: the RefCount field. Since the

encoder knows exactly what coding structure is used to compress a video

sequence, it can also keep track of the number of times a given picture is

referenced within the video sequence and propagate that information to the

Figure 2.3: RTP Single NAL unit packet structure


Figure 2.4: H.265/HEVC system layer stack

RTP packets. We assume that the RefCount field occupies 2 bytes.
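A minimal sketch of the modified Single NAL unit packet payload we just described is given below; the exact position of the RefCount field within the payload is our own assumption, since the field itself is an addition made in this thesis:

def build_single_nal_unit_payload(ldu: bytes, ref_count: int, padding_len: int = 10) -> bytes:
    # PayloadHdr:  bit-exact copy of the 2-byte LDU (NAL unit) header.
    # RefCount:    2 bytes, number of times the carried frame is referenced
    #              (placed right after PayloadHdr here; this placement is assumed).
    # NAL payload: remaining bytes of the LDU.
    # Padding:     10 bytes, assumed for all RTP packets in this thesis.
    payload_hdr = ldu[:2]
    nal_payload = ldu[2:]
    return payload_hdr + ref_count.to_bytes(2, "big") + nal_payload + bytes(padding_len)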

For live streaming services, RTP is used in conjunction with the User

Datagram Protocol (UDP) to supply packets to IP. Another solution that

has been developed for buffered streaming services by MPEG is Dynamic

Adaptive Streaming over HTTP (DASH). DASH performs video streaming

over the Hypertext Transfer Protocol (HTTP) using adaptive bit rate and is

codec-agnostic. Since this solution is based on HTTP, packets are supplied

to IP using the Transmission Control Protocol (TCP). IP packets can then

be supplied to different wireless access technologies, such as LTE or Wireless

Fidelity (Wi-Fi). Fig. 2.4 gives an illustration of how the protocol stacks

are set up. In this thesis, we will focus on delivering video streaming services

to cellular users. We assume the use of RTP and UDP to supply packets

over IP, using the modified Single NAL unit packet structure for the RTP

payload, and LTE-A as the air interface.


Chapter 3

Cross-Layer Frame Reference

Aware Scheduling

Framework

In the previous chapter, we presented some of the features of the H.265/HEVC

standard that are relevant for video compression. We presented coding

structures, syntax structures and syntax elements, which are used to en-

code video content. We also presented motion compensated prediction for

more bandwidth-efficient encoding and reference picture lists for helping the

decoder track which pictures to use as reference when doing motion predic-

tion. Building on these features, we define a cross-layer scheduling framework
that delivers video content based on the dependencies between frames.
In this chapter, we propose a mathematical

formulation of the shared resource allocation problem for delivering video

content and derive the optimal solution to this problem.


3.1 Mathematical Formulation of the Shared

Resource Allocation Problem

Let us consider S to be the set of users actively sharing resources. Let

us consider a user k and let the channel capacity of user k for time-slot n

be denoted by C_k(n). Kelly [15] has provided a mathematical formulation

of the shared resource allocation problem, which has been widely used by

the research community for tackling rate control problems in communication

networks. This shared resource allocation problem, which we will call SRAP,

is formulated as the following constrained optimization problem and solved

at the beginning of every time-slot n.

SRAP:

maximize \; F(\vec{r}(n)) \triangleq \sum_{k \in S} U_k(r_k(n))    (3.1)

subject to \; r_k(n) < C_k(n), \quad r_k(n) \geq 0, \quad k \in S    (3.2)

F is the objective function that we are trying to maximize, U_k(r_k(n)) denotes
the utility function of user k and r_k(n) is the average throughput of user k
up to time-slot n. Constraint (3.2) ensures that the rate of the user does
not exceed the channel capacity C_k(n) that user k is experiencing during

time-slot n. Under the assumptions that the objective function F in (3.1)

is strictly concave and differentiable and that the feasible region in (3.2) is

compact, we know from Nonlinear Programming Theory [16] that an optimal

solution exists for SRAP and Kelly has provided an explicit optimal solution

to this problem using Lagrangian methods [15].


In wireless networks, the channel capacity and the number of users ac-

tively sharing resources vary with time. This is due to the random nature

of the wireless channel and the network’s traffic. As a result, the optimal

solution to SRAP also varies with time. Hosein [17] proposed a solution

to SRAP by observing that finding the optimal solution consists in finding

the user which maximizes the gradient of the objective function. Hosein

developed his solution by introducing update equations using exponential

smoothing filters in order to keep track of each user’s throughput, whose

expression is given as follows

r_k(n+1) = \begin{cases} \left(1 - \frac{1}{\tau}\right) r_k(n) + \frac{d_k(n)}{\tau} & \text{if user } k \text{ is served,} \\ \left(1 - \frac{1}{\tau}\right) r_k(n) & \text{otherwise.} \end{cases}    (3.3)

d_k(n) is the throughput of user k estimated for time-slot n in bits per sec-
ond. τ > 1 is the time constant of the exponential smoothing filter. r_k(n)
is the average throughput of user k up to time-slot n. Because the objec-

tive function is strictly concave, Hosein showed that all we need to find is

the direction, i.e. the user, which maximizes the gradient of the objective

function. If we denote this user as user k^*, then

k^* = \arg\max_{k \in S} \{\nabla F(\vec{r})\}.    (3.4)

As an example, if the utility function U_k of each user k is defined as the
logarithmic function of the rate of that user, log(r_k), then the maximum

gradient direction, i.e. the user maximizing the gradient function, is given


by:

k^* = \arg\max_{k \in S} \left\{ \frac{d_k(n)}{r_k(n)} \right\}    (3.5)

(3.5) is the well-known Proportional Fair metric, widely used for scheduling

in cellular networks such as Universal Mobile Telecommunications System

(UMTS) and LTE. An alternate way of finding this result is as follows2. The

utility function U_k of each user k is defined as the logarithmic function of
the rate of user k and we know how the rate of each user is computed. Let
us assume that user i is selected at time-slot n; the new utility value will be

\sum_{k \in S, k \neq i} \log\left((1 - \tau^{-1}) r_k(n)\right) + \log\left((1 - \tau^{-1}) r_i(n) + \tau^{-1} d_i(n)\right).    (3.6)

By adding and subtracting \log((1 - \tau^{-1}) r_i(n)) in Eq. (3.6), the sum will be

performed for all users and Eq. (3.6) then becomes

\sum_{k \in S} \log\left((1 - \tau^{-1}) r_k(n)\right) + \log\left( \frac{(1 - \tau^{-1}) r_i(n) + \tau^{-1} d_i(n)}{(1 - \tau^{-1}) r_i(n)} \right).    (3.7)

After some simplifications, Eq. (3.7) eventually boils down to

\sum_{k \in S} \log\left((1 - \tau^{-1}) r_k(n)\right) + \log\left( 1 + \frac{1}{(\tau - 1)} \frac{d_i(n)}{r_i(n)} \right).    (3.8)

From Eq. (3.8), it is clear that the overall utility is maximized if
user i maximizes d_i(n)/r_i(n), which is the Proportional Fair metric. Hosein [17]

also proposed the use of barrier methods in order to account for Quality of

Service (QoS) constraints. In nonlinear programming, barrier methods are

2The author of this simple and elegant proof is Dr. Cyril Leung.


used on optimization problems in order to force the solutions to remain in

the interior of the feasibility region. An alternative to barrier methods
is penalty methods, which force the solutions to remain in a certain area
of the feasibility region by imposing large penalties on solutions that lie

outside of that area. In this thesis, we propose to use barrier functions in

order to deliver video content by exploiting frame references. For a detailed

discussion of penalty and barrier methods, the interested reader is invited

to refer to Chapter 13 of [16].
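Before extending SRAP, the following minimal Python sketch summarizes the baseline Proportional Fair scheduler implied by Eqs. (3.3) and (3.5); the data structures and the value of the time constant are our own illustrative choices:

def pf_select(avg_rate: dict, inst_rate: dict):
    # Proportional Fair metric, Eq. (3.5): serve the user maximizing d_k(n) / r_k(n).
    return max(avg_rate, key=lambda k: inst_rate[k] / avg_rate[k])

def update_avg_rates(avg_rate: dict, inst_rate: dict, served_user, tau: float = 100.0):
    # Exponentially smoothed throughput update, Eq. (3.3), with time constant tau > 1.
    for k in avg_rate:
        avg_rate[k] *= (1.0 - 1.0 / tau)
        if k == served_user:
            avg_rate[k] += inst_rate[k] / tau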

In order to deliver video content, we extend the formulation of SRAP

to account for frame reference awareness and call this new problem SRAP-

FRA. We introduce a new constraint on the frame reference count of user k,

c_k(n), to account for the fact that the network does not hold transmission

queues of infinite size. This also prevents the scenario where a video user

watches an infinitely long video sequence. This aspect is modelled through

Finite-Buffer traffic models and these will be discussed in greater detail in

Chapter 4. Just like SRAP, SRAP-FRA is also solved at the beginning of

every time-slot n. The expression of SRAP-FRA is as follows.

SRAP-FRA:

maximize \; F(\vec{r}(n), \vec{c}(n)) \triangleq \sum_{k \in S} U_k(r_k(n), c_k(n))    (3.9)

subject to \; r_k(n) < C_k(n), \quad r_k(n) \geq 0,    (3.10)

c_k(n) < C'_k(n), \quad c_k(n) \geq 0.    (3.11)

C'_k(n) is the constraint on the number of frame references the transmission


queue of user k can hold at any given time-slot n, c_k(n) is the average number
of frame references that user k has been receiving up to time-slot n and
U_k(r_k(n), c_k(n)) is the combined utility function of user k that we introduce

for our frame reference aware scheduling framework. For our scheduling

framework, we need to track for each user whether its transmission queue

is holding any frame that is referenced within the video sequence user k is

watching, and base scheduling decisions on that information. Essentially, we are building
a scheduling framework where users watching video content get sent the content
that the decoder needs to perform its task as efficiently as possible, while
incurring as little playback delay as possible. To that end, we use barrier

functions and express the combined utility function for each user k as

U_k(r_k(n), c_k(n)) = U_{k,1}(r_k(n)) + U_{k,2}(c_k(n)),    (3.12)

where

U_{k,1}(r_k(n)) \triangleq \log(r_k(n)), \quad U_{k,2}(c_k(n)) \triangleq -\lambda \exp\left(-\mu (c_k(n) - c_{\min})\right).    (3.13)

In (3.13), U_{k,2} is a generalized expression of a barrier function; λ and µ are

positive-valued parameters for adjusting the penalty for leaving the feasible

region. Hosein [17] has proposed the use of such functions for delivering

QoS though there is no indication in the literature to suggest that this type

of approach is an optimal way of accounting for QoE constraints.

Other approaches and methodologies should definitely be investigated for

addressing such issues. Our motivation for using a barrier function based


approach is to provide a simple scheduling framework.

In parallel to the update equation of the rate of user k, we also introduce

an exponentially smoothed update equation for keeping track of the frame

reference count of user k.

c_k(n+1) = \begin{cases} \left(1 - \frac{1}{T}\right) c_k(n) + \frac{t_k(n)}{T} & \text{if user } k \text{ is served,} \\ \left(1 - \frac{1}{T}\right) c_k(n) & \text{otherwise,} \end{cases}    (3.14)

where c_k(n) is the frame reference count of user k at the beginning of time-
slot n, c_min is the minimum number of frame references that we force the
system to provide to each video user, T > 1 is the time constant of the
exponential smoothing filter and t_k(n) is the number of frame references

being transmitted to user k at time-slot n. Due to the assumptions that we

made regarding the proposed combined utility function, the formulation of

SRAP-FRA can be rewritten as

SRAP-FRA:

maximize \; F(\vec{r}(n), \vec{c}(n)) \triangleq \sum_{k \in S} \left( U_{k,1}(r_k(n)) + U_{k,2}(c_k(n)) \right)    (3.15)

subject to \; r_k(n) < C_k(n), \quad r_k(n) \geq 0,    (3.16)

c_k(n) < C'_k(n), \quad c_k(n) \geq 0.    (3.17)

This is the cross-layer scheduling framework that we propose to solve in this

thesis and for which we derive a solution in the following section.


3.2 Solution to the proposed Shared Resource

Allocation Problem

In this section, we are going to derive the solution to the proposed optimiza-

tion problem SRAP-FRA (3.15). We need to find the user that maximizes

the gradient of the objective function. Since we constructed our combined

utility function as the sum of two separate utility functions (3.12), maxi-

mizing the combined utility can be written as maximizing \sum_{k \in S} U_{k,1}(r_k(n))
and \sum_{k \in S} U_{k,2}(c_k(n)) individually. We already know the solution to the
maximization of the sum of the first utility function U_{k,1}(r_k(n)). We will
focus on deriving the solution to the maximization of the sum of the second
utility function U_{k,2}(c_k(n)). Let us call j the user selected to be served at

time-slot n by the network. Let us call β the parameter with which we

parameterize the movement of the sum of the second utility functions in the

direction of serving user j. The objective function F can then be written

as:

F_{j,2}(\beta) = U_{j,2}\left(c_j(n) + \beta(c_j(n+1) - c_j(n))\right) + \sum_{k \in S, k \neq j} U_{k,2}\left(c_k(n) + \beta(c_k(n+1) - c_k(n))\right)    (3.18)


User j is served and all other users are not. Given the update equations of

the frame reference count (3.14), (3.18) simplifies to

F_{j,2}(\beta) = U_{j,2}\!\left(c_j(n) + \beta \frac{t_j(n) - c_j(n)}{T}\right) + \sum_{k \in S, k \neq j} U_{k,2}\!\left(c_k(n) - \beta \frac{c_k(n)}{T}\right).    (3.19)

Taking the partial derivative of F_{j,2} with respect to β and setting β to 0, we
get:

\frac{\partial F_{j,2}}{\partial \beta} = \frac{t_j(n) - c_j(n)}{T} U'_{j,2}(c_j(n)) - \sum_{k \in S, k \neq j} \frac{c_k(n)}{T} U'_{k,2}(c_k(n)).    (3.20)

Eq. (3.20) can be rewritten as:

\frac{\partial F_{j,2}}{\partial \beta} = \frac{t_j(n)}{T} U'_{j,2}(c_j(n)) - \sum_{k \in S} \frac{c_k(n)}{T} U'_{k,2}(c_k(n)).    (3.21)

Since we are looking to maximize \partial F_{j,2} / \partial \beta, we can ignore the second term of

(3.21) as this term is a sum which is common to all users in the network. We

also know the expression of U_{k,2}, so the expression of the maximum gradient

direction is

k^* = \arg\max_{k} \left\{ \frac{\lambda \mu}{T} t_k(n) \exp\left(-\mu (c_k(n) - c_{\min})\right) \right\}.    (3.22)

Essentially, this means that the system will maximize the utility of users

by prioritizing the transmission of referenced frames ahead of unreferenced

frames. As we saw in Section 2.2.1, there is a clear hierarchy in the way


frames depend upon each other in video sequences. If a video user is pro-

vided frames which the decoder can always decode or if the decoder does not

have to wait for other frames before being able to decode those frames, then

video users can watch video sequences with no perceptible delay and this

will enhance the Quality of Experience of video users. This sort of procedure

helps counter error propagation within the video decoding process; therefore,

the proposed cross-layer scheduling framework can be seen as a form of error

resilience. Using (3.5), (3.12) and (3.22), the final expression of the metric

for the proposed scheduling framework (3.15) can then be expressed as:

\frac{d_k(n)}{r_k(n)} + \frac{\lambda \mu}{T} t_k(n) \exp\left(-\mu (c_k(n) - c_{\min})\right).    (3.23)

For the rest of this thesis, we shall refer to our proposed scheduling scheme

as Frame Reference Aware Proportional Fair (FRA-PF).
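For completeness, a minimal sketch of the resulting FRA-PF scheduling decision, Eq. (3.23), is given below; the parameter values are placeholders and the data structures are our own, not part of the simulator described in Chapter 4:

import math

def fra_pf_metric(d, r, t, c, lam=1.0, mu=1.0, T=100.0, c_min=1.0):
    # Eq. (3.23): Proportional Fair term plus the barrier term derived from U_{k,2}.
    return d / r + (lam * mu / T) * t * math.exp(-mu * (c - c_min))

def fra_pf_select(users: dict):
    # users maps a user id to a dict with keys d (estimated throughput), r (smoothed
    # throughput), t (frame references queued this slot) and c (smoothed reference count).
    return max(users, key=lambda k: fra_pf_metric(**users[k]))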


Chapter 4

System Model

In this chapter, we describe our system model and simulation methodology

for evaluating the performance of our proposed scheduling framework. Our

evaluation methodology is centered upon using system-level simulations. In

this chapter we will cover the components that are of utmost relevance to this

thesis. More in-depth and complete descriptions of system-level simulation

methodologies can be found in [18], [19] and [20].

4.1 H.265/HEVC Video Content Generation

Analytical traffic models have been proposed for near-real time video stream-

ing in [18], where the packet sizes and packet inter-arrival times are based

on truncated Pareto distributions. While this analytical model captures the

Table 4.1: H.265/HEVC Table of Video Test Sequences

Sequence       | Length (frames) | Frame rate (fps) | Resolution (px x px)
FourPeople     | 600             | 60               | 1280x720
Johnny         | 600             | 60               | 1280x720
KristenAndSara | 600             | 60               | 1280x720
SlideShow      | 200             | 20               | 1280x720
SlideEditing   | 300             | 30               | 1280x720


variability in the packet sizes coming from the video source, it is agnostic to

the specifics of the H.265/HEVC standard and therefore cannot be relied on

for generating realistic video traffic. Moreover, our objective is to evaluate

the application level experience of H.265/HEVC video users and to this end,

we use HM 14.0 to generate video bitstreams [21]. We use different video

test sequences which were used for development and testing purposes by

MPEG: FourPeople, Johnny, KristenAndSara, SlideShow and SlideEditing.

The characteristics of these video test sequences are given in Table 4.1. For

each of these video sequences, we generate the corresponding bitstream and

trace files using HM 14.0 [21], from which we extract the information of the

Reference Picture Lists, as defined in Section 2.2.2, for all frames in order

to determine the frame reference dependence structure.

For simplicity, we assume that each frame consists of only one slice seg-

ment (see Section 2.1), so that each frame is encoded inside one LDU. The

GoP size is set to 8. The Intra-Period is defined as the interval, in frames,
between two consecutive I-Frames. The Intra-Period is always set so

that an I-Frame can be found approximately every second. Its value de-

pends on the frame-rate of the video sequence: for a frame rate of {20, 24,

30, 50, 60} fps, the Intra-Period is set to {16, 24, 32, 48, 64} (respectively).
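The frame-rate-to-Intra-Period mapping just listed can be written down directly; each value is a multiple of the GOP size (8) that keeps roughly one I-Frame per second:

# Intra-Period (in frames) for each frame rate used in our test sequences.
INTRA_PERIOD = {20: 16, 24: 24, 30: 32, 50: 48, 60: 64}

def intra_period(frame_rate_fps: int) -> int:
    return INTRA_PERIOD[frame_rate_fps]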

Aside from I-Frames, we use B-Frames only. Using the bitstreams generated

from the video sequences we selected, we create a custom Traffic Model for

each video sequence and use it as input to our LTE-A simulator, which is

described below. The H.265/HEVC parameters used to generate the bit-

streams are summarized in Table 4.2. Other parameters needed to run HM

14.0 are left to their default values as in [10].


Table 4.2: H.265/HEVC Parameters

High Efficiency Video Coding Parameters
Video Sequence Length  | 10 seconds
SliceMode              | 0
Coding Unit size       | 64 pixels x 64 pixels
GoP size               | 8
Quantization Parameter | 32
Frame Structure        | IBB...BIBB...B
Decoding Refresh Type  | Clean Random Access

4.2 LTE-Advanced System Model

In order to evaluate the performance of our proposed scheduling framework,

we use system-level simulations based on openWNS and IMTAphy [22]-

[23]. The performance evaluation methodology is based on the simulation

methodology described in Annex A of the 3GPP Technical Report 36.814

[19] and in the Evaluation Methodology Document of IEEE 802.16m [18].

In this section, we will describe some of the components and features that

we use in our performance evaluation. Evaluation methodologies based on

system-level simulations require many components to capture aspects of the

physical layer and the protocols implemented at the link layer.

4.2.1 Network Model

We consider a downlink LTE Advanced (LTE-A) system using Frequency

Division Duplex (FDD) with N = 19 base stations. Each base station is

assumed to have three sectors in order to provide coverage; thus there
is a total of 57 sectors in the network. An illustration of the hexagonal

grid layout is provided in Fig. 4.1. To ensure that all cells experience


Figure 4.1: Hexagonal Network Grid Layout

similar interference and that we accurately model the impact of outer-cells,

we implement a wrap-around technique. The full system is actually modelled

as a network consisting of 7 clusters, where each cluster is made of N = 19

base stations. The central cluster is where the users are created and where

all of the statistics are collected. Fig. 4.2 illustrates the concept of wrap-

around. Virtual clusters are depicted in grey while the central cluster is

depicted in white; the central base station of each cluster is depicted in

yellow. The surrounding clusters are virtual clusters in the sense that no user

is actually dropped there. All the cells in the virtual clusters are copies of the


Figure 4.2: Wrap Around of Hexagonal Network

original cells in the central cluster. The virtual cells are identical to the
original cells in terms of antenna configuration, traffic and fast-fading; the only
difference is their location. Users are dropped independently at uniformly

random locations in the central cluster. For all base stations, we assume that

each sector uses 4 transmit antennas and each user uses 2 receive antennas.

This corresponds to a 4x2 Multiple Input Multiple Output (MIMO) system.

The system bandwidth B is assumed to be 10 MHz. Resource Allocation


Figure 4.3: LTE Downlink PRB allocation illustration

Type is assumed to be 0, i.e. we allocate groups of Physical Resource
Blocks (PRBs) to users. For a system bandwidth of 10 MHz, the 3GPP

standard specifies that users are allocated groups of 3 contiguous PRBs.

Fig. 4.3 depicts a PRB allocation with 4 users in a system with 10 MHz of

bandwidth. Note that at 10 MHz, the last group only contains 2 PRBs as

the total number of PRBs at 10 MHz of bandwidth is 50.
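To make this grouping concrete, the short Python sketch below (illustrative only, not part of our simulator; the function name prb_groups is ours) splits the 50 PRBs of a 10 MHz carrier into groups of 3, which yields 17 resource block groups with the last group holding only 2 PRBs, as noted above.

# Sketch: split the PRBs of a 10 MHz LTE carrier (50 PRBs) into
# resource block groups (RBGs) for resource allocation type 0.
# The RBG size of 3 for this bandwidth follows the 3GPP rule cited
# in the text; the helper itself is purely illustrative.
def prb_groups(num_prbs=50, rbg_size=3):
    """Return the PRB indices of each resource block group."""
    return [list(range(start, min(start + rbg_size, num_prbs)))
            for start in range(0, num_prbs, rbg_size)]

groups = prb_groups()
print(len(groups))   # 17 groups
print(groups[-1])    # last group holds only 2 PRBs: [48, 49]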

Table 4.3: FTP Traffic Model 1

Parameter               Statistical Characterization
File size               2 Megabytes
User arrival rate λ_be  Poisson distributed process with rate λ_be
Number of downloads     1 (each user downloads a single file)


4.2.2 Traffic Model

We model two types of traffic: Best Effort (BE) traffic and video traffic.

A user is assigned to BE or video traffic with equal probability (0.5 each). Usu-

ally users are assumed to be active for the entire duration of the simulation,

i.e. they are created at the beginning of the simulation and dropped at the

end of the simulation, as stated in [18]. In this thesis, we decided to use more

realistic traffic models. Users are created at random time instants accord-

ing to a Poisson distributed random process. Users remain in the network

until they have completed their session or until they are dropped from the

network. For the BE traffic model, we use FTP Traffic Model 1 defined in

the 3GPP Technical Report [19] and whose parameters are summarized in

Table 4.3.
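To illustrate how such arrivals can be generated (a minimal sketch under the assumptions of this section, not the simulator's actual code; the function generate_arrivals and its parameters are ours), inter-arrival times can be drawn from an exponential distribution, with each new user assigned to BE or video traffic with equal probability:

import random

# Sketch: generate Poisson user arrivals for one simulation run.
# rate is the arrival rate in users per second; each arrival is
# assigned to Best Effort or video traffic with probability 0.5 each,
# as assumed in this section. Illustrative only.
def generate_arrivals(rate, sim_duration_s, seed=0):
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)          # exponential inter-arrival time
        if t > sim_duration_s:
            break
        traffic = "BE" if rng.random() < 0.5 else "video"
        arrivals.append((t, traffic))
    return arrivals

# Example: 0.5 users/s over a 600 s run yields about 300 users on average.
users = generate_arrivals(rate=0.5, sim_duration_s=600.0)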

Similarly, we define a traffic model for video users; in this thesis we use

our own custom traffic model. Because we need information about frame

reference dependencies, we turn to HM 14.0 to generate realistic video bit-

streams for use in our performance evaluation. Section 4.1 covers the actual

generation of the video bitstreams in more detail. We wrap the video bit-

streams around six times as each bitstream individually carries 10 seconds’

worth of video data. This helps us generate video traffic representing one

minute’s worth of video data. Video users remain in the network until there

are no more packets left for them to receive. The parameters of our video

traffic model are summarized in Table 4.4.


Table 4.4: H.265/HEVC Traffic Model

Parameter              Statistical Characterization
Video duration         1 minute
User arrival rate λ_v  Poisson distributed process with rate λ_v
Number of sessions     1 (each user watches a single video once)

4.2.3 Channel Model

For every user in the network, we need to model the effects of the large-scale

and small-scale fading. Depending on the simulated scenario, the propaga-

tion and fading characteristics of the channel may be different. In this thesis,

we focus on the Urban Macrocell scenario, also referred to as Case 1 by the

3GPP, as defined by the 3GPP in Table A.2.1.1-1 of [24]. It should be

noted that Urban Macrocell is also a scenario defined by the International

Telecommunications Union Radiocommunications Sector (ITU-R) in report

M.2135 [25]. The ITU-R scenario defines users traveling at vehicular speeds

(30 km/h) whereas the 3GPP Urban Macrocell scenario defines users as

traveling at pedestrian speeds (3 km/h). The reason for using the 3GPP

Urban Macrocell scenario is that we consider services which require high

data rates, which are more practical if the users are moving at pedestrian

speed. System-level simulations typically rely on stochastic channel models

such as the Spatial Channel Model [26] to capture these aspects. Typically,

channel models capture the number of clusters (in this thesis, we use the terms "cluster" and "tap" interchangeably) and their spatial characteristics such as the delay spread, the angular spread and the power carried by each cluster. The original implementation of the system-level simulation tool we used, IMTAphy, uses the channel model specified by the ITU-R in


report M.2135 [25]. In [25], the channel model for the Urban Macrocell sce-

nario is defined as a 20-tap model, whereas the channel model we decided to

use is the Spatial Channel Model [26], which is a 6-tap model. There are two

reasons for choosing the 3GPP Spatial Channel Model. The first reason is

that although the ITU-R Channel Model is more accurate, it requires a large

memory footprint in terms of storing cluster and ray specific information. It

also requires high computational power due to having to sum a large num-

ber of clusters for every link, for every subcarrier and for every time-slot.

The second reason is that we are looking to do a fair comparison between

two different scheduling schemes. The relevant aspect of the channel model

that we need in order to do this is to accurately capture statistical char-

acteristics of the channel such as Delay Spread and Angular Spread rather

than to provide accurate performance predictions in real environments. The

radio channel can typically be described through its large-scale and small-

scale characteristics. Large-scale characteristics are captured through the

path-loss and the shadow fading distribution. The deterministic path-loss

formula used for the Urban Macrocell scenario is defined in [24] as follows

PL(d) = 128.1 + 37.6 log10(d) (4.1)

where PL denotes the mean path loss in dB between a given user and a given

base station and d denotes the distance between the user and the base station

in kilometers. This mean path-loss formula is valid for carrier frequencies

around 2 GHz. The distance between a user and a base station must always

be at least 35 meters. The short-term statistics are characterized by small-


scale parameters. Let us denote the number of clusters in a link by N . The

generation of the parameters required to compute the channel coefficients

is documented in [26] and [20]. The eventual channel impulse responses

account for the aspects of modelling a MIMO channel and are given for a

given pair of antennas s and u (resp. station and user) and a given cluster

n:

h_{u,s,n}(t) =
\begin{cases}
\sqrt{\dfrac{1}{K_R+1}}\, h^{NLoS}_{u,s,n}(t) + \sqrt{\dfrac{K_R}{K_R+1}}\, h^{LoS}_{u,s,n}(t), & n = 1,\\[2mm]
\sqrt{\dfrac{1}{K_R+1}}\, h^{NLoS}_{u,s,n}(t), & 2 \le n \le N,
\end{cases}
\qquad (4.2)

where K_R is the Ricean factor, h^{NLoS}_{u,s,n} is the non line-of-sight component of the channel and h^{LoS}_{u,s,n} is the line-of-sight component of the channel, which is applied only to the first cluster. The way the Spatial Channel Model is designed, the first cluster is the cluster for which the delay is the shortest.
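For concreteness, the per-cluster combining in (4.2) can be sketched in a few lines of Python/numpy; this is illustrative only and assumes the NLoS and LoS coefficients for one antenna pair have already been generated by the Spatial Channel Model.

import numpy as np

# Sketch of the per-cluster combining in (4.2): cluster 1 mixes the LoS
# and NLoS components according to the Ricean factor K_R, while clusters
# 2..N keep only the scaled NLoS component. h_nlos holds one coefficient
# per cluster for a given (u, s) antenna pair; h_los is the LoS coefficient.
def combine_clusters(h_nlos, h_los, k_r):
    h = np.sqrt(1.0 / (k_r + 1.0)) * np.asarray(h_nlos, dtype=complex)
    h[0] += np.sqrt(k_r / (k_r + 1.0)) * h_los   # LoS added to first cluster only
    return h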

The non line-of-sight channel component is expressed for a given cluster and

for a given pair of transmit-receive antenna elements as follows [26]:

h^{NLoS}_{u,s,n}(t) = \sqrt{\frac{P_n}{M}} \sum_{m=1}^{M}
\begin{bmatrix} F_{rx,u,V}(\theta_{n,m}) \\ F_{rx,u,H}(\theta_{n,m}) \end{bmatrix}^{T}
\begin{bmatrix} \exp(j\Phi^{vv}_{n,m}) & \sqrt{\kappa^{-1}}\exp(j\Phi^{vh}_{n,m}) \\ \sqrt{\kappa^{-1}}\exp(j\Phi^{hv}_{n,m}) & \exp(j\Phi^{hh}_{n,m}) \end{bmatrix}
\begin{bmatrix} F_{tx,s,V}(\phi_{n,m}) \\ F_{tx,s,H}(\phi_{n,m}) \end{bmatrix}
\exp\!\left(j d_s 2\pi\lambda_0^{-1}\sin(\phi_{n,m})\right) \exp\!\left(j d_u 2\pi\lambda_0^{-1}\sin(\theta_{n,m})\right) \exp(j 2\pi\nu_{n,m} t) \qquad (4.3)

where Pn is the power of the nth cluster, M is the number of rays within the

cluster, Frx,u,V and Frx,u,H are the field patterns of the uth antenna element


at the receiver side in the vertical and horizontal polarizations respectively,

Ftx,s,V and Ftx,s,H are the field patterns of the sth antenna element at the

transmitter side in the vertical and horizontal polarizations respectively,

θn,m and φn,m are the arrival and departure angles of the mth ray in the

nth cluster, ds and du are the distance between antenna elements at the

transmitter and receiver side respectively, νn,m is the Doppler frequency

component of the mth ray of the nth cluster and t is the time instant.

Φ^{vv}_{n,m}, Φ^{vh}_{n,m}, Φ^{hv}_{n,m} and Φ^{hh}_{n,m} are uniformly generated random phases used for initialization purposes.

In a similar fashion to the non line-of-sight channel component, the line-

of-sight channel component for a given pair of transmit-receive antenna el-

ements is expressed as follows [26]:

h^{LoS}_{u,s,n}(t) =
\begin{bmatrix} F_{rx,u,V}(\theta_{LoS}) \\ F_{rx,u,H}(\theta_{LoS}) \end{bmatrix}^{T}
\begin{bmatrix} \exp(j\Phi^{vv}_{LoS}) & 0 \\ 0 & \exp(j\Phi^{hh}_{LoS}) \end{bmatrix}
\begin{bmatrix} F_{tx,s,V}(\phi_{LoS}) \\ F_{tx,s,H}(\phi_{LoS}) \end{bmatrix}
\exp\!\left(j d_s 2\pi\lambda_0^{-1}\sin(\phi_{LoS})\right) \exp\!\left(j d_u 2\pi\lambda_0^{-1}\sin(\theta_{LoS})\right) \exp(j 2\pi\nu_{LoS} t) \qquad (4.4)

where Frx,u,V and Frx,u,H are the field patterns of the uth antenna element

at the receiver side in the vertical and horizontal polarizations respectively,

Ftx,s,V and Ftx,s,H are the field patterns of the sth antenna element at the

transmitter side in the vertical and horizontal polarizations respectively,

θLoS and φLoS are the arrival and departure angles of the line-of-sight ray,

ds and du are the distances between antenna elements at the transmitter

and receiver respectively, νLoS is the Doppler frequency component of the

line-of-sight ray and t is the time instant. Φ^{vv}_{LoS} and Φ^{hh}_{LoS} are uniformly


generated random phases used for initialization purposes.

The channel impulse responses given by (4.2) are expressed in the time-

domain. Since we are considering an LTE-A air interface, which is based on

OFDMA, we need frequency domain channel coefficients. The frequency do-

main channel coefficients are obtained by applying a Fast Fourier Transform

on the time domain channel impulse responses. The equivalent frequency domain channel matrix at the kth subcarrier for a 4x2 MIMO system is

given as:

H(k) = \begin{bmatrix} H_{1,1}(k) & H_{1,2}(k) & H_{1,3}(k) & H_{1,4}(k) \\ H_{2,1}(k) & H_{2,2}(k) & H_{2,3}(k) & H_{2,4}(k) \end{bmatrix}, \quad k \in \{1, 2, \ldots, N_{FFT}\} \qquad (4.5)

where NFFT is the Fast Fourier Transform size. Let us denote the Fast

Fourier Transform by F . Each individual component of the channel transfer

function H(k) at a given time-instant t is a function of the channel impulse

responses given by (4.2) and is expressed as follows [20]

H_{u,s}(k) = F\left[h_{u,s,1}(t), h_{u,s,2}(t), \ldots, h_{u,s,N}(t)\right], \quad k \in \{1, 2, \ldots, N_{FFT}\}. \qquad (4.6)

In the specific case of LTE, the subcarrier spacing is defined as 15000 Hz.

For a system bandwidth of size 10 MHz, we need a sampling rate that is at

least higher than 10 MHz and that is a multiple of the subcarrier spacing, i.e.

15000 Hz. Since Fast Fourier Transforms are optimized for lengths that are

integer powers of 2, we use a Fast Fourier Transform of size NFFT = 1024.
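A small Python/numpy sketch of this time-to-frequency conversion is given below (illustrative only; the cluster coefficients and delays are assumed to come from the Spatial Channel Model, and the helper freq_domain_channel is ours). Note that an FFT size of 1024 with a 15 kHz subcarrier spacing corresponds to a sampling rate of 1024 x 15 kHz = 15.36 MHz, which satisfies the requirement above.

import numpy as np

# Sketch of (4.5)-(4.6): obtain per-subcarrier coefficients H_{u,s}(k)
# from the N time-domain cluster coefficients of one (u, s) antenna pair.
N_FFT = 1024
SUBCARRIER_SPACING_HZ = 15e3
FS_HZ = N_FFT * SUBCARRIER_SPACING_HZ      # 1024 * 15 kHz = 15.36 MHz

def freq_domain_channel(cluster_coeffs, cluster_delays_s):
    """Place each cluster at its delay on the sampling grid, then FFT."""
    h_time = np.zeros(N_FFT, dtype=complex)
    for coeff, delay in zip(cluster_coeffs, cluster_delays_s):
        h_time[int(round(delay * FS_HZ)) % N_FFT] += coeff
    return np.fft.fft(h_time)              # H(k), k = 0..N_FFT-1

# Example with a toy 6-cluster profile (delays in seconds, equal powers):
H = freq_domain_channel(np.ones(6, dtype=complex) / np.sqrt(6),
                        np.array([0, 0.2, 0.5, 1.6, 2.3, 5.0]) * 1e-6)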


4.2.4 Feedback Model

Critical to the performance of most wireless communications systems are

mechanisms for delivering Channel State Information (CSI) to the transmit-

ter. It is shown in Chapter 8 of [27] that with CSI knowledge at the trans-

mitter, one can extract the maximum performance available from MIMO

systems. The 3GPP standard has outlined several control signalling mech-

anisms for each of the transmission modes it defines. In this thesis, we use

Transmission Mode 10 with 4-Tx Release 12 linear precoding matrices [28].

The 3GPP standard defines an implicit feedback mechanism to operate the

Uplink control signalling. What is meant by ”implicit” is that instead of

sending information about the channel matrix itself, the user sends quan-

tized information about different channel statistics that can help the net-

work make appropriate scheduling decisions. The 3GPP standard defines

the content of the control signalling through 3 indicators [28]:

• Rank Indication (RI),

• Precoding Matrix Indicator (PMI),

• Channel Quality Indicator (CQI).

The RI is the rank of the channel matrix, i.e. the number of degrees of

freedom that it can carry. The PMI is the index of the Precoding Matrix

that maximizes the received power at the receiver and the CQI is the spectral

efficiency that the receiver would be able to achieve. The PMI and CQI

reports are conditioned upon the value of the RI. The reporting mode we

use in this thesis is the Aperiodic CSI Reporting Mode 3-1, as defined in


Section 7.2.1 of [28]. Other reporting modes are also defined by the 3GPP

[28].

Aperiodic CSI Reporting Mode 3-1 consists of a single RI report, a single

PMI report and several subband CQI reports. The size of a subband is

specified by the 3GPP standard to be 6 PRBs for a system bandwidth of 10

MHz in [28]. Thus, a single CSI report from the user will contain one value

for the RI, one value for the PMI and nine values for the CQI (one CQI

value per subband). In this thesis, we assume that the periodicity of the

CSI reports is set to 5 ms. The RI is typically a statistic that is reported less

frequently than the PMI or the CQI and its periodicity is set to 20 ms. For

the subband CQI reports, we assume non-ideal channel estimation, which is

obtained by modelling a noisy sample of the interference covariance matrix

in the equalizer vector using the complex Wishart distribution [29].
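As a simple illustration of this reporting granularity (a sketch under the assumptions stated above, not the 3GPP encoding; the dictionary layout is ours), a single Mode 3-1 report for a 10 MHz carrier can be assembled as follows:

import math

# Sketch: contents of an Aperiodic CSI Reporting Mode 3-1 report for a
# 10 MHz carrier (50 PRBs, subband size of 6 PRBs), as described above.
NUM_PRBS = 50
SUBBAND_SIZE_PRBS = 6
num_subbands = math.ceil(NUM_PRBS / SUBBAND_SIZE_PRBS)   # = 9 subbands

csi_report = {
    "RI": 1,                     # single rank indication
    "PMI": 0,                    # single wideband precoder index
    "CQI": [0] * num_subbands,   # one CQI value per subband
}
assert num_subbands == 9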


Chapter 5

Simulation Results and Analysis

Some of the key targets specified by the NGMN Alliance for 5G networks

can be broadly summarized as providing consistent user experience and en-

hanced Quality of Experience. These targets are defined and outlined in [5].

As an example, one target is for the network to be able to provide a certain

user throughput for 95% of the time across 95% of the coverage area. This

is typically referred to as the 5th percentile of the Cumulative Distribution

Function (CDF) of the user throughput. We also look at the average user

throughput as an indicator of the overall user experience.

In this chapter, our simulation assumptions and results are described,

including insights gained from our results. So far, the works in the field of video transmission over wireless networks that we are aware of use Full Buffer methodologies

to evaluate performance. The main problem with Full Buffer methodologies

is that they only capture performance metrics (for instance user throughput

and served cell throughput) in a range where the network is operating at full

load. Since cellular networks experience different types of loads depending

on the time of the day, it is useful for carriers to have a more complete


view of performance at different traffic load points. One motivation for

using traffic models where user arrivals are modelled according to a Poisson

distributed random process is to capture performance at traffic load points

that are meaningful to carriers.

Intuitively, we expect that performance will be good at low traffic load

points because there is a small number of users in the network, which results

in low interference and high user throughputs. This ensures that users that

enter the network are served quickly and leave quickly. This scenario is

not attractive to carriers because although the Quality of Experience is

excellent, they are earning little revenue due to the small number of users.

Conversely, we expect that performance will be bad at high traffic load points

because there is a large number of users in the network, which results in high

interference and low user throughputs. This scenario is also unattractive to

carriers because although revenues are high due to the large number of

users accessing their spectrum, the Quality of Experience is mediocre and

this will lead to customer dissatisfaction. The desirable scenario for carriers

is intermediate traffic loads: where the number of users on the network leads

to a reasonable revenue for the carrier; the resulting moderate interference

leads to acceptable throughputs and users can enjoy reasonably good Quality

of Experience.

5.1 Simulation Assumptions

In this section, we outline some of the assumptions made in our simulations.

The main components of our system model are described in Chapter 4. Here,


we describe some of the other assumptions made. We assume that the base

station in our LTE-A network is a Media Aware Network Element (MANE).

A MANE is a network node which has the ability to parse an encoded

video bitstream and identify specific LDUs. Since our LTE-A base stations

can parse video bitstreams, they can specifically look for each user’s LDUs

and keep track of the RefCount field in the LDUs. Using the information

carried by the RefCount field, the LTE-A system can then keep track of the

referenced frames being sent to each video user, using exponential smoothing

update equations (3.14) and allocate resources accordingly. In the simulation

of our proposed scheduling framework, the following parameter values are

used: λ = 25, µ = 1, T = 25 and cmin = 50.

Our motivation in this work is to model a realistic 4G/beyond-4G sys-

tem. Although several research projects on 5G have been initiated, there is

no air interface specified yet for a 5G system. Therefore we use a 4G air

interface with as many up-to-date features as possible to do our performance

evaluation using metrics which have been proposed for 5G systems. For our

LTE-A system, we decide to model a 4x2 MIMO system. We also assume the

use of Single User Multiple Input Multiple Output (SU-MIMO), as opposed

to Multi User Multiple Input Multiple Output (MU-MIMO). It is shown

in Chapter 7 of [27] that in MIMO systems, the availability of both multi-

ple transmit antennas and multiple receive antennas can provide additional

spatial dimensions for communication. These additional degrees of freedom

can be exploited by spatially multiplexing different data streams onto the

MIMO channel. The main difference between SU-MIMO and MU-MIMO

is that SU-MIMO will focus on sending multiple data streams towards the


same user whereas MU-MIMO will focus on sending data streams towards

spatially separate users. We also assume the use of Transmission Mode 10

and assume the use of 4-Tx Release 12 Precoding Matrices [28]-[30]. Trans-

mission Mode 10 is a mode where the system allows the use of so-called

non-codebook based precoding with up to 8 layers. It is beyond the scope

of this thesis to describe the physical layer procedures and processing fea-

tures that are relevant for the operation of Transmission Mode 10. More

detailed descriptions of Transmission Mode 10 and the associated physical

layer procedures are provided in [31]-[28]. For system-level simulations, we

need link-to-system models that can accurately translate an instantaneous

Signal to Noise Ratio (SNR) value into a corresponding instantaneous block

error rate value. Several methods exist in the literature such as Exponen-

tial Effective SNR Mapping (EESM) [32] and Mutual Information Effective

SNR Metric (MIESM) [33]. In this thesis, we use EESM. The basic idea

behind EESM is as follows: let us assume a user received a transmission

over Nsc subcarriers with instantaneous SNR value γk at the kth subcarrier.

The instantaneous effective SNR γeff using EESM is obtained as:

\gamma_{eff} = -\beta \ln\left(\frac{1}{N_{sc}} \sum_{k=1}^{N_{sc}} \exp\left(-\frac{\gamma_k}{\beta}\right)\right), \qquad (5.1)

where β is a correction parameter used for tuning a specific modulation.

The resulting γeff is then mapped to a corresponding block error rate. The

values of the β parameters depend on the modulation and the code rate,

e.g. β = 1.49 for Quaternary Phase Shift Keying (QPSK) with a code rate of 1/3 or β = 7.68 for 16-Quadrature Amplitude Modulation (16-QAM) with a


code rate of 4/5. These values can be found in Table 19.13, Chapter 19 of [20]. Several sources exist for the values of β that can be applied in an LTE or LTE-A system; for our simulations we use the β values given in [32]. Parameter values for our LTE-A simulations are summarized in Table 5.1 and reflect those used in study items that 3GPP technical groups have used for 3GPP Release 12.

Table 5.1: LTE-Advanced Parameters

Parameter                  Value
System Bandwidth           10 MHz
Channel Model              Spatial Channel Model [20]
Scenario                   Urban Macro-cell [24]
Carrier Frequency          2 GHz
Link-to-System Interface   Exponential ESM
Traffic Model              Finite Buffer
Receiver Type              Wishart-IRC [29]
MIMO scheme                4x2 SU-MIMO
Transmission Mode          TM 10
Precoding Codebook         4-Tx Release 12 [30]
CSI Reporting Mode         Aperiodic Mode 3-1 [28]
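For reference, a minimal Python/numpy sketch of the EESM mapping in (5.1) is given below; the per-subcarrier SNRs are assumed to be in linear scale and β is one of the modulation- and code-rate-specific constants discussed above (the function name is ours).

import numpy as np

# Sketch of the EESM mapping in (5.1): collapse the per-subcarrier SNRs
# of one transmission into a single effective SNR. SNRs are linear (not
# dB); beta depends on the modulation and code rate, e.g. the values
# taken from [32] in our simulations.
def eesm_effective_snr(snr_per_subcarrier, beta):
    snr = np.asarray(snr_per_subcarrier, dtype=float)
    return -beta * np.log(np.mean(np.exp(-snr / beta)))

# Example: QPSK with code rate 1/3 (beta = 1.49) over a few subcarriers.
gamma_eff = eesm_effective_snr([1.2, 3.5, 0.8, 2.0], beta=1.49)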

As discussed in Section 4.2.2, we use traffic models that generate user

arrivals according to Poisson processes. The traffic assignment probability

is 0.5 each and in our simulations, the user arrival rates for the two traffic

models, i.e. BE and video, are equal. This ensures that the average number

of users generated for each traffic type is the same. The length of the

simulation is chosen such that we generate at least 8000 users for each traffic

type. This was done to ensure that all the metrics that are reported in this

thesis are obtained within a 95% confidence interval of ±10% around the

mean value.


We use offered load per sector and Resource Utilization (RU) as our

reference points. This is because for finite buffer traffic models the 3GPP

consortium decided to evaluate performance based on the RU values a cel-

lular network goes through and we decided to align our methodology with

those assumptions. RU is defined as the ratio of the aggregated number of

radio resource blocks allocated for data traffic to the total number of ra-

dio resource blocks in the system bandwidth available for data traffic [19].

We first ran simulations using the Proportional Fair scheme and determined the offered loads corresponding to RU values between 40% and 70%. Then we ran simulations using the proposed scheme for those offered loads and compared the resulting performance and QoE for both BE users and video

users. These offered loads are listed in Table 5.2. It can be seen that for

the PF scheme, the offered load per sector values range between 5.88 Mbps

per sector and 6.94 Mbps per sector. The 95% confidence interval for the

reported RU values is within ±3.2% of the reported values.

For video users, we report the Active Download Time (ADT), the satis-

fied video user percentage and the packet loss ratio of Clean Random Access

NAL units. A user is considered to be satisfied if its MOS is greater than

4. Conversely a user is considered to be unsatisfied if its MOS is lower than

3. Nightingale [9] showed that even a slight degradation in radio conditions,

i.e. a packet loss ratio of 3%, is enough to make the Quality of Experi-

ence mediocre. Clean Random Access NAL units carry the encoded video

data of I-Frames and represent the largest percentage of the bitstream in

terms of bit rate. Since the decoding of the whole video sequence is basi-

cally reliant on the correct decoding of these LDUs, the packet loss ratio of


these LDUs provides a good indication of how much video content becomes

non-viewable.

For BE users, we report the absolute values of the average user through-

put and the 10th-percentile of the user throughput CDF. We also report the

average user throughput in the outer region of every cell. The reason we

choose to report the 10th-percentile instead of the 5th-percentile is that

much longer simulations would be required to generate results within a 95%

confidence interval. As an example, simulations generating on average 16000

users (8000 video users and 8000 BE users respectively) take between 48 to

72 hours of run time. In order to generate results where the 95% confidence

intervals of the 5th-percentile of the user throughput are within ±10%, we

would need to generate possibly over 30000 users. This could potentially

lead to simulation run times of over a week, which is highly impractical. In

this thesis, we will refer to the 10th-percentile of the user throughput CDF

as the coverage user throughput. A given BE user’s throughput is calculated

as the ratio of the total volume of the transferred data to the download time.

For BE users, the download time is defined as the difference between the

time instant of the last packet correctly received by the user and the time

instant of the first packet transmitted to the user.

5.2 Simulation Results and Discussion

In this section, we present our simulation results and discuss the main find-

ings. We will present our results for video users followed by those for BE

users.


Table 5.2: Offered Load and corresponding Resource Utilization

Scheme    Offered Load (Mbps / Sector)    Resource Utilization (%)
PF        5.88                            40.0
          6.27                            50.0
          6.58                            60.0
          6.94                            70.0
FRA-PF    5.88                            35.4
          6.27                            41.9
          6.58                            47.8
          6.94                            53.7

5.2.1 Results for video users

For the performance evaluation of video users, we consider two metrics. The

first metric that we introduce is the ADT, which is the time a video user

spends actively downloading video content. The second metric is the MOS

provided by users about their viewing experience.

The 95% confidence intervals for the active download time are within

±6% of the reported values. Fig. 5.1 shows the active download times video

users spend downloading video content while they are in the network. Using

the Proportional Fair scheme, video users spend between 3.5 seconds and

8 seconds downloading video content (for offered loads between 5.9 Mbps

per sector and 6.9 Mbps per sector respectively). These numbers can be

explained by the fact that with the Proportional Fair algorithm tries to be

fair to all users, video and BE alike. Resources end up being shared by all

users. Using our proposed scheme, video users are given higher importance

if their transmission queues carry referenced frames. This is due to the

barrier functions we introduced in our scheduling framework.

Figure 5.1: Video users' active download time (x-axis: Offered Load [Mbps / Sector]; y-axis: Video Active Download Time [s]; curves: PF, FRA-PF)

Therefore,

if a base-station is serving both video and BE users, video users will be

prioritized over BE users as long as they have referenced frames to receive.

Resource allocation is focused on video users first, which results in them

being served more quickly, as Fig. 5.1 shows. For offered loads between 5.9

Mbps per sector and 6.9 Mbps per sector, video users spend between 2.2

seconds and 4.2 seconds downloading video content. This is very significant

as any time video users do not spend downloading video content means that

the resources available at that time can be allocated to BE users.

Possibly the most important aspect in the performance evaluation of

video services is the MOS which reflects the quality of the viewing experience

from the users’ perspective. We are going to look into the MOS that users

would give based on the Packet Loss that they experience, which we denote


as the satisfied video user percentage.

Figure 5.2: Satisfied Video User Percentage (x-axis: Offered Load [Mbps / Sector]; y-axis: Satisfied Video User Percentage [%]; curves: PF - MOS > 4, FRA-PF - MOS > 4)

satisfied video user percentage results are within ±8% of the reported values.

It was shown in [9] that the MOS is very sensitive to the Packet Loss Ratio

(PLR). The findings in [9] were that PLRs below 1.5% correspond to a

MOS above 4 (perceptible degradation but not annoying). Assuming that

a video user’s MOS is only affected by the PLR it experiences, we can state

that the QoE of a video user will be high if the PLR is below 1.5% (i.e.,

its MOS will be greater than 4, and the video user will be satisfied). The

QoE will be low if the PLR is higher than 1.5% (i.e., its MOS will be lower

than 4, and the video user will experience significant degradation). Fig. 5.2

shows the results in terms of video user percentage for which the MOS is

greater than 4.
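Under the assumption stated above that a video user's MOS depends only on its PLR, the satisfaction criterion behind Fig. 5.2 reduces to a simple threshold test, sketched below for illustration only.

# Sketch: classify a video user as satisfied using the 1.5% PLR threshold
# from [9], under the assumption that MOS depends only on the packet loss
# ratio (plr given as a fraction, e.g. 0.012 = 1.2%).
def is_satisfied(plr):
    return plr < 0.015      # PLR below 1.5%  <=>  MOS above 4

assert is_satisfied(0.012) and not is_satisfied(0.03)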

Our proposed FRA-PF scheme leads to a higher percentage of satisfied


video users, which is expected as video users have unconditional priority

over BE users. As can be seen from Fig. 5.2, for offered loads around 5.9

Mbps per sector, both PF and FRA-PF schemes are able to satisfy over

90% of video users. However the performance of the PF scheme degrades

more quickly as the load increases: for offered loads around 6.8 Mbps per

sector, the FRA-PF scheme can satisfy over 80% of video users whereas the

PF scheme satisfies less than 60% of video users.

Another aspect that we look into is the percentage of Clean Random

Access (CRA) LDUs lost. I-Frames are typically carried inside CRA LDUs

and they represent the most significant portion of the bitstream in terms

of bits. Because of the way the video compression process is defined in

the H.265/HEVC standard, I-Frames are the frames that are referenced the

most throughout a video sequence and the loss of an I-Frame causes error

propagation within the decoding process at the receiver end. We aligned

our settings for the Intra-Period so that two I-Frames are one second apart

from each other [10].

Intuitively, the loss of an I-Frame causes the loss of about one second

of video content to the end user because all subsequent B-Frames reference

an I-Frame, directly or indirectly. Those B-Frames could, strictly speaking,

still be usable by the decoder to produce a picture. The problem is that

those B-Frames could potentially be incomplete, i.e. some sections could

be missing Luminance or Chrominance sample information. The whole idea

behind H.265/HEVC is to use motion compensated prediction in as many

frames as possible. Fig. 5.3 shows the results obtained for CRA LDU loss

ratio. Since the proposed FRA-PF scheme is able to locate referenced frames


and transmit them with higher priority, FRA-PF has a lower CRA LDU loss ratio.

Figure 5.3: CRA LDU Loss Ratio (x-axis: Offered Load [Mbps / Sector]; y-axis: CRA LDU Loss Ratio [%]; curves: PF, FRA-PF)

Let us consider the bitstream of the video sequence FourPeople as an

example. The original video bitstream contains 9 CRA LDUs and 600 LDUs

in total. Since we wrap the bitstream around 6 times, this results in a total

of 54 CRA LDUs for a given user. With the PF scheme, the CRA LDU

loss ratio goes from 1.6% to 9.0% out of the total 54 CRA LDUs as the

offered load changes from 5.9 to 6.9 Mbps per sector. This corresponds to

at least 1 LDU or at worst 5 LDUs. For offered loads near 7 Mbps per

sector, this means that as much as 5 seconds of video content becomes non-

viewable because of the loss of CRA LDUs. With the proposed FRA-PF

scheme, the CRA LDU loss ratio goes from 0.1% to 1.18% out of the total

54 CRA LDUs as the offered load changes from 5.9 to 6.9 Mbps per sector.


This means that in either case up to 1 LDU is lost. For offered loads near

7 Mbps per sector, this means that as much as 1 second of video content

becomes non-viewable because of the loss of CRA LDUs. This highlights

how the proposed FRA-PF scheme provides the decoder with the reference

frames to facilitate the task of decoding and also how the proposed scheme

locates the packets with greater importance for the H.265/HEVC decoder.

Providing referenced frames with greater priority helps maintain continuous

playback at the end user and contributes to enhancing the viewing experience

of video users. From the user’s perspective, non-continuous video playback

will always constitute a source of dissatisfaction. Our proposed FRA-PF

scheme reduces the loss of packets carrying referenced frames, which will

help maintain continuous playback.

5.2.2 Results for Best Effort users

For BE users, we report the absolute values of the average throughput and the coverage throughput (which we defined in Section 5.1). The 95% confidence intervals of the average throughput and coverage throughput are within ±3% and ±9% respectively of the reported values.

The average throughput is plotted as a function of the offered load in

Fig. 5.4. The offered load values of interest to us are in the range of 5.9

to 6.9 Mbps per sector. From Fig. 5.4, it can be seen that the with the

PF scheduling scheme, BE users can expect to get throughputs on average

between 15 Mbps and 10 Mbps. With our proposed FRA-PF scheme, users

can expect to get throughputs on average between 16 Mbps and 12 Mbps.

This is explained by the fact that our proposed FRA-PF scheme serves video


traffic more quickly, as shown in Fig. 5.1.

Figure 5.4: Average throughput for Best Effort users (x-axis: Offered Load [Mbps / Sector]; y-axis: Throughput [Mbps]; curves: PF, FRA-PF)

As video users are served more

quickly, radio resources then become available to BE users. The availabil-

ity of more radio resources helps BE users leave the network more quickly

and therefore experience higher throughputs. Put simply: allocating the

resources to the right users at the right time will benefit all users. This is

shown by the results we have obtained in terms of the Resource Utilization

by the network and the average throughputs that users obtain.

Fig. 5.5 shows the coverage throughput results for offered load values

between 5.9 and 6.9 Mbps per sector. As expected, the coverage throughput

is much lower compared to the average throughput. In an LTE-A system us-

ing the PF scheduling scheme, users can expect to get coverage throughputs

between 4.9 Mbps and 1.5 Mbps. In an LTE-A system using our proposed

FRA-PF scheme, for the same offered load values, users can expect to get


coverage throughputs between 6.1 Mbps and 3.6 Mbps.

Figure 5.5: Coverage throughput for Best Effort users (x-axis: Offered Load [Mbps / Sector]; y-axis: Throughput [Mbps]; curves: PF, FRA-PF)

This result is a lot

more significant than the average user throughput we have shown earlier.

It shows that 90% of users can expect a throughput of at least 3.6 Mbps,

which is more than double the throughput with the PF scheduling scheme.

Because we model a BE type of service, this improvement in throughput

translates into latency reduction since the volume of data to download is

fixed. For other services, e.g. Web Browsing, higher throughput can trans-

late into noticeably faster loading times and enhanced Quality of Experience.

We stated that the 95% confidence intervals of the coverage throughput are

within ±9% of the reported values; this is due to the fact that the statistics

of users that experience relatively low Signal to Interference and Noise Ratio

(SINR) are very sensitive. We model random user arrivals in our simula-


tions, which leads to inter-cell interference that varies with time. For users experiencing low SINR, even slight improvements or degradations can have very significant impacts on the eventual throughput they experience.

Figure 5.6: Illustration of the outer 10% of the coverage area

Finally, we examine the statistics for users that are geographically lo-

cated within the area covering the outer 10% of the coverage area, as de-

picted in Fig. 5.6; we will call this region the cell-edge region. The area A of a hexagon is calculated as A = 2√3 a², where a is the apothem of the hexagon. Using a hexagonal network deployment as shown in Fig. 4.1 and knowing that the inter-site distance is equal to 500 meters, we can easily find that the apothem size is then 250 meters. The users in the cell-edge region are those who lie outside the inner hexagon, i.e. the hexagon enclosing the inner 90% of the cell area, whose apothem is a′ = a√0.9 ≈ 237 meters. The results are shown in Fig. 5.7. For offered

loads ranging from 5.9 to 6.9 Mbps per sector, users in the cell-edge region

experience throughputs ranging from 11.4 to 8 Mbps with the baseline PF

scheme. With our proposed FRA-PF scheme, users in the cell-edge region

experience throughputs ranging from 12.2 to 9.9 Mbps per sector for the

same offered load values. The trend is consistent with those for the average

throughput and coverage throughput. However it is interesting to note that

the average throughput of users in the cell edge region is higher than the


coverage throughput values reported in Fig. 5.5.

Figure 5.7: Average BE user throughput in Cell-Edge region (x-axis: Offered Load [Mbps / Sector]; y-axis: Throughput [Mbps]; curves: PF, FRA-PF)

This is because we generate

users randomly over time, which leads to inter-cell interference varying over

time. As a result, a user located in the cell-edge region but not interfered by

neighbouring cells will still experience reasonably high throughputs, as Fig.

5.7 shows. Usual Full Buffer simulation methodologies operate in a range where every cell in the network is transmitting at all times; therefore every cell is always interfering with users in neighbouring cells. As

a result, only an individual user’s radio conditions will determine whether

high throughputs are achievable or not. Users located closer to their serving

cell would experience lower path loss, which would translate to higher average

SNR and higher throughput. Using Finite Buffer simulation methodologies,

this is no longer true due to users arriving randomly in the network and be-


ing subject to the inter-cell interference that is present during the time the

user is in the network. Of course, path loss always plays a significant role in

dictating overall performance but this is now tempered by the fact that users

arrive randomly in the network, which affects the inter-cell interference.

In order to provide better QoE to all users, resource allocation schemes

should target users that require the lowest amount of resources in order to be

satisfied. This will help the system deliver better user experience to all users

in the network. The QoE of all users improves thanks to the departure of

other users, and our proposed scheme achieves this by serving video users faster.

This benefits all users in the network and helps provide a more consistent

user experience across the whole network, which is in line with the objectives

of future 5G networks.


Chapter 6

Conclusions and Future Work

This chapter summarizes the main contributions of the thesis and provides

some suggestions for future work.

6.1 Contributions

In this thesis, we addressed the topic of transmitting video content in 4G and

beyond-4G networks by exploiting information about the way H.265/HEVC

operates. Using knowledge of the coding structures, reference picture lists

and the process through which the H.265/HEVC encoder transmits this

information to the decoder, we proposed a cross-layer scheduling frame-

work which allocates resources to video users that need to receive referenced

frames.

Our performance evaluation of H.265/HEVC video-content delivery was

made in a mixed-traffic environment using random user arrivals and finite-

buffer traffic models. To the best of our knowledge, there is no similar work

reported in the literature. Results showed that both video and BE users

benefit from the proposed scheduling framework. Video users benefit from


reduced losses on packets carrying referenced frames while BE users benefit

from improved throughput. The improvement for video users is achieved by

tracking referenced frames and focusing resource allocation towards video

users whenever their transmission queues have packets carrying referenced

frames in the video sequence. As long as there are such frames in the

transmission queue of a video user, our proposed framework prioritizes these users and allocates resources to them. This allows video users to download

video content more quickly and allows BE users to access resources more

quickly, leave the network more quickly and enjoy higher throughputs on

average as a result.

As we go towards 5G networks, the expectation from cellular networks

is that they provide a consistent user experience across the coverage area.

Results showed that 90% of BE users can expect to get between 1 Mbps

and 2 Mbps higher throughput using FRA-PF, which can potentially be the difference between an excellent and a mediocre Quality of Experience for the user. In addition, it was found that BE users in the cell-edge

region of each cell actually experience much higher throughputs than the

10th percentile of the user throughput CDF. This shows that users that

experience lower throughputs are not necessarily located in the cell-edge

region but can in fact be much closer to the base-station.

6.2 Future Work

Several future directions can be pursued, depending on which side of the

problem one wishes to focus on.


If one were to focus on the communications side, one direction for future

work could be to use an air-interface that is actually going to be used in

5G systems. In this work we considered the use of a LTE-A air interface

with some 3GPP Release-12 features such as the Release 12 4-Tx Linear

Precoding. This is because at the time the work was undertaken, 3GPP

was still working on Release 13 and no air-interface had yet been proposed

for 5G systems so we did not have the opportunity to evaluate performance

for such systems. Instead we focused more on performance evaluation using

realistic traffic models over an up-to-date LTE-A air-interface and looked at

the performance metrics to be used in 5G networks.

In our performance evaluation, we did not compare our proposed FRA-

PF scheme with a scheduling scheme that would strictly prioritize users

requesting video services over best effort users. It would be interesting

to see whether such a scheduling scheme achieves improvements for both

video users and best effort users. We also did not consider any admission

control policies in our traffic models, which would regulate traffic arrival in

high load situations and can have a significant impact on user experience.

Another direction for future work could be to look into traffic offloading

schemes. Since 3GPP Release 8, the 3GPP community has been introducing

support for heterogeneous networks. Smaller base-stations can be deployed

in the cell-edge region in order to provide coverage to users with stringent

QoS or QoE requirements. For example: macro base-stations can offload

specific users in the coverage area of small base-stations in order to provide

better QoE to its own users, and therefore provide a more consistent user

experience across the whole network, something that 5G networks will be


required to provide. The more general problem to address is to design

scheduling frameworks which will provide the best user experience and at

the same time maximize revenue for carriers.

If one were to focus on the video encoding or video compression side,

one direction for future work could be the actual evaluation of subjective

quality. No subjective quality testing was performed in our work. The

major stumbling block that needs to be overcome is to get the reference

implementation of the H.265/HEVC decoder to produce a viewable video

sequence of a bitstream with missing LDUs. The reference decoder imple-

mentation is not designed to be robust against any form of packet loss and

aborts the decoding process at the slightest error or absence of an LDU. If we

can reconstitute samples of bitstreams with missing LDUs and output the

corresponding video sequence, it would be possible to do subjective quality

testing and gain insights into how the loss of specific packets impacts the

viewing experience. This will give much clearer insights into how packet loss

and Quality of Experience are related for video services, and more specifi-

cally how much the loss of packets carrying I-Frames hurts the Quality of

Experience.


Bibliography

[1] Cisco, “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2014-2019,” February 2015.

[2] ITU-T, Advanced Video Coding for generic audio visual services - Recommendation ITU-T H.264. February 2014.

[3] ITU-T, High Efficiency Video Coding - Recommendation ITU-T H.265. April 2013.

[4] M. Wien, High Efficiency Video Coding - Coding Tools and Specifications. Springer, May 2014.

[5] NGMN Alliance, “NGMN 5G White Paper,” February 2015.

[6] M. Rugelj, U. Sedlar, M. Volk, J. Sterle, M. Hajdinjak, and A. Kos, “Novel Cross-Layer QoE-Aware Radio Resource Allocation Algorithms in Multiuser OFDMA Systems,” IEEE Transactions on Communications, September 2014.

[7] S. Singh, O. Oyman, A. Papathanassiou, D. Chatterjee, and J. G. Andrews, “Video Capacity and QoE Enhancements over LTE,” IEEE International Conference on Communications, June 2012.


[8] M. Salem, P. Djukic, J. Ma, and M. Hawryluck, “QoE-Aware Joint Scheduling of Buffered Video on Demand and Best Effort Flows,” IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, September 2013.

[9] J. Nightingale, Q. Wang, C. Grecos, and S. Goma, “The Impact of Network Impairment on Quality of Experience (QoE) in H.265/HEVC Video Streaming,” IEEE Transactions on Consumer Electronics, May 2014.

[10] F. Bossen, “Common HM test conditions and software reference configuration,” April 2012.

[11] G. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Magazine, pp. 74–90, November 1998.

[12] T. Schierl, M. M. Hannuksela, Y.-K. Wang, and S. Wenger, “System Layer Integration of High Efficiency Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 1871–1884, December 2012.

[13] Y.-K. Wang, R. Even, T. Kristensen, and R. Jesup, RTP Payload Format for H.264 Video. IETF, May 2011.

[14] Y.-K. Wang, Y. Sanchez, T. Schierl, S. Wenger, and M. Hannuksela, RTP Payload Format for H.265/HEVC Video. IETF, August 2015.


[15] F. Kelly, “Charging and rate control for elastic traffic,” European Transactions on Telecommunications, pp. 33–37, 1997.

[16] D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming. Springer, 3rd ed., 2008.

[17] P. A. Hosein, “QoS Control for WCDMA High Speed Packet Data,” IEEE International Workshop on Mobile and Wireless Communications Network, 2002.

[18] R. Srinivasan, J. Zhuang, L. Jalloul, R. Novak, and J. Park, “IEEE 802.16m Evaluation Methodology Document (EMD),” July 2008.

[19] “3GPP TR 36.814 v9.0.0 - Technical Specification Group Radio Access Network - Evolved Universal Terrestrial Radio Access (E-UTRA) - Further advancements for E-UTRA physical layer aspects,” March 2010.

[20] F. Khan, LTE for 4G Mobile Broadband. Cambridge University Press, 2009.

[21] “HM 14.0, HEVC Test Model Reference Implementation.” https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/. Accessed: 2014-09-30.

[22] “IMTAphy, LTE/LTE-Advanced system level simulator.” http://www.lkn.ei.tum.de/personen/jan/imtaphy/index.php. Accessed: 2014-05-24.


[23] “openWNS, open Wireless Network Simulator, open source system level simulation platform for performance evaluation and comparison of wireless and multi-cellular mobile communication systems.” https://launchpad.net/openwns. Accessed: 2014-05-24.

[24] “3GPP TR 25.814 v7.1.0 - Technical Specification Group Radio Access Network; Physical layer aspects for evolved Universal Terrestrial Radio Access (UTRA),” December 2006.

[25] ITU-R, “Guidelines for evaluation of radio interface technologies for IMT-Advanced,” December 2009.

[26] “3GPP TR 25.996 v9.0.0 - Spatial channel model for Multiple Input Multiple Output (MIMO) simulations,” December 2009.

[27] D. Tse and P. Viswanath, Fundamentals of Wireless Communications. Cambridge University Press, March 2010.

[28] “3GPP TS 36.213 v12.2.0 - Technical Specification Group Radio Access Network - Evolved Universal Terrestrial Radio Access (E-UTRA) - Physical Layer Procedures,” June 2014.

[29] “3GPP TR 36.829 v11.1.0 - Technical Specification Group Radio Access Network - Enhanced performance requirement for LTE User Equipment (UE),” December 2012.

[30] A. Roessler, J. Schlienz, S. Merkel, and M. Kottkamp, “LTE-Advanced (3GPP Rel.12) Technology Introduction - White Paper,” June 2014.


[31] “3GPP TS 36.211 v12.2.0 - Technical Specification Group Radio Access Network - Evolved Universal Terrestrial Radio Access (E-UTRA) - Physical channels and modulation,” June 2014.

[32] J. Olmos, A. Serra, S. Ruiz, M. Garcia-Lozano, and D. Gonzalez, “Exponential Effective SIR Metric for LTE Downlink,” IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, September 2009.

[33] W. Lei, T. Shiauhe, and M. Almgren, “A fading-insensitive performance metric for a unified link quality model,” IEEE Wireless Communications and Networking Conference, April 2006.
