Study of TCP Available Bandwidth Using NS2 and Its Forecasting Based on Genetic Algorithm

Cristian Hernandez Benet

Francisco Domingo Sanchez Vizcaino

Faculty of Health, Science and Technology

Computer Science

30 ECTS

Andreas Kassler and Enrica Zola

Donald Ross

140603

University

This thesis is submitted in partial fulfilment of the requirements

for the Master’s degree in Computer Science. All material in this

thesis which is not our own work, has been identified and no

material is included for which a degree has previously been

conferred.

Approved, June 03, 2014

Opponent: Jonathan Vestin

Advisor: Andreas Kassler

Co-advisor: Enrica Zola

Examiner: Donald Ross

Abstract

The available bandwidth in a bandwidth-limited medium such as the wireless medium is a topic of considerable research interest, and the Transmission Control Protocol (TCP) is one of the most widely used transport protocols on the Internet. TCP traffic competing for limited bandwidth is the most typical scenario in Wireless Local Area Networks (WLANs). This thesis places the study in the 2.4 GHz frequency band, where Primary Users can be present and modify the behaviour of the WLAN medium. Because this band is unlicensed, congestion is considerable. Several recent studies of this band concern Cognitive Users operating in it. This thesis, however, studies the impact of Primary Users on the TCP available bandwidth in a classic IEEE 802.11g WLAN. The second part of the dissertation goes a step further: a tool that forecasts the TCP available bandwidth for WLANs, based on a genetic algorithm, has been developed. The tool estimates the future available bandwidth by finding the function that best fits the future behaviour of the network; a genetic algorithm programmed specifically for this purpose finds this function. A significant number of tests have been carried out in the study. The TCP available bandwidth study shows a relation between MAC busy time and available bandwidth in some cases. In addition, it shows that the TCP available bandwidth increases if the idle periods are longer. The forecasting achieves reliable results, with limitations in some specific scenarios.

Contents

CHAPTER 1 INTRODUCTION ................................................................................................................ 1

1.1 PROJECT OVERVIEW ..................................................................................................................................... 1

1.2 MOTIVATION ............................................................................................................................................. 5

1.3 OBJECTIVES ............................................................................................................................................... 5

1.4 METHODOLOGY.......................................................................................................................................... 5

1.5 RESULTS .................................................................................................................................................... 5

1.6 ORGANIZATION OF THE DISSERTATION ............................................................................................................. 5

CHAPTER 2 BACKGROUND AND RELATED WORK .................................................................................. 7

2.1 INTRODUCTION........................................................................................................................................... 7

2.2 BACKGROUND ............................................................................................................................................ 8

2.2.1 Wireless network and protocols ................................................................................................... 8

2.2.1.1 IEEE 802.11g ................................................................................................................................................... 8

2.2.1.1.1 IEEE 802.11 MAC protocol ..................................................................................................................... 8

2.2.1.1.2 Throughput at MAC layer ...................................................................................................................... 9

2.2.1.1.3 Physical layer ........................................................................................................................................ 10

2.2.1.2 Transmission Control Protocol .................................................................................................................... 10

2.2.1.2.1 Throughput TCP .................................................................................................................................... 12

2.2.1.2.2 TCP Retransmission TimeOut .............................................................................................................. 12

2.2.2 Genetic Algorithm ...................................................................................................................... 13

2.2.2.1 Concept of genetic algorithm ...................................................................................................................... 13

2.2.2.2 Elements and biological translation ........................................................................................................... 13

2.2.2.2.1 Chromosome ........................................................................................................................................ 14

2.2.2.2.2 Genes .................................................................................................................................................... 14

2.2.2.2.3 Genotype and phenotype .................................................................................................................... 14

2.2.2.2.4 Generation ............................................................................................................................................ 15

2.2.2.2.5 Population ............................................................................................................................................. 15

2.2.2.2.6 Summary of GA vocabulary .............................................................................................................. 16

2.2.2.3 Genetic algorithm structure ........................................................................................................................ 16

2.2.2.4 Operations of the genetic algorithm .......................................................................................................... 17

2.2.2.4.1 Initial population .................................................................................................................................. 17

2.2.2.4.2 Fitness Function .................................................................................................................................... 18

2.2.2.4.3 Selection ................................................................................................................................................ 18

2.2.2.4.4 Reproduction ........................................................................................................................................ 19

2.2.3 Time series and forecasting ........................................................................................................ 20

2.2.3.1 Time series .................................................................................................................................................... 20

2.2.3.2 Phase space and attractor construction ..................................................................................................... 21

2.2.3.2.1 Time delay ............................................................................................................................................. 23

2.2.3.2.2 Embedding dimension ......................................................................................................................... 23

2.2.3.3 Forecasting ................................................................................................................................................... 24

2.2.3.4 Forecasting methods ................................................................................................................................... 24

2.2.4 Network Simulator NS2 .............................................................................................................. 26

2.2.4.1 Architecture of NS2 ...................................................................................................................................... 26

2.2.4.2 PHY layer and MAC layer parameters ........................................................................................................ 27

2.2.4.3 NS2-CRAHN implementation ...................................................................................................................... 27

2.2.4.3.1 Modifications of the MAC layer .......................................................................................................... 28

2.2.4.3.2 Primary Users Activity block ................................................................................................................ 28

2.2.4.3.3 Spectrum Manager ............................................................................................................................... 30

2.3 RELATED WORK ........................................................................................................................................ 31

2.3.1 Traffic modelling and forecasting using genetic algorithms for next-generation cognitive radio applications .................... 31

2.3.2 End-to-end Protocols for Cognitive Radio Ad Hoc Networks: An Evaluation Study .................... 32

2.4 SUMMARY ............................................................................................................................................... 33

CHAPTER 3 DESIGN AND IMPLEMENTATION ...................................................................................... 35

3.1 INTRODUCTION......................................................................................................................................... 35

3.2 DESIGN OF THE WIRELESS SCENARIO ............................................................................................................. 36

3.2.1 Scenario 1: Primary users ........................................................................................................... 36

3.2.2 Scenario 2: Secondary users with real traffic inside the sensing area ........................................ 37

3.2.3 Scenario 3: Secondary users with real traffic outside the sensing area and only one affected .. 38

3.3 SIMULATION IMPLEMENTATION IN NS2 ........................................................................................................ 38

3.3.1 General simulation outline ......................................................................................................... 39

3.3.2 Configuration of NS2 for IEEE 802.11g ....................................................................................... 41

3.3.3 MAC implementation in NS2 ...................................................................................................... 42

3.3.3.1 Implementation for Primary Users ............................................................................................................. 43

3.3.3.1.1 PU spectrum management .................................................................................................................. 43

3.3.3.1.2 PU activity detection ............................................................................................................................ 45

3.3.3.1.3 PU activity pattern and busy time ....................................................................................................... 45

3.3.3.2 MAC busy time measurement .................................................................................................................... 45

3.3.4 No Ad-Hoc Routing Agent (NOAH) ............................................................................................. 46

3.3.5 Acquisition and playback of real wireless traffic ........................................................................ 47

3.3.5.1 Acquisition and conversion of traffic trace ................................................................................................ 47

3.3.5.2 Playback of real traffic in NS2 ..................................................................................................................... 47

3.3.6 Data analysis and graphic representation ................................................................................. 48

3.3.7 Implementation of the wireless scenario in NS2 ........................................................................ 48

3.4 GENETIC ALGORITHM IMPLEMENTATION ........................................................................................................ 49

3.4.1 General GA outline ..................................................................................................................... 49

3.4.2 Encoding of the chromosome ..................................................................................................... 51

3.4.3 Definition of the fitness function ................................................................................................ 53

3.4.4 Generation of N random chromosomes ..................................................................................... 57

3.4.5 Calculate the fitness by means of the fitness function for each one of the chromosomes ......... 59

3.4.5.1 Generation of the chromosome set ........................................................................................................... 59

3.4.5.2 Calculation of the chromosome phenotype .............................................................................................. 59

3.4.5.3 Restrictions in the calculation ..................................................................................................................... 60

3.4.5.4 Fitness function ............................................................................................................................................ 60

3.4.6 Elitism process ............................................................................................................................ 61

3.4.7 Mating pool ................................................................................................................................ 62

3.4.8 Selection process ........................................................................................................................ 63

3.4.8.1 Rank-based roulette wheel selection ......................................................................................................... 64

3.4.8.2 Exponential ranking wheel selection .......................................................................................................... 66

3.4.8.3 Example of effects on the selection method ............................................................................................. 67

3.4.9 Crossover operator ..................................................................................................................... 68

3.4.10 Mutation operator...................................................................................................................... 69

3.4.11 New population .......................................................................................................................... 69

3.4.12 Stopping criteria and error evaluation ....................................................................................... 70

3.4.13 Prediction ................................................................................................................................... 70

3.4.14 Interface and parameters ........................................................................................................... 71

3.5 SUMMARY ............................................................................................................................................... 75

CHAPTER 4 EVALUATION .................................................................................................................. 76

4.1 INTRODUCTION......................................................................................................................................... 76

4.2 EVALUATION OF AVAILABLE TCP BANDWIDTH FOR DIFFERENT ON/OFF PU ACTIVITY PATTERNS. ............................. 77

4.2.1 Deterministic patterns ................................................................................................................ 77

4.2.1.1 50 percent fixed ON-OFF rate ..................................................................................................................... 77

4.2.1.2 25 percent fixed ON rate ............................................................................................................................. 79

4.2.1.3 Fixed alpha and different beta values ........................................................................................................ 79

4.2.1.4 Wide range of alpha and beta values ......................................................................................................... 81

4.2.1.4.1 Available bandwidth ............................................................................................................................ 81

4.2.1.4.2 MAC busy time ..................................................................................................................................... 82

4.2.1.4.3 Available bandwidth vs MAC busy time ............................................................................................. 83

4.2.2 Randomly generated patterns .................................................................................................... 84

4.2.2.1 50 percent fixed ON-OFF rate ..................................................................................................................... 84

4.2.2.2 25 percent fixed ON ..................................................................................................................................... 86

4.2.2.3 Fixed alpha and different beta values ........................................................................................................ 86

4.2.2.4 Wide range of alpha and beta values ......................................................................................................... 87

4.2.2.4.1 Available bandwidth ............................................................................................................................ 87

4.2.2.4.2 MAC busy time ..................................................................................................................................... 89

4.2.2.4.3 Available bandwidth vs MAC busy time ............................................................................................. 89

4.2.2.5 Comparison between TCP and UDP throughput ....................................................................................... 90

4.2.3 Available TCP throughput over time, Congestion window, RTO multiplicative factor and Smoothed RTT analysis .................... 92

4.3 EVALUATION OF AVAILABLE TCP BANDWIDTH WITH REAL TRAFFIC SECONDARY USERS ............................................ 92

4.3.1 Sender inside the sensing area ................................................................................................... 93

4.3.2 Sender outside the sensing area ................................................................................................. 94

4.4 GENETIC ALGORITHM EVALUATION AND TESTS ................................................................................................ 95

4.4.1 Periodic and symmetric function evaluation .............................................................................. 97

4.4.1.1 Evaluation of the fitness function ............................................................................................................... 97

4.4.1.1.1 Evaluation of the fitness function for same initial population.......................................................... 97

4.4.1.1.2 Evaluation of the fitness function for random initial population ..................................................... 99

4.4.1.2 Evaluation of the selection method and diversity for the cosine function.............................................. 99

4.4.1.3 Example of prediction with symmetric and periodical function ............................................................ 101

4.4.2 Non-random traffic with ON-OFF pattern evaluation .............................................................. 102

4.4.2.1 Evaluation of the fitness function ............................................................................................................. 103

4.4.2.1.1 Evaluation of the fitness function for non-random and same initial population .......................... 103

4.4.2.1.2 Evaluation of the fitness function for non-random and random initial population ...................... 107

4.4.2.2 Evaluation of the selection method for the non-random traffic ............................................................ 108

4.4.2.3 Example of one of the best predictions ................................................................................................... 109

4.4.3 Random traffic with ON-OFF pattern evaluation ..................................................................... 110

4.4.3.1.1 Evaluation of the fitness function for random pattern and same initial population .................... 111

4.4.3.1.2 Evaluation of the fitness function for random pattern and random initial population ................ 114

4.4.3.2 Evaluation of the selection method for the random ON-OFF pattern traffic ........................................ 114

4.4.4 Real traffic from NS2 evaluation .............................................................................................. 116

4.4.4.1.1 Evaluation of the fitness function for real traffic and same initial population .............................. 117

4.4.4.1.2 Evaluation of the fitness function for real traffic and random initial population ......................... 119

4.4.4.2 Evaluation of the selection method and diversity for the real traffic .................................................... 120

4.4.5 Limitation ................................................................................................................................. 122

4.5 SUMMARY ............................................................................................................................................. 123

CHAPTER 5 CONCLUSIONS .............................................................................................................. 125

5.1 FINAL CONCLUSIONS ................................................................................................................................ 125

5.1.1 Conclusions ............................................................................................................................... 126

5.1.2 Project Evaluation .................................................................................................................... 130

5.1.3 Problems found during the project ........................................................................................... 130

5.2 FUTURE WORK ....................................................................................................................................... 131

REFERENCES 133

SCENARIO IMPLEMENTATION IN NS2 ............................................................................. 146

PHYSICAL AND MAC LAYER PARAMETERS FOR 802.11 .................................................... 151

TESTS OF TCP THROUGHPUT OVER TIME AND OTHER PARAMETERS FOR DIFFERENT PU ACTIVITY ON-OFF PATTERNS .................... 153

A. 50% ON TIME TCP AVAILABLE THROUGHPUT RESULTS FOR DIFFERENT VALUES ............... 153

B. TCP AVAILABLE THROUGHPUT OVER TIME, CONGESTION WINDOW, CURRENT RTO MULTIPLICATIVE FACTOR, SMOOTHED RTT FACTOR AND SLOW-START THRESHOLD .................... 156

C. ANALYSIS OF A WIDE RANGE OF ALPHA AND BETA VALUES AND RANDOMLY GENERATED PATTERNS .................... 160

AVERAGE ON TESTS RESULTS ......................................................................................... 163

GENERAL OVERVIEW OF GA RESULTS ............................................................................. 165

List of Figures

Figure 1: General diagram of the thesis work ................................................................................. 5

Figure 2: MAC layer throughput Max rate vs throughput [18] .................................................... 10

Figure 3: Evolution of TCP's congestion window [23] ................................................................ 12

Figure 4: Example phenotype and genotype ................................................................................. 15

Figure 5: Genetic algorithm structure [37] ................................................................................... 17

Figure 6: One point crossover example ........................................................................................ 19

Figure 7: Mutation example .......................................................................................................... 20

Figure 8: From parents to offspring process ................................................................................. 20

Figure 9: Example of Lorenz chaotic attractor [53] ...................................................................... 21

Figure 10: Binding between C++ and OTcl .................................................................................. 26

Figure 11: NS2-CRAHN schema [4] ............................................................................................ 28

Figure 12: PU-log file first part .................................................................................................... 29

Figure 13: ON-OFF model............................................................................................................ 29

Figure 14: Example of ON-OFF model distribution in time ........................................................ 29

Figure 15: ON-OFF distribution programmed in NS2-CRAHN .................................................. 30

Figure 16: TCP throughput for different alpha beta combinations for CRAHN [4] .................... 33

Figure 17: Map of Scenario 1 ....................................................................................................... 37

Figure 20: Simulation overview diagram ..................................................................................... 40

Figure 21: Primary Users flow chart and location ........................................................................ 44

Figure 22: MAC busy time ........................................................................................................... 46

Figure 23: GA general outline ...................................................................................................... 51

Figure 24: Encoded space and solutions space ............................................................................. 52

Figure 25: Shifted window and training set .................................................................................. 54

Figure 26: Chromosome generator function ................................................................................. 58

Figure 27: Fitness calculation ....................................................................................................... 61

Figure 28: Elitism process ............................................................................................................ 62

Figure 29: Mating pool creation ................................................................................................... 63

Figure 30: Effect of the SP on the probability .............................................................................. 65

Figure 31: Effect of C on the probabilities ................................................................................... 66

Figure 32: Ranking vs Exponential............................................................................................... 67

Figure 33: Three selection method example ................................................................................. 68

Figure 34: New population ........................................................................................................... 69

Figure 35: Best chromosome for the prediction ........................................................................... 73

Figure 36: Available bandwidth for 50% PU ON ......................................................................... 78

Figure 37: Available bandwidth for 25% PU ON ......................................................................... 79

Figure 38: Available bandwidth for alpha equal to 0.0768 and different beta values .................. 80

Figure 39: Available bandwidth for alpha equal to 2.8 and different beta values ........................ 80

Figure 40: 3D graphs with the available bandwidth for different alpha/beta combinations and non-random patterns .................... 81

Figure 41: 3D graph with the MAC busy time for different alpha/beta combinations and non-random patterns .................... 83

Figure 42: 3D graph with the MAC busy time and normalized throughput for different alpha/beta combinations and non-random patterns .................... 84

Figure 43: Available bandwidth for 50% ON and random generation ......................................... 85

Figure 44: Available bandwidth for 25% ON and random generation ......................................... 86

Figure 45: Available bandwidth for alpha equal to 2.8 and different beta values with random generation .................... 87

Figure 46: 3D graphs with the available bandwidth for different alpha/beta combinations and random patterns .................... 88

Figure 47: 3D graph with the MAC busy time for different alpha/beta combinations and random patterns .................... 89

Figure 48: 3D graph with the MAC busy time and normalized throughput for different alpha/beta combinations and random patterns .................... 90

Figure 49: alpha=1.5 and beta=0.5 ............................................................................................... 91

Figure 50: alpha=1.5 and beta=0.5 ............................................................................................... 91

Figure 51: Throughput over time of the real traffic played back .................................................. 93

Figure 53: Fitness selection evaluation for the same initial population with periodic and symmetric

function ......................................................................................................................................... 98

Figure 54: Fitness selection evaluation for random initial population with a periodic and symmetric

function ......................................................................................................................................... 99

Figure 55: Results for the selection method with and without diversity for a periodic and

symmetric function ..................................................................................................................... 100

Figure 52: Cosine function.......................................................................................................... 101

Figure 56: Throughput from non-random pattern with alpha 2.2 and beta 1.1 .......................... 103

Figure 57: Fitness selection evaluation for the same initial population with Non-random ON-OFF

pattern ......................................................................................................................................... 104

Figure 58: Best and worst prediction equation 2 for the same initial population ....................... 105

Figure 59: Example of Equation 2 comparing the MSE ............................................................. 106

Figure 60: Fitness selection evaluation for random initial population with non-random ON-OFF

pattern ......................................................................................................................................... 107

Figure 61: Random initial population for non-random alpha 2.2 and beta 1.1 with MAPE less than

100% ........................................................................................................................................... 108

Figure 62: Results for the selection method with and without diversity for non-random ON-OFF

pattern ......................................................................................................................................... 109

Figure 63: Throughput from a random pattern with alpha 2.2 and beta 0.08 ............................. 111

Figure 64: Training set and prediction zone for random traffic with alpha 2.2 and beta 0.08 ... 112

Figure 65: Fitness selection evaluation for the same initial population with Random ON-OFF

pattern ......................................................................................................................................... 113

Figure 66: Fitness selection evaluation for the same initial population with Random ON-OFF

pattern (with errors criterion) ...................................................................................................... 113

Figure 67: Fitness selection evaluation for random initial population with random ON-OFF pattern

..................................................................................................................................................... 114

Figure 68: Results for the selection method with and without diversity for random ON-OFF pattern

..................................................................................................................................................... 115

Figure 69: Throughput from the real traffic simulated in NS2 ................................................... 117

Figure 70: Training set area and prediction area for real traffic with scaled values ................... 118

Figure 71: Fitness selection evaluation for the same initial population with real traffic ............ 119

Figure 72: Fitness selection evaluation for random initial population with real traffic.............. 120

Figure 73: Results for the selection method with and without diversity for real traffic ............. 121

Figure 74: Random activity pattern for an alpha 2.6 and beta 1.3 .............................................. 123

Figure 75: Graphs of TCP throughput over time for 50% PU ON ............................................. 156

Figure 76: TCP throughput heat-map ......................................................................................... 161

Figure 77: UDP throughput heat-map......................................................................................... 162

List of Tables

Table 1: MAC layer throughput [18] ............................................................................................ 10

Table 2: Genetic Algorithm vocabulary [36, p. 7] ........................................................................ 16

Table 3: Example of reconstructed vector .................................................................................... 22

Table 4: Example training set window ......................................................................................... 54

Table 5: Chromosome set for a two chromosome population and a training set of ten ............... 59

Table 6: Genotype and phenotype of the chromosome set ........................................................... 60

Table 7: Main parameters ............................................................................................................. 72

Table 8: Set of graphs for the user ................................................................................................ 72

Table 9: Prediction and statistical results...................................................................................... 75

Table 10: Test results for Scenario 2 ............................................................................................ 94

Table 11: Test results for Scenario 3 ............................................................................................ 94

Table 12: Nomenclature of the fitness equations .......................................................................... 96

Table 13: Nomenclature of the selection method ......................................................................... 96

Table 15: GA set-up for the cosine function without diversity and same initial population ........ 98

Table 14: Example 1 for a cosine function ................................................................................. 102

Table 18: Parameters GA for non-random using same initial population .................................. 104

Table 19: Best and worst result for equation 2 with the same initial population........................ 105

Table 20: Worst prediction and another chromosome for tests with equation 2 ........................ 106

Table 22: Best and worst results from random and same initial population ............................... 108

Table 24: Best prediction example for a non-random traffic with ON-OFF pattern .................. 110

Table 25: GA set-up for the random traffic alpha 2.2 and beta 0.08 with the same initial population

..................................................................................................................................................... 112

Table 28: Best prediction example for a random traffic with ON-OFF pattern ......................... 116

Table 29: GA set-up for the real traffic scenario with the same initial population .................... 118

Table 31: Best and worst results from random and same initial population with real traffic ..... 120

Table 33: Best prediction example for a real traffic played back in NS2 ................................... 122

Table 34: Frame parameters of 802.11g [15].............................................................................. 151

Table 35: Timing parameters of IEEE 802.11g standard [15] .................................................... 152

Table 36: Throughput over time, Congestion window, Current RTO multiplicative factor and

Slow-start threshold for alpha 1.5 and beta 0.5 .......................................................................... 157

Table 39: Results of equation 2 with same initial population with non-random alpha 2.2 and beta

1.1................................................................................................................................................ 164

Table 40: Average of equation 2 resulting of run the GA 50 times ............................................ 164

Table 41: General fitness functions results ................................................................................. 167

Table 42: General results for the diversity and selection method ............................................... 168

Table 43: MAPE best and worst results ...................................................................................... 168

Chapter 1 Introduction

1.1 Project overview

The available bandwidth at the Transmission Control Protocol (TCP) layer is a very important topic in Wireless Local Area Networks (WLANs). This thesis locates the study of the available bandwidth in the 2.4 GHz band, also called the Industrial, Scientific and Medical (ISM) band [1]. This band is unlicensed and, as a consequence, congestion is considerable. In addition to the WLAN users (also called Secondary Users), Primary or Licensed Users can use the 2.4 GHz frequency band. Primary Users (PUs) have priority to operate in the band and use the medium without taking the Secondary Users in the network into account, so their transmissions interfere with the Secondary Users and cause the loss of all their data. Nowadays, several studies are related to Cognitive Users [2] operating in this band [3] [4] [5] [6]. However, this thesis studies the impact of these Primary Users on the TCP available bandwidth in a classic IEEE 802.11g WLAN. IEEE 802.11 does not define Primary-Secondary user management. It uses Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA), which gives all users an equal opportunity to access the network. However, Primary Users do not necessarily have to implement a similar strategy. Therefore, this project considers that if a Primary User is active in the wireless medium, all the Secondary Users in the same frequency band and inside the transmission range of the Primary User will lose all the packets they send due to interference. The activity of the Primary Users is defined by means of ON-OFF patterns governed by a birth-death Markov process. These patterns are completely defined by only two parameters (alpha and beta), which offers the advantage of covering a wide range of cases in an easily controlled way. A classic WLAN situation with only Secondary Users, with and without the “hidden node problem”, is also studied. Some interesting conclusions have been drawn.
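To make the idea of an ON-OFF activity pattern more concrete, the following is a minimal Python sketch of a two-state Markov ON-OFF source, in which the time spent in each state is exponentially distributed. The names `mean_on` and `mean_off` are hypothetical; they are not claimed to correspond to the alpha and beta parameters of this thesis, whose exact definition is given in later chapters.

```python
import random


def on_off_pattern(mean_on, mean_off, duration, seed=None):
    """Return (start, end, state) intervals covering [0, duration).

    mean_on and mean_off are hypothetical parameters (mean dwell times);
    they are an assumption of this sketch, not the thesis's alpha and beta.
    """
    rng = random.Random(seed)
    t, state, intervals = 0.0, "OFF", []
    while t < duration:
        mean = mean_on if state == "ON" else mean_off
        # The sojourn time in each state of a two-state Markov chain
        # is exponentially distributed.
        dwell = rng.expovariate(1.0 / mean)
        end = min(t + dwell, duration)
        intervals.append((t, end, state))
        t = end
        state = "OFF" if state == "ON" else "ON"
    return intervals


def on_fraction(intervals):
    """Fraction of the total time during which the source is ON."""
    total = sum(end - start for start, end, _ in intervals)
    on_time = sum(end - start for start, end, s in intervals if s == "ON")
    return on_time / total
```

With mean dwell times of 1 and 3 time units, the long-run ON fraction of such a source approaches 1 / (1 + 3) = 25%.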

The second part of the thesis takes a step forward: a tool to forecast the TCP available bandwidth for WLAN by means of a genetic algorithm (GA) has been developed. This tool estimates the future available bandwidth by finding the function that best fits the future behaviour of the network. A genetic algorithm that has been specifically programmed for this purpose finds this function. This GA has been tested in different scenarios simulated in NS2. Furthermore, different GA functionalities such as fitness equations, selection methods and diversity methods are studied, tested and evaluated for the proposed scenarios. Figure 1 gives a general outline of the main workflow followed in this thesis.

[Figure 1 block labels: NS2 simulation set-up; Genetic algorithm set-up; Simulation of TCP available throughput; Find the best function for forecasting; Forecasting of TCP available throughput; PU activity pattern; TCP available throughput and MAC busy time calculation; NS2 traffic]

Figure 1: General diagram of the thesis work


1.2 Motivation

On the one hand, the available bandwidth in a bandwidth-limited medium such as the wireless medium is a much-demanded topic of study. On the other hand, the Transmission Control Protocol (TCP) is one of the most used transport protocols on the Internet. Together they constitute the most typical scenario in Wireless Local Area Networks (WLANs). This thesis goes a step beyond the typical WLAN situation by locating the study in a very congested frequency band where Primary Users (PUs) can be present and modify the behaviour of the WLAN medium. Most current studies are related to Cognitive Users operating in this band. However, this thesis studies the impact of the Primary Users on the TCP available bandwidth in a classic IEEE 802.11g WLAN. These reasons motivated this team to analyse the performance of TCP in terms of available bandwidth in these situations and to develop a tool to forecast the available bandwidth in the immediate future of the WLAN environment.

This study could help in the future to identify which types of PU traffic are more detrimental to a WLAN. In addition, it might be possible to identify the available bandwidth for a user that wants to join a network. This might be done by identifying the pattern of the PU activity in the network and then checking the corresponding available bandwidth for that pattern in the presented results. Furthermore, the study of the effects of the PUs on the Secondary Users could help to understand the effect of the technologies that nowadays use the ISM band [7], such as Bluetooth, ZigBee, WiMAX and DECT, among others. The Genetic Algorithm could help to forecast the available bandwidth in those situations and to manage this information so as to take the appropriate decisions according to the specific network requirements.

1.3 Objectives

This project is divided into two main blocks: the study of TCP available bandwidth with a network simulator and the forecasting of the available bandwidth using a genetic algorithm. The goal is to use the results of the former as input and reference for the latter. The main aim is to study the impact of different ON-OFF patterns of Primary Users (PUs) on the available throughput of Secondary Users (SUs). In order to accomplish this goal, an implementation in the NS2 [8] network simulator is developed. This is followed by the forecasting of the available throughput for different patterns of ON-OFF PU activity using the GA. The purpose is to develop the full algorithm step by step in MATLAB and to test it in several situations. In addition, the intention is to study the relation between TCP available bandwidth and MAC busy time. Another objective is to play back real captured WLAN traffic in the network simulator and analyse the available bandwidth in different scenarios with only SUs. The last point presented in this thesis is a test of the reliability of the GA in real situations, with the aim of evaluating the impact on quality/error for different available bandwidths with real traffic processed in the simulator.

1.4 Methodology

The methodology is the procedure followed during the research and development work [9]. The Engineering Design Process [10] has been the main method taken into account. This method first defines the problem. Next, thorough background research on the topic is carried out. Once the background is properly understood, the requirements are specified and possible solutions are brainstormed. The next step is to choose the best solution for the problem, which is then developed and built. The last step is to test the solution and redesign it if required. Within this process, we may jump from one step to another at any time if something to be improved or modified is found.

The research work is mainly based on the analysis of scientific research publications, books,

science magazines and online sources.

1.5 Results

This section briefly presents the results obtained in this thesis. A large number of available bandwidth results for different combinations of PU activity patterns have been obtained. The available bandwidth tends to grow slightly as the parameters alpha and beta grow (alpha and beta are the parameters that define the pattern behaviour), and the retransmission time has a fundamental impact on the available bandwidth. Randomness in the PU activity makes predictions very unreliable. Regarding the MAC busy time (the time that the Media Access Control layer is busy), a relationship seems to exist between TCP available bandwidth and MAC busy time.


In a scenario with only Secondary Users and real traffic, the users share the bandwidth if they can sense each other. This is in contrast to a situation with the “hidden node problem”, where the hidden node harms the available bandwidth. The results for the available bandwidth with PU activity can also be applied to the “hidden node problem” because the response is similar. It is not identical, however, because the hidden node is affected by the acknowledgements transmitted by the node that is suffering the interference.

The implementation of the GA is successful and its correct functioning has been proved. However,

some limitations have been encountered in the GA for the proposed scenario. Regarding the

available bandwidth for ON-OFF PU activity patterns tested with the GA, even though the

behaviour of the available bandwidth is chaotic and its forecasting is very complicated, acceptable

results have been obtained.

For further details, please refer to Chapter 4 and Chapter 5.

1.6 Organization of the dissertation

This dissertation is structured in a progressive way. First, the basic concepts are presented in order to provide the reader with the knowledge necessary to understand the subsequent chapters. All terminology is explained when it first appears in the text. The thesis is divided into the following parts:

• Introduction

The first part of the dissertation introduces the reason why the topic of the thesis was selected, the

main goals to be achieved, some guidelines about methodology and an outline of the results of the

study.

• Background and Related work

The aim of this chapter is to describe the basic knowledge about the topics developed in this thesis, so as to better understand the implementation presented in the next chapter.

• Design and Implementation

In this chapter, the whole design and implementation of the scenario developed in this thesis is explained. This part covers all the specific work done to obtain software able to run the tests that are evaluated in the following chapter.

• Evaluation

In this part, the results and evaluation of several tests carried out with the implementation developed in this thesis are presented.

• Conclusions

Finally, the dissertation draws the conclusions resulting from the implementation and results of the whole project.

Chapter 2 Background and Related work

This chapter describes the background knowledge considered necessary to properly understand the implementation and the project scope of this master thesis. Some related work relevant to this thesis is also pointed out.

2.1 Introduction

This chapter is divided into two parts: the background study and the related work. The former

is broken down again into four main parts. Firstly, general previous information regarding

standards and communications protocols is explained. Secondly, basic concepts about genetic

algorithms that may be helpful in understanding the project are presented. Thirdly, the time series

and forecasting are tackled and finally, some features and basic knowledge regarding the network

simulator used in this thesis are described. The second part of the chapter analyses some previous

studies that can be used as a reference for this thesis.


2.2 Background

The section describes the basic knowledge about the developed tools in this thesis that will

help to understand better the chapters concerning the implementation and evaluation.

2.2.1 Wireless network and protocols

In this part, the network standards and protocols that apply in this thesis implementation are

explained, including IEEE 802.11g wireless local area network standard and the transmission

control protocol.

2.2.1.1 IEEE 802.11g

IEEE 802.11 is a Wireless Local Area Network (WLAN) standard developed by the Institute

of Electrical and Electronics Engineers (IEEE) and published in 1997 [11]. IEEE 802.11 is a

standard containing a set of Media Access Control (MAC) and physical layer specifications for

the implementation of Wireless Local Area Networks (WLAN) [12] in the 2.4, 3.6, 5 and 60 GHz

frequency bands.

IEEE 802.11g [13] is an advanced version of 802.11 that supports data rates per stream of 6, 9, 12,

18, 24, 36, 48, 54 Mbps. IEEE 802.11g employs a transmission scheme based on Orthogonal

Frequency-Division Multiplexing (OFDM) in the 2.4 GHz frequency band, also called Industrial,

Scientific and Medical (ISM) band.

2.2.1.1.1 IEEE 802.11 MAC protocol

In a WLAN, the MAC protocol (the protocol used to manage the MAC layer) is what primarily determines how efficiently the bandwidth of the wireless channel is shared [14]. The IEEE 802.11 standard defines two access methods: the Distributed Coordination Function (DCF), which provides distributed, asynchronous, contention-based access; and the Point Coordination Function (PCF), which provides centralized, contention-free access. As described in [15], the DCF method uses Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). With this method, before delivering any data packet, the station senses the medium for a DCF Interframe Space (DIFS) [16] to detect whether other transmissions are ongoing. If the medium is sensed as free for a DIFS duration, the transmitting station keeps sensing the medium for a time corresponding to a random multiple of the slot time, from 0 to the so-called Contention Window (CW) minus 1. This time is called the back-off interval and is used to minimise the possibility of collisions. If the medium becomes busy during the back-off interval, the time counter stops until the channel is idle again for more than a DIFS duration.

The contention window size is not a fixed value. It changes following an exponential law: the CW is set to a specified minimum value, denoted CWmin, for the first transmission, and it is doubled after every failed transmission up to the maximum contention window, denoted CWmax. After a successful transmission the CW is reset to CWmin. When the back-off interval ends, the transmitter can either transmit a data packet or send a Request to Send (RTS) that will be answered with a Clear to Send (CTS) if the receiver is available. The RTS/CTS handshake is an optional mechanism whose aim is to avoid problems with interference and hidden nodes [17]. Once the packet is received at the destination, a MAC layer acknowledgement (ACK) is transmitted after a Short Inter-frame Space (SIFS).
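The back-off rule described above can be sketched in a few lines of Python. This is only an illustration: the constants `SLOT_TIME_US`, `CW_MIN` and `CW_MAX` are assumed example values and are not taken from this thesis, and the helper names are hypothetical.

```python
import random

# Illustrative values only (assumptions of this sketch, not thesis parameters).
SLOT_TIME_US = 9   # slot time in microseconds
CW_MIN = 16        # back-off is drawn from 0 .. CW_MIN - 1 on the first attempt
CW_MAX = 1024      # upper bound of the contention window


def next_cw(cw, success):
    """Reset CW after a success; double it (capped at CW_MAX) after a failure."""
    return CW_MIN if success else min(2 * cw, CW_MAX)


def backoff_slots(cw, rng):
    """Draw the back-off counter uniformly from 0 to CW - 1 slots."""
    return rng.randrange(cw)


def simulate(outcomes, seed=0):
    """Return (cw, slots, backoff_time_us) for each transmission attempt."""
    rng = random.Random(seed)
    cw, trace = CW_MIN, []
    for success in outcomes:
        slots = backoff_slots(cw, rng)
        trace.append((cw, slots, slots * SLOT_TIME_US))
        cw = next_cw(cw, success)
    return trace
```

Calling `simulate([False, False, True])` uses a contention window of 16, then 32, then 64 for the three attempts; the window returns to `CW_MIN` for the attempt that follows the success.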

2.2.1.1.2 Throughput at MAC layer

Figure 2 shows the approximate throughput at the MAC layer for different configurations of the maximum WLAN data rate per stream. For a maximum WLAN data rate of 54 Mbps, the expected throughput is around 23 Mbps. As the data in Table 1 show, the throughput never reaches the maximum data rate of the link. Several factors affect the throughput, for example: the ACK, which is always sent at a lower data rate; the Physical Layer Convergence Protocol (PLCP) preamble, which is also sent at a lower data rate; overhead; transmission range; and interference.

Max rate [Mbps]    Throughput [Mbps]
6                  5.12
9                  7.23
12                 9.1
18                 12.28
24                 14.98
36                 18.86
48                 22.13
54                 23.31

Figure 2: MAC layer throughput, max rate vs throughput [18]

Table 1: MAC layer throughput [18]
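As a small worked example, the ratio between the MAC layer throughput and the nominal data rate can be computed directly from the values in Table 1. The sketch below is only arithmetic on the table data; the names `MAC_THROUGHPUT_MBPS` and `efficiency` are hypothetical.

```python
# Values copied from Table 1: maximum data rate [Mbps] -> MAC throughput [Mbps].
MAC_THROUGHPUT_MBPS = {
    6: 5.12, 9: 7.23, 12: 9.1, 18: 12.28,
    24: 14.98, 36: 18.86, 48: 22.13, 54: 23.31,
}


def efficiency(rate_mbps):
    """Fraction of the nominal data rate that is available at the MAC layer."""
    return MAC_THROUGHPUT_MBPS[rate_mbps] / rate_mbps


if __name__ == "__main__":
    for rate in sorted(MAC_THROUGHPUT_MBPS):
        print(f"{rate:>2} Mbps: {efficiency(rate):.1%}")
```

The ratio falls from about 85% at 6 Mbps to about 43% at 54 Mbps, which is consistent with the observation that parts of each exchange are sent at a lower data rate and therefore weigh more heavily at high rates.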

2.2.1.1.3 Physical layer

The PHY [19] is in turn divided into two sub-layers (as is the MAC layer), called the Physical Layer Convergence Protocol (PLCP) and the Physical Medium Dependent (PMD) sub-layers. On the one hand, the PLCP layer is in charge of the Carrier Sense (CS) part of the CSMA/CA protocol described in 2.2.1.1.1. On the other hand, the PMD is the layer in charge of modulation scheme management and signal encoding.

2.2.1.2 Transmission Control Protocol

Transmission Control Protocol (TCP) is defined in RFC 793 [20] [21]. TCP is defined as “a reliable connection-oriented delivery service”. The RFC specifies that TCP was defined with the aim of providing a very reliable host-to-host protocol for packet-switched computer communication networks. The term connection-oriented means that a connection must be established before hosts initiate data transmission. This connection is established by means of the so-called three-way handshake. TCP splits the data to be sent into segments, and a segment that is not received properly is retransmitted. Reliability in TCP is achieved by assigning a sequence number to each segment to be transmitted. In order to confirm that a segment is received free of errors, TCP sends an ACK for every received packet/segment. TCP also preserves the packet order.

If an ACK is not received in time (before the retransmission timeout, RTO), the data are retransmitted. The RTO is based on the estimated round-trip time (RTT) between the sender and receiver, as well as the variance in this round-trip time. This is explained in detail in 2.2.1.2.2.

TCP defines a sliding window [22] protocol in which the receiver advertises a receive window size (the receiver's advertised window) to avoid exceeding the data processing capacity of the receiver. Each TCP packet/segment contains the current value of the receiver's advertised window. This is useful because the sender can send “bursts” of up to the advertised window size without waiting for the ACK of each packet sent in the burst.

Additionally, a congestion window [23] is defined on the transmission side to avoid exceeding the capacity of the network. TCP uses a mechanism called slow start to increase the congestion window quickly after a connection is initialized and after a timeout event. It starts with a window of two times the maximum segment size (MSS). The MSS is the largest amount of data that can be sent in one segment without fragmentation, and it is defined by the TCP protocol. The congestion window doubles every RTT until it reaches a threshold, as depicted in Figure 3. While the congestion window is below the threshold it grows exponentially, and once it is above the threshold it grows linearly (1 MSS each RTT). Whenever a timeout occurs, the threshold is set to one half of the current congestion window and the congestion window is then set to one MSS.

Figure 3: Evolution of TCP's congestion window [23]
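The window evolution described above can be reproduced with a simplified per-RTT model. The following Python sketch is an illustration under stated assumptions, not the NS2 implementation: the window is counted in whole MSS units, slow start is capped at the threshold, and the floor of 2 MSS for the threshold after a timeout is an assumption borrowed from RFC 5681. The function name `evolve_cwnd` is hypothetical.

```python
def evolve_cwnd(rtts, ssthresh, timeouts=(), initial_cwnd=2):
    """Return the congestion window (in MSS) at the start of each RTT.

    timeouts is a collection of RTT indices at which a timeout occurs.
    """
    cwnd, history = initial_cwnd, []
    for rtt in range(rtts):
        history.append(cwnd)
        if rtt in timeouts:
            # Timeout: halve the threshold (assumed floor of 2 MSS) and
            # restart from a window of one MSS.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: exponential growth, capped at the threshold
            # (the cap is a modelling choice of this sketch).
            cwnd = min(2 * cwnd, ssthresh)
        else:
            # Congestion avoidance: linear growth of 1 MSS per RTT.
            cwnd += 1
    return history
```

For instance, `evolve_cwnd(12, ssthresh=16, timeouts={7})` grows exponentially up to 16 MSS, then linearly up to 20 MSS, and restarts from 1 MSS after the timeout with a new threshold of 10 MSS.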

2.2.1.2.1 Throughput TCP

The congestion window and the receive window (the receiver’s advertised window) limit the TCP throughput. RFC 6349, section 3.3 [24], defines how the TCP throughput is calculated. Equation 2.1 defines the calculation method.

$$\text{TCP throughput} = \frac{\text{TCP RWND} \times 8}{\text{RTT}} \qquad (2.1)$$

Where:

- TCP RWND is Receive Window size or receiver's advertised window

- RTT is the Round-Trip Time
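As a worked example of equation 2.1, the short Python function below evaluates the expression, assuming that the receive window is given in bytes (hence the factor of 8) and the RTT in seconds, so that the result is in bits per second. The function name and the example values are hypothetical.

```python
def tcp_throughput_bps(rwnd_bytes, rtt_seconds):
    """Window-limited TCP throughput in bit/s: RWND * 8 / RTT (equation 2.1)."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return rwnd_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    # Example values chosen for illustration only: 65535-byte window, 25 ms RTT.
    print(tcp_throughput_bps(65535, 0.025) / 1e6, "Mbit/s")
```

With a 65535-byte window and an RTT of 25 ms, the window-limited throughput is about 21 Mbit/s, regardless of how fast the underlying link is.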

2.2.1.2.2 TCP Retransmission TimeOut

As described in [25, pp. 199-200], TCP sends back an acknowledgement for every received packet. If the acknowledgement is not received before a given Retransmission TimeOut (RTO), the TCP sender assumes that the packet is lost and retransmits it.

The computation of the current RTO value is a trade-off between a small value, which causes many unnecessary retransmissions, and a large value, which results in high latency in packet loss detection. The RTO is a function of the Round Trip Time (RTT). The RTT value is not the same for all packets, so a Smoothed RTT and an RTT variation [26] are calculated based on several RTT samples. The RTO depends on the instantaneous RTT, the Smoothed RTT, the RTT variation and the Binary Exponential Back-off (BEB). The BEB, or RTO back-off multiplicative factor, is set to 1 at the beginning and doubled with every timeout until an ACK is received. As a result, after several consecutive timeouts the RTO grows to a considerable value.
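A possible way to combine these quantities is sketched below in Python. It assumes the smoothing rules and constants of RFC 6298 (gains of 1/8 and 1/4 and a multiplier of 4), which may differ in detail from the source cited as [26] and from the NS2 implementation. The class name, the parameter names and the default bounds are hypothetical.

```python
class RtoEstimator:
    """RTO estimator in the style of RFC 6298 (an assumption of this sketch)."""

    SRTT_GAIN = 1 / 8     # weight of a new sample in the Smoothed RTT
    RTTVAR_GAIN = 1 / 4   # weight of a new sample in the RTT variation
    K = 4                 # multiplier applied to the RTT variation

    def __init__(self, initial_rto=1.0, min_rto=1.0, max_rto=60.0):
        self.srtt = None
        self.rttvar = None
        self.backoff = 1  # BEB multiplicative factor
        self.initial_rto = initial_rto
        self.min_rto = min_rto
        self.max_rto = max_rto

    def on_rtt_sample(self, rtt):
        """Update the estimates with a new RTT sample (an ACK was received)."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            deviation = abs(self.srtt - rtt)
            self.rttvar = (1 - self.RTTVAR_GAIN) * self.rttvar + self.RTTVAR_GAIN * deviation
            self.srtt = (1 - self.SRTT_GAIN) * self.srtt + self.SRTT_GAIN * rtt
        self.backoff = 1  # the back-off factor returns to 1 once an ACK arrives

    def on_timeout(self):
        """Double the back-off factor after a retransmission timeout."""
        self.backoff *= 2

    @property
    def rto(self):
        if self.srtt is None:
            base = self.initial_rto
        else:
            base = max(self.srtt + self.K * self.rttvar, self.min_rto)
        return min(base * self.backoff, self.max_rto)
```

In this sketch, a first RTT sample of 0.1 s with `min_rto=0.2` gives an RTO of 0.3 s; two consecutive timeouts raise it to 1.2 s, which illustrates how quickly the back-off factor dominates the estimate.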

2.2.2 Genetic Algorithm

In this section, some definitions and explanations about the genetic algorithms such as the

basic structure, main terminology and the main functions are presented.

2.2.2.1 Concept of genetic algorithm

A genetic algorithm is a stochastic search algorithm that mimics the process of natural selection and genetics to try to solve complex [27] problems [28]. It is based upon the genetic processes of living beings. It uses historical data to find new search points for an optimal solution of the problem, trying to improve the results and converge to the best or expected value [29]. The search procedure is based on Darwinian theories of natural selection and survival. According to these theories, populations in nature evolve according to the principles of natural selection and survival of the fittest [30].

The main feature of this algorithm is its efficiency in exploiting historical information in order to speculate on new search points with the aim of finding a better point [29]. In addition, genetic algorithms can be successfully applied to a vast variety of problems from different areas.

2.2.2.2 Elements and biological translation

To make the terminology used during the implementation and explanation of the algorithm easier to understand, the following sections relate some biological concepts to their counterparts in evolutionary algorithms.

2.2.2.2.1 Chromosome

All organisms are composed of one or more cells. Every cell contains at least one chromosome (DNA strands) where the genetic information is encoded [31]. This structure carries hereditary factors or genes. The genetic algorithm uses the same structure to encode and store a solution. Hence, the chromosome is the set of parameters defining a proposed solution to the problem to be solved.

2.2.2.2.2 Genes

A chromosome can be divided into genes, which are functional blocks of DNA. Genes are therefore the basic units into which a chromosome can be split [32]. A gene is a position or set of positions in a chromosome, so a chromosome has as many genes as it has variable slots. Each of these genes encodes a particular feature of an individual. The possible values of a feature, which a gene takes from a fixed set of symbols, are known as alleles [33]. Each position that a gene can occupy in a chromosome is called a locus.

2.2.2.2.3 Genotype and phenotype

The genotype is the complete set of genes contained in a genome, and thereby the set of inherited factors of an individual, which may or may not be manifested in that individual. In a genetic algorithm, the genotype gives the encoded solution of the problem to solve [32]. This information is copied at the time of reproduction and passed from one generation to the next. For this reason, the genotype can only be determined by biological tests, not by observation as the phenotype can.

The phenotype is the set of parameters represented in a chromosome; in other words, it contains the information required to create an individual (e.g. eye colour) [29]. The phenotype gives the decoded solution for the given problem [32].

The adaptation of an individual to the problem depends on the evaluation of the genotype,

which can be inferred from the phenotype (chromosome) using the fitness function.

In order to clarify the concepts of genotype and phenotype, the following examples are

presented:

• Example 1: Given an optimisation problem on integers, a set of integers forms the set of phenotypes. One of these phenotypes, for example 27, corresponds to the genotype 11011 (its binary encoding). As this example shows, the phenotype space and the genotype space are different, because a genotype could evolve into one whose phenotype is not in the selected set of integers. This is because the evolutionary search is carried out in the genotype space. Thus, the optimum solution in phenotype space is obtained by decoding the genotype [34].

• Example 2: Consider a child with haemophilia (a group of hereditary disorders that impair the body’s ability to control blood coagulation). It can happen that the parents do not suffer from the disorder themselves but carry the haemophilia genes. Then, these parents have the same phenotype but not the same genotype [35].

Genotype: 1 1 0 1 1 (phenotype: 27)
Evolution of genotype: 0 1 0 1 1

Figure 4: Example phenotype and genotype
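The mapping between genotype and phenotype in Example 1 can be illustrated with a small sketch. It assumes a plain unsigned binary encoding with the most significant bit first; the function names `decode` and `encode` are chosen here for illustration and do not come from the thesis.

```python
def decode(genotype):
    """Decode a binary genotype (list of 0/1, most significant bit first)
    into its integer phenotype."""
    value = 0
    for bit in genotype:
        value = value * 2 + bit
    return value


def encode(phenotype, length):
    """Encode a non-negative integer phenotype as a fixed-length binary genotype."""
    return [(phenotype >> shift) & 1 for shift in range(length - 1, -1, -1)]


print(decode([1, 1, 0, 1, 1]))  # genotype from Example 1
print(decode([0, 1, 0, 1, 1]))  # evolved genotype from Figure 4
print(encode(27, 5))
```

Flipping a single gene changes the decoded integer, which is why the search moves in genotype space while the result is read in phenotype space.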

2.2.2.2.4 Generation

The chromosomes evolve through iterations, called generations, as the genetic algorithm progresses. In each iteration, the chromosomes are evaluated using a fitness measure and are subjected to the genetic operations of selection, crossover and mutation, which produce new chromosomes in each generation. In this way, the best individuals tend to survive and reproduce, propagating their genetic material to future generations.

2.2.2.2.5 Population

A population is a set of chromosomes (possible solutions) whose size remains constant during the evolutionary search (across generations). As a starting point for the genetic algorithm, an initial population has to be created.

2.2.2.2.6 Summary of GA vocabulary

The vocabulary explained above is summarized in Table 2, which is based on the table given in [36, p. 7].

Genetic Algorithm    Explanation
Chromosome           Solution
Genes                Part of the solution
Locus                Position of the gene
Allele               Value of the gene
Phenotype            Decoded solution
Genotype             Encoded solution
Population           Set of chromosomes
Generation           Iterations of the GA

Table 2: Genetic Algorithm vocabulary [36, p. 7]

2.2.2.3 Genetic algorithm structure

The general procedure of a genetic algorithm is depicted in Figure 5. First, all parameters are set, such as the chromosome length, the population size and the crossover and mutation probabilities. Afterwards, the initial population is randomly generated and evaluated by the fitness function. If the population does not meet the selected stopping criteria (such as a number of iterations, elapsed time or an optimisation target), the best chromosomes are selected for reproduction. These chromosomes are then subjected to the crossover process, generating offspring. Finally, the offspring undergo mutation, giving a new population for the next generation. This new population is evaluated again, and all the steps above except the parameter setting are repeated until the stopping criteria are met.

Figure 5: Genetic algorithm structure [37]
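The procedure of Figure 5 can be sketched as a generic loop. This is a minimal illustration and not the implementation developed in the thesis: the function `run_ga`, its parameters and the convention that the operators are passed in as callables are all assumptions made for the example.

```python
import random


def run_ga(fitness, new_chromosome, select, crossover, mutate,
           pop_size=20, max_generations=50, target_fitness=None, rng=None):
    """Generic GA loop (hypothetical helper).

    fitness(c)                  -> number, higher is better
    new_chromosome(rng)         -> random chromosome
    select(pop, scores, k, rng) -> list of k parents
    crossover(a, b, rng)        -> two offspring
    mutate(c, rng)              -> mutated chromosome
    pop_size is assumed to be even so that parents can be paired.
    """
    rng = rng or random.Random()
    # Parameters are set by the caller; create the random initial population.
    population = [new_chromosome(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_generations):          # stopping criterion: iterations
        if target_fitness is not None and fitness(best) >= target_fitness:
            break                             # stopping criterion: optimisation target
        scores = [fitness(c) for c in population]
        parents = select(population, scores, pop_size, rng)
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            offspring.extend(crossover(a, b, rng))
        population = [mutate(c, rng) for c in offspring]
        best = max(population + [best], key=fitness)
    return best
```

Keeping the operators as arguments mirrors the structure of the following subsections, where selection, crossover and mutation are described independently of each other.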

2.2.2.4 Operations of the genetic algorithm

All genetic algorithms have common elements such as the creation of a population of

chromosomes, the selection depending on their adaptation, the crossover to produce a new

generation and finally the random mutation in the new generation.

2.2.2.4.1 Initial population

The first population is obtained through a random process in which the chromosomes are created. The number of individuals in the population determines the computational resources required, which increase as the population grows. A larger population covers more possible solutions and a larger part of the search space, but it needs more resources and slows the algorithm down. Thus, there is a trade-off between efficiency and effectiveness [33].

Depending on the problem to be solved, different methods can be used to encode the solution, for example binary encoding, value encoding, tree encoding or permutation encoding, among others [38].
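As an illustration of two of the encodings mentioned above, the following sketch creates random initial populations with binary and with permutation encoding. The function names and parameters are hypothetical and only serve the example.

```python
import random


def random_binary_population(pop_size, length, rng=None):
    """Binary encoding: every gene is an allele from {0, 1}."""
    rng = rng or random.Random()
    return [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]


def random_permutation_population(pop_size, length, rng=None):
    """Permutation encoding: every chromosome is an ordering of 1..length."""
    rng = rng or random.Random()
    return [rng.sample(range(1, length + 1), length) for _ in range(pop_size)]
```

The population size passed to either function is the quantity behind the trade-off described above: more chromosomes cover more of the search space but cost more fitness evaluations per generation.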

2.2.2.4.2 Fitness Function

The fitness function, also called the evaluation function, tests the performance of each chromosome (potential solution) by measuring how good it is with respect to the current problem domain [39].

2.2.2.4.3 Selection

This operator selects chromosomes in the population for reproduction. The selection is random but favours the chromosomes with better fitness, so the fittest have a higher chance of being selected to reproduce. Different techniques can be used, as in [40]; among others, roulette-wheel, ranking-based and tournament selection [41, pp. 17-18]. The method used in [27], [42] and [43] is roulette wheel selection.

Roulette wheel selection places all chromosomes of the generation on a roulette wheel, where each chromosome occupies a portion of the wheel proportional to its fitness. A chromosome with better fitness is therefore more likely to be chosen, because it covers a bigger area of the wheel [44]. With n chromosomes, the wheel is split into n portions. To select the chromosomes, the wheel is spun n times, and each time the chromosome on whose portion it stops is selected. The probability of selecting an individual $j$ is given by equation 2.2 [44].

$$p_j = \frac{f_j}{\sum_{i} f_i} \tag{2.2}$$

where $p_j$ is the probability of selecting chromosome $j$ and $f_j$ is the fitness of chromosome $j$.

The drawback of this selection method is that, although it preserves some diversity, it may converge very quickly [37].
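A minimal sketch of roulette wheel selection is given below. It assumes non-negative fitness values with a positive sum; the function names are illustrative and are not taken from the cited works.

```python
import random


def selection_probabilities(fitnesses):
    """Selection probability of each chromosome: its fitness divided by the
    sum of all fitness values (equation 2.2)."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


def roulette_wheel_select(population, fitnesses, rng=None):
    """Spin the wheel once and return the chromosome it stops on."""
    rng = rng or random.Random()
    spin = rng.random() * sum(fitnesses)   # position on the wheel
    cumulative = 0.0
    for chromosome, f in zip(population, fitnesses):
        cumulative += f                    # end of this chromosome's portion
        if spin < cumulative:
            return chromosome
    return population[-1]                  # guard against rounding errors
```

Calling `roulette_wheel_select` n times yields the n parents for the next generation; a chromosome with zero fitness is never selected under this scheme.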

2.2.2.4.4 Reproduction

Reproduction is performed by exchanging the genetic material of two chromosomes using the crossover operator. Mutation is then applied to the outcome of this operation. The result is two new chromosomes with features of both parents and, where applicable, a random mutation.

2.2.2.4.4.1 Crossover

The crossover operator is the most important one in a GA, as it allows the exchange of features from one generation to the next and thereby the evolution of the species [32]. The main objective is to improve the fitness of the offspring.

In the crossover, the chromosomes selected for reproduction are paired up and crossed over. Offspring are produced with a given crossover probability; at the maximum value of 1, every pair is crossed and the parents do not survive.

There are different techniques to implement crossover, such as one point crossover, two point crossover, uniform crossover [45] and arithmetic crossover [46]. One point crossover is explained here; a description of the other techniques can be found in [47].

The one point crossover strategy randomly chooses a point along the chromosome (a locus) and exchanges the genes before and after that point between the two chromosomes to create the new offspring. One example of this technique is shown in Figure 6.

Parents:   1 1 1 | 1 1 1 0 0 0    1 0 1 | 0 1 0 1 0 1
Offspring: 1 1 1 | 0 1 0 1 0 1    1 0 1 | 1 1 1 0 0 0

Figure 6: One point crossover example
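One point crossover can be sketched as follows. The example reads the digits of Figure 6 as two 9-gene parents (first row) and their two offspring (second row) with the crossover point after the third gene; this reading is an interpretation of the figure, and the function name is hypothetical.

```python
import random


def one_point_crossover(parent_a, parent_b, point=None, rng=None):
    """Swap the tails of two equal-length chromosomes after `point`.
    If no point is given, one is drawn at random between the genes."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if point is None:
        rng = rng or random.Random()
        point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


parent_1 = [1, 1, 1, 1, 1, 1, 0, 0, 0]
parent_2 = [1, 0, 1, 0, 1, 0, 1, 0, 1]
print(one_point_crossover(parent_1, parent_2, point=3))
```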

2.2.2.4.4.2 Mutation

Once the crossover process is finished, the mutation procedure is carried out. This process is performed to prevent all solutions in a population from falling into a local optimum of the fitness value [48]. Moreover, it is beneficial because it makes it possible to search areas that are still unexplored. The mutation process modifies some randomly chosen genes with a certain mutation probability [49]. This is done by changing one gene to another value or by interchanging the values of two genes placed at different loci. After this process, the chromosomes are evaluated again using the fitness function.

There is a trade-off in the choice of the mutation probability: if it is too high, the search becomes random, as the survival of the fittest chromosomes into the next generation is no longer guaranteed, whereas a value that is too low limits the exploration of new areas described above. One example of this technique is shown in Figure 7.

1 2 3 4 5 6 7 8 9

1 2 5 4 3 6 7 8 9

Figure 7: Mutation example
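Both mutation variants described above can be sketched in a few lines. The function names, the optional `positions` argument and the `alleles` parameter are assumptions introduced for this example only.

```python
import random


def swap_mutation(chromosome, positions=None, rng=None):
    """Interchange the values of two genes placed at different loci."""
    if positions is None:
        rng = rng or random.Random()
        positions = rng.sample(range(len(chromosome)), 2)
    i, j = positions
    mutated = list(chromosome)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def value_mutation(chromosome, alleles, p_mutation, rng=None):
    """Replace each gene by a random allele with probability p_mutation."""
    rng = rng or random.Random()
    return [rng.choice(alleles) if rng.random() < p_mutation else gene
            for gene in chromosome]


# Swapping the third and fifth genes reproduces the change shown in Figure 7.
print(swap_mutation([1, 2, 3, 4, 5, 6, 7, 8, 9], positions=(2, 4)))
```

Swap mutation keeps the multiset of gene values intact, which is why it suits permutation encodings, while value mutation is the usual choice for binary or value encodings.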

The steps explained so far are illustrated in Figure 8.

Parents → Selection → Crossover → Mutation → Offspring

Figure 8: From parents to offspring process

2.2.3 Time series and forecasting

This section explains time series and how they are used with the genetic algorithm, in order to clarify how the genetic algorithm is used for forecasting.

2.2.3.1 Time series

A time series is a sequence of data points, $\{X_t\}$ in the discrete-time case and $\{X(t)\}$ in the continuous-time case, belonging to a system, measured at different instants of time and ordered chronologically. It can represent physical or financial variables, among others. The main objective of the time series analysis carried out here is to study the behaviour of the time series in order to predict its future evolution up to a certain time horizon (also called the prediction horizon). Time series analysis can be divided into linear and nonlinear, and into univariate and multivariate [50]. Linear time series usually follow stable patterns at regular intervals that depend linearly on the past values of the series, in contrast with nonlinear time series, which display chaotic behaviour [27].

It is possible to characterise the complexity of a system by observing a single variable belonging to that system, as was demonstrated in the theorem proposed by Takens (1981) [51].

2.2.3.2 Phase space and attractor construction

The theorem proposed by Takens states that the statistical properties of an attractor are conserved in the delay coordinates (see footnote 1), which are reconstructed from a time series of one variable belonging to that system.

Thus, to construct the attractor by means of the delay coordinates proposed by Packard et al. [52], the time delay and the embedding dimension are needed. One example can be seen in Figure 9, which shows the construction of the Lorenz attractor by means of Takens’ theorem.

Figure 9: Example of Lorenz chaotic attractor [53]

Footnote 1: In the vector space, $x = (x_1, x_2, x_3, \ldots, x_n)$ is called a point or vector and $x_1, x_2, x_3, \ldots, x_n$ are called the coordinates of $x$, where $n$ is a natural number called the dimension of the space.

Given a scalar time series $x_1, x_2, x_3, \ldots, x_N$ obtained from observations at constant intervals, it is possible to reconstruct vectors with embedding dimension $m$ in an $m$-dimensional space [54] [55] [56] by means of the method of delays, as follows:

$$X_i(m) = [x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}], \quad X_i \in \mathbb{R}^m, \quad i = 1, 2, \ldots, N-(m-1)\tau \tag{2.3}$$

where $X_i$ is the reconstructed vector with embedding dimension $m$, $x_i$ is the observed discrete value at time $i$ and $\tau$ is the time delay or embedding time. Thus, the $m$ coordinates of each $X_i$ are samples from the time series separated by a fixed $\tau$. The result is a series of vectors as follows:

$$X = X_1, X_2, \ldots, X_{N-(m-1)\tau} \tag{2.4}$$

where $N$ is the length of the original series. The purpose is to form an attractor that preserves the topological properties of the original unknown attractor. The idea of such a reconstruction is thus to capture the original system states in each observation of the system output.

An example applied to the evolution of a stock market is the following:

Date        Price
1/1/2014    10 000
2/1/2014    10 005
3/1/2014    10 008
4/1/2014    10 003
5/1/2014    10 006

Then, for d=1 and m=4, the reconstructed vectors are defined as:

$X_1$ = {10 003, 10 008, 10 005, 10 000}
$X_2$ = {10 006, 10 003, 10 008, 10 005}

Table 3: Example of reconstructed vector

[Chart: Stock Market, price axis from 10000 to 10009]

Therefore, two parameters have to be computed: the time delay $\tau$ and the embedding dimension $m$.
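The method of delays can be illustrated with the five-sample price series of the example above. The sketch below is an assumption-level illustration: the function name `delay_embed` is hypothetical, the delay written as d in the example is taken to be the time delay, and each vector is listed oldest sample first, following the method-of-delays definition, whereas the example above lists the same components newest first.

```python
def delay_embed(series, m, tau):
    """Build the delay vectors [x_i, x_(i+tau), ..., x_(i+(m-1)tau)]
    for every valid start index i (method of delays)."""
    count = len(series) - (m - 1) * tau   # number of reconstructed vectors
    if count <= 0:
        raise ValueError("series too short for this m and tau")
    return [[series[i + k * tau] for k in range(m)] for i in range(count)]


prices = [10000, 10005, 10008, 10003, 10006]
for vector in delay_embed(prices, m=4, tau=1):
    print(vector)
```

With N = 5, m = 4 and a delay of 1, only N - (m - 1) = 2 vectors can be formed, which matches the two vectors of the example.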

2.2.3.2.1 Time delay

The method used to calculate the delay is important both for reconstructing the attractor from a scalar time series and, once it is reconstructed, for estimating the correlation dimension to evaluate whether the scalar time series is chaotic or stochastic.

There is a trade-off in the choice of $\tau$: it has to be large enough to obtain the largest amount of new information between $X_n$ and $X_{n+\tau}$, and at the same time small enough that the two are not independent [57].

Different methods and techniques have been proposed and implemented to calculate the time delay and the embedding dimension, such as those in [55], [56] and [58]. The methods used in this thesis are the series correlation approaches of autocorrelation and average mutual information.

These methods are not explained in detail in this thesis, but further information can be found in [59] and [60]. The implementations used are those of the Nonlinear Time Series Analysis package (TISEAN) [61].

The calculation of the delay time is based on the average mutual information and is determined by finding the position of its first minimum in a graphical representation [62]. Alternatively, as was done in [27], this parameter can be set to 1 and compensated for by increasing the embedding dimension, as is also demonstrated in [63, p. 31].
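To give an idea of the first-minimum criterion, the following sketch estimates the average mutual information with a simple histogram and looks for its first local minimum. It is only an illustration under stated assumptions: the thesis relies on TISEAN for this calculation, and the equal-width binning, the number of bins and the function names used here are choices made for the example, not TISEAN's procedure.

```python
import math
from collections import Counter


def average_mutual_information(series, lag, bins=8):
    """Histogram estimate (in nats) of the mutual information between
    x_t and x_(t+lag), using `bins` equal-width bins."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0          # avoid a zero width for constant series

    def bin_of(value):
        return min(int((value - lo) / width), bins - 1)

    pairs = [(bin_of(a), bin_of(b)) for a, b in zip(series, series[lag:])]
    n = len(pairs)
    joint = Counter(pairs)
    p_x = Counter(a for a, _ in pairs)
    p_y = Counter(b for _, b in pairs)
    return sum((c / n) * math.log((c / n) / ((p_x[a] / n) * (p_y[b] / n)))
               for (a, b), c in joint.items())


def first_minimum_lag(series, max_lag=20, bins=8):
    """Return the first lag at which the estimate has a local minimum,
    or None if no minimum is found up to max_lag."""
    ami = [average_mutual_information(series, lag, bins)
           for lag in range(1, max_lag + 1)]
    for k in range(1, len(ami) - 1):
        if ami[k] < ami[k - 1] and ami[k] <= ami[k + 1]:
            return k + 1                      # ami[k] belongs to lag k + 1
    return None
```

The estimate depends on the number of bins and on the series length, so the returned lag should be read as a starting point for the delay rather than as a definitive value.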

2.2.3.2.2 Embedding dimension

The other parameter to be found is the embedding dimension, which is the dimension of the space in which the attractor is reconstructed. Different methods and techniques to calculate the embedding dimension have been proposed and implemented in [56], [64] and [65]. Kennel et al. [64] proposed a method to calculate the minimum embedding dimension called FNN (False Nearest Neighbours). This is the method selected for the calculation of $m$, using the TISEAN tool; further information about the method can be found in [56], [59] and [64].

2.2.3.3 Forecasting

Univariate analysis is carried out with a single variable, with the objective of finding the dynamic dependence of $\{X_t\}$ on its past values $\{X_{t-1}, X_{t-2}, \ldots, X_{t-N}\}$ [66]. In multivariate time series analysis, on the other hand, more than one variable is studied and observed at a time [67].

The forecasting of the values is done using time series analysis. The work of Takens (1981), Casdagli (1989) [68] and others established the methodology for creating a dynamic model from a chaotic time series [42]. According to Takens’ theorem, nonlinear chaotic dynamic systems can be reconstructed from a sequence of observations [27]. The theorem states that, given a deterministic time series, there exists a function F(·) that satisfies equation 2.5 [42].

$$X_t = F(X_{t-\tau}, X_{t-2\tau}, \ldots, X_{t-m\tau}) \tag{2.5}$$

where $\tau$ is the delay factor and $m$ is the embedding dimension. The theorem therefore guarantees that future values can be forecast from past values alone. The difficulty lies in finding the function F(·); the genetic algorithm is implemented and used to find a good approximation of it. The forecasts are carried out by deterministic models built directly from observations of the system evolution [43].

2.2.3.4 Forecasting methods

Given a univariate time series $[x_t]_{t=1}^{T}$ representing the observations, it is possible to predict the next $n$ points of the series, i.e. $x_{T+1}, x_{T+2}, \ldots, x_{T+n}$, using only some of the previous samples. The prediction can be performed with different methods: the prediction of just one point, $x_{t+1}$ (1-step ahead [69]); the prediction of several points at one time (direct strategy); and the iterative prediction, which uses $x_{t+1}$ as the input for the next prediction until $x_{t+n}$ is reached.

The method used in this thesis is the direct multistep-ahead prediction of several points, also known as independent value prediction in [70] or direct strategy in [71]. To forecast the next samples, Takens’ theorem is applied: a function that connects the previous samples is sought, in order to find a pattern in the past values that can be used to predict the next ones.

$$x_{T+n} = F(x_T, x_{T-1}, \ldots, x_{T-(m-1)}) \tag{2.6}$$

In order to forecast $[x_t]_{t=T+1}^{T+n}$ from a time series $[x_t]_{t=1}^{T}$, a training set is created from the time series using a shifted window of length $(m-1) \cdot \tau$ [70], where $m$ is the embedding dimension, $\tau$ the time delay and $n$ the number of samples to predict. Therefore, the prediction horizon is fixed once the training set is chosen, and vice versa. The training set is necessary in order to find a nonlinear function of the data set.

The iterative prediction [71], also known as multi-stage prediction [70], on the other hand, consists in taking each predicted sample as an input for the next forecast until the $n$-th prediction. In this manner, the first predicted sample is used along with the past samples to predict the next one. Hence, the samples used to predict are shifted by one time unit, adding the newly predicted sample in each iteration. This is described mathematically by equations 2.7, 2.8 and 2.9 [71].

$$x_{T+1} = F(x_T, x_{T-1}, x_{T-2}, \ldots, x_{T-(m-1)}) \tag{2.7}$$

$$x_{T+2} = F(x_{T+1}, x_T, x_{T-1}, \ldots, x_{T-(m-2)}) \tag{2.8}$$

$$x_{T+3} = F(x_{T+2}, x_{T+1}, x_T, \ldots, x_{T-(m-3)}) \tag{2.9}$$

From the previous equations, the general equation 2.10 can be written:

$$x_{T+n} = F(x_{T+(n-1)}, x_{T+(n-2)}, x_{T+(n-3)}, \ldots, x_{T-(m-n)}) \tag{2.10}$$

The main problem of this method is that the error accumulates: the predicted sample included in each iteration carries its own error, which is inherited by the next prediction [70]. In contrast to the iterative prediction, the direct prediction minimises the associated squared multistep-ahead error. The direct strategy, however, requires more computational resources [71] as the number $n$ of samples to be predicted grows, because the larger $n$ is, the more training data are needed to obtain a good prediction model. This is due to the considerable absence of samples between $T$ and $T+n$. Nonetheless, a better function could be found using the latter strategy. For this reason, and following the work in [42], [27] and [43], the direct multistep-ahead prediction is used.
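The difference between the two strategies can be sketched as follows. The code is illustrative only: the function names are hypothetical, the one-step predictor passed to `iterative_forecast` stands in for whatever approximation of F(·) is available, and the iterative part assumes a unit time delay.

```python
def make_training_set(series, m, tau, horizon):
    """Direct strategy: pair each window of m delayed samples
    (newest first) with the value `horizon` steps ahead of it."""
    span = (m - 1) * tau                      # length of the shifted window
    pairs = []
    for t in range(span, len(series) - horizon):
        window = [series[t - k * tau] for k in range(m)]
        pairs.append((window, series[t + horizon]))
    return pairs


def iterative_forecast(predict_one, history, m, n):
    """Iterative strategy: feed every 1-step prediction back as an input
    for the next one, for n steps."""
    values = list(history)
    for _ in range(n):
        window = values[-1:-m - 1:-1]         # last m values, newest first
        values.append(predict_one(window))
    return values[len(history):]


# Usage with a toy predictor (linear extrapolation of the last two samples).
print(iterative_forecast(lambda w: 2 * w[0] - w[1], [1, 2, 3], m=2, n=3))
```

In the direct strategy a separate training set is built for each horizon, so the function found is tied to that horizon; in the iterative strategy a single one-step function is reused, at the cost of propagating its own errors.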

2.2.4 Network Simulator NS2

The network simulator NS2, as defined in [25, p. Preface 7], is an “open-source event-driven simulator designed specifically for research in computer communications networks”. Development of the first version of NS, called NS-1 [72], started in 1995, and successive versions of the simulator have been released since. From 1995, the Defense Advanced Research Projects Agency (DARPA) of the United States supported NS, and in 1996-1997 the first version of NS2 was released. The latest version of NS2 is 2.35, released in late 2011. A newer version of the whole simulator, called NS3 [73], has been available since 2008. NS2 scripts are not compatible with NS3, but models were imported. Nevertheless, NS2 will continue to be active and maintained by the NS developers, so its use in research is still valid. NS2 offers [8] a good platform focused on the simulation of TCP, UDP, routing and multicast protocols over wired and wireless networks.

2.2.4.1 Architecture of NS2

The simulator is invoked with the command ns in the Linux terminal, followed by the name of the Tcl file as input argument. The Tcl file is a script containing the set-up of a simulation. The usual output is a simulation trace file containing all the information about the transmissions performed in the medium.

Figure 10: Binding between C++ and OTcl

NS2 is based on two languages [25, p. 37]: C++ and OTcl (Object-oriented Tool Command Language). C++ is used to implement all the models and internal mechanisms of the NS2 network simulator, while OTcl is used as a user interface to control and set up the simulation. C++ is a compiled language and therefore computationally efficient when executed; however, it is slow to change because every modification requires re-compilation. OTcl, on the other hand, is an interpreted language and does not need compilation, so changes can be made very quickly; however, OTcl is slower than C++ at run time. These are the reasons why the two languages are used for different purposes inside NS2.

C++ and OTcl are linked using TclCL (Tcl with classes) [25, pp. 37-38]. Every bound C++ class has a corresponding OTcl class. Variables inside an OTcl object are called handles and are mapped to C++ objects. These handles are strings in the OTcl domain (all of them start with the symbol “_”) and do not have any functionality of their own; a handle is only the interface with the user and with other OTcl objects.

2.2.4.2 PHY layer and MAC layer parameters

In order to configure NS2 correctly for an 802.11g environment, it is necessary to understand the specific parameters of the original NS2 configuration. The main parameters for 802.11g are listed in Table 26 and Table 27 in APPENDIX B, where the last column of each table gives a short explanation of each parameter.

2.2.4.3 NS2-CRAHN implementation

The implementation of a cross-layer MAC and channel allocation scheme for Cognitive Radio Ad Hoc Networks (CRAHNs) [5], published in 2009 [4], is used as a reference for the modification of NS2, and the relevant parts are explained here.

As described in [4], NS2-CRAHN is an extension of NS2 designed to support realistic simulation of CRAHNs. NS2-CRAHN contains an accurate and flexible model of the activities of Primary Users (PUs) and of the cognitive cycle implemented by each Cognitive Radio (CR) user. In that study, NS2-CRAHN was used to analyse the impact of CRAHN features on the route formation process and to study TCP performance over CRAHNs, considering the impact of different factors on several TCP variants. NS2-CRAHN implements a multi-radio transceiver system model with several operating channels. Any of these radios can be tuned to any of the channels, as some of the interfaces are switchable. This is shown in Figure 11. Multi-radio operation is not used in this project, but it is important to understand it in order to modify the code properly.

Figure 11: NS2-CRAHN schema [4]

In CRAHNs, there are usually two types of users: Primary Users (PUs) and Secondary Users (SUs). PUs are users with priority to operate in the band. PU operation interferes with secondary users, leading to the loss of all their data. Secondary Users (CRs in the CRAHN implementation) are unlicensed users that are able to use the spectrum whenever it is idle. CRAHN itself is not used in this project, but the spectrum management implemented in NS2-CRAHN for PU activities is used as a reference for the implementation of the desired scenario.

2.2.4.3.1 Modifications of the MAC layer

NS2-CRAHN includes several modifications to the default MAC implementation of NS2 (file mac/mac-802.11.cc). These modifications add the spectrum management handlers (enabling channel switching when required), the multi-radio implementation and packet dropping when PU interference is detected. The part that drops packets while a PU is interfering is adapted for the implementation of this project.

2.2.4.3.2 Primary Users Activity block

The PU activity is implemented in this block of NS2-CRAHN. The PU activity model is in a file called PUmodel and is described in [74]. The PU information is stored in a file called the PU-log file; PUmodel reads that file and saves its contents into a data structure. The first part of the PU-log file is composed of entries with the format shown in Figure 12.

channel_ID, x, y, xrec, yrec, alpha, beta, tx_range

Figure 12: PU-log file first part

where the variables are the following:

- channel_ID: channel used by the PU ([1…MAX_CHANNELS[)
- x, y: position of the PU transmitter
- xrec, yrec: position of the PU receiver
- alpha, beta: parameters for the ON/OFF exponential distribution

The second part of the file describes the activity of each PU over time, specifying the number of entries for each PU and when it enters and leaves the channel. It is important to clarify that the pattern of the PU activity is not defined, only the intervals of time in which the PU can be active.

The PU activity follows an alternating exponential ON-OFF distribution model. The alternating states are ON, when the PU is using the channel (the channel is busy), and OFF, when the PU is not using the channel (the channel is idle and available for Secondary Users). According to [4], the ON-OFF switching is regulated by a birth-death Markovian process [75].

Figure 13: ON-OFF model

Figure 14: Example of ON-OFF model distribution in time

The model is a birth-death Markovian process with two states (ON and OFF), represented in Figure 13 and Figure 14, where beta is the death rate of the PU activity. The duration of the ON state follows an exponential distribution with mean 1/beta. Likewise, if alpha is the birth rate of the PU activity, the duration of the OFF state follows an exponential distribution with mean 1/alpha.

In order to program this in an efficient way, alpha and beta are time values in the implementation instead of probabilities, and the ON/OFF times are calculated in a different way: $t_{\mathrm{arrival}}$ in equation 2.11 gives the time for the next birth (or ON period), and $t_{\mathrm{departure}}$ in equation 2.12 gives the time that the state will be alive (the duration of the ON state). Figure 15 depicts this distribution.

$$t_{\mathrm{arrival}} = \mathrm{alpha} \cdot (-\log(u)) \tag{2.11}$$

$$t_{\mathrm{departure}} = \mathrm{beta} \cdot (-\log(v)) \tag{2.12}$$

where $u$ and $v$ are uniform random real numbers in the interval between 0 and 1.

Figure 15: ON-OFF distribution programmed in NS2-CRAHN
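The way the ON and OFF times are drawn can be sketched as below, reading equations 2.11 and 2.12 as a time value multiplied by minus the logarithm of a uniform random number. This is not the NS2-CRAHN code (which is written in C++); the function names, the assumption that the PU starts in the OFF state and the clipping at `end_time` are choices made for this illustration.

```python
import math
import random


def exponential_time(mean, rng):
    """Draw mean * (-log(u)) with u uniform in (0, 1]."""
    u = 1.0 - rng.random()        # shifts [0, 1) to (0, 1], so log(0) cannot occur
    return mean * (-math.log(u))


def on_off_schedule(alpha, beta, end_time, rng=None):
    """Return the (start, end) intervals in which the PU is ON up to end_time.
    alpha: mean time until the next arrival (OFF duration), a time value
    beta:  mean time the PU stays active (ON duration), a time value"""
    rng = rng or random.Random()
    t = 0.0
    intervals = []
    while t < end_time:
        t += exponential_time(alpha, rng)       # wait for the next arrival
        if t >= end_time:
            break
        on_time = exponential_time(beta, rng)   # duration of the ON state
        intervals.append((t, min(t + on_time, end_time)))
        t += on_time
    return intervals


for start, end in on_off_schedule(alpha=2.0, beta=1.0, end_time=10.0,
                                  rng=random.Random(1)):
    print(f"PU active from {start:.2f} s to {end:.2f} s")
```

Because only the two time values are random inputs, a different seed gives a different ON-OFF pattern for the same alpha and beta.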

2.2.4.3.3 Spectrum Manager

In the NS2-CRAHN implementation [4], the spectrum manager block is in charge of implementing the cognitive cycle, that is, sensing the medium and changing the transmission channel to an idle one if a PU is active. The NS2-CRAHN spectrum manager is composed of three blocks: the Spectrum Sensing block, the Spectrum Decision block and the Spectrum Mobility block.

The Spectrum Sensing block is in charge of detecting PU activity in a given channel by exchanging information with the PU Activity block. The Spectrum Decision block decides, when PU activity is detected, which action to take: leave the channel for a new available one or stay on the channel. The Spectrum Mobility block is in charge of the spectrum handoff [76] process to a new channel.

2.3 Related work

This section analyses related work that is useful for comparison or as a reference for this thesis.

2.3.1 Traffic modelling and forecasting using genetic algorithms for next-

generation cognitive radio applications

Time series of stock prices [77], traffic volume [78], electrical power [79] or highway traffic [80] are used to predict a sequence of future values from historical values. The difficulty of predicting these time series stems from their chaotic behaviour: they do not follow a trend or periodicity, and conventional predictive models such as regression analysis are limited in accuracy. In several articles, such as [42], [27] and [43], a Genetic Algorithm (GA) for forecasting time series of different nature is presented. These articles concluded that the GA can be applied to highly dynamic systems whose values are connected to their long history.

In [27], a genetic algorithm is used for next-generation cognitive radio applications. The article highlights the importance of determining the availability of spare spectrum. To do so, it is necessary to estimate the future demand of the network.

The genetic algorithm is a stochastic search technique that can be used for short-term predictions of nonlinear, deterministic dynamic systems of chaotic nature with more accuracy than linear stochastic models [42]. One of the advantages of this method is that it only needs some observations of the system evolution to carry out the forecasting. The algorithm obtains a function (equation) from only the limited data provided. In that function, the connection of the past values with the present and future values can be examined. Another benefit of this algorithm is the possibility of solving problems with multiple solutions.

The main difficulty in using this algorithm is finding the optimum set-up parameters for the time series analysed, such as the population size, the probabilities of applying a certain operator and the evaluation function. Moreover, different methods can be applied for each genetic operation, such as the kind of encoding, selection, mutation or crossover, and this variety of methods makes finding the optimum complicated. Several research approaches to genetic operations have been proposed, for example in [44], [81], [82], [83] and [37].

2.3.2 End-to-end Protocols for Cognitive Radio Ad Hoc Networks: An

Evaluation Study

According to [4], most of the research in CRAHNs is focused on devising spectrum sensing and sharing algorithms at the link layer, so that CR devices can operate without interfering with the transmissions of PUs. However, it is also important to consider the impact of such schemes on the higher layers of the protocol stack, in order to provide efficient end-to-end data delivery. Routing and transport layer protocols constitute an important study area that has not been investigated in detail over CRAHNs.

In the study, an implementation of CRAHN in NS2 with multiple channels and multiple hops is used. The study includes a part that analyses TCP performance with PU activity in the network using the NS2 simulator.

Although the scenario is not exactly the same as the one studied in this thesis, the study analyses the TCP throughput for different patterns of ON-OFF birth-death PU activity. This analysis is similar to the one carried out in this thesis and is therefore related. As a result, the study shows a 3D graph (Figure 16) with the TCP throughput for a set of alpha-beta combinations.

Figure 16: TCP throughput for different alpha beta combinations for CRAHN [4]

The drawback of the study is that, apart from the differences in scenario with respect to the one proposed in this thesis, the analysis is done for a low data rate (the maximum TCP throughput achieved is 400 Kbit/s). In addition, randomness in the ON-OFF patterns is not taken into account, which may be an important factor in the study.

2.4 Summary

IEEE 802.11g is a standard for WLAN in the 2.4 GHz band. The MAC layer protocol mainly determines the maximum achievable throughput, which is not equal to the maximum WLAN data rate: for a maximum WLAN data rate of 54 Mbit/s, it is 23 Mbit/s. TCP is a reliable connection-oriented delivery service. The throughput at the TCP layer depends on the congestion window and the retransmission time of data packets.

NS2 is a simulator for communication networks that is mainly programmed in C++ and Tcl, and all its code can be modified by researchers. NS2 has to be configured specifically for 802.11g. The spectrum management of the NS2-CRAHN implementation for NS2 is used as a reference to implement the PU activity management.

The genetic algorithm section begins by defining the algorithm as a stochastic search based on Darwinian theories of natural selection and survival. Time series are used with the objective of studying their behaviour when they are applied to forecasting. This study is done in order to try to predict the future evolution of the time series up to a certain time horizon, also called the prediction horizon.

The theorem proposed by Takens states how to reconstruct a chaotic system using the delay coordinates from a single variable belonging to the system. For the forecasting, the Takens theorem is used in order to find a function that connects the previous samples. This is carried out to find a pattern in the past values, with the aim of using it to predict the next ones.

Chapter 3

Design and implementation

In this chapter, the whole design and implementation of the scenario developed in this Thesis

is explained. This chapter presents all work carried out during the project.

3.1 Introduction

This chapter explains the most important points of the project implementation and presents a

general overview of the implementation. In the first part of the chapter, the different wireless

scenarios used in this thesis are detailed. The next part is dedicated to the implementation of the

simulation software in NS2. It includes all the modifications, configurations and new features done

particularly for this thesis implementation. An example of how the different scenarios can be

programmed in NS2 is also included. The last part deals with all the issues related to the genetic algorithm and forecasting implementation done in this thesis, including all the parts that make up the genetic algorithm.


3.2 Design of the wireless scenario

In order to properly characterise the scope of the thesis research, the wireless scenarios that have been used in the tests and simulations are described in this section. As mentioned before, the proposed scenario is a wireless network complying with the IEEE 802.11g standard, which is detailed in section 2.2.1.1. NS2 does not comply with 802.11g by default, so it must be configured. There are three different scenarios (named Scenario 1, Scenario 2 and Scenario 3), which are described in the following points. These scenarios contain some of the following elements:

• Node 0: This node communicates with Node 1 using TCP or UDP. This is a secondary user that can use the network whenever it is sensed as idle.

• Node 1: This node communicates with Node 0 using TCP or UDP. This is also a

secondary user with the same characteristics as Node 0.

• A secondary user with the same characteristics as Node 0, but playing back real traffic from a file.

• A secondary user with the same characteristics as Node 0.

• Primary users: There can be one or more pairs of PUs that are allowed to use the wireless medium at any moment. If both secondary users and PUs are transmitting, the secondary users will lose the packets sent. PUs always come in pairs. Hence, having one PU implies 2 nodes.

3.2.1 Scenario 1: Primary users

In this scenario, depicted in Figure 17, a PU is transmitting in the wireless medium inside the reception range of the secondary users. The nodes that serve to measure the available bandwidth are inside the PU transmission area (range of 300 meters) and are therefore affected by the PU activity.


Figure 17: Map of Scenario 1

3.2.2 Scenario 2: Secondary users with real traffic inside the sensing area

In this scenario there are 4 nodes: 2 nodes exchange played-back real TCP traffic (nodes in red) and the other two (nodes in black) are used to measure the available bandwidth. The nodes are located in such a way that they sense each other and therefore should share the medium.

3.2.3 Scenario 3: Secondary users with real traffic outside the sensing area and

only one affected

In this scenario there are 4 nodes: 2 nodes exchange played-back real TCP traffic (nodes in red) and the other two (nodes in black) are used to measure the available bandwidth. The nodes are located in such a way that they cannot sense each other and will therefore interfere with each other. A significant number of collisions and a low available bandwidth are expected for this scenario.

3.3 Simulation implementation in NS2

The most important actions that have been done in order to configure and implement the

simulation in NS2 version 2.35 are described in this section. Regarding the modifications and

programming done, the following points may summarise the relevant stages:

• Configuration of NS2 to comply with IEEE 802.11g WLAN standard

• Inclusion of the cognitive folder (inherited from NS2-CRAHN) in NS2

• Modification of the PU spectrum management functions of NS2-CRAHN to fit the requirements of this thesis

• Configuration and modification of the 802.11 MAC implementation to fit the requirements of this thesis

• Installation of the No Ad Hoc Routing Agent (NOAH)

• Modification of ns-lib.tcl

• MAC busy time measurement

• Real traffic acquisition

• Scenario script programming

3.3.1 General simulation outline

This section presents a brief overview of the most relevant stages of the simulation process in NS2. The simulation begins when the Tcl script is executed in the Linux terminal; all the simulation and scenario parameters are then set, as described in 3.3.7. After that, a MAC layer object is created in order to manage all the operations at the MAC layer for a specific node, and a spectrum manager handler is immediately created for this MAC. Once this is finished, the PU model information is loaded into the system and the PU activity pattern for the whole simulation time is calculated. This calculation is optional because a previously calculated file can be used instead. Meanwhile, the PU busy time is computed. At this point the simulation is performed and, at the end, the total MAC busy time is calculated. The simulation outputs an NS2 trace file with the information of the simulation, which is used to calculate other metrics. All this is depicted in Figure 20.

[Figure 20 flow chart. Main flow: set simulation with Tcl file; create MAC layer handler; create Spectrum Manager handler; load PU model; calculate the PU activity pattern; calculate the PU busy time; calculate total MAC busy time; output to an NS2 trace file; calculate other metrics. Per-packet branch: send a packet; is any PU active between now and now + tx time? If so, drop the packet; calculate the MAC busy time and save it.]

Figure 20: Simulation overview diagram

3.3.2 Configuration of NS2 for IEEE 802.11g

In order to comply with the IEEE 802.11g standard, NS2 has to be configured properly. The characteristics of this standard are explained in detail in 2.2.1.1. The configuration is done by changing the following parameters in the file ns-default.tcl, which is located in the folder tcl/lib/:

• MAC parameters:

Mac/802_11 set CWMin_ 15
Mac/802_11 set CWMax_ 1023
Mac/802_11 set SlotTime_ 0.000009
Mac/802_11 set CCATime_ 0.000003
Mac/802_11 set RxTxTurnaroundTime_ 0.000002
Mac/802_11 set SIFS_ 0.000016
Mac/802_11 set PreambleLength_ 96
Mac/802_11 set PLCPHeaderLength_ 40
Mac/802_11 set PLCPDataRate_ 6.0e6
Mac/802_11 set RTSThreshold_ 3000
Mac/802_11 set MaxPropagationDelay_ 0.0000005
Mac/802_11 set ShortRetryLimit_ 7
Mac/802_11 set LongRetryLimit_ 4
Mac/802_11 set basicRate_ 6Mb
Mac/802_11 set dataRate_ 54Mb

• Physical parameters:

Phy/WirelessPhy set bandwidth_ 54e6
Phy/WirelessPhy set freq_ 2.4e+9
Phy/WirelessPhy set Pt_ 3.3962527e-2
Phy/WirelessPhy set RXThresh_ 2.75096e-12
Phy/WirelessPhy set CSThresh_ 2.75096e-12
Phy/WirelessPhy set L_ 1.0
Phy/WirelessPhy set CPThresh_ 10.0

In order to set the reception range, it is necessary to calculate the reception threshold (RXThresh) and the carrier sense threshold (CSThresh) as desired. NS2 has a C++ program intended for the calculation of RXThresh and CSThresh. This program is called threshold.cc [84]; it can be modified by adding the specific parameters of the physical layer and then compiled. The program has to be executed specifying a propagation model. In this thesis, the Two Ray Ground Model [85] is used, because it gives a more reliable result than the Free Space Model [86]. The output of threshold.cc is the RXThresh or CSThresh value. The threshold used corresponds to a reception distance of 500 meters, so beyond this distance no packets will be received. The sensing distance is set to the same value, so traffic further away than 500 meters will not be sensed. The other physical parameters affect this distance, so they have to be taken into account.

It is also necessary to change the UDP and TCP packet size to 1400 bytes in this file. For further explanation of each parameter, please refer to APPENDIX B.

3.3.3 MAC implementation in NS2

The implementation of the MAC in NS2 has been modified in order to perform all of the desired functions, such as the PU activity management or the MAC busy time calculation. The modifications are made in the files mac-802_11.cc and mac-802_11.h, contained in the /mac folder. The following have been added: the creation of a PU management handler for every node, the loading of the PU model log file, the calculation of the PU activity pattern, the calculation of the PU busy time, the calculation of the total busy time and, finally, the PU activity management. In order to load the PU model log file or calculate the PU activity pattern, ns-lib.tcl has to be modified so that these commands can be called from the Tcl scripts. That information has the structure explained in 2.2.4.3.2.
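The RXThresh_ and CSThresh_ values configured for section 3.3.2 can be cross-checked with a few lines of Python. The sketch below is not threshold.cc itself; it only applies the usual two-ray ground formula, and it assumes unity antenna gains and antenna heights of 1.5 m (values not stated in this section) together with a rounded speed of light.

```python
import math

SPEED_OF_LIGHT = 3.0e8  # m/s, rounded (assumption)


def two_ray_rx_power(pt, distance, freq, gt=1.0, gr=1.0, ht=1.5, hr=1.5, loss=1.0):
    """Received power in watts at `distance` metres (two-ray ground model).

    gt, gr (antenna gains) and ht, hr (antenna heights in metres) are
    assumed values, not taken from the thesis text.
    """
    wavelength = SPEED_OF_LIGHT / freq
    crossover = 4.0 * math.pi * ht * hr / wavelength
    if distance <= crossover:
        # Below the crossover distance the free-space (Friis) formula is used.
        return pt * gt * gr * wavelength ** 2 / ((4.0 * math.pi * distance) ** 2 * loss)
    # Beyond the crossover distance the power decays with the fourth power of d.
    return pt * gt * gr * ht ** 2 * hr ** 2 / (distance ** 4 * loss)


# Pt_ and freq_ as configured in ns-default.tcl, reception distance of 500 m.
rx_thresh = two_ray_rx_power(pt=3.3962527e-2, distance=500.0, freq=2.4e9)
print(f"Threshold for a 500 m range: {rx_thresh:.5e} W")
```

Under these assumptions the result agrees with the configured threshold of 2.75096e-12 W to within rounding, which suggests that the configured value corresponds to a 500 m range with the Two Ray Ground Model.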

3.3.3.1 Implementation for Primary Users

This part explains the changes made to NS2 in order to include PU activity in the wireless network. The implementation is done in such a way that, whenever a PU is transmitting and reaching the location of the secondary users, the secondary users must drop the packets, because in a real environment the packets would not be received properly due to the interference of the PUs. This is achieved by changing the way the MAC layer works in NS2. In order to understand how the implementation is done, Figure 21 shows the flow of operations whenever a packet is received at MAC level.

Regarding the PU activity, the following modifications have been made to the 802.11 MAC implementation in NS2, in the files mac-802_11.cc and mac-802_11.h contained in the /mac folder. The most important changes are carried out in the function recv(Packet *p, Handler *h), where every received packet is checked to see whether a PU is interfering. If PU activity is detected, the packet is dropped.

3.3.3.1.1 PU spectrum management

The spectrum management that allows NS2 to model the presence of Primary Users in the wireless scenario is done in the functions Is_PU_interfering(), sense(), Is_PU_active() and check_active(). This structure and these functions have been taken and modified from the NS2-CRAHN implementation. However, the implementation of the last function, check_active(), is completely new because the two implementations work differently. In conclusion, only the part that implements the PU pattern calculation algorithm is inherited from NS2-CRAHN.

The flow chart depicted in Figure 21 represents the steps that each packet goes through in the spectrum manager. When a packet is received, the function first checks whether the PU model file has been loaded properly. If it has not, there are no PUs, so a negative message is sent to the receiver function in order not to drop the packet. If PUs exist, the next step is to verify, for every PU, whether the receiving node is inside the transmission area of the PU; if it is not, a negative message is sent to the receiver function in order not to drop the packet. If the receiving node is inside the PU transmission area, the next step is to check whether the PU is active during the period in which the packet would be received. If the PU is active at the same time that the packet is sent, the function sends an affirmative message to the receiver function to drop the packet. Otherwise, the function tells the receiver not to drop the packet.
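The decision flow described above can be summarised in a short Python sketch. It is not the C++ code of the modified MAC; the class PrimaryUser, its fields and the function names are hypothetical and only mirror the three checks (PU model loaded, node inside the PU transmission radius, PU active during the packet time).

```python
import math
from dataclasses import dataclass, field


@dataclass
class PrimaryUser:
    """Hypothetical container for one PU: position, range and ON periods."""
    x: float
    y: float
    tx_radius: float
    on_periods: list = field(default_factory=list)  # (start, end) in seconds


def overlaps(start, end, periods):
    """True if the interval [start, end) overlaps any ON period."""
    return any(on_start < end and start < on_end for on_start, on_end in periods)


def is_pu_interfering(node_xy, now, tx_time, primary_users):
    """Return True when the packet has to be dropped."""
    if not primary_users:  # no PU model loaded: never drop
        return False
    for pu in primary_users:
        distance = math.hypot(node_xy[0] - pu.x, node_xy[1] - pu.y)
        if distance > pu.tx_radius:  # node outside this PU's area
            continue
        if overlaps(now, now + tx_time, pu.on_periods):
            return True  # PU active during the packet: drop
    return False


pu = PrimaryUser(x=0.0, y=0.0, tx_radius=300.0, on_periods=[(1.0, 2.0)])
print(is_pu_interfering((100.0, 0.0), now=1.5, tx_time=0.001, primary_users=[pu]))
```

The loop continues to the next PU when the node is out of range, which corresponds to the "Are there more PUs?" decision of the flow chart.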

[Figure 21 flow chart. Call chain: recv(), Is_PU_interfering(), sense(), Is_PU_active(), check_active(), recv(). Decisions for each received packet: is the PU model file loaded? Is the node inside the PU tx radius? Is any PU active between now and now + tx time? Are there more PUs? Outcomes: drop the packet, or no PU interfering (no drop).]

Figure 21: Primary Users flow chart and location

3.3.3.1.2 PU activity detection

The final decision on whether or not to drop the packet is made in check_active(), which is located in PUmodel.cc. This function reads the file PUactivity.txt, which contains the information about the PU activity pattern for the whole simulation time. This file is generated at the beginning of the simulation if so specified; otherwise, a previously calculated file can be used. After reading the file, the function searches for any overlap between the PU activity and the packet transmission. If such an overlap occurs, the instruction to drop the packet is sent back to the MAC handler.

3.3.3.1.3 PU activity pattern and busy time

In the function get_PU_busy(), the PU activity pattern is calculated following a birth-death Markovian process. This calculation is described in 2.2.4.3.2. The result is an ON-OFF distribution. This function, which is called only once, calculates the PU activity pattern for the whole simulation time and stores it in a file. The PU busy time over the whole simulation time is also calculated and returned. In order to be able to call this function from the Tcl script, it is necessary to bind it in the function Mac802_11::command() and include it in ns-lib.tcl.

By default, the function always gives the same random ON-OFF pattern for a given alpha-beta combination. In order to offer an option with truly varying results, a change of the seed of the random generator (done heuristically, for speed) has been added every time the function is called.

In the cases where randomness has to be removed, a small modification in the algorithm sets the values of the arrival and departure times (explained in 2.2.4.3.2) to alpha and beta respectively, obtaining a fixed periodic pattern.
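As an illustration of the ON-OFF pattern and its busy time, the following Python sketch generates alternating OFF and ON periods. It is not the get_PU_busy() code: it assumes exponentially distributed durations with given mean values, starts with an OFF period, and leaves the exact mapping of these means to the alpha and beta of section 2.2.4.3.2 open. The names pu_activity_pattern and pu_busy_time are hypothetical.

```python
import random


def pu_activity_pattern(mean_on, mean_off, sim_time, seed=None, periodic=False):
    """Return a list of (start, end) ON periods within [0, sim_time].

    With periodic=True every OFF and ON period lasts exactly its mean,
    which gives a fixed periodic pattern. Otherwise the durations are
    drawn from exponential distributions (assumption of this sketch).
    """
    rng = random.Random(seed)  # same seed, same pattern
    periods = []
    t = 0.0
    while t < sim_time:
        off = mean_off if periodic else rng.expovariate(1.0 / mean_off)
        on = mean_on if periodic else rng.expovariate(1.0 / mean_on)
        start = t + off
        if start >= sim_time:
            break
        end = min(start + on, sim_time)  # truncate at the end of the simulation
        periods.append((start, end))
        t = end
    return periods


def pu_busy_time(periods):
    """Total time the PU is ON."""
    return sum(end - start for start, end in periods)


pattern = pu_activity_pattern(mean_on=1.0, mean_off=2.0, sim_time=9.0, periodic=True)
print(pattern, pu_busy_time(pattern))
```

Passing a different seed on every call would reproduce the "change of seed" option, while a fixed seed reproduces the default behaviour of always obtaining the same pattern.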

3.3.3.2 MAC busy time measurement

The busy time measurement of the MAC layer will be used for comparison with the TCP and UDP bandwidth. The MAC busy time has been measured by adapting the implementation done in [87]. This adaptation has been done manually due to compatibility issues (patches are not compatible between different NS2 versions). The measurement is done in the implementation of the 802.11 MAC layer in NS2, which adds up all the periods when the MAC is active, from the transmission of the packet to the reception of the MAC layer acknowledgement, including the SIFS and the DIFS. The timing structure [88] without RTS/CTS is represented in Figure 22.

Figure 22: MAC busy time

In the NS2 MAC, after each MAC layer ACK is received, the sender backs off (BO in Figure 22). This back-off is not included in the MAC busy time. In the modifications made, a condition has been added so that the time when the PUs are active is not added to the MAC busy time, as it is added to the PU busy time instead. Finally, the MAC busy time of the sender, the MAC busy time of the receiver and the PU busy time are added together, giving as a result the total MAC busy time, as shown in Eq. 3.1.

\[ T_{Total\_busy} = T_{MAC\_node\_s} + T_{MAC\_node\_r} + T_{PU\_busy} \tag{Eq. 3.1} \]

The functions that save to a file the times at which the MAC starts and ends a busy period have been added to the basic implementation. In addition, \(T_{MAC\_node\_s}\), \(T_{MAC\_node\_r}\) and \(T_{PU\_busy}\) are stored in a file if the line $ns_ at TIME "$node_(X) compute-mac-busy" is called in the Tcl script. In order to be able to call the busy time calculation from the Tcl script, it is necessary to bind it in the function Mac802_11::command() and include it in ns-lib.tcl. This makes it possible to compute the busy time at any moment of interest.
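The bookkeeping behind the total MAC busy time of Eq. 3.1 can be sketched in Python as follows. This is only an illustration under the assumption that busy periods are available as (start, end) intervals and that the PU ON periods do not overlap each other; the function names are hypothetical and the real measurement is done inside the C++ MAC code.

```python
def overlap_length(interval, others):
    """Length of `interval` that is covered by the intervals in `others`."""
    start, end = interval
    return sum(max(0.0, min(end, o_end) - max(start, o_start))
               for o_start, o_end in others)


def mac_busy_time(busy_periods, pu_on):
    """MAC busy time of one node, excluding the time when a PU is active."""
    return sum((end - start) - overlap_length((start, end), pu_on)
               for start, end in busy_periods)


def total_busy_time(sender_busy, receiver_busy, pu_on):
    """Sender MAC busy time + receiver MAC busy time + PU busy time."""
    pu_busy = sum(end - start for start, end in pu_on)
    return mac_busy_time(sender_busy, pu_on) + mac_busy_time(receiver_busy, pu_on) + pu_busy


sender = [(0.0, 2.0), (5.0, 6.0)]
receiver = [(3.0, 4.0)]
pu_on = [(1.0, 3.5)]
print(total_busy_time(sender, receiver, pu_on))
```

Subtracting the overlap avoids counting the same instant both as MAC busy time and as PU busy time, which is the purpose of the condition described above.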

3.3.4 No Ad-Hoc Routing Agent (NOAH)

As only two nodes are used, and therefore there are no hops, no routing agent is required. According to [89] [90], NOAH is a wireless routing agent that only supports direct communication between wireless nodes, or between base stations and mobile nodes when mobile IP is used. This allows simulating scenarios where multi-hop wireless routing is not required. NOAH does not send any routing-related packets. In our case this is useful because, under PU activity, if routing packets were dropped, the node would not find the route and all the packets would be dropped at the transmitter node. NOAH is not available in NS2 by default, and it must be set up manually as described in [89].

3.3.5 Acquisition and playback of real wireless traffic

This section contains the procedure used to acquire wireless traffic. In order to play back real traffic, traffic traces have to be acquired and adapted. The final data has to be in NS2 binary trace format in order to be played back in NS2.

3.3.5.1 Acquisition and conversion of traffic trace

The wireless traffic packets have been captured using the program Wireshark [91]. This program allows live capture and offline analysis of traffic on any interface of the computer. It also allows packet filtering. After the traffic is acquired, its traffic trace must be exported. The export has been done in Comma-Separated Values (CSV) format [92], which makes possible the conversion from Wireshark traffic traces to NS2 binary trace files.

The conversion from the CSV ASCII format to an NS2 binary trace file is done with a Perl [93] script, which has been modified from the original [94]. This script takes the time stamp and the size of each packet and creates a new binary file that can be used as an input in NS2.
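The conversion step can be illustrated with a short Python sketch (the thesis itself uses a Perl script, which is not reproduced here). It assumes that the CSV export has columns named Time and Length, that the time stamps are in seconds and sorted, and that each NS2 trace record consists of two 32-bit unsigned integers in network byte order: the inter-packet time in microseconds and the packet size in bytes. The function name csv_to_ns2_trace is hypothetical.

```python
import csv
import io
import struct


def csv_to_ns2_trace(csv_text, time_col="Time", size_col="Length"):
    """Convert CSV rows (time stamp, size) into NS2 binary trace records.

    Assumed record layout: two big-endian 32-bit unsigned integers,
    the gap to the previous packet in microseconds and the size in bytes.
    """
    records = bytearray()
    previous = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        timestamp = float(row[time_col])
        gap_us = 0 if previous is None else round((timestamp - previous) * 1e6)
        records += struct.pack("!II", gap_us, int(row[size_col]))
        previous = timestamp
    return bytes(records)


example_csv = "Time,Length\n0.0,100\n0.001,1400\n0.0035,60\n"
trace = csv_to_ns2_trace(example_csv)
print(len(trace), "bytes written for 3 packets")
```

The resulting bytes would be written to a file opened in binary mode, which can then be attached to the trace application described in 3.3.5.2.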

3.3.5.2 Playback of real traffic in NS2

As has been mentioned, playing back real traffic makes the behaviour of the network more realistic, and its use is therefore recommended. In order to include the playback of traffic, the following lines have to be added:

set tfile1 [new Tracefile]                 ;# Create a new trace file
$tfile1 filename "my_file.bin"             ;# Assign a name
set trac0 [new Application/Traffic/Trace]  ;# Create a new traffic trace application
$trac0 attach-tracefile $tfile1            ;# Attach the application to the trace file

The third line creates an Application object [95], which is an object located on top of the transport layer that can either generate traffic (e.g. CBR) or simulate an application (e.g. FTP). The Application/Traffic/Trace application is able to take a binary trace file and play it back. This traffic will have the same timing as the original one if the network allows it; otherwise, the played-back traffic will be affected by the state of the network and its original shape will be modified.

3.3.6 Data analysis and graphic representation

In order to calculate different parameters, such as throughput or packet loss, from the trace files or other files with a huge amount of data, Linux shell script [96] and AWK [97] languages have been used.

The results have been plotted using Gnuplot [98] and Xgraph [99]. The former is run using a script and needs the output traces to be pre-processed, while the latter has to be called during the simulation execution in order to produce output files to be plotted later.
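The kind of post-processing described here can be illustrated with a Python sketch (the thesis uses shell and AWK scripts, which are not reproduced). It assumes the old NS2 wireless trace format, in which the whitespace-separated fields are event, time, node, trace level, flags, packet id, packet type and size; the function name and the choice of measuring between the first and the last reception are assumptions of this sketch.

```python
def tcp_throughput_bps(trace_lines, node="_1_"):
    """Average TCP throughput in bit/s received by `node` at agent level.

    Only lines of the form 'r <time> <node> AGT <flags> <id> tcp <size> ...'
    are counted. The measuring interval runs from the first to the last
    received packet.
    """
    received_bytes = 0
    first = last = None
    for line in trace_lines:
        fields = line.split()
        if len(fields) < 8:
            continue
        event, time, where, level, ptype, size = (
            fields[0], fields[1], fields[2], fields[3], fields[6], fields[7])
        if event != "r" or where != node or level != "AGT" or ptype != "tcp":
            continue
        last = float(time)
        if first is None:
            first = last
        received_bytes += int(size)
    if first is None or last == first:
        return 0.0
    return received_bytes * 8 / (last - first)


sample = [
    "r 1.000000000 _1_ AGT --- 0 tcp 1440 [13a 1 0 800] ------- [0:0 1:0 32 1] [0 0] 1 0",
    "r 2.000000000 _1_ AGT --- 1 tcp 1440 [13a 1 0 800] ------- [0:0 1:0 32 1] [1 0] 1 0",
    "s 2.500000000 _1_ AGT --- 2 ack 40 [0 0 0 0] ------- [1:0 0:0 32 0] [1 0] 0 0",
    "r 3.000000000 _1_ AGT --- 3 tcp 1440 [13a 1 0 800] ------- [0:0 1:0 32 1] [2 0] 1 0",
]
print(tcp_throughput_bps(sample))
```

Packet loss could be obtained in a similar way by counting sent and dropped events instead of received ones.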

3.3.7 Implementation of the wireless scenario in NS2

The scenario is implemented in NS2 by programming a Tcl script. First, the different options for the scenario, such as the propagation model, the number of nodes or the antenna type, are defined. After the variables are set, the main program is set up, the variables are initialised and the PU model file is loaded. A God (General Operations Director) is created; according to [100], “a God is the object that is used to store global information about the state of the environment, network or nodes that an omniscient observer would have, but that should not be made known to any participant in the simulation”.

The next step is to configure the nodes that make up the network. All the nodes are identical; the only thing that varies from one to another is the position. Immediately after this, the mobile nodes are created. As many nodes are created as were specified when the variables were set at the very beginning of the Tcl script.

At this point, the topology, the traffic specifications and the agents are created. There are different possible configurations for TCP and for UDP, which are explained in detail in APPENDIX A. Both TCP and UDP agents can be used with played-back real traffic, just by adding a trace player as the application, as described in 3.3.5.2. When the simulation is completed, the function that calculates the total MAC busy time and stores it in a file is called.

The last commands to be executed tell the simulator to end all the agents, end the simulation and close NS2. After everything is set, the only remaining step is to tell ns to start the simulation, which will use all of the above. All these processes are explained in detail in APPENDIX A.

3.4 Genetic algorithm implementation

In this section, the most important steps of the design and implementation of the genetic

algorithm are described. The algorithm is programmed in MATLAB 2013 and its configuration is

based on the approach given by [27].

3.4.1 General GA outline

For the design and implementation of the genetic algorithm and its use for forecasting, the following steps are followed. A brief explanation can be found in section 2.2.2.4.

• Step 1: Encode the solution by means of the chromosome.

• Step 2: Define the fitness function to quantify the performance of each chromosome for the problem to solve.

• Step 3: Generate N random chromosomes.

• Step 4: Calculate the fitness of each of the generated chromosomes by means of the fitness function. Then sort the population by fitness.

• Step 5: Select the best K chromosomes for the next generation. This is called elitism.

• Step 6: Introduce the best chromosomes into a mating pool of size \(N \ast size_{mating\ pool}\), where \(0 \le size_{mating\ pool} \le 1\).

• Step 7: Apply the selection criteria in the mating pool to select a pair of parents according to their fitness, i.e. their probability of being chosen.

• Step 8: Create two offspring from the selected pair of parents by applying the crossover function.

• Step 9: Apply the mutation operator to the new offspring.

• Step 10: Introduce the elite chromosomes and the offspring into the next generation. Either the rest of the mating pool is introduced into the next generation, or new chromosomes are generated for the next generation, as many as the size of this mating pool.

• Step 11: Go to step 4 and repeat the process until the termination criteria are met.

• Step 12: Perform the prediction with the best function.

After describing these steps, the design and implementation of each of them will be detailed. All the steps described above are summarised in the general outline depicted in Figure 23.
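Steps 6 and 7 can be illustrated with a small Python sketch. It is not the MATLAB implementation of the thesis: it assumes a fitness-proportionate (roulette wheel) selection with non-negative fitness values, allows the same chromosome to be drawn twice, and uses the hypothetical names build_mating_pool and roulette_wheel_pair.

```python
import random


def build_mating_pool(population, fitness, pool_fraction):
    """Keep the best N * pool_fraction chromosomes (at least two)."""
    ranked = sorted(zip(population, fitness), key=lambda pair: pair[1], reverse=True)
    pool_size = max(2, int(len(ranked) * pool_fraction))
    return ranked[:pool_size]


def roulette_wheel_pair(pool, rng):
    """Draw two parents with probability proportional to their fitness."""
    chromosomes = [chromosome for chromosome, _ in pool]
    weights = [fit for _, fit in pool]  # must be non-negative, not all zero
    return tuple(rng.choices(chromosomes, weights=weights, k=2))


rng = random.Random(1)
population = ["a", "b", "c", "d"]
fitness = [1.0, 4.0, 2.0, 3.0]
pool = build_mating_pool(population, fitness, pool_fraction=0.5)
parents = roulette_wheel_pair(pool, rng)
print(pool, parents)
```

The requirement of non-negative weights is the reason why a fitness function that can take negative values is not directly usable with this selection scheme.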

[Figure 23 flow chart. Boxes: set up parameters; generate initial population; evaluate the chromosomes' fitness; elitism; selection; crossover; mutation; diversity; new population; termination criteria; selecting the best chromosome; prediction. The decision "Generation of new chromosomes?" has Yes/No branches leading to "Generate new population" and "Selecting least fit individuals".]

Figure 23: GA general outline

3.4.2 Encoding of the chromosome

The encoding of the chromosome is important because it represents the solution of the problem to be solved, that is, the genotype (encoded space). This is essential because the chromosome evolves through manipulation of the genotype, so a bad encoding could result in a bad solution. It matters because the evaluation and the selection of the chromosome are performed on the phenotype (solutions space), as illustrated in Figure 24.

The encoding conforms to the rules provided by [27], where chromosomes represent equations in reverse Polish notation [101]. These equations are randomly generated and are composed of arguments, operators and functions. The arguments are either real numbers chosen from a finite set in [-Z, Z], or values from the time series \([x_t]_{t=1}^{T}\). The operators are \(\{+, -, \times, \div\}\) and the functions are {'sin', 'cos', 'exp', 'log', '/'}. The chromosomes are generated following three basic rules:

• The first two elements of the chromosome must be arguments and the last one must be an operator.

• At any position of the chromosome, the number of arguments to the left must be greater than the number of operators.

• The number of arguments must be equal to the number of operators plus one.
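The three rules and the evaluation of a chromosome in reverse Polish notation can be illustrated with the following Python sketch. The token representation is an assumption of this example: floats are numerical arguments, strings such as "x1" are time series arguments looked up in a dictionary of lagged samples, and functions are treated as unary, so they do not change the argument and operator counts. Numerical errors such as division by zero are not handled.

```python
import math

BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
          "*": lambda a, b: a * b, "/": lambda a, b: a / b}
UNARY = {"sin": math.sin, "cos": math.cos, "exp": math.exp, "log": math.log}


def is_argument(gene):
    return gene not in BINARY and gene not in UNARY


def is_valid_chromosome(genes):
    """Check the three generation rules on a list of tokens."""
    if len(genes) < 3:
        return False
    # Rule 1: first two elements are arguments, last one is an operator.
    if not (is_argument(genes[0]) and is_argument(genes[1]) and genes[-1] in BINARY):
        return False
    arguments = operators = 0
    for gene in genes:
        if gene in BINARY:
            operators += 1
        elif is_argument(gene):
            arguments += 1
        # Rule 2: more arguments than operators at every position.
        if arguments <= operators:
            return False
    # Rule 3: number of arguments equals number of operators plus one.
    return arguments == operators + 1


def evaluate(genes, lags):
    """Evaluate a valid chromosome with a stack, as usual for RPN."""
    stack = []
    for gene in genes:
        if gene in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[gene](left, right))
        elif gene in UNARY:
            stack.append(UNARY[gene](stack.pop()))
        elif isinstance(gene, str):
            stack.append(lags[gene])  # time series argument
        else:
            stack.append(float(gene))  # numerical argument
    return stack[0]


chromosome = ["x1", 2.0, "*", "x2", "+"]  # (x1 * 2) + x2
print(is_valid_chromosome(chromosome), evaluate(chromosome, {"x1": 3.0, "x2": 1.0}))
```

Rule 2 guarantees that the stack always holds at least two values when a binary operator is reached, and rule 3 guarantees that exactly one value remains at the end.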

[Figure 24 diagram: the encoded space and the solutions space, linked by coding and decoding.]

Figure 24: Encoded space and solutions space

The input values are scaled using a scale factor in order to prevent large numbers and therefore to reduce the probability of infinite values. Moreover, this reduces the importance of choosing a correct finite range [-Z, Z] for each input data range.

Hence, equation 3.1 is applied to the data to be analysed in order to obtain the scale factor.

\[ factor_{input} = \frac{max\_value}{\max(data_{input})} \tag{3.1} \]

where \(factor_{input}\) is the factor used to scale the data, max_value is the maximum value allowed during the GA operation, and \(\max(data_{input})\) is the maximum of the data that is going to be used.

Once the scale factor is obtained, it is applied to the data to be analysed in order to obtain the data used during the GA operation. This is done by means of equation 3.2.

\[ data_{used} = data_{input} \ast factor_{input} \tag{3.2} \]

where \(data_{input}\) is the data that is going to be used (samples) and \(data_{used}\) is the scaled data that is used during the GA operation.
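The scaling of equations 3.1 and 3.2 amounts to a few lines of code. The Python sketch below is only illustrative (the thesis implementation is in MATLAB); the function name scale_series is hypothetical and a positive maximum of the input data is assumed.

```python
def scale_series(samples, max_value):
    """Return the scale factor (eq. 3.1) and the scaled samples (eq. 3.2)."""
    factor = max_value / max(samples)  # assumes max(samples) > 0
    return factor, [x * factor for x in samples]


factor, scaled = scale_series([2.0, 4.0, 8.0], max_value=1.0)
print(factor, scaled)
```

After scaling, the largest sample equals max_value, so the forecast produced by the GA has to be divided by the same factor to return to the original units.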

3.4.3 Definition of the fitness function

Different criteria are used to assess the performance of each chromosome. These criteria are explained in this section. One performance criterion common to all the fitness functions is the error between the predicted sample and the real sample (from the time series) in the training set.

As explained in section 2.2.3.4, for forecasting \([x_t]_{t=T+1}^{T+n}\) from a time series \([x_t]_{t=1}^{T}\), a training set of the time series is created using a shifted window of length \((m-1) \ast \tau\), where m is the embedding dimension, \(\tau\) the time delay and n the number of samples to predict. This is shown in Figure 25, where the green line is the shifted window, which will be slid as far as sample 29, and the black line is where the training set starts.

This training set has length n, equal to the prediction horizon. To form the chromosome, the first sample of the training set of the time series \([x_t]_{t=1}^{T}\) could take any sample contained in the shifted window. This is described, along with the input, in the next table for the same time series example, with \(\tau = 1\) to simplify the example. The input is a vector that indicates the whole range of past samples that the chromosome could take. This is also illustrated in Figure 19 along with Table 5 for a general expression, showing the samples of the training set and the arguments that the chromosome could take as an input, both for each sample of the training set and for the output (prediction).

Figure 25: Shifted window and training set

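| Samples of the training set | Input | Output | Predicted sample |
|---|---|---|---|
| First sample \(x_{T-n+1}\) | \([x_{T-n}, x_{T-n-1}, \ldots, x_{T-n-(m-1)}]\) | \([x_{T-m+1}, x_{T-m+2}, \ldots, x_{T}]\) | \(x_{T+1}\) |
| Second sample \(x_{T-n+2}\) | \([x_{T-n+1}, x_{T-n}, \ldots, x_{T-n-(m-2)}]\) | \([x_{T-m+2}, x_{T-m+3}, \ldots, x_{T+1}]\) | \(x_{T+2}\) |
| Third sample \(x_{T-n+3}\) | \([x_{T-n+2}, x_{T-n+1}, \ldots, x_{T-n-(m-3)}]\) | \([x_{T-m+3}, x_{T-m+4}, \ldots, x_{T+2}]\) | \(x_{T+3}\) |
| Fourth sample \(x_{T-n+4}\) | \([x_{T-n+3}, x_{T-n+2}, \ldots, x_{T-n-(m-4)}]\) | \([x_{T-m+4}, x_{T-m+5}, \ldots, x_{T+3}]\) | \(x_{T+4}\) |
| n-th sample \(x_{T}\) | \([x_{T-1}, x_{T-2}, \ldots, x_{T-m}]\) | \([x_{T-m+n}, x_{T-m+n+1}, \ldots, x_{T+n-1}]\) | \(x_{T+n}\) |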

Table 4: Example training set window

Each chromosome, or function, is used to predict the values of this training set in order to validate the efficiency of the function and try to minimise the error. Thereafter it is used to forecast the next n samples (the last set of outputs), thereby obtaining the predicted samples.

Therefore, each chromosome (function) is evaluated at each point of the training set and the squared error \(SE_j\) is accumulated; squaring avoids negative results.

This step can be described mathematically as follows:

\[ \hat{x}_t^j = f_j(x_{t-\tau}, x_{t-2\tau}, \ldots, x_{t-m\tau}) \tag{3.3} \]

\[ m\tau + 1 \le t \le T, \qquad 1 \le j \le N \tag{3.4} \]

\[ SE_j = \sum_{t=m\tau+1}^{T} (\hat{x}_t^j - x_t)^2 \tag{3.5} \]

where \(\hat{x}_t^j\) is the predicted sample at time t, \(x_t\) is the original sample from the time series, T is \(m\tau + n\) (the length of the training set) and \(f_j()\) is the chromosome j, which is evaluated at each sample of the training set.
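The evaluation of one chromosome over the training set, as in equations 3.3 to 3.5, can be sketched in Python as follows. The sketch assumes that the candidate function receives the m lagged samples as a list, most recent first, and that the series holds at least m·τ + n samples; the names training_pairs and squared_error are hypothetical.

```python
def training_pairs(series, m, tau, n):
    """Return (lags, target) pairs for the last n samples of `series`.

    For a target x_t the lags are [x_(t-tau), x_(t-2*tau), ..., x_(t-m*tau)].
    """
    if len(series) < m * tau + n:
        raise ValueError("series too short for the requested m, tau and n")
    pairs = []
    for t in range(len(series) - n, len(series)):
        lags = [series[t - k * tau] for k in range(1, m + 1)]
        pairs.append((lags, series[t]))
    return pairs


def squared_error(candidate, pairs):
    """Sum of squared prediction errors of `candidate` over the training set."""
    return sum((candidate(lags) - target) ** 2 for lags, target in pairs)


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pairs = training_pairs(series, m=2, tau=1, n=3)
linear = lambda lags: 2 * lags[0] - lags[1]   # extrapolates a straight line
persistence = lambda lags: lags[0]            # repeats the last sample
print(squared_error(linear, pairs), squared_error(persistence, pairs))
```

On this linear toy series the first candidate reproduces every training sample exactly, while the persistence candidate is off by one unit at each of the three training samples.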

Moreover, other performance criteria are used to evaluate the chromosomes, as proposed in [102]. These criteria are the MSE (Mean Squared Error), MAPE (Mean Absolute Percentage Error), NMSE (Normalized Mean Square Error), POCID (Prediction On Change in Direction) and ARV (Average Relative Variance). The most widely used criterion to evaluate the performance is the MSE, equation 3.6.

\[ MSE = \frac{1}{P} \sum_{t=m\tau+1}^{T} (\hat{x}_t^j - x_t)^2 \tag{3.6} \]

where P is the length of the training set. This measure is not sufficiently robust on its own because it does not provide enough information about the forecasting model. The MAPE provides information about the deviation of the model and is calculated as in equation 3.7:

\[ MAPE = \frac{1}{P} \sum_{t=m\tau+1}^{T} \left| \frac{\hat{x}_t^j - x_t}{x_t} \right| \tag{3.7} \]

Another criterion used is the NMSE, or Theil's U-statistic, used by [103] and [104]. This criterion is computed by means of equation 3.8, which is proposed by [102].

\[ NMSE = \frac{\sum_{t=m\tau+1}^{T} (\hat{x}_t^j - x_t)^2}{\sum_{t=m\tau+1}^{T} (x_t - x_{t+1})^2} \tag{3.8} \]

The POCID, equations 3.9 and 3.10, provides the percentage of correct direction decisions, i.e. whether the value of the time series goes up or down in the next time interval.

\[ POCID = 100 \ast \frac{\sum_{t=m\tau+1}^{T} D_t}{P} \tag{3.9} \]

\[ D_t = \begin{cases} 1, & (\hat{x}_t^j - \hat{x}_{t-1}^j)(x_t - x_{t-1}) > 0 \\ 0, & \text{otherwise} \end{cases} \tag{3.10} \]

The last measure is the ARV, which relates the performance of the model to the mean of the time series and is given by equation 3.11.

\[ ARV = \frac{\sum_{t=m\tau+1}^{T} (\hat{x}_t^j - x_t)^2}{\sum_{t=m\tau+1}^{T} (\hat{x}_t^j - \bar{x})^2} \tag{3.11} \]

where \(\bar{x}\) is the mean of the real values in the training set.
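The five criteria of equations 3.6 to 3.11 can be computed together, as the following Python sketch shows. It works on two equally long lists (predicted and real samples of the training set) and, as an assumption of this sketch, restricts the sums that need a neighbouring sample (NMSE and POCID) to the pairs available inside the lists. Zero denominators are not handled, and the function name forecast_metrics is hypothetical.

```python
def forecast_metrics(predicted, actual):
    """Return MSE, MAPE, NMSE, POCID and ARV for one chromosome."""
    p = len(actual)
    squared = [(y - x) ** 2 for y, x in zip(predicted, actual)]
    mse = sum(squared) / p
    mape = sum(abs((y - x) / x) for y, x in zip(predicted, actual)) / p
    # Theil's U: error relative to the step-to-step variation of the series.
    nmse = sum(squared) / sum((actual[t] - actual[t + 1]) ** 2 for t in range(p - 1))
    # Percentage of steps where forecast and series move in the same direction.
    hits = sum(1 for t in range(1, p)
               if (predicted[t] - predicted[t - 1]) * (actual[t] - actual[t - 1]) > 0)
    pocid = 100 * hits / p
    mean_actual = sum(actual) / p
    arv = sum(squared) / sum((y - mean_actual) ** 2 for y in predicted)
    return {"MSE": mse, "MAPE": mape, "NMSE": nmse, "POCID": pocid, "ARV": arv}


metrics = forecast_metrics(predicted=[1.0, 2.0, 4.0, 3.0], actual=[1.0, 3.0, 4.0, 2.0])
print(metrics)
```

A perfect forecast gives zero for the four error measures, while the POCID approaches 100 as the number of correctly predicted direction changes grows.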

The author of [102] proposes four fitness functions using the criteria detailed above. These are

the equation 3.13, 3.14, 3.15 and 3.16.

u7QNPbb� = �R��{1 + �� + �S�� + :�� + S�� (3.13)

u7QNPbb� = 11 + �� (3.14)

u7QNPbb� = �R��{1 + �� (3.15)

u7QNPbb� = �R��{1 + :�� (3.16)

These fitness functions will be used in order to calculate the performance of each chromosome.

The best chromosome is obtained with the highest value of the fitness functions. Therefore, the

highest value will be 100 for equations (3.13, 3.15 and 3.16) and 1 for equation (3.14).

The final fitness is obtained by means of equation 3.18, in which the fitness is multiplied by an exponential whose argument depends on the preferred number of time series arguments that may appear in the chromosome.

$$\mathrm{Fitness}_{\text{final}} = \mathrm{Fitness} \cdot e^{-\left| n_{\text{pref}} - n_{\text{actual}} \right|} \tag{3.18}$$

where $n_{\text{pref}}$ is the preferred number of time series arguments and $n_{\text{actual}}$ is the actual size of the chromosome. The exponential equals 1 when a chromosome has exactly the desired number of time series arguments, which gives the maximum contribution.
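A minimal Python sketch of the four fitness variants and of the exponential length factor follows. The variant numbers 1 to 4 (for equations 3.13 to 3.16) and all function and parameter names are assumptions made for illustration; the thesis implements this in MATLAB.

```python
import math


def fitness(pocid, mse, mape, nmse, arv, variant=1):
    """Fitness variants 1-4, standing for equations 3.13-3.16 (hypothetical numbering)."""
    if variant == 1:
        return pocid / (1 + mse + mape + nmse + arv)
    if variant == 2:
        return 1 / (1 + mse)
    if variant == 3:
        return pocid / (1 + mse)
    if variant == 4:
        return pocid / (1 + nmse)
    raise ValueError("variant must be 1, 2, 3 or 4")


def length_factor(preferred, actual):
    """Exponential factor of equation 3.18: 1 when both counts match, smaller otherwise."""
    return math.exp(-abs(preferred - actual))


def final_fitness(base_fitness, preferred, actual):
    """Base fitness scaled by the length factor."""
    return base_fitness * length_factor(preferred, actual)
```

With a perfect model (all errors zero and POCID equal to 100), variant 1 returns 100 and variant 2 returns 1, which matches the maxima stated above. A chromosome that misses the preferred count by one keeps about 37% of its fitness, since the factor is exp(-1).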

As can be seen, the fitness equations proposed here differ from those of the author of [27]. This is because, as equation (4) of [27] shows, if the error (the result of equation (3) of the same author) is much higher than the variance, the result can be negative. The percentage of the total variance of the training set is taken as the fitness for the RWS, so for such a chromosome it is impossible to calculate the probability of being selected with the RWS. Authors such as [42] and [43] propose similar equations to calculate the fitness. Even though these equations have the same drawback, these authors proposed the RWS selection method.

From [27] it could be deduced that the values are normalised in equation 4 before 1 is subtracted. The problem with this normalisation is that proportionality disappears, so the RWS does not work properly. For that reason, different fitness equations based on different parameters are proposed. Nevertheless, the approach proposed by [27] (equation 3.18) to avoid many time series arguments in the chromosome is taken into account.

3.4.4 Generation of N random chromosomes

An initial generation of N random chromosomes is created. These chromosomes are generated by a function called chromosome_genr.m, which randomly selects two arguments, either time series or numerical, for the first two loci of the chromosome. The next positions are then generated randomly by selecting arguments, functions or operators until the last position, where a random argument is always selected. The function creates as many chromosomes, each with the preferred length, as indicated in the input. The input also includes the vector with the first time series, corresponding to the first sample of the training set. This process, summarised in Figure 26, always ensures that the first rule of the encoding process is met.

[Flowchart: chromosome generator chromosome_genr.m. Random generator of values, operators and functions; selection of the first and second argument for the chromosome; random selection of an argument, operator or function, repeated for (length of the chromosome - 1) positions; selection of an operator for the last position; the loop ends when the number of chromosomes equals N; output: N chromosomes.]

Figure 26: Chromosome generator function
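The generation step can be sketched as follows in Python. This is an illustration under stated assumptions, not chromosome_genr.m itself: the token sets `FUNCTIONS` and `OPERATORS`, the `x<index>` naming of time series arguments, the default bounds and all function names are hypothetical. The last locus is filled with an operator, as the flowchart labels it; the result is not guaranteed to satisfy the remaining encoding rules, which is why a repair step follows.

```python
import random

FUNCTIONS = ["exp", "log", "sin", "cos"]  # assumed function set
OPERATORS = ["+", "-", "*", "/"]          # assumed operator set


def random_argument(rng, n_series, lower, upper, p_number):
    """Pick a numerical constant or a time series argument such as 'x3'."""
    if rng.random() < p_number:
        return round(rng.uniform(lower, upper), 2)
    return f"x{rng.randint(1, n_series)}"


def random_chromosome(rng, length, n_series, lower=-10.0, upper=10.0, p_number=0.5):
    """Two leading arguments, random genes in the middle, an operator at the end."""
    genes = [random_argument(rng, n_series, lower, upper, p_number) for _ in range(2)]
    for _ in range(length - 3):
        kind = rng.choice(["argument", "function", "operator"])
        if kind == "argument":
            genes.append(random_argument(rng, n_series, lower, upper, p_number))
        elif kind == "function":
            genes.append(rng.choice(FUNCTIONS))
        else:
            genes.append(rng.choice(OPERATORS))
    genes.append(rng.choice(OPERATORS))
    return genes


def initial_population(n, length, n_series, seed=None):
    """Create n random chromosomes of the requested length (length >= 3)."""
    rng = random.Random(seed)
    return [random_chromosome(rng, length, n_series) for _ in range(n)]
```

Calling `initial_population(5, 9, 4, seed=1)` returns five lists of nine genes each; passing a seed only makes the sketch reproducible.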

Chromosome repair may be needed because each chromosome is generated randomly, so only the first constraint of the encoding process is guaranteed to be fulfilled. The second and third rules may be violated, so it is necessary to verify whether they are satisfied. If they are not, the chromosome is repaired so that it conforms to both rules.

It may seem that this process would be unnecessary if the chromosomes were generated in conformance with the pre-set rules. This would be true if there were no crossover and mutation. After these two processes, however, it is necessary to verify again that the chromosomes meet the rules and to repair them otherwise. Therefore, the repair function has two steps: it first verifies whether the three rules are met and, if they are not, it repairs the chromosome.

3.4.5 Calculate the fitness by means of the fitness function for each one of the chromosomes

Once the chromosomes have been generated and repaired, the performance of each chromosome must be calculated. Before the fitness calculation, the chromosome set needs to be generated, because the current population of chromosomes only contains arguments from the first vector of the time series (corresponding to the first training set sample).

3.4.5.1 Generation of the chromosome set

To generate the chromosome set, the complete population (N chromosomes) is taken. The chromosomes are then copied, and the time series arguments that appear in each chromosome are advanced by one position. This operation is repeated as many times as there are samples in the training set. An example of this operation is given in Table 5 for a time series $[x_j]_{j=1}^{T}$, a population of two chromosomes of length 9 and a training set of ten samples.

Training set sample | Chromosome 1 genotype | Chromosome 2 genotype
First sample | 4 x_? + 5 exp - 5 cos / | x_? x_? 5 + / log 1 exp ×
Second sample | 4 x_? + 5 exp - 5 cos / | x_? x_? 5 + / log 1 + ×
... | ... | ...
Tenth sample | 4 x_? + 5 exp - 5 cos / | x_? x_? 5 + / log 1 exp ×

Table 5: Chromosome set for a two-chromosome population and a training set of ten samples

This process is carried out in order to obtain the phenotype, i.e. the value calculated from the chromosome.
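The construction of the chromosome set can be sketched in Python as shown below. The `x<index>` naming of time series arguments and both function names are assumptions for illustration; the thesis function for this step is written in MATLAB and is not reproduced here.

```python
def shift_chromosome(chromosome, offset):
    """Advance every time series argument ('x<index>') by `offset` positions."""
    shifted = []
    for gene in chromosome:
        if isinstance(gene, str) and gene.startswith("x") and gene[1:].isdigit():
            shifted.append(f"x{int(gene[1:]) + offset}")
        else:
            shifted.append(gene)  # constants, functions and operators are unchanged
    return shifted


def chromosome_set(population, n_samples):
    """One shifted copy of the whole population per training set sample."""
    return [[shift_chromosome(c, k) for c in population] for k in range(n_samples)]
```

For a training set of ten samples, the last copy of a chromosome containing `x12` contains `x21`, i.e. the same gene advanced nine positions.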

3.4.5.2 Calculation of the chromosome phenotype

The next step is to calculate the value of the chromosome, i.e. the phenotype. For this process the function RPN.m is used, which transforms the postfix notation into infix notation. The phenotype is calculated while the expression is being converted from one notation to the other.
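A stack-based evaluator shows how a postfix chromosome can be converted to infix notation and evaluated in a single pass. This Python sketch is not RPN.m: the token sets, the `series` dictionary that maps argument names to values and the function name are assumptions, and no guard against division by zero or overflow is included.

```python
import math

BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}
UNARY = {"exp": math.exp, "log": math.log, "sin": math.sin, "cos": math.cos}


def evaluate_postfix(tokens, series):
    """Return (value, infix string) for a well-formed postfix token list."""
    stack = []  # each entry is a (numeric value, infix text) pair
    for token in tokens:
        if token in BINARY:
            b, text_b = stack.pop()
            a, text_a = stack.pop()
            stack.append((BINARY[token](a, b), f"({text_a}{token}{text_b})"))
        elif token in UNARY:
            a, text_a = stack.pop()
            stack.append((UNARY[token](a), f"{token}({text_a})"))
        elif isinstance(token, str):
            stack.append((float(series[token]), token))  # time series argument
        else:
            stack.append((float(token), str(token)))  # numerical constant
    if len(stack) != 1:
        raise ValueError("not a well-formed postfix expression")
    return stack[0]
```

For instance, `evaluate_postfix([4, "x1", "+", 5, "*"], {"x1": 1.0})` walks the tokens once and produces both the value and the infix text `((4+x1)*5)`.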

Hence, at this point we have, as illustrated in Table 6, the following:

• The chromosome set

• The phenotype of each chromosome in this set

Training set sample | Chromosome 1 genotype | Phenotype | Chromosome 2 genotype | Phenotype
First sample | 4 x_? + 5 exp - 5 cos / | 11.57 | x_? x_? 5 + / log 1 exp × | 1.83
Second sample | 4 x_? + 5 exp - 5 cos / | 16.86 | x_? x_? 5 + / log 1 ^ × | 1.70
... | ... | ... | ... | ...
Tenth sample | 4 x_? + 5 exp - 5 cos / | 24.61 | x_? x_? 5 + / log 1 exp × | 3.34

x_? = 2; x_?; x_? = 3.5; x_? = 73; x_? = 34; x_? = 5.7; x_? = 53.3; x_? = 10.6

Table 6: Genotype and phenotype of the chromosome set

3.4.5.3 Restrictions in the calculation

During the conversion from postfix to infix notation and the calculation of the expression, some restrictions apply. The main restrictions are imposed in order to avoid infinite numbers and indeterminate forms. Examples are division by zero, the logarithm of zero or of a negative number, and any operation that leads to an infinite number. Other restrictions apply when the result is excessively high, because infinite values could then be reached when the criteria are computed in the evaluation process. To prevent this, the maximum result allowed is |1e+100|.

3.4.5.4 Fitness function

To calculate the performance of each chromosome over the complete set, the function fitness.m is called. Each function (chromosome) is evaluated in each training set sample. For each sample, this evaluation replaces the time series arguments with the ones that correspond to the shifted window. This is done by means of the previously calculated chromosome set, selecting the phenotype of each chromosome that corresponds to each sample of the training set. Then, as stated in section 0, the fitness is calculated for each chromosome and multiplied by the established length factor, which gives the final fitness of each chromosome. The same function sorts the results into ascending and descending lists and saves them, keeping the original position in which the chromosomes were passed to the function. This original position later helps to identify which phenotype corresponds to which genotype in the chromosome matrix (the list of chromosomes). All the steps of the fitness calculation explained above are illustrated in Figure 27.

[Flowchart: fitness calculation fitness.m. Generation of the chromosome set (chromosome_genset.m); calculation of the phenotype (RPN.m); check whether the number of calculations equals the required number; square error calculation; performance calculation; if yes, the performance is ordered in ascending and descending lists.]

Figure 27: Fitness calculation

3.4.6 Elitism process

In order not to lose the best chromosomes during crossover and mutation, the best chromosomes of the population are kept. This process guarantees that the best chromosomes survive and are present in the next generation [105]. The process takes the K best chromosomes, where K = N × elitism_rate ∈ ℕ₀, N is the population size and the elitism rate is the fraction of elite chromosomes, which ranges from 0 to 1. The best K chromosomes are therefore taken from the top positions of the descending list created during the fitness calculation and are copied to the new generation. It is worth noting that, even though these chromosomes are copied to the next generation, they still take part in the subsequent processes of selection, crossover and mutation.

[Flowchart: fitness calculation (fitness.m); elitism, with (N × Elitism_rate) ∈ ℕ₀; insert into the next generation.]

Figure 28: Elitism process
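The elitism step amounts to copying the K fittest chromosomes. A minimal Python sketch, assuming that the product of the population size and the elitism rate is meant to be a whole number (rounding only absorbs floating-point error) and using a hypothetical function name:

```python
def elite(population, fitness_values, elitism_rate):
    """Return the K best chromosomes, with K = N x elitism_rate."""
    k = round(len(population) * elitism_rate)
    # Indices sorted by descending fitness, as in the descending list.
    order = sorted(range(len(population)), key=lambda i: fitness_values[i], reverse=True)
    return [population[i] for i in order[:k]]
```

With four chromosomes and a rate of 0.5, the two chromosomes with the highest fitness are returned in descending order; the population itself is left untouched, so the elite can still take part in selection.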

3.4.7 Mating pool

The mating pool is formed by the chromosomes that undergo changes in the processes of selection, crossover and mutation. This mating pool therefore contains the potential parents for the creation of offspring [106, p. 4]. It is created to prevent the genetic algorithm from getting stuck in a local minimum, i.e. to add more diversity, since in every generation new chromosomes are created and introduced into the current generation. The size of the mating pool for the selection of parents is set by equation 3.19.

$$\text{Mating pool} = (N - K) \times \text{mating size} \in \mathbb{N} \tag{3.19}$$

where N is the population size, K is the number of elite chromosomes and mating size is the proportion of the mating pool, which ranges from 0 to 1. A mating size of 1 means that no new chromosomes are generated in each generation. The number of new chromosomes that are generated and introduced into the new generation is determined by equation 3.20.

$$\text{New chromosomes} = N - (\text{offspring} + K) \in \mathbb{N} \tag{3.20}$$

Another approach, given by [107, pp. 49-74], to reduce the chances of local minima consists in randomly selecting less fit chromosomes. In that case, the remaining chromosomes are selected from the bottom positions of the descending list instead of being newly generated.

This process is implemented by randomly selecting chromosomes from the middle of the descending list to its end. The number of chromosomes to be selected is calculated with the same equation 3.20.

[Flowchart: elitism (K), inserted into the new generation; mating pool of size (N - K) × mating size; if Mating_size ≠ 1, decision "Generate new chromosomes?": either generate new population or select the least fit individuals; insert into the new generation.]

Figure 29: Mating pool creation
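The bookkeeping of equations 3.19 and 3.20 can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes that the crossover returns as many offspring as there are parents in the mating pool.

```python
import random


def generation_sizes(n, k, mating_size):
    """Return (mating pool size, number of new chromosomes) for one generation."""
    pool = round((n - k) * mating_size)  # equation 3.19
    offspring = pool                     # assumption: one offspring per parent
    new_chromosomes = n - (offspring + k)  # equation 3.20
    return pool, new_chromosomes


def pick_less_fit(descending, count, rng):
    """Randomly pick `count` chromosomes from the lower half of the descending list."""
    lower_half = descending[len(descending) // 2:]
    return rng.sample(lower_half, count)
```

For N = 100, K = 10 and a mating size of 0.8, the pool holds 72 parents and 18 places remain to be filled, either with new random chromosomes or with less fit chromosomes chosen by `pick_less_fit`.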

3.4.8 Selection process

The selection process, also known as reproduction, consists in randomly choosing members of the population for the mating pool, as explained in section 2.2.2.4.3. The selection methods chosen for the implementation are the roulette wheel, rank-based roulette wheel selection (RRWS) and exponential selection. The roulette wheel presented in section 2.2.2.4.3 is the initial approach followed by [27]. In order to observe the impact of the selection method, two further methods are also proposed: RRWS and exponential selection.

The RRWS method is linear, so the probability is proportional to the position that a chromosome occupies in the entire range. This method therefore loses the proportionality to fitness that roulette wheel selection has, but it gives the worst chromosomes a higher chance of being chosen. In contrast, the exponential selection method improves the proportionality by applying exponential weights. In this method, the best chromosomes are favoured while the least fit chromosomes still have a chance of being chosen. These effects and differences are analysed in this section, together with the equations used to compute the probability and the rank.

3.4.8.1 Rank-based roulette wheel selection

In rank-based roulette wheel selection, a numerical rank is assigned to each chromosome according to its fitness. The selection is therefore based on this rank and not on the fitness value. The chromosomes are first ordered according to their fitness, and the probabilities are then calculated from the assigned rank (where the worst chromosomes occupy the first positions) [37].

This method may avoid premature convergence, but at the cost of slow convergence, because the best chromosomes do not differ substantially from the others [44]. It can therefore improve diversity, because the fittest chromosomes do not dominate to the detriment of the less fit.

The probability of chromosome i is given by equation 3.21 [44]:

$$P_{\text{rank}}(i) = \frac{r_i}{\sum_{j=1}^{N} r_j} \tag{3.21}$$

where $r_i$ is the rank position of chromosome i and N is the number of chromosomes. This equation is the same as equation 22, but the rank is used in place of the fitness and of the cumulative sum of the fitness.

In linear ranking selection, the probability of being chosen can be controlled by the selective pressure [44], i.e. the pressure of competition to survive and have offspring. Following the Darwinian theory of natural selection, increasing the selective pressure SP leads over time to the selection of those individuals (chromosomes) with better fitness and to the extinction of those with lower fitness. Nevertheless, it must be taken into account that a high SP could lead to fast convergence.

Before the scaled linear ranking is applied, the chromosomes need to be sorted by their fitness, with the fittest chromosomes in the first position and the least fit in the last position of the ranking. The rank of each chromosome in linear ranking selection is calculated with equation 3.22 [44] [41].

$$\mathrm{Rank}(Pos) = 2 - SP + \left( 2 \cdot (SP - 1) \cdot \frac{Pos - 1}{n - 1} \right), \qquad 1.0 \le SP \le 2.0 \tag{3.22}$$

where n is the number of chromosomes and Pos is the position of the chromosome.

The effect of the SP on the probabilities is illustrated in Figure 30, for which 100 chromosomes are created. Fitness values from 1 to 100 are assigned to these chromosomes, respectively, so that the consequence of changing the selective pressure can be analysed clearly.

Figure 30: Effect of the SP on the probability
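The linear ranking weights and the resulting selection probabilities can be reproduced with the following Python sketch. The function names are hypothetical; the weight formula is the linear ranking expression with selective pressure SP between 1.0 and 2.0, and the normalisation divides each weight by the sum of all weights.

```python
def linear_rank_weights(n, sp):
    """Weights for positions 1..n under linear ranking with selective pressure sp."""
    if not 1.0 <= sp <= 2.0:
        raise ValueError("sp must lie between 1.0 and 2.0")
    return [2 - sp + 2 * (sp - 1) * (pos - 1) / (n - 1) for pos in range(1, n + 1)]


def selection_probabilities(weights):
    """Normalise ranks or weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]
```

With SP = 1.0 every position receives the same weight, while SP = 2.0 gives position 1 a weight of 0 and position n a weight of 2, which is the widest spread the formula allows. Plain rank-based probabilities follow from `selection_probabilities([1, 2, 3, 4])`.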

3.4.8.2 Exponential ranking wheel selection

The exponential ranking proposed by [108, p. 34] consists in using exponential weights to control the probability of being chosen. The base is the parameter that controls the exponential degree. The lowest exponential behaviour is reached as the base approaches unity. This method permits a higher selective pressure than the RWS, favouring the chromosomes that have a better fitness while still giving the least fit chromosomes a chance of being chosen.

Different authors propose several equations, such as [109], [108, p. 34] or [41, p. 10]. The one used in this implementation is equation 3.23, proposed by [108, p. 34].

$$\mathrm{Rank}(Pos) = C^{\,n - Pos}, \qquad 0 < C < 1 \tag{3.23}$$

The same scenario described previously for rank-based selection is used for the exponential ranking. The 100 chromosomes are ranked and the exponential weight is changed to observe the effect on their probabilities. This is illustrated in Figure 31.

Figure 31: Effect of C on the probabilities
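A Python sketch of exponential ranking is given below. It assumes that the weight of a position is the base C raised to the power (n - position), so that position n receives the largest weight; this reading of the exponent, and the function names, are assumptions.

```python
def exponential_rank_weights(n, c):
    """Weights c**(n - pos) for positions 1..n, with 0 < c < 1."""
    if not 0 < c < 1:
        raise ValueError("c must lie strictly between 0 and 1")
    return [c ** (n - pos) for pos in range(1, n + 1)]


def to_probabilities(weights):
    """Normalise the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]
```

A base close to 1 flattens the distribution, whereas a small base concentrates almost all the probability on the last few positions. For 100 chromosomes and C = 0.7, the weight of position 1 is of the order of 1e-16 relative to position 100.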

3.4.8.3 Example of effects on the selection method

The differences between these two methods can be analysed in Figure 32, which depicts the probabilities of being chosen for 100 numbers, from 1 to 100, under the two methods and for different selective pressures and exponential weights. As can be seen, with exponential ranking the best chromosomes obtain higher probabilities of being chosen (of surviving), whereas the probabilities of the worst chromosomes are close to zero. With rank-based selection, on the other hand, higher probabilities of surviving are given to the worst chromosomes.

Figure 32: Ranking vs Exponential

Another scenario is proposed in which 100 chromosomes are created randomly and the three methods are used: roulette wheel, rank-based and exponential ranking. Unlike the previous scenario, the chromosomes are created randomly, so the fitness and rank values do not increase proportionally. The probabilities therefore depend on the fitness or rank, and two or more chromosomes can have the same rank or fitness value. Figure 33 shows how rank-based selection assigns higher probabilities to the chromosomes with the worst fitness or rank. In contrast, with the roulette wheel or exponential ranking the probabilities of those chromosomes are zero or close to zero. Moreover, with the ranking selection method there are no large differences in probability between the worst and the best chromosomes. For example, the probabilities of chromosome 1 and chromosome 93 are 0.18 % and 3.57 %, respectively. Between exponential ranking (with C = 0.7) and the roulette wheel, on the other hand, there are larger differences for the best chromosomes, such as 45, 47 and 93, which have higher probabilities because these are proportional to their fitness value. In addition, chromosomes such as 50, 76 and 96 obtain higher probabilities with exponential ranking than with the roulette wheel, because the probabilities are better distributed among the best chromosomes.

Figure 33: Three selection method example
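Whatever method produces the probabilities (fitness-proportional, rank-based or exponential), the wheel itself is spun in the same way. The following Python sketch shows one common way to do this with a cumulative sum; the function names are hypothetical and the thesis implementation may differ.

```python
import random


def spin(probabilities, rng):
    """Return the index selected by one spin of the roulette wheel."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return index
    return len(probabilities) - 1  # guards against rounding in the cumulative sum


def select(probabilities, count, rng):
    """Spin the wheel `count` times; the same index may be chosen more than once."""
    return [spin(probabilities, rng) for _ in range(count)]
```

Because chromosomes are drawn with replacement, a chromosome with a large probability can appear several times in the mating pool.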

3.4.9 Crossover operator

The main operator working on the parents is the crossover operator, which creates two offspring by combining the genotypes of both parents with a certain crossover probability. As explained in section 2.2.2.4.4.1, a locus is randomly selected and the genes before and after that point are interchanged to create two offspring.

The mating pool may contain an odd number of parents, in which case one parent would be left without a partner for the crossover. Hence, when the number is odd, the most frequent parent is found, removed from the mating pool and introduced directly as an offspring. Thus, even though this chromosome does not pass through the crossover operator, it is still present in the mating pool for the mutation operator.
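One-point crossover on two parents of equal length can be sketched in Python as follows. The function name and the convention of returning unchanged copies when no crossover takes place are assumptions.

```python
import random


def one_point_crossover(parent_a, parent_b, pc, rng):
    """Swap the tails of two parents at a random locus with probability pc."""
    if len(parent_a) < 2 or rng.random() >= pc:
        return list(parent_a), list(parent_b)  # no crossover: copy the parents
    point = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the chromosome
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b
```

Since the cut point lies strictly inside the chromosome, each child always inherits at least one gene from each parent when the crossover is applied.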

3.4.10 Mutation operator

The mutation operator makes random changes to the genes of the offspring, adding more diversity to the search process and therefore allowing other solutions to be found. This process is performed by interchanging the values of two genes placed at different loci: two loci of the chromosome are randomly selected, ensuring that they are different, and their genes are interchanged.

The mutation is applied with a certain probability, called the mutation probability, which is normally low (values of 0.01 [27] or 0.1 [42]) because otherwise the search would become a random process. This is also demonstrated in [110, p. 21].
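The swap mutation described above can be sketched in Python as follows; the function name is hypothetical and the probability is applied once per chromosome, which is an assumption.

```python
import random


def swap_mutation(chromosome, pm, rng):
    """With probability pm, interchange the genes at two distinct random loci."""
    mutated = list(chromosome)
    if len(mutated) >= 2 and rng.random() < pm:
        i, j = rng.sample(range(len(mutated)), 2)  # two different loci
        mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated
```

The mutated chromosome always contains the same genes as the original, only in a different order, so it may still need to be checked against the encoding rules afterwards.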

3.4.11 New population

In this process, the offspring created by crossover and mutation are put together with the elite chromosomes to form the new generation. Moreover, if the mating pool size is not 1, either new chromosomes are generated and inserted into this new generation or the reserved chromosomes are inserted, as explained in section 3.4.7. Finally, the fitness of this new population must be calculated. If the GA is still running (the stop criterion has not been met), the previous processes are repeated.

[Diagram: offspring, elitism chromosomes and new chromosomes are combined into the new population.]

Figure 34: New population

3.4.12 Stopping criteria and error evaluation

Several criteria are used to measure the error during the training set and after the prediction. Some of these criteria are the MAPE (mean absolute percentage error), MAE (mean absolute error) and MSE. The last two were explained in previous sections and are now used for the prediction instead of the training set evaluation. These indicators follow the criteria proposed by several authors such as [111], [112] and [113].

$$\mathrm{MAE} = \frac{1}{P} \sum_{j=T+1}^{N} \left| \hat{x}_j - x_j \right| \tag{3.24}$$

where N is the prediction horizon, $\hat{x}_j$ is the sample predicted with the best chromosome, $x_j$ is the original sample, T is the last sample of the training set and P is the number of samples predicted.

$$\mathrm{MSE} = \frac{1}{P} \sum_{j=T+1}^{N} (\hat{x}_j - x_j)^2 \tag{3.25}$$

$$\mathrm{MAPE} = \frac{100\,\%}{P} \sum_{j=T+1}^{N} \left| \frac{\hat{x}_j - x_j}{x_j} \right| \tag{3.26}$$

The MAPE threshold, along with the maximum number of generations, is taken as the criterion to stop the algorithm. These two values are set before the algorithm is executed.

3.4.13 Prediction

Once the GA finishes because the stop criterion has been reached, the best chromosome of the last generation is taken as the best function. The prediction is made by shifting the window one sample at a time until the last sample of the prediction horizon has been predicted. Before the first sample is predicted, the phenotype and genotype of the best chromosome are available for each sample of the training set. In order to make the prediction, it is sometimes necessary to shift and create the variables $[x_j]_{j=T+1}^{T+P-1}$ so that the chromosome can be evaluated. Creating these variables may be necessary if the best chromosome has time series arguments in its function. This is because the first predicted sample may depend on the last sample of the training set. It is therefore necessary to calculate the phenotype of the first sample, once it has been predicted, in order to be able to predict the second, and so forth until the next-to-last sample.
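The recursive use of predicted samples can be illustrated with a short Python sketch. Here `model` stands for the function encoded by the best chromosome and receives the most recent `window` values; both names, and the idea of passing the window as a list, are assumptions for illustration.

```python
def forecast(history, horizon, window, model):
    """Predict `horizon` samples, feeding each prediction back into the window."""
    values = list(history)
    predictions = []
    for _ in range(horizon):
        next_value = model(values[-window:])  # evaluate on the shifted window
        predictions.append(next_value)
        values.append(next_value)  # the prediction becomes an input for later steps
    return predictions
```

With `model = lambda w: w[-1] + 1` and a history of `[1, 2, 3]`, a horizon of three yields `[4, 5, 6]`: the second and third predictions rely on values that were themselves predicted.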

3.4.14 Interface and parameters

The interface is the document in which the main parameters and functions of the genetic algorithm are set and can easily be changed later. The main parameters that can be changed are listed in Table 7.

Explanation | Variable
Embedding dimension | embed_dimen
Delay time | delay_time
Number of chromosomes | N
Length of the training set | training_set
Number of generations | generation
Tolerance of the error measured from MAPE | quality_error
Elitism percentage | Elitism
Mating pool size | mating_pool
Mutation probability | pm
Crossover probability | pc
Number of preferred time series in the chromosome | lengthZ
Whether the preference is wanted or not (set 1 to activate and 0 otherwise) | pref
Generation of new chromosomes in each iteration | div
Length of the chromosomes | Long_eq
Selective pressure for ranking selection | SP
Selective pressure for exponential selection | x_fac
Maximum number for the numerical argument (Z) | upperbound
Minimum number for the numerical argument (-Z) | lowerbound
Probability of selecting a number in the chromosome generator, instead of a time series argument |
Scale factor that is used for the input data | factor_input
Maximum of the GA data that is going to be used (scaled data) | max_value

Table 7: Main parameters

While the algorithm is running, different parameters and graphs with the most important data are displayed to the user. This is worth mentioning because some of these parameters and graphs, with the corresponding results, are shown in the results and conclusions. Table 8 illustrates the main set of graphs, which is presented to the user as a single figure. Table 8 (a) shows the function of the best chromosome in the current generation. That chromosome is evaluated for all the samples in the training set and is plotted together with the original samples that it tries to predict. The X-axis therefore represents the corresponding samples of the time series and the Y-axis the scaled value of the time series. Table 8 (b) presents the SSE of the best chromosome in each generation. Table 8 (c) shows the sum of the fitness of the population in each generation, which makes it possible to observe whether the population is improving. Finally, Table 8 (d) shows the fitness value of the best chromosome in each generation.

(a) (b)

(c) (d)

Table 8: Set of graphs for the user

In addition, some important parameters are displayed in MATLAB in each generation: the current generation, the SSE value, the fitness value, the genotype of the best chromosome for the first training set sample and its expression in infix notation. Some statistical parameters are also shown, namely the MSE, MAE and MAPE. These statistical parameters are calculated over the training set. Therefore, in order to differentiate them from the final ones, they are presented as MSE_bp (MSE before prediction), MAE_bp (MAE before prediction) and MAPE_bp (MAPE before prediction).

Generation 50
best_value_fit = 57,09119
best_value_SSE = 12.026125
The best chromosome is
[9.878634814689685e+02] [3.998763627742979e+02] '*' 'exp' 'sin' 'A15' '-'
MSE_bp =0.697213
MAE_bp =0.665095
MAPE_bp =5.09633 %
Expression = '(sin((987.8635*399.8764))-A15)'

When the algorithm finishes and the best chromosome has been selected for the prediction, a zoom of the training set result for this chromosome is shown along with the original samples.

Figure 35: Best chromosome for the prediction

Finally, the prediction is performed and its results (samples) are shown along with the training set, as can be seen in Table 9 (a). Table 9 (b) shows the expression and the statistical parameters of this prediction (MSE, MAE and MAPE). As in Table 8 (a), the X-axis of Table 9 (a) represents the corresponding samples of the time series, but the Y-axis in this case corresponds to the real value of the time series. It is noteworthy that the statistical results are from the prediction samples along with the training set.

The statistical parameters shown at the end of the prediction correspond only to the prediction, excluding the training set. This makes it possible to compare the statistical parameters of the training set and of the prediction afterwards.

Since the MSE and MAE are not relative parameters and therefore depend on the scale being used, the prediction is performed in the same scale as the training set.

Finally, the prediction can optionally be displayed in the real scale, as shown in Table 9 (c).

(a) [graph of the prediction along with the training set]
(b) MSE =2.113402
MAE =1.378817
MAPE =6.173257 %
Expression = '(sin((987.8635*399.8764))-A15)'

Table 9: Prediction and statistical results

3.5 Summary

In order to study the TCP available bandwidth, four different scenarios with different situations, with and without PUs, will be tested. To do so, NS2 is configured with 802.11g parameters and the MAC implementation is modified to include the PU spectrum management, so that packets are dropped if a PU is using the medium. The study does not use an ad hoc routing agent. Traces of real traffic can be played back in NS2. The scenarios are implemented in a Tcl script.

The GA is implemented in MATLAB 2013. Different fitness functions are implemented in order to test them and find the best one for the data to be analysed. Moreover, two diversity methods are applied to add diversity and prevent the genetic algorithm from getting stuck in a local minimum.

Chapter 4 Evaluation

In this chapter, we evaluate our approaches using different scenarios. First, we perform several simulations to evaluate the impact of different Primary User patterns on TCP throughput. Then, we evaluate how well our genetic algorithm can predict irregular time series such as TCP throughput samples.

4.1 Introduction

This chapter presents the most relevant results of the tests carried out with the implementation built during the thesis. During the evaluation we carried out a large number of tests in different scenarios. First, the chapter presents the results and analysis of the available TCP bandwidth and MAC busy time for different ON-OFF PU activity patterns. Next, the forecasting performance of the genetic algorithm is evaluated. Finally, we analyse how well the genetic algorithm can predict the available TCP bandwidth for different ON-OFF PU activity patterns, also using real traffic traces.


4.2 Evaluation of available TCP bandwidth for different ON/OFF PU activity patterns

This section analyses the available throughput at the TCP layer for a given PU activity pattern. In some cases the results are compared with the UDP performance in the same situations. The relation between TCP throughput and MAC layer busy time is also evaluated. Three different scenarios have been used, which are detailed in section 3.2. The results presented in this section are divided into three main groups:

• TCP available bandwidth with fixed non-random patterns of PU activity

• TCP available bandwidth with random generated patterns of PU activity

• TCP available bandwidth using real traffic traces for Secondary Users

4.2.1 Deterministic patterns

In this set of tests, fixed patterns of PU activity are used to study the response of the TCP available throughput to different PU activity times. To do so, the multiplication by the exponential random factor in the generation of the PU activity pattern has been disabled. Although this test case rarely resembles reality, it is a good basis for testing the proper functioning of the simulation and for drawing interesting conclusions. These conclusions clarify several situations that may be very difficult to detect with randomly generated patterns. The available bandwidth shown here is the average value over the whole simulation time. For all these tests, each simulation run has a length of 100 seconds.

4.2.1.1 50 percent fixed ON-OFF rate

In this scenario, we assume that PUs are active 50% of the simulation time. We vary the time

that PUs are active by setting alpha equal to 2 times beta. As a result, the shape of the pattern is

periodic.


Figure 36: Available bandwidth for 50% PU ON

The available throughput for different alpha/beta values is shown in Figure 36. As can be seen, if the retransmission time after a PU ON period coincides with an OFF period, the available throughput is higher. For example, for beta from 0.7 to 1.4, the available bandwidth rises from zero and returns to zero, with the maximum reached at beta = 1.3 (around 9 Mbit/s). This 9 Mbit/s is close to 50% of the maximum achievable TCP throughput, which is 20 Mbit/s. This is documented in APPENDIX B with several figures that may help to understand the situation.

The reason why the throughput is zero at some points is that, since the ON-OFF pattern is periodic, once a retransmission coincides with a PU ON period, every later retransmission also coincides with PU ON activity (even though the retransmission time doubles with every retransmission). The resulting throughput is therefore zero, as no packets are received. In this case, the throughput is 0 whenever beta = i × 0.7, with i = 2, 4, …, because the sender never has the chance to transmit.

In contrast, UDP has a very similar available bandwidth in all the tests (all alpha/beta combinations), although its throughput increases marginally with beta.


4.2.1.2 25 percent fixed ON rate

In this scenario, combinations where PU traffic is ON for 25% of the time are studied, resulting in

alpha equal to 4 times beta.

Figure 37: Available bandwidth for 25% PU ON

The available bandwidth for 25% of PU ON activity is depicted in Figure 37. The figure shows a shape similar to the one in the previous section. The maximum available bandwidth for TCP is close to 15 Mbit/s, that is, 75% of the UDP throughput can be achieved, where the maximum possible is 20 Mbit/s.

4.2.1.3 Fixed alpha and different beta values

This section shows how the available TCP bandwidth evolves as beta changes. In this case, alpha is fixed and beta varies. There are two cases, one with very small alpha and beta values (see Figure 38) and another with high values (see Figure 39).


Figure 38: Available bandwidth for alpha equal to 0.0768 and different beta values

Figure 39: Available bandwidth for alpha equal to 2.8 and different beta values

Figure 38 and Figure 39 show that TCP has very low throughput compared to UDP when the PU OFF periods are very short. This is because the TCP Congestion Window takes time to grow, whereas UDP always sends at the maximum possible rate if the medium is idle. Additionally, as can be seen in Figure 39, the retransmission time has an important effect on the throughput, making the available TCP bandwidth zero for ON-OFF patterns with less than 50% of OFF time. The reasons for this zero available bandwidth are the same as in 4.2.1.1. To sum up, idle time in the medium is not always usable by TCP.


4.2.1.4 Wide range of alpha and beta values

In order to have a broad view of the behaviour of the available TCP throughput, a series of tests with a very wide range of alpha and beta values has been carried out. Varying alpha and beta from 0.1 to 5 with a step size of 0.1 gives a total of 2500 combinations, and each combination requires a simulation. Each simulation has a duration of 100 seconds.
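
The size of this test campaign can be reproduced with a short sketch. The helper `on_fraction` is hypothetical: it assumes that the PU is ON for beta out of every alpha seconds, which matches the 50% and 25% cases above, and it assumes that the medium is permanently busy when alpha is not greater than beta.

```python
def on_fraction(alpha, beta):
    """Fraction of time the PU is ON, assuming ON lasts beta out of every alpha seconds."""
    return min(1.0, beta / alpha)


# Values from 0.1 to 5.0 in steps of 0.1 for both parameters.
values = [round(0.1 * k, 1) for k in range(1, 51)]
grid = [(alpha, beta) for alpha in values for beta in values]

print(len(values), len(grid))   # values per axis and number of combinations
print(len(grid) * 100)          # simulated seconds at 100 s per combination
print(on_fraction(2.0, 1.0), on_fraction(4.0, 1.0))
```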

4.2.1.4.1 Available bandwidth

The maximum achieved throughput for TCP is 20 Mbit/s, while UDP achieves 28.5 Mbit/s. This maximum available bandwidth is achieved when beta is very small and alpha is very large, meaning an almost idle channel. These values can be observed in Figure 40.

Figure 40: 3D graphs with the available bandwidth for different alpha/beta combinations and non-random patterns. Panels: (a) TCP; (b) UDP; (c) TCP and UDP; (d) TCP and UDP from another perspective.

The graphs in Figure 40 show that the TCP throughput is always lower than the UDP throughput, as expected. Moreover, the TCP response does not follow a smooth shape as the UDP response does. For combinations where alpha is close to beta, either TCP is not able to send or the TCP Congestion Window is not able to grow to a large value even if there are idle periods. This makes the available bandwidth in this area very small or zero. Also, the available bandwidth changes abruptly across the alpha-beta plane. This is related to the retransmission time and the period of the pattern, for the same reasons as explained in 4.2.1.1.

4.2.1.4.2 MAC busy time

In Figure 41, we show the MAC busy time for different PU patterns, comparing UDP and TCP. Graphs (c) and (d) in Figure 41 are other perspectives of the same data. A MAC busy time of zero corresponds to an always idle medium and 1 to 100% busy time. The highest MAC busy time (100% usage) is reached when only PUs are using the medium. In the zone where the medium is shared between the PUs and SUs (that is, when alpha is greater than beta), the busy time clearly never reaches the maximum. This is a consequence of the back-off time between transmissions and the RTO. The more time is available for TCP, the more packets are sent and, consequently, the more back-off periods occur that are not added to the MAC busy time.

For the same reasons as explained in 4.2.1.1, some idle periods remain unused throughout the simulation. This explains the triangular shape in the graph. The next section compares the available bandwidth with the MAC busy time, which helps to understand this section.

Figure 41: 3D graph with the MAC busy time for different alpha/beta combinations and non-random patterns. Panels: (a) MAC layer busy time for TCP; (b) MAC layer busy time for UDP; (c) TCP, other perspective; (d) UDP, other perspective; (e) TCP and UDP.

4.2.1.4.3 Available bandwidth vs MAC busy time

In this section, we compare the available throughput with the MAC busy time in order to find a relation between them. The graphs in Figure 42 show the available throughput, normalised by the maximum achieved, together with the MAC busy time. In both cases, zero means no utilisation and 1 means 100% utilisation.

Figure 42: 3D graph with the MAC busy time and normalized throughput for different alpha/beta combinations and non-random patterns. Panels: (a) TCP; (b) UDP.

Comparing the MAC busy time with the utilisation of the bandwidth, there are some alpha-beta combinations in which a high percentage of the idle time remains unused. In Figure 42 (a), we can observe that, for high values of alpha and beta between three and two, the throughput remains constant as beta decreases, while the busy time decreases as beta decreases towards two. Here TCP cannot utilise the time available at the MAC layer. The RTO contributes to this unused time because of its interaction with the periodic PU activity pattern, as already explained in 4.2.1.1.

4.2.2 Randomly generated patterns

This set of tests is carried out by adding randomness to the generation of the PU activity patterns and changing the seed in every generated PU activity period. The available bandwidth shown in this section is the average over the whole simulation time. For all these tests, each simulation has a length of 100 seconds.

4.2.2.1 50 percent fixed ON-OFF rate

In this set of scenarios, we use alpha equal to two times beta, resulting in 50% of PU active time. These tests are performed only once. As we will see, in this scenario the random generation of PU activity has a big impact. Comparing the results of non-random and randomly generated PU activity patterns in Figure 36 and Figure 43, in the random case there is no clear relation between specific alpha/beta values and the available bandwidth. This is due to the effect of randomness: the resulting available bandwidth is very random for both UDP and TCP because the activity patterns are random. Also, at a microscopic scale, the ON-OFF pattern may appear to show no relation at all between alpha/beta values and TCP throughput. This effect will be smaller if the simulation time is longer.

In TCP, some combinations of alpha and beta result in an available bandwidth equal to zero. This is due to random combinations that make every TCP retransmission attempt coincide with PU ON activity. The TCP RTO is calculated using, among other parameters, the RTO back-off multiplicative factor. This factor doubles with every failed retransmission attempt and will grow very large if, just by chance, several consecutive retransmissions fail during the simulation because they coincide with PU ON periods. The randomness of the PU ON activity is the main factor that makes the available bandwidth very unpredictable.
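
The growth of the back-off factor can be sketched as follows. This is only an illustration: the base timeout of 0.2 s and the cap of 64 on the multiplicative factor are assumptions made here, not values taken from the simulations, and `backed_off_rto` is a hypothetical helper.

```python
def backed_off_rto(base_rto, failures, max_backoff=64):
    """Back-off factor and resulting RTO after `failures` consecutive failed retransmissions.

    The factor doubles with every failure; `max_backoff` is an assumed
    implementation limit on the factor.
    """
    factor = min(2 ** failures, max_backoff)
    return factor, base_rto * factor


for failures in range(8):
    factor, rto = backed_off_rto(base_rto=0.2, failures=failures)
    print(failures, factor, round(rto, 1))
```

With these assumed values, a handful of unlucky retransmissions is enough for the waiting time to exceed the length of several PU OFF periods, which is consistent with the idle time that TCP leaves unused.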

Figure 43: Available bandwidth for 50% ON and random generation

There is no clear relation between alpha/beta values and the available throughput, but the trend shows that the available bandwidth increases as the alpha/beta values do. This is because larger alpha/beta values mean longer PU OFF periods, which allow the TCP Congestion Window to grow to larger values. Therefore, when TCP finally has the chance to transmit, it transmits for longer, increasing its throughput. In conclusion, an a priori estimation of the available bandwidth based only on alpha and beta values will not be reliable.


4.2.2.2 25 percent fixed ON

In this scenario, we set alpha equal to four times beta, resulting in 25% of PU active time. These tests are done only once. The same comments as for the 50% case are valid here. Figure 44 shows that alpha equal to four times beta results in higher throughput on average compared to alpha equal to two times beta.

Figure 44: Available bandwidth for 25% ON and random generation

4.2.2.3 Fixed alpha and different beta values

In these tests, the alpha value is fixed to 2.8 and the beta value changes from 0.1 to 2.5. Each test is run only once. As can be seen from Figure 45, the available bandwidth decreases as beta increases. Comparing the results of non-random versus random generation of PU activity patterns in Figure 39 and Figure 45, with random generation there are no specific patterns that lead to zero TCP throughput. For low values of beta, the available bandwidth decreases even faster than beta increases, up to a PU ON ratio of 40%. Beyond a PU ON ratio of around 40%, the decrease is slower. The reasons are explained in Section 4.2.2.4.1.


Figure 45: Available bandwidth for alpha equal to 2.8 and different beta values with random generation

Regarding the available bandwidth, for a 50% PU ON pattern, only 5 Mbit/s is achieved. Compared with the non-random case in Figure 39 at the same point, the difference is noticeable, as the available bandwidth was zero in that case. In this case, there is no zero throughput. This is because the PU ON activity is random and, therefore, there are no cases where the retransmissions always coincide with PU ON activity periods.

4.2.2.4 Wide range of alpha and beta values

In order to have a wide view of the behaviour of the TCP available throughput, a series of tests with a very wide range of alpha and beta values has been carried out. These tests are simulated using values of alpha and beta from 0.1 to 5 with a step of 0.1, random generation of the patterns, and 100 seconds of simulation length. This gives a total of 2500 different combinations; each of them requires five tests, giving a total of 12500 tests. The presented results are the averages of these five tests for each combination, which makes the results more robust.
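
The averaging over repetitions can be sketched as follows. The five throughput values are invented for illustration only, and the use of the population standard deviation is an assumption; the thesis may use the sample standard deviation instead.

```python
from statistics import mean, pstdev


def summarise(runs):
    """Average and (population) standard deviation of the throughput over repeated runs."""
    return mean(runs), pstdev(runs)


# Hypothetical throughputs (bit/s) of five repetitions of one alpha/beta pair.
runs = [4.8e6, 5.3e6, 5.1e6, 4.6e6, 5.2e6]
avg, std = summarise(runs)

combinations, repetitions = 50 * 50, 5
print(avg, std)
print(combinations * repetitions)   # total number of simulations
```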

4.2.2.4.1 Available bandwidth

The results of available TCP bandwidth are presented in a 3D graph in Figure 46. The

maximum achieved TCP throughput is close to 20 Mbit/s. This maximum available bandwidth is

achieved when beta is very small and alpha is very big.


In this case, the fundamental effect of the random generation of the patterns is clear when comparing with the non-random case in Figure 40(a). The periodic zero throughput values that appear in the non-random graph are not present in the random graph. This is because the distribution of the idle periods is now random and, therefore, the retransmissions do not always coincide with PU ON activity (at least not always in the same alpha-beta combinations, as in the non-random case). It is noteworthy that, at a microscopic level, alpha equal to two times beta does not mean a 50% busy time because of randomness. This also differs between repetitions.

Figure 46: 3D graphs with the available bandwidth for different alpha/beta combinations and random patterns

An additional study comparing the results with UDP for different values, including a study of the standard deviation of the results, is presented in APPENDIX C. Regarding the standard deviation, two areas can be clearly distinguished:

• First, there is an area with a high standard deviation where both alpha and beta are large. This is because randomness plays a very important role in these patterns, since the pattern periods are significant compared to the simulation length.

• In the remaining area, the large difference between alpha and beta makes the standard deviation very close to zero, as the results of the tests are very similar.

In Figure 46, the area with high standard deviation is the region with alpha and beta greater than

4.2.2.4.2 MAC busy time

Figure 47 shows the MAC busy time for the same tests as in the previous section. Zero corresponds to a busy time of zero and 1 to 100% busy time. The highest MAC busy time (100% usage) is achieved when only PUs are using the medium.

Figure 47: 3D graph with the MAC busy time for different alpha/beta combinations and random patterns

In the zone where the medium is shared between the PUs and SUs, that is, when alpha is greater than beta, the busy time is never 1. The busy time tends to decrease as more idle time is available for the secondary user. In this case, in contrast to the non-random one, it is not clear whether the lower MAC busy time is a consequence of the back-off time between transmissions, because randomness has a very important impact here.

4.2.2.4.3 Available bandwidth vs MAC busy time

The graph in Figure 48 shows the available throughput, normalised by the maximum achieved, together with the MAC busy time. In both cases, zero means no utilisation and one means 100% utilisation.

Figure 48: 3D graph with the MAC busy time and normalized throughput for different alpha/beta combinations and random patterns

Comparing the MAC busy time with the utilisation of the bandwidth, there are some alpha-beta combinations in which a significant portion of the idle time remains unused. In this case, there is no triangular shape as in the non-random case. This is due to the randomness in the generation of the PU activity patterns. As the TCP available bandwidth grows, the MAC busy time decreases, but not in the same proportion. In conclusion, a relation between the MAC busy time and the available TCP bandwidth exists, and a model to estimate the available bandwidth based on the MAC busy time may be developed.

4.2.2.5 Comparison between TCP and UDP throughput

The transition between areas of high throughput is more abrupt in TCP because the TCP Congestion Window is reset to 1 MSS with every timeout. In UDP, the sending rate is constant at the maximum whenever the medium is idle. This means that TCP needs long PU OFF periods in order to achieve high throughput, while UDP can use all types of PU OFF periods in a similar manner. This dependency on the Congestion Window, and how the TCP available bandwidth is affected by long PU ON periods, is analysed in 4.2.3. Long PU ON periods result in a large RTO value, which implies some periods where neither a PU is active nor TCP is sending data; this effect is not present in UDP. The difference is easy to see by looking at how the available bandwidth and PU activity evolve in Figure 49 and Figure 50.

Figure 49: alpha=1.5 and beta=0.5

Figure 50: alpha=1.5 and beta=0.5

Regarding the standard deviation, the values for UDP are lower than for TCP. This is also caused by the TCP response after timeouts. If there are some long OFF periods within the simulation time, TCP will achieve much higher throughput, while UDP is not affected by this phenomenon.

4.2.3 Available TCP throughput over time, Congestion window, RTO multiplicative factor and Smoothed RTT analysis

In Figure 49 we show the TCP throughput and PU activity over time for a single simulation run using alpha equal to 1.5 and beta equal to 0.5. This test shows how the different parameters behave when PU activity leads to packet loss on the secondary users’ side. As can be seen, the throughput goes to zero when PU activity is detected. The absence of PU activity does not imply that the throughput reaches the maximum as soon as the PU OFF period starts. This is because TCP is still backing off, which prevents it from sending although there is no PU activity (e.g. between seconds 8 and 8.5). Taking into account what is explained in section 2.2.1.2.2, large values of beta, as in this case, mean long PU ON periods and therefore several consecutive timeouts. As a consequence, the RTO multiplicative factor grows to a large value, which makes the connection wait longer and longer between retransmissions (large RTO). This does not have a significant effect when the PU ON periods are short, as can be observed in APPENDIX C b for alpha 1.5 and beta 0.1.

The graphs from which this conclusion has been drawn are in APPENDIX C b. The Congestion Window grows to a high value during long PU OFF periods and only to a relatively small value during short PU OFF periods. This is the reason why the throughput is higher when there are long PU OFF periods.
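
A rough feeling for this dependency can be obtained with an idealised slow-start sketch. It assumes that the window restarts at one segment after a timeout, doubles every round trip without losses, and is capped at a hypothetical maximum of 64 segments; the round-trip time and the OFF durations are also illustrative assumptions, not simulation values.

```python
def segments_sent_in_off_period(off_ms, rtt_ms, max_cwnd=64):
    """Segments an idealised slow-start sender delivers during one PU OFF period.

    Assumptions: the window restarts at 1 segment after a timeout, doubles
    every RTT (no losses, no congestion avoidance) and is capped at
    `max_cwnd` segments.
    """
    rounds = off_ms // rtt_ms          # complete round trips that fit
    cwnd, sent = 1, 0
    for _ in range(rounds):
        sent += cwnd                   # one window of segments per round trip
        cwnd = min(2 * cwnd, max_cwnd)
    return sent


short = segments_sent_in_off_period(off_ms=50, rtt_ms=10)    # 5 round trips
long_ = segments_sent_in_off_period(off_ms=500, rtt_ms=10)   # 50 round trips
print(short, long_)
```

Under these assumptions, an OFF period ten times longer carries far more than ten times the data, because the short period ends while the window is still small.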

4.3 Evaluation of available TCP bandwidth with real traffic Secondary Users

In this section, the TCP available throughput in the presence of other Secondary Users in the medium is analysed. We first started a download using a web browser and recorded the traffic trace of this download with Wireshark. After adapting the trace, this file is played back in NS2 for different scenarios. This is explained in detail in 3.3.5. The throughput over time of the traffic used is depicted in Figure 51.

Figure 51: Throughput over time of the real traffic played back

4.3.1 Sender inside the sensing area

This test uses Scenario 2, described in 3.2.2, where the senders can sense each other.

Average available bandwidth                          9.81 Mbit/s
Average throughput of the real traffic played back   9.52 Mbit/s
Average total throughput in the medium at node 1     19.33 Mbit/s
Total packets sent by node 0                         36169
Number of collisions at node 1                       5719
% collisions at node 1                               15.81%

Table 10: Test results for Scenario 2

In the graph inside Table 10, the blue line is the TCP available bandwidth and the red line is the

throughput of the real traffic played back. In this case, the available bandwidth is shared and

therefore the real traffic played back is modified in order to use only 50% of the whole bandwidth

leaving the rest for the other nodes that are also using the medium.

4.3.2 Sender outside the sensing area

This test uses Scenario 3, described in 3.2.3, where the secondary user senders cannot sense each other, which leads to collisions.

Average available bandwidth                          0.64 Mbit/s
Average throughput of the real traffic played back   11.65 Mbit/s
Average total throughput in the medium at node 1     12.29 Mbit/s
Total packets sent by node 0                         2734
Number of collisions at node 1                       1721
% collisions at node 1                               62.94%

Table 11: Test results for Scenario 3

As expected, this case leads to a large number of MAC layer collisions and, therefore, packet losses that result in a very low bandwidth. The number of packets sent is very low because, when a collision occurs, the MAC layer protocol backs off the transmission for a period. In addition, the TCP congestion window is reset to 1 MSS when collisions lead to timeout events. In the graph contained in Table 11, we show the throughput over time for both the traffic played back (in red) and the competing TCP traffic (in blue). The traffic played back is not affected. However, the throughput of the competing traffic is very low due to collisions.

This scenario is usually called “the hidden node problem”. To help solve it, RTS/CTS handshaking was introduced together with the CSMA/CA scheme. However, RTS/CTS is not a complete solution since, in situations without hidden nodes, it may decrease the throughput. In this thesis RTS/CTS is disabled.

In this scenario (hidden node), the available bandwidth is similar to the case of the WLAN with PU activity, as almost all the packets are lost due to interference (collisions in this case) while the interferer SU is transmitting. As a result, the study of the impact of PU activity on SUs may also be applied, since the interferer node behaves similarly to a PU. However, the situation is not exactly the same, as the node that is being interfered with by the hidden node may sense the ongoing transmission and vice versa. Therefore, each node adjusts its sending pattern due to the CSMA carrier sensing.

4.4 Genetic algorithm evaluation and tests

In this section, the GA is applied to different scenarios in order to evaluate the selection methods, fitness functions and diversity methods. The number of tests that could be performed is extremely high, due to all the different GA parameters. Several studies that analyse the impact of these parameters (population number, crossover probability, etc.) on the GA performance can be found in the literature [114], [115].

In this thesis, the values of the parameters for the GA are taken from [27], [42] and [43]. A deeper analysis of their impact on the performance of the GA is outside the scope of this thesis. However, in some specific situations, we may change the settings in order to adapt to the time series proposed.

The effects of the fitness method, the diversity selection and the selection method are analysed in

several scenarios. First, a periodic and symmetric function (i.e. a cosine function) will be used in

order to test and verify the correct operation of the GA. Then, the available TCP throughput

obtained from NS2 simulation will be used. First, a non-random pattern will be considered in order

to observe the effects for a traffic that is periodical but not symmetrical. Then, the randomness in

the PU activity will be taken into account in order to analyse the performance of the prediction in

a more realistic situation. Finally, the performance of the GA will be tested considering the

available TCP throughput from real traffic.

Therefore, each proposed scenario is broken down into three parts: the evaluation of the fitness function, the evaluation of the selection method and diversity, and an example of the prediction. Note that the mean absolute percentage error (MAPE) has been used for the evaluation. This is because it is a relative measure and does not depend on the scale applied to the input data (the input factor). Therefore, it is easier to make a final comparison and evaluation across all the scenarios presented with this measure.
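
The scale independence of the MAPE can be checked with a small sketch. It uses the standard textbook definitions of MAPE and MSE, which are assumed to match the ones used by the GA; the sample values and the helper names are invented for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def mse(actual, predicted):
    """Mean squared error, which does depend on the scale of the data."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def scale(values, factor):
    """Apply a multiplicative input factor to a series."""
    return [factor * v for v in values]


actual = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 40.0]

print(mape(actual, predicted), mape(scale(actual, 1000.0), scale(predicted, 1000.0)))
print(mse(actual, predicted), mse(scale(actual, 1000.0), scale(predicted, 1000.0)))
```

The two MAPE values coincide, while the MSE grows with the square of the scaling factor. This is why a relative measure is convenient when the series are scaled by different input factors.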

Table 12 relates the numbering of the fitness equations in this thesis to the variable names used in the GA, which are the names shown in this section.

Numeration in the thesis    GA variables
Equation 3.13               Eq1
Equation 3.14               Eq2
Equation 3.15               Eq3
Equation 3.16               Eq4

Table 12: Nomenclature of the fitness equations

In addition, Table 13 relates the name of the selection method with the variable used in the GA.

Selection method                       GA variables
Roulette wheel selection               real_fit
Ranking roulette wheel selection       ranking
Exponential ranking wheel selection    exponent

Table 13: Nomenclature of the selection method
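
For orientation, the three selection schemes can be sketched in their generic textbook form. The functions below are not the implementation used in this thesis: they assume that a higher fitness is better, that fitness values are positive, and that the exponential ranking uses a hypothetical base `c` of 0.5. All function names are illustrative.

```python
import random


def roulette_probs(fitness):
    """Roulette wheel: selection probability proportional to the raw fitness."""
    total = sum(fitness)
    return [f / total for f in fitness]


def linear_rank_probs(fitness):
    """Ranking roulette wheel: probability proportional to the rank (worst = 1)."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    ranks = [0] * len(fitness)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = sum(ranks)
    return [r / total for r in ranks]


def exponential_rank_probs(fitness, c=0.5):
    """Exponential ranking: weight c**(N - rank), so the best individual gets weight 1."""
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])
    weights = [0.0] * n
    for rank, i in enumerate(order, start=1):
        weights[i] = c ** (n - rank)
    total = sum(weights)
    return [w / total for w in weights]


def select(population, probs, k, rng=random):
    """Draw k individuals (with replacement) according to the given probabilities."""
    return rng.choices(population, weights=probs, k=k)


fitness = [1.0, 3.0, 6.0]
print(roulette_probs(fitness))
print(linear_rank_probs(fitness))
print(exponential_rank_probs(fitness))
print(select(["c0", "c1", "c2"], roulette_probs(fitness), k=4, rng=random.Random(0)))
```

The rank-based variants depend only on the ordering of the individuals, so a single chromosome with a very large fitness cannot dominate the wheel as it can with the raw fitness.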

An overview of all the results (fitness functions and diversity methods) can be found in

APPENDIX E.

4.4.1 Periodic and symmetric function evaluation

In this scenario, a cosine function is used to evaluate the different methods proposed for the fitness function, diversity and selection. This function is chosen because of its periodicity and its symmetry about the Y-axis, which make it interesting for observing the GA behaviour with the different methods proposed.

4.4.1.1 Evaluation of the fitness function

In this section, the different fitness functions implemented are evaluated in order to find the one that gives the best results for a cosine function.

It is worth noting that these fitness functions are difficult to compare because of the random processes involved in the search for the best solution. As explained before, the first population is generated randomly. Then, during the reproduction process, the chromosomes are selected, crossed over and mutated with a certain probability.

Therefore, two scenarios are proposed:

• In one scenario, the same initial population is used for the tests in order to reduce the

randomness in the initial population. In addition, no new chromosomes are generated

during the GA process.

• In the second scenario, random initial population is used with diversity (generation of

new chromosomes), as the default configuration of the GA.

4.4.1.1.1 Evaluation of the fitness function for same initial population

Even if the initial population is fixed, it is impossible to avoid randomness during the reproduction process. Therefore, the test has been run 100 times to obtain an average of all the results.

First, the results of these 100 tests are presented. Then, an example of one of these 100 tests is shown in section 4.4.1.3. The four fitness equations shown in section 3.4.5.4 are evaluated. In this scenario, the diversity is set to 0 in order to reduce the randomness introduced by the new chromosomes (in each generation). In addition, the selection method used for this scenario is the RWS (real_fit). The GA set-up for this scenario is presented in Table 14.

Parameters:
embed_dimen = 28; delay_time = 1; Long_eq = 7; training_set = 25; generation = 30; pn = 0.75;
upperbound = 1000; lowerbound = -10; Elitism = 0.1; mating_pool = 0.8; pm = 0.05; pc = 0.7;
max_value = 100; factor_input = 100.068; quality_error = 0.05; lengthZ = 1; pref = 1

Table 14: GA set-up for the cosine function without diversity and same initial population

Figure 55 shows the function to forecast. The X-axis of this figure represents the samples

corresponding to the time series obtained from the cosine function, and the Y-axis represents the

values of this cosine function with the factor input applied.

The results obtained from these tests are shown in Figure 52. In this figure, the MAPE (Mean

Absolute Percentage Error) and MAPE_bp (Mean Absolute Percentage Error before prediction)

are displayed.

In this scenario, the best results are obtained with equation 2 and equation 3. However, the gap between the MAPE and the MAPE_bp is larger for equation 2, whereas it is smaller for equation 1 and equation 3.

Figure 52: Fitness selection evaluation for the same initial population with periodic and symmetric function


4.4.1.1.2 Evaluation of the fitness function for random initial population

In this scenario, a random initial population is generated at the beginning of the GA. This scenario is proposed in order to observe the behaviour of the GA in the default mode, with diversity (set to 1) and a random initial population. The four fitness equations are evaluated for this scenario. The GA set-up for this scenario is presented in Table 14.

The results obtained from these tests are shown in Figure 53. As can be observed, equation 3 results

in the lowest MAPE and MAPE_bp.

Figure 53: Fitness selection evaluation for random initial population with a periodic and symmetric function

It can be observed that equation 3 is the best equation in both scenarios, whereas equation 4 is the worst. The poor results of equation 4 might be due to its dependency on Theil’s U-statistic instead of the MSE.

4.4.1.2 Evaluation of the selection method and diversity for the cosine function

The selection methods as well as the effects of diversity are evaluated for the cosine function. For this study, 100 repetitions are performed and the mean of all these repetitions is presented. In this scenario, the same initial population as in section 4.4.1.1.1 is used for the fitness evaluation. The selection methods tested are the RWS (real_fit), the exponential ranking wheel selection (exponent) and the ranking roulette wheel selection (ranking). In addition, the diversity is set to 0 in order to reduce the randomness introduced by the new chromosomes (in each generation).

For these tests, the parameters used are listed in Table 14, while Figure 54 displays the results for the selected fitness equation 1 (Equation 3.13). Figure 54 shows the MAPE and the MAPE_bp, together with the number of loops (generations) needed to converge, in order to identify the fastest selection method.

As expected, the MAPE is higher after the prediction, and therefore better results are obtained during the training set (MAPE_bp). Moreover, the number of generations needed to converge is reduced when diversity is used, with the exception of the exponential method (even though this difference is small). This may be because the chromosomes introduced by the diversity were not feasible, so the GA required more generations to converge.

Figure 54: Results for the selection method with and without diversity for a periodic and symmetric function

Differences of up to 80% in the MAPE can be observed for the ranking selection when comparing the results with and without diversity. High differences can also be discerned for the real fit selection and the exponential selection when diversity is applied.

In conclusion, diversity reduces the MAPE and the MAPE_bp because of the introduction of new chromosomes, which may help to improve the result by finding a better chromosome. Moreover, the number of loops needed to converge is in general reduced by introducing new chromosomes as diversity in each generation. However, this method may sometimes lead to an increase in the number of generations needed to converge. This happens when the new chromosomes introduced are not feasible (worse than the current population).

In general, the best results (MAPE and MAPE_bp) are obtained with the real fit and exponential selection.
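
The idea of diversity can be sketched as the injection of freshly generated chromosomes in each generation. The replacement policy below (newcomers replace the worst individuals), the convention that a higher fitness is better and the string chromosomes are assumptions made for illustration; the GA of this thesis may proceed differently.

```python
import random


def inject_diversity(population, fitness, n_new, new_chromosome, rng):
    """Replace the `n_new` worst individuals with freshly generated ones.

    Assumptions: higher fitness is better, and newcomers replace the worst
    individuals of the current population.
    """
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    survivors = [population[i] for i in order[: len(population) - n_new]]
    newcomers = [new_chromosome(rng) for _ in range(n_new)]
    return survivors + newcomers


rng = random.Random(1)
population = ["c0", "c1", "c2", "c3"]
fitness = [0.9, 0.1, 0.5, 0.7]
next_population = inject_diversity(
    population, fitness, n_new=1,
    new_chromosome=lambda r: "new%d" % r.randint(0, 99), rng=rng)
print(next_population)
```

A newcomer is not guaranteed to be better than the individual it replaces, which is consistent with the observation that diversity occasionally increases the number of generations needed to converge.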

4.4.1.3 Example of prediction with symmetric and periodical function

In this section, an example of prediction for the periodic and symmetric function is explained. Figure 55 shows the cosine function in order to identify the instant of time that corresponds to each sample.

Figure 55: Cosine function

In the next example, presented in Table 15, the best solution in 30 generations is found in the first reproduction process (generation 2). The set-up used for the GA in this example with a cosine function is also shown in Table 15.

embed_dimen = 28; delay_time = 1; Long_eq = 7; Selection_m = 'real_fit'; training_set = 25; generation = 30; fitness_select = 'Eq1';
pn = 0.75; upperbound = 1000; lowerbound = -10; Elitism = 0.1; mating_pool = 0.8; pm = 0.05;
pc = 0.7; div = 0; lengthZ = 1; pref = 1; max_value = 100; factor_input = 100.068;

MSE_bp = 0.559079
MAE_bp = 0.666854
MAPE_bp = 4.824 %
Expression = ((629.1349-629.1349)-A15)
SSE_bp = 13.976979
Fitness = 62.10981
MSE = 1.530442
MAE = 1.072374
MAPE = 5.0982 %

Table 15: Example 1 for a cosine function

Table 15 shows how the best solution is found in the second generation, which will be the final

solution for the prediction.
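
Two of the values reported in Table 15 can be cross-checked with simple arithmetic. The sketch assumes that MSE_bp is the SSE_bp divided by the 25 samples of the training set, which agrees with the reported figures, and that A15 denotes one of the lagged input samples (its exact indexing is defined by the GA and is not reproduced here). The input value 42.0 is arbitrary.

```python
# Values reported in Table 15 for the training part ("_bp" = before prediction).
sse_bp = 13.976979
training_set = 25

# Assumption: the MSE is the SSE divided by the number of training samples.
mse_bp = sse_bp / training_set
print(round(mse_bp, 6))


def best_expression(a15):
    """The reported expression ((629.1349-629.1349)-A15) as a function of A15."""
    return (629.1349 - 629.1349) - a15


print(best_expression(42.0))
```

The constant terms cancel, so the expression reduces to the negative of A15.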

4.4.2 Non-random traffic with ON-OFF pattern evaluation

In this scenario, the available TCP throughput for a non-random ON-OFF traffic pattern of PU activity, obtained from the NS2 simulation, is evaluated with the GA.

The different methods proposed for the fitness function, diversity and selection are evaluated. This scenario is interesting for observing the effects described before for a non-random ON-OFF pattern with 50% of activity.

The values of this non-random traffic are alpha = 2.2 and beta = 1.1. The available throughput to forecast for this ON-OFF pattern is illustrated in Figure 56, where each sample corresponds to 0.1 s.

Figure 56: Throughput from non-random pattern with alpha 2.2 and beta 1.1

In this section, the different fitness functions implemented are evaluated in order to find the one giving the best results for the non-random traffic with the ON-OFF pattern.

As in the previous analysis with the cosine function, two scenarios are proposed: one with the same initial population and without diversity (no generation of new chromosomes), and the other with a random initial population and diversity.

The results for both tests, same initial population and random initial population, are depicted in Figure 57 and Figure 60 respectively.

4.4.2.1.1 Evaluation of the fitness function for non-random and same initial population

The fitness equations are evaluated for the proposed scenario with the same initial population and without diversity. In this scenario, the diversity is set to 0 in order to reduce the randomness introduced by the new chromosomes (in each generation). In addition, the selection method used for the fitness equation evaluation is the RWS (real_fit).

The tests are run 50 times using the parameters listed in Table 16.

Fixed parameters: embed_dimen = 95; delay_time = 1; Long_eq = 7; training_set = 50; generation = 100; pn = 0.75; mating_pool = 0.8; pm = 0.05; pc = 0.7; lengthZ = 1; pref = 1; max_value = 10; factor_input = 4.789e-7; quality_error = 0.01; lowerbound = -10; Elitism = 0.1; upperbound = 100;

Table 16: Parameters GA for non-random using same initial population

The results obtained in this scenario are depicted in Figure 57. The best result in terms of MAPE is obtained with equation 1, whereas the worst results are obtained with equation 2.

Figure 57: Fitness selection evaluation for the same initial population with Non-random ON-OFF pattern

It may seem surprising that equations 1 and 2 can differ by up to 239 % when the GA is started with the same set-up and the same initial population. An explanation can be found by analysing the best and the worst cases of equation 2, which are summarized in Table 17.

The worst MAPE results raise the average, which explains the differences obtained between equation 1 and equation 2.

Same initial population

Fitness equation Best MAPE result Worst MAPE result

Eq2 1.2754 % 1752 %

Table 17: Best and worst result for equation 2 with the same initial population

Figure 58 illustrates an example of the worst and best result for equation 2. In this graph, the

training set and the prediction area for both chromosomes, as well as the original, can be observed.

Figure 58: Best and worst prediction equation 2 for the same initial population

The worst prediction obtained with equation 2 can be compared with another chromosome that has a higher MSE but a lower MAPE. Because of its higher MSE, this second chromosome receives a lower fitness when the fitness is calculated with equation 2. This example is shown in Table 18 and graphically depicted in Figure 59. The complete set of tests for equation 2 can be found in APPENDIX D.

We may infer that, when the MSE is the only criterion, the selected chromosome can be one with a higher MAPE but a lower MSE. This is because the errors are squared, so those smaller than one become even smaller.

As shown in the example in Figure 59, in the area where the original function is close to zero, the chromosome with the worst prediction is close to 1.4. When this difference is squared, it becomes close to two. On the other hand, for the other chromosome depicted (with MAPE equal to 27.92), the prediction is close to 12 in the area where the original function is close to 9. When this difference is squared, it increases up to 9. Finally, once the squared errors are accumulated into the MSE, the second chromosome has a higher MSE even though it has a lower MAPE. This results in choosing the chromosome with the worst prediction instead of the other.

Therefore, even when a chromosome has a lower MSE, it can have a higher MAPE.
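The effect can be reproduced with a small numerical sketch. The sample values below are hypothetical and were chosen only to mirror the situation described for Figure 59 (an error of about 1.4 where the original is close to zero, against an error of about 3 where the original is close to 9). Python is used purely for illustration; it is not the language of the thesis implementation, and the function names are assumptions.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical ON-OFF-like series: close to zero, then close to 9.
actual = [0.1, 9.0, 0.1, 9.0]

# Chromosome A: wrong by 1.4 where the series is close to zero.
pred_a = [1.5, 9.0, 1.5, 9.0]
# Chromosome B: wrong by 3 where the series is close to 9.
pred_b = [0.1, 12.0, 0.1, 12.0]

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, "MSE =", round(mse(actual, pred), 3),
          "MAPE =", round(mape(actual, pred), 2), "%")
```

Under these assumed values, chromosome A has the lower MSE but by far the higher MAPE, so a fitness function driven only by the MSE would prefer it.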

              Before the prediction                       After the prediction
              fitness   MAE_bp    MSE_bp   MAPE_bp        MAE       MSE      MAPE
Prediction 1  0.27641   0.96686   2.6178   27.912 %       0.82423   2.1023   27.443 %
Prediction 2  0.35623   1.3072    1.8072   1650 %         1.2841    1.7253   1752 %

Table 18: Worst prediction and another chromosome for tests with equation 2

Figure 59: Example of Equation 2 comparing the MSE

Equations 2 and 3, which depend greatly on the MSE, arguably share this drawback. It is most pronounced in equation 2, which relies heavily on the MSE. The main advantage of equation 1 is that it combines several criteria rather than relying only on the MSE.

4.4.2.1.2 Evaluation of the fitness function for non-random and random initial population

For the second scenario, the genetic algorithm is run 100 times for each fitness equation. Then, the results are averaged and illustrated in Figure 60. This scenario is proposed in order to observe the behaviour of the GA in the default mode with diversity (set to 1) and a random initial population. The four fitness equations are evaluated for this scenario. The GA set-up for this scenario is presented in Table 16.

It can be inferred from the results in Figure 60 that equation 1 is again the best fitness equation, giving the best results in terms of MAPE and MAPE_bp. It is noteworthy that in this scenario the worst results, in terms of the average MAPE and MAPE_bp, are obtained with equation 3. This may be because the diversity and the randomness of the initial population lead to less favourable results. Another reason might be that a number of generations of up to 50 is not enough for the GA to converge, because more generations are needed.

Figure 60: Fitness selection evaluation for random initial population with non-random ON-OFF pattern

Despite the worse average results obtained in the second scenario, the best result obtained with each equation in both scenarios can be found in Table 19. The best results are obtained with the random initial population, as in the other scenarios.

Table 19 compares the random initial population and the same initial population, showing the best and the worst results in terms of MAPE for each equation.

                    Same initial population          Random initial population
Fitness equation    Best MAPE (%)   Worst MAPE (%)   Best MAPE (%)   Worst MAPE (%)
Eq1                 7.6404          27.443           0.82204         2771.1
Eq2                 1.2754          1752             0.76718         3836.5
Eq3                 0.97366         1403             0.76724         5760.2
Eq4                 5.7929          1212.9           0.76747         4678.6

Table 19: Best and worst results from random and same initial population

Another interesting observation is obtained by presenting the same results as before (MAPE and MAPE_bp) but suppressing the runs with a MAPE higher than 100 %, as shown in Figure 61. This figure shows that equation 2 gives better results despite having a higher percentage of errors. This means that a higher number of GA repetitions is needed to find a better solution. Another advantage of equation 2, as can be seen in this figure, is the lower number of generations needed to find a good solution (lower than the 5% quality error fixed in this scenario).
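The filtering behind this view can be sketched as follows. This is a minimal illustration, assuming that a run counts as an error when its MAPE exceeds 100 % and that the average is taken over the remaining runs only; the function name and the sample values are hypothetical and are not taken from the thesis results.

```python
def summarise_runs(mape_values, error_threshold=100.0):
    """Average the MAPE over the runs at or below the threshold and
    report the percentage of runs above it (classified as errors)."""
    valid = [m for m in mape_values if m <= error_threshold]
    n_errors = len(mape_values) - len(valid)
    error_pct = 100.0 * n_errors / len(mape_values)
    mean_valid = sum(valid) / len(valid) if valid else float("nan")
    return mean_valid, error_pct


# Hypothetical MAPE values (%) from four GA repetitions; one run diverged.
runs = [0.8, 1.2, 3836.5, 2.0]
mean_valid, error_pct = summarise_runs(runs)
print("mean MAPE without errors:", round(mean_valid, 3), "%")
print("percentage of errors:", error_pct, "%")
```

A single diverging run dominates a plain average, which is why the filtered mean and the error percentage are more informative when reported together.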

Figure 61: Random initial population for non-random alpha 2.2 and beta 1.1 with MAPE less than 100%

4.4.2.2 Evaluation of the selection method for the non-random traffic

The selection methods as well as the diversity effects are evaluated for the non-random ON-OFF traffic.

For this study, 50 repetitions are performed and the mean of all these repetitions is shown in Figure 62. In this scenario, the same initial population as in section 4.4.2.1.1 is used for the fitness evaluation. The selection methods tested are the RWS (real_fit), exponential ranking wheel selection (exponent) and the ranking roulette wheel selection (ranking). In addition, the diversity is set to 0 in order to reduce the randomness introduced by the new chromosomes (in each generation).

Few differences exist between the three selection methods used with diversity. However, the worst results are found with the ranking selection method without diversity. This might be because the ranking method assigns more similar probabilities to all the chromosomes, which increases the number of generations needed to converge to a better solution. None of the 50 repetitions converges to the solution after 100 generations; that is, a solution with a MAPE less than 5% was not found in this case.
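The difference in selection pressure can be illustrated with a short sketch. It assumes that a higher fitness is better, that roulette wheel selection uses probabilities proportional to the raw fitness, and that ranking selection uses the simplest linear ranking (probability proportional to rank). The exact ranking formula of the thesis is not given in this section, so both functions and the fitness values are illustrative assumptions.

```python
def roulette_probabilities(fitness):
    """Fitness-proportionate (roulette wheel) selection probabilities."""
    total = sum(fitness)
    return [f / total for f in fitness]


def linear_ranking_probabilities(fitness):
    """Selection probabilities proportional to rank (1 = worst, N = best)."""
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])  # worst first
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    total = n * (n + 1) / 2
    return [r / total for r in ranks]


# Hypothetical population with one clearly dominant chromosome.
fitness = [60.0, 5.0, 3.0, 2.0]
p_rws = roulette_probabilities(fitness)
p_rank = linear_ranking_probabilities(fitness)
print("roulette:", [round(p, 3) for p in p_rws])
print("ranking: ", [round(p, 3) for p in p_rank])
```

With these assumed values the best chromosome is drawn far less often under ranking than under the roulette wheel, which is consistent with the slower convergence observed for the ranking method.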

Figure 62: Results for the selection method with and without diversity for non-random ON-OFF pattern

4.4.2.3 Example of one of the best predictions

Table 20 shows a prediction of the available throughput for non-random traffic with an ON-OFF pattern.

Table 20 (a) shows the training set along with the prediction. The GA statistics from the last generation are given in (b); they are computed on the data scaled to the range [0-10]. The prediction is shown in (c) along with its statistics in (d). These statistics are also computed on the scaled data, and a MAPE lower than 1% can be observed.
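The role of the scaling can be sketched as follows. The sketch assumes that the scaling is a plain multiplication by a factor equal to max_value divided by the maximum of the series, which is only a guess at how factor_input and max_value are related; the throughput and forecast values are hypothetical. Under that assumption the MAPE is unchanged by the scaling, whereas the MSE is multiplied by the square of the factor, so MSE values are only comparable between data sets scaled to the same range.

```python
def scale_series(series, max_value=10.0):
    """Scale a series multiplicatively so that its maximum equals max_value."""
    factor = max_value / max(series)
    return [x * factor for x in series], factor


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical throughput samples and forecasts, in bit/s.
throughput = [20.0e6, 10.0e6, 5.0e6, 20.0e6]
forecast = [19.0e6, 11.0e6, 5.5e6, 18.0e6]

scaled_throughput, factor = scale_series(throughput)
scaled_forecast = [x * factor for x in forecast]

print("factor:", factor)
print("MAPE raw / scaled:", mape(throughput, forecast),
      mape(scaled_throughput, scaled_forecast))
print("MSE raw / scaled:", mse(throughput, forecast),
      mse(scaled_throughput, scaled_forecast))
```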

Training set (a)

Training set statistics (b): MSE_bp = 0.0517; MAE_bp = 0.1067; MAPE_bp = 1.1521%; SSE_bp = 2.5861; Fitness = 28.5608

Prediction (c)

Prediction statistics (d): MSE = 0.0260; MAE = 0.0722; MAPE = 0.7671%; SSE = 1.2995; Expression = (cos(sin((49.5949*16.9764)))*A3

Table 20: Best prediction example for a non-random traffic with ON-OFF pattern

4.4.3 Random traffic with ON-OFF pattern evaluation

In this scenario, the available TCP throughput for a random ON-OFF traffic pattern of PU activity

obtained from the NS2 simulation is evaluated.

In this scenario, the effects of applying different fitness equations, diversity and selection methods to random traffic with an ON-OFF pattern are analysed. For this, α = 2.2 and β = 0.08 are selected. These data are obtained from the NS2 simulator and used in the GA in order to forecast the next samples (instants of time). The interest of this scenario lies in the fact that the source is random, and therefore it is a chance to test the ability of the GA to adapt to a chaotic pattern.

Figure 63 shows the available throughput for the specified alpha and beta, where each sample is

taken every 0.1 s.

Figure 63: Throughput from a random pattern with alpha 2.2 and beta 0.08

In this section, the different fitness functions are evaluated in order to find the one that gives

the best results.

As it was carried out with the cosine function and the non-random ON-OFF pattern, two scenarios

are proposed: one with the same initial population without diversity (no generation of new

chromosomes) and the other one with a random initial population with diversity. The results of

both scenarios are depicted in Figure 65 and Figure 67, respectively.

4.4.3.1.1 Evaluation of the fitness function for random pattern and same initial population

This section analyses the impact of different fitness functions in the proposed scenario, considering the same initial population. In this scenario, the diversity is set to 0 in order to reduce the randomness introduced by the new chromosomes (in each generation). In addition, the selection method used for this scenario is the RWS (real_fit).

The tests are run 50 times using the parameters listed in Table 21.

mating_pool = 0.8; pm=0.05; pc=0.7; upperbound= 100; lowerbound = -10; Elitism =0.1;

lengthZ=1; pref=1; max_value =10; factor_input= 4.6598e-7;

Table 21: GA set-up for the random traffic alpha 2.2 and beta 0.08 with the same initial population

The average results for this scenario are illustrated in Figure 65 and Figure 66.

If we compare Figure 65 and Figure 67, we notice that the MAPE is better than the MAPE_bp.

Figure 64 shows a drop in the throughput corresponding to sample 57 in the training set. This drop

makes it more difficult to find a chromosome that fits the function. Nevertheless, during the

prediction, the throughput does not drop suddenly. Thus, the MAPE for the prediction is better

than the MAPE_bp for the training set.

Figure 64: Training set and prediction zone for random traffic with alpha 2.2 and beta 0.08

Figure 64 is shown in order to be able to understand the results depicted. The MAPE, MAPE_bp,

along with the number of MAPE values higher than the 100% (classified as errors) is illustrated in

Figure 66.

In Figure 65 it can be observed that equation 2, 3 and 4 have a lower MSE_bp, even though the

MAPE_bp is higher in the same equations (2, 3 and 4).

Figure 65: Fitness selection evaluation for the same initial population with Random ON-OFF pattern

Figure 66 shows the MAPE, the MAPE_bp and the percentage of errors for each equation. In this case, the magnitude of MAPE_bp is plotted on the secondary axis so that the other data can be observed more clearly. The best results are obtained with equation 1, which has not only the lowest MAPE but also the lowest percentage of errors. In contrast, equation 2 has the highest percentage of errors and the highest MAPE.

Figure 66: Fitness selection evaluation for the same initial population with Random ON-OFF pattern (with errors criterion)

4.4.3.1.2 Evaluation of the fitness function for random pattern and random initial population

For the second scenario, the GA is run 50 times but with a random initial population. This scenario is proposed in order to observe the behaviour of the GA in the default mode with diversity (set to 1) and a random initial population. The four fitness equations are evaluated for this scenario. The GA set-up for this scenario is presented in Table 21.

The results of these tests can be observed in Figure 67, where the differences between equation 3

and equation 1 are minimal. However, in this scenario better results are reached with the equation

Figure 67: Fitness selection evaluation for random initial population with random ON-OFF pattern

Until this point, it can be stated that equations 1 and 3 give the best results for the performed

predictions in the scenarios presented. This may be due to the chaotic behaviour, for which it is

more difficult to have a better combination in all the criteria used in equation 1, rather than in

equation 3.

4.4.3.2 Evaluation of the selection method for the random ON-OFF pattern traffic

The selection methods as well as the diversity effects are evaluated for the random ON-OFF pattern traffic. In this scenario, the same initial population as in section 4.4.3.1.1 is used. The selection methods tested are the RWS (real_fit), exponential ranking wheel selection (exponent) and the ranking roulette wheel selection (ranking). In addition, the diversity is set to 0 in order to reduce the randomness introduced by the new chromosomes (in each generation).

The tests are run 50 times using the parameters exposed in Table 21 and the average results are

shown in Figure 68.

The best results are obtained with equation 1 with diversity as well as without diversity. However, the worst results are found with the ranking selection method without diversity, possibly because the ranking method assigns similar probabilities to all the chromosomes, which increases the number of generations needed to converge to a better solution. None of the 50 repetitions converges to the solution after 100 generations; that is, a solution with a MAPE less than 5% was not found in this case.

Figure 68: Results for the selection method with and without diversity for random ON-OFF pattern

Table 22 shows a prediction of the available throughput for random traffic with an ON-OFF pattern.

In Table 22 (a), the training set along with the prediction is shown. The GA statistics from the last generation are given in (b); they are computed on the data scaled to the range [0-10]. The prediction is shown in (c) and its statistics in (d), which are also computed on the scaled data.

MSE_bp = 0.9635

MAE_bp = 0.5635

MAPE_bp = 106.4868%

SSE_bp = 38.5384

Fitness = 36.5736

MSE = 0.1401

MAE = 0.2919

MAPE = 3.1997%

SSE = 5.6032

Expression = ((A49-sin((47.3837+17.0299)))+sin(76.9

Table 22: Best prediction example for a random traffic with ON-OFF pattern

4.4.4 Real traffic from NS2 evaluation

In this section, the throughput obtained from real traffic acquired at the library of Karlstad University and simulated in NS2 is considered. Different tests and evaluations are performed so as to analyse the ability of the GA in this scenario. Again, different fitness functions, diversity and selection methods are considered so as to study their effects on the prediction.

Figure 69 shows the available throughput to be predicted under these real traffic conditions. Each sample of the figure corresponds to 0.1 s.

Figure 69: Throughput from the real traffic simulated in NS2

In this section, the different fitness functions are evaluated in order to find the one giving the best results for the real traffic simulated in NS2.

As shown in previous sections, two scenarios are proposed: the first one with the same initial

population without diversity and the second one with a random initial population with diversity.

The results from these scenarios are illustrated in Figure 71 and Figure 72.

4.4.4.1.1 Evaluation of the fitness function for real traffic and same initial population

For the first scenario, the GA is run 50 times using the parameters listed in Table 23. In this scenario, the diversity is set to 0 in order to reduce the randomness introduced by the new chromosomes (in each generation). In addition, the selection method used for this scenario is the RWS (real_fit).

mating_pool = 0.8; pm = 0.05; pc = 0.7; lengthZ = 1; pref = 1; max_value = 10; factor_input = 5.0468e-007; quality_error = 0.05; upperbound = 100; lowerbound = -10; Elitism = 0.1;

Table 23: GA set-up for the real traffic scenario with the same initial population

The average MAPE, MAPE_bp, MSE and MSE_bp are shown in Figure 71. The MSE and

MSE_bp are shown in order to observe the correlation of these parameters with the MAPE and

MAPE_bp.

The training set and the prediction area selected for this scenario (using the scaled values) are shown in Figure 70.

Figure 70: Training set area and prediction area for real traffic with scaled values

As can be observed, all equations provide good results, with the exception of equation 1, which has a higher MAPE. There is some relation between the MSE and the MAPE for those equations that have the MSE as a criterion (equations 1, 2 and 3). For example, equation 3 has a lower MSE_bp and a lower MAPE. On the other hand, equation 1 has a higher MSE_bp and a higher MAPE. This might be explained by the lower standard deviation of the function to forecast.

Figure 71: Fitness selection evaluation for the same initial population with real traffic

4.4.4.1.2 Evaluation of the fitness function for real traffic and random initial population

For this second scenario, the genetic algorithm is run 50 times. This scenario is proposed in order to observe the behaviour of the GA in the default mode with diversity (set to 1) and a random initial population. The four fitness equations are evaluated for this scenario. The GA set-up for this scenario is presented in Table 23.

The averages of these results are depicted in Figure 72. In this second scenario, the proposed equations differ only slightly: there is a difference of only around 2% between the best and the worst result.

Figure 72: Fitness selection evaluation for random initial population with real traffic

Although equation 3 is the best equation on average in the first scenario (same initial population) and equation 2 is the best on average in this scenario, Table 24 shows that the best single result in terms of MAPE in this scenario is given by equation 3.

                    Same initial population          Random initial population
Fitness equation    Best MAPE (%)   Worst MAPE (%)   Best MAPE (%)   Worst MAPE (%)
Eq1                 25.134          38.016           14.162          60.174
Eq2                 29.935          29.935           17.427          36.057
Eq3                 16.829          29.935           13.463          40.059
Eq4                 16.799          29.935           17.154          41.486

Table 24: Best and worst results from random and same initial population with real traffic

4.4.4.2 Evaluation of the selection method and diversity for the real traffic

The selection methods as well as the diversity effect are evaluated for the real traffic. For this study, 50 repetitions are performed. In this scenario, the same initial population as in section 4.4.4.1.1 is used. The selection methods tested are the RWS (real_fit), exponential ranking wheel selection (exponent) and the ranking roulette wheel selection (ranking). In addition, the diversity is set to 0 in order to reduce the randomness introduced by the new chromosomes (in each generation).

For these repetitions, the parameters used are listed in Table 23 and the results are shown in Figure 73.

Few differences exist between the three selection methods, with or without diversity. Nevertheless, better results are obtained with real fit and ranking (with diversity and without diversity). Even so, the difference between the worst result and the best in both cases is less than 1.2 %. The number of generations was the same for all of them, because they did not converge during the 100 generations. The reason is that the MAPE_bp did not drop below 5%.

Figure 73: Results for the selection method with and without diversity for real traffic

Table 25 shows a prediction of the available throughput for real traffic played back in NS2.

In Table 25 (a), the training set along with the prediction is shown. The GA statistics from the last generation are given in (b); they are computed on the data scaled to the range [0-10]. The prediction is shown in (c) along with its statistics in (d), which are also computed on the scaled data.

MSE_bp = 0.9724

MAE_bp = 0.8760

MAPE_bp = 22.2574%

SSE_bp = 38.8954

Fitness = 24.4131

MSE = 1.0444

MAE = 0.7890

MAPE = 13.5050%

SSE = 41.7770

Expression = (0.94391*(exp(sin((70.3626/log(92.4674))))+A110))

Table 25: Best prediction example for a real traffic played back in NS2
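To make the structure of such a chromosome more concrete, the expression reported in Table 25, 0.94391*(exp(sin(70.3626/log(92.4674)))+A110), can be evaluated directly. The sketch below assumes that A110 denotes one entry of the embedded input vector (a past scaled sample) and that log is the natural logarithm, as it is in MATLAB; neither assumption is confirmed by this section, and the function name is hypothetical.

```python
import math


def table25_expression(a110):
    """Evaluate the Table 25 expression for a given input sample A110.

    Assumes log is the natural logarithm and A110 is a scaled past sample.
    """
    constant = math.exp(math.sin(70.3626 / math.log(92.4674)))
    return 0.94391 * (constant + a110)


# The trigonometric part contains no input variable, so it collapses to a
# constant and the chromosome is an affine function of A110.
offset = table25_expression(0.0)
slope = table25_expression(1.0) - table25_expression(0.0)
print("offset:", offset, "slope:", slope)
```

Under these assumptions the evolved predictor is simply a past sample multiplied by 0.94391 plus a fixed offset.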

4.4.5 Limitation

In this section, the tests that could not be performed due to a limitation of the GA are explained, along with the reason for the limitation.

This limitation appears when there are long OFF periods, during which the available throughput is zero. Figure 74 shows, as an example, a random activity pattern for an alpha of 2.6 and a beta of 1.3.

In such cases, a high number of samples has to be selected for the training set so that it does not consist only of an OFF period. If the training set covers only an OFF period (when this period is long), the GA will predict an OFF period. The downside of increasing the training set is the increase in computational time, because more samples have to be evaluated. In addition, when the training set is enlarged, the randomness in the ON-OFF pattern makes it more difficult to find a function that fits this large training set.

Figure 74: Random activity pattern for an alpha 2.6 and beta 1.3

4.5 Summary

A large number of available bandwidth results for different combinations of PU activity patterns have been obtained. The available bandwidth tends to grow slightly with alpha and beta, and the retransmission time has a fundamental impact on the available bandwidth. Randomness in the PU activity makes predictions very unreliable. Regarding the MAC busy time, there seems to be a relation between the TCP available bandwidth and the MAC busy time. In a scenario with the “hidden node problem”, the interferer node harms the available bandwidth.

The implementation of the GA is successful and its correct functioning has been proved. However, a limitation in the GA for the proposed scenarios is encountered. Regarding the available bandwidth for ON-OFF PU activity patterns tested with the GA, reliable outcomes have been obtained even though the behaviour of the available bandwidth is chaotic.

In the scenarios proposed, introducing new chromosomes in each generation as a diversity method improves the GA in the search for a better solution (reducing the MAPE of the best solution). A variety of results was obtained for the selection method. Using the average of the tests performed, in almost all the scenarios the roulette wheel (real fit) gives better results. With equation 1, reliable results were obtained in most scenarios, even though they were not always the lowest. This is because equation 1 is composed of different criteria, which makes its results more robust.

Chapter 5 Conclusions

5.1 Final conclusions

This chapter presents the most relevant conclusions about the project carried out. This thesis

looks into the TCP available bandwidth in different scenarios and compares the results with UDP

performance and MAC busy time in the same situations. These situations include the presence of

PU activity in the wireless medium with fixed deterministic patterns, PU activity with random

generated patterns and Secondary Users activity that can be either inside or outside the sensing

range.

In addition, a Genetic Algorithm is developed in order to forecast the available TCP throughput

for different scenarios. The scenarios tested with the GA are: a periodical and symmetrical

function; a non-random and random traffic with ON-OFF patterns; and real traffic played back

with NS2. Not only the feasibility of the prediction has been studied, but also the effect of the use

of different fitness functions in scenarios proposed has been analysed.

5.1.1 Conclusions

Here the most important conclusions are presented. Regarding the TCP available bandwidth,

the following conclusion can be pointed out:

The available bandwidth tends to slightly grow with alpha and beta

Studying deterministic fixed periodic PU activity patterns, if the relationship between alpha and

beta is fixed, meaning that the PU busy time is constant, a relationship between the alpha/beta

values and the available TCP throughput exists. The available bandwidth, that in the UDP case is

constant for all alpha/beta values, follows a pseudo periodic behaviour giving values in a range

from the maximum achievable TCP throughput and zero. Whenever the retransmission time

matches with the PU ON period, the resultant throughput is zero as the retransmissions always

match with PU ON activity time. In the same situation, but with random generated PU activity

patterns, there is not a clear relationship between alpha/beta values and the available throughput.

However, the trend shows that the available bandwidth tends to increase as alpha/beta values do.

This is because the TCP congestion window is able to grow to greater values. A clearer increase as alpha and beta grow was expected.

Retransmission time has a fundamental impact on the available bandwidth

If PUs whose activity patterns have long ON periods are using the medium, the retransmission time plays a fundamental role in defining the available bandwidth. This is because after multiple retransmissions the RTO grows to large values.
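The growth of the RTO can be sketched with the standard exponential backoff of TCP, in which the timer is doubled after every timeout up to an upper bound (RFC 6298). The initial RTO of 1 s and the 60 s cap used below are illustrative assumptions and are not necessarily the values configured in the NS2 simulations of this thesis; the function name is hypothetical.

```python
def rto_schedule(initial_rto, retransmissions, max_rto=60.0):
    """Return the RTO (in seconds) in force before each successive
    retransmission, doubling the timer after every timeout."""
    rto = initial_rto
    schedule = []
    for _ in range(retransmissions):
        schedule.append(rto)
        rto = min(rto * 2.0, max_rto)  # exponential backoff with an upper bound
    return schedule


schedule = rto_schedule(initial_rto=1.0, retransmissions=6)
print("RTO before each retransmission (s):", schedule)
print("time spent waiting (s):", sum(schedule))
```

With these assumed values, six consecutive losses already keep the sender waiting for about a minute, which illustrates why a long PU ON period can hold the throughput close to zero well after the medium becomes free again.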

If alpha is fixed and beta changes, for deterministic PU activity patterns, the available bandwidth decreases as beta grows, reaching zero if the RTO matches the period of the pattern. In the same case, but for randomly generated PU activity patterns, the results show a clear correspondence: as beta grows, the available bandwidth decreases even faster until a PU activity time of 40% is reached. Beyond this point, the decrease is slower.

Randomness in the PU activity makes predictions very unreliable

There is no clear relationship between the alpha/beta values and the available throughput. However, the trend shows that the available bandwidth increases as the alpha/beta values do. Therefore, an a priori estimation of the available bandwidth based only on alpha and beta values would not be reliable. A clearer correspondence was expected.

There seems to be a relationship between TCP available bandwidth and MAC busy time

Comparing the MAC busy time and the utilisation of the bandwidth, there are some alpha-beta combinations in which a percentage of the idle time remains unused. As the TCP available bandwidth grows, the MAC busy time decreases. However, it does not decrease in equal proportion. In conclusion, a relationship exists and a model for the estimation of the available bandwidth based on the MAC busy time for different PU activity patterns may be developed.

In a scenario with only secondary users and real traffic, they share the bandwidth if they can sense

each other

With only TCP Secondary Users in the wireless scenario, the available bandwidth is shared and therefore the real traffic played back is modified (packet delays). The real traffic uses only 50% of the whole bandwidth, leaving the rest for the other nodes that are also using the medium. In a scenario with the “hidden node problem”, if RTS/CTS are disabled, the medium is not shared, the real traffic played back takes almost all the available bandwidth and the collision percentage is very large. These results match what we expected.

The results of the available bandwidth with PU activity can be also applied to the “hidden node

problem”

In a scenario with the hidden node problem, the available bandwidth is similar to the case of the

WLAN with PU activity. This is because, if the Interferer SU is transmitting, almost all the packets

are lost due to interferences (that will be collisions in this case). Hence, we can conclude that the

study of the PU activity impact on SU can be applied to this scenario. In that case, the interferer

node will behave in a similar way as a PU. The situation is not exactly the same, since the node

that is being interfered (i.e., the one that is receiving data and gets the collisions) will probably

sense the hidden node and vice versa. Therefore, this node will modify its behaviour according to

the hidden node and the hidden node will modify slightly its behaviour.

The proper functioning of the GA has been proven

The GA is a powerful tool used in different fields to make predictions from time series data. Its definition gives a basic structure, but a high number of variations can be made in order to improve the performance of the GA. These variations stem from the basic functions of the genetic algorithm, such as selection, crossover, mutation or encoding. Beyond these variations in the basic structure and operation of the GA, different parameters have to be set up. These parameters add complexity to the GA, because modifying them changes the behaviour of the GA and therefore the results.

Thus, a high number of test combinations can be performed for each scenario proposed. However, we have proven in this thesis that the basic configuration of the GA (based on the main GA structure) is enough to guarantee a good prediction. Naturally, the more tests are run and the better the GA is adjusted to the proposed problem, the better the solution that can be found.

MATLAB is a good tool for the GA implementation

The main advantage of using MATLAB for the implementation is the simplicity of programming (due to all the already developed instructions). The graphical interface, debugger and mathematical support, along with this simplicity of programming, made it possible to finish the implementation and test the algorithm.

However, the main drawback is the computational time (compared with other programming languages). The computational time increases the time required to perform the tests and makes it necessary to use several computers in parallel.

A limitation in the GA for the scenarios proposed has been encountered

A limitation on the use of the GA as a forecasting tool was found in some of the proposed scenarios during the testing stage. This limitation affects the random ON-OFF activity pattern scenarios with long OFF periods. To prevent the training set from consisting entirely of an OFF period, a large number of training samples must be selected. Otherwise, if the OFF period is longer than the training set, the GA may simply predict an OFF period. The downside of increasing the training set is that more samples have to be evaluated, so the computational cost increases. Another problem is that it becomes more difficult to find a function that fits such a large training set (due to the randomness of the ON-OFF pattern).

Reliable results in several scenarios have been achieved

For non-random ON-OFF patterns, very good results have been found. This is a result of the periodicity of the available bandwidth in this scenario. In contrast, with random patterns, worse results than with non-random ON-OFF activity patterns have been obtained. Moreover, good results have been obtained for the real traffic played back with NS2.

The GA can successfully be applied to the TCP available bandwidth forecasting

Although the GA is not a prediction tool but rather a stochastic search method, it has been used for TCP available bandwidth forecasting with good results.

Better results obtained by introducing new chromosomes in each generation as a diversity method

In the proposed scenario, introducing new chromosomes in each generation as a diversity method improves the GA's search for a better solution (reducing the MAPE of the best solution). However, the use of diversity can increase the number of generations needed to converge. Better MAPE results are obtained by introducing new chromosomes than by mixing chromosomes. It is noteworthy that these diversity methods were used with the same initial population and with 30% of the reproduction mating pool. Therefore, different results may be obtained in different scenarios and with different parameters.
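The idea of injecting new chromosomes can be sketched as follows. This is a minimal Python illustration and not the MATLAB code used in this thesis: the function names, the real-valued encoding and the choice of replacing the worst individuals are assumptions made only for the example. The error values passed in can be any lower-is-better measure, such as the MAPE.

```python
import random


def random_chromosome(length=4, rng=random):
    """Hypothetical encoding: a list of real-valued genes in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


def inject_random_chromosomes(population, errors, fraction, make_chromosome):
    """Replace the worst `fraction` of the population with new random chromosomes.

    `errors[i]` is the error of `population[i]` (lower is better).
    """
    n_new = round(len(population) * fraction)
    if n_new == 0:
        return list(population)
    # Indices sorted from best (lowest error) to worst.
    order = sorted(range(len(population)), key=lambda i: errors[i])
    survivors = [population[i] for i in order[:len(population) - n_new]]
    immigrants = [make_chromosome() for _ in range(n_new)]
    return survivors + immigrants
```

Calling `inject_random_chromosomes(population, errors, 0.3, random_chromosome)` once per generation keeps the best 70% of the individuals and refreshes the rest, which trades slower convergence for a wider search.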

Good results in almost all scenarios with the roulette wheel

Different results were obtained for the different selection methods. In almost all the scenarios, the roulette wheel selection gives better results (using the average of the tests performed). However, the best result in each set is not always found with this selection method: the best solution (lowest MAPE) was sometimes found using the ranking or the exponential selection method. It can be deduced that the average results of these two methods (ranking and exponential) are worse because they need more generations to converge. This effect can be even more pronounced with the ranking method, due to the higher reproduction probabilities of the worst chromosomes.
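The difference between the three selection methods can be illustrated by the selection probabilities they assign. The following Python sketch uses the textbook forms of roulette wheel, linear ranking and exponential ranking selection; the parameter values (`eta_minus`, `c`) and the assumption that fitness is a higher-is-better value (for instance derived from the MAPE) are illustrative and are not taken from the implementation of this thesis.

```python
import random


def roulette_probs(fitness):
    """Selection probability proportional to fitness (higher is better)."""
    total = sum(fitness)
    return [f / total for f in fitness]


def linear_rank_probs(fitness, eta_minus=0.5):
    """Linear ranking: probability grows linearly with rank (needs >= 2 individuals)."""
    n = len(fitness)
    eta_plus = 2.0 - eta_minus
    order = sorted(range(n), key=lambda i: fitness[i])  # worst first
    probs = [0.0] * n
    for rank, i in enumerate(order):
        probs[i] = (eta_minus + (eta_plus - eta_minus) * rank / (n - 1)) / n
    return probs


def exponential_rank_probs(fitness, c=0.8):
    """Exponential ranking: weight c**k for the k-th best individual (0 < c < 1)."""
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])  # worst first
    weights = [0.0] * n
    for rank, i in enumerate(order):
        weights[i] = c ** (n - 1 - rank)
    total = sum(weights)
    return [w / total for w in weights]


def select(population, probs, k, rng=random):
    """Draw k parents with replacement according to the given probabilities."""
    return rng.choices(population, weights=probs, k=k)
```

With one dominant individual, for example fitness values `[0.9, 0.05, 0.03, 0.02]`, the roulette wheel gives the worst chromosome a probability of about 0.02, whereas linear ranking with `eta_minus=0.5` gives it 0.125. This is consistent with the observation that ranking selection lets the worst chromosomes reproduce more often.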

Equation 1 is composed of different criteria, which results in more robust results

Different results were obtained using the proposed fitness functions. From the tests performed and the scenarios proposed, we can conclude that equation 2 can lead to undesired results, since the only criterion used for the fitness is the MSE. Equation 3, which is the same as equation 2 but also uses POCID, gives better outcomes in general. Although the best results are not always obtained with it, equation 1 gave reliable results. This is because equation 1 is composed of different criteria, which makes it more robust in all the scenarios.
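To illustrate why the MSE alone can be a misleading criterion, the following Python sketch computes the MSE, the MAPE and the POCID for two hypothetical forecasts. The metric definitions are the common ones (POCID is taken here as the percentage of steps in which the forecast moves in the same direction as the target); the sketch does not reproduce equations 1 to 3 of this thesis, and the sample data are invented for the example.

```python
def mse(target, output):
    """Mean squared error."""
    return sum((t - o) ** 2 for t, o in zip(target, output)) / len(target)


def mape(target, output):
    """Mean absolute percentage error (assumes no target value is zero)."""
    return 100.0 * sum(abs((t - o) / t) for t, o in zip(target, output)) / len(target)


def pocid(target, output):
    """Percentage of steps where target and forecast change in the same direction."""
    hits = 0
    for j in range(1, len(target)):
        if (target[j] - target[j - 1]) * (output[j] - output[j - 1]) > 0:
            hits += 1
    return 100.0 * hits / (len(target) - 1)


# Invented sample: an oscillating series and two candidate forecasts.
target = [10.0, 12.0, 11.0, 13.0, 12.0]
flat = [11.6] * 5                         # constant forecast at the mean
shifted = [12.0, 14.0, 13.0, 15.0, 14.0]  # follows every change, offset by 2
```

In this example the constant forecast has the lower MSE, but it never predicts the direction of change (POCID of 0), while the offset forecast has a higher MSE and a POCID of 100. A fitness function that combines several criteria can distinguish between these two cases.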

5.1.2 Project Evaluation

The final impression is satisfactory, since several interesting conclusions have been drawn and most of the objectives have been fulfilled. The large number of tests run in NS2 yields reliable results, because many situations and possibilities have been taken into account. The genetic algorithm, which required considerable effort to understand and program, works smoothly and its predictions have reasonable accuracy. We feel that the study will be helpful in future research and in our professional careers.

5.1.3 Problems found during the project

Regarding the problems encountered during the implementation, a significant one was the incompatibility between versions of NS2. Because of this incompatibility, the patches had to be applied manually, which takes time. NS2 also has a large number of bugs that have to be fixed manually. The most difficult part of NS2 programming (in the C++ part) is debugging, as no good debugger can be used; finding errors is therefore very complicated. Understanding the NS2-CRAHN [4] implementation took a lot of time, and we finally came to the conclusion that the implementation was not valid for high data rates. This encouraged us to carry out a new implementation of the spectrum management for PUs based on the NS2-CRAHN structure.

In order to speed up the testing stage, which was very time consuming due to the significant number of tests, we would suggest running the tests on cloned computers or on a more powerful machine.

Regarding the Genetic Algorithm, different problems were found before and during its implementation. First, before its implementation, we had difficulties in understanding how the GA works as a forecasting tool. Furthermore, the initial fitness equation implemented (taken from [27]) gave inadequate results. Therefore, alternative fitness equations had to be sought. Some equations were found, implemented and proposed for study in the proposed scenarios.

During the test and verification stage, we realized that in some cases the GA became stuck in a local minimum. In order to solve this problem, we searched the literature for diversity methods.

5.2 Future work

Using the results of this Master's thesis, the available bandwidth can be estimated. If the PU activity of the network can be modelled using an ON-OFF birth-death Markovian process (a pattern with characteristics similar to those studied in this thesis), then the estimated available bandwidth will correspond to that measured in the results. Naturally, the results will not be exact, because a real situation is affected by many external factors.
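As an illustration of such an activity model, the Python sketch below generates an ON-OFF pattern in which the ON and OFF durations are exponentially distributed, which is the behaviour of a two-state continuous-time Markov process. The function names, the parameters (`mean_on`, `mean_off`, `horizon`) and the choice of starting in the OFF state are assumptions made for the example; this is not the PU activity file generator used in this thesis.

```python
import random


def on_off_pattern(mean_on, mean_off, horizon, seed=None):
    """Return a list of (state, start, end) intervals covering [0, horizon]."""
    rng = random.Random(seed)
    t, state, intervals = 0.0, "OFF", []
    while t < horizon:
        mean = mean_on if state == "ON" else mean_off
        duration = rng.expovariate(1.0 / mean)  # exponential holding time
        end = min(t + duration, horizon)
        intervals.append((state, t, end))
        t = end
        state = "OFF" if state == "ON" else "ON"
    return intervals


def on_fraction(intervals):
    """Fraction of the total time spent in the ON state."""
    total = intervals[-1][2] - intervals[0][1]
    on_time = sum(end - start for state, start, end in intervals if state == "ON")
    return on_time / total
```

For a long pattern the ON fraction approaches `mean_on / (mean_on + mean_off)`, so a pattern with a mean ON time of 2 and a mean OFF time of 8 keeps the PU active for roughly 20% of the time.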

Regarding the TCP available bandwidth study, future research on different scenarios using real background traffic from Secondary Users would be interesting. A scenario with PUs and real traffic from SUs at the same time would also be interesting to analyse.

Another possibility is to program functions that simulate realistic WLAN behaviour when a new user joins the network. This situation could be modelled using a pattern file (similar to the PU activity file) that contains an activity pattern for the SUs. The new users joining the network would have to wait until the channel is idle (as indicated by the pattern) and would then be able to transmit for as long as they require.

A MAC busy time study in order to find a model to estimate the TCP available bandwidth may be

carried out.

A similar project using birth-death Markovian process distributions with TCP traffic acting as a hidden node could be interesting.

Given the computation time required by the GA, forecasting the available bandwidth in real time is not feasible. Even so, the GA could be used to forecast the available bandwidth for the following days or months.

Moreover, the parameters of the GA could be optimized in order to obtain results that are more

accurate. This requires performing more tests observing the effects of these parameters on the

prediction for the scenario proposed.

Furthermore, other tools, such as neural networks, could be tried. The drawbacks of neural networks are their complexity and their higher computational cost.

References

[1] wikipedia.org, ”ISM band,” wikipedia.org, 2 June 2014. [Online]. Available:

http://en.wikipedia.org/wiki/ISM_band#ISM_bands. [Accessed 10 June 2014].

[2] wikipedia.org, ”Cognitive radio,” wikipedia.org, 3 June 2014. [Online]. Available:

http://en.wikipedia.org/wiki/Cognitive_radio. [Accessed 10 June 2014].

[3] S. Hanna and J. Sydor, “Spectrum metrics for 2.4 GHz ISM band Cognitive Radio

applications,” in Personal Indoor and Mobile Radio Communications (PIMRC), 2011

IEEE 22nd International Symposium on, Toronto, ON, 2011.

[4] M. D. Felice, K. R. Chowdhury, L. Bononi, A. Kassler and W. Kim, “End-to-end

Protocols for Cognitive Radio Ad Hoc Networks: An evaluation study,” 2010.

[5] I. F. Akyildiz, W.-Y. Lee and K. R. Chowdhury, “CRAHNs: Cognitive radio ad hoc

networks,” Broadband Wireless Networking Laboratory, School of Electrical and

Computer Engineering, Georgia Institute of Technology, Atlanta, 2009.

[6] M. abd Rabou Ahmed Kalil, ”Modelling and Analysis of Cognitive Radio Ad Hoc

Networks,” Ilmenau University of Technology, Ilmenau, Germany, 2011.

[7] L-com Global Connectivity, “Advantages and Disadvantages of ISM Band

Frequencies,” L-com Global Connectivity, 22 October 2013. [Online]. Available:

http://www.l-com.com/content/Article.aspx?Type=N&ID=10421. [Accessed 7 May

2014].

[8] Nsnam, “Main page, ns2,” Nsnam, 4 November 2011. [Online]. Available:

http://nsnam.isi.edu/nsnam/index.php/Main_Page. [Accessed 21 February 2014].

[9] wikipedia.org, ”wikipedia.org,” wikipedia.org, 11 May 2014. [Online]. Available:

http://en.wikipedia.org/wiki/Methodology. [Accessed 18 May 2014].

[10] Science Buddies, “The Engineering Design Process,” Science Buddies, 2002-2014. [Online]. Available: http://www.sciencebuddies.org/engineering-design-process/engineering-design-process-steps.shtml#theengineeringdesignprocess. [Accessed 2014 May 2014].

[11] Hiertz, G.R. , Denteneer, D., Stibor, L., Zang, Y., Costa, X.P. and Walke, B., “The

IEEE 802.11 universe,” IEEE Communications Magazine, vol. 48, no. 1, January 2010.

[12] Wikipedia, “IEEE 802.11,” Wikipedia.org, 18 February 2014. [Online]. Available:

http://en.wikipedia.org/wiki/IEEE_802.11. [Accessed 21 February 2014].

[13] Wikipedia, “IEEE 802.11g-2003,” wikipedia.org, 22 November 2013. [Online].

Available: http://en.wikipedia.org/wiki/IEEE_802.11g-2003. [Accessed 22 February

2014].

[14] L. Bononi, M. Conti and E. Gregori, “Design and Performance Evaluation of an Asymptotically

Optimal Backoff Algorithm for IEEE 802.11 Wireless LANs,” in System Sciences, 2000.

Proceedings of the 33rd Annual Hawaii International Conference, Bologna, January 2000.

[15] Y.-M. C. T.-H. L. a. A. H. Shao-Cheng Wang, “Performance Evaluations for Hybrid

IEEE 802.11b and 802.11g Wireless Networks,” Department of Electrical Engineering,

University of Southern California, U.S.A. ; Wireless Design Center Winbond Electronics

Corporation America, U.S.A.; Department of Communication Engineering, National

Chiao Tung University, Taiwan., April 2005.

[16] wikipedia.org, “DCF Interframe Space,” wikipedia.org, 26 February 2014. [Online].

Available: http://en.wikipedia.org/wiki/DCF_Interframe_Space. [Accessed 10 June

2014].

[17] K. Xu, Mario Gerla and Sang Bae, “Effectiveness of RTS/CTS handshake in IEEE

802.11 based ad hoc networks,” Elsevier B.V., 2003.

[18] S. Robitzsch, “SEbastian's WLAN CalcUlation Tool,” www.seronline.de, 2010.

[Online]. Available: http://seronline.de/sewcut/. [Accessed 11 February 2014].

[19] “Wireless LAN,” Connectivity Knowledge Platform, [Online]. Available:

http://ckp.made-it.com/ieee80211.html.

[20] D. A. R. P. Agency, “Transmission Control Protocol,” IETF, 1981.

[21] P. R. Egli, “docstoc.com,” 2011. [Online]. Available:

http://www.docstoc.com/docs/119960349/TCP---Transmission-Control-Protocol---

RFC793. [Accessed 3 March 2014].

[22] Wikipedia, “TCP window scale option,” wikipedia.org, December 22 2013. [Online].

Available: http://en.wikipedia.org/wiki/TCP_window_scale_option. [Accessed 3 March

2014].

[23] R. Keith W and K. James F, “TCP Congestion Control,” 1996-2000. [Online].

Available: http://210.43.128.116/jsjwl/net/ross/book/transport_layer/congestion.html.

[Accessed 16 April 2014].

[24] IETF, ”RFC 6349,” IETF, August 2011. [Online]. Available:

https://tools.ietf.org/html/rfc6349#page-12. [Accessed 4 March 2014].

[25] T. Issariyakul and E. Hossain, Introduction to Network Simulator NS2, New York:

Springer, 2009, p. Preface 7.

[26] IETF, “RFC 2988 - Computing TCP's Retransmission Timer,” IETF, November

2000. [Online]. Available: http://tools.ietf.org/html/rfc2988. [Accessed 2014 01 04].

[27] D. T. &. K. Moessner, “Traffic modelling and forecasting using genetic algorithms,”

Ann. Telecommun, no. 64, p. 535–543, 2009.

[28] Wikipedia, “Wikipedia - Genetic algorithm,” 26 February 2014. [Online]. Available:

http://en.wikipedia.org/wiki/Genetic_algorithm. [Accessed 27 February 2014].

[29] P. T. Rodríguez-Piñero, “Introducción a los algoritmos genéticos y sus aplicaciones,”

[Online]. Available: www.uv.es/asepuma/X/J24C.pdf. [Accessed 27 February 2014].

[30] “Intelligent Systems Group,” [Online]. Available:

http://www.sc.ehu.es/ccwbayes/docencia/mmcc/docs/temageneticos.pdf. [Accessed 26

February 2014].

[31] J. S. M. T. Matías Ison, “Algoritmos genéticos: aplicación en MATLAB,” 25

November 2005. [Online]. Available:

http://users.df.uba.ar/ariel/materias/FT3_22006/Guias/old/guia_ga.pdf. [Accessed 26

February 2014].

[32] Elena Pérez, “Guía para recién llegados a los algoritmos geneticos,” 2010. [Online].

Available:

http://www.insisoc.org/elena/Elena%20Perez%20Vazquez_archivos/files_newcomers/ne

wcomers-spanish.pdf. [Accessed 26 February 2014].

[33] C. Reeves, “Genetic algorithms - Chapter 3,” [Online]. Available:

http://sci2s.ugr.es/docencia/metah/bibliografia/GeneticAlgorithms.pdf. [Accessed 28

February 2014].

[34] “University of Amsterdam - What is an Evolutionary Algorithm?,” [Online].

Available: http://www.cs.vu.nl/~gusz/ecbook/Eiben-Smith-Intro2EC-Ch2.pdf. [Accessed

28 February 2014].

[35] Wikipedia, “Wikipedia - Genotipo,” 28 February 2014. [Online]. Available:

http://es.wikipedia.org/wiki/Genotipo. [Accessed 28 February 2014].

[36] M. Gen and R. Cheng, “Genetic Algorithms and Engineering Design,” United States

of America, John Wiley & Sons, Inc., 1997, pp. 5-8.

[37] J. G. Noraini Mohd Razali, “Genetic Algorithm Performance with Different Selection

Strategies in Solving TSP,” in WCE 2011, London, U.K., 2011.

[38] N. S. &. Y. S. Rahul Malhotra, “Genetic Algorithms: Concepts, Design for

Optimization of Process,” Computer and Information Science, vol. 4, no. 2, pp. 39-54,

[39] C. J. a. G. Evertsson, “Master Thesis: Optimizing Genetic Algorithms for Time

Critical Problems,” June 2003. [Online]. Available:

http://digitalamedier.bth.se/fou/cuppsats.nsf/all/7f65a646dddb44a7c1256d44003e9326/$

file/Optimizing%20Genetic%20Algorithms%20for%20time%20critical%20problems.pd

f. [Accessed 3 March 2014].

[40] L. M. Schmitt, “Theory of genetic algorithms,” Theoretical Computer Science, no.

256, pp. 1-61, 2001.

[41] H. Pohlheim, “Evolutionary Algorithms: Overview, Methods and Operators,” in

GEATbx.com - Genetic and Evolutionary Algorithm Toolbox for Matlab, GEATbx version

3.8, 2006, pp. 9-33.

[42] A. Alvarez, A. Orfila and J. Tintore, “DARWIN: An evolutionary program for

nonlinear modeling of chaotic time series,” Computer Physics Communications 136, pp.

334-349, 2001.

[43] G. G. Szpiro, “Forecasting chaotic time series with genetic algorithms,” The

American Physical Society, vol. 55, no. 3, pp. 2557 - 2568, 1997.

[44] Jyotishree and R. Kumar, “Blending Roulette Wheel Selection & Rank Selection in

Genetic Algorithms,” International Journal of Machine Learning and Computing, vol. 2,

no. 4, pp. 365 - 370, 2012.

[45] S. A. Burns, Recent advances in optimal structural design, United States of America:

ASCE, 2002, pp. 66-70.

[46] R. Chakraborty, “Fundamentals of Genetic Algorithms,” 01 June 2010. [Online].

Available: http://www.myreaders.info/09_Genetic_Algorithms.pdf. [Accessed 1 March

2014].

[47] Y. Kaya, M. Uyar and R. Tekin, “A Novel Crossover Operator for Genetic

Algorithms: Ring Crossover,” 2 May 2011. [Online]. Available:

http://arxiv.org/abs/1105.0355. [Accessed 20 May 2014].

[48] L. W. F. L. W. a. J. L. Hepu Deng, “Artificial Intelligence and Computational

Intelligence,” Berlin, Springer, 2009, p. 132.

[49] J. a. P. T. A. Arranz de la Peña, “Algoritmos geneticos,” [Online]. Available:

http://www.it.uc3m.es/jvillena/irc/practicas/06-07/05.pdf. [Accessed 1 March 2014].

[50] Wikipedia, “Wikipedia - Time series,” 11 March 2014. [Online]. Available:

http://en.wikipedia.org/wiki/Time_series. [Accessed 21 March 2014].

[51] F. Takens, ”Detecting strange attractor in turbulence,” Lecture Notes in Mathematics,

vol. 898, pp. 366-381, 1981.

[52] N. Packard, J.P.Crutchfield, J.D.Farmer and R.S.Shaw, “Geometry from a Time

Series,” PHYSICAL REVIEW LETTERS, vol. 45, no. 9, pp. 712-716, 1980.

[53] Taken's theorem in action for the Lorenz chaotic attractor. [Film]. Youtube.

[54] S. R. García, M. P. Romo and J. Figueroa-Nazuno, “Characterization of ground

motions using recurrence plots,” Geofísica Internacional, vol. 52, no. 3, pp. 209-227, 28

June 2013.

[55] H. Kim, R. Eykholt and J. Salas, “Nonlinear dynamics, delay times, and embedding

windows,” Physica D, no. 127, pp. 48- 60, 1999.

[56] L. Cao, “Practical method for determining the minimum embedding dimension of a

scalar time series,” Physica D, no. 110, pp. 43-50, 1997.

[57] A. A. B. Perez, “Estudio del comportamiento de los precios del cobre y desarrollo de

un modelo de pronostico,” 2007. [Online]. Available: http://css.csregistry.org/tiki-

download_wiki_attachment.php?attId=135. [Accessed 31 March 2014].

[58] M. Hong-guang and H. Chong-zhao, “Selection of Embedding Dimension and Delay

Time in Phase Space Reconstruction,” Front. Electr. Electron. Eng. China, vol. 1, pp. 111-

114, 2006.

[59] C. Piccardi, 26 September 2006. [Online]. Available:

ftp://ftp.elet.polimi.it/users/Carlo.Piccardi/VarieCda/Articoli/CdA-Art-SerieTemporali-

4.pdf. [Accessed 2014 March 2014].

[60] D. Kugiumtzis, ”State space reconstruction parameters in the analysis of chaotic time

series - the role of the time window length,” Physica D, no. 95, pp. 13-28, 1996.

[61] R. Hegger, H. Kantz and T. Schreiber, “Practical implementation of nonlinear time

series methods: The TISEAN package,” 2007. [Online]. Available: http://www.mpipks-

dresden.mpg.de/~tisean/Tisean_3.0.1/index.html. [Accessed 12 March 2014].

[62] A. M. Fraser and H. L. Swinney, “Independent coordinates for strange attractors from

mutual information,” Physical Review A, vol. 33, no. 2, pp. 1134-1140, 1986.

[63] R. V. G, “Recurrence quantification analysis of system signals for detecting tool and

chatter in turning,” 16 November 2012. [Online]. Available:

http://hdl.handle.net/10603/5164. [Accessed 20 May 2014].

[64] M. B. Kennel, R. Brown and H. D. I. Abarbanel, “Determining embedding dimension

for phase-space reconstruction using a geometrical construction,” PHYSICAL REVIEW A,

vol. 45, no. 6, pp. 3403-3411, 1992.

[65] L. Jiayu, H. Zhiping, W. Yueke and S. Zhenken, “Selection of proper embedding

dimension in phase space reconstruction of speech signals,” Journal of Electronics, vol.

17, no. 2, pp. 161 - 169, 2000.

[66] R. S. Tsay, “Lecture 1: Univariate Time Series,” Autumn Quarter 2008.

[Online]. Available: http://faculty.chicagobooth.edu/ruey.tsay/teaching/uts/lec1-08.pdf.

[Accessed 21 March 2014].

[67] Wikipedia, ”Wikipedia - Multivariate analysis,” 19 March 2014. [Online]. Available:

http://en.wikipedia.org/wiki/Multivariate_analysis. [Accessed 21 March 2014].

[68] C. Martin, ”Nonlinear prediction of chaotic time series,” Physica D, vol. 35, pp. 335-

356, 1989.

[69] A. S. Soofi and L. Cao, “Modelling and Forecasting Financial Data,” in Techniques

of Nonlinear Dynamics, Norwell, Massachusetts 02061 USA, Kluwer academic

publishers, 2002, pp. 205-211.

[70] H. Cheng, P.-N. Tan, J. Gao and J. Scripps, “Multistep-Ahead Time Series

Prediction,” Advances in Knowledge Discovery and Data Mining, vol. 3918, pp. 765-774,

[71] W. Ming, Y. Bao, Z. Hu and T. Xiong, “Multistep-Ahead Air Passengers Traffic

Prediction with Hybrid ARIMA-SVMs Models,” The Scientific World Journal, vol. 2014,

pp. 1-14, 2014.

[72] Nsnam, “User Information, ns2,” Nsnam, 4 November 2011. [Online]. Available:

http://nsnam.isi.edu/nsnam/index.php/User_Information. [Accessed 21 February 2014].

[73] Nsnam, “What is NS-3,” Nsnam, 2012. [Online]. Available:

http://www.nsnam.org/overview/what-is-ns-3/. [Accessed 21 February 2014].

[74] “Implementation of a cross-layer MAC and channel allocation scheme for Cognitive

Radio Ad Hoc Networks (CRAHNs),” 2009.

[75] Wikipedia, “Birth–death process,” wikipedia.org, 18 February 2014. [Online].

Available: http://en.wikipedia.org/wiki/Birth%E2%80%93death_process. [Accessed

2014 March 03].

[76] Y. Zhang, “Spectrum Handoff in Cognitive Radio Networks: Opportunistic and

Negotiated Situations,” Simula Research Laboratory, Norway, 2009.

[77] E. Kalyvas, “Thesis: Using neural networks and genetic algorithms to predict stock

market returns,” October 2001. [Online]. Available:

http://www.125books.com/inc/pt4321/pt4322/pt4323/pt4324/pt4325/data_all/books/U/U

sing%20Neural%20Networks%20And%20Genetic%20Algorithms%20To%20Predict%

20Stock%20Market%20Returns.pdf. [Accessed 17 April 2014].

[78] M. Liu, R. Wang, J. Wu and R. Kemp, “A Genetic-Algorithm-Based Neural Network

Approach for Short-Term Traffic Flow Forecasting,” Advances in Neural Networks, vol.

3498, pp. 965-970, 2005.

[79] A.-a. H. Mantawy, “Genetic Algorithms Application to Electric Power Systems,”

2012. [Online]. Available: http://cdn.intechopen.com/pdfs-wm/33395.pdf. [Accessed 17

April 2014].

[80] WangYan, W. Hua and X. Limin, “Highway Traffic Prediction with Neural Network

and Genetic Algorithms,” Vehicular Electronics and Safety, pp. 211-216, 2005.

[81] M. A. Abido and A. Elazouni, “Improved Crossover and Mutation Operators for

Genetic Algorithm Project Scheduling,” IEEE Congress on Evolutionary Computation,

pp. 1865-1872, 2009.

[82] A. Acan, H. Altincay, Y. Tekol and A. Unveren, “A Genetic Algorithm with Multiple

Crossover Operators for Optimal Frequency Assignment Problem,” Evolutionary

Computation, vol. 1, pp. 256-263, 2003.

[83] S. Picek, M. Golub and D. Jakobovic, “Evaluation of Crossover Operator

Performance in Genetic Algorithms with Binary Representation,” [Online]. Available:

http://bib.irb.hr/datoteka/537238.icic.pdf. [Accessed 17 April 2014].

[84] Erlend, “Erlend's Lookout Post,” 21 July 2009. [Online]. Available:

http://erl1.wordpress.com/2009/07/21/15/. [Accessed 24 April 2014].

[85] T. Henderson, “18.2 Two-ray ground reflection model,” Nsnam, 11 May 2011.

[Online]. Available: http://www.isi.edu/nsnam/ns/doc/node218.html. [Accessed 24 April

2014].

[86] T. Henderson, “18.1 Free space model,” Nsnam, 11 May 2011. [Online]. Available:

http://www.isi.edu/nsnam/ns/doc/node217.html. [Accessed 24 April 2014].

[87] M. Garetto, T. Salonidis and E. Knightly, “IEEE Explore,” [Online]. Available:

http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4457986&url=http%3A%2F%2F

ieeexplore.ieee.org%2Fiel5%2F90%2F4359146%2F04457986.pdf%3Farnumber%3D44

57986.

[88] M. Ergen, “802.11 Tutorial,” Department of Electrical Engineering and Computer

Science, University of California Berkeley, California, June 2002.

[89] D. J. Widmer, “icapeople.epfl.ch,” [Online]. Available:

http://icapeople.epfl.ch/widmer/uwb/ns-2/noah/. [Accessed 4 April 2014].

[90] D. J. Widmer, “Dr. Joerg Widmer,” [Online]. Available:

http://www.joergwidmer.org/.

[91] G. Combs, ”www.wireshark.com,” [Online]. Available:

http://www.wireshark.org/about.html. [Accessed 24 April 2014].

[92] wikipedia.org, “http://en.wikipedia.org/wiki/Comma-separated_values,”

wikipedia.org, 2014 April 22. [Online]. Available: http://en.wikipedia.org/wiki/Comma-

separated_values. [Accessed 2014 April 24].

[93] perl.org, “The Perl Programming Language,” perl.org, 2014. [Online]. Available:

http://www.perl.org/about.html. [Accessed 2014 April 24].

[94] J. Kaur, ”Analysis of Available Bandwidth Measurement Techniques,” [Online].

Available: http://www.cs.unc.edu/~jasleen/Research-analysisof.htm. [Accessed 22 April

2014].

[95] nsnam, “40.5 Applications objects,” nsnam, 05 May 2011. [Online]. Available:

http://www.isi.edu/nsnam/ns/doc/node516.html. [Accessed 05 April 2014].

[96] V. G, “Linux Shell Scripting Tutorial v1.05r3,” nixCraft Technologies, 1999-2002.

[Online]. Available: http://www.freeos.com/guides/lsst/. [Accessed 17 April 2014].

[97] Free Software Foundation Inc., “The GNU Awk User's Guide,” 2013. [Online].

Available: http://www.gnu.org/software/gawk/manual/gawk.html. [Accessed 17 April

2014].

[98] T. Williams and C. Kelley, “gnuplot homepage,” Gnuplot, February 2014. [Online].

Available: http://gnuplot.info/. [Accessed 2014 April 24].

[99] MPagan, “Xgraph General Purpose 2-D Plotter,” Xgraph , 21 February 2014.

[Online]. Available: http://www.xgraph.org/. [Accessed 24 April 2014].

[100] M. Grey, “Marc Grey's Tutorial. IX. Running Wireless Simulations in ns,” nsnam,

[Online]. Available: http://www.isi.edu/nsnam/ns/tutorial/nsscript5.html . [Accessed 24

April 2014].

[101] Wikipedia, “Wikipedia - Reverse Polish Notation,” 10 April 2014. [Online].

Available: http://en.wikipedia.org/wiki/Reverse_Polish_notation. [Accessed 26 April

2014].

[102] A. R. L. Junior, “A Study for Multi-Objective Fitness Function for Time Series

Forecasting with Intelligent Techniques,” Proceedings of the 10th annual conference

companion on Genetic and evolutionary computation, pp. 1843-1846, 2008.

[103] G. Good, J. Hartmanis and J. v. Leeuwen, “Advances in Neural Networks,” Berlin,

Springer, 2007, pp. 606-608.

[104] T. A. E. Ferreira, G. C. Vasconcelos and P. J. L. Adeodato, “A New Hybrid Approach

for Enhanced Times Series Prediction,” in UNISINOS, São Leopoldo, 2005.

[105] B. Can, A. Beham and C. Heavey, “A comparative study of genetic algorithm

components in simulation-based optimisation,” Proceedings of the 2008 Winter

Simulation Conference, pp. 1829-1837, 2008.

[106] S. F. Galán, ”A Novel Mating Approach for Genetic Algorithms,” 2007. [Online].

Available:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.154.6796&rep=rep1&type=pd

f. [Accessed 23 April 2014].

[107] S. Patil, ”Indian ETD Repository,” 2012/2013. [Online]. Available:

http://hdl.handle.net/10603/6111. [Accessed 21 April 2014].

[108] T. Blickle and L. Thiele, “A Comparison of Selection Schemes used in Genetic

Algorithms,” 11 December 1995. [Online]. Available:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.11.509&rep=rep1&type=pdf.

[Accessed 23 April 2014].

[109] X. Yao, ”Lecture 02 - Genetic Representation, Search, Operators, Selection Schemes

and Selection Pressure,” 5 October 2009. [Online]. Available:

http://www.cs.bham.ac.uk/~pkl/teaching/2009/ec/lecture_notes/l02-operators.pdf.

[Accessed 23 April 2014].

[110] B. Chakraborty and P. Chaudhuri, “On The Use of Genetic Algorithm with Elitism in

Robust and Nonparametric Multivariate Analysis,” AUSTRIAN JOURNAL OF

STATISTICS, vol. 32, no. 1&2, pp. 13-27, 2003.

[111] A. Ghosh, S. Roy, J. PalChoudhury, S. R.BhadraChaudhuri and S. Mandal, “A Novel

Approach of Genetic Algorithm in prediction of Time Series Data,” IJCA Special Issue on

Advanced Computing and Communication Technologies for HPC Applications, no. 1, pp.

16-20, 2012.

[112] G. Bee-Hua, ”Evaluating the performance of combining neural networks and genetic

algorithms to forecast construction demand: the case of the Singapore residential

sector,” Construction Management and Economics,

pp. 209-217, 2010.

[113] N. Mastorakis, V. Mladenov and V. T. Kontargyr, “Proceedings of the European

Computing Conference,” in Proceedings of the European Computing Conference: Volume

2, Springer Science + Business Media, LLC, 2009, pp. 9-12.

[114] A. Rexhepi, A. Maxhuni and A. Dika, “Analysis of the impact of parameters values

on the Genetic Algorithm for TSP,” IJCSI International Journal of Computer Science

Issues, vol. 10, no. 3, pp. 158-164, 2013.

[115] M. O. Odetayo, “OPTIMAL POPULATION SIZE FOR GENETIC ALGORITHMS:

AN INVESTIGATION,” in Genetic Algorithms for Control Systems Engineering, IEE

Colloquium on, London, 1993.

[116] I. C. Society, “Part 11: Wireless LAN Medium Access Control (MAC) and Physical

Layer (PHY) Specifications,” in IEEE Standard for Information technology--

Telecommunications and information exchange between systems Local and metropolitan

area networks, IEEE Computer Society, 2010, p. 92.

[117] Wikipedia, “OSI model,” wikipedia.org, 25 February 2014. [Online]. Available:

http://en.wikipedia.org/wiki/OSI_model. [Accessed 26 February 2014].

[118] R. M. (B.E.), “A Generic Parallel Genetic Algorithm,” October 2003. [Online].

Available: http://www.maths.tcd.ie/~rmurphy/Project/Report/report.html. [Accessed 28

February 2014].

[119] J. O.Pierini and E. A.Gómez, “TIDAL FORECASTING IN THE BAHIA BLANCA

ESTUARY, ARGENTINA,” INTERCIENCIA, vol. 34, no. 12, pp. 851-856, 2009.

Scenario implementation in NS2

The implementation of a wireless scenario in NS2 is performed by programming a tcl script.

First, we defined the different options for the scenario:

After the variables are set, the main program is set up, the variables are initialised and the PU

model file is loaded. A God (General Operations Director) is also created, according to [100] “a

God is the object that is used to store global information about the state of the environment,

network or nodes that an omniscient observer would have, but that should not be made known to

any participant in the simulation”.

Set val(chan) Channel/WirelessChannel; # channel type

set val(prop) Propagation/TwoRayGround; # radio-propagation model

set val(netif) Phy/WirelessPhy; # network interface type

set val(mac) Mac/802_11; # MAC type

set val(ifq) Queue/DropTail/PriQueue; # interface queue type

set val(ll) LL; # link layer type

set val(ant) Antenna/OmniAntenna; # antenna model

set val(ifqlen) 100 ; # max packet in ifq

set val(nn) 2; # number of mobile nodes

set val(rp) NOAH; # routing protocol

Mac/802_11 set basicRate_ 6Mb # set the Basic Rate

Mac/802_11 set dataRate_ 54Mb # set the Data Rate

The next step is to configure the nodes with parameters selected before varying their position.

We create in total nn number of nodes.

set ns_ [new Simulator] #Create a new simulator

set tracefd [open simple.tr w] #Initialise a new trace

$ns_ trace-all $tracefd

set topo [new Topography] #Set the topography

$topo load_flatgrid 1200 1200

set pumap [new PUMap] #create PUMap variable

$pumap set_input_map "map_$val(punum)_$val(wload).txt"

create-god $val(nn) # Create God

$ns_ node-config -adhocRouting $val(rp) \ # Put the configuration data

-llType $val(ll) \ # into the ns2 node config.

-macType $val(mac) \

-ifqType $val(ifq) \

-ifqLen $val(ifqlen) \

-antType $val(ant) \

-propType $val(prop) \

-channelType $val(chan) \

-phyType $val(netif) \

-topoInstance $topo \

-agentTrace ON \ #Show Agent trace

-routerTrace OFF \

-macTrace ON \ #Show MAC trace

-movementTrace OFF \

Then we set the node position, x, y and z. This is fundamental because the position of the nodes

have impact on PU interference..

At this point the topology, the traffic specifications and the Agents are created. There are different

possibilities, depending on if TCP or UDP is used:

• Configuration for TCP with a FTP application

for {set i 0} {$i < $val(nn)} {incr i} {
    set node_($i) [$ns_ node]               ;# create a new node
    $node_($i) random-motion 0              ;# disable random motion
    $node_($i) node-CR-configure $pumap     ;# load the PU model file
}
ns-random 1
$node_(0) set X_ 200
$node_(0) set Y_ 300
$node_(0) set Z_ 0.0
$node_(1) set X_ 200
$node_(1) set Y_ 550
$node_(1) set Z_ 0.0
set tcp_(0) [new Agent/TCP]                 ;# create a new TCP agent
set sink_(0) [new Agent/TCPSink]            ;# create a new TCP sink agent
$ns_ attach-agent $node_(0) $tcp_(0)        ;# attach the TCP agent to node 0
$ns_ attach-agent $node_(1) $sink_(0)       ;# attach the TCP sink to node 1
$ns_ connect $tcp_(0) $sink_(0)             ;# connect both agents
set ftp_(0) [new Application/FTP]           ;# create a new FTP application
$ftp_(0) attach-agent $tcp_(0)              ;# attach the FTP to the TCP agent
$ns_ at 5.0 "$ftp_(0) start"                ;# tell the FTP to start at second 5

• Configuration for UDP with a Constant Bit Rate (CBR) application

Both TCP and UDP agents can be used with played-back real traffic simply by adding a trace player as the application, as described in 3.3.5.2. The following lines show an example of how another two TCP traffic nodes (playing back real traffic traces) are configured. These nodes are placed inside the scenario and their activity affects the rest of the wireless medium in a totally different way than primary users do.

• Adding two TCP nodes with a played back trace application.

set udp_(0) [new Agent/UDP]                 ;# create a new UDP agent
set sink_(0) [new Agent/Null]               ;# create a new sink agent
$ns_ attach-agent $node_(0) $udp_(0)        ;# attach the UDP agent to node 0
$ns_ attach-agent $node_(1) $sink_(0)       ;# attach the sink agent to node 1
$ns_ connect $udp_(0) $sink_(0)             ;# connect both agents
set cbr_(0) [new Application/Traffic/CBR]   ;# create a new CBR application
$cbr_(0) set packetSize_ 1400               ;# set the packet size to 1400 bytes
$cbr_(0) set rate_ 30000000                 ;# set the CBR data rate (bit/s)
$cbr_(0) attach-agent $udp_(0)              ;# attach the CBR to the UDP agent
$ns_ at 5.0 "$cbr_(0) start"                ;# tell the CBR to start at second 5

When the simulation is completed, the function that calculates the Total MAC busy time is called

for both nodes. This time is stored in a file.

The last commands tell the simulator to end all the agents, end the simulation and close ns2. Once

everything is set, the only step remaining is to start the NS2 simulation using the script above.

set tcp0 [new Agent/TCP]
$ns_ attach-agent $node_(2) $tcp0           ;# attach the tcp0 agent to node 2
set tfile1 [new Tracefile]                  ;# create a trace file object
$tfile1 filename "values3.txt.if-1.bin"     ;# select the file to be played
set trac0 [new Application/Traffic/Trace]   ;# create a new traffic trace application
$trac0 attach-tracefile $tfile1             ;# assign the file to the application
$trac0 attach-agent $tcp0                   ;# attach the application to the tcp0 agent
set sink0 [new Agent/TCPSink]
$ns_ attach-agent $node_(3) $sink0          ;# attach the sink to node 3
$ns_ connect $tcp0 $sink0                   ;# connect the cross-traffic
$ns_ at 5.0 "$trac0 start"                  ;# start playing the trace
$ns_ at 100.0 "$node_(0) compute-mac-busy"
$ns_ at 100.0 "$node_(1) compute-mac-busy"

Physical and MAC layer parameters for 802.11

Parameter | Value | Comments
Packet payload | 12000 bits | Useful data content of the packet
MAC header | 224 bits | Size of the MAC header
Control frame bit/s | 11 Mbps | Frames used to facilitate data exchange between stations, such as RTS, CTS, ACK, etc. [12]
11g_PHY header | 136 bits @ 6 Mbps | N/A. The PHY header is sent at 6 Mbps
ACK | 112 bits + 11g_PHY header | Acknowledgement message packet size
ACK frame bit/s | 24 Mbps | Bit rate for the ACK
Data frame bit/s | 54 Mbps | Bit rate for the payload
Propagation delay | 1 µs | Time to reach the destination (over the medium)
SIFS | 10 µs (16 µs between data and ACK) | Short Inter-frame Space
DIFS | 50 µs | DCF Inter-frame Space

Table 26: Frame parameters of 802.11g [15]

Parameter | Value | Comments
SlotTime | 9 µs | Basic unit of timing for the protocol
CCATime | 15 µs | Clear-channel assessment
RxTxTurnaroundTime | 2 µs | The time that elapses from the sending of the MAC transmission request to the PHY layer to the instant when the first bit is transmitted
SIFSTime | 16 µs | Short Inter-frame Space
PreambleLength | 96 bits / 128 bits (*) | Short preamble / Long preamble
PLCPHeaderLength | 40 bits (**) |
PLCPDataRate | 6 Mbps | Transmission rate for the PLCP (Preamble)
CWmin | 15 | Minimum Contention Window
CWmax | 1024 | Maximum Contention Window

Table 27: Timing parameters of IEEE 802.11g standard [15]

(*) Preamble length: The preamble is defined in the first part of the Physical Layer Convergence Protocol/Procedure (PLCP) Protocol Data Unit (PDU) [116]. The PLCP layer is the interface that prepares the MAC protocol data units (MPDUs) to be transferred between MAC stations over the PMD. The PMD is the method of transmitting and receiving data through the wireless medium.

(**) The preamble contains a header that has information identifying the modulation scheme, transmission rate, and transmission time of the whole data frame.

Short and Long preamble: The short PLCP preamble and header are defined as optional in 802.11g [116]. The short preamble and header may be used to minimize overhead and, thus, maximize the network data throughput.
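As a rough cross-check of the values in Tables 26 and 27, the airtime of a single data frame exchange can be estimated. The sketch below is written in Python (it is not part of the thesis scripts) and assumes a simplified DCF cycle: DIFS, a mean backoff of CWmin/2 slots, the data frame, SIFS and the ACK, with no collisions, no RTS/CTS and no OFDM symbol padding. The function names are illustrative.

```python
# Simplified DCF cycle for one data frame, using the values of Tables 26 and 27.
# Assumptions (not from the thesis): no collisions, no RTS/CTS,
# mean backoff of CWmin/2 slots, no OFDM symbol padding.

PAYLOAD_BITS = 12000
MAC_HEADER_BITS = 224
PHY_HEADER_BITS = 136      # sent at the PHY header rate
ACK_BITS = 112
PHY_HEADER_RATE = 6e6      # bit/s
DATA_RATE = 54e6           # bit/s
ACK_RATE = 24e6            # bit/s
SIFS = 16e-6               # seconds, value used between data and ACK
DIFS = 50e-6               # seconds
SLOT_TIME = 9e-6           # seconds
CW_MIN = 15
PROPAGATION_DELAY = 1e-6   # seconds


def cycle_time():
    """Return the duration in seconds of one data/ACK exchange."""
    phy_header = PHY_HEADER_BITS / PHY_HEADER_RATE
    data_frame = phy_header + (MAC_HEADER_BITS + PAYLOAD_BITS) / DATA_RATE
    ack_frame = phy_header + ACK_BITS / ACK_RATE
    mean_backoff = (CW_MIN / 2) * SLOT_TIME
    return (DIFS + mean_backoff + data_frame + PROPAGATION_DELAY
            + SIFS + ack_frame + PROPAGATION_DELAY)


def max_throughput_bps():
    """Return the payload throughput in bit/s for back-to-back cycles."""
    return PAYLOAD_BITS / cycle_time()


if __name__ == "__main__":
    print("cycle time: %.2f us" % (cycle_time() * 1e6))
    print("throughput: %.2f Mbit/s" % (max_throughput_bps() / 1e6))
```

Under these assumptions the cycle lasts roughly 412 µs, which corresponds to about 29 Mbit/s of payload throughput, the same order as the 30 Mbit/s CBR rate configured in the UDP script.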

Tests of TCP throughput over time and other parameters for different PU activity ON-OFF patterns.

In this appendix, some examples using Scenario 1 for different ON-OFF patterns are

presented.

a. 50% ON time TCP available throughput results for different values

In order to analyse this case, we set alpha equal to two times beta, so that the PU is ON 50% of the time and OFF the other 50%. The results show that when the retransmission time after a PU ON period coincides with the OFF periods, the resulting available throughput is greater, until it drops to zero again. In this case, the zero-throughput alpha/beta combinations occur at multiples of beta equal to 0.7. There is a relation between the alpha/beta values and the available TCP throughput. As the ON-OFF pattern is periodic, whenever the retransmission time coincides with the PU ON period, the retransmission time doubles with every retransmission and therefore always coincides with PU ON activity. The resulting throughput is then zero. This is depicted in Figure 75.

[Figure 75 panels: alpha 1.4, beta 0.7; alpha 1.6, beta 0.8; alpha 1.8, beta 0.9; alpha 2, beta 1; alpha 2.4, beta 1.2; alpha 2.7, beta 1.35; alpha 2.8, beta 1.4]

Figure 75: Graphs of TCP throughput over time for 50% PU ON
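The interaction between the doubling retransmission timer and a periodic ON-OFF pattern can be illustrated with a small Python sketch. It is not taken from the thesis and rests on several assumptions: the PU is strictly periodic with period alpha and is ON during the first beta seconds of each period (consistent with alpha = 2·beta giving 50% ON), the first RTO value is a free parameter, and every attempt that falls in an ON period is lost. All names are hypothetical.

```python
# Toy model: does each retransmission attempt fall inside a PU ON period?
# Assumptions (not from the thesis): strictly periodic PU, ON during the
# first `on_time` seconds of every `period`; the RTO doubles after each loss.


def pu_is_on(t, period, on_time):
    """Return True if the periodic PU is active at time t."""
    return (t % period) < on_time


def retransmission_hits(first_loss, rto, period, on_time, attempts=6):
    """List, for each retransmission attempt, whether it lands in PU ON.

    The list describes the worst case in which every previous attempt
    failed, so the timeout keeps doubling.
    """
    hits = []
    t = first_loss
    timeout = rto
    for _ in range(attempts):
        t += timeout                      # next retransmission instant
        hits.append(pu_is_on(t, period, on_time))
        timeout *= 2                      # exponential backoff
    return hits


if __name__ == "__main__":
    # RTO equal to the PU period: every attempt keeps the same phase.
    print(retransmission_hits(0.1, 1.4, period=1.4, on_time=0.7))
    # RTO equal to half the period: the first attempt falls in OFF.
    print(retransmission_hits(0.1, 0.7, period=1.4, on_time=0.7))
```

When the initial timeout is a multiple of the PU period, every doubled timeout is also a multiple of it, so all attempts keep the phase of the first loss. This is one simple way in which the zero-throughput combinations described above can arise.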

b. TCP available throughput over time, Congestion window, Current RTO multiplicative factor, Smoothed RTT factor and Slow-start threshold.

The following parameters are presented in graphs in order to be properly visualised:

• The TCP available throughput over time

• Congestion window over time

• Current RTO multiplicative factor over time

• Smoothed RTT factor over time

• Slow-start threshold over time

Table 28 clearly shows that the congestion window grows large where the throughput is also high. When there are long periods where the throughput is zero, the RTO multiplicative factor grows to a large value, meaning that long retransmission times are experienced during these periods. The slow-start threshold is halved with every timeout and only grows when packets are successfully acknowledged. Similar results can be observed in Table 29, but at a smaller scale. As a consequence, the congestion window never grows to values as large as in Table 28.

Table 30 clearly shows how the congestion window is reduced when a packet is dropped and a timeout event therefore occurs, although the congestion window is high almost all the time. As the congestion window is high because few packets are dropped, the slow-start threshold is also large almost all the time and, in contrast, the RTO multiplicative factor is very low because there are no consecutive failed retransmission attempts.
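The qualitative behaviour of the three variables can be reproduced with a toy state machine. The following Python sketch uses textbook Reno-like rules (slow start, congestion avoidance, timeout with exponential backoff capped at 64); it is an assumption-based illustration, not the NS2 TCP implementation, and the initial values are arbitrary.

```python
# Toy model of cwnd, ssthresh and the RTO backoff factor.
# Assumption: textbook Reno-like rules, not the NS2 Agent/TCP code.

MAX_BACKOFF = 64


def step(state, event):
    """Apply one event ("ack" or "timeout") to (cwnd, ssthresh, backoff)."""
    cwnd, ssthresh, backoff = state
    if event == "timeout":
        ssthresh = max(cwnd / 2.0, 2.0)   # threshold drops to half the window
        cwnd = 1.0                        # restart from one segment
        backoff = min(backoff * 2, MAX_BACKOFF)
    elif event == "ack":
        backoff = 1                       # a successful ACK resets the backoff
        if cwnd < ssthresh:
            cwnd += 1.0                   # slow start
        else:
            cwnd += 1.0 / cwnd            # congestion avoidance
    else:
        raise ValueError("unknown event: %r" % (event,))
    return (cwnd, ssthresh, backoff)


def run(events, state=(1.0, 64.0, 1)):
    """Return the list of states visited while applying the events in order."""
    history = [state]
    for event in events:
        state = step(state, event)
        history.append(state)
    return history


if __name__ == "__main__":
    for cwnd, ssthresh, backoff in run(["ack"] * 10 + ["timeout"] * 3):
        print("cwnd=%5.2f ssthresh=%5.2f backoff=%d" % (cwnd, ssthresh, backoff))
```

A run of consecutive timeouts, as happens during a long PU ON period, drives the backoff factor up and the threshold down, while a run of ACKs lets the window grow again.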

[Table 28 panels, for alpha 1.5 and beta 0.5: TCP available bandwidth over time; congestion window over time; current RTO multiplicative factor over time; slow-start threshold over time]

Table 28: Throughput over time, Congestion window, Current RTO multiplicative factor and Slow-start threshold for alpha 1.5 and beta 0.5

[Panels for alpha 0.5 and beta 0.1: TCP available bandwidth over time; congestion window over time; current RTO multiplicative factor over time; slow-start threshold]

[Panels for alpha 1.5 and beta 0.1: TCP available bandwidth over time; congestion window over time; current RTO multiplicative factor over time; slow-start threshold]

c. Analysis of a wide range of alpha and beta values and random generated patterns

In order to characterize the response of the available TCP throughput to different ON-OFF traffic patterns of PU activity, we have carried out a set of tests with 28 alpha values combined with 28 beta values. Each test is repeated ten times to give robustness and to measure the effect of randomness in the PU activity pattern, resulting in a total of 7,840 simulation runs.

The fact that the patterns are generated with different random seeds is fundamental for the results. As the generated ON-OFF pattern is totally different every time a simulation run is executed, the comparison has to be made by averaging several tests with the same alpha and beta; otherwise the results are also highly random and the comparison is unfair. Therefore, the MAC busy time will be different for the same alpha/beta. The duration of the simulation is also important: the longer it is, the more robust the results are, but due to the high computational demand we could not simulate very long periods.

alpha | 0.0001, 0.001, 0.01 and from 0.1 to 2.5 with 0.1 increment
beta | 0.0001, 0.001, 0.01 and from 0.1 to 2.5 with 0.1 increment
Repetitions | 10
Simulation duration | 100 seconds
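The size of the test campaign follows directly from these parameters. A minimal Python sketch (variable names are illustrative) that enumerates the grid and counts the runs:

```python
# Enumerate the alpha/beta grid described in the parameter list above.
values = [0.0001, 0.001, 0.01] + [round(0.1 * k, 1) for k in range(1, 26)]
REPETITIONS = 10

# Every alpha value is combined with every beta value.
combinations = [(alpha, beta) for alpha in values for beta in values]
total_runs = len(combinations) * REPETITIONS

if __name__ == "__main__":
    print(len(values), len(combinations), total_runs)
```

The 3 small values plus the 25 values from 0.1 to 2.5 give 28 values per parameter, hence 28 × 28 = 784 combinations and 7,840 runs with ten repetitions each.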

Figure 76: TCP throughput heat-map

Figure 76 shows the result of the set of tests done for TCP. The axes represent alpha, beta and the average throughput, and the colour represents the standard deviation among the tests for the same pair of values.
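The aggregation behind such a heat-map can be sketched in Python as follows. The throughput samples are made-up illustrative numbers, not thesis results, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def summarise(runs):
    """Map each (alpha, beta) pair to (mean, standard deviation).

    `runs` maps (alpha, beta) to the list of throughput samples (Mbit/s)
    obtained from the repetitions with different random seeds.
    """
    return {pair: (mean(samples), stdev(samples))
            for pair, samples in runs.items()}


if __name__ == "__main__":
    # Hypothetical samples for two grid points (three repetitions each).
    example = {
        (1.0, 0.1): [10.0, 12.0, 14.0],
        (0.1, 1.0): [0.0, 0.0, 0.3],
    }
    for pair, (avg, dev) in summarise(example).items():
        print(pair, round(avg, 3), round(dev, 3))
```

Each grid point then contributes one height (the mean) and one colour (the standard deviation) to the plot.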

Analysing the graph, a higher throughput is achieved when beta is very close to zero and therefore the PU ON state is very short. The maximum throughput is around 20 Mbps. The throughput decreases as beta grows; however, if alpha is also large, the decrease is much smaller than in the case where alpha is small. When beta is larger than alpha, the throughput is almost zero because the probability of having PUs active is very high.

Regarding the standard deviation, when beta is higher than alpha, the throughput is very low and the standard deviation is small. When the throughput is very low or very high, the standard deviation is very close to zero, as the results of the tests are very similar. The standard deviation is very low when the throughput is maximum because there is almost no PU activity. In the areas with very low available bandwidth, the standard deviation is low because the PUs are either active all the time or have long ON and very short OFF periods, so there is no time to send packets. The standard deviation is large when alpha and beta are large because, as the simulations are run for a limited period of time, there is a chance of getting a random combination that gives some long OFF periods (several seconds) inside the simulation time. In that case, the TCP throughput will be higher because there will be enough time to raise the TCP congestion window to the maximum possible. If the random combination gives no long OFF periods during the simulation time, the throughput will be much lower.

We use several ON/OFF patterns in order to compare the response of the available UDP throughput with that of TCP. These patterns are the same as for TCP, giving a total of 7,840 tests, 10 for each combination. The set-up parameters are also the same as in the TCP analysis above.

Figure 77: UDP throughput heat-map

As can be seen from Figure 77, the results for UDP are similar to the TCP results. The throughput decreases as beta becomes larger; however, if alpha is also large, this decrease is much smaller than in the case where alpha is small. When beta is higher than alpha, the throughput is almost zero because the probability of having PUs active is very high. If alpha and beta are both high, the throughput decrease is almost proportional to the ratio between alpha and beta.

Average of the test results

This appendix presents one example of the averaging of the tests presented in section 4. In this case, Table 31 shows the results of equation 2 obtained with the non-random alpha = 2.2 and beta = 1.1 using the same initial population.

The table presents the following parameters:

MAE (Mean Absolute Error)

MSE (Mean Squared Error)

MAPE (Mean Absolute Percentage Error)

Fitness equation result

MAE_bp (MAE before prediction)

MSE_bp (MSE before prediction)

MAPE_bp (MAPE before prediction)

Loop (number of loops run during the GA)
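For reference, the three error metrics can be computed as in the Python sketch below. It uses the standard textbook definitions, which are assumed to match those used in the thesis; the fitness equation is not reproduced here because its definition is not given in this appendix.

```python
def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Every actual value must be non-zero; values close to zero make the
    result very large.
    """
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    actual = [2.0, 4.0]
    predicted = [1.0, 5.0]
    print(mae(actual, predicted), mse(actual, predicted), mape(actual, predicted))
```

Because MAPE divides by the actual value, samples where the measured throughput is close to zero can inflate it strongly. This may explain rows in which MAPE reaches hundreds or thousands of percent while MAE stays close to 1.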

The results shown in section 4 are the averages of Table 31, which are presented in Table 32 in this same appendix.

MAE MSE MAPE fit_eq MAE_bp MSE_bp MAPE_bp loop 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.38725 0.20236 401.76 0.29372 0.42624 0.2525 436.47 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 1.09 1.3648 1669.2 0.42044 1.1071 1.3785 1571.9 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 1.2841 1.7253 1752 0.35623 1.3072 1.8072 1650 100 0.12517 0.071437 2.4472 0.95208 0.11462 0.050327 2.1426 100 0.11258 0.048887 9.6118 0.3308 0.17542 0.11209 9.7925 100

1.1552 1.3602 1403 0.43342 1.1207 1.3073 1320.8 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.075527 0.028881 1.2754 0.95251 0.10371 0.049862 1.5681 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.1311 0.074822 3.9571 0.92631 0.1343 0.07955 3.8619 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 1.1552 1.3602 1403 0.43342 1.1207 1.3073 1320.8 100 1.0358 1.2866 1643.5 0.43792 1.0489 1.2835 1547.6 100 1.1552 1.3602 1403 0.43342 1.1207 1.3073 1320.8 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 1.1552 1.3602 1403 0.43342 1.1207 1.3073 1320.8 100 1.3915 2.2739 1240 0.30281 1.4061 2.3024 1168.1 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.075273 0.02611 5.959 0.95097 0.10921 0.051556 6.0353 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.23178 0.18826 7.3442 0.84474 0.23346 0.18379 7.0921 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100

Table 31: Results of equation 2 with same initial population with non-random alpha 2.2 and beta 1.1

Average:

MAE | MSE | MAPE | fit_eq | MAE_bp | MSE_bp | MAPE_bp | loop
0.7881786 | 1.72625314 | 266.191194 | 0.3635312 | 0.8897832 | 2.0880695 | 253.29365 | 100

Table 32: Average of equation 2 resulting from running the GA 50 times

General overview of GA results

This appendix presents the average results of the fitness evaluation in all the scenarios analysed in chapter 4. In addition, the average results for the selection method and diversity with the same initial population are shown.

Furthermore, the best and worst results (MAPE) of each individual set of tests of section 4 are presented in this appendix.

Table 33 shows the average results of the fitness evaluation in all the scenarios analysed. Table 33 (a) and (b) show the results with the same initial population and with a random initial population for the cosine function. The results with the same initial population and with a random initial population for the non-random traffic with ON-OFF pattern are shown in Table 33 (c) and (d), and those for the random traffic with ON-OFF pattern in Table 33 (e) and (f). Finally, Table 33 (g) and (h) show the results with the same initial population and with a random initial population for real traffic played back in NS2.

[Table 33 panels: fitness evaluation charts for Eq 1 to Eq 4; legend: MAPE, MAPE_bp]
(a) Same initial population, cosine function
(b) Random initial population, cosine function
(c) Same initial population for non-random traffic with ON-OFF pattern
(d) Random initial population for non-random traffic with ON-OFF pattern
(e) Same initial population, random traffic with ON-OFF pattern
(f) Random initial population, random traffic with ON-OFF pattern
(g) Same initial population for real traffic
(h) Random initial population for real traffic

Table 33: General fitness functions results

Table 34 shows the average results of the diversity and selection method in all the scenarios analysed. The results with the same initial population for a cosine function are shown in Table 34 (a). In addition, the results with the same initial population for the non-random traffic with ON-OFF pattern are shown in Table 34 (b). Then, the results with the same initial population for the random traffic with ON-OFF pattern are shown in Table 34 (c). Finally, the results with the same initial population for real traffic played back in NS2 are shown in Table 34 (d).

[Chart panels: fitness selection method for real traffic, Eq 1 to Eq 4; legends: MAE, MAE_bp and MAPE, MAPE_bp]

[Table 34 panels: diversity and selection method effect; legend: MAPE, MAPE_bp, Generations]
(a) Same initial population, cosine function
(b) Same initial population, non-random
(c) Same initial population, random
(d) Same initial population, real traffic

Table 34: General results for the diversity and selection method

Table 35 shows the best and the worst results in each individual set of tests for each scenario. This helps to identify which combination of selection method and diversity for equation 1 leads to the best result.

Eq1 | Cosine function Best | Cosine function Worst | Non-random Best | Non-random Worst | Random Best | Random Worst | Real traffic Best | Real traffic Worst
real fit d=0 | 3.53% | 373.72% | 7.640% | 27.44% | 3.87% | 357.87% | 25.13% | 38.02%
ranking d=0 | 3.53% | 983.35% | 1.054% | 44.85% | 3.87% | 357.87% | 24.04% | 42.80%
exponential | 3.35% | 193.21% | 0.767% | 145.39% | 5.67% | 345.2% | 16.80% | 38.02%
real fit d=1 | 2.20% | 435.50% | 0.767% | 169.07% | 3.81% | 7.215% | 13.51% | 55.72%
ranking d=1 | 3.53% | 380.11% | 0.767% | 205.57% | 0.611% | 19.78% | 14.04% | 40.94%
exponential | 3.00% | 212.64% | 0.767% | 169.07% | 13.13% | 32.07% | 13.51% | 55.72%

Table 35: MAPE best and worst results

