Study of TCP Available Bandwidth Using NS2 and Its Forecasting Based on Genetic Algorithm
Cristian Hernandez Benet
Francisco Domingo Sanchez Vizcaino
Faculty of Health, Science and Technology
Computer Science
30 ECTS
Andreas Kassler and Enrica Zola
Donald Ross
140603
© 2014 by Cristian Hernandez Benet, Francisco Domingo Sanchez Vizcaino and Karlstad
University
This thesis is submitted in partial fulfilment of the requirements
for the Master’s degree in Computer Science. All material in this
thesis which is not our own work has been identified, and no
material is included for which a degree has previously been
conferred.
Cristian Hernandez Benet
Francisco Domingo Sanchez Vizcaino
Approved, June 03, 2014
Opponent: Jonathan Vestin
Advisor: Andreas Kassler
Co-advisor: Enrica Zola
Examiner: Donald Ross
Abstract
Available bandwidth in a bandwidth-limited medium such as the wireless medium is a topic of great
research interest, and the Transmission Control Protocol (TCP) is one of the most widely used
transport protocols on the Internet. Together, available bandwidth and TCP constitute the most
typical scenario in Wireless Local Area Networks (WLANs). This thesis places the study in the
2.4 GHz frequency band, where Primary Users can be present and modify the behaviour of the WLAN
medium. This band is unlicensed and, as a consequence, congestion is considerable. Several current
studies of this band focus on Cognitive Users operating in it. This thesis, however, studies the
impact of Primary Users on the TCP available bandwidth in a classic IEEE 802.11g WLAN. The second
part of the dissertation goes a step further: a tool based on a genetic algorithm has been
developed to forecast the TCP available bandwidth in a WLAN. The tool estimates the future
available bandwidth by finding the function that best fits the future behaviour of the network; a
genetic algorithm programmed specifically for this purpose finds this function. A significant
number of tests have been carried out in the study. The TCP available bandwidth study shows a
relation between MAC busy time and available bandwidth in some cases. In addition, it shows that
the TCP available bandwidth increases when the idle periods are longer. The forecasting achieves
reliable results, with limitations in some specific scenarios.
Contents
CHAPTER 1 INTRODUCTION
1.1 PROJECT OVERVIEW
1.2 MOTIVATION
1.3 OBJECTIVES
1.4 METHODOLOGY
1.5 RESULTS
1.6 ORGANIZATION OF THE DISSERTATION
CHAPTER 2 BACKGROUND AND RELATED WORK
2.1 INTRODUCTION
2.2 BACKGROUND
2.2.1 Wireless network and protocols
2.2.1.1 IEEE 802.11g
2.2.1.1.1 IEEE 802.11 MAC protocol
2.2.1.1.2 Throughput at MAC layer
2.2.1.1.3 Physical layer
2.2.1.2 Transmission Control Protocol
2.2.1.2.1 Throughput TCP
2.2.1.2.2 TCP Retransmission TimeOut
2.2.2 Genetic Algorithm
2.2.2.1 Concept of genetic algorithm
2.2.2.2 Elements and biological translation
2.2.2.2.1 Chromosome
2.2.2.2.2 Genes
2.2.2.2.3 Genotype and phenotype
2.2.2.2.4 Generation
2.2.2.2.5 Population
2.2.2.2.6 Summary of GA vocabulary
2.2.2.3 Genetic algorithm structure
2.2.2.4 Operations of the genetic algorithm
2.2.2.4.1 Initial population
2.2.2.4.2 Fitness Function
2.2.2.4.3 Selection
2.2.2.4.4 Reproduction
2.2.3 Time series and forecasting
2.2.3.1 Time series
2.2.3.2 Phase space and attractor construction
2.2.3.2.1 Time delay
2.2.3.2.2 Embedding dimension
2.2.3.3 Forecasting
2.2.3.4 Forecasting methods
2.2.4 Network Simulator NS2
2.2.4.1 Architecture of NS2
2.2.4.2 PHY layer and MAC layer parameters
2.2.4.3 NS2-CRAHN implementation
2.2.4.3.1 Modifications of the MAC layer
2.2.4.3.2 Primary Users Activity block
2.2.4.3.3 Spectrum Manager
2.3 RELATED WORK
2.3.1 Traffic modelling and forecasting using genetic algorithms for next-generation cognitive radio applications
2.3.2 End-to-end Protocols for Cognitive Radio Ad Hoc Networks: An Evaluation Study
2.4 SUMMARY
CHAPTER 3 DESIGN AND IMPLEMENTATION
3.1 INTRODUCTION
3.2 DESIGN OF THE WIRELESS SCENARIO
3.2.1 Scenario 1: Primary users
3.2.2 Scenario 2: Secondary users with real traffic inside the sensing area
3.2.3 Scenario 3: Secondary users with real traffic outside the sensing area and only one affected
3.3 SIMULATION IMPLEMENTATION IN NS2
3.3.1 General simulation outline
3.3.2 Configuration of NS2 for IEEE 802.11g
3.3.3 MAC implementation in NS2
3.3.3.1 Implementation for Primary Users
3.3.3.1.1 PU spectrum management
3.3.3.1.2 PU activity detection
3.3.3.1.3 PU activity pattern and busy time
3.3.3.2 MAC busy time measurement
3.3.4 No Ad-Hoc Routing Agent (NOAH)
3.3.5 Acquisition and playback of real wireless traffic
3.3.5.1 Acquisition and conversion of traffic trace
3.3.5.2 Playback of real traffic in NS2
3.3.6 Data analysis and graphic representation
3.3.7 Implementation of the wireless scenario in NS2
3.4 GENETIC ALGORITHM IMPLEMENTATION
3.4.1 General GA outline
3.4.2 Encoding of the chromosome
3.4.3 Definition of the fitness function
3.4.4 Generation of N random chromosomes
3.4.5 Calculate the fitness by means of the fitness function for each one of the chromosomes
3.4.5.1 Generation of the chromosome set
3.4.5.2 Calculation of the chromosome phenotype
3.4.5.3 Restrictions in the calculation
3.4.5.4 Fitness function
3.4.6 Elitism process
3.4.7 Mating pool
3.4.8 Selection process
3.4.8.1 Rank-based roulette wheel selection
3.4.8.2 Exponential ranking wheel selection
3.4.8.3 Example of effects on the selection method
3.4.9 Crossover operator
3.4.10 Mutation operator
3.4.11 New population
3.4.12 Stopping criteria and error evaluation
3.4.13 Prediction
3.4.14 Interface and parameters
3.5 SUMMARY
CHAPTER 4 EVALUATION
4.1 INTRODUCTION
4.2 EVALUATION OF AVAILABLE TCP BANDWIDTH FOR DIFFERENT ON/OFF PU ACTIVITY PATTERNS
4.2.1 Deterministic patterns
4.2.1.1 50 percent fixed ON-OFF rate
4.2.1.2 25 percent fixed ON rate
4.2.1.3 Fixed alpha and different beta values
4.2.1.4 Wide range of alpha and beta values
4.2.1.4.1 Available bandwidth
4.2.1.4.2 MAC busy time
4.2.1.4.3 Available bandwidth vs MAC busy time
4.2.2 Randomly generated patterns
4.2.2.1 50 percent fixed ON-OFF rate
4.2.2.2 25 percent fixed ON
4.2.2.3 Fixed alpha and different beta values
4.2.2.4 Wide range of alpha and beta values
4.2.2.4.1 Available bandwidth
4.2.2.4.2 MAC busy time
4.2.2.4.3 Available bandwidth vs MAC busy time
4.2.2.5 Comparison between TCP and UDP throughput
4.2.3 Available TCP throughput over time, Congestion window, RTO multiplicative factor and Smoothed RTT analysis
4.3 EVALUATION OF AVAILABLE TCP BANDWIDTH WITH REAL TRAFFIC SECONDARY USERS
4.3.1 Sender inside the sensing area
4.3.2 Sender outside the sensing area
4.4 GENETIC ALGORITHM EVALUATION AND TESTS
4.4.1 Periodic and symmetric function evaluation
4.4.1.1 Evaluation of the fitness function
4.4.1.1.1 Evaluation of the fitness function for same initial population
4.4.1.1.2 Evaluation of the fitness function for random initial population
4.4.1.2 Evaluation of the selection method and diversity for the cosine function
4.4.1.3 Example of prediction with symmetric and periodical function
4.4.2 Non-random traffic with ON-OFF pattern evaluation
4.4.2.1 Evaluation of the fitness function
4.4.2.1.1 Evaluation of the fitness function for non-random and same initial population
4.4.2.1.2 Evaluation of the fitness function for non-random and random initial population
4.4.2.2 Evaluation of the selection method for the non-random traffic
4.4.2.3 Example of one of the best predictions
4.4.3 Random traffic with ON-OFF pattern evaluation
4.4.3.1 Evaluation of the fitness function
4.4.3.1.1 Evaluation of the fitness function for random pattern and same initial population
4.4.3.1.2 Evaluation of the fitness function for random pattern and random initial population
4.4.3.2 Evaluation of the selection method for the random ON-OFF pattern traffic
4.4.3.3 Example of one of the best predictions
4.4.4 Real traffic from NS2 evaluation
4.4.4.1 Evaluation of the fitness function
4.4.4.1.1 Evaluation of the fitness function for real traffic and same initial population
4.4.4.1.2 Evaluation of the fitness function for real traffic and random initial population
4.4.4.2 Evaluation of the selection method and diversity for the real traffic
4.4.4.3 Example of one of the best predictions
4.4.5 Limitation
4.5 SUMMARY
CHAPTER 5 CONCLUSIONS
5.1 FINAL CONCLUSIONS
5.1.1 Conclusions
5.1.2 Project Evaluation
5.1.3 Problems found during the project
5.2 FUTURE WORK
REFERENCES
SCENARIO IMPLEMENTATION IN NS2
PHYSICAL AND MAC LAYER PARAMETERS FOR 802.11
TESTS OF TCP THROUGHPUT OVER TIME AND OTHER PARAMETERS FOR DIFFERENT PU ACTIVITY ON-OFF PATTERNS
A. 50% ON TIME TCP AVAILABLE THROUGHPUT RESULTS FOR DIFFERENT VALUES
B. TCP AVAILABLE THROUGHPUT OVER TIME, CONGESTION WINDOW, CURRENT RTO MULTIPLICATIVE FACTOR, SMOOTHED RTT FACTOR AND SLOW-START THRESHOLD
C. ANALYSIS OF A WIDE RANGE OF ALPHA AND BETA VALUES AND RANDOMLY GENERATED PATTERNS
AVERAGE ON TESTS RESULTS
GENERAL OVERVIEW OF GA RESULTS
List of Figures
Figure 1: General diagram of the thesis work ................................................................................. 5
Figure 2: MAC layer throughput Max rate vs throughput [18] .................................................... 10
Figure 3: Evolution of TCP's congestion window [23] ................................................................ 12
Figure 4: Example phenotype and genotype ................................................................................. 15
Figure 5: Genetic algorithm structure [37] ................................................................................... 17
Figure 6: One point crossover example ........................................................................................ 19
Figure 7: Mutation example .......................................................................................................... 20
Figure 8: From parents to offspring process ................................................................................. 20
Figure 9: Example of Lorenz chaotic attractor [53] ...................................................................... 21
Figure 10: Binding between C++ and OTcl .................................................................................. 26
Figure 11: NS2-CRAHN schema [4] ............................................................................................ 28
Figure 12: PU-log file first part .................................................................................................... 29
Figure 13: ON-OFF model............................................................................................................ 29
Figure 14: Example of ON-OFF model distribution in time ........................................................ 29
Figure 15: ON-OFF distribution programmed in NS2-CRAHN .................................................. 30
Figure 16: TCP throughput for different alpha beta combinations for CRAHN [4] .................... 33
Figure 17: Map of Scenario 1 ....................................................................................................... 37
xvi
Figure 18: Map of Scenario 2 ....................................................................................................... 37
Figure 19: Map of Scenario 4 ....................................................................................................... 38
Figure 20: Simulation overview diagram ..................................................................................... 40
Figure 21: Primary Users flow chart and location ........................................................................ 44
Figure 22: MAC busy time ........................................................................................................... 46
Figure 23: GA general outline ...................................................................................................... 51
Figure 24: Encoded space and solutions space ............................................................................. 52
Figure 25: Shifted window and training set .................................................................................. 54
Figure 26: Chromosome generator function ................................................................................. 58
Figure 27: Fitness calculation ....................................................................................................... 61
Figure 28: Elitism process ............................................................................................................ 62
Figure 29: Mating pool creation ................................................................................................... 63
Figure 30: Effect of the SP on the probability .............................................................................. 65
Figure 31: Effect of C on the probabilities ................................................................................... 66
Figure 32: Ranking vs Exponential............................................................................................... 67
Figure 33: Three selection method example ................................................................................. 68
Figure 34: New population ........................................................................................................... 69
Figure 35: Best chromosome for the prediction ........................................................................... 73
Figure 36: Available bandwidth for 50% PU ON ......................................................................... 78
Figure 37: Available bandwidth for 25% PU ON ......................................................................... 79
Figure 38: Available bandwidth for alpha equal to 0.0768 and different beta values .................. 80
Figure 39: Available bandwidth for alpha equal to 2.8 and different beta values ........................ 80
Figure 40: 3D graphs with the available bandwidth for different alpha/beta combinations and non-
random patterns ............................................................................................................................. 81
Figure 41: 3D graph with the MAC business for different alpha/beta combinations and non-random
patterns .......................................................................................................................................... 83
Figure 42: 3D graph with the MAC business and normalized throughput different alpha/beta
combinations and non-random patterns ........................................................................................ 84
Figure 43: Available bandwidth for 50% ON and random generation ......................................... 85
Figure 44: Available bandwidth for 25% ON and random generation ......................................... 86
Figure 45: Available bandwidth for alpha equal to 2.8 and different beta values with random generation
Figure 46: 3D graphs with the available bandwidth for different alpha/beta combinations and random patterns
Figure 47: 3D graph with the MAC business for different alpha/beta combinations and random patterns
Figure 48: 3D graph with the MAC business and normalized throughput different alpha/beta combinations and random patterns
Figure 49: alpha=1.5 and beta=0.5
Figure 50: alpha=1.5 and beta=0.5
Figure 51: Throughput over time of the real traffic played back
Figure 53: Fitness selection evaluation for the same initial population with periodic and symmetric function
Figure 54: Fitness selection evaluation for random initial population with a periodic and symmetric function
Figure 55: Results for the selection method with and without diversity for a periodic and symmetric function
Figure 52: Cosine function
Figure 56: Throughput from non-random pattern with alpha 2.2 and beta 1.1
Figure 57: Fitness selection evaluation for the same initial population with Non-random ON-OFF pattern
Figure 58: Best and worst prediction equation 2 for the same initial population
Figure 59: Example of Equation 2 comparing the MSE
Figure 60: Fitness selection evaluation for random initial population with non-random ON-OFF pattern
Figure 61: Random initial population for non-random alpha 2.2 and beta 1.1 with MAPE less than 100%
Figure 62: Results for the selection method with and without diversity for non-random ON-OFF pattern
Figure 63: Throughput from a random pattern with alpha 2.2 and beta 0.08
Figure 64: Training set and prediction zone for random traffic with alpha 2.2 and beta 0.08
Figure 65: Fitness selection evaluation for the same initial population with Random ON-OFF pattern
Figure 66: Fitness selection evaluation for the same initial population with Random ON-OFF pattern (with errors criterion)
Figure 67: Fitness selection evaluation for random initial population with random ON-OFF pattern
Figure 68: Results for the selection method with and without diversity for random ON-OFF pattern
Figure 69: Throughput from the real traffic simulated in NS2
Figure 70: Training set area and prediction area for real traffic with scaled values
Figure 71: Fitness selection evaluation for the same initial population with real traffic
Figure 72: Fitness selection evaluation for random initial population with real traffic
Figure 73: Results for the selection method with and without diversity for real traffic
Figure 74: Random activity pattern for an alpha 2.6 and beta 1.3
Figure 75: Graphs of TCP throughput over time for 50% PU ON
Figure 76: TCP throughput heat-map
Figure 77: UDP throughput heat-map
List of Tables
Table 1: MAC layer throughput [18]
Table 2: Genetic Algorithm vocabulary [36, p. 7]
Table 3: Example of reconstructed vector
Table 4: Example training set window
Table 5: Chromosome set for a two chromosome population and a training set of ten
Table 6: Genotype and phenotype of the chromosome set
Table 7: Main parameters
Table 8: Set of graphs for the user
Table 9: Prediction and statistical results
Table 10: Test results for Scenario 2
Table 11: Test results for Scenario 3
Table 12: Nomenclature of the fitness equations
Table 13: Nomenclature of the selection method
Table 15: GA set-up for the cosine function without diversity and same initial population
Table 14: Example 1 for a cosine function
Table 18: Parameters GA for non-random using same initial population
Table 19: Best and worst result for equation 2 with the same initial population
Table 20: Worst prediction and another chromosome for tests with equation 2
Table 22: Best and worst results from random and same initial population
Table 24: Best prediction example for a non-random traffic with ON-OFF pattern
Table 25: GA set-up for the random traffic alpha 2.2 and beta 0.08 with the same initial population
Table 28: Best prediction example for a random traffic with ON-OFF pattern
Table 29: GA set-up for the real traffic scenario with the same initial population
Table 31: Best and worst results from random and same initial population with real traffic
Table 33: Best prediction example for a real traffic played back in NS2
Table 34: Frame parameters of 802.11g [15]
Table 35: Timing parameters of IEEE 802.11g standard [15]
Table 36: Throughput over time, Congestion window, Current RTO multiplicative factor and Slow-start threshold for alpha 1.5 and beta 0.5
Table 37: Throughput over time, Congestion window, Current RTO multiplicative factor and Slow-start threshold for alpha 0.5 and beta 0.1
Table 38: Throughput over time, Congestion window, Current RTO multiplicative factor and Slow-start threshold for alpha 1.5 and beta 0.1
Table 39: Results of equation 2 with same initial population with non-random alpha 2.2 and beta 1.1
Table 40: Average of equation 2 resulting of run the GA 50 times
Table 41: General fitness functions results
Table 42: General results for the diversity and selection method
Table 43: MAPE best and worst results
Chapter 1 Introduction
1.1 Project overview
The available bandwidth at the Transmission Control Protocol (TCP) layer is a very important topic
in Wireless Local Area Networks (WLAN). This thesis locates the study of the available bandwidth in
the 2.4 GHz band, also called the Industrial, Scientific and Medical (ISM) band [1]. This band is
unlicensed and, as a consequence, the congestion is considerable. In addition to the WLAN users
(also called Secondary Users), Primary or Licensed Users (PUs) can use the 2.4 GHz frequency band.
PUs have preference to operate in the band and use the medium without taking the Secondary Users
in the network into account. Their transmissions interfere with the Secondary Users, leading to the
loss of all their data. Nowadays, several studies are related to Cognitive Users [2] operating in this
band [3] [4] [5] [6]. However, this thesis studies the impact of these Primary Users on the TCP
available bandwidth in a classic IEEE 802.11g WLAN. IEEE 802.11 does not define any
Primary-Secondary user management. IEEE 802.11 uses Carrier Sense Multiple Access with Collision
Avoidance (CSMA/CA), which gives all users an equal opportunity to access the network. However,
Primary Users do not necessarily have to implement a similar strategy. Therefore, this project
considers that if a Primary User is active in the wireless medium, all the Secondary Users in the
same frequency band and inside the transmission range of the Primary User will lose all the sent
packets due to interference. The activity of the Primary Users is defined by means of ON-OFF
patterns governed by a birth-death Markov process. These patterns are completely defined by only
two parameters (alpha and beta), which offers the advantage of covering a wide range of cases in an
easily controlled way. A classic situation of a WLAN with only Secondary Users, with and without
the “hidden node problem”, is also studied. Some interesting conclusions have been drawn.
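As a rough illustration of such an ON-OFF pattern, the following Python sketch draws exponentially distributed OFF and ON durations from a two-state Markov process. It assumes that alpha is the rate of leaving the OFF state and beta the rate of leaving the ON state. This mapping, the function names and the seed are assumptions made for illustration only and are not taken from the thesis implementation.

```python
import random

def on_off_pattern(alpha, beta, duration, seed=0):
    """Return a list of (start, end, state) intervals covering [0, duration].

    Assumption (not from the thesis): alpha is the rate of leaving the OFF
    state and beta is the rate of leaving the ON state, so sojourn times are
    exponential with means 1/alpha and 1/beta.
    """
    rng = random.Random(seed)
    t, state, intervals = 0.0, "OFF", []
    while t < duration:
        rate = alpha if state == "OFF" else beta
        end = min(t + rng.expovariate(rate), duration)
        intervals.append((t, end, state))
        t, state = end, ("ON" if state == "OFF" else "OFF")
    return intervals

def on_fraction(intervals):
    """Fraction of the total time during which the PU is ON."""
    total = intervals[-1][1] - intervals[0][0]
    on_time = sum(end - start for start, end, s in intervals if s == "ON")
    return on_time / total

pattern = on_off_pattern(alpha=1.0, beta=1.0, duration=1000.0)
print(round(on_fraction(pattern), 2))
```

Under this assumed mapping, equal rates give a PU that is ON roughly half of the time, and the ratio between the two rates controls the ON share.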
The second part of the thesis takes a step forward: a tool to forecast the TCP available
bandwidth for WLAN by means of a genetic algorithm (GA) has been developed. This tool is able
to estimate the future available bandwidth by finding the function that best fits the future
behaviour of the network. A genetic algorithm that has been specifically programmed for this
purpose finds this function. This GA has been tested in different scenarios simulated in NS2.
Furthermore, different GA functionalities such as fitness equations, selection methods and
diversity methods are studied. These functionalities are tested and evaluated for the proposed
scenarios. Figure 1 depicts a very general outline of the most important work process of this
thesis.
Figure 1: General diagram of the thesis work. Diagram blocks: NS2 simulation set-up; PU activity
pattern; simulation of TCP available throughput; NS2 traffic trace; TCP available throughput and
MAC busy time calculation; genetic algorithm set-up; find the best function for forecasting;
forecasting of TCP available throughput.
1.2 Motivation
On the one hand, the available bandwidth in a bandwidth-limited medium such as the wireless
medium is a much-demanded topic of study. On the other hand, the Transmission Control Protocol
(TCP) is one of the most used transport protocols on the Internet. Together they constitute the
most typical scenario in Wireless Local Area Networks (WLAN). This thesis goes a step beyond the
typical WLAN situation by locating the study in a very congested frequency band where
Primary Users (PU) can be present, modifying the behaviour of the WLAN medium in a particular
way. Most of the current studies are related to Cognitive Users operating in this band. However,
this thesis studies the impact of the Primary Users on the TCP available bandwidth in a classic
IEEE 802.11g WLAN. These reasons motivated this team to analyse the performance of
TCP in terms of available bandwidth in these situations and to develop a powerful tool to
forecast the available bandwidth of the WLAN environment in the immediate future.
This study could help in the future to identify which types of PU traffic are more detrimental
to WLAN. In addition, it might be possible to estimate the available bandwidth for a user that
wants to join a network. This might be done by identifying the pattern of the PU activity in the
network and then checking the corresponding available bandwidth (from the presented results)
for that pattern. Furthermore, the study of the effects of the PUs on the Secondary Users could
help to understand the effect of the technologies that nowadays use the ISM band [7], such as
Bluetooth, ZigBee, WiMAX and DECT, among others. The genetic algorithm could help to
forecast the available bandwidth in those situations and to manage this information so as
to take the appropriate decisions according to the specific network requirements.
1.3 Objectives
This project is divided into two main blocks: the study of TCP available bandwidth with a
network simulator and the available bandwidth forecasting using a genetic algorithm. The goal is
to use the results of the former as input and reference for the latter. The main aim is to
study the impact of different ON/OFF patterns of Primary Users (PU) on the available throughput
of Secondary Users (SU). In order to accomplish this goal, an implementation in the NS2 [8]
network simulator should be developed. This is immediately followed by the available throughput
forecasting for different patterns of ON-OFF PU activity using the GA. The purpose is to develop
the full algorithm step by step in MATLAB and to test it in several situations. In addition, the
intention is also to carry out a study on the relation between TCP available bandwidth and MAC
busy time. Another objective is to play back real captured WLAN traffic in the network simulator
and analyse the available bandwidth in different scenarios with only SUs. The last point presented
in this thesis is the test of the reliability of the GA in real situations, with the aim of
evaluating the impact on quality/error for different available bandwidths with real traffic
processed in the simulator.
1.4 Methodology
The methodology is the procedure to be followed during the research and development work
[9]. The Engineering Design Process [10] has been the main method taken into account. This
method first defines the problem. Then, thorough background research on the topic is carried out.
Once the background is properly understood, the requirements are specified and possible
solutions are brainstormed. The next step is to choose the best solution for the problem so that
it can be developed and built. The last step is to test the solution and redesign it if required.
Within this process, it is usual to jump from one step to another at any time if something to be
improved or modified is found.
The research work is mainly based on the analysis of scientific research publications, books,
science magazines and online sources.
1.5 Results
This section briefly presents the results obtained in this thesis. A large number of available
bandwidth results for different combinations of PU activity patterns have been obtained. The
available bandwidth tends to grow slightly as the parameters alpha and beta grow (alpha and beta
are the parameters that define the pattern behaviour), and the retransmission time has a
fundamental impact on the available bandwidth. Randomness in the PU activity makes predictions
very unreliable. Regarding the MAC busy time (the time that the Media Access Control layer is
busy), there seems to be a relationship between the TCP available bandwidth and the MAC busy time.
In a scenario with only Secondary Users and real traffic, the users share the bandwidth if they can
sense each other. This is the opposite of a situation with the “hidden node problem”, where the
hidden node harms the available bandwidth. The results for the available bandwidth
with PU activity can also be applied to the “hidden node problem” because the response is similar,
although not identical, since the hidden node is affected by the acknowledgements transmitted
by the node that is suffering the interference.
The implementation of the GA is successful and its correct functioning has been proved. However,
some limitations have been encountered in the GA for the proposed scenario. Regarding the
available bandwidth for ON-OFF PU activity patterns tested with the GA, even though the
behaviour of the available bandwidth is chaotic and its forecasting is very complicated, acceptable
results have been obtained.
For further details, please refer to Chapter 4 and Chapter 5.
1.6 Organization of the dissertation
This dissertation is structured in a progressive way. First, the basic concepts are
presented in order to give the reader the knowledge necessary to understand the subsequent
chapters. All the terminology is explained when it first appears in the text. The thesis is
divided into the following parts:
• Introduction
The first part of the dissertation introduces the reason why the topic of the thesis was selected, the
main goals to be achieved, some guidelines about methodology and an outline of the results of the
study.
• Background and Related work
The aim of this chapter is to describe the basic knowledge about the developed topics in this thesis
so as to understand better the implementation presented in the next chapter.
• Design and Implementation
In this chapter, the whole design and implementation of the scenario developed in this thesis is
explained. This part covers all the specific work done to obtain software able to run the
tests that are evaluated in the following chapter.
• Evaluation
In this part, the results and evaluation of several tests carried out within the developed
implementation in this thesis are presented.
• Conclusions
Finally, the dissertation draws the conclusions resulting from the implementation and results
of the whole project.
Chapter 2 Background and Related work
This chapter describes the background knowledge considered necessary to properly
understand the implementation and the project scope of this master thesis. Some
of the related work that is relevant for this thesis is also pointed out.
2.1 Introduction
This chapter is divided into two parts: the background study and the related work. The former
is broken down again into four main parts. Firstly, general previous information regarding
standards and communications protocols is explained. Secondly, basic concepts about genetic
algorithms that may be helpful in understanding the project are presented. Thirdly, the time series
and forecasting are tackled and finally, some features and basic knowledge regarding the network
simulator used in this thesis are described. The second part of the chapter analyses some previous
studies that can be used as a reference for this thesis.
2.2 Background
This section describes the basic knowledge about the tools developed in this thesis, which
helps to better understand the chapters concerning the implementation and evaluation.
2.2.1 Wireless network and protocols
In this part, the network standards and protocols that apply in this thesis implementation are
explained, including IEEE 802.11g wireless local area network standard and the transmission
control protocol.
2.2.1.1 IEEE 802.11g
IEEE 802.11 is a Wireless Local Area Network (WLAN) standard developed by the Institute
of Electrical and Electronics Engineers (IEEE) and published in 1997 [11]. IEEE 802.11 is a
standard containing a set of Media Access Control (MAC) and physical layer specifications for
the implementation of Wireless Local Area Networks (WLAN) [12] in the 2.4, 3.6, 5 and 60 GHz
frequency bands.
IEEE 802.11g [13] is an advanced version of 802.11 that supports data rates per stream of 6, 9, 12,
18, 24, 36, 48, 54 Mbps. IEEE 802.11g employs a transmission scheme based on Orthogonal
Frequency-Division Multiplexing (OFDM) in the 2.4 GHz frequency band, also called Industrial,
Scientific and Medical (ISM) band.
2.2.1.1.1 IEEE 802.11 MAC protocol
In WLAN, the MAC protocol (the protocol used to manage the MAC layer) is what primarily
determines how efficiently the bandwidth of the wireless channel is shared [14]. The IEEE 802.11
standard defines two access methods: the Distributed Coordination Function (DCF), which provides
distributed, asynchronous, contention-based access; and the Point Coordination
Function (PCF), which provides centralized access without contention. As described in [15], the
DCF method uses Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). With this
method, before delivering any data packet, the station senses the medium for a DCF Interframe
Space (DIFS) [16] to detect whether other transmissions are ongoing. If the medium is sensed as
free for a DIFS duration, the transmitting station keeps sensing the medium for a time
corresponding to a random multiple of the slot time, from 0 to the so-called Contention Window
(CW) minus 1. This time is called the back-off interval and it is used to minimise the possibility
of collisions. If the medium becomes busy during the back-off interval, the time counter stops
until the channel is idle again for more than a DIFS duration.
The contention window size is not a fixed value. It changes following an exponential law:
the CW is set to a specified minimum value, noted as CWmin, for the first transmission attempt,
and it is doubled after every failed transmission up to the maximum contention window,
noted as CWmax. When the back-off interval ends, the
transmitter can either transmit a data packet or send a Request to Send (RTS), which is answered
with a Clear to Send (CTS) if the receiver is available. The RTS/CTS handshake is an optional
mechanism whose aim is to avoid problems with interference and hidden nodes [17]. Once the
packet is received at the destination, a MAC layer acknowledgement (ACK) is transmitted after a
Short Inter-frame Space (SIFS).
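The contention window behaviour can be sketched in a few lines of Python. This is only a minimal illustration of the doubling rule and of the random back-off draw. The CW_MIN and CW_MAX values and the function names are illustrative assumptions, not values taken from the standard or from the thesis simulations.

```python
import random

CW_MIN = 16    # illustrative minimum contention window (assumption)
CW_MAX = 1024  # illustrative maximum contention window (assumption)

def next_cw(cw, success):
    """Reset CW after a success, double it (capped at CW_MAX) after a failure."""
    return CW_MIN if success else min(2 * cw, CW_MAX)

def backoff_slots(cw, rng):
    """Draw the back-off as a random number of slots between 0 and CW - 1."""
    return rng.randrange(cw)

rng = random.Random(1)
cw = CW_MIN
history = []
for success in [False, False, False, True, False]:
    history.append((cw, backoff_slots(cw, rng)))
    cw = next_cw(cw, success)
print([c for c, _ in history])  # [16, 32, 64, 128, 16]
```

Three consecutive failures take the window from 16 to 128 slots, and the first success brings it back to the minimum.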
2.2.1.1.2 Throughput at MAC layer
Figure 2 shows the approximate throughput at the MAC layer for different
configurations of the maximum WLAN data rate per stream. For a maximum WLAN data rate of
54 Mbps, the expected throughput is around 23 Mbps. As the data in Table 1 show, the throughput
never reaches the maximum available data rate of the link. Several factors affect the
throughput, for example: the ACK, which is always sent at a lower data rate; the Physical
Layer Convergence Protocol (PLCP) preamble, which is also sent at a lower data rate; overhead;
transmission range; and interference.
Max rate [Mbps]    Throughput [Mbps]
6                  5.12
9                  7.23
12                 9.1
18                 12.28
24                 14.98
36                 18.86
48                 22.13
54                 23.31
Table 1: MAC layer throughput [18]
Figure 2: MAC layer throughput, max rate vs throughput [18]
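To make the gap between nominal rate and MAC layer throughput concrete, the short Python sketch below computes the ratio between the two for every row of Table 1. It assumes that the throughput column is expressed in Mbps, which is consistent with the 23 Mbps quoted in the text for the 54 Mbps rate. The dictionary and function names are illustrative.

```python
# Values copied from Table 1: maximum data rate -> MAC layer throughput (Mbps).
mac_throughput = {6: 5.12, 9: 7.23, 12: 9.1, 18: 12.28,
                  24: 14.98, 36: 18.86, 48: 22.13, 54: 23.31}

def efficiency(rate):
    """Fraction of the nominal data rate that is available at the MAC layer."""
    return mac_throughput[rate] / rate

for rate in sorted(mac_throughput):
    print(f"{rate:2d} Mbps -> {efficiency(rate):.0%}")
```

The ratio falls from about 85% at 6 Mbps to about 43% at 54 Mbps, which matches the remark that fixed-rate elements such as the ACK and the PLCP preamble weigh more heavily at higher data rates.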
2.2.1.1.3 Physical layer
The PHY [19] is in turn divided into two sub-layers (as is the MAC layer), called the Physical
Layer Convergence Protocol (PLCP) and the Physical Medium Dependent (PMD). On the one hand,
the PLCP layer is in charge of the Carrier Sense (CS) part of the CSMA/CA protocol described in
2.2.1.1.1. On the other hand, the PMD is the layer in charge of the modulation scheme management
and signal encoding.
2.2.1.2 Transmission Control Protocol
Transmission Control Protocol (TCP) is defined in RFC 793 [20] [21]. TCP is defined as “a
reliable connection-oriented delivery service”. According to the RFC, TCP was defined with the
aim of providing a very reliable host-to-host protocol for packet-switched computer
communication networks. The term connection-oriented means that a connection must be
established before hosts initiate the data transmissions. This connection is established by means of
the so-called three-way handshake. TCP splits the data into segments, and a segment that is not
received properly is retransmitted. Reliability
in TCP is achieved by assigning a sequence number to each segment to be transmitted. In order to
confirm that a segment is received free of errors, TCP sends an ACK for every received
segment. TCP also preserves the order of the data.
If an ACK is not received on time -before the retransmission timeout (RTO)-, the data are
retransmitted. RTO is based on the estimated round-trip time (RTT) between the sender and
receiver, as well as the variance in this round-trip time. This is explained in detail in 2.2.1.2.2.
TCP defines a sliding window protocol [22] in which the receiver advertises a receive window
(the receiver's advertised window) to avoid exceeding the data processing capacity of the receiver.
Each TCP segment contains the current value of the receiver's advertised window. This is
useful because the sender can send “bursts” of up to the advertised window size
without waiting for the ACK of each packet sent in the burst.
Additionally, a congestion window [23] is defined at the transmission side to avoid exceeding the
capacity of the network. TCP uses a mechanism called slow start to increase the congestion
window quickly after a connection is initialized and after a timeout event. It starts with a
window of two times the maximum segment size (MSS). The MSS, which is defined by the TCP
protocol, is the largest amount of data that can be sent in a segment without fragmentation.
The congestion window doubles every RTT until it reaches a threshold, as depicted in Figure 3.
While the congestion window is below the threshold, it grows exponentially; once it is above
the threshold, it grows linearly (1 MSS each RTT). Whenever a timeout
occurs, the threshold is set to one half of the current congestion window and the congestion
window is then set to one MSS.
Figure 3: Evolution of TCP's congestion window [23]
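The window evolution described above can be reproduced with a simplified Python model. It works in whole MSS units, one step per RTT, and follows the rules stated in the text. The function name, the cap of the doubling step at the threshold and the floor of 2 MSS on the halved threshold are simplifying assumptions of this sketch, not details taken from the thesis or from NS2.

```python
def evolve_cwnd(rtts, ssthresh, timeouts=()):
    """Return the congestion window (in MSS units) at the start of each RTT.

    Simplified model of the text: the window starts at 2 MSS, doubles per RTT
    below the threshold, grows by 1 MSS per RTT above it, and a timeout halves
    the threshold and resets the window to 1 MSS.
    """
    cwnd, trace = 2, []
    for rtt in range(rtts):
        trace.append(cwnd)
        if rtt in timeouts:
            ssthresh = max(cwnd // 2, 2)    # floor of 2 MSS is an assumption
            cwnd = 1
        elif cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)  # slow start, capped at the threshold
        else:
            cwnd += 1                       # linear growth above the threshold
    return trace

print(evolve_cwnd(rtts=12, ssthresh=16, timeouts={6}))
# [2, 4, 8, 16, 17, 18, 19, 1, 2, 4, 8, 9]
```

The trace shows the exponential phase up to the threshold of 16 MSS, the linear phase, and the collapse to 1 MSS with a new threshold of 9 MSS after the timeout in the seventh RTT.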
2.2.1.2.1 Throughput TCP
The congestion window and the receiver window (receiver’s advertised window) limit the
TCP throughput. RFC 6349, Section 3.3 [24], defines how the TCP throughput must be
calculated. Equation 2.1 defines the calculation method.
$$\text{TCP throughput} = \frac{\text{TCP RWND} \times 8}{\text{RTT}} \qquad (2.1)$$
Where:
- TCP RWND is Receive Window size or receiver's advertised window
- RTT is the Round-Trip Time
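As a worked example, the calculation of RFC 6349 divides the receive window, converted from bytes to bits, by the round-trip time. The Python sketch below applies it to hypothetical values (a 65535-byte window and a 20 ms RTT), which are chosen only for illustration and do not come from the thesis results.

```python
def tcp_throughput_bps(rwnd_bytes, rtt_seconds):
    """Equation 2.1: window-limited TCP throughput in bits per second."""
    return rwnd_bytes * 8 / rtt_seconds

# Hypothetical values: a 65535-byte window and a 20 ms round-trip time.
print(tcp_throughput_bps(65535, 0.020) / 1e6)  # throughput in Mbps
```

With these values the window-limited throughput is about 26.2 Mbps, and halving the RTT doubles it.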
2.2.1.2.2 TCP Retransmission TimeOut
As described in [25, pp. 199-200], TCP sends back an acknowledgement for every received
packet. If the acknowledgement is not received before a given Retransmission TimeOut (RTO),
the TCP sender assumes that the packet is lost and retransmits it.
The computation of the current RTO value is a trade-off between a small value, which
implies many unnecessary retransmissions, and a high value, which results in a high latency in
packet loss detection. RTO is a function of the Round Trip Time (RTT). The RTT value is not the
same for all packets, so the Smoothed RTT and the RTT variation [26] are calculated from several
RTT samples. RTO depends on the instantaneous RTT, the Smoothed RTT, the RTT variation and the
Binary Exponential Back-off (BEB). The BEB, or RTO back-off multiplicative factor, is set to 1 at
the beginning and doubled with every timeout until an ACK is received. As a result, after several
consecutive timeouts the RTO grows to a considerable value.
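The interaction between the smoothed RTT, the RTT variation and the back-off factor can be sketched as follows. The estimator below follows the formulas of RFC 6298, which is assumed here as the reference. The simulator used in the thesis may compute the RTO with different constants or granularity, and the class and method names are illustrative.

```python
class RtoEstimator:
    """RTO estimator following RFC 6298 (assumed reference for this sketch)."""
    ALPHA, BETA, K = 1 / 8, 1 / 4, 4
    MIN_RTO = 1.0  # seconds, lower bound recommended by RFC 6298

    def __init__(self):
        self.srtt = None    # smoothed RTT
        self.rttvar = None  # RTT variation
        self.backoff = 1    # RTO back-off multiplicative factor

    def on_rtt_sample(self, rtt):
        if self.srtt is None:
            # First measurement initialises both estimators.
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.backoff = 1    # an ACK arrived, so the back-off is reset

    def on_timeout(self):
        self.backoff *= 2   # binary exponential back-off

    @property
    def rto(self):
        # Only valid after at least one RTT sample has been recorded.
        base = max(self.srtt + self.K * self.rttvar, self.MIN_RTO)
        return base * self.backoff

est = RtoEstimator()
est.on_rtt_sample(0.5)
print(est.rto)          # 1.5
est.on_timeout()
est.on_timeout()
print(est.rto)          # 6.0
```

Two consecutive timeouts multiply the RTO by four, which illustrates why a sender facing a long interference period can stay silent well after the medium becomes free again.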
2.2.2 Genetic Algorithm
In this section, some definitions and explanations about the genetic algorithms such as the
basic structure, main terminology and the main functions are presented.
2.2.2.1 Concept of genetic algorithm
A genetic algorithm is a stochastic search algorithm that mimics the process of natural
selection and genetics to try to solve complex [27] problems [28]. It is based upon the genetic
processes of living beings. It uses historical data to find new search points for an optimal
solution of the problem, trying to improve the results and converge to the best or expected
value [29]. The search procedure is based on Darwinian theories of natural selection and survival.
According to these theories, populations in nature evolve according to the principles of natural
selection and survival of the fittest [30].
The main feature of this algorithm is its efficiency in exploiting historical information in order
to speculate on new search points with the aim of finding a better point [29]. In addition,
genetic algorithms can be successfully applied to a wide variety of problems from different areas.
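To give an idea of the basic loop, the following Python sketch implements a generic textbook genetic algorithm on a toy problem (maximising the number of 1-bits in a bit string). It is not the MATLAB algorithm developed in this thesis. The fitness function, the tournament selection, the one-point crossover, the mutation rate and all parameter values are assumptions chosen for brevity.

```python
import random

def fitness(chromosome):
    """Toy fitness (assumption): number of 1-bits in the chromosome."""
    return sum(chromosome)

def evolve(pop_size=20, length=16, generations=30, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        new_population = [population[0][:]]  # elitism: keep the best unchanged
        while len(new_population) < pop_size:
            # Tournament selection: the fitter of two random individuals wins.
            mother = max(rng.sample(population, 2), key=fitness)
            father = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, length)   # one-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_population.append(child)
        population = new_population
    return best_per_generation

history = evolve()
print(history[0], history[-1])
```

Because the best individual is always copied into the next generation, the best fitness recorded per generation never decreases. Selection and crossover exploit the information already present in the population, while mutation introduces new search points.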
2.2.2.2 Elements and biological translation
In order to make the terminology used during the implementation and explanation of the
algorithm easier to understand, some biological concepts and the parallelism that exists between
biology and evolutionary algorithms are described in the following sections.
2.2.2.2.1 Chromosome
All organisms are composed of one or more cells. Every cell contains at least one
chromosome (DNA strand) where the genetic information is encoded [31]. This structure carries
hereditary factors or genes. The genetic algorithm uses this structure to encode and store the
solution. Hence, the chromosome is the set of parameters defining a proposed solution to the
problem to be solved.
2.2.2.2.2 Genes
One chromosome can be divided into genes, that is, functional blocks of DNA. Therefore, genes
are the basic units into which a chromosome can be split [32]. A gene is a position or set of positions
in a chromosome. For that reason, a chromosome has as many genes as it has
variable slots. Each one of these genes encodes a particular feature of an individual. The possible
values that a gene can take, drawn from a fixed set of symbols, are known as alleles [33]. The
position that a gene occupies in a chromosome is called its locus.
2.2.2.2.3 Genotype and phenotype
The genotype is the complete set of genes contained in a genome, and thereby the set of inherited
factors in an individual, which may or may not be manifested in that individual. The genotype gives
the encoded solution of the problem to solve [32]. This information is copied at the time of
reproduction and passed from one generation to the next. For this reason, the genotype can only
be determined by biological tests, not by observation as the phenotype can.
The phenotype is the set of parameters represented in a chromosome; in other words, it
contains the required information to create an individual (e.g. eye colour) [29]. The phenotype
gives a decoded solution for the given problem [32].
The adaptation of an individual to the problem depends on the evaluation of the genotype,
which can be inferred from the phenotype (chromosome) using the fitness function.
In order to clarify the concepts of genotype and phenotype, the following examples are
presented:
• Example 1: Given an optimisation problem on integers, a set of integers would form the
set of phenotypes. In this manner, one of these phenotypes, for example the phenotype 27,
would be the genotype 11011. As can be seen in this example, the phenotype space and
genotype space are different because a genotype could evolve giving a phenotype that is
not in the set of integers selected. This is because the evolutionary search is done in the
genotype space. Thus, the optimum solution of a phenotype is obtained by decoding the
genotype [34].
• Example 2: Given a child with haemophilia (a group of hereditary disorders that impair
the body’s ability to control blood coagulation), it could occur that the parents did not suffer
from this disorder themselves, but carried the haemophilia genes. Then,
these parents have the same phenotype but not the same genotype [35].
Genotype:   1 1 0 1 1   -> (evolution of genotype) ->   0 1 0 1 1
Phenotype:  27                                          11
Figure 4: Example phenotype and genotype
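The relation between genotype and phenotype in Example 1 and Figure 4 can be sketched in a few lines, assuming a plain binary encoding with the most significant bit first. The helper names `decode` and `encode` are hypothetical and introduced only for this illustration.

```python
def decode(genotype):
    """Decode a binary genotype (most significant bit first) into an integer phenotype."""
    value = 0
    for bit in genotype:
        value = value * 2 + bit
    return value


def encode(phenotype, length):
    """Encode a non-negative integer phenotype as a binary genotype of fixed length."""
    return [(phenotype >> shift) & 1 for shift in range(length - 1, -1, -1)]


genotype = [1, 1, 0, 1, 1]
evolved = [0, 1, 0, 1, 1]   # the first gene has changed during evolution
print(decode(genotype))     # 27
print(decode(evolved))      # 11
print(encode(27, 5))        # [1, 1, 0, 1, 1]
```

The search operators act on the bit lists (genotype space), while the fitness is normally computed on the decoded integers (phenotype space).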
2.2.2.2.4 Generation
The chromosomes evolve through iterations (called generations) during the progression of the
genetic algorithm. In each iteration, the chromosomes are evaluated using some fitness measure
and are exposed to the genetic operations of selection, crossover and mutation, giving as a result
new chromosomes in each generation. Therefore, the best individuals tend to survive and
reproduce, thereby propagating their genetic material to future generations.
2.2.2.2.5 Population
A population is a set of chromosomes (possible solutions) that remains constant during the
evolution search (generation). As a starting point for the genetic algorithm, an initial population
has to be created.
2.2.2.2.6 Summary of GA vocabulary
All the vocabulary explained above is summarized in Table 2 based on the table given by [36,
p. 7].
Genetic Algorithm    Explanation
Chromosome           Solution
Genes                Part of the solution
Locus                Position of the gene
Allele               Value of the gene
Phenotype            Decoded solution
Genotype             Encoded solution
Population           Set of chromosomes
Generation           Iterations of the GA
Table 2: Genetic Algorithm vocabulary [36, p. 7]
2.2.2.3 Genetic algorithm structure
The general procedure of a genetic algorithm is depicted in Figure 5. First, all parameters,
such as the length of the chromosome, population size, probability of crossover and mutation, etc.,
are set. Afterwards, the initial population is randomly generated. This initial population is
evaluated by the fitness function. Once this is performed, if this population does not meet the
stopping criteria, such as the number of iterations, the time elapsed or the optimisation criteria, the
best chromosomes are selected for reproduction. Then, these chromosomes are subjected to the
crossover process, generating offspring. Finally, the offspring undergo the mutation process, giving
a new population for the next generation. This new population is evaluated again and all the
steps described above (except the parameter setting) are
repeated until the criteria are met.
Figure 5: Genetic algorithm structure [37]
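The loop described above can be sketched as follows. This is only an illustration under stated assumptions: binary chromosomes, fitness-proportionate selection, one-point crossover, bit-flip mutation and a toy fitness (the number of ones in the chromosome). The function `run_ga` and all its parameter names are hypothetical and do not correspond to the tool developed in this thesis.

```python
import random


def run_ga(fitness, chrom_len=16, pop_size=20, p_cross=0.8, p_mut=0.02,
           max_generations=50, target=None, seed=0):
    """Toy genetic algorithm on binary chromosomes (pop_size must be even)."""
    rng = random.Random(seed)
    # Parameter setting is done through the arguments; now create the
    # random initial population.
    population = [[rng.randint(0, 1) for _ in range(chrom_len)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_generations):
        # Evaluation with the fitness function.
        scores = [fitness(c) for c in population]
        if max(scores) > fitness(best):
            best = population[scores.index(max(scores))]
        # Stopping criterion on the optimisation target (if one is given).
        if target is not None and fitness(best) >= target:
            return best
        # Selection: fitter chromosomes are more likely to be chosen.
        # The small offset keeps the weights valid if all scores are zero.
        weights = [s + 1e-9 for s in scores]
        parents = rng.choices(population, weights=weights, k=pop_size)
        # Crossover: pair up the parents and exchange their tails.
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < p_cross:
                point = rng.randint(1, chrom_len - 1)
                a, b = a[:point] + b[point:], b[:point] + a[point:]
            offspring += [a[:], b[:]]
        # Mutation: flip each gene with a small probability.
        for chrom in offspring:
            for locus in range(chrom_len):
                if rng.random() < p_mut:
                    chrom[locus] = 1 - chrom[locus]
        population = offspring
    # Iteration limit reached: also consider the last generation.
    return max(population + [best], key=fitness)


# Toy problem (an assumption): maximise the number of ones.
best = run_ga(sum, target=16)
print(best, sum(best))
```

The stopping criteria here are the iteration limit and an optional fitness target; a time limit could be added in the same place.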
2.2.2.4 Operations of the genetic algorithm
All genetic algorithms have common elements such as the creation of a population of
chromosomes, the selection depending on their adaptation, the crossover to produce a new
generation and finally the random mutation in the new generation.
2.2.2.4.1 Initial population
The first population is obtained by a random process in which the chromosomes are created. The
number of individuals in the population is related to the required computational resources, which
increase as the population grows. A larger population covers more possible solutions and a larger
part of the search space, but it needs more resources and the speed of the algorithm
decreases. Thus, there is a trade-off between efficiency and effectiveness [33].
Depending on the problem to be solved, different methods to encode the solution can be used, for
example binary encoding, value encoding, tree encoding, permutation encoding, etc. [38].
2.2.2.4.2 Fitness Function
The fitness function, also called evaluation function, tests the performance of each
chromosome (potential solution) by measuring how good it is in relation to the current problem
domain [39].
2.2.2.4.3 Selection
This operator selects chromosomes in the population for reproduction. The selection is random
but favours those chromosomes that have a better fitness, so the fittest have a higher
chance of being selected to reproduce. This selection can be done using different techniques, as
in [40], among others roulette-wheel, ranking-based and tournament selection [41, pp. 17-18].
The method used in [27], [42] and [43] is roulette wheel selection.
The roulette wheel selection creates a roulette wheel where all chromosomes of the generation are
placed and in which each chromosome has a proportion of the wheel according to its
fitness. Therefore, a chromosome with a better fitness has a higher probability of being chosen
because it occupies a bigger area of the wheel [44]. With n chromosomes, the wheel is
split into n portions. In order to select the chromosomes, the wheel is spun n times,
selecting the portion (chromosome) where it stops each time. The probability of selection
of an individual j is given by equation 2.2 [44].
$$p_j = \frac{f_j}{\sum_{i=1}^{n} f_i}$$ (2.2)
Where $p_j$ is the probability of selecting chromosome j and $f_j$ is the fitness of
chromosome j.
The drawback of this selection method is that, although it maintains some diversity, it may
converge very quickly [37].
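A minimal sketch of roulette wheel selection is given below, assuming non-negative fitness values with a positive sum. The function names are hypothetical; the selection probability of each chromosome is its fitness divided by the total fitness, as in equation 2.2.

```python
import random


def selection_probabilities(fitnesses):
    """Probability of selecting each chromosome: its fitness over the total fitness."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


def roulette_wheel_select(population, fitnesses, rng):
    """Spin the wheel once and return the chromosome where it stops."""
    spin = rng.random() * sum(fitnesses)
    cumulative = 0.0
    for chromosome, f in zip(population, fitnesses):
        cumulative += f
        if spin < cumulative:
            return chromosome
    return population[-1]  # guard against floating-point round-off


population = ["A", "B", "C", "D"]
fitnesses = [1.0, 2.0, 3.0, 4.0]
print(selection_probabilities(fitnesses))  # [0.1, 0.2, 0.3, 0.4]

# With n chromosomes the wheel is spun n times.
rng = random.Random(1)
selected = [roulette_wheel_select(population, fitnesses, rng) for _ in population]
print(selected)
```

The same chromosome may be selected several times, which is what lets fit individuals dominate the next generation.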
2.2.2.4.4 Reproduction
Reproduction is performed by exchanging the genetic material of two chromosomes using
the crossover operator. Mutation is then applied to the outcome of this operation. The
result of these operations is two new chromosomes with the features of both parents and, where
applicable, a random mutation.
2.2.2.4.4.1 Crossover
The crossover operator is the most important in GA, as it allows the exchange of features from
one generation to the next and thereby the evolution of the species [32]. The main objective is to
get an improvement in the fitness of the offspring.
In the crossover, those chromosomes selected for reproduction are paired up and crossed over.
Crossover produces offspring with a given probability; at its maximum value of 1,
the parents do not survive.
Within this crossover process, there are different techniques to implement it, such as one point
crossover, two point crossover, uniform crossover [45] and arithmetic crossover [46]. The one
point crossover is explained here and a description of the other techniques can be found in [47].
The one point crossover strategy randomly chooses a point along the chromosome (a locus) and
exchanges the genes before and after that point between the two chromosomes to create the new
offspring. One example of this technique is shown in Figure 6.
Parents:    1 1 1 | 1 1 1 0 0 0      1 0 1 | 0 1 0 1 0 1
Offspring:  1 1 1 | 0 1 0 1 0 1      1 0 1 | 1 1 1 0 0 0
Figure 6: One point crossover example
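One point crossover can be sketched as below. The bit strings are those of Figure 6; reading them as two parents crossed after the third gene is an assumption that is consistent with the four strings shown. The function name is hypothetical.

```python
def one_point_crossover(parent_a, parent_b, point):
    """Exchange the genes after `point` between two equally long chromosomes."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


parent_a = [1, 1, 1, 1, 1, 1, 0, 0, 0]
parent_b = [1, 0, 1, 0, 1, 0, 1, 0, 1]
child_a, child_b = one_point_crossover(parent_a, parent_b, point=3)
print(child_a)  # [1, 1, 1, 0, 1, 0, 1, 0, 1]
print(child_b)  # [1, 0, 1, 1, 1, 1, 0, 0, 0]
```

In a complete algorithm the crossover point would be drawn at random for every pair of parents.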
2.2.2.4.4.2 Mutation
Once the crossover process is finished, the mutation procedure is carried out. This process is
performed in order to prevent all solutions in a population from falling into a local optimum of the
fitness value [48]. Moreover, it is beneficial because it makes it possible to search in areas that are
still unexplored. The mutation process involves the modification of some random genes with a certain
probability of mutation [49]. This process is done by changing one gene to another value or
interchanging the values of two genes placed at different loci. After this process, the
chromosomes are evaluated again using the fitness function.
There is a trade-off in the choice of the mutation probability. If this mutation
probability is too high, the search becomes random, as the survival of the fittest
chromosomes for the next generation is not guaranteed. One example of this technique is shown in Figure 7.
Before mutation:  1 2 3 4 5 6 7 8 9
After mutation:   1 2 5 4 3 6 7 8 9
Figure 7: Mutation example
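The mutation of Figure 7, in which the values of two genes are interchanged, can be sketched as follows. The function names and the way the mutation probability is applied (once per chromosome) are assumptions made for this illustration.

```python
import random


def swap_mutation(chromosome, locus_a, locus_b):
    """Return a copy of the chromosome with the genes at two loci interchanged."""
    mutated = list(chromosome)
    mutated[locus_a], mutated[locus_b] = mutated[locus_b], mutated[locus_a]
    return mutated


def random_swap_mutation(chromosome, p_mut, rng):
    """Apply one random swap with probability p_mut, otherwise copy the chromosome."""
    if rng.random() < p_mut:
        locus_a, locus_b = rng.sample(range(len(chromosome)), 2)
        return swap_mutation(chromosome, locus_a, locus_b)
    return list(chromosome)


# Loci are counted from zero, so loci 2 and 4 hold the values 3 and 5.
print(swap_mutation([1, 2, 3, 4, 5, 6, 7, 8, 9], 2, 4))
# [1, 2, 5, 4, 3, 6, 7, 8, 9]
```

A swap keeps the set of values in the chromosome unchanged, which is why it is often preferred for permutation encodings.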
The steps explained so far are illustrated in Figure 8.
Parents -> Selection -> Crossover -> Mutation -> Offspring
Figure 8: From parents to offspring process
2.2.3 Time series and forecasting
This section explains time series and how they are used with the genetic algorithm, in order
to help in understanding how the genetic algorithm is used for forecasting.
2.2.3.1 Time series
A time series is a sequence of data points, $\{x_t\}$ in the discrete-time case and $\{X(t)\}$ in the
continuous-time case, belonging to a system, measured at different instants of time and ordered
chronologically, which can represent physical, financial or other variables. The main objective
of the time series analysis done here is to study the behaviour of the time series in order to try to
predict its future evolution, up to a certain time horizon (also called prediction horizon). Time
series analysis can be divided into linear and nonlinear, and univariate and multivariate [50]. Linear
time series usually follow stable patterns at regular intervals that depend linearly on the time series’
past values, in contrast with nonlinear time series, which display chaotic behaviour [27].
It is possible to characterize the complexity of a system by observing one variable belonging to the
system, as was demonstrated in the theorem proposed by Takens (1981) [51].
2.2.3.2 Phase space and attractor construction
The theorem proposed by Takens states that the statistical properties of an attractor are
conserved in the delay coordinates1; these coordinates are reconstructed from a time series of
one variable belonging to that system.
Thus, to construct the attractor by means of the delay coordinates proposed by Packard et al. [52]
the time delay and the embedding dimension are needed. One example of this can be seen in Figure
9 with the construction of the Lorenz attractor by means of Takens’ theorem.
Figure 9: Example of Lorenz chaotic attractor [53]
1 In the vector space $x = (x_1, x_2, x_3, \dots, x_n)$, $x$ is called a point or vector and $x_1, x_2, x_3, \dots, x_n$ are called
the coordinates of $x$, where n is a natural number called the dimension of the space.
Given the scalar time series $x_1, x_2, x_3, \dots, x_N$, obtained from observations at constant
intervals, it is possible to reconstruct a vector with embedding dimension m in an m-dimensional
space [54] [55] [56] by means of the method of delays as follows:
$$Y_n(m) = [x_n, x_{n+\tau}, \dots, x_{n+(m-1)\tau}], \quad Y_n \in \mathbb{R}^m, \quad n = 1, 2, \dots, N - (m-1)\tau$$ (2.3)
Where $Y_n$ is the reconstructed vector with embedding dimension m, $x_n$ is the observed discrete
value at time n and $\tau$ is the time delay or embedding time. Thus, the m coordinates of each $Y_n$ are
samples from the time series separated by a fixed $\tau$. The result is a series of vectors as
follows:
$$Y = Y_1, Y_2, \dots, Y_{N-(m-1)\tau}$$ (2.4)
Where N is the length of the original series. The purpose is to form an attractor that preserves the
topological properties of the original unknown attractor. Thus, the idea of such a reconstruction is to
capture the original system states in each observation of the system output.
An example applied to the evolution of a stock market price is the following:
Date        Price
1/1/2014    10 000
2/1/2014    10 005
3/1/2014    10 008
4/1/2014    10 003
5/1/2014    10 006
Then, for d=1 (time delay) and m=4, the reconstructed vectors are defined as:
$Y_1$ = {10 003, 10 008, 10 005, 10 000}
$Y_2$ = {10 006, 10 003, 10 008, 10 005}
Table 3: Example of reconstructed vector
[Chart: stock market price per date; vertical axis "Price" from 10 000 to 10 009]
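The method of delays can be sketched in a few lines. The function name `delay_embed` is hypothetical, and the time delay of the stock price example is assumed to be 1. The vectors are built oldest sample first, following the method of delays described above, whereas Table 3 lists the same samples newest first.

```python
def delay_embed(series, m, tau):
    """Return the vectors [x_n, x_(n+tau), ..., x_(n+(m-1)tau)] for every valid n."""
    count = len(series) - (m - 1) * tau   # number of reconstructed vectors
    return [[series[n + k * tau] for k in range(m)] for n in range(count)]


prices = [10000, 10005, 10008, 10003, 10006]
vectors = delay_embed(prices, m=4, tau=1)
print(vectors)
# [[10000, 10005, 10008, 10003], [10005, 10008, 10003, 10006]]

# Same vectors written newest sample first, as in Table 3.
newest_first = [v[::-1] for v in vectors]
print(newest_first)
# [[10003, 10008, 10005, 10000], [10006, 10003, 10008, 10005]]
```

With five samples, m=4 and a delay of 1, only 5 - (4 - 1) * 1 = 2 vectors can be formed, which matches the two vectors of the example.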
Therefore, two parameters have to be computed: the time delay $\tau$ and the
embedding dimension m.
2.2.3.2.1 Time delay
The method used to calculate the delay is important in order to reconstruct the attractor from
a scalar time series and, once it is reconstructed, to be able to estimate the correlation dimension and
evaluate whether the scalar time series is chaotic or stochastic.
There is a trade-off in the choice of $\tau$: it has to be large enough that samples separated by
$\tau$ carry as much new information as possible, and at the same time small enough that they do
not become independent [57].
Different methods and techniques have been proposed and implemented to calculate the
time delay and the embedding dimension, such as those in [55], [56] and [58]. The methods used in
this thesis are the series correlation approaches of autocorrelation and average mutual
information.
These methods are not explained in detail in this thesis, but further information can be found in
[59] and [60]. The implementations used are those of the Nonlinear Time Series Analysis package (TISEAN) [61].
The calculation of the delay time is based on the average mutual information and is determined by
finding the position of its first minimum, represented graphically [62]. Alternatively, as in
[27], this parameter can be set to 1 and compensated for by increasing the embedding dimension, as is
also demonstrated in [63, p. 31].
2.2.3.2.2 Embedding dimension
The other parameter to be found is the embedding dimension, which is the dimension of the space
where the attractor is reconstructed. Different methods and techniques have been proposed and implemented
to calculate the embedding dimension in [56], [64] and [65]. Kennel et al. [64] proposed a method
to calculate the minimum embedding dimension called FNN (False Nearest Neighbours).
This is the method selected for the calculation of m, using the TISEAN tool; further
information about this method can be found in [56], [59] and [64].
2.2.3.3 Forecasting
The univariate analysis is carried out with a single variable with the objective of finding the
dynamic dependence of $x_t$ on its past values $\{x_{t-1}, x_{t-2}, \dots, x_{t-N}\}$ [66]. On the other hand,
with multivariate time analysis, more than one variable is studied and observed at a time [67].
The forecasting of the values is done using time series analysis. The work of Takens (1981),
Casdagli (1989) [68] and others established the methodology for the creation of a dynamic model
from a chaotic time series [42]. According to Takens' theorem, nonlinear chaotic dynamic
systems can be reconstructed from a sequence of observations [27]. This theorem states that, given
a deterministic time series, there exists a function F(·) that satisfies equation 2.5 [42].
$$x_t = F(x_{t-\tau}, x_{t-2\tau}, \dots, x_{t-m\tau})$$ (2.5)
Where $\tau$ is the delay factor and m is the embedding dimension. Therefore, the theorem guarantees
that future values can be forecast considering only past values. The difficulty lies in
finding the function F(·); the genetic algorithm is implemented and used in order to find a
good approximation of this desired function. The forecasts are carried out by deterministic models
directly built from observations of the system evolution [43].
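The search for an approximation of F(·) needs examples that pair past values with the value they should produce. The following sketch builds such pairs, with each value paired with its m delayed predecessors (delays tau, 2·tau, ..., m·tau), and scores a candidate function with the mean squared error. Using the mean squared error as the quality measure and the names `training_pairs` and `mean_squared_error` are assumptions made for this illustration.

```python
def training_pairs(series, m, tau):
    """Pair each x_t with its m delayed predecessors x_(t-tau), ..., x_(t-m*tau)."""
    pairs = []
    for t in range(m * tau, len(series)):
        inputs = [series[t - k * tau] for k in range(1, m + 1)]
        pairs.append((inputs, series[t]))
    return pairs


def mean_squared_error(candidate, pairs):
    """Average squared difference between candidate(inputs) and the observed value."""
    return sum((candidate(inputs) - target) ** 2 for inputs, target in pairs) / len(pairs)


series = [1, 2, 3, 4, 5, 6]
pairs = training_pairs(series, m=2, tau=1)
print(pairs[0])  # ([2, 1], 3)


def linear_candidate(inputs):
    """Candidate F: linear extrapolation from the two most recent values."""
    return 2 * inputs[0] - inputs[1]


print(mean_squared_error(linear_candidate, pairs))  # 0.0
```

A genetic algorithm would evaluate many candidate functions in this way and keep those with the lowest error.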
2.2.3.4 Forecasting methods
Given a univariate time series $[x_t]_{t=1}^{T}$ representing the observations, it is possible to predict
the next n points of this series, i.e. $x_{T+1}, x_{T+2}, \dots, x_{T+n}$, with only some of the previous samples.
The prediction can be performed with different methods: the prediction of just one point,
$x_{t+1}$ (1-step ahead [69]); the prediction of several points at one time (direct strategy); and the
iterative prediction, which uses $x_{t+1}$ as the input for the next prediction until $x_{t+n}$.
The method used in this thesis is the direct multistep-ahead prediction of several points, also
known as independent value prediction in [70] or direct strategy in [71]. To forecast the
next samples, Takens' theorem is applied, trying to find a function that connects the previous
samples, that is, a pattern in the past values that can be used to predict the next ones.
$$x_{T+n} = F(x_T, x_{T-1}, \dots, x_{T-(m-1)})$$ (2.6)
In order to forecast $[x_t]_{t=T+1}^{T+n}$ from a time series $[x_t]_{t=1}^{T}$, a training set is created from the time series
using a shifted window of length $(m-1) \cdot \tau$ [70], where m is the embedding dimension, $\tau$ the
time delay and n the number of samples to predict. Therefore, the prediction horizon is always
fixed when the training set is chosen and vice versa. The training set is necessary in order to find
a nonlinear function of the data set.
On the other hand, the iteration prediction [71], also known as multi-stage prediction [70], consists
in taking the predicted sample as an input for the next forecast until the nth prediction. In this manner,
the first predicted sample is used along with the past samples to predict the next one. Hence, the
samples used to predict are shifted one time unit, adding the newly predicted sample in each iteration.
This is described mathematically by equations 2.7, 2.8 and 2.9 [71].
$$x_{T+1} = F(x_T, x_{T-1}, x_{T-2}, \dots, x_{T-(m-1)})$$ (2.7)
$$x_{T+2} = F(x_{T+1}, x_T, x_{T-1}, \dots, x_{T-(m-2)})$$ (2.8)
$$x_{T+3} = F(x_{T+2}, x_{T+1}, x_T, \dots, x_{T-(m-3)})$$ (2.9)
From the previous equations, the general equation 2.10 can be written, as also described in
[71].
$$x_{T+n} = F(x_{T+(n-1)}, x_{T+(n-2)}, x_{T+(n-3)}, \dots, x_{T-(m-n)})$$ (2.10)
The main problem of this method is that the error accumulates in each iteration, because the
predicted sample included in every iteration carries its error into the
next prediction [70]. In contrast to the iteration prediction, the direct prediction minimises the
associated squared multistep-ahead error. However, the direct strategy requires more
computational resources [71] as more samples n are to be predicted. This is because the
larger the n, the larger the training data needed in order to obtain a good prediction model. This is
due to the considerable absence of samples between T and n. Nonetheless, a better function could
be found using the latter strategy. For this reason, and also following the work done in [42], [27]
and [43], the direct multistep-ahead prediction is used.
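The difference between the two strategies can be sketched as follows. The models used here are simple linear extrapolations chosen only so that the example can be checked by hand; they are assumptions and not the functions found by the genetic algorithm. The function names are hypothetical.

```python
def iterated_forecast(one_step_model, history, m, n):
    """Iteration prediction: every predicted sample is fed back as an input."""
    window = list(history[-m:])            # last m observations, oldest first
    predictions = []
    for _ in range(n):
        newest_first = window[::-1]        # x_T, x_(T-1), ...
        next_value = one_step_model(newest_first)
        predictions.append(next_value)
        window = window[1:] + [next_value]  # shift the window by one time unit
    return predictions


def direct_forecast(models, history, m):
    """Direct strategy: one model per horizon, fed only with observed samples."""
    newest_first = list(history[-m:])[::-1]
    return [model(newest_first) for model in models]


history = [1, 2, 3, 4]


def one_step(x):
    """Toy one-step model: linear extrapolation from the two newest values."""
    return 2 * x[0] - x[1]


# One toy model per horizon h = 1, 2, 3 for the direct strategy.
horizon_models = [lambda x, h=h: x[0] + h * (x[0] - x[1]) for h in (1, 2, 3)]

print(iterated_forecast(one_step, history, m=2, n=3))   # [5, 6, 7]
print(direct_forecast(horizon_models, history, m=2))    # [5, 6, 7]
```

In the iterated version an error in the first prediction would propagate to the following ones, whereas each direct model only ever sees observed samples.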
2.2.4 Network Simulator NS2
The network simulator NS2, as it is defined in [25, p. Preface 7], is an “open-source event-driven
simulator designed specifically for research in computer communications networks”. The
first version of NS, called NS-1 [72], started in 1995 and successive versions of the simulator were
released. Since 1995, the United States Defense Advanced Research Projects Agency (DARPA)
has supported NS, and in 1996-1997 the first version of NS2 was released. The latest version of NS2
is 2.35, released in late 2011. A newer version of the whole simulator, called NS3 [73], has been
available since 2008. NS2 scripts are not compatible with NS3, although models
were imported. Nevertheless, NS2 will continue to be active and maintained by the NS developers,
so its use in research is still valid. NS2 offers [8] a good platform focused on the simulation of
TCP, UDP, routing and multicast protocols over wired and wireless networks.
2.2.4.1 Architecture of NS2
The simulator is invoked by means of the command ns (in the Linux terminal) followed by
the Tcl file name as input argument. The Tcl file is a script containing the set-up of a simulation. The
usual output is a simulation trace file containing all the information about the
transmissions performed in the medium.
Figure 10: Binding between C++ and OTcl
NS2 is based on two languages [25, p. 37]: C++ and OTcl (Object-oriented Tool Command
Language). C++ is used to implement all the models and internal mechanisms of the NS2 network
simulator, while OTcl is used as a user interface to control and set up the simulation. On the one hand,
C++ is a compiled language and therefore computationally efficient when executed; however, it is
slow to change because any modification requires re-compilation. On the other hand,
OTcl is an interpreted language and therefore does not need compilation, so changes are made very
quickly. However, OTcl is slower than C++ at run time. These are the reasons why they are used for different
purposes inside NS2.
C++ and OTcl are linked using TclCL (Tcl with classes) [25, pp. 37-38]. Every bound
C++ class has a corresponding OTcl class. Variables inside an OTcl object are called handles and
are mapped onto C++ objects. These handles are strings (all starting with the symbol “_”) in the
OTcl domain and do not have any functionality of their own; they are only the interface with the user and other OTcl
objects.
2.2.4.2 PHY layer and MAC layer parameters
In order to configure NS2 correctly for an 802.11g environment, it is necessary to understand
the specific parameters of the original NS2 configuration. The main parameters for 802.11g are
listed in Table 26 and Table 27 inside APPENDIX B, where the last column of each table
gives a short explanation of each one.
2.2.4.3 NS2-CRAHN implementation
The implementation of a cross-layer MAC and channel allocation scheme for Cognitive Radio
Ad Hoc Networks (CRAHNs) [5] published in 2009 [4] will be used as a reference for the
modification of NS2, and the useful parts are explained here.
As described in [4], NS2-CRAHN is an extension of NS2 that is designed to support realistic
simulation of CRAHNs. NS2-CRAHN contains an accurate and flexible model of the activities
of Primary Users (PUs) and of the cognitive cycle implemented by each Cognitive Radio (CR)
user. In the study, NS2-CRAHN was used to analyse the impact of CRAHN features on the route
formation process and to study the TCP performance over CRAHNs, considering
the impact of different factors on several TCP variants. NS2-CRAHN implements a multi-radio
transceiver system model with several operating channels. Any of these radios can be tuned to
any of the channels, as some of the interfaces are switchable. This is shown in Figure 11. In this project,
multi-radio will not be used, but it is important to understand its operation in order to modify the
code properly.
Figure 11: NS2-CRAHN schema [4]
In CRAHNs, there are usually two types of users: Primary Users (PUs) and Secondary Users (SUs).
PUs are users with preference to operate in the band. PU operation interferes with secondary users,
leading to the loss of all their data. Secondary Users (CRs in the CRAHN implementation) are
unlicensed users that are able to use the spectrum whenever it is idle. CRAHN itself will not be
used in this project, but the spectrum management implemented in NS2-CRAHN for PU activities will
be used as a reference for the implementation of the desired scenario.
2.2.4.3.1 Modifications of the MAC layer
NS2-CRAHN includes several modifications to the MAC implementation found in NS2
by default (file mac/mac-802.11.cc). All these modifications are made in order to include the
spectrum management handlers (enabling channel switching when required), the multi-radio
implementation and packet dropping when PU interference is found. The part that drops
packets while a PU is interfering will be adapted for the implementation of this project.
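The idea of dropping packets while a PU is interfering can be illustrated with a small sketch. This is not the NS2-CRAHN code, which is part of the C++ MAC implementation; the representation of PU activity as a list of (start, end) ON intervals and the function names are assumptions made only for this illustration.

```python
def pu_is_active(on_intervals, time):
    """Return True if the instant `time` falls inside any PU ON interval."""
    return any(start <= time < end for start, end in on_intervals)


def overlaps_pu_activity(on_intervals, tx_start, tx_end):
    """Return True if a transmission [tx_start, tx_end) overlaps any PU ON interval."""
    return any(start < tx_end and tx_start < end for start, end in on_intervals)


# Hypothetical PU activity: ON from 1.0 s to 2.5 s and from 4.0 s to 4.5 s.
on_intervals = [(1.0, 2.5), (4.0, 4.5)]

print(pu_is_active(on_intervals, 2.0))               # True
print(overlaps_pu_activity(on_intervals, 0.2, 0.4))  # False: packet is kept
print(overlaps_pu_activity(on_intervals, 2.4, 2.6))  # True: packet is dropped
```

A packet whose transmission overlaps an ON interval would be treated as lost, while all other packets are delivered as usual.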
2.2.4.3.2 Primary Users Activity block
This block of NS2-CRAHN implements the PU activity. The PU activity model is in
a file called PUmodel and is described in [74]. The PU information is inside a file called the PU-log
file. PUmodel reads that file and saves it into a data structure. In the PU-log file, the first part is
composed of entries with the format shown in Figure 12.
channel_ID, x, y, xrec, yrec, alpha, beta, tx_range
Figure 12: PU-log file first part
Where the variables are the following:
- channel_ID: channel used by the PU ([1…MAX_CHANNELS[)
- x, y: position of the PU transmitter
- xrec, yrec: position of the PU receiver
- alpha, beta: parameters for the ON/OFF exponential distribution
The second part of the file describes the activity of each PU over time, specifying the number of
entries for each PU and when it enters and leaves the channel. It is important to clarify that the pattern of
the PU activity is not defined, only intervals of time when the PU can be active.
The PU activity follows an alternating exponential ON-OFF distribution model. These alternating
states are: ON when the PU is using the channel (the channel is busy) and OFF when the PU is
not using the channel (the channel is idle and is available for Secondary Users). According to [4],
the ON-OFF switching is regulated by a birth-death Markovian process [75].
Figure 13: ON-OFF model
Figure 14: Example of ON-OFF model distribution in time
The PU activity follows a birth-death Markovian model with two states (ON and OFF). This model is
represented in Figure 13 and Figure 14, where beta is the death rate for PU activity. The duration
of the ON state follows an exponential distribution with mean 1/beta. Similarly, if alpha is the birth
rate for PU activity, then the duration of the OFF state follows an exponential distribution with mean
1/alpha.
In order to program this in an efficient way, in the implementation alpha and beta are time values
instead of probabilities, and the ON/OFF times are calculated in a different way: equation
2.11 gives $T_{arrival}$, the time until the next birth (or ON period), and equation 2.12 gives $T_{departure}$, the
time that the state will be alive (duration of the ON state). Figure 15 depicts this distribution in
time.
$$T_{arrival} = alpha \cdot (-\log(u_1))$$ (2.11)
$$T_{departure} = beta \cdot (-\log(u_2))$$ (2.12)
Where $u_1$ and $u_2$ are uniform random real numbers in the interval between 0 and 1.
Figure 15: ON-OFF distribution programmed in NS2-CRAHN
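The generation of ON and OFF periods can be sketched as below, reading equations 2.11 and 2.12 as a time value multiplied by the negative logarithm of a uniform random number. Starting in the OFF state at time zero, truncating the last ON period at the end of the simulation and the function name `on_off_schedule` are assumptions made for this illustration; this is not the NS2-CRAHN code.

```python
import math
import random


def on_off_schedule(alpha, beta, horizon, rng):
    """Return the PU ON intervals (start, end) up to `horizon` seconds.

    Time until the next ON period: alpha * (-log(u1))
    Duration of the ON period:     beta  * (-log(u2))
    """
    intervals = []
    now = 0.0                                # assumed to start in the OFF state
    while True:
        u1 = 1.0 - rng.random()              # uniform in (0, 1], avoids log(0)
        u2 = 1.0 - rng.random()
        start = now + alpha * -math.log(u1)  # end of the OFF period
        if start >= horizon:
            break
        end = min(start + beta * -math.log(u2), horizon)
        intervals.append((start, end))
        now = end
    return intervals


schedule = on_off_schedule(alpha=1.0, beta=1.0, horizon=1000.0, rng=random.Random(42))
busy_time = sum(end - start for start, end in schedule)
print(len(schedule), busy_time / 1000.0)     # number of ON periods, busy fraction
```

With equal values of alpha and beta the channel is expected to be busy about half of the time, since the negative logarithm of a uniform random number has mean 1.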
2.2.4.3.3 Spectrum Manager
In the NS2-CRAHN implementation [4], the spectrum manager block is in charge of
implementing the cognitive cycle, that is, sensing the medium and changing the transmission
channel to an idle one if a PU is active. The NS2-CRAHN spectrum manager is composed of three
blocks: the Spectrum Sensing block, the Spectrum Decision block and the Spectrum Mobility
block.
The Spectrum Decision block is in charge of deciding, when PU activity is detected, the action
to be taken: leave the channel for a new available one or stay on the channel. The Spectrum
Mobility block is in charge of the spectrum handoff [76] process to a new channel,
and the Spectrum Sensing block is in charge of detecting the PU activity in a given channel,
exchanging information with the PU Activity block.
2.3 Related work
This section analyses some related work that is useful for comparison or as a reference for this
thesis.
2.3.1 Traffic modelling and forecasting using genetic algorithms for next-generation cognitive radio applications
Time series of stock prices [77], traffic volume [78], electrical power [79] or
highway traffic [80] are used to predict a sequence of future values from the historical values.
The difficulty of predicting these time series stems from their chaotic behaviour: they do
not follow a tendency or periodicity, and conventional predictive models such as regression
analysis have their limits (in accuracy). In numerous articles, such as [42], [27] or [43], a Genetic
Algorithm (GA) for forecasting time series of different nature is presented. The mentioned articles
concluded that the GA could be applied to highly dynamic systems where values are connected
to their long history.
In [27] a genetic algorithm is used for next-generation cognitive radio applications. The
mentioned article highlights the importance of determining the availability of spare spectrum. To
do so, it is necessary to estimate the future demand of the network.
The genetic algorithm is a stochastic search technique that could be used for short-term predictions
of chaotic, nonlinear and deterministic dynamic systems with more accuracy than linear
stochastic models [42]. One of the advantages of this method is that it only needs some
observations of the system evolution to be able to carry out this forecasting. With this algorithm,
a function (equation) is obtained from only the limited data provided. In that function, the
connection of the past values with the present and future values can be examined. Another benefit
of this algorithm is the possibility of solving problems with multiple solutions.
The main difficulty in using this algorithm is to find the optimum set-up parameters for the time
series analysed, such as the population size, the probabilities of applying a certain operator, the evaluation
function, etc. Moreover, different methods could be applied in each genetic
operation, such as the kind of encoding, selection, mutation or crossover, which makes finding the
optimum set-up complicated. Several research approaches to genetic operations are
proposed by different authors, such as [44], [81], [82], [83] and [37].
2.3.2 End-to-end Protocols for Cognitive Radio Ad Hoc Networks: An Evaluation Study
According to [4], most of the research on CRAHNs is focused on devising spectrum sensing
and sharing algorithms at the link layer, so that CR devices can operate without interfering with
the transmissions of PUs. However, it is also important to consider the impact of such schemes on
the higher layers of the protocol stack, in order to provide efficient end-to-end data delivery.
Routing and transport layer protocols constitute an important study area that has not been
investigated in detail over CRAHNs.
In the study, an implementation of CRAHN in NS2 with multiple channels and multiple hops is used.
The study includes a part that analyses TCP performance with PU activity in the network using
the NS2 simulator.
Although the scenario is not exactly the same as the one studied in this thesis, the TCP throughput for
different patterns of ON-OFF birth-death PU activity is analysed. This analysis is similar to the
one carried out in this thesis and is therefore related. As a result, the study shows a 3D graph
(Figure 16) with the TCP throughput for a set of alpha-beta combinations.
Figure 16: TCP throughput for different alpha beta combinations for CRAHN [4]
The drawback of the study is that, apart from the scenario differences with respect to the one
proposed in this thesis, the analysis is done for a low data rate (the maximum TCP throughput
achieved is 400 Kbit/s). In addition, randomness in the ON-OFF patterns is not taken into account,
although it may be an important factor in the study.
2.4 Summary
IEEE 802.11g is a standard for WLAN in the 2.4 GHz band. The MAC layer protocol
mainly determines the maximum throughput achieved, which is not equal to the maximum WLAN
data rate: for a maximum WLAN data rate of 54 Mbit/s, it is 23 Mbit/s. TCP is a reliable,
connection-oriented delivery service. The throughput at the TCP layer relies on the congestion
window and the retransmission time of data packets.
NS2 is a simulator for communication networks that is mainly programmed in C++ and Tcl, and
all its code can be modified by researchers. NS2 has to be configured specifically for
802.11g. The spectrum management of the NS2-CRAHN implementation for NS2 will be used as a
reference to implement the PU activity management.
The genetic algorithm section begins by defining the algorithm as a stochastic search that is based
on Darwinian theories of natural selection and survival. Time series are studied in order to
understand their behaviour when they are applied to forecasting, with the aim of predicting the
future evolution of the time series up to a certain time horizon, also called the prediction horizon.
The theorem proposed by Takens states how to reconstruct a chaotic system using the delay
coordinates of a single variable belonging to the system. For forecasting, the Takens theorem
is used to find a function that connects the previous samples, that is, a pattern in the past
values that can be used to predict the next ones.
Chapter 3
Design and implementation
In this chapter, the whole design and implementation of the scenario developed in this Thesis
is explained. This chapter presents all work carried out during the project.
3.1 Introduction
This chapter explains the most important points of the project implementation and presents a
general overview of the implementation. In the first part of the chapter, the different wireless
scenarios used in this thesis are detailed. The next part is dedicated to the implementation of the
simulation software in NS2. It includes all the modifications, configurations and new features done
particularly for this thesis implementation. An example of how the different scenarios can be
programmed in NS2 is also included. The last part deals with all the issues related to the genetic
algorithm and forecasting implementation done in this thesis, including all the parts that
make up the genetic algorithm.
3.2 Design of the wireless scenario
In order to properly characterise the scope of the Thesis research, the wireless scenarios that
have been used for the tests and simulations are described in this section. As mentioned
before, the proposed scenario is a wireless network complying with the IEEE 802.11g standard,
which is detailed in section 2.2.1.1. NS2 does not comply with 802.11g by default, so it must be
configured. There are three different scenarios (named Scenario 1, Scenario 2 and Scenario 3),
which are described in the following points. These scenarios contain some of the following
elements:
• Node 0: This node communicates with Node 1 using TCP or UDP. This is a secondary
user that can use the network whenever it is sensed as idle.
• Node 1: This node communicates with Node 0 using TCP or UDP. This is also a
secondary user with the same characteristics as Node 0.
• Node 2: This node communicates with Node 3 using TCP or UDP. This is also a
secondary user with the same characteristics as Node 0 but playing back real traffic
from a file.
• Node 3: This node communicates with Node 2 using TCP or UDP. This is also a
secondary user with the same characteristics as Node 0.
• Primary users: There can be one or more pairs of PUs that are allowed to use the
wireless medium at any moment. If secondary users and PUs transmit at the same
time, the secondary users lose the packets sent. PUs always come in pairs;
hence, one PU implies 2 nodes.
3.2.1 Scenario 1: Primary users
In this scenario, depicted in Figure 17, a PU transmits inside the reception range of the
secondary users. The nodes that serve to measure the available bandwidth are inside the PU
transmission area (range of 300 meters) and are therefore affected by the PU activity.
Figure 17: Map of Scenario 1
3.2.2 Scenario 2: Secondary users with real traffic inside the sensing area
In this scenario, there are 4 nodes: 2 nodes exchange real TCP traffic that is played back (nodes
in red) and the other two (nodes in black) are used to measure the available
bandwidth. The nodes are located in such a way that they sense each other and therefore
share the medium.
Figure 18: Map of Scenario 2
3.2.3 Scenario 3: Secondary users with real traffic outside the sensing area and
only one affected
In this scenario, there are 4 nodes: 2 nodes exchange real TCP traffic that is played back (nodes
in red) and the other two (nodes in black) are used to measure the available bandwidth. The
nodes are located in such a way that they cannot sense each other and therefore interfere
with each other. A significant number of collisions and a low available bandwidth are expected
in this scenario.
Figure 19: Map of Scenario 3
3.3 Simulation implementation in NS2
The most important actions that have been done in order to configure and implement the
simulation in NS2 version 2.35 are described in this section. Regarding the modifications and
programming done, the following points may summarise the relevant stages:
• Configuration of NS2 to comply with IEEE 802.11g WLAN standard
• Include the cognitive folder (inherited from NS2-CRAHN) in NS2
• Modify the PU spectrum management functions of NS2-CRAHN to fit the
requirements of this thesis
• Configure and modify the 802.11 MAC implementation to fit the requirements of this
thesis
• Install the No Ad-Hoc Routing Agent (NOAH)
• Modify ns-lib.tcl
• MAC Busy Time measurement
• Real traffic acquisition
• Scenario scripting programming
3.3.1 General simulation outline
This section presents a brief overview of the most relevant stages of the simulation process in
NS2. The simulation begins when the Tcl script is executed in the Linux terminal and all the
simulation and scenario parameters are set, as described in 3.3.7. After that, a MAC layer object
is created in order to manage all the operations at the MAC layer for a specific node. Immediately
afterwards, a spectrum manager handler is created for this MAC. Once this is finished, the PU
model information is loaded and the PU activity pattern for the whole simulation time is
calculated. This calculation is optional because a previously calculated file can be used while
the PU busy time is computed. At this point, the simulation is performed and, at the end,
the total MAC busy time is calculated. The simulation outputs an NS2 trace file with the
information of the simulation, which is used to calculate other metrics. All this is depicted
in Figure 20.
[Flow chart boxes: Set simulation with Tcl file; Create MAC layer handler; Create Spectrum Manager handler; Load PU model; Calculate the PU activity pattern; Calculate the PU busy time; Send a packet; Are there any PU active now + tx time? (Yes/No); Drop packet; Calculate MAC busy time; Calculate total MAC busy time; Save MAC busy time into a file; Output to a NS2 trace file; Calculate other metrics]
Figure 20: Simulation overview diagram
3.3.2 Configuration of NS2 for IEEE 802.11g
In order to comply with the IEEE 802.11g standard, NS2 has to be configured properly. The
characteristics of this standard are explained in detail in 2.2.1.1. This configuration is done
by changing the following parameters in the file ns-default.tcl, which is located in the folder tcl/lib/:
• MAC parameters:
Mac/802_11 set CWMin_ 15
Mac/802_11 set CWMax_ 1023
Mac/802_11 set SlotTime_ 0.000009
Mac/802_11 set CCATime_ 0.000003
Mac/802_11 set RxTxTurnaroundTime_ 0.000002
Mac/802_11 set SIFS_ 0.000016
Mac/802_11 set PreambleLength_ 96
Mac/802_11 set PLCPHeaderLength_ 40
Mac/802_11 set PLCPDataRate_ 6.0e6
Mac/802_11 set RTSThreshold_ 3000
Mac/802_11 set MaxPropagationDelay_ 0.0000005
Mac/802_11 set ShortRetryLimit_ 7
Mac/802_11 set LongRetryLimit_ 4
Mac/802_11 set basicRate_ 6Mb
Mac/802_11 set dataRate_ 54Mb
• Physical parameters:
Phy/WirelessPhy set bandwidth_ 54e6
Phy/WirelessPhy set freq_ 2.4e+9
Phy/WirelessPhy set Pt_ 3.3962527e-2
Phy/WirelessPhy set RXThresh_ 2.75096e-12
Phy/WirelessPhy set CSThresh_ 2.75096e-12
Phy/WirelessPhy set L_ 1.0
Phy/WirelessPhy set CPThresh_ 10.0
In order to obtain the desired reception range, it is necessary to calculate the reception threshold
(RXThresh) and the carrier sense threshold (CSThresh). NS2 has a C++ program intended
for the calculation of RXThresh and CSThresh. That program, called threshold.cc [84],
can be modified to add the specific parameters of the physical layer and then compiled. The
program threshold.cc has to be executed specifying a propagation model. In this thesis, the Two
Ray Ground Model [85] is used, which gives a more reliable result than the Free Space
Model [86]. The output of threshold.cc is the RXThresh or CSThresh value. The threshold
used corresponds to a reception distance of 500 meters, so beyond this distance no packets will be
received. The sensing distance is set to the same value, so traffic further than 500 meters away will
not be sensed. The other physical parameters affect that distance, so they have to be taken into
account.
It is also necessary to change in this file the UDP and TCP packet size to 1400 bytes. For further
explanation of each parameter, please refer to APPENDIX B.
3.3.3 MAC implementation in NS2
The implementation of the MAC in NS2 has been modified in order to perform all of the
desired functions, such as the PU activity management or the MAC busy time calculation. This
implementation is done in the files mac-802_11.cc and mac-802_11.h contained in the /mac
folder. The following have been added: the creation of a PU management handler for every node,
the loading of the PU model log file, the calculation of the PU activity pattern, the calculation of
the PU busy time, the calculation of the total busy time and, finally, the PU activity management.
In order to load the PU model log file or calculate the PU activity pattern, the file ns-lib.tcl has
to be modified so that these commands can be called from the Tcl scripts. That information has the
structure explained in 2.2.4.3.2.
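The RXThresh_ and CSThresh_ values listed for the physical layer can be cross-checked with a short calculation. The following Python sketch is not the threshold.cc program; it only applies the textbook Two Ray Ground formula, with the Free Space (Friis) formula below the crossover distance. The antenna gains (1.0) and antenna heights (1.5 m) are assumptions taken to be the usual NS2 defaults, since they are not stated in this section, and the function name is hypothetical.

```python
import math

def two_ray_ground_rx_power(pt_w, distance_m, freq_hz=2.4e9,
                            gt=1.0, gr=1.0, ht_m=1.5, hr_m=1.5, loss=1.0):
    """Received power in watts at distance_m (gains and heights are assumed)."""
    wavelength = 3.0e8 / freq_hz
    crossover = 4.0 * math.pi * ht_m * hr_m / wavelength
    if distance_m <= crossover:
        # Free Space (Friis) model for distances below the crossover point
        return pt_w * gt * gr * wavelength ** 2 / ((4.0 * math.pi * distance_m) ** 2 * loss)
    # Two Ray Ground model beyond the crossover distance
    return pt_w * gt * gr * ht_m ** 2 * hr_m ** 2 / (distance_m ** 4 * loss)

# Pt_ and freq_ as configured above; 500 m is the desired reception distance
rx_thresh = two_ray_ground_rx_power(pt_w=3.3962527e-2, distance_m=500.0)
print(f"Threshold for a 500 m range: {rx_thresh:.5e} W")
```

Under these assumptions the result agrees with the configured value of 2.75096e-12 W, which suggests that the same power is used for both the reception and the carrier sense thresholds.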
3.3.3.1 Implementation for Primary Users
This part explains the changes made to NS2 in order to include PU activity in the wireless
network. The implementation is done in such a way that, whenever a PU is transmitting and
its signal reaches the location of the secondary users, the secondary users must drop the packets,
because in a real environment the packets would not be received properly due to the interference
of the PUs. This is implemented by changing the way the MAC layer works in NS2. In order to
understand how the implementation is done, Figure 21 shows the flow of operations whenever a
packet is received at MAC level.
Regarding the PU activity, the 802.11 MAC implementation in NS2 has been modified in the
files mac-802_11.cc and mac-802_11.h contained in the /mac folder. The most important changes
are made in the function recv(Packet *p, Handler *h), where every received packet is checked
for PU interference. If PU activity is detected, the packet is dropped.
3.3.3.1.1 PU spectrum management
The spectrum management that allows NS2 to model the presence of Primary Users in the
wireless scenario is done in the functions Is_PU_interfering(), sense(), Is_PU_active() and
check_active(). This structure and these functions have been taken and modified from the NS2-
CRAHN implementation. However, the implementation of the last function, check_active(), is
completely new because the two implementations work differently. In conclusion, only
the part that implements the PU pattern calculation algorithm is inherited from NS2-CRAHN.
The flow chart depicted in Figure 21 represents the steps each packet passes through in the
spectrum manager. When a packet is received, the spectrum manager first checks whether the PU
model file has been loaded. If it has not, there are no PUs, so a negative message is sent to the
receiver function and the packet is not dropped. If PUs exist, the next step is to verify, for every
PU, whether the receiving node is inside the transmission area of the PU. If it is not, a negative
message is sent to the receiver function so that the packet is not dropped. If the receiving node is
inside the PU transmission area, the next step is to check whether the PU is active during the
period in which the packet would be received. If the PU is active at the same time that the packet
is sent, an affirmative message is sent to the receiver function to drop the packet. Otherwise, the
function tells the receiver not to drop the packet.
[Flow chart: a received packet passes through recv(), Is_PU_interfering(), sense(), Is_PU_active() and check_active(), and returns to recv(). Decisions: Is the PU model file loaded? Is the node inside the PU tx radius? Are there any PU active now + tx time? Are there more PUs? Outcomes: Drop the packet, or Not PU interfering (no drop).]
Figure 21: Primary Users flow chart and location
3.3.3.1.2 PU activity detection
The final decision on whether to drop the packet is taken in check_active(), which is located
in PUmodel.cc. This function reads the file PUactivity.txt, which contains the information about
the PU activity pattern for the whole simulation time. This file is generated at the beginning of the
simulation if so specified; otherwise, a previously calculated file can be used. After reading the
file, the function searches for any overlap between the PU activity and the packet transmission. If
there is an overlap, the MAC handler is told to drop the packet.
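The chain of checks described in 3.3.3.1.1 and 3.3.3.1.2 can be sketched in a few lines of Python. This is only an illustration of the logic, not the C++ code added to NS2: the class PrimaryUser, the functions pu_active_during() and must_drop(), and the in-memory list of ON periods (standing in for the contents of PUactivity.txt, whose format is not shown here) are all hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class PrimaryUser:
    x: float
    y: float
    tx_radius: float                                  # transmission radius in metres
    on_periods: list = field(default_factory=list)    # [(start, end), ...] in seconds

def pu_active_during(pu, start, duration):
    """True if any ON period of the PU overlaps [start, start + duration)."""
    end = start + duration
    return any(on_start < end and start < on_end for on_start, on_end in pu.on_periods)

def must_drop(pus, node_xy, now, tx_time):
    """Decision chain: PU model loaded? -> node in range? -> PU active?"""
    if not pus:                      # no PU model loaded, so nothing interferes
        return False
    for pu in pus:
        distance = math.hypot(node_xy[0] - pu.x, node_xy[1] - pu.y)
        if distance > pu.tx_radius:  # node outside this PU's transmission area
            continue
        if pu_active_during(pu, now, tx_time):
            return True              # overlap found: the packet is dropped
    return False

pu = PrimaryUser(x=0.0, y=0.0, tx_radius=300.0, on_periods=[(1.0, 2.0)])
print(must_drop([pu], (100.0, 0.0), now=1.5, tx_time=0.001))
```

The overlap test treats the packet as occupying the medium from the current time until the current time plus its transmission time, which matches the "now + tx time" decision in Figure 21.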
3.3.3.1.3 PU activity pattern and busy time
In the function get_PU_busy(), the PU activity pattern is calculated following a birth-death
Markovian process. This calculation is described in 2.2.4.3.2. The result is an ON-OFF
distribution. This function, which is called only once, calculates the PU activity pattern for the
whole simulation time and stores it in a file. The PU busy time over the whole simulation time is
also calculated and returned. To be able to call this function from the Tcl script, it is necessary
to bind it in the function Mac802_11::command() and include it in ns-lib.tcl.
By default, the function always gives the same random ON-OFF pattern for a given alpha-beta
combination. In order to have an option with truly varying results, a change of the seed of the
random generator (chosen heuristically for speed) has been added every time the function is called.
In the cases where randomness has to be removed, a small modification in the algorithm sets the
arrival and departure times (explained in 2.2.4.3.2) to alpha and beta respectively,
obtaining a fixed periodic pattern.
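A minimal Python sketch of such an ON-OFF pattern generator is given below. It is not the get_PU_busy() code: the function name is hypothetical, and it assumes that alpha is the mean time until the PU arrives (OFF period) and beta the mean time until it departs (ON period), both exponentially distributed, which is one common reading of a birth-death process. The exact definitions are those of 2.2.4.3.2.

```python
import random

def pu_activity_pattern(alpha, beta, sim_time, seed=None, deterministic=False):
    """Return ([(on_start, on_end), ...], busy_time) for one PU.

    alpha: mean OFF time before the PU arrives (assumed meaning, seconds)
    beta:  mean ON time before the PU departs (assumed meaning, seconds)
    """
    rng = random.Random(seed)        # a fixed seed reproduces the same pattern
    periods, busy, t = [], 0.0, 0.0
    while t < sim_time:
        off = alpha if deterministic else rng.expovariate(1.0 / alpha)
        on = beta if deterministic else rng.expovariate(1.0 / beta)
        start = t + off
        if start >= sim_time:
            break
        end = min(start + on, sim_time)   # clip the last ON period
        periods.append((start, end))
        busy += end - start
        t = end
    return periods, busy

# Fixed periodic pattern: OFF for 1 s, ON for 2 s
periods, busy = pu_activity_pattern(1.0, 2.0, sim_time=10.0, deterministic=True)
print(periods, busy)
```

Passing a different seed on every call gives the varying patterns mentioned above, while deterministic=True corresponds to the fixed periodic pattern.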
3.3.3.2 MAC busy time measurement
The busy time measurement of the MAC layer will be used for comparison with the TCP and UDP
bandwidth. The MAC busy time has been measured by adapting the implementation done in [87].
This adaptation has been done manually due to compatibility issues (patches are not compatible
between different NS2 versions). The measurement is done in the implementation of the
802.11 MAC layer in NS2, which adds up all the periods when the MAC is active, from
the transmission of the packet to the reception of the MAC layer acknowledgement, including the
SIFS and the DIFS. The timing structure [88] without RTS/CTS is represented in Figure 22.
Figure 22: MAC busy time
In the NS2 MAC, after each MAC layer ACK is received, the sender backs off (BO in Figure 22).
This back-off is not included in the MAC busy time. In the modifications done, a condition has
been added so that the time when the PUs are active is not added to the
MAC busy time, as it is added to the PU busy time. Finally, the MAC busy time of the sender,
the MAC busy time of the receiver and the PU busy time are added, giving as a result the total
MAC busy time, as shown in Eq. 3.1.
\[ T_{\text{Total MAC busy}} = T_{\text{MAC busy, sender}} + T_{\text{MAC busy, receiver}} + T_{\text{PU busy}} \] (Eq. 3.1)
The functions that save the times at which the MAC starts and ends a busy period into a file
have been added to the basic implementation. In addition, \(T_{\text{MAC busy, sender}}\),
\(T_{\text{MAC busy, receiver}}\) and \(T_{\text{PU busy}}\) are stored in a file if the line
$ns_ at TIME "$node_(X) compute-mac-busy" is called in the Tcl script. To be able to call the
busy time calculation from the Tcl script, it is necessary to bind it in the function
Mac802_11::command() and include it in ns-lib.tcl. This makes it possible to compute the busy
time at any point of interest.
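As an illustration of Eq. 3.1, the following Python sketch adds the three busy times from lists of busy periods. It is a simplification and not the NS2 code: the function name is hypothetical, and a MAC busy period that overlaps a PU ON period is skipped entirely, which is only one possible way of making sure that PU time is not counted twice.

```python
def total_mac_busy(sender_busy, receiver_busy, pu_periods):
    """Eq. 3.1: MAC busy time of sender and receiver plus the PU busy time.

    Each argument is a list of (start, end) periods in seconds.
    """
    def overlaps_pu(start, end):
        return any(s < end and start < e for s, e in pu_periods)

    def mac_busy(periods):
        # Periods overlapping PU activity are left to the PU busy time
        return sum(end - start for start, end in periods if not overlaps_pu(start, end))

    pu_busy = sum(end - start for start, end in pu_periods)
    return mac_busy(sender_busy) + mac_busy(receiver_busy) + pu_busy

total = total_mac_busy(sender_busy=[(0.0, 1.0), (2.5, 3.0)],
                       receiver_busy=[(5.0, 5.5)],
                       pu_periods=[(2.0, 4.0)])
print(total)
```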
3.3.4 No Ad-Hoc Routing Agent (NOAH)
As only two nodes communicate with each other, and therefore there are no hops, no routing
agent is required. According to [89] [90], NOAH is a wireless routing agent that only supports
direct communication between wireless nodes, or between base stations and mobile nodes when
Mobile IP is used. This allows simulating scenarios where multi-hop wireless routing is not
required. NOAH does not send any routing-related packets. In our case, this is useful because,
with PU activity, if the routing packets were dropped, the node would not find the route and all
the packets would be dropped in the transmitter node. NOAH is not available in NS2 by default and
the set-up must be done manually as described in [89].
3.3.5 Acquisition and playback of real wireless traffic
This section contains the procedure used to acquire wireless traffic. In order to play back real
traffic, traffic traces have to be acquired and adapted. The final data has to be in
NS2 binary trace format in order to be played back in NS2.
3.3.5.1 Acquisition and conversion of traffic trace
The wireless traffic packets have been captured using the program Wireshark [91]. This
program allows live capture and offline analysis of traffic on any interface of the computer. It also
allows packet filtering. After the traffic is acquired, its traffic trace must be exported. The
export has been done in Comma-Separated Values (CSV) format [92], which makes possible the
conversion from Wireshark traffic traces to NS2 binary trace files.
The conversion from CSV ASCII format to an NS2 binary trace file is done with a Perl [93]
script, which has been modified from the original [94]. This script takes the time stamp and the
size of each packet and creates a new binary file that can be used as an input in NS2.
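The conversion step can be sketched in Python as follows. This is not the Perl script used in the thesis. It assumes that the CSV file has a header row with columns named "Time" (seconds) and "Length" (bytes), as in a default Wireshark export, that packets are sorted by time, and that the NS2 trace file consists of records of two 32-bit fields in network byte order: the inter-packet time in microseconds and the packet size in bytes. The function name and the treatment of the first packet (gap of zero) are illustrative choices.

```python
import csv
import struct

def csv_to_ns2_trace(csv_path, bin_path, time_col="Time", size_col="Length"):
    """Write one (inter-packet time in us, size in bytes) record per packet."""
    previous = None
    records = 0
    with open(csv_path, newline="") as src, open(bin_path, "wb") as dst:
        for row in csv.DictReader(src):
            timestamp = float(row[time_col])
            size = int(row[size_col])
            # Time elapsed since the previous packet, in microseconds
            gap_us = 0 if previous is None else round((timestamp - previous) * 1e6)
            dst.write(struct.pack("!II", gap_us, size))   # two big-endian uint32
            previous = timestamp
            records += 1
    return records
```

For example, csv_to_ns2_trace("capture.csv", "my_file.bin") would produce a file that can be attached to a Tracefile object; both file names are placeholders.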
3.3.5.2 Playback of real traffic in NS2
As mentioned, playing back real traffic gives a more realistic behaviour of the network and
its use is therefore recommended. In order to include the playback of traffic, the following
lines have to be added:
set tfile1 [new Tracefile]                  ;# Create a new trace file
$tfile1 filename "my_file.bin"              ;# Assign a name
set trac0 [new Application/Traffic/Trace]   ;# Create a new traffic trace application
$trac0 attach-tracefile $tfile1             ;# Attach the application to the trace file
The third line creates an Application object [95], which is an object located on top of the transport
layer that can either generate traffic (e.g. CBR) or simulate an application (e.g. FTP). The
Application/Traffic/Trace application is able to take a binary trace file and play it back. This traffic
will have the same timing as the original one if the network allows it; otherwise, the played back
traffic will be affected by the state of the network and its original shape will be modified.
3.3.6 Data analysis and graphic representation
In order to calculate different parameters, such as throughput or packet loss, from the trace files
or other files with a huge amount of data, Linux shell scripts [96] and the AWK language [97]
have been used.
The results have been plotted using Gnuplot [98] and Xgraph [99]. The former is run from a script
and requires pre-processing of the output traces, while the latter has to be called during the
simulation execution in order to get output files to be plotted later.
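The kind of processing done with shell and AWK scripts can be illustrated with a small Python sketch that computes throughput from a trace file. It is not one of the scripts used in the thesis. It assumes the old NS2 wireless trace format, in which the whitespace-separated fields are the event, the time, the node, the layer, flags, the packet id, the packet type and the packet size; the function name and the default node are hypothetical.

```python
def agent_throughput_bps(trace_lines, node="_1_", packet_type="tcp"):
    """Throughput in bit/s of packets received at the agent layer of a node."""
    received_bytes, first, last = 0, None, None
    for line in trace_lines:
        fields = line.split()
        if len(fields) < 8 or fields[0] != "r":
            continue                        # only receive events are counted
        if fields[2] != node or fields[3] != "AGT" or fields[6] != packet_type:
            continue
        now = float(fields[1])
        first = now if first is None else first
        last = now
        received_bytes += int(fields[7])
    if first is None or last == first:
        return 0.0
    return received_bytes * 8 / (last - first)

sample = [
    "r 1.000000000 _1_ AGT  --- 0 tcp 1400 [13a 1 0 800] ------- [0:0 1:0 32 1] [0 0] 1 0",
    "s 1.100000000 _0_ AGT  --- 1 tcp 1400 [0 0 0 0] ------- [0:0 1:0 32 0] [1 0] 0 0",
    "r 2.000000000 _1_ AGT  --- 1 tcp 1400 [13a 1 0 800] ------- [0:0 1:0 32 1] [1 0] 1 0",
    "r 2.000000000 _1_ MAC  --- 1 tcp 1458 [13a 1 0 800] ------- [0:0 1:0 32 1] [1 0] 1 0",
]
print(agent_throughput_bps(sample))
```

Measuring between the first and the last received packet is a simplification; a fixed measurement window could be used instead.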
3.3.7 Implementation of the wireless scenario in NS2
The scenario is implemented in NS2 by programming a Tcl script. First, the
different options for the scenario, such as the propagation model, the number of nodes or the
antenna type, are defined. After the variables are set, the main program is set up, the variables are
initialised and the PU model file is loaded. A God (General Operations Director) is created;
according to [100], “a God is the object that is used to store global information about the state of
the environment, network or nodes that an omniscient observer would have, but that should not be
made known to any participant in the simulation”.
The next step is to configure the nodes that will make up the network. All the nodes are identical;
the only thing that varies from one to another is the position. Immediately after this, the mobile
nodes are created. As many nodes are created as were specified when the variables were set at
the very beginning of the Tcl script.
At this point, the topology, the traffic specifications and the agents are created. There are different
possible configurations for TCP and for UDP, which are explained in detail in APPENDIX
A. Both TCP and UDP agents can be used with real traffic played back, just by adding a trace
player as the application, as described in 3.3.5.2. When the simulation is completed, the
function that calculates the total MAC busy time and stores it in a file is called.
The last commands that have to be executed tell the simulator to end all the agents, end the
simulation and close NS2. After everything is set, the only remaining step is to tell ns to start the
simulation, which will use everything described above. All these processes are explained in detail
in APPENDIX A.
3.4 Genetic algorithm implementation
In this section, the most important steps of the design and implementation of the genetic
algorithm are described. The algorithm is programmed in MATLAB 2013 and its configuration is
based on the approach given by [27].
3.4.1 General GA outline
For the design and implementation of the genetic algorithm and its use for forecasting, the
next steps are followed. A brief explanation can be found in section 2.2.2.4.
• Step 1: Encode the solution by means of the chromosome.
• Step 2: Define the fitness function to quantify the performance of each chromosome
for the problem to solve.
• Step 3: Generate N random chromosomes.
• Step 4: Calculate the fitness of each of the generated chromosomes by means of the
fitness function. Then sort the population by fitness.
• Step 5: Select the best K chromosomes for the next generation. This is called elitism.
• Step 6: Introduce the best chromosomes into a mating pool of \(N \cdot size_{\text{mating pool}}\) chromosomes, where \(0 \le size_{\text{mating pool}} \le 1\).
• Step 7: Apply the selection criteria in the mating pool to select a pair of parents according
to their fitness, i.e. their probability of being chosen.
• Step 8: Create two offspring from the selected pair of parents by applying the
crossover function.
• Step 9: Apply the mutation operator to the new offspring.
• Step 10: Introduce the elite chromosomes and the offspring into the next generation.
Either the rest of the mating pool is introduced into the next generation, or the same number
of new chromosomes is generated for the next generation.
• Step 11: Go to step 4 and repeat the process until the termination criteria are met.
• Step 12: Make the prediction with the best function.
After describing these steps, each of them and their implementation and design will be
detailed. All the steps described above in different sections are summarized in a general outline
depicted in Figure 23.
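Steps 3 to 12 can be sketched as a generic loop in Python. This is not the MATLAB implementation of the thesis: the function run_ga() and its parameters are hypothetical, the chromosomes in the usage example are plain lists of real numbers instead of equations, selection is fitness-proportional, the fitness is assumed to be positive, and step 10 is simplified so that the next generation contains only the elite and the offspring.

```python
import random

def run_ga(new_chromosome, fitness, crossover, mutate, n=30, k=2,
           mating_pool_size=0.5, generations=50, seed=0):
    """Generic GA loop; all problem-specific operators are passed in."""
    rng = random.Random(seed)
    population = [new_chromosome(rng) for _ in range(n)]              # step 3
    for _ in range(generations):                                      # step 11
        ranked = sorted(population, key=fitness, reverse=True)        # step 4
        elite = ranked[:k]                                            # step 5
        pool = ranked[:max(2, int(n * mating_pool_size))]             # step 6
        weights = [fitness(c) for c in pool]
        offspring = []
        while len(offspring) < n - k:
            mother, father = rng.choices(pool, weights=weights, k=2)  # step 7
            for child in crossover(mother, father, rng):              # step 8
                offspring.append(mutate(child, rng))                  # step 9
        population = elite + offspring[:n - k]                        # step 10
    return max(population, key=fitness)                               # step 12

# Toy problem for illustration only: drive three real genes towards zero
def new_chromosome(rng):
    return [rng.uniform(-5, 5) for _ in range(3)]

def fitness(c):
    return 1.0 / (1.0 + sum(g * g for g in c))

def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(c, rng):
    return [g + rng.gauss(0, 0.1) if rng.random() < 0.1 else g for g in c]

best = run_ga(new_chromosome, fitness, crossover, mutate)
print(best, fitness(best))
```

Because the elite chromosomes are copied unchanged, the best fitness in the population never decreases from one generation to the next.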
[Flow chart boxes: Set up parameters; Generate Initial population; Evaluate the chromosomes fitness; Termination criteria (Yes/No); Elitism; Selection; Crossover; Mutation; Diversity; Generation of new chromosomes? (Yes/No); Generate new population; Selecting least fit individuals; New Population; Selecting the best chromosome; Prediction]
Figure 23: GA general outline
3.4.2 Encoding of the chromosome
The encoding of the chromosome is important because it represents the solution of the
problem to be solved, that is, the genotype (encoded space). This is essential because the
chromosome evolves by manipulating the genotype, so a bad encoding could result in a bad
solution, given that the evaluation and the selection of the chromosome are performed based
on the phenotype (solutions space), as illustrated in Figure 24.
The encoding conforms to the rules provided by [27], where chromosomes represent equations
in reverse Polish notation [101]. These equations are randomly generated and are
composed of arguments, operators and functions. The arguments are either real numbers chosen
from a finite set in [-Z, Z], or values from the time series \(\{x_t\}_{t=1}^{T}\). Moreover, the operators
could be \(\{+, -, \times, \div\}\) and the functions {‘sin(x)’, ‘cos(x)’, ‘exp(x)’, ‘log(x)’, ‘/’}. These chromosomes are generated following three basic rules:
• The two first elements of the chromosome must be arguments and the last one is an
operator.
• The number of arguments on the left must be greater than the number of operators at any
position of the chromosome.
• The number of arguments must be the same as the number of operators plus one.
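The three rules can be checked mechanically, and a valid chromosome can be evaluated with a stack. The Python sketch below is only an illustration: the token representation (numbers for constants, ("x", k) for the sample k steps back, strings for operators and functions) is hypothetical, functions are treated as unary and are not counted as operators in the rules, and the last element is assumed to be a binary operator.

```python
import math

BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
          "*": lambda a, b: a * b, "/": lambda a, b: a / b}
UNARY = {"sin": math.sin, "cos": math.cos, "exp": math.exp, "log": math.log}

def is_argument(token):
    return token not in BINARY and token not in UNARY

def is_valid(chromosome):
    """Check the three generation rules on a list of tokens."""
    if len(chromosome) < 3:
        return False
    if not (is_argument(chromosome[0]) and is_argument(chromosome[1])):
        return False                      # rule 1: two leading arguments
    if chromosome[-1] not in BINARY:
        return False                      # rule 1: trailing operator
    args = ops = 0
    for token in chromosome:
        if token in BINARY:
            ops += 1
        elif is_argument(token):
            args += 1
        if args <= ops:                   # rule 2: arguments stay ahead
            return False
    return args == ops + 1                # rule 3

def evaluate(chromosome, past):
    """Evaluate the reverse Polish expression on the list of past samples."""
    stack = []
    for token in chromosome:
        if token in BINARY:
            b, a = stack.pop(), stack.pop()
            stack.append(BINARY[token](a, b))
        elif token in UNARY:
            stack.append(UNARY[token](stack.pop()))
        elif isinstance(token, tuple):
            stack.append(past[-token[1]])     # ("x", 1) is the latest sample
        else:
            stack.append(float(token))
    return stack[0]

# (x[t-1] + x[t-2]) * 2 written in reverse Polish notation
chromosome = [("x", 1), ("x", 2), "+", 2.0, "*"]
print(is_valid(chromosome), evaluate(chromosome, [1.0, 2.0, 3.0]))
```

Rule 2 is what guarantees that the stack always holds two values when a binary operator is applied.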
[Diagram: encoded space and solutions space, linked by coding and decoding]
Figure 24: Encoded space and solutions space
The input values are scaled using a scale factor in order to avoid large numbers and therefore
to reduce the probability of infinite values. Moreover, this plays down the importance of choosing
a correct finite range [-Z, Z] for each input data range.
Hence, equation 3.1 is applied to the data that is going to be analysed in order to obtain
the scale factor:
\[ factor_{input} = \frac{max\_value}{\max(Data_{input})} \] (3.1)
where \(factor_{input}\) is the factor used to scale the data, max_value is the
maximum value allowed during the GA operation, and \(\max(Data_{input})\) is the maximum of the
data that is going to be used.
Once the scale factor is obtained, it is applied to the data to be analysed in order to obtain the
data used during the GA operation. This is done by means of equation 3.2:
\[ Data_{used} = Data_{input} \cdot factor_{input} \] (3.2)
where \(Data_{input}\) is the data that is going to be used (samples) and \(Data_{used}\) is the scaled data
that is going to be used during the GA operation.
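A short worked example of this scaling in Python is shown below. The function name and the sample values (bandwidth samples in bit/s and a max_value of 10) are made up for illustration.

```python
def scale_series(data_input, max_value):
    """Scale the samples so that the largest one equals max_value."""
    factor_input = max_value / max(data_input)
    data_used = [sample * factor_input for sample in data_input]
    return data_used, factor_input

data_used, factor_input = scale_series([2.0e6, 8.0e6, 4.0e6], max_value=10.0)
print(data_used, factor_input)
```

A value predicted by the GA on the scaled data can be brought back to the original units by dividing it by factor_input.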
3.4.3 Definition of the fitness function
Regarding the performance of each chromosome, different criteria are used, which are
explained in this section. One performance criterion common to all the fitness
functions is the error between the predicted sample and the real sample (from the time series) in
the training set.
As explained in section 2.2.3.4, for forecasting \(\{x_t\}_{t=T+1}^{T+n}\) from a time series \(\{x_t\}_{t=1}^{T}\), a training
set of the time series is created using a shifted window of length \((m-1)\cdot\tau\), where m is the
embedding dimension, \(\tau\) the time delay and n the number of samples to predict. This is shown in
Figure 25, where the green line is the shifted window, which is slid as far as sample 29, and
the black line is where the training set starts.
This training set has length n, which is equal to the prediction horizon. For the first sample
of the training set, the chromosome can take as arguments any of the samples of the time series
\(\{x_t\}_{t=1}^{T}\) contained in the shifted window. This is described along with the input in the next
table for the same time series example, with τ=1 to simplify the example. The input is a vector
that indicates the whole range of past samples that the chromosome could take when it is created.
This is illustrated in Figure 25 along with Table 4 for a general expression, which show the
samples of the training set, the arguments that the chromosome could take as an input for each
sample of the training set, and those for the output (prediction).
Figure 25: Shifted window and training set
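Samples of the training set | Input | Output | Predicted
First sample: \(x_{T-n+1}\) | \(\{x_{T-n}, x_{T-n-1}, \dots, x_{T-n-(m-1)}\}\) | \(\{x_{T-m+1}, x_{T-m+2}, \dots, x_{T}\}\) | \(x_{T+1}\)
Second sample: \(x_{T-n+2}\) | \(\{x_{T-n+1}, x_{T-n}, \dots, x_{T-n-(m-2)}\}\) | \(\{x_{T-m+2}, x_{T-m+3}, \dots, x_{T+1}\}\) | \(x_{T+2}\)
Third sample: \(x_{T-n+3}\) | \(\{x_{T-n+2}, x_{T-n+1}, \dots, x_{T-n-(m-3)}\}\) | \(\{x_{T-m+3}, x_{T-m+4}, \dots, x_{T+2}\}\) | \(x_{T+3}\)
Fourth sample: \(x_{T-n+4}\) | \(\{x_{T-n+3}, x_{T-n+2}, \dots, x_{T-n-(m-4)}\}\) | \(\{x_{T-m+4}, x_{T-m+5}, \dots, x_{T+3}\}\) | \(x_{T+4}\)
n-th sample: \(x_{T}\) | \(\{x_{T-1}, x_{T-2}, \dots, x_{T-m}\}\) | \(\{x_{T-m+n}, x_{T-m+n+1}, \dots, x_{T+n-1}\}\) | \(x_{T+n}\)
Table 4: Example training set window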
Each chromosome, or function, is used to predict the values of this training set in order to validate
the efficiency of the function and try to minimise the error. Thereafter, it is used to forecast
the next n samples (the output in Table 4), thereby obtaining the predicted samples.
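The construction of the training set can be sketched in Python as follows. The function name is hypothetical and the series is a plain list; for each of the last n samples it collects the delayed samples that a chromosome may use as arguments, together with the real value to be predicted.

```python
def training_pairs(series, m, tau, n):
    """Return [(inputs, target), ...] for the last n samples of the series.

    inputs holds the delayed samples x[t - tau], x[t - 2*tau], ..., x[t - m*tau].
    """
    if len(series) < n + m * tau:
        raise ValueError("series too short for this m, tau and n")
    pairs = []
    for t in range(len(series) - n, len(series)):
        inputs = [series[t - i * tau] for i in range(1, m + 1)]
        pairs.append((inputs, series[t]))
    return pairs

# Ten samples, embedding dimension 3, delay 1, prediction horizon 2
pairs = training_pairs(list(range(10)), m=3, tau=1, n=2)
print(pairs)
```

With τ = 1 the inputs of each row are simply the m samples that precede the target, as in the Input column of the table.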
Therefore, each chromosome (function) is evaluated at each point of the training set and the
squared error (\(SSE_j\)) is added up; squaring avoids negative results.
This step can be described mathematically as follows:
\[ \hat{x}_t^j = f_j(x_{t-\tau}, x_{t-2\tau}, \dots, x_{t-m\tau}) \] (3.3)
\[ m\tau + 1 \le t \le T, \quad 1 \le j \le N \] (3.4)
\[ SSE_j = \sum_{t=m\tau+1}^{T} (\hat{x}_t^j - x_t)^2 \] (3.5)
where \(\hat{x}_t^j\) is the predicted sample at time t, \(x_t\) is the original sample from the time series, T is
\(m\tau + n\) (n being the length of the training set) and \(f_j()\) is the chromosome j, which is evaluated
at each sample of the training set.
Moreover, other performance criteria are used in order to evaluate the chromosomes, as proposed
in [102]. These criteria are the MSE (Mean Squared Error), MAPE (Mean Absolute Percentage
Error), NMSE (Normalized Mean Square Error), POCID (Prediction On Change in Direction) and
ARV (Average Relative Variance). The most widely used criterion to evaluate the performance is
the MSE, equation 3.6:
\[ MSE_j = \frac{1}{P} \sum_{t=m\tau+1}^{T} (\hat{x}_t^j - x_t)^2 \] (3.6)
where P is the length of the training set. This criterion is not sufficiently robust on its own
because it does not provide enough information about the forecasting model. The MAPE provides
information about the deviation of the model and is calculated as in equation 3.7:
\[ MAPE_j = \frac{1}{P} \sum_{t=m\tau+1}^{T} \left| \frac{\hat{x}_t^j - x_t}{x_t} \right| \] (3.7)
Another criterion used is the NMSE or Theil’s U-statistic, used by [103] and [104]. This criterion
is calculated by means of equation 3.8, which is proposed by [102]:
\[ NMSE_j = \frac{\sum_{t=m\tau+1}^{T} (\hat{x}_t^j - x_t)^2}{\sum_{t=m\tau+1}^{T} (x_t - x_{t+1})^2} \] (3.8)
The POCID, equations 3.9 and 3.10, gives the percentage of correct direction decisions, i.e. whether
the value of the time series goes up or down in the next time interval.
$$POCID_j = 100 \cdot \frac{\sum_{t=m\tau+1}^{T} D_t}{P} \qquad (3.9)$$
$$D_t = \begin{cases} 1, & (\hat{x}_t - \hat{x}_{t-1})(x_t - x_{t-1}) > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (3.10)$$
The last measure is the ARV, which relates the performance of the model to the mean of the time
series and is given by equation 3.11.
$$ARV_j = \frac{\sum_{t=m\tau+1}^{T} (\hat{x}_t - x_t)^2}{\sum_{t=m\tau+1}^{T} (\hat{x}_t - \bar{x})^2} \qquad (3.11)$$
where $\bar{x}$ is the mean of the real values in the training set.
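As a rough illustration of how the five criteria relate to each other, the sketch below computes them for a pair of short sequences. It is not the thesis code: the function name `forecast_metrics` and the handling of the sequence boundaries (the sums simply stop at the last available pair of samples) are assumptions, and no protection against division by zero is included.

```python
def forecast_metrics(actual, predicted):
    """Return MSE, MAPE, NMSE, POCID and ARV for two equally long sequences."""
    p = len(actual)
    errors = [ph - a for a, ph in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    mse = sse / p
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / p
    # NMSE: squared successive differences of the series in the denominator.
    # Assumption: the sum stops at the last available pair of samples.
    nmse = sse / sum((actual[t] - actual[t + 1]) ** 2 for t in range(p - 1))
    # POCID: share of steps where prediction and series move the same way.
    hits = sum(
        1
        for t in range(1, p)
        if (predicted[t] - predicted[t - 1]) * (actual[t] - actual[t - 1]) > 0
    )
    pocid = 100.0 * hits / p
    mean_actual = sum(actual) / p
    # ARV: squared distance of the predictions to the mean in the denominator.
    arv = sse / sum((ph - mean_actual) ** 2 for ph in predicted)
    return {"MSE": mse, "MAPE": mape, "NMSE": nmse, "POCID": pocid, "ARV": arv}


metrics = forecast_metrics([1.0, 2.0, 4.0, 7.0], [1.0, 3.0, 3.0, 8.0])
print(metrics)
```

MSE, MAPE, NMSE and ARV are error measures (lower is better), whereas POCID is a hit rate (higher is better), which is why POCID appears in the numerator of the fitness functions.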
The author of [102] proposes four fitness functions using the criteria detailed above. These are
equations 3.13, 3.14, 3.15 and 3.16.
$$fitness_j = \frac{POCID_j}{1 + MSE_j + MAPE_j + NMSE_j + ARV_j} \qquad (3.13)$$
$$fitness_j = \frac{1}{1 + MSE_j} \qquad (3.14)$$
$$fitness_j = \frac{POCID_j}{1 + MSE_j} \qquad (3.15)$$
$$fitness_j = \frac{POCID_j}{1 + NMSE_j} \qquad (3.16)$$
These fitness functions are used to calculate the performance of each chromosome. The best
chromosome is the one with the highest fitness value. The highest possible value is 100 for
equations 3.13, 3.15 and 3.16, and 1 for equation 3.14.
The final fitness function is obtained by means of equation 3.18, where the fitness is multiplied by
an exponential whose argument depends on the preferred number of time series arguments that
could appear in the chromosome.
$$fitness_j^{final} = fitness_j \cdot e^{-|L_{pref} - L|} \qquad (3.18)$$
where $L_{pref}$ is the preferred number of time series arguments and $L$ is the actual size of the
chromosome. The exponential equals 1 when a chromosome has exactly the desired number of time
series arguments, in which case the maximum contribution is obtained.
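A minimal sketch of this penalty, using the POCID/MSE fitness of equation 3.15 as the base value, could look as follows. The function and parameter names are illustrative assumptions and do not come from the thesis code.

```python
import math


def fitness_pocid_mse(pocid, mse):
    """Fitness as in (3.15): POCID divided by 1 + MSE."""
    return pocid / (1.0 + mse)


def penalised_fitness(base_fitness, preferred_args, actual_args):
    """Scale the fitness by exp(-|preferred - actual|), as in (3.18)."""
    return base_fitness * math.exp(-abs(preferred_args - actual_args))


base = fitness_pocid_mse(pocid=50.0, mse=0.75)
print(penalised_fitness(base, preferred_args=3, actual_args=3))  # no penalty
print(penalised_fitness(base, preferred_args=3, actual_args=5))  # scaled by exp(-2)
```

Because the factor never exceeds 1, it can only lower the fitness, so the upper bounds of 100 and 1 stated above still hold.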
As can be seen, the fitness equations proposed here differ from those of the author of [27]. The
reason is that, as shown in equation (4) of [27], the result can be negative if the error (the result of
equation (3) of the same author) is much higher than the variance. In [27], the percentage of the
total variance of the training set is taken as the fitness for the RWS. With a negative fitness, it is
impossible to calculate the probability of that chromosome being selected by the RWS. Authors
such as [42] and [43] propose similar equations to calculate the fitness. Even though these equations
have the same drawback, these authors propose the RWS selection method.
From [27] it could be deduced that the values are normalised in equation 4 before subtracting 1.
The problem with applying this normalisation is that proportionality disappears, so the RWS does
not work properly. For that reason, different fitness equations based on different parameters are
proposed here. Nevertheless, the approach proposed by [27] to avoid many time series arguments
in the chromosome (equation 3.18) is taken into account.
3.4.4 Generation of N random chromosomes
An initial generation of N random chromosomes is created. These chromosomes are generated by
a function called chromosome_genr.m, which randomly selects two arguments, from the time series
or numerical, for the first two loci of the chromosome. The following positions are then generated
randomly by selecting arguments, functions or operators, up to the last position, for which an
operator is always selected (see Figure 26). This function creates as many chromosomes of the
preferred length as indicated in the input. The input also includes the vector with the first time
series, corresponding to the first sample of the training set. This process, summarised in Figure 26,
always ensures that the first rule of the encoding process is met.
[Flowchart: chromosome generator chromosome_genr.m. Random generator of values, operators and
functions; selection of the first and second argument of the chromosome; random selection of an
argument, operator or function until the length of the chromosome minus 1 is reached; selection of
an operator for the last position; repeated until the number of chromosomes equals N; output: N
chromosomes.]
Figure 26: Chromosome generator function
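The generation flow of Figure 26 can be sketched in Python as shown below. The thesis implementation is the MATLAB function chromosome_genr.m, whose code is not reproduced here, so the token sets, the `A<k>` naming of time series arguments, the numeric bounds and all function and parameter names are assumptions made for illustration. The sketch only enforces what the figure states: two arguments first, random tokens in between and an operator in the last position.

```python
import random

OPERATORS = ["+", "-", "*", "/"]          # assumed binary operator set
FUNCTIONS = ["exp", "log", "sin", "cos"]  # assumed unary function set


def random_argument(rng, n_lags, p_number, lower, upper):
    """Return a numeric constant or a time series argument such as 'A3'."""
    if rng.random() < p_number:
        return rng.uniform(lower, upper)
    return "A%d" % rng.randint(1, n_lags)


def generate_chromosome(rng, length, n_lags, p_number=0.5, lower=-10.0, upper=10.0):
    """Build one random postfix chromosome of the given length (length >= 3)."""
    # The first two loci are always arguments.
    genes = [random_argument(rng, n_lags, p_number, lower, upper) for _ in range(2)]
    # Middle loci: argument, operator or function, chosen at random.
    while len(genes) < length - 1:
        kind = rng.choice(["argument", "operator", "function"])
        if kind == "argument":
            genes.append(random_argument(rng, n_lags, p_number, lower, upper))
        elif kind == "operator":
            genes.append(rng.choice(OPERATORS))
        else:
            genes.append(rng.choice(FUNCTIONS))
    # Last locus: an operator, as in Figure 26.
    genes.append(rng.choice(OPERATORS))
    return genes


def generate_population(n, length, n_lags, seed=None):
    """Generate n random chromosomes with a reproducible random stream."""
    rng = random.Random(seed)
    return [generate_chromosome(rng, length, n_lags) for _ in range(n)]


population = generate_population(n=5, length=9, n_lags=3, seed=1)
```

Chromosomes produced this way are not guaranteed to be valid postfix expressions, which is why a repair step is still needed afterwards.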
The chromosome may need to be repaired because it is generated randomly and, therefore, only the
first constraint of the encoding process is guaranteed to be fulfilled. As the other rules could be
violated, it is necessary to verify whether the second and third rules are satisfied. If they are not,
the chromosome is repaired so that it conforms to all the rules.
It may seem that this process would not be necessary if the chromosomes were generated
conforming to the pre-set rules. This would be true if there were no crossover and mutation
processes. However, after these two processes it is necessary to verify again whether the
chromosomes meet the rules and, otherwise, to repair them. Therefore, the repair function has two
steps: first it verifies whether the three rules are met and then, if they are not, it repairs the
chromosome.
3.4.5 Calculate the fitness by means of the fitness function for each one of the
chromosomes
Once the chromosomes have been generated and repaired, the performance of each chromosome
has to be calculated. Before the fitness calculation, the chromosome set needs to be generated. This
is because the current population of chromosomes only contains arguments from the first vector of
the time series (corresponding to the first training set sample).
3.4.5.1 Generation of the chromosome set
In order to generate the chromosome set, the complete population (N chromosomes) is taken.
The chromosomes are then copied, and every time series argument that appears in a chromosome is
shifted by one position. This operation is repeated as many times as there are samples in the
training set. One example of this operation is illustrated in Table 5 for a time series
$[x_t]_{t=1}^{T}$, a population of two chromosomes of length 9 and a training set of ten samples.
Chromosome Genotype Training set sample
Chromosome 1: Chromosome 2:
4 '�� + 5 P'J - 5 vLb / '�} '�� 5 + / ILM 1 P'J ×
First sample
Chromosome 1: Chromosome 2:
4 '$] + 5 P'J - 5 vLb / '�� '�� 5 + / ILM 1 + ×
Second sample
. . . Chromosome 1: Chromosome 2:
4 '$� + 5 P'J - 5 vLb / '$} '$� 5 + / ILM 1 P'J ×
Tenth sample
Table 5: Chromosome set for a two chromosome population and a training set of ten samples
This process is carried out in order to obtain the phenotype, i.e. the calculated value of the
chromosome.
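The construction of the chromosome set can be sketched as follows. The `A<k>` token format for time series arguments is borrowed from the MATLAB output shown later in this chapter; the function names and the example indices are hypothetical and only illustrate the one-position shift per training set sample.

```python
def shift_chromosome(genes, offset):
    """Copy a chromosome, advancing every time series index by `offset`."""
    shifted = []
    for gene in genes:
        if isinstance(gene, str) and gene.startswith("A") and gene[1:].isdigit():
            shifted.append("A%d" % (int(gene[1:]) + offset))
        else:
            shifted.append(gene)  # numbers, operators and functions are unchanged
    return shifted


def chromosome_set(population, n_samples):
    """One shifted copy of the whole population per training set sample."""
    return [[shift_chromosome(c, s) for c in population] for s in range(n_samples)]


# Hypothetical population of one chromosome of length 9, ten training samples.
population = [[4, "A19", "+", 5, "exp", "-", 5, "cos", "/"]]
cset = chromosome_set(population, n_samples=10)
print(cset[0][0][1], cset[1][0][1], cset[9][0][1])
```

Only the time series arguments change between samples, so the structure of each function stays the same across the whole training set.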
3.4.5.2 Calculation of the chromosome phenotype
The next step is to calculate the value of the chromosome, i.e. the phenotype. For this process
the function RPN.m is used, which transforms the postfix notation into infix notation. The
phenotype is calculated whilst it is converting from one notation to the other.
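A stack-based evaluator shows how the value and the infix expression can be built in the same pass. This is a sketch of the general postfix technique, not a transcription of RPN.m: the operator and function sets, the `variables` dictionary and the function name are assumptions, and the restrictions on invalid or excessively large results are left out.

```python
import math

BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}
UNARY = {"exp": math.exp, "log": math.log, "sin": math.sin, "cos": math.cos}


def evaluate_postfix(tokens, variables):
    """Return (value, infix_text) for a postfix token list."""
    stack = []  # each entry is a (value, infix_text) pair
    for tok in tokens:
        if isinstance(tok, str) and tok in BINARY:
            b, b_txt = stack.pop()
            a, a_txt = stack.pop()
            stack.append((BINARY[tok](a, b), "(%s%s%s)" % (a_txt, tok, b_txt)))
        elif isinstance(tok, str) and tok in UNARY:
            a, a_txt = stack.pop()
            stack.append((UNARY[tok](a), "%s(%s)" % (tok, a_txt)))
        elif isinstance(tok, str):
            stack.append((variables[tok], tok))  # time series argument
        else:
            stack.append((float(tok), "%g" % tok))  # numeric constant
    if len(stack) != 1:
        raise ValueError("not a valid postfix expression")
    return stack[0]


tokens = [4, "A1", "+", 5, "exp", "-", 5, "cos", "/"]
value, infix = evaluate_postfix(tokens, {"A1": 2.0})
print(infix)
```

A chromosome that leaves more than one entry on the stack, or that pops from an empty stack, is not a valid expression, which is the kind of case the repair step has to handle.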
Hence, at this point we have, as illustrated in Table 6, the following:
• The chromosome set
• The phenotype of each chromosome in this set
Chromosome Genotype Phenotype Training set sample
Chromosome 1: Chromosome 2:
4 '�� + 5 P'J - 5 vLb / '�} '�� 5 + / ILM 1 P'J ×
11,57 1,83
First sample
Chromosome 1: Chromosome 2:
4 '$]+ 5 P'J- 5 vLb / '��'��5 + / ILM1 ^ ×
16.86 1.70
Second sample
. . . Chromosome 1: Chromosome 2:
4 '$�+ 5 P'J- 5 vLb / '$}'$�5 + / ILM1 P'J×
24.61 3.34
Tenth sample
���=2; ��� � ��� ; ��� � �� ���=3.5; ���= 73; ���= 34 ���=5.7 ���=53.3 ���=10.6 Table 6: Genotype and phenotype of the chromosome set
3.4.5.3 Restrictions in the calculation
During the conversion (from postfix notation to infix notation) and the calculation of the
expression, some restrictions apply. The main restrictions are intended to avoid infinite numbers
or indeterminate forms. Some examples are division by zero, the logarithm of zero or of a negative
number, or any operation leading to an infinite number.
Other restrictions apply when the result is excessively high, since infinite values could then be
reached when computing the criteria in the evaluation process. To prevent this, the maximum result
allowed is |1e+100|.
3.4.5.4 Fitness function
To calculate the performance of each chromosome for the complete set, the function fitness.m is
called. Each function (chromosome) is evaluated in each training set sample. This evaluation
replaces the time series arguments in each training set sample with the corresponding ones
according to the shifted window. This is done by means of the previously calculated chromosome
set, selecting the phenotype of each chromosome of the complete set corresponding to each sample
of the training set. Then, as described above, the fitness of each chromosome is calculated and
multiplied by the established length factor, which gives the final fitness of each chromosome. The
same function sorts the results into ascending and descending lists and saves them together with
the original position in which the chromosomes were passed to the function. This original position
later helps to identify which phenotype corresponds to which genotype in the chromosome matrix
(the list of chromosomes). All the steps explained above for the fitness calculation are illustrated in
Figure 27.
[Flowchart: fitness calculation fitness.m. Generation of the chromosome set (chromosome_genset.m);
calculation of the phenotype (RPN.m); square error calculation and performance calculation, repeated
until the number of calculations equals N; performance ordered in ascending and descending lists.]
Figure 27: Fitness calculation
3.4.6 Elitism process
In order not to lose the best chromosomes during crossover and mutation, the best chromosomes
of the population are kept. This process guarantees that the best chromosomes survive and are
present in the next generation [105]. This process takes the K best chromosomes. The value K is
obtained as $N \times elitism\_rate \in \mathbb{N}_0$, where N is the population size and the
elitism rate is the fraction of elitism, which ranges from 0 to 1. The best K chromosomes are
therefore taken from the top positions of the descending list created during the fitness calculation
and copied to the new generation. It is worth noting that, even though these chromosomes are
copied to the next generation, they also take part in the subsequent processes of selection,
crossover and mutation.
[Flowchart: fitness calculation (fitness.m); elitism, (N × Elitism_rate) ∈ ℕ₀; insert into the next
generation.]
Figure 28: Elitism process
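The elitism step amounts to taking the first K entries of the descending fitness list. In the sketch below the function name is hypothetical, and rounding N × elitism rate down to an integer is an assumption, since the text only states that K is a natural number.

```python
def select_elite(fitness_values, elitism_rate):
    """Return the original positions of the K fittest chromosomes."""
    n = len(fitness_values)
    k = int(n * elitism_rate)  # assumption: round down to get an integer K
    # Descending list that keeps the original position of each chromosome.
    order = sorted(range(n), key=lambda i: fitness_values[i], reverse=True)
    return order[:k]


elite = select_elite([0.2, 0.9, 0.5, 0.7], elitism_rate=0.5)
print(elite)
```

Returning positions instead of fitness values makes it straightforward to copy the matching genotypes from the chromosome list into the new generation.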
3.4.7 Mating pool
The mating pool is formed by the chromosomes that undergo the processes of selection, crossover
and mutation. This mating pool therefore contains the potential parents for the creation of offspring
[106, p. 4]. It is created to prevent the genetic algorithm from being trapped in a local minimum,
i.e. it tries to add more diversity, since in every generation new chromosomes are created and
introduced into the current generation. The size of the mating pool for the selection of parents is
set by equation 3.19.
$$\text{Mating pool} = (N - K) \times \text{mating size} \in \mathbb{N} \qquad (3.19)$$
where N is the population size, K is the number of elitism chromosomes and mating size is the
proportion of the mating pool, which can range from 0 to 1. A mating size of 1 means that no new
chromosomes are generated in each generation. The number of new chromosomes generated and
introduced into the new generation is thus determined by equation 3.20.
$$\text{New chromosomes} = N - (\text{offspring} + K) \in \mathbb{N} \qquad (3.20)$$
On the other hand, another approach, given by [107, pp. 49-74], to reduce the chances of local
minima consists in randomly selecting the less fit chromosomes. In that case, the remaining
chromosomes are selected from the bottom positions of the descending list instead of generating
new chromosomes.
This process is implemented by randomly selecting chromosomes from the middle of the
descending list to its end. The number of chromosomes to be selected is calculated with the same
equation 3.20.
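[Flowchart: mating pool creation. The elitism (K) chromosomes are inserted into the new generation;
the mating pool has (N-K) × mating size chromosomes; if Mating_size ≠ 1, either new chromosomes
are generated or the least fit individuals are selected, and these are inserted into the new
generation.]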
Figure 29: Mating pool creation
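The bookkeeping of equations 3.19 and 3.20 can be checked with a few lines of Python. The function name is hypothetical, and two assumptions are made: the product is rounded down to an integer, and the number of offspring equals the size of the mating pool.

```python
def generation_sizes(n, k, mating_size):
    """Return (mating pool size, number of new chromosomes)."""
    pool = int((n - k) * mating_size)      # (3.19), rounded down (assumption)
    offspring = pool                       # assumption: one offspring per parent
    new_chromosomes = n - (offspring + k)  # (3.20)
    return pool, new_chromosomes


print(generation_sizes(n=100, k=10, mating_size=1.0))  # no new chromosomes
print(generation_sizes(n=100, k=10, mating_size=0.5))  # half of the rest is new
```

With a mating size of 1 the whole non-elite part of the population comes from the mating pool, which matches the statement that no new chromosomes are generated in that case.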
3.4.8 Selection process
The selection process, also known as reproduction, consists in randomly choosing members
of the population for the mating pool, as explained in section 2.2.2.4.3. The selection methods
chosen for the implementation are the roulette wheel selection (RWS), the rank-based roulette
wheel selection (RRWS) and the exponential selection. The roulette wheel presented in section
2.2.2.4.3 is the initial approach followed by [27]. In order to observe the impact of the selection
method, two further methods are also proposed: RRWS and exponential selection.
The RRWS method is linear and therefore the selection probability is proportional to the position
that the chromosome occupies in the ranking. With this method, the proportionality to the fitness
that the roulette wheel selection has is lost, but the worst chromosomes are given higher chances
of being chosen. In contrast, the exponential selection method improves the proportionality by
applying exponential weights. In this method, the best chromosomes are favoured and, at the same
time, the least fit chromosomes still have a chance of being chosen. These effects and differences
are analysed in this section, together with the equations used to compute the probability and the
rank.
3.4.8.1 Rank-based roulette wheel selection
In the rank-based roulette wheel selection, a numerical rank is assigned to each chromosome
according to its fitness. The selection is therefore based on this rank and not on the fitness value.
The chromosomes are first ordered according to their fitness and the probabilities are then
calculated from the assigned rank (where the worst chromosome occupies the first position) [37].
This method may avoid premature convergence, but at the cost of a slow convergence, because the
best chromosomes do not differ substantially from the others [44]. It can therefore improve
diversity, because the fittest chromosomes do not dominate to the detriment of the less fit ones.
The probability of chromosome i is given by equation 3.21 [44].
$$P_{rank_i} = \frac{r_i}{\sum_{j=1}^{N} r_j} \qquad (3.21)$$
where $r_i$ is the rank position of chromosome i and N is the number of chromosomes.
This equation is the same as equation 22, but the rank and the cumulative sum of the ranks are used
instead of the fitness and the cumulative sum of the fitness.
In linear ranking selection, the probability of being chosen can be controlled by the selective
pressure (SP) [44], i.e. the pressure of competition to survive and have offspring. Following the
Darwinian theory of natural selection, increasing the SP leads over time to the selection of those
individuals (chromosomes) with a better fitness and to the extinction of those with a lower fitness.
Nevertheless, it must be taken into account that a high SP could lead to a fast convergence.
Before applying the linearly scaled ranking, the chromosomes need to be sorted by their fitness,
with the least fit chromosome in the first position and the fittest in the last position of the ranking.
The rank of each chromosome in linear ranking selection is calculated with equation 3.22 [44] [41].
$$Rank(Pos) = 2 - SP + \left( 2 \cdot (SP - 1) \cdot \frac{Pos - 1}{n - 1} \right) \qquad (3.22)$$
$$2.0 \ge SP \ge 1.0$$
where n is the number of chromosomes and Pos is the position of the chromosome.
The effect of the SP on the probabilities is illustrated in Figure 30 where 100 chromosomes are
created. A fitness value from 1 to 100 respectively is assigned to these chromosomes in order to
analyse clearly the consequence of changing the selective pressure.
Figure 30: Effect of the SP on the probability
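The scenario behind Figure 30 can be reproduced approximately with the sketch below, which evaluates the linear ranking formula of equation 3.22 and normalises the result into selection probabilities. The function name is an assumption. Position 1 receives the lowest weight, 2 - SP, so the sketch treats position 1 as the least fit chromosome and position n as the fittest.

```python
def linear_rank_probabilities(n, sp):
    """Selection probability per position (1 = least fit, n = fittest)."""
    if not 1.0 <= sp <= 2.0:
        raise ValueError("the selective pressure must lie between 1.0 and 2.0")
    ranks = [
        2.0 - sp + 2.0 * (sp - 1.0) * (pos - 1) / (n - 1)
        for pos in range(1, n + 1)
    ]
    total = sum(ranks)  # the rank values add up to n
    return [r / total for r in ranks]


flat = linear_rank_probabilities(100, sp=1.0)   # every position equally likely
steep = linear_rank_probabilities(100, sp=2.0)  # position 1 is never selected
print(flat[0], steep[0], steep[-1])
```

With SP = 1 the selection is uniform, and with SP = 2 the probability grows linearly from zero for the first position to 2/n for the last one.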
3.4.8.2 Exponential ranking wheel selection
The exponential ranking proposed by [108, p. 34] consists in using exponential weights to control
the probability of being chosen. The base C is the parameter that controls the degree of the
exponential behaviour; the closer it is to one, the weaker the exponential behaviour. This method
permits a higher selective pressure than the RWS, favouring the chromosomes with a better fitness
while the least fit chromosomes still have a chance of being chosen.
Different authors propose several equations, such as [109], [108, p. 34] or [41, p. 10]. The one used
for this implementation is equation 3.23, proposed by [108, p. 34].
$$Rank(Pos) = C^{\,n - Pos} \qquad (3.23)$$
$$0 < C < 1$$
The same scenario described previously for the rank-based selection is used for the
exponential ranking. The 100 chromosomes are ranked and the exponential weight is changed to
observe the effect on their probabilities. This is illustrated in Figure 31.
Figure 31: Effect of C on the probabilities
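A comparable sketch for the exponential ranking is given below. It assumes weights of the form C^(n - Pos), the usual form of exponential ranking, with position n as the fittest chromosome. The function name and this reading of the exponent are assumptions.

```python
def exponential_rank_probabilities(n, c):
    """Selection probability per position (1 = least fit, n = fittest)."""
    if not 0.0 < c < 1.0:
        raise ValueError("the base C must lie strictly between 0 and 1")
    weights = [c ** (n - pos) for pos in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


probs = exponential_rank_probabilities(100, c=0.7)
print(probs[-1], probs[0])
```

For C = 0.7 and 100 chromosomes the fittest position receives a probability of roughly 1 - C = 0.3, while the least fit positions are practically never selected. Values of C closer to 1 flatten the distribution.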
3.4.8.3 Example of effects on the selection method
The differences between these two methods can be analysed in Figure 32, which depicts the
probabilities of being chosen for 100 numbers, from 1 to 100, with the two methods and different
values of the selective pressure and the exponential weight. As can be seen, with the exponential
ranking the best chromosomes obtain higher probabilities of being chosen (of surviving), whereas
the probabilities of the worst fitted chromosomes are close to zero. With the rank-based method,
on the other hand, the worst fitted chromosomes are given higher probabilities of surviving.
Figure 32: Ranking vs Exponential
Another scenario is proposed in which 100 chromosomes are created randomly and the three
different methods are used: roulette wheel, rank-based and exponential ranking. Unlike the previous
scenario, the chromosomes are created randomly and therefore the fitness and rank values do not
increase proportionally. The probabilities depend on the fitness or rank, and two or more
chromosomes can have the same rank or fitness value. Figure 33 shows how the rank-based method
assigns higher probabilities to the chromosomes with the worst fitness or rank. In contrast, with
the roulette wheel or the exponential ranking the probabilities are zero or close to zero for those
chromosomes. Moreover, it can be seen that with the ranking selection method there are no large
differences between the worst and the best chromosomes in terms of probabilities. For example,
the probabilities of chromosome 1 and chromosome 93 are 0.18 % and 3.57 % respectively. On the
other hand, the exponential ranking (with C = 0.7) and the roulette wheel differ more for the best
fitted chromosomes, such as 45, 47 and 93, which have higher probabilities with the roulette wheel
because these are proportional to their fitness value. In addition, chromosomes such as 50, 76 and
96 obtain higher probabilities with the exponential ranking than with the roulette wheel. This is
because the probabilities are better distributed among the best chromosomes.
Figure 33: Three selection method example
3.4.9 Crossover operator
The main operator working on the parents is the crossover operator that creates two offspring
combining the genotype of both parents with a certain probability of crossover. As it is explained
in section 2.2.2.4.4.1, a locus is randomly selected and the genes before and after that point are
interchanged to create two offspring.
The mating pool may have an odd number of parents, in which case one parent would be left
without a partner for the crossover. Hence, in case of an odd number, the most frequent parent is
found, removed from the mating pool and introduced as an offspring. Thus, even though this
chromosome does not go through the crossover operator, it is still present in the mating pool for
the mutation operator.
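The single-point crossover and the handling of an odd mating pool can be sketched as follows. The function names, the representation of the mating pool as a list of parent positions and the default crossover probability are assumptions. The sketch also assumes parents of equal length.

```python
import random
from collections import Counter


def set_aside_odd_parent(pool):
    """If the pool is odd, remove one copy of its most frequent parent."""
    if len(pool) % 2 == 0:
        return list(pool), None
    most_frequent = Counter(pool).most_common(1)[0][0]
    remaining = list(pool)
    remaining.remove(most_frequent)  # removes the first occurrence only
    return remaining, most_frequent


def single_point_crossover(parent_a, parent_b, rng, pc=0.9):
    """Swap the tails of two equally long parents with probability pc."""
    if len(parent_a) < 2 or rng.random() >= pc:
        return list(parent_a), list(parent_b)
    point = rng.randint(1, len(parent_a) - 1)  # locus where the genes are cut
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


rng = random.Random(0)
pairs, spare = set_aside_odd_parent([3, 1, 3, 2, 3])
child_a, child_b = single_point_crossover(list("AAAAAA"), list("BBBBBB"), rng, pc=1.0)
print(pairs, spare, "".join(child_a), "".join(child_b))
```

The parent that is set aside skips the crossover but can still be passed on to the mutation step, as described above.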
3.4.10 Mutation operator
The mutation operator consists in random changes in the genes of the offspring, which adds more
diversity to the search process and therefore allows other solutions to be found. This process is
performed by interchanging the values of two genes placed at different loci. Two loci of the
chromosome are randomly selected, the implementation ensures that they are different, and their
values are interchanged.
The mutation is applied with a certain probability, called the mutation probability, which is
normally low (values of 0.01 [27] or 0.1 [42]) because otherwise the search would become a random
process. This is also demonstrated in [110, p. 21].
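A swap mutation of this kind could be written as follows. The function name is hypothetical, and applying the mutation probability once per chromosome (rather than once per gene) is an assumption.

```python
import random


def swap_mutation(chromosome, rng, pm=0.01):
    """With probability pm, interchange the genes at two distinct loci."""
    mutated = list(chromosome)
    if len(mutated) >= 2 and rng.random() < pm:
        i, j = rng.sample(range(len(mutated)), 2)  # two different loci
        mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


rng = random.Random(3)
original = [1, 2, 3, 4, 5]
mutant = swap_mutation(original, rng, pm=1.0)
print(mutant)
```

Because the genes are only interchanged, the mutated chromosome contains the same tokens as the original one, but their new order may break the encoding rules, so the repair check applies again afterwards.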
3.4.11 New population
In this process, the offspring created by the crossover and mutation processes are put together
with the elitism chromosomes to form the new generation. Moreover, if the mating pool size is not
1, either new chromosomes are generated and inserted into this new generation or the reserved
chromosomes are inserted instead, as explained in section 3.4.7.
Finally, it is necessary to calculate the fitness of this new population. If the GA is still running (the
stop criterion has not been met), the previous processes are repeated.
[Diagram: offspring, elitism chromosomes and new chromosomes are combined into the new
population.]
Figure 34: New population
3.4.12 Stopping criteria and error evaluation
Some criteria are used to measure the error during the training set and after the prediction.
These criteria are the MAPE (Mean Absolute Percentage Error), the MAE (Mean Absolute Error)
and the MSE. The MAPE and the MSE were explained in previous sections and are now used for the
prediction instead of the training set evaluation. These indicators are used following the criteria
proposed by several authors such as [111], [112] and [113].
$$MAE = \frac{1}{P} \sum_{t=T+1}^{N} |\hat{x}_t - x_t| \qquad (3.24)$$
where N is the prediction horizon, $\hat{x}_t$ is the sample predicted with the best chromosome,
$x_t$ is the original sample, T is the last sample of the training set and P is the number of samples
predicted.
$$MSE = \frac{1}{P} \sum_{t=T+1}^{N} (\hat{x}_t - x_t)^2 \qquad (3.25)$$
$$MAPE = \frac{100\,\%}{P} \sum_{t=T+1}^{N} \left| \frac{\hat{x}_t - x_t}{x_t} \right| \qquad (3.26)$$
The MAPE is taken as a criterion to stop the algorithm: a threshold is set for it along with a
maximum number of generations. These two values are set before the algorithm is executed.
3.4.13 Prediction
Once the GA finishes because the stop criterion has been reached, the best chromosome of the
last generation is taken as the best function. The prediction is done by shifting the window one
sample at a time until the last sample of the prediction horizon has been predicted. Before the first
sample is predicted, the phenotype and genotype of the best chromosome are available for each
sample of the training set. In order to make the prediction, it is sometimes necessary to shift and
create the variables $[x_t]_{T+1}^{T+N-1}$ to evaluate the chromosome. The creation of these
variables is necessary if the best chromosome has time series arguments in its function. This is
because the first predicted sample could use the last sample of the training set as an argument. It
would therefore be necessary to calculate the phenotype of the first sample, which has already been
predicted, in order to predict the second one, and so forth until the next-to-last sample.
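The shifting-window prediction corresponds to a recursive multi-step forecast, which can be sketched as follows. The function name, the representation of the best chromosome as a Python callable and the linear extrapolation used as an example are assumptions.

```python
def forecast(f, history, m, tau, horizon):
    """Predict `horizon` samples, feeding each prediction back as an input.

    f       -- callable taking [x(t-tau), x(t-2*tau), ..., x(t-m*tau)]
    history -- known samples, the last one being the end of the training set
    """
    window = list(history)
    predictions = []
    for _ in range(horizon):
        lags = [window[-k * tau] for k in range(1, m + 1)]
        nxt = f(lags)
        predictions.append(nxt)
        window.append(nxt)  # the predicted sample becomes an argument later on
    return predictions


# Hypothetical best chromosome: linear extrapolation from the last two samples.
extrapolate = lambda lags: lags[0] + (lags[0] - lags[1])
predicted = forecast(extrapolate, [1, 2, 3], m=2, tau=1, horizon=3)
print(predicted)
```

From the second predicted sample onwards the inputs include earlier predictions, so errors can accumulate as the prediction horizon grows.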
3.4.14 Interface and parameters
The interface is the document where the main parameters and functions of the genetic
algorithm are set and can be changed later easily. The main parameters that could be changed are
expressed in Table 7.
Explanation | Variable
Embedding dimension | embed_dimen
Delay time | delay_time
Number of chromosomes | N
Length of the training set | training_set
Number of generations | generation
Tolerance of the error measured from MAPE | quality_error
Elitism percentage | Elitism
Mating pool size | mating_pool
Mutation probability | pm
Crossover probability | pc
Number of preferred time series in the chromosome | lengthZ
If the preference is wanted or not (set 1 to activate and 0 otherwise) | pref
Generation of new chromosomes in each iteration | div
Length of the chromosomes | Long_eq
Selective pressure for ranking selection | SP
Selective pressure for exponential selection | x_fac
Maximum number for the numerical argument (Z) | upperbound
Minimum number for the numerical argument (-Z) | lowerbound
Probability of selecting a number in the chromosome generator, instead of a time series argument | pn
Scale factor that is used for the input data | factor_input
Maximum of the GA data that is going to be used (scaled data) | max_value
Table 7: Main parameters
While the algorithm is running, different parameters and graphs are displayed to the user with the
most important data. This is worth mentioning because some of these parameters and graphs are
shown with their results in the results and conclusions. Table 8 illustrates the main subset of
graphs, which is presented to the user as a single figure. Table 8 (a) shows the function of the best
chromosome in the current generation. That chromosome is evaluated for all the samples in the
training set and depicted together with the original samples that it tries to predict. The X-axis
represents the corresponding samples of the time series and the Y-axis the scaled value of the time
series. Table 8 (b) presents the SSE of the best chromosome in each generation. Table 8 (c) shows
the sum of the fitness of the population in each generation, which makes it possible to observe
whether the population is improving. Finally, Table 8 (d) shows the fitness value of the best
chromosome in each generation.
(a) (b)
(c) (d)
Table 8: Set of graphs for the user
In addition, some important parameters are displayed in MATLAB in each generation, such as the
current generation, the SSE value, the fitness value, the genotype of the best chromosome for the
first training set sample and its expression (infix notation). Some statistical parameters are also
shown, namely the MSE, MAE and MAPE. These statistical parameters are calculated over the
training set. Therefore, in order to differentiate them from the final ones, they are presented as
MSE_bp (MSE before prediction), MAE_bp (MAE before prediction) and MAPE_bp (MAPE before
prediction).
Generation 50
best_value_fit = 57,09119
best_value_SSE = 12.026125
The best chromosome is
[9.878634814689685e+02] [3.998763627742979e+02] '*' 'exp' 'sin' 'A15' '-'
MSE_bp =0.697213
MAE_bp =0.665095
MAPE_bp =5.09633 %
Expression = '(sin((987.8635*399.8764))-A15)'
When the algorithm finishes and the best chromosome has been selected for the prediction, a zoom
of the training set result for this chromosome is shown along with the original samples.
Figure 35: Best chromosome for the prediction
Finally, the prediction is performed and its results (samples) are shown along with the training
set, as can be seen in Table 9 (a). Table 9 (b) shows the expression and the statistical parameters of
this prediction (MSE, MAE and MAPE). As in Table 8 (a), the X-axis of Table 9 (a) represents the
corresponding samples of the time series, but the Y-axis in this case corresponds to the real value
of the time series. It is noteworthy that the statistical results are from the prediction samples along
with the training set.
The statistical parameters shown at the end of the prediction correspond only to the calculation of
the prediction, excluding the training set. This makes it possible to compare the statistical
parameters of the training set and of the prediction afterwards.
As the MSE and the MAE are not relative parameters and therefore depend on the scale being used,
the prediction is performed in the same scale as the training set.
Finally, the prediction can optionally be displayed in the real scale, as shown in Table 9 (c).
(a)
MSE =2.113402
MAE =1.378817
MAPE =6.173257 %
Expression = '(sin((987.8635*399.8764))-A15)'
(b)
(c)
Table 9: Prediction and statistical results
3.5 Summary
In order to study the TCP available bandwidth, 4 different scenarios with different situations,
with and without PUs, will be tested. To do so, NS2 is configured with 802.11g parameters and the
MAC implementation is modified to include the PU spectrum management, so that packets are
dropped if a PU is using the medium. The study does not use an ad hoc routing agent. Traces of real
traffic can be played back in NS2. The scenarios are implemented by programming a Tcl script.
The GA is implemented in MATLAB 2013. Different fitness functions are implemented in order to
test them and find the best one for the data to be analysed. Moreover, two diversity methods are
applied to prevent the genetic algorithm from getting stuck in a local minimum.
Chapter 4 Evaluation
In this chapter, we evaluate our approaches using different scenarios. First, we perform several
simulations to evaluate the impact of different Primary User patterns on TCP throughput. Then,
we evaluate how well our genetic algorithm can predict irregular time series such as TCP
throughput samples.
4.1 Introduction
This chapter presents the most relevant results of the tests carried out with the implementation
built during the thesis. During the evaluation we carried out a large number of tests in different
scenarios. First, the chapter presents the results and analysis of the available TCP bandwidth and
MAC busy time for different ON-OFF PU activity patterns. Later, the forecasting performance of
the genetic algorithm is evaluated. Finally, we analyse how well the genetic algorithm can predict
the available TCP bandwidth for different ON-OFF PU activity patterns, also using real traffic
traces.
4.2 Evaluation of available TCP bandwidth for different ON/OFF PU activity patterns
This section tries to analyse the available throughput at TCP layer based on a given PU activity
pattern. The results are presented in some cases in comparison with UDP performance using the
same situations. The relation between TCP throughput and MAC layer busy time is also evaluated.
Three different scenarios have been used, which are detailed in section 3.2. The results presented
in this section are grouped into three main groups:
• TCP available bandwidth with fixed non-random patterns of PU activity
• TCP available bandwidth with random generated patterns of PU activity
• TCP available bandwidth using real traffic traces for Secondary Users
4.2.1 Deterministic patterns
In this set of tests, fixed patterns of PU activity are used to study the response of the
TCP available throughput to different PU activity times. To do so, the multiplication by the
exponential random factor in the generation of the PU activity pattern is disabled.
Although this test case rarely resembles reality, it is a good basis to test the proper functioning of
the simulation and to draw interesting conclusions. These conclusions clarify several situations
that may be very difficult to detect with randomly generated patterns. The available bandwidth
shown here is the average value over the whole simulation time. For all these tests, each simulation
run has a length of 100 seconds.
4.2.1.1 50 percent fixed ON-OFF rate
In this scenario, we assume that PUs are active 50% of the simulation time. We vary the time
that PUs are active by setting alpha equal to 2 times beta. As a result, the shape of the pattern is
periodic.
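The relation between alpha, beta and the PU ON percentage can be illustrated with a minimal Python sketch. It assumes, as inferred from the ratios quoted in this chapter and not from the simulator code, that alpha is the period of the pattern and beta is the ON time at the start of each period. The function names are hypothetical.

```python
def pu_is_on(t, alpha, beta):
    """Return True if the PU is active at time t (in seconds).

    Assumption: the pattern repeats every `alpha` seconds and the PU is ON
    during the first `beta` seconds of each period.
    """
    return (t % alpha) < beta


def on_fraction(alpha, beta, sim_time=100.0, step=0.001):
    """Estimate the fraction of the simulation time with the PU ON."""
    n_steps = int(round(sim_time / step))
    on_steps = sum(1 for i in range(n_steps) if pu_is_on(i * step, alpha, beta))
    return on_steps / n_steps


# alpha = 2 * beta and alpha = 4 * beta, as in the two fixed-rate scenarios
half_on = on_fraction(alpha=2.0, beta=1.0)
quarter_on = on_fraction(alpha=4.0, beta=1.0)
print(half_on, quarter_on)
```

Under this assumption, the two settings give ON fractions close to 0.5 and 0.25 over a 100 second run.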
Figure 36: Available bandwidth for 50% PU ON
The available throughput for different alpha/beta values is shown in Figure 36. As can be seen, if
the retransmission time after a PU ON period coincides with the OFF periods, the available
throughput is higher. For example, for beta from 0.7 to 1.4, the available bandwidth goes from
zero back to zero, and the maximum is reached for beta = 1.3 (around 9 Mbit/s). This 9 Mbit/s is
close to 50% of the maximum achievable TCP throughput, which is 20 Mbit/s. This behaviour is
documented in APPENDIX B with several figures that may help to understand the
situation.
The reason why the throughput is zero at some points is that, as the ON-OFF pattern is periodic,
once a retransmission coincides with a PU ON period, every subsequent retransmission also coincides with PU ON
activity (even though the retransmission time doubles with every retransmission). Therefore, the
resulting throughput is zero, as no packets are received. In this case, the throughput is 0 whenever
beta = i × 0.7, with i = 2, 4, …, because the sender never has the chance to transmit.
In contrast, UDP has a very similar available bandwidth for all the tests (all alpha/beta
combinations). However, the throughput increases marginally with beta.
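The zero-throughput points described in this section can be reproduced with a toy model of the retransmission timer. The sketch below is only an illustration under strong assumptions: the PU is ON during the first beta seconds of every alpha-second period, the RTO simply doubles after each failure, and RTT estimation, RTO bounds and MAC details are ignored. The function name and the numeric values are hypothetical and are not taken from the NS2 set-up.

```python
def first_successful_retransmission(loss_time, initial_rto, alpha, beta,
                                    max_attempts=10):
    """Return (attempt, time) of the first retransmission that falls in a
    PU OFF period, or None if all attempts coincide with PU ON activity."""
    t = loss_time
    rto = initial_rto
    for attempt in range(1, max_attempts + 1):
        t += rto                      # wait one RTO, then retransmit
        if (t % alpha) >= beta:       # PU is OFF: the packet can get through
            return attempt, t
        rto *= 2                      # exponential back-off after a failure
    return None


# Period of 2 s with the PU ON for the first second of each period.
# With an initial RTO of 2 s, every retry lands at the same phase (ON).
locked = first_successful_retransmission(0.5, 2.0, alpha=2.0, beta=1.0)
# With an initial RTO of 1 s, the first retry already falls in an OFF period.
free = first_successful_retransmission(0.5, 1.0, alpha=2.0, beta=1.0)
print(locked, free)
```

In the first call the doubled RTO values are all multiples of the pattern period, so the retransmission phase never changes. This is the mechanism by which a periodic pattern can starve TCP even though the medium is idle half of the time.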
[Figure 36 plot: available bandwidth (bit/s) versus beta (0.1 to 2.5) for UDP and TCP]
4.2.1.2 25 percent fixed ON rate
In this scenario, combinations where PU traffic is ON for 25% of the time are studied, resulting in
alpha equal to 4 times beta.
Figure 37: Available bandwidth for 25% PU ON
The available bandwidth for 25% of PU ON activity is depicted in Figure 37. The figure shows a
shape similar to the one in the previous section. The maximum available bandwidth for TCP is
close to 15 Mbit/s; that is, 75% of the UDP throughput can be achieved, given that the maximum possible
is 20 Mbit/s.
4.2.1.3 Fixed alpha and different beta values
This section shows how the available TCP bandwidth evolves as beta changes while alpha is
fixed. There are two cases: one with very small alpha and beta values (see Figure
38), and another with high values (see Figure 39).
[Figure 37 plot: available bandwidth (bit/s) versus beta (0.1 to 1.2) for TCP and UDP]
Figure 38: Available bandwidth for alpha equal to 0.0768 and different beta values
Figure 39: Available bandwidth for alpha equal to 2.8 and different beta values
Figure 38 and Figure 39 show that TCP has very low throughput compared to UDP when the PU
OFF periods are very short. This is because the TCP congestion window takes time to grow,
whereas UDP always sends at the maximum possible rate if the medium is idle. Additionally,
as can be seen in Figure 39, the retransmission time has an important effect on the throughput,
making the available TCP bandwidth zero for ON-OFF patterns with less than 50% OFF time. The
reasons for this zero available bandwidth are the same as in 4.2.1.1. To sum up, idle time is not
always usable by TCP.
[Figure 38 and Figure 39 plots: available bandwidth (bit/s) versus beta for UDP and TCP; the beta axis of the second plot runs from 0.01 to 2.8]
4.2.1.4 Wide range of alpha and beta values
In order to have a broad view of the behaviour of the available TCP throughput, a series of
tests with a very wide range of alpha and beta values has been carried out. Varying alpha and
beta from 0.1 to 5 with a step size of 0.1 gives a total of 2500 combinations, and each
combination requires a simulation. Each simulation has a duration of 100 seconds.
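As a quick check of the size of this sweep, the alpha/beta grid can be enumerated as follows. This is a minimal sketch; the function name is hypothetical, and integer tenths are used to avoid accumulating floating-point error in the step.

```python
def alpha_beta_grid(start_tenths=1, stop_tenths=50):
    """Enumerate all (alpha, beta) pairs from 0.1 to 5.0 in steps of 0.1."""
    values = [k / 10 for k in range(start_tenths, stop_tenths + 1)]
    return [(alpha, beta) for alpha in values for beta in values]


grid = alpha_beta_grid()
total_simulated_seconds = len(grid) * 100  # 100 s per simulation run
print(len(grid), total_simulated_seconds)
```

Each axis has 50 values, so the grid holds 50 × 50 = 2500 combinations, which corresponds to 250000 seconds of simulated time for one pass over the grid.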
4.2.1.4.1 Available bandwidth
The maximum throughput achieved by TCP is 20 Mbit/s, while UDP achieves 28.5 Mbit/s. This
maximum available bandwidth is achieved when beta is very small and alpha is very large,
meaning an almost idle channel. These values can be observed in Figure 40.
Figure 40: 3D graphs with the available bandwidth for different alpha/beta combinations and non-random patterns
Panels: (a) TCP; (b) UDP; (c) TCP and UDP; (d) TCP and UDP from another perspective
The graphs in Figure 40 show that the TCP throughput is always lower than the UDP throughput, as expected.
Unlike the UDP response, the TCP response does not follow a smooth shape. For combinations where
alpha is close to beta, either TCP is not able to send or the TCP congestion window is not able to
grow to a large value even if there are idle periods. This makes the available bandwidth in this area
very small or zero. Also, the available bandwidth changes abruptly across the alpha-beta plane. This
is related to the retransmission time and the period of the pattern, for the same reasons as
explained in 4.2.1.1.
4.2.1.4.2 MAC busy time
In Figure 41, we show the MAC busy time for different PU patterns, comparing UDP and TCP.
Graphs (c) and (d) in Figure 41 are other perspectives of the same data. A MAC busy time of
zero corresponds to an always idle medium and 1 to a 100% busy time. The highest MAC busy time (100%
usage) is reached when only PUs are using the medium. In the zone where the medium
is shared between the PUs and SUs (that is, when alpha is greater than beta), the busy time clearly
never reaches the maximum. This is a consequence of the back-off time between transmissions and the
RTO. The more time is available for TCP, the more packets are sent and, consequently, the more
back-off periods occur, which are not counted as MAC busy time.
For the same reasons as explained in 4.2.1.1, some idle periods are never used
during the whole simulation. This explains the triangular shape in the graph. The next section
compares the available bandwidth with the MAC busy time, which helps to understand this
section.
Figure 41: 3D graph with the MAC busy time for different alpha/beta combinations and non-random patterns
Panels: (a) MAC layer busy time for TCP; (b) MAC layer busy time for UDP; (c) TCP, other perspective; (d) UDP, other perspective; (e) TCP and UDP
4.2.1.4.3 Available bandwidth vs MAC busy time
In this section, we compare the available throughput with the MAC busy time in order to find a
relation between them. The graphs in Figure 42 show the available throughput
normalised by the maximum achieved, together with the MAC busy time. In both cases, zero means no
utilisation and 1 means 100% utilisation.
Figure 42: 3D graph with the MAC busy time and normalised throughput for different alpha/beta combinations and non-random patterns
Panels: (a) TCP; (b) UDP
Comparing the MAC busy time with the utilisation of the bandwidth, there are some alpha-beta
combinations in which a high percentage of the idle time remains unused. In Figure 42 (a), we can observe
that, for high values of alpha and beta between three and two, the throughput remains constant as
beta decreases. In addition, the busy time decreases as beta decreases towards two. Here, TCP
cannot make use of the time available at the MAC layer. The RTO is responsible for this unused time,
for the reasons related to the periodic PU activity pattern already explained in 4.2.1.1.
4.2.2 Randomly generated patterns
This set of tests is carried out by adding randomness to the generation of the PU activity
patterns and changing the seed for every generated PU activity period. The available bandwidth
shown in this section is the average over the whole simulation time. For all these tests, each
simulation run has a length of 100 seconds.
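A randomised pattern of this kind can be sketched as follows. The exact generator used in the simulations is not reproduced here. The sketch assumes that each nominal ON time (beta) and OFF time (alpha minus beta) is multiplied by an exponentially distributed factor with mean 1, and it uses a single seeded generator for simplicity. All names are hypothetical.

```python
import random


def random_pu_pattern(alpha, beta, sim_time=100.0, seed=1):
    """Return a list of (start, end) PU ON intervals within the simulation.

    Assumption: nominal ON time `beta` and nominal OFF time `alpha - beta`
    are each scaled by an exponential random factor with mean 1.
    """
    rng = random.Random(seed)
    periods = []
    t = 0.0
    while t < sim_time:
        on = beta * rng.expovariate(1.0)
        off = (alpha - beta) * rng.expovariate(1.0)
        on_end = min(t + on, sim_time)
        periods.append((t, on_end))
        t = on_end + off
    return periods


def on_ratio(periods, sim_time=100.0):
    """Fraction of the simulation time during which the PU is ON."""
    return sum(end - start for start, end in periods) / sim_time


# Nominally 50% ON (alpha = 2 * beta), but each seed gives a different ratio.
for seed in (1, 2, 3):
    print(seed, round(on_ratio(random_pu_pattern(2.0, 1.0, seed=seed)), 3))
```

Even with alpha equal to two times beta, the measured ON ratio of a single 100 second run generally differs from 50% and changes with the seed.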
4.2.2.1 50 percent fixed ON-OFF rate
In this set of scenarios, we use alpha equal to two times beta, resulting in 50% of PU active
time. These tests are performed only once. As we will see, in this scenario the random generation
of PU activity has a big impact. Comparing the results of non-random and randomly generated PU
activity patterns in Figure 36 and Figure 43, in the random case there is no clear relation between
specific alpha/beta values and the available throughput. This is due to the effect of randomness. The resulting available
bandwidth is very random for both UDP and TCP because the activity patterns are random. Also,
on a microscopic scale, the ON-OFF pattern may make it look as if there were no relation at all between
alpha/beta values and TCP throughput. This effect will be smaller if the simulation time is longer.
In TCP, some combinations of alpha and beta result in an available bandwidth equal to zero. This
is due to random combinations that make every TCP retransmission attempt coincide with PU ON
activity. The TCP RTO is calculated using (among others) the RTO
back-off multiplicative factor. This factor doubles with every failed retransmission attempt, so it
grows very large if, just by chance, several consecutive
retransmissions fail during the simulation because they coincide with PU ON periods. The randomness of PU ON
activity is the main reason why the available bandwidth is very unpredictable.
Figure 43: Available bandwidth for 50% ON and random generation
There is no clear relation between alpha/beta values and the available throughput, but the trend
shows that the available bandwidth increases as the alpha/beta values do. This is because larger
alpha/beta values mean longer PU OFF periods, which allow the TCP congestion window to grow to
larger values. Therefore, when TCP finally has the chance to transmit, it transmits for longer,
increasing its throughput. In conclusion, an a priori estimation of the available
bandwidth based only on the alpha and beta values will not be reliable.
[Figure 43 plot: available bandwidth (bit/s) versus beta (0.1 to 2.5) for UDP and TCP, with a linear trend line for TCP]
4.2.2.2 25 percent fixed ON
In this scenario, we set alpha equal to 4 times beta, resulting in 25% of PU active time. These
tests are done only once. The same comments as for the 50% case are valid here. Figure 44
shows that alpha equal to four times beta results in higher throughput on average than
alpha equal to two times beta.
Figure 44: Available bandwidth for 25% ON and random generation
4.2.2.3 Fixed alpha and different beta values
In these tests, the alpha value is fixed to 2.8 and the beta value changes from 0.1 to 2.5. Each
test is run only once. As can be seen from Figure 45, the available bandwidth decreases as beta
increases. Comparing the results of non-random versus random generation of PU activity patterns
in Figure 39 and Figure 45, with random generation of patterns there are no specific patterns that
lead to zero TCP throughput. For low values of beta, the available bandwidth decreases even faster
than beta increases, up to a PU ON ratio of 40%. Beyond a PU ON ratio of around 40%, the decrease is
slower. The reasons are explained in Section 4.2.2.4.1.
[Figure 44 plot: available bandwidth (bit/s) versus beta (0.1 to 1.2) for UDP and TCP, with a linear trend line for TCP]
Figure 45: Available bandwidth for alpha equal to 2.8 and different beta values with random generation
Regarding the available bandwidth, for a pattern with 50% of PU ON time, only 5 Mbit/s is achieved.
Compared with the non-random case in Figure 39 at the same point, the difference is noticeable,
as the available bandwidth was zero in that case. In this case, there is no zero throughput. This is
because the PU ON activity is random and, therefore, there are no cases where the retransmissions
always coincide with PU ON activity periods.
4.2.2.4 Wide range of alpha and beta values
In order to have a wide view of the behaviour of the TCP available throughput, a series of
tests with a very wide range of alpha and beta values has been carried out. These tests are
simulated using values of alpha and beta from 0.1 to 5 with a step of 0.1, random generation of the
patterns, and 100 seconds of simulation length. This gives a total of 2500 different combinations, each of which
requires five tests, for a total of 12500 tests. The presented results are
the averages of these five tests for each combination. This is done to make the results
more robust.
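The averaging over five repetitions can be sketched as below. The callable `run_once` stands in for one NS2 simulation run and is purely hypothetical, as are the seed values. The sample standard deviation is included because the spread between repetitions is also studied for these tests.

```python
from statistics import mean, stdev


def average_over_seeds(run_once, alpha, beta, seeds=(1, 2, 3, 4, 5)):
    """Run one alpha/beta combination once per seed and summarise the results.

    `run_once(alpha, beta, seed)` is a placeholder for a single simulation
    that returns the average available bandwidth of that run.
    """
    samples = [run_once(alpha, beta, seed) for seed in seeds]
    return mean(samples), stdev(samples)


def fake_run(alpha, beta, seed):
    """Stand-in for a simulation; returns made-up values for illustration."""
    return float(seed)


avg, spread = average_over_seeds(fake_run, alpha=1.0, beta=0.5)
total_tests = 2500 * 5
print(avg, spread, total_tests)
```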
4.2.2.4.1 Available bandwidth
The results of available TCP bandwidth are presented in a 3D graph in Figure 46. The
maximum achieved TCP throughput is close to 20 Mbit/s. This maximum available bandwidth is
achieved when beta is very small and alpha is very big.
[Figure 45 plot: available bandwidth (bit/s) versus beta (0.1 to 2.5) for UDP and TCP, with a linear trend line for TCP]
In this case, the fundamental effect of the random generation of the patterns becomes clear when
comparing with the non-random case in Figure 40(a). The periodic zero throughput values that
appear in the non-random graph are not present in the random graph. This is because the distribution
of the idle periods is now random and therefore the retransmissions do not always coincide with PU
ON activity (at least not always for the same alpha-beta combinations, as in the non-random case). It is
noteworthy that, at a microscopic level, alpha equal to two times beta does not mean a 50% busy
time because of randomness. This also differs between repetitions.
Figure 46: 3D graphs with the available bandwidth for different alpha/beta combinations and random patterns
An additional study in comparison with UDP for different values, also including a study of the
standard deviation of the results, has been carried out in APPENDIX C. Regarding the standard
deviation, two areas can be clearly distinguished:
• First, there is an area with a high standard deviation where both alpha and beta are large.
This is because randomness plays a very important role in these patterns, since the pattern
periods are significant compared to the simulation length.
• In the remainder, the large difference between alpha and beta makes the standard deviation
very close to zero, as the results of the tests are very similar.
In Figure 46, the area with high standard deviation is the region with alpha and beta greater than
one.
4.2.2.4.2 MAC busy time
Figure 47 shows the MAC busy time for the same tests as in the previous section. Zero
corresponds to a busy time of zero and 1 to 100% busy time. The highest MAC busy time (100%
usage) is reached when only PUs are using the medium.
Figure 47: 3D graph with the MAC busy time for different alpha/beta combinations and random patterns
In the zone where the medium is shared between the PUs and SUs, that is, when alpha is
greater than beta, the busy time is never 1. The busy time tends to decrease as more idle time is
available for the secondary user. In this case, in contrast to the non-random one, it is not clear whether
the lower MAC busy time is a consequence of the back-off time between transmissions, because
randomness has a very important impact here.
4.2.2.4.3 Available bandwidth vs MAC busy time
The graph in Figure 48 shows the available throughput normalised by the maximum achieved
and the MAC busy time. In both cases, zero means no utilisation and one means 100% utilisation.
Figure 48: 3D graph with the MAC busy time and normalised throughput for different alpha/beta combinations and random patterns
Comparing the MAC busy time with the utilisation of the bandwidth, there are some alpha-beta
combinations in which a significant portion of the idle time remains unused. In this case, there is no triangular
shape like in the non-random case. This is due to the randomness in the generation of the PU activity
patterns. As the TCP available bandwidth grows, the MAC busy time decreases, but not in the
same proportion. In conclusion, a relation between the MAC busy time and the available TCP
bandwidth exists, and a model to estimate the available bandwidth based on the MAC busy time
could be developed.
4.2.2.5 Comparison between TCP and UDP throughput
The transition between areas of high throughput is more abrupt in TCP because the TCP
congestion window is reset to 1 MSS at every timeout. In UDP, the sending rate is constant at
the maximum whenever the medium is idle. This means that TCP needs long PU OFF periods in order to
achieve high throughput, while UDP can use all types of PU OFF periods in a similar manner.
This dependency on the congestion window and how the TCP available bandwidth is affected by
long PU ON periods are analysed in 4.2.3. Long PU ON periods result in a large
RTO value, which implies some periods where neither a PU is active nor TCP is sending data; this effect is not
present in UDP. This difference is easy to detect by looking at how the available bandwidth
and PU activity evolve in Figure 49 and Figure 50.
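The dependence of TCP on long OFF periods can be illustrated with an idealised slow-start model. The sketch assumes that the congestion window restarts at one segment after a timeout and doubles every round-trip time up to a maximum window, while UDP sends a full window every round trip. Congestion avoidance, the RTO wait and MAC overhead are ignored, and the RTT and window values are hypothetical, not parameters of the simulations.

```python
def tcp_segments_in_off_period(off_ms, rtt_ms, max_window):
    """Segments sent by an idealised slow-start sender in one OFF period."""
    rounds = off_ms // rtt_ms
    cwnd, sent = 1, 0
    for _ in range(rounds):
        sent += cwnd
        cwnd = min(2 * cwnd, max_window)  # window doubles each RTT, up to a cap
    return sent


def udp_segments_in_off_period(off_ms, rtt_ms, max_window):
    """Segments sent by a constant-rate sender over the same OFF period."""
    return (off_ms // rtt_ms) * max_window


for off_ms in (50, 1000):
    tcp = tcp_segments_in_off_period(off_ms, rtt_ms=10, max_window=64)
    udp = udp_segments_in_off_period(off_ms, rtt_ms=10, max_window=64)
    print(off_ms, tcp, udp, round(tcp / udp, 3))
```

With these illustrative numbers, a 50 ms OFF period lets the TCP model send only about a tenth of what the constant-rate sender delivers, whereas a 1 s OFF period brings the ratio close to one.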
Figure 49: alpha=1.5 and beta=0.5
Figure 50: alpha=1.5 and beta=0.5
Regarding the standard deviation, the values for UDP are lower than for TCP. This is also caused
by the TCP response after timeouts. If there are some long OFF periods within the simulation time,
TCP will achieve much higher throughput, while UDP is not affected by this phenomenon.
4.2.3 Available TCP throughput over time, Congestion window, RTO
multiplicative factor and Smoothed RTT analysis.
In Figure 49 we show the TCP throughput and PU activity over time for a single simulation
run using alpha equal to 1.5 and beta equal to 0.5. This test shows how the different parameters
behave when PU activity leads to packet loss on the secondary users’ side. As can be seen, the
throughput goes to zero when PU activity is detected. The absence of PU activity does not imply
that the throughput reaches the maximum just after the PU OFF period starts. This is because
TCP is still backing off, which makes it impossible to send although there is no PU activity (e.g. between
seconds 8 and 8.5). Taking into account what is explained in section 2.2.1.2.2, large values of
beta, that is, long PU ON periods as in this case, imply several consecutive timeouts. As a
consequence, the RTO multiplicative factor grows to a large value. This makes the connection
wait longer and longer between retransmissions (large RTO). This does not have a significant
effect when the PU ON periods are short, as can be observed in APPENDIX C b for alpha 1.5 and
beta 0.1.
The graphs from which these conclusions have been drawn are in APPENDIX C b. The congestion
window grows to a high value during long PU OFF periods and only to a relatively small value in
short PU OFF periods. This is the reason why the throughput is higher when there are long PU
OFF periods.
4.3 Evaluation of available TCP bandwidth with real traffic
Secondary Users
In this section, the TCP available throughput in the presence of other Secondary Users in
the medium is analysed. We first started a download using a web browser and
recorded the traffic trace of this download with Wireshark. After adapting the trace, this file is
played back in NS2 for different scenarios. This is explained in detail in 3.3.5. The throughput
over time of the traffic used is depicted in Figure 51.
Figure 51: Throughput over time of the real traffic played back
4.3.1 Sender inside the sensing area
This test uses Scenario 2, described in 3.2.2, where the senders can
sense each other.
Average available bandwidth: 9.81 Mbit/s
Average throughput of the real traffic played back: 9.52 Mbit/s
Average total throughput in the medium at node 1: 19.33 Mbit/s
Total packets sent by node 0: 36169
Number of collisions at node 1: 5719
% collisions at node 1: 15.81%
Table 10: Test results for Scenario 2
In the graph inside Table 10, the blue line is the TCP available bandwidth and the red line is the
throughput of the real traffic played back. In this case, the available bandwidth is shared and
therefore the real traffic played back is modified in order to use only 50% of the whole bandwidth
leaving the rest for the other nodes that are also using the medium.
4.3.2 Sender outside the sensing area
This test uses Scenario 3, described in 3.2.3, where the
secondary user senders cannot sense each other, which leads to collisions.
Average available bandwidth: 0.64 Mbit/s
Average throughput of the real traffic played back: 11.65 Mbit/s
Average total throughput in the medium at node 1: 12.29 Mbit/s
Total packets sent by node 0: 2734
Number of collisions at node 1: 1721
% collisions at node 1: 62.94%
Table 11: Test results for Scenario 3
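The derived figures in the results for Scenario 2 and Scenario 3 can be cross-checked with a few lines of Python. The check assumes that the collision percentage is the number of collisions at node 1 divided by the packets sent by node 0, which is consistent with the reported values. The function name is hypothetical.

```python
def collision_percentage(collisions, packets_sent):
    """Collisions as a percentage of the packets sent."""
    return 100.0 * collisions / packets_sent


scenario2 = collision_percentage(5719, 36169)
scenario3 = collision_percentage(1721, 2734)

# Total throughput at node 1 is the sum of both flows (values in Mbit/s).
total2 = 9.81 + 9.52
total3 = 0.64 + 11.65

print(f"{scenario2:.3f} %  total {total2:.2f} Mbit/s")
print(f"{scenario3:.3f} %  total {total3:.2f} Mbit/s")
```

The Scenario 3 ratio evaluates to roughly 62.948%, so the tabulated 62.94% appears to be truncated rather than rounded.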
As expected, this case leads to a large number of MAC layer collisions and therefore to packet loss,
which results in a very low bandwidth. The number of packets sent is very low because, when a
collision occurs, the MAC layer protocol backs off the transmission for a period. In addition, the TCP
congestion window is reset to 1 MSS when collisions lead to timeout events. In the graph
contained in Table 11, we show the throughput over time for both the traffic played back (in red)
and the competing TCP traffic (in blue). The traffic played back is not affected.
However, the throughput of the competing traffic is very low due to collisions.
This scenario is usually called “the hidden node problem”. To help solve it,
RTS/CTS handshaking was introduced together with the CSMA/CA scheme. However,
RTS/CTS is not a complete solution since, in situations without hidden nodes, it may decrease the
throughput. In this thesis, RTS/CTS is disabled.
In this scenario (hidden node), the available bandwidth is similar to the case of the WLAN with
PU activity, as almost all the packets are lost due to interference (collisions in this case) while the
interferer SU is transmitting. As a result, the study of the impact of PU activity on SUs may also be
applied here, since the interferer node behaves similarly to a PU. However, the situation is not exactly the
same, as the node that is being interfered with by the hidden node will possibly sense the ongoing
transmission and vice versa. Therefore, each node adjusts its sending pattern due to CSMA
carrier sensing.
4.4 Genetic algorithm evaluation and tests
In this section, the GA is applied in different scenarios to evaluate
aspects such as the selection method, the fitness function and the diversity method. The number of tests that could
be performed is extremely high (due to all the different GA parameters). Several studies that
analyse the impact of these parameters (population size, crossover
probability, etc.) on the GA performance can be found in the literature [114], [115].
In this thesis, the values of the GA parameters are taken from [27], [42] and [43]. A deeper
analysis of their impact on the performance of the GA is outside the scope of this thesis. However,
in some specific situations, we may change the settings in order to adapt to the time series under study.
The effects of the fitness function, the diversity and the selection method are analysed in
several scenarios. First, a periodic and symmetric function (i.e. a cosine function) is used in
order to test and verify the correct operation of the GA. Then, the available TCP throughput
obtained from NS2 simulations is used. A non-random pattern is considered first, in order
to observe the effects for traffic that is periodic but not symmetric. Then, the randomness in
the PU activity is taken into account in order to analyse the performance of the prediction in
a more realistic situation. Finally, the performance of the GA is tested considering the
available TCP throughput from real traffic.
Therefore, each proposed scenario is broken down into three parts: the evaluation of the fitness
function, the evaluation of the selection method and diversity, and an example of the prediction.
The mean absolute percentage error (MAPE) has been used for the evaluation.
This is a relative measure that does not depend on the scale applied to
the input data (the input factor). Therefore, a final comparison and evaluation across all the
scenarios presented is easier with this measure.
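For reference, the textbook definition of the MAPE can be written as a short function. This is a generic sketch of the standard formula, not the code used in the GA. Actual values equal to zero make the measure undefined and have to be handled separately, and how the thesis implementation deals with them is not covered here.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Raises ValueError for mismatched lengths, empty input or zero actuals.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    terms = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)


# Rescaling both series by the same factor leaves the MAPE unchanged.
print(mape([100.0, 200.0], [110.0, 180.0]))
print(mape([1.0, 2.0], [1.1, 1.8]))
```

Both calls give a value of about 10%, which illustrates why the measure is convenient for comparing scenarios whose time series have different scales.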
Table 12 relates the numbering of the fitness equations in this thesis
to the variable names used in the GA, which are the names shown in this section.
Numbering in the thesis    GA variable
Equation 3.13              Eq1
Equation 3.14              Eq2
Equation 3.15              Eq3
Equation 3.16              Eq4
Table 12: Nomenclature of the fitness equations
In addition, Table 13 relates the name of each selection method to the variable used in the GA.
Selection method                       GA variable
Roulette wheel selection               real_fit
Ranking roulette wheel selection       ranking
Exponential ranking wheel selection    exponent
Table 13: Nomenclature of the selection method
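To make the difference between the first two selection methods concrete, the sketch below computes selection probabilities for fitness-proportional (roulette wheel) selection and for a simple linear ranking variant. It assumes that higher fitness is better and that fitness values are positive. It is a generic illustration with hypothetical function names, not the formulation used in the thesis, and the exponential variant is omitted because its parameters are not given in this section.

```python
import random


def roulette_probabilities(fitness):
    """Selection probability proportional to the raw fitness value."""
    total = sum(fitness)
    return [f / total for f in fitness]


def ranking_probabilities(fitness):
    """Selection probability proportional to the rank (worst = 1, best = n)."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    ranks = [0] * len(fitness)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    total = sum(ranks)
    return [r / total for r in ranks]


def spin_wheel(probabilities, rng):
    """Pick one individual index according to the given probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


fitness = [10.0, 1000.0, 100.0]
print(roulette_probabilities(fitness))  # dominated by the fittest individual
print(ranking_probabilities(fitness))   # depends only on the ordering
print(spin_wheel(ranking_probabilities(fitness), random.Random(0)))
```

Ranking-based selection reduces the dominance of a single very fit chromosome, because only the order of the fitness values matters and not their magnitude.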
An overview of all the results (fitness functions and diversity methods) can be found in
APPENDIX E.
4.4.1 Periodic and symmetric function evaluation
In this scenario, a cosine function is used to evaluate the different fitness
functions, diversity settings and selection methods proposed. This function is chosen because of its
periodicity and symmetry (about the Y-axis), which make it interesting for observing the GA behaviour with
the different methods proposed.
4.4.1.1 Evaluation of the fitness function
In this section, the different fitness functions implemented are evaluated in order to find the
one that gives the best results for a cosine function.
Comparing these fitness functions is difficult because of all the random processes involved in the search
for the best solution. As explained, the first population is generated randomly. Then, during
the reproduction process, the chromosomes are selected, crossed over and mutated with a certain
probability.
Therefore, two scenarios are proposed:
• In the first scenario, the same initial population is used for all the tests in order to reduce the
randomness in the initial population. In addition, no new chromosomes are generated
during the GA process.
• In the second scenario, a random initial population is used with diversity (generation of
new chromosomes), which is the default configuration of the GA.
4.4.1.1.1 Evaluation of the fitness function for same initial population
Even if the initial population is fixed, it is impossible to avoid randomness during the reproduction
process. Therefore, the test has been run 100 times to obtain an average of all the results.
First, the results of these 100 tests are presented. Then, an example of one of these 100 tests is
shown in section 4.4.1.3. The four fitness equations shown in section 3.4.5.4 are evaluated. In this
scenario, the diversity is set to 0 in order to reduce the randomness introduced by the new
chromosomes (in each generation). In addition, the selection method used for this scenario is
RWS (real_fit). The GA set-up for this scenario is presented in Table 14.
Parameters: embed_dimen = 28; delay_time = 1; Long_eq = 7; training_set = 25; generation = 30; pn = 0.75;
upperbound = 1000; lowerbound = -10; Elitism = 0.1; mating_pool = 0.8; pm = 0.05; pc = 0.7;
max_value = 100; factor_input = 100.068; quality_error = 0.05; lengthZ = 1; pref = 1
Table 14: GA set-up for the cosine function without diversity and same initial population
Figure 55 shows the function to forecast. The X-axis of this figure represents the samples
of the time series obtained from the cosine function, and the Y-axis represents the
values of this cosine function with the input factor applied.
The results obtained from these tests are shown in Figure 52. In this figure, the MAPE (Mean
Absolute Percentage Error) and MAPE_bp (Mean Absolute Percentage Error before prediction)
are displayed.
The best results in this scenario are obtained with equation 2 and equation 3. However, the
gap between the MAPE and MAPE_bp is larger for equation 2, whereas it is smaller for equation
1 and equation 3.
Figure 52: Fitness selection evaluation for the same initial population with periodic and symmetric function
[Figure 52 plot: MAPE and MAPE_bp (0 to 120) for EQ1, EQ2, EQ3 and EQ4]
4.4.1.1.2 Evaluation of the fitness function for random initial population
In this scenario, a random initial population is generated at the beginning of the GA. This
scenario is proposed in order to observe the behaviour of the GA in the default mode, with diversity
(set to 1) and a random initial population. The four fitness equations are evaluated for this scenario.
The GA set-up for this scenario is presented in Table 14.
The results obtained from these tests are shown in Figure 53. As can be observed, equation 3 results
in the lowest MAPE and MAPE_bp.
Figure 53: Fitness selection evaluation for random initial population with a periodic and symmetric function
It can be observed that equation 3 is the best equation in both scenarios, whereas
equation 4 is the worst. The poor results of equation 4 might be due to its dependence on
Theil’s U-statistic instead of the MSE.
4.4.1.2 Evaluation of the selection method and diversity for the cosine function
The selection methods as well as the effects of diversity are evaluated for the cosine function.
For this study, 100 repetitions are performed and the mean of all these repetitions is presented.
In this scenario, the same initial population as in section 4.4.1.1.1 is used for the fitness
evaluation. The selection methods tested are RWS (real_fit), exponential
ranking wheel selection (exponent) and ranking roulette wheel selection (ranking). In addition,
the diversity is set to 0 in order to reduce the randomness introduced by the new chromosomes (in
each generation).
[Figure 53 plot: MAPE and MAPE_bp (percentage) for EQ1, EQ2, EQ3 and EQ4]
The parameters used for these tests are listed in Table 14, while Figure 54 displays the results
for the selected fitness equation 1 (Equation 3.13). Figure 54 shows the MAPE and the
MAPE_bp, together with the number of generations needed to converge, in order to identify
the fastest selection method.
As expected, the MAPE is higher after the prediction, and therefore better results are obtained
during the training set (MAPE_bp). Moreover, the number of generations needed to converge is reduced
when using diversity, with the exception of the exponential method (even though this difference is small).
This may be because the new chromosomes introduced (with the diversity) were not
feasible, so that the GA required more generations to converge.
Figure 54: Results for the selection method with and without diversity for a periodic and symmetric function
Differences of up to 80% in the MAPE can be observed for the ranking selection when comparing the results with and without
diversity. Higher differences can also be observed for the real fit selection and the
exponential selection when applying diversity.
In conclusion, the diversity reduces the MAPE and MAPE_bp because of the introduction of new
chromosomes. Therefore, these new chromosomes may help improving the result finding a better
chromosome. Moreover, the number of loops to converge in general is reduced introducing new
chromosomes as a diversity in each generation. However, this method may lead sometimes to an
increase in the number of generations needed to converge. This happens when the new
chromosomes introduced are not feasible (worse than the current population).
In general, the best results (MAPE and MAPE_bp) are obtained with the real fit and exponential
selection.
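The diversity mechanism discussed above can be sketched in Python as follows. This is only an illustration: the rule that the newcomers replace the worst chromosomes, the string representation of a chromosome and the names inject_diversity and new_chromosome are assumptions made for this sketch and are not taken from the GA implementation of this thesis. A higher fitness is assumed to be better.

```python
import random


def inject_diversity(population, fitness, n_new, new_chromosome):
    """Replace the n_new worst chromosomes with freshly generated ones.

    Assumption: higher fitness is better and the newcomers take the
    place of the worst individuals of the current generation.
    """
    ranked = sorted(zip(fitness, population), key=lambda pair: pair[0], reverse=True)
    survivors = [chrom for _, chrom in ranked[:len(population) - n_new]]
    return survivors + [new_chromosome() for _ in range(n_new)]


# Hypothetical chromosomes written as expression strings.
population = ["A1", "A2+A3", "sin(A4)", "A5*2"]
fitness = [10.0, 40.0, 5.0, 25.0]
rng = random.Random(0)
next_gen = inject_diversity(population, fitness, 1, lambda: f"A{rng.randint(1, 28)}")
print(next_gen)
```

If the newcomer is worse than every survivor, it contributes nothing to the next reproduction step, which is consistent with the extra generations observed in some of the tests.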
4.4.1.3 Example of prediction with symmetric and periodical function
In this section, an example of prediction for the periodical and symmetric function is explained.
Figure 55 shows the cosine function in order to identify the instant of time that corresponds to each
sample.
Figure 55: Cosine function
In the next example, depicted in Table 15, the best solution in 30 generations is found in the first
reproduction process (generation 2).
The set-up used for the GA in this example with a cosine function is also shown in Table 15.
embed_dimen = 28; delay_time = 1; Long_eq = 7; Selection_m = 'real_fit'; training_set = 25; generation = 30; fitness_select = 'Eq1';
pn = 0.75; upperbound = 1000; lowerbound = -10; Elitism = 0.1; mating_pool = 0.8; pm = 0.05;
pc = 0.7; div = 0; lengthZ = 1; pref = 1; max_value = 100; factor_input = 100.068;
MSE_bp = 0.559079
MAE_bp = 0.666854
MAPE_bp = 4.824 %
Expression = ((629.1349-629.1349)-A15)
SSE_bp = 13.976979
Fitness = 62.10981
MSE = 1.530442
MAE = 1.072374
MAPE = 5.0982 %
Table 15: Example 1 for a cosine function
Table 15 shows that the best solution is found in the second generation; it is kept as the final
solution for the prediction. Since the two constants in the expression cancel, the solution reduces to -A15.
4.4.2 Non-random traffic with ON-OFF pattern evaluation
In this scenario, the available TCP throughput for a non-random ON-OFF traffic pattern of PU
activity, which is obtained from the NS2 simulation, is evaluated through the GA.
The different methods proposed for the fitness function, diversity and selection method are
evaluated. This scenario is interesting in order to observe the effects described before for a
non-random ON-OFF pattern with 50 % of activity.
The values of this non-random traffic are α = 2.2 and β = 1.1. The available throughput to forecast in such an
ON-OFF pattern is illustrated in Figure 56, where each sample corresponds to 0.1 s.
Figure 56: Throughput from non-random pattern with alpha 2.2 and beta 1.1
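As an illustration of the activity level quoted above, the following Python sketch builds a deterministic ON-OFF sequence sampled every 0.1 s. It assumes that alpha is the period of the pattern and beta the ON time within each period, an interpretation chosen because 1.1/2.2 gives the 50 % of activity stated in the text. The function name and the sample layout are hypothetical and are not taken from the NS2 scripts.

```python
def on_off_pattern(alpha, beta, duration, step=0.1):
    """Return a list of 1 (PU ON) / 0 (PU OFF) flags, one per sampling step.

    Assumption: each period lasts `alpha` seconds and starts with
    `beta` seconds of ON time.
    """
    n_samples = round(duration / step)
    period = round(alpha / step)   # samples per period
    on_len = round(beta / step)    # ON samples per period
    return [1 if (k % period) < on_len else 0 for k in range(n_samples)]


pattern = on_off_pattern(alpha=2.2, beta=1.1, duration=22.0)
activity = sum(pattern) / len(pattern)
print(f"PU activity: {activity:.0%}")
```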
4.4.2.1 Evaluation of the fitness function
In this section, the different fitness functions implemented are evaluated in order to find the
one giving the best results for the non-random traffic with the ON-OFF pattern.
As in the previous analysis with the cosine function, two scenarios are proposed:
one with the same initial population without diversity (no generation of new chromosomes) and
the other one with a random initial population with diversity.
The results for both tests, same initial population and random initial population, are depicted in
Figure 57 and Figure 60, respectively.
4.4.2.1.1 Evaluation of the fitness function for non-random and same initial population
The fitness equations are evaluated for the proposed scenario with
the same initial population and without diversity. The diversity is set to 0 in order to reduce
the randomness introduced by the new chromosomes (in each generation). In addition, the selection
method used for this scenario is the RWS (real_fit).
The tests are run 50 times using the parameters listed in Table 16.
Fixed parameters: embed_dimen = 95; delay_time = 1; Long_eq = 7; training_set = 50; generation = 100; pn = 0.75;
mating_pool = 0.8; pm = 0.05; pc = 0.7; lengthZ = 1; pref = 1;
max_value = 10; factor_input = 4.789e-7; quality_error = 0.01; lowerbound = -10; Elitism = 0.1; upperbound = 100;
Table 16: GA parameters for the non-random pattern using the same initial population
The results obtained in this scenario are depicted in Figure 57. In this figure, the best result in
terms of MAPE is obtained with equation 1. On the contrary, the worst results are obtained
with equation 2.
Figure 57: Fitness selection evaluation for the same initial population with non-random ON-OFF pattern
It may be surprising that a difference of up to 239 % between equations 1 and 2 can be reached when
the GA is started with the same set-up and the same initial population. An explanation can be found
by analysing the results of equation 2. The best and the worst cases of this equation
are listed in Table 17.
The worst MAPE results increase the average, which explains the differences obtained
between equation 1 and equation 2.
Same initial population
Fitness equation Best MAPE result Worst MAPE result
Eq2 1.2754 % 1752 %
Table 17: Best and worst result for equation 2 with the same initial population
Figure 58 illustrates an example of the worst and best result for equation 2. In this graph, the
training set and the prediction area for both chromosomes, as well as the original, can be observed.
Figure 58: Best and worst prediction equation 2 for the same initial population
The worst prediction obtained with equation 2 can be compared with another
chromosome that has a higher MSE but a lower MAPE. This second chromosome, because of its
higher MSE, has a lower fitness (calculated with equation 2). This example is shown in
Table 18 and graphically depicted in Figure 59. The complete set of tests for equation 2 can
be found in APPENDIX D.
We may infer that, when the only criterion used is the MSE, the selected chromosome can be
one with a higher MAPE but a lower MSE. This is due to the fact that, since
errors are squared, those smaller than unity become even smaller.
As shown in the example of Figure 59, in the area where the original function is close to zero, the
chromosome with the worst prediction is close to 1.4. When this difference is squared, it becomes
close to two. On the other hand, for the other chromosome depicted (with MAPE equal to 27.92),
in the area where the original function is close to 9 the prediction is close to 12. Thus, when this
difference is squared, it increases up to 9. Finally, the squared errors are summed up and, even though the
second chromosome has a lower MAPE, it has a higher MSE. This results in choosing the
chromosome with the worst prediction instead of the other.
Therefore, even when a chromosome has a lower MSE, it can have a higher MAPE.
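The effect described above can be reproduced with a small numerical example in Python. The four samples below are hypothetical and only mimic the situation of Figure 59: one prediction misses by 1.4 where the original is close to zero, the other misses by 3 where the original is close to 9. The MSE and MAPE are computed with their usual definitions.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical samples: two close to zero, two close to 9.
actual = [0.1, 0.1, 9.0, 9.0]
pred_near_zero_miss = [1.5, 1.5, 9.0, 9.0]    # error of 1.4 where the original is small
pred_high_value_miss = [0.1, 0.1, 12.0, 12.0]  # error of 3 where the original is large

for name, pred in [("miss near zero", pred_near_zero_miss),
                   ("miss near 9", pred_high_value_miss)]:
    print(f"{name}: MSE = {mse(actual, pred):.2f}, MAPE = {mape(actual, pred):.1f} %")
```

The prediction that misses near zero has the lower MSE but by far the higher MAPE, so a fitness based only on the MSE would prefer it.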
                         Before the prediction             After the prediction
              fitness    MAE_bp    MSE_bp   MAPE_bp        MAE       MSE      MAPE
Prediction 1  0.27641    0.96686   2.6178   27.912 %       0.82423   2.1023   27.443 %
Prediction 2  0.35623    1.3072    1.8072   1650 %         1.2841    1.7253   1752 %
Table 18: Worst prediction and another chromosome for tests with equation 2
Figure 59: Example of Equation 2 comparing the MSE
Equations 2 and 3, which depend greatly on the MSE, arguably have this drawback. It is
most pronounced in equation 2, which relies heavily on the MSE. The main advantage of equation 1 is
that it is composed of a combination of different criteria and not only of the MSE.
4.4.2.1.2 Evaluation of the fitness function for non-random and random initial population
For the second scenario, the genetic algorithm is run 100 times for each fitness equation. Then, the
results are averaged and illustrated in Figure 60. This scenario is proposed in order to observe the
behaviour of the GA in the default mode with diversity (set to 1) and a random initial population.
The four fitness equations are evaluated for this scenario. The GA set-up for this scenario is
presented in Table 16.
It can be inferred from the results shown in Figure 60 that equation 1 is again the fitness
equation giving the best results in terms of MAPE and MAPE_bp. It is noteworthy that in this
scenario the worst results, in terms of the average MAPE and MAPE_bp, are obtained with
equation 3. This may be because the diversity and the randomness of the initial population can
lead to less favourable results. Another reason might be that a limit of
50 generations is not enough for the GA to converge.
Figure 60: Fitness selection evaluation for random initial population with non-random ON-OFF pattern
Despite the worse average results obtained in the second scenario, the best result obtained with
each equation in both scenarios can be found in Table 19. The best results are obtained with the
random initial population, as in the other scenarios.
Table 19 shows a comparison between the random initial population and the same initial population,
showing the best and the worst results in terms of MAPE for each equation.
                  Same initial population            Random initial population
Fitness equation  Best MAPE (%)   Worst MAPE (%)     Best MAPE (%)   Worst MAPE (%)
Eq1               7.6404          27.443             0.82204         2771.1
Eq2               1.2754          1752               0.76718         3836.5
Eq3               0.97366         1403               0.76724         5760.2
Eq4               5.7929          1212.9             0.76747         4678.6
Table 19: Best and worst results from random and same initial population
Another interesting observation is obtained by presenting
the same results as before (MAPE and MAPE_bp) but suppressing those with a MAPE
higher than 100 %, as shown in Figure 61. This figure shows that equation 2 gives better results
despite having a higher percentage of errors. This means that a higher number of GA
repetitions is needed to find a good solution. Another advantage of equation 2, as can be seen
in this figure, is the lower number of generations needed to find a good solution (lower
than the 5 % quality error fixed in this scenario).
Figure 61: Random initial population for non-random alpha 2.2 and beta 1.1 with MAPE less than 100%
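The filtering used for Figure 61 can be sketched as follows in Python. Repetitions whose MAPE exceeds 100 % are counted as errors and left out of the average. The function name, the dictionary keys and the four MAPE values are hypothetical and only illustrate the procedure.

```python
def summarise_runs(mape_values, error_threshold=100.0):
    """Average MAPE with and without the runs classified as errors."""
    kept = [m for m in mape_values if m <= error_threshold]
    errors = len(mape_values) - len(kept)
    return {
        "mean_all": sum(mape_values) / len(mape_values),
        "mean_kept": sum(kept) / len(kept) if kept else float("nan"),
        "error_pct": 100.0 * errors / len(mape_values),
    }


# Hypothetical MAPE values (in %) of four GA repetitions.
summary = summarise_runs([2.0, 4.0, 6.0, 1500.0])
print(summary)
```

A single failed repetition dominates the plain average, whereas the filtered average reflects the quality of the repetitions that did find a usable solution.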
4.4.2.2 Evaluation of the selection method for the non-random traffic
The selection methods as well as the diversity effects are evaluated for the non-random ON-OFF traffic.
For this study, 50 repetitions are performed and the mean of all these repetitions is shown in Figure
62. In this scenario, the same initial population as in section 4.4.2.1.1 is used for the fitness
evaluation. The selection methods tested are the RWS (real_fit), the exponential
ranking wheel selection (exponent) and the ranking roulette wheel selection (ranking). In addition,
the diversity is set to 0 in order to reduce the randomness introduced by the new chromosomes (in
each generation).
Few differences exist between the three selection methods when diversity is used. However, the
worst results are found with the ranking selection method without diversity. This might be due to the fact
that the ranking method assigns more similar probabilities to all the chromosomes, which
increases the number of generations needed to converge to a better solution. None of the 50
repetitions converges after 100 generations; that is, a solution with a MAPE lower than
5 % was not found in this case.
Figure 62: Results for the selection method with and without diversity for non-random ON-OFF pattern
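The remark that the ranking method assigns more similar probabilities to the chromosomes can be illustrated with the following Python sketch. It uses textbook forms of fitness-proportional (roulette wheel), linear ranking and exponential ranking selection; the exact formulas and constants of the GA in this thesis may differ. The sketch assumes that a higher fitness is better, and the base of 0.5 and the fitness values are arbitrary.

```python
def roulette_probs(fitness):
    """Fitness-proportional selection probabilities (real fit)."""
    total = sum(fitness)
    return [f / total for f in fitness]


def linear_ranking_probs(fitness):
    """Probabilities proportional to the rank (worst = 1, best = n)."""
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])  # worst first
    weights = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        weights[idx] = float(rank)
    total = n * (n + 1) / 2
    return [w / total for w in weights]


def exponential_ranking_probs(fitness, base=0.5):
    """Probabilities proportional to base**rank (best has rank 0)."""
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i], reverse=True)  # best first
    weights = [0.0] * n
    for rank, idx in enumerate(order):
        weights[idx] = base ** rank
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical population in which one chromosome is far better than the rest.
fitness = [1.0, 2.0, 3.0, 94.0]
p_rw = roulette_probs(fitness)
p_rank = linear_ranking_probs(fitness)
p_exp = exponential_ranking_probs(fitness)
print(p_rw, p_rank, p_exp)
```

With these values the best chromosome is selected far more often by the roulette wheel than by linear ranking, which spreads the probability over the whole population.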
4.4.2.3 Example of one of the best predictions
Table 20 shows a prediction of the available throughput for a non-random traffic with ON-OFF
pattern.
Table 20 (a) shows the training set along with the prediction. The GA statistics from the last
generation are given in (b); they are computed on the scaled data (using
a range of [0-10]). The prediction is shown in (c) along with its statistics in (d), which are
also computed on the data scaled to the range [0-10] and show a MAPE
lower than 1 %.
MSE_bp = 0.0517
MAE_bp = 0.1067
MAPE_bp = 1.1521%
SSE_bp = 2.5861
Fitness = 28.5608
Training set (a) Training set statistics (b)
MSE = 0.0260
MAE = 0.0722
MAPE = 0.7671%
SSE = 1.2995
Expression = (cos(sin((49.5949*16.9764)))*A30)
Prediction (c) Prediction statistics (d)
Table 20: Best prediction example for a non-random traffic with ON-OFF pattern
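The scaling to the range [0-10] mentioned above can be sketched in Python as follows. The sketch assumes that the data are multiplied by a single factor equal to max_value divided by the largest sample; this is one plausible reading of the max_value and factor_input parameters, not a description of the actual implementation. The throughput samples are hypothetical.

```python
def scale_to_range(samples, max_value=10.0):
    """Scale non-negative samples so that the largest one maps to max_value.

    Assumption: a single multiplicative factor is used, which is one
    possible reading of the factor_input parameter.
    """
    factor = max_value / max(samples)
    return [s * factor for s in samples], factor


# Hypothetical throughput samples in bit/s.
raw = [5.0e6, 1.2e7, 2.0e7]
scaled, factor = scale_to_range(raw)
print(scaled, factor)
```

Because the error statistics are computed on the scaled data, the MSE and MAE values reported in the tables are relative to this [0-10] range and not to the throughput in bit/s.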
4.4.3 Random traffic with ON-OFF pattern evaluation
In this scenario, the available TCP throughput for a random ON-OFF traffic pattern of PU activity
obtained from the NS2 simulation is evaluated.
The effects of applying different fitness equations, diversity and selection methods
to a random traffic with ON-OFF pattern are analysed. For this, α = 2.2 and β = 0.08 are selected.
These data are obtained from the NS2 simulator and used in the GA in order to try to forecast the next
samples (instants of time). The interest of this scenario lies in the fact that the source is random,
and therefore it is a chance to test the ability of the GA to adapt to a chaotic pattern.
Figure 63 shows the available throughput for the specified alpha and beta, where each sample is
taken every 0.1 s.
Figure 63: Throughput from a random pattern with alpha 2.2 and beta 0.08
4.4.3.1 Evaluation of the fitness function
In this section, the different fitness functions are evaluated in order to find the one that gives
the best results.
As with the cosine function and the non-random ON-OFF pattern, two scenarios
are proposed: one with the same initial population without diversity (no generation of new
chromosomes) and the other one with a random initial population with diversity. The results of
both scenarios are depicted in Figure 65 and Figure 67, respectively.
4.4.3.1.1 Evaluation of the fitness function for random pattern and same initial population
This section analyses the impact of the different fitness functions in the proposed scenario,
considering the same initial population. In this scenario, the diversity is set to 0 in order to reduce
the randomness introduced by the new chromosomes (in each generation). In addition, the
selection method used for this scenario is the RWS (real_fit).
The tests are run 50 times using the parameters listed in Table 21.
Fixed parameters embed_dimen = 95; delay_time = 1; Long_eq = 10; training_set = 40; generation = 100; pn=0.75;
mating_pool = 0.8; pm=0.05; pc=0.7; upperbound= 100; lowerbound = -10; Elitism =0.1;
lengthZ=1; pref=1; max_value =10; factor_input= 4.6598e-7;
Table 21: GA set-up for the random traffic alpha 2.2 and beta 0.08 with the same initial population
The average results for this scenario are illustrated in Figure 65 and Figure 66.
If we compare Figure 65 and Figure 67, we notice that the MAPE is better than the MAPE_bp.
Figure 64 shows a drop in the throughput at sample 57, within the training set. This drop
makes it more difficult to find a chromosome that fits the function. During the
prediction, however, the throughput does not drop suddenly. Thus, the MAPE for the prediction is better
than the MAPE_bp for the training set.
Figure 64: Training set and prediction zone for random traffic with alpha 2.2 and beta 0.08
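The following Python sketch illustrates why a single drop in the training set can make the MAPE_bp much larger than the MAPE of the prediction. All values are hypothetical and scaled to a range similar to the one used in the tests; the third training sample mimics a sudden drop such as the one at sample 57.

```python
def absolute_percentage_errors(actual, predicted):
    """Per-sample absolute percentage error in percent (actual must be non-zero)."""
    return [100.0 * abs((a - p) / a) for a, p in zip(actual, predicted)]


# Hypothetical training window with one sudden drop (third sample).
training_actual = [8.0, 8.0, 0.5, 8.0]
training_fitted = [8.0, 8.0, 8.0, 8.0]   # a smooth chromosome cannot follow the drop

# Hypothetical prediction window without drops.
prediction_actual = [8.0, 8.0, 8.0, 8.0]
prediction_output = [8.4, 8.4, 8.4, 8.4]

ape_bp = absolute_percentage_errors(training_actual, training_fitted)
ape = absolute_percentage_errors(prediction_actual, prediction_output)
mape_bp = sum(ape_bp) / len(ape_bp)
mape_after = sum(ape) / len(ape)
print(f"MAPE_bp = {mape_bp:.1f} %, MAPE = {mape_after:.1f} %")
```

The single low sample contributes almost the whole MAPE_bp, because the percentage error divides by a value close to zero.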
Figure 64 is shown in order to help understand the results depicted. The MAPE and the MAPE_bp,
along with the number of MAPE values higher than 100 % (classified as errors), are illustrated in
Figure 66.
In Figure 65 it can be observed that equations 2, 3 and 4 have a lower MSE_bp, even though their
MAPE_bp is higher.
Figure 65: Fitness selection evaluation for the same initial population with Random ON-OFF pattern
In Figure 66 the MAPE, MAPE_bp and the percentage of errors for each equation are shown. In
this case, the MAPE_bp is plotted on the secondary axis so that the
other data can be observed more clearly. The best results are obtained with equation 1, which has not only the
lowest MAPE but also the lowest percentage of errors. On the contrary, equation 2 has the highest
percentage of errors and MAPE.
Figure 66: Fitness selection evaluation for the same initial population with Random ON-OFF pattern (with errors criterion)
4.4.3.1.2 Evaluation of the fitness function for random pattern and random initial population
For the second scenario, the GA is run 50 times but with a random initial population. This scenario
is proposed in order to observe the behaviour of the GA in the default mode with diversity (set to
1) and a random initial population. The four fitness equations are evaluated for this scenario. The
GA set-up for this scenario is presented in Table 21.
The results of these tests can be observed in Figure 67, where the differences between equation 3
and equation 1 are minimal. However, in this scenario better results are reached with equation
1.
Figure 67: Fitness selection evaluation for random initial population with random ON-OFF pattern
Up to this point, it can be stated that equations 1 and 3 give the best results for the
predictions performed in the scenarios presented. This may be due to the chaotic behaviour, for which it is
more difficult to obtain a good combination of all the criteria used in equation 1 than in
equation 3.
4.4.3.2 Evaluation of the selection method for the random ON-OFF pattern traffic
The selection methods as well as the diversity effects for the random ON-OFF pattern traffic
are evaluated. In this scenario, the same initial population as the one described in section
4.4.3.1.1 is used. The selection methods tested are the RWS (real_fit), the exponential ranking
wheel selection (exponent) and the ranking roulette wheel selection (ranking). In addition, the
diversity is set to 0 in order to reduce the randomness introduced by the new chromosomes (in
each generation).
The tests are run 50 times using the parameters listed in Table 21 and the average results are
shown in Figure 68.
The best results are obtained with equation 1 with diversity as well as without diversity. However,
the worst results are found with the ranking selection method without diversity, perhaps because the
ranking method assigns similar probabilities to all the chromosomes, which increases the
number of generations needed to converge to a better solution. None of the 50 repetitions
converges after 100 generations; that is, a solution with a MAPE lower than 5 % was
not found in this case.
Figure 68: Results for the selection method with and without diversity for random ON-OFF pattern
4.4.3.3 Example of one of the best predictions
Table 22 shows a prediction of the available throughput for a random traffic with ON-OFF
pattern.
In Table 22 (a), the training set along with the prediction is shown. The GA statistics from the last
generation are given in (b). These statistics correspond to the same data but with scaled data
(using a range of [0-10]). The prediction is shown in (c) and its statistics in (d); these statistics
are also computed on the data scaled to the range [0-10].
MSE_bp = 0.9635
MAE_bp = 0.5635
MAPE_bp = 106.4868%
SSE_bp = 38.5384
Fitness = 36.5736
Training set (a) Training set statistics (b)
MSE = 0.1401
MAE = 0.2919
MAPE = 3.1997%
SSE = 5.6032
Expression = ((A49-sin((47.3837+17.0299)))+sin(76.9423))
Prediction (c) Prediction statistics (d)
Table 22: Best prediction example for a random traffic with ON-OFF pattern
4.4.4 Real traffic from NS2 evaluation
In this section, the throughput obtained from real traffic acquired at the library of Karlstad
University and simulated in NS2 is considered. Different tests and evaluations are performed to
analyse the ability of the GA in this scenario. Again, different fitness functions, diversity and
selection methods are considered in order to study their effects on the prediction.
Figure 69 shows the available throughput to be predicted under such real traffic conditions. Each
sample of the figure corresponds to 0.1 s.
Figure 69: Throughput from the real traffic simulated in NS2
4.4.4.1 Evaluation of the fitness function
In this section, the different fitness functions implemented are evaluated in order to find the
one giving the best results for the real traffic simulated in NS2.
As shown in previous sections, two scenarios are proposed: the first one with the same initial
population without diversity and the second one with a random initial population with diversity.
The results from these scenarios are illustrated in Figure 71 and Figure 72.
4.4.4.1.1 Evaluation of the fitness function for real traffic and same initial population
For the first scenario, the GA is run 50 times. These repetitions are run using the parameters
listed in Table 23. In this scenario, the diversity is set to 0 in order to reduce the randomness
introduced by the new chromosomes (in each generation). In addition, the selection method used
for this scenario is the RWS (real_fit).
Fixed parameters: embed_dimen = 130; delay_time = 1; Long_eq = 10; training_set = 40; generation = 100; pn = 0.75;
mating_pool = 0.8; pm = 0.05; pc = 0.7; lengthZ = 1; pref = 1;
max_value = 10; factor_input = 5.0468e-007; quality_error = 0.05; upperbound = 100; lowerbound = -10; Elitism = 0.1;
Table 23: GA set-up for the real traffic scenario with the same initial population
The average MAPE, MAPE_bp, MSE and MSE_bp are shown in Figure 71. The MSE and
MSE_bp are shown in order to observe the correlation of these parameters with the MAPE and
MAPE_bp.
The training set and the prediction area selected for this scenario (using the scaled values) are
depicted in Figure 70.
Figure 70: Training set area and prediction area for real traffic with scaled values
As can be observed, all equations provide good results, with the exception of equation 1, which has a
higher MAPE. There is some relation between the MSE, in those equations that have it as a criterion
(equations 1, 2 and 3), and the MAPE. For example, equation 3 has a lower MSE_bp and a lower
MAPE. On the other hand, equation 1 has a higher MSE_bp and a higher MAPE. This might be
explained by the lower standard deviation of the function to forecast.
Figure 71: Fitness selection evaluation for the same initial population with real traffic
4.4.4.1.2 Evaluation of the fitness function for real traffic and random initial population
For this second scenario, the genetic algorithm is run 50 times. This scenario is proposed in order
to observe the behaviour of the GA in the default mode with diversity (set to 1) and a random initial
population. The four fitness equations are evaluated for this scenario. The GA set-up for this
scenario is presented in Table 23.
The averages of these results are depicted in Figure 72. In this second scenario, there are few
differences between the proposed equations: there is only a difference of around 2 %
between the best and the worst result.
Figure 72: Fitness selection evaluation for random initial population with real traffic
On average, the best equation in the first scenario with the same initial population is equation 3, while
in this scenario it is equation 2. Nevertheless, Table 24 shows that, in this scenario, the best single
result in terms of MAPE is given by equation 3.
                  Same initial population            Random initial population
Fitness equation  Best MAPE (%)   Worst MAPE (%)     Best MAPE (%)   Worst MAPE (%)
Eq1               25.134          38.016             14.162          60.174
Eq2               29.935          29.935             17.427          36.057
Eq3               16.829          29.935             13.463          40.059
Eq4               16.799          29.935             17.154          41.486
Table 24: Best and worst results from random and same initial population with real traffic
4.4.4.2 Evaluation of the selection method and diversity for the real traffic
The selection methods as well as the diversity effect are evaluated for the real traffic. For this
study, 50 repetitions are performed. In this scenario, the same initial population as the one
described in section 4.4.4.1.1 is used. The selection methods tested are the RWS (real_fit), the exponential
ranking wheel selection (exponent) and the ranking roulette wheel selection (ranking). In addition,
the diversity is set to 0 in order to reduce the randomness introduced by the new chromosomes (in
each generation).
For these repetitions, the parameters used are listed in Table 23 and the results are
depicted in Figure 73.
Few differences exist between the three selection methods, both with and without diversity.
Nevertheless, better results are obtained with real fit and ranking (with and without
diversity). Even so, the difference between the worst and the best result in both cases is less
than 1.2 %. The number of generations was the same for all of them, because none converged
within the 100 generations: the MAPE_bp did not drop below 5 %.
Figure 73: Results for the selection method with and without diversity for real traffic
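The convergence criterion referred to above can be sketched in Python as follows. The sketch assumes that the GA stops at the first generation whose best MAPE_bp falls below the quality error, expressed here in percent, and that a run counts as not converged otherwise. The function name and the two MAPE_bp histories are hypothetical.

```python
def generations_to_converge(mape_bp_history, quality_error=5.0):
    """Return the first generation (1-based) whose best MAPE_bp is below
    quality_error, or None if the run never converges."""
    for generation, mape_bp in enumerate(mape_bp_history, start=1):
        if mape_bp < quality_error:
            return generation
    return None


# Hypothetical best MAPE_bp (in %) per generation for two runs.
converging_run = [30.0, 12.0, 4.8, 4.1]
stalled_run = [25.0] + [22.3] * 99          # 100 generations without reaching 5 %

print(generations_to_converge(converging_run))
print(generations_to_converge(stalled_run))
```

When every repetition stalls in this way, all of them report the maximum number of generations, as observed for the real traffic.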
4.4.4.3 Example of one of the best predictions
Table 25 shows a prediction of the available throughput for real traffic played
back in NS2.
In Table 25 (a), the training set along with the prediction is shown. The GA statistics from the
last generation are given in (b). These statistics are computed on the data scaled to a range
of [0-10]. The prediction is shown in (c) along with its statistics in (d), which are
also computed on the data scaled to the range [0-10].
MSE_bp = 0.9724
MAE_bp = 0.8760
MAPE_bp = 22.2574%
SSE_bp = 38.8954
Fitness = 24.4131
Training set (a) Training set statistics (b)
MSE = 1.0444
MAE = 0.7890
MAPE = 13.5050%
SSE = 41.7770
Expression = (0.94391*(exp(sin((70.3626/log(92.4674))))+A110))
Prediction (c) Prediction statistics (d)
Table 25: Best prediction example for a real traffic played back in NS2
4.4.5 Limitation
In this section, the tests that could not be performed due to a limitation of the GA
are explained, along with the reason for the limitation.
The limitation appears when there are long OFF periods, during which the available
throughput is zero. Figure 74 shows, as an example, a random activity pattern for an alpha of 2.6 and a
beta of 1.3.
This is because a high number of samples has to be selected for the training set so that it is not
made up of an OFF period. Otherwise, if a (long) OFF period is selected for the training
set, the GA will predict an OFF period. The downside of increasing
the training set is the increase in computational time, because more samples have to be
evaluated. In addition, when the training set is enlarged, the randomness of the ON-OFF pattern
makes it more difficult to find a function that fits this large training set.
Figure 74: Random activity pattern for an alpha 2.6 and beta 1.3
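One way to check whether a trace is affected by the limitation described above is to measure its longest run of zero-throughput samples and to compare it with the intended training set size. The following Python sketch does this; the function name and the sample values are hypothetical, and the check itself is not part of the GA of this thesis.

```python
def longest_zero_run(samples, tolerance=0.0):
    """Length of the longest run of consecutive samples with |value| <= tolerance."""
    longest = current = 0
    for s in samples:
        if abs(s) <= tolerance:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Hypothetical scaled throughput samples with a long OFF period.
trace = [5.0, 0.0, 0.0, 0.0, 4.0, 0.0, 3.0]
off_run = longest_zero_run(trace)
training_set = 3
print(f"longest OFF run: {off_run} samples, training set: {training_set} samples")
```

If the longest OFF run is as long as the training set, a training window can fall entirely inside an OFF period, which is the situation in which the GA predicts an OFF period.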
4.5 Summary
A large number of available bandwidth results for different combinations of PU activity patterns
have been obtained. The available bandwidth tends to grow slightly with alpha and beta, and the
retransmission time has a fundamental impact on the available bandwidth. Randomness in the PU
activity makes predictions very unreliable. Regarding the MAC busy time, there seems to be a relation
between the TCP available bandwidth and the MAC busy time. In a scenario with the “hidden node
problem”, the interferer node harms the available bandwidth.
The implementation of the GA is successful and its correct functioning has been verified. However,
a limitation of the GA in the proposed scenarios has been encountered. Regarding the available
bandwidth for the ON-OFF PU activity patterns tested with the GA, even though the behaviour of the
available bandwidth is chaotic, reliable outcomes have been obtained.
In the scenarios proposed, introducing new chromosomes in each generation as a diversity method
improves the GA's search for a better solution (reducing the MAPE of the best solution).
A variety of results was obtained for the selection method. Using the average of the tests
performed, in almost all the scenarios the roulette wheel (real fit) gives better results. With
equation 1, reliable results were obtained in most scenarios, even though it did not always give the
lowest error. This is because equation 1 is composed of different criteria, which makes it more
robust.
Chapter 5 Conclusions
5.1 Final conclusions
This chapter presents the most relevant conclusions about the project carried out. This thesis
looks into the TCP available bandwidth in different scenarios and compares the results with UDP
performance and MAC busy time in the same situations. These situations include the presence of
PU activity in the wireless medium with fixed deterministic patterns, PU activity with random
generated patterns and Secondary Users activity that can be either inside or outside the sensing
range.
In addition, a Genetic Algorithm is developed in order to forecast the available TCP throughput
for different scenarios. The scenarios tested with the GA are: a periodical and symmetrical
function; non-random and random traffic with ON-OFF patterns; and real traffic played back
with NS2. Not only has the feasibility of the prediction been studied, but the effect of using
different fitness functions in the proposed scenarios has also been analysed.
5.1.1 Conclusions
Here the most important conclusions are presented. Regarding the TCP available bandwidth,
the following conclusions can be drawn:
The available bandwidth tends to grow slightly with alpha and beta
For deterministic, fixed periodic PU activity patterns, if the relationship between alpha and
beta is fixed, meaning that the PU busy time is constant, a relationship between the alpha/beta
values and the available TCP throughput exists. The available bandwidth, which in the UDP case is
constant for all alpha/beta values, follows a pseudo-periodic behaviour, taking values in a range
between the maximum achievable TCP throughput and zero. Whenever the retransmission time
matches the PU ON period, the resultant throughput is zero, as the retransmissions always
coincide with the PU ON activity time. In the same situation, but with randomly generated PU activity
patterns, there is no clear relationship between the alpha/beta values and the available throughput.
However, the trend shows that the available bandwidth tends to increase as the alpha/beta values do.
This is because the TCP congestion window is able to grow to greater values. A clearer increase
as alpha and beta grow was expected.
Retransmission time has a fundamental impact on the available bandwidth
If PUs whose activity patterns have long ON periods are using the medium, retransmission time
plays a fundamental role in defining the available bandwidth. This is because after multiple
retransmissions the RTO grows to large values.
If alpha is fixed and beta changes, for deterministic PU activity patterns, the available bandwidth
decreases as beta grows, reaching zero if the RTO matches the period of the pattern. In
the same case, but for randomly generated PU activity patterns, the results show a clear
correspondence. As beta grows, the available bandwidth decreases even faster, up to a PU
activity time of 40%. Beyond that point, the decrease is slower.
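The interaction between the retransmission timer and a periodic PU pattern can be illustrated with a minimal sketch. It is not the NS2 implementation used in the thesis: it assumes a deterministic pattern that is ON for on_ms and then OFF for off_ms, an RTO that doubles after every timeout up to a cap (the backoff rule described in RFC 2988 [26]), and hypothetical function and parameter names. Times are integers in milliseconds. No mapping between on_ms/off_ms and the alpha/beta values of the thesis is assumed.

```python
def pu_is_on(t_ms, on_ms, off_ms):
    """Return True if the PU occupies the channel at time t_ms.

    Assumed pattern: ON for on_ms starting at t = 0, then OFF for off_ms, repeating.
    """
    return (t_ms % (on_ms + off_ms)) < on_ms


def first_successful_retransmission(loss_ms, rto_ms, rto_max_ms, attempts, on_ms, off_ms):
    """Return the time of the first retransmission that falls in an OFF period, or None."""
    t = loss_ms
    for _ in range(attempts):
        t += rto_ms  # the timer expires and the segment is retransmitted
        if not pu_is_on(t, on_ms, off_ms):
            return t
        rto_ms = min(2 * rto_ms, rto_max_ms)  # exponential backoff, capped
    return None


# RTO of 1 s equal to the pattern period (500 ms ON + 500 ms OFF):
# every retransmission lands at the same phase of the pattern, inside the ON period.
blocked = first_successful_retransmission(100, 1000, 60000, 6, 500, 500)

# Same RTO with a 1.2 s period (500 ms ON + 700 ms OFF):
# the first retransmission already falls in an OFF period.
recovered = first_successful_retransmission(100, 1000, 60000, 6, 500, 700)

print(blocked, recovered)
```

In the first case the doubled RTO values remain multiples of the pattern period, so the phase never changes and no attempt succeeds, which corresponds to the zero-throughput situation described above.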
Randomness in the PU activity makes predictions very unreliable
There is no clear relationship between alpha/beta values and the available throughput. However,
the trend shows that the available bandwidth increases as alpha/beta values do. Therefore, an a priori
estimation of the available bandwidth based only on alpha and beta values would not be reliable.
A clearer correspondence was expected.
There seems to be a relationship between TCP available bandwidth and MAC busy time
Comparing MAC busy time with the utilisation of the bandwidth, there are some alpha-beta
combinations in which a percentage of the idle time remains unused. As the TCP available bandwidth
grows, the MAC busy time decreases. However, it does not decrease in equal proportion. In
conclusion, a relationship exists, and a model for the estimation of the available bandwidth based on
the MAC busy time for different PU activity patterns may be developed.
In a scenario with only secondary users and real traffic, they share the bandwidth if they can sense
each other
With only TCP Secondary Users in the wireless scenario, the available bandwidth is shared and
therefore the real traffic played back is modified (packet delays). The real traffic uses
only 50% of the whole bandwidth, leaving the rest for the other nodes that are also using the
medium. In a scenario with the “hidden node problem”, if RTS/CTS is disabled, the medium is
not shared: the real traffic played back takes almost all the available bandwidth and the collision
percentage is very large. These results match what we expected.
The results of the available bandwidth with PU activity can also be applied to the “hidden node
problem”
In a scenario with the hidden node problem, the available bandwidth is similar to the case of the
WLAN with PU activity. This is because, if the Interferer SU is transmitting, almost all the packets
are lost due to interference (collisions, in this case). Hence, we can conclude that the
study of the PU activity impact on SUs can be applied to this scenario. In that case, the interferer
node will behave in a similar way to a PU. The situation is not exactly the same, since the node
that is being interfered with (i.e., the one that is receiving data and gets the collisions) will probably
sense the hidden node and vice versa. Therefore, this node will modify its behaviour according to
the hidden node, and the hidden node will slightly modify its behaviour.
The proper functioning of the GA has been proven
The GA is a powerful tool used in different fields to make predictions from time series data. By
definition, a basic structure is given, but a large number of variations can be made in order to improve
the performance of the GA. These variations stem from the basic functions of the genetic algorithm,
such as selection, crossover, mutation or encoding. Beyond these variations in the basic structure
and operation of the GA, different parameters have to be set up. These parameters add complexity
to the GA, because modifying them changes the behaviour of the GA, and
therefore the results.
Thus, a large number of test combinations can be performed for each scenario proposed. However,
we have shown in this thesis that the basic configuration of the GA (based on the main GA structure)
is enough to obtain a good prediction. Naturally, with more tests and a GA tuned more closely to
the proposed problem, a better solution could be found.
MATLAB is a good tool for the GA implementation
The main advantage of using MATLAB for the implementation is the simplicity of programming
(due to all the already developed instructions). The graphical interface, debugger, and
mathematical support, along with this simplicity of programming, made it possible to finish the
implementation and test the algorithm.
However, the main drawback is the computational time (compared with other programming
languages). This increased the time required to perform the tests and made it necessary
to use several computers in parallel.
A limitation in the GA for the scenarios proposed has been encountered
A limitation on the use of the GA as a forecasting tool was found in some of the proposed scenarios
during the test stage. This limitation corresponds to those random ON-OFF activity pattern
scenarios with a long period of OFF time. To prevent the training set from consisting entirely of
an OFF period, a large number of samples must be selected for the training set. If the OFF
period is longer than the training set, it could lead to an OFF period prediction. The downside
of increasing the training set is that more samples have to be evaluated. Therefore, the
computational cost increases. Another problem of increasing this training set is that it makes it more
difficult to find a function that fits in this large training set (due to the randomness in the ON-OFF
pattern).
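The training-set limitation can be checked on a recorded series before running the GA. The following is a minimal sketch, not the MATLAB code of the thesis: it assumes that a throughput sample equal to zero marks an OFF period, and the function names and sample values are hypothetical. A training window longer than the longest run of zero samples can never be made up entirely of OFF samples.

```python
def longest_off_run(samples):
    """Length of the longest run of consecutive zero-throughput samples."""
    longest = current = 0
    for s in samples:
        current = current + 1 if s == 0 else 0
        longest = max(longest, current)
    return longest


def all_off_windows(samples, window):
    """Start indices of the training windows that contain only OFF samples."""
    return [
        i
        for i in range(len(samples) - window + 1)
        if all(s == 0 for s in samples[i:i + window])
    ]


# Illustrative throughput samples (arbitrary units), not taken from the thesis
throughput = [5, 0, 0, 0, 4, 0, 6]

off_run = longest_off_run(throughput)
print(off_run)                                   # longest OFF run, in samples
print(all_off_windows(throughput, off_run))      # windows that see only OFF
print(all_off_windows(throughput, off_run + 1))  # none left with a longer window
```

The price of the longer window is the one noted above: more samples to evaluate per chromosome and a harder fitting problem.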
Reliable results in several scenarios have been achieved
For non-random ON-OFF patterns, very good results have been found. This is a result of the
periodicity of the available bandwidth found in this scenario. In contrast, random patterns gave
worse results than non-random ON-OFF activity patterns. Moreover, good
results have been obtained for the real traffic played back with NS2.
The GA can successfully be applied to the TCP available bandwidth forecasting
Although the GA is not a prediction tool, but rather a stochastic search method, it has been used
for the TCP available bandwidth forecasting with good results.
Better results are obtained by introducing new chromosomes in each generation as a diversity method
In the proposed scenario, introducing new chromosomes in each generation as a diversity method
improves the GA in the search for a better solution (reducing the MAPE of the best solution).
However, the use of diversity can increase the number of generations needed to converge. Better
results in MAPE are obtained by introducing new chromosomes rather than by mixing
chromosomes. It is noteworthy that these diversity methods were used with the same initial
population and with 30% of the reproduction mating pool. Therefore, different results can be
obtained in different scenarios and with different parameters.
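One common way to introduce new chromosomes in each generation is to replace part of the population with freshly generated individuals. The sketch below only illustrates that idea under stated assumptions and is not the MATLAB implementation of the thesis: replacing the worst individuals, the real-valued encoding, the function names and the 0.3 fraction (chosen only to echo the 30% mentioned above) are all hypothetical.

```python
import random


def make_chromosome(rng, length=3):
    """Hypothetical real-valued chromosome with genes drawn uniformly from [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


def inject_new_chromosomes(population, errors, fraction, rng):
    """Replace the worst `fraction` of the population with new random chromosomes.

    `errors` holds one error value (e.g. MAPE) per chromosome; lower is better.
    """
    n_new = round(len(population) * fraction)
    # Indices ordered from the lowest error (best) to the highest error (worst)
    order = sorted(range(len(population)), key=lambda i: errors[i])
    survivors = [population[i] for i in order[:len(population) - n_new]]
    newcomers = [make_chromosome(rng) for _ in range(n_new)]
    return survivors + newcomers


rng = random.Random(0)
population = [make_chromosome(rng) for _ in range(10)]
errors = [float(i) for i in range(10)]  # illustrative errors, already sorted

next_population = inject_new_chromosomes(population, errors, 0.3, rng)
print(len(next_population))
```

The population size is unchanged and the best individuals survive, while the newcomers add genetic material that crossover alone would not produce. This is consistent with the observation above that diversity can help escape poor solutions at the cost of more generations.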
Good results in almost all scenarios with the roulette wheel
Different results were obtained for the selection method. In almost all the scenarios, the roulette
wheel selection gives better results (using the average of the tests performed). However, the best
result in each set is not always found with this selection method. The best solution (lowest MAPE)
was sometimes found using the ranking and exponential selection methods. It can be deduced
that the average results of these two methods (ranking and exponential) are worse because they
need more generations to converge. This effect can be even stronger with the ranking
method, due to the higher probabilities of reproduction of the worst chromosomes.
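The remark about the ranking method can be made concrete by comparing selection probabilities. The sketch below is an illustration under assumptions, not the selection code of the thesis: fitness values are taken as positive with higher meaning better (how MAPE is mapped to such a fitness is not assumed here), the ranking variant is a simple linear ranking with weights proportional to rank, and the function names and fitness values are hypothetical.

```python
def roulette_probabilities(fitness):
    """Selection probability proportional to the fitness value."""
    total = sum(fitness)
    return [f / total for f in fitness]


def linear_ranking_probabilities(fitness):
    """Selection probability proportional to the rank (worst = 1, best = n)."""
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])  # worst first
    weights = [0] * n
    for rank, i in enumerate(order, start=1):
        weights[i] = rank
    total = n * (n + 1) // 2
    return [w / total for w in weights]


# Illustrative fitness values with a large spread between best and worst
fitness = [10.0, 1.0, 0.1]

roulette = roulette_probabilities(fitness)
ranking = linear_ranking_probabilities(fitness)
print(roulette)
print(ranking)
```

With widely spread fitness values the roulette wheel concentrates almost all of the probability on the best chromosome, whereas ranking depends only on the order and therefore gives the worst chromosome a noticeably larger chance of reproducing.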
Equation 1 is composed of different criteria, which results in more robust results
Different results were obtained using the proposed fitness functions. From the tests performed and
the scenarios proposed, we can conclude that equation 2 can lead to undesired results, since the
only criterion used for the fitness is the MSE. Equation 3, which is the same as equation 2 but also
uses POCID, gives better outcomes in general. Although equation 1 does not always give the best
results, reliable results were obtained with it. This is because equation 1 is composed of different
criteria, which makes its results more robust in all the scenarios.
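The criteria named above can be computed as in the following sketch. It shows only the individual measures and does not reproduce how equations 1 to 3 of the thesis combine them. The definitions are the usual ones, with two assumptions: POCID is normalised over the N-1 transitions between consecutive samples, and MAPE is evaluated only on series without zero samples (it is undefined where the measured value is zero, which matters for OFF periods). The sample values are illustrative.

```python
def mse(actual, forecast):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error; assumes no zero values in `actual`."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def pocid(actual, forecast):
    """Percentage of steps in which the forecast moves in the same direction as the data."""
    hits = sum(
        1
        for k in range(1, len(actual))
        if (actual[k] - actual[k - 1]) * (forecast[k] - forecast[k - 1]) > 0
    )
    return 100.0 * hits / (len(actual) - 1)


# Illustrative samples, not taken from the thesis
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

print(mse(actual, forecast), mape(actual, forecast), pocid(actual, forecast))
```

A forecast can have a low MSE and still miss the direction of every change, which is one reason why a fitness built on MSE alone may be less robust than one that also rewards a high POCID.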
5.1.2 Project Evaluation
The final impression is satisfactory, since several interesting conclusions have been drawn and
most of the objectives have been fulfilled. The large number of tests run in NS2 yields
quality results because many situations and possibilities have been taken into account. The Genetic
Algorithm, which required considerable effort to understand and program, works smoothly
and its predictions have reasonable accuracy. We feel that the study will be helpful in future
research and in our professional careers.
5.1.3 Problems found during the project
Regarding problems encountered during the implementation, a relevant problem was the
incompatibility between versions of NS2. Because of this, the patches had to be applied manually,
which takes time. NS2 also has a large number of bugs that have to be fixed manually.
The most difficult part of NS2 programming (in the C++ part) is that debugging is hard, as no good
debugger can be used. Therefore, finding errors is very complicated. Understanding the
NS2-CRAHN [4] implementation took a lot of time, and we finally came to the conclusion that the
implementation was not valid for high data rates. This encouraged the team to carry out a new
implementation of the spectrum management for PU based on the NS2-CRAHN structure.
In order to speed up the testing stage, which was very time consuming due to the significant
number of tests, we would suggest running the tests on cloned computers or on a more powerful
machine.
Regarding the Genetic Algorithm, different problems were found before and during its
implementation. First, before its implementation, it was difficult to understand the proper
functioning of the GA as a forecasting tool. Furthermore, the initial fitness equation implemented
(taken from [27]) gave inadequate outcomes. Therefore, alternative fitness
equations had to be sought. Some equations were found, implemented and proposed for
study in the proposed scenarios.
During the test and verification stage, we realized that in some cases the GA was stuck in a local
minimum. In order to solve this problem, diversity methods were sought in the literature.
5.2 Future work
Using the results of this Master's Thesis, the available bandwidth can be estimated. If the PU
activity of the network can be modelled using an ON-OFF birth-death Markovian process
distribution (a pattern with characteristics similar to those studied in this thesis), then the estimated
available bandwidth will correspond to the values measured in the results. Naturally, the estimate
will not be exact, because a real situation is affected by many external factors.
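A two-state ON-OFF process with exponentially distributed sojourn times is the simplest birth-death Markov model of this kind. The sketch below generates such a pattern and gives its long-run busy fraction. It assumes that this is the type of process meant above; the function names are hypothetical, and the parameters are expressed as mean ON and OFF durations, which are not claimed to correspond to the alpha and beta values of the thesis.

```python
import random


def on_off_pattern(mean_on, mean_off, horizon, rng):
    """Return a list of (state, duration) pairs covering at least `horizon` seconds.

    Durations are drawn from exponential distributions with the given means.
    """
    pattern, elapsed, state = [], 0.0, "ON"
    while elapsed < horizon:
        mean = mean_on if state == "ON" else mean_off
        duration = rng.expovariate(1.0 / mean)  # rate = 1 / mean
        pattern.append((state, duration))
        elapsed += duration
        state = "OFF" if state == "ON" else "ON"
    return pattern


def expected_busy_fraction(mean_on, mean_off):
    """Long-run fraction of time in which the PU is ON."""
    return mean_on / (mean_on + mean_off)


pattern = on_off_pattern(2.0, 3.0, 1000.0, random.Random(1))
busy = sum(d for s, d in pattern if s == "ON") / sum(d for _, d in pattern)
print(expected_busy_fraction(2.0, 3.0), busy)
```

Comparing the measured busy fraction of a real trace with the expected value is one quick check of whether such a model is a plausible description of the observed PU activity.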
Regarding the TCP available bandwidth study, future research on different
scenarios using real background traffic from Secondary Users would be interesting. A scenario with
PUs and real traffic from SUs at the same time would also be interesting to analyse.
Another possibility is to program functions that simulate a realistic WLAN behaviour when a new
user joins the network. This situation could be modelled using a pattern file (similar to the PU activity
file) that contains an activity pattern for SUs. The new users joining the network would have to wait
until the channel is idle (the pattern indicates that it is idle) and would then be able to transmit for
as long as they require.
A MAC busy time study in order to find a model to estimate the TCP available bandwidth may be
carried out.
A similar project using birth-death Markovian process distributions with TCP traffic acting as
a hidden node could be interesting.
Given the computation time required by the GA, forecasting the available bandwidth
in real time is impossible. Even so, the GA could be used to forecast the available bandwidth for the
next days or months.
Moreover, the parameters of the GA could be optimized in order to obtain more accurate
results. This requires performing more tests and observing the effects of these parameters on the
prediction for the proposed scenario.
Furthermore, other tools such as a neural network could be tried. The drawbacks of this tool are
its complexity and its higher computational cost.
References
[1] wikipedia.org, “ISM band,” wikipedia.org, 2 June 2014. [Online]. Available:
http://en.wikipedia.org/wiki/ISM_band#ISM_bands. [Accessed 10 June 2014].
[2] wikipedia.org, “Cognitive radio,” wikipedia.org, 3 June 2014. [Online]. Available:
http://en.wikipedia.org/wiki/Cognitive_radio. [Accessed 10 June 2014].
[3] S. Hanna and J. Sydor, “Spectrum metrics for 2.4 GHz ISM band Cognitive Radio
applications,” in Personal Indoor and Mobile Radio Communications (PIMRC), 2011
IEEE 22nd International Symposium on, Toronto, ON, 2011.
[4] M. D. Felice, K. R. Chowdhury, L. Bononi, A. Kassler and W. Kim, “End-to-end
Protocols for Cognitive Radio Ad Hoc Networks: An evaluation study,” 2010.
[5] I. F. Akyildiz, W.-Y. Lee and K. R. Chowdhury, “CRAHNs: Cognitive radio ad hoc
networks,” Broadband Wireless Networking Laboratory, School of Electrical and
Computer Engineering, Georgia Institute of Technology, Atlanta, 2009.
[6] M. abd Rabou Ahmed Kalil, “Modelling and Analysis of Cognitive Radio Ad Hoc
Networks,” Ilmenau University of Technology, Ilmenau, Germany, 2011.
[7] L-com Global Connectivity, “Advantages and Disadvantages of ISM Band
Frequencies,” L-com Global Connectivity, 22 October 2013. [Online]. Available:
http://www.l-com.com/content/Article.aspx?Type=N&ID=10421. [Accessed 7 May
2014].
[8] Nsnam, “Main page, ns2,” Nsnam, 4 November 2011. [Online]. Available:
http://nsnam.isi.edu/nsnam/index.php/Main_Page. [Accessed 21 February 2014].
[9] wikipedia.org, “wikipedia.org,” wikipedia.org, 11 May 2014. [Online]. Available:
http://en.wikipedia.org/wiki/Methodology. [Accessed 18 May 2014].
[10] Science Buddies, “The Engineering Design Process,” Science Buddies, 2002 -
2014. [Online]. Available: http://www.sciencebuddies.org/engineering-design-
process/engineering-design-process-steps.shtml#theengineeringdesignprocess. [Accessed
May 2014].
[11] Hiertz, G.R. , Denteneer, D., Stibor, L., Zang, Y., Costa, X.P. and Walke, B., “The
IEEE 802.11 universe,” IEEE Communications Magazine, vol. 48, no. 1, January 2010.
[12] Wikipedia, “IEEE 802.11,” Wikipedia.org, 18 February 2014. [Online]. Available:
http://en.wikipedia.org/wiki/IEEE_802.11. [Accessed 21 February 2014].
[13] Wikipedia, “IEEE 802.11g-2003,” wikipedia.org, 22 November 2013. [Online].
Available: http://en.wikipedia.org/wiki/IEEE_802.11g-2003. [Accessed 22 February
2014].
[14] M. C. a. E. G. L. Bononi, “Design and Performance Evaluation of an Asymptotically
Optimal Backoff Algorithm for IEEE 802.11 Wireless LANs,” in System Sciences, 2000.
Proceedings of the 33rd Annual Hawaii International Conference, Bologna, January 2000.
[15] Y.-M. C. T.-H. L. a. A. H. Shao-Cheng Wang, “Performance Evaluations for Hybrid
IEEE 802.11b and 802.11g Wireless Networks,” Department of Electrical Engineering,
University of Southern California, U.S.A. ; Wireless Design Center Winbond Electronics
Corporation America, U.S.A.; Department of Communication Engineering, National
Chiao Tung University, Taiwan., April 2005.
[16] wikipedia.org, “DCF Interframe Space,” wikipedia.org, 26 February 2014. [Online].
Available: http://en.wikipedia.org/wiki/DCF_Interframe_Space. [Accessed 10 June
2014].
[17] K. Xu, Mario Gerla and Sang Bae, “Effectiveness of RTS/CTS handshake in IEEE
802.11 based ad hoc networks,” Elsevier B.V., 2003.
[18] S. Robitzsch, “SEbastian's WLAN CalcUlation Tool,” www.seronline.de, 2010.
[Online]. Available: http://seronline.de/sewcut/. [Accessed 11 February 2014].
[19] “Wireless LAN,” Connectivity Knowledge Platform, [Online]. Available:
http://ckp.made-it.com/ieee80211.html.
[20] D. A. R. P. Agency, “Transmission Control Protocol,” IETF, 1981.
[21] P. R. Egli, “docstoc.com,” 2011. [Online]. Available:
http://www.docstoc.com/docs/119960349/TCP---Transmission-Control-Protocol---
RFC793. [Accessed 3 March 2014].
[22] Wikipedia, “TCP window scale option,” wikipedia.org, December 22 2013. [Online].
Available: http://en.wikipedia.org/wiki/TCP_window_scale_option. [Accessed 3 March
2014].
[23] R. Keith W and K. James F, “TCP Congestion Control,” 1996-2000. [Online].
Available: http://210.43.128.116/jsjwl/net/ross/book/transport_layer/congestion.html.
[Accessed 16 April 2014].
[24] IETF, “RFC 6349,” IETF, August 2011. [Online]. Available:
https://tools.ietf.org/html/rfc6349#page-12. [Accessed 4 March 2014].
[25] T. Issariyakul and E. Hossain, Introduction to Network Simulator NS2, New York:
Springer, 2009, p. Preface 7.
[26] IETF, “RFC 2988 - Computing TCP's Retransmission Timer,” IETF, November
2000. [Online]. Available: http://tools.ietf.org/html/rfc2988. [Accessed 2014 01 04].
[27] D. T. &. K. Moessner, “Traffic modelling and forecasting using genetic algorithms,”
Ann. Telecommun, no. 64, p. 535–543, 2009.
[28] Wikipedia, “Wikipedia - Genetic algorithm,” 26 February 2014. [Online]. Available:
http://en.wikipedia.org/wiki/Genetic_algorithm. [Accessed 27 February 2014].
[29] P. T. Rodríguez-Piñero, “Introducción a los algoritmos genéticos y sus aplicaciones,”
[Online]. Available: www.uv.es/asepuma/X/J24C.pdf. [Accessed 27 February 2014].
[30] “Intelligent Systems Group,” [Online]. Available:
http://www.sc.ehu.es/ccwbayes/docencia/mmcc/docs/temageneticos.pdf. [Accessed 26
February 2014].
[31] J. S. M. T. Matías Ison, “Algoritmos genéticos: aplicación en MATLAB,” 25
November 2005. [Online]. Available:
http://users.df.uba.ar/ariel/materias/FT3_22006/Guias/old/guia_ga.pdf. [Accessed 26
February 2014].
[32] Elena Pérez, “Guía para recién llegados a los algoritmos geneticos,” 2010. [Online].
Available:
http://www.insisoc.org/elena/Elena%20Perez%20Vazquez_archivos/files_newcomers/ne
wcomers-spanish.pdf. [Accessed 26 February 2014].
[33] C. Reeves, “Genetic algorithms - Chapter 3,” [Online]. Available:
http://sci2s.ugr.es/docencia/metah/bibliografia/GeneticAlgorithms.pdf. [Accessed 28
February 2014].
[34] “University of Amsterdam - What is an Evolutionary Algorithm?,” [Online].
Available: http://www.cs.vu.nl/~gusz/ecbook/Eiben-Smith-Intro2EC-Ch2.pdf. [Accessed
28 February 2014].
[35] Wikipedia, “Wikipedia - Genotipo,” 28 February 2014. [Online]. Available:
http://es.wikipedia.org/wiki/Genotipo. [Accessed 28 February 2014].
[36] M. Gen and R. Cheng, “Genetic Algorithms and Engineering Design,” United States
of America, John Wiley & Sons, Inc., 1997, pp. 5-8.
[37] J. G. Noraini Mohd Razali, “Genetic Algorithm Performance with Different Selection
Strategies in Solving TSP,” in WCE 2011, London, U.K., 2011.
[38] N. S. &. Y. S. Rahul Malhotra, “Genetic Algorithms: Concepts, Design for
Optimization of Process,” Computer and Information Science, vol. 4, no. 2, pp. 39-54,
2011.
[39] C. J. a. G. Evertsson, “Master Thesis: Optimizing Genetic Algorithms for Time
Critical Problems,” June 2003. [Online]. Available:
http://digitalamedier.bth.se/fou/cuppsats.nsf/all/7f65a646dddb44a7c1256d44003e9326/$
file/Optimizing%20Genetic%20Algorithms%20for%20time%20critical%20problems.pd
f. [Accessed 3 March 2014].
[40] L. M. Schmitt, “Theory of genetic algorithms,” Theoretical Computer Science, no.
256, pp. 1-61, 2001.
[41] H. Pohlheim, “Evolutionary Algorithms: Overview, Methods and Operators,” in
GEATbx.com - Genetic and Evolutionary Algorithm Toolbox for Matlab, GEATbx version
3.8, 2006, pp. 9-33.
[42] A. Alvarez, A. Orfila and J. Tintore, “DARWIN: An evolutionary program for
nonlinear modeling of chaotic time series,” Computer Physics Communications 136, pp.
334-349, 2001.
[43] G. G. Szpiro, “Forecasting chaotic time series with genetic algorithms,” The
American Physical Society, vol. 55, no. 3, pp. 2557 - 2568, 1997.
[44] Jyotishree and R. Kumar, “Blending Roulette Wheel Selection & Rank Selection in
Genetic Algorithms,” International Journal of Machine Learning and Computing, vol. 2,
no. 4, pp. 365 - 370, 2012.
[45] S. A. Burns, Recent advances in optimal structural design, United States of America:
ASCE, 2002, pp. 66-70.
[46] R. Chakraborty, “Fundamentals of Genetic Algorithms,” 01 June 2010. [Online].
Available: http://www.myreaders.info/09_Genetic_Algorithms.pdf. [Accessed 1 March
2014].
[47] Y. KAYA, M. UYAR and R. TEKIDN, “A Novel Crossover Operator for Genetic
Algorithms: Ring Crossover,” 2 May 2011. [Online]. Available:
http://arxiv.org/abs/1105.0355. [Accessed 20 May 2014].
[48] L. W. F. L. W. a. J. L. Hepu Deng, “Artificial Intelligence and Computational
Intelligence,” Berlin, Springer, 2009, p. 132.
[49] J. a. P. T. A. Arranz de la Peña, “Algoritmos geneticos,” [Online]. Available:
http://www.it.uc3m.es/jvillena/irc/practicas/06-07/05.pdf. [Accessed 1 March 2014].
[50] Wikipedia, “Wikipedia - Time series,” 11 March 2014. [Online]. Available:
http://en.wikipedia.org/wiki/Time_series. [Accessed 21 March 2014].
[51] F. Takens, “Detecting strange attractors in turbulence,” Lecture Notes in Mathematics,
vol. 898, pp. 366-381, 1981.
[52] N. Packard, J.P.Crutchfield, J.D.Farmer and R.S.Shaw, “Geometry from a Time
Series,” PHYSICAL REVIEW LETTERS, vol. 45, no. 9, pp. 712-716, 1980.
[53] Taken's theorem in action for the Lorenz chaotic attractor. [Film]. Youtube.
[54] S. R. García, M. P. Romo and J. Figueroa-Nazuno, “Characterization of ground
motions using recurrence plots,” Geofísica Internacional, vol. 52, no. 3, pp. 209-227, 28
June 2013.
[55] H. Kim, R. Eykholt and J. Salas, “Nonlinear dynamics, delay times, and embedding
windows,” Physica D, no. 127, pp. 48- 60, 1999.
[56] L. Cao, “Practical method for determining the minimum embedding dimension of a
scalar time series,” Physica D, no. 110, pp. 43-50, 1997.
[57] A. A. B. Perez, “Estudio del comportamiento de los precios del cobre y desarrollo de
un modelo de pronostico,” 2007. [Online]. Available: http://css.csregistry.org/tiki-
download_wiki_attachment.php?attId=135. [Accessed 31 March 2014].
[58] M. Hong-guang and H. Chong-zhao, “Selection of Embedding Dimension and Delay
Time in Phase Space Reconstruction,” Front. Electr. Electron. Eng. China, vol. 1, pp. 111-
114, 2006.
[59] C. Piccardi, 26 September 2006. [Online]. Available:
ftp://ftp.elet.polimi.it/users/Carlo.Piccardi/VarieCda/Articoli/CdA-Art-SerieTemporali-
4.pdf. [Accessed March 2014].
[60] D. Kugiumtzis, “State space reconstruction parameters in the analysis of chaotic time
series - the role of the time window length,” Physica D, no. 95, pp. 13-28, 1996.
[61] R. Hegger, H. Kantz and T. Schreiber, “Practical implementation of nonlinear time
series methods: The TISEAN package,” 2007. [Online]. Available: http://www.mpipks-
dresden.mpg.de/~tisean/Tisean_3.0.1/index.html. [Accessed 12 March 2014].
[62] A. M. Fraser and H. L. Swinney, “Independent coordinates for strange attractors from
mutual information,” Physical Review A, vol. 33, no. 2, pp. 1134-1140, 1986.
[63] R. V. G, “Recurrence quantification analysis of system signals for detecting tool and
chatter in turning,” 16 November 2012. [Online]. Available:
http://hdl.handle.net/10603/5164. [Accessed 20 May 2014].
[64] M. B. Kennel, R. Brown and H. D. I. Abarbanel, “Determining embedding dimension
for phase-space reconstruction using a geometrical construction,” PHYSICAL REVIEW A,
vol. 45, no. 6, pp. 3403-3411, 1992.
[65] L. Jiayu, H. Zhiping, W. Yueke and S. Zhenken, “Selection of proper embedding
dimension in phase space reconstruction of speech signals,” Journal of Electronics, vol.
17, no. 2, pp. 161 - 169, 2000.
[66] M. Ruey and S. Tsay, “Lecture 1: Univariate Time Series,” Autumn Quarter 2008.
[Online]. Available: http://faculty.chicagobooth.edu/ruey.tsay/teaching/uts/lec1-08.pdf.
[Accessed 21 March 2014].
[67] Wikipedia, “Wikipedia - Multivariate analysis,” 19 March 2014. [Online]. Available:
http://en.wikipedia.org/wiki/Multivariate_analysis. [Accessed 21 March 2014].
[68] C. Martin, “Nonlinear prediction of chaotic time series,” Physica D, vol. 35, pp. 335-
356, 1989.
[69] A. S. Soofi and L. Cao, “Modelling and Forecasting Financial Data,” in Techniques
of Nonlinear Dynamics, Norwell, Massachusetts 02061 USA, Kluwer academic
publishers, 2002, pp. 205-211.
[70] H. Cheng, P.-N. Tan, J. Gao and J. Scripps, “Multistep-Ahead Time Series
Prediction,” Advances in Knowledge Discovery and Data Mining, vol. 3918, pp. 765-774,
2006.
[71] W. Ming, Y. Bao, Z. Hu and T. Xiong, “Multistep-Ahead Air Passengers Traffic
Prediction with Hybrid ARIMA-SVMs Models,” The Scientific World Journal, vol. 2014,
pp. 1-14, 2014.
[72] Nsnam, “User Information, ns2,” Nsnam, 4 November 2011. [Online]. Available:
http://nsnam.isi.edu/nsnam/index.php/User_Information. [Accessed 21 February 2014].
[73] Nsnam, “What is NS-3,” Nsnam, 2012. [Online]. Available:
http://www.nsnam.org/overview/what-is-ns-3/. [Accessed 21 February 2014].
[74] “Implementation of a cross-layer MAC and channel allocation scheme for Cognitive
Radio Ad Hoc Networks (CRANHs).,” 2009.
[75] Wikipedia, “Birth–death process,” wikipedia.org, 18 February 2014. [Online].
Available: http://en.wikipedia.org/wiki/Birth%E2%80%93death_process. [Accessed
2014 March 03].
[76] Y. Zhang, “Spectrum Handoff in Cognitive Radio Networks: Opportunistic and
Negotiated Situations,” Simula Research Laboratory, Norway, 2009.
[77] E. Kalyvas, “Thesis: Using neural networks and genetic algorithms to predict stock
market returns,” October 2001. [Online]. Available:
http://www.125books.com/inc/pt4321/pt4322/pt4323/pt4324/pt4325/data_all/books/U/U
sing%20Neural%20Networks%20And%20Genetic%20Algorithms%20To%20Predict%
20Stock%20Market%20Returns.pdf. [Accessed 17 April 2014].
[78] M. Liu, R. Wang, J. Wu and R. Kemp, “A Genetic-Algorithm-Based Neural Network
Approach for Short-Term Traffic Flow Forecasting,” Advances in Neural Networks, vol.
3498, pp. 965-970, 2005.
[79] A.-a. H. Mantawy, “Genetic Algorithms Application to Electric Power Systems,”
2012. [Online]. Available: http://cdn.intechopen.com/pdfs-wm/33395.pdf. [Accessed 17
April 2014].
[80] WangYan, W. Hua and X. Limin, “Highway Traffic Prediction with Neural Network
and Genetic Algorithms,” Vehicular Electronics and Safety, pp. 211-216, 2005.
[81] M. A. Abido and A. Elazouni, “Improved Crossover and Mutation Operators for
Genetic Algorithm Project Scheduling,” IEEE Congress on Evolutionary Computation,
pp. 1865-1872, 2009.
[82] A. Acan, H. Altincay, Y. Tekol and A. Unveren, “A Genetic Algorithm with Multiple
Crossover Operators for Optimal Frequency Assignment Problem,” Evolutionary
Computation, vol. 1, pp. 256-263, 2003.
[83] S. Picek, M. Golub and D. Jakobovic, “Evaluation of Crossover Operator
Performance in Genetic Algorithms with Binary Representation,” [Online]. Available:
http://bib.irb.hr/datoteka/537238.icic.pdf. [Accessed 17 April 2014].
[84] Erlend, “Erlend's Lookout Post,” 21 July 2009. [Online]. Available:
http://erl1.wordpress.com/2009/07/21/15/. [Accessed 24 April 2014].
[85] T. Henderson, “18.2 Two-ray ground reflection model,” Nsnam, 11 May 2011.
[Online]. Available: http://www.isi.edu/nsnam/ns/doc/node218.html. [Accessed 24 April
2014].
[86] T. Henderson, “18.1 Free space model,” Nsnam, 11 May 2011. [Online]. Available:
http://www.isi.edu/nsnam/ns/doc/node217.html. [Accessed 24 April 2014].
[87] M. Garetto, T. Salonidis and E. Knightly, “IEEE Explore,” [Online]. Available:
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4457986&url=http%3A%2F%2F
ieeexplore.ieee.org%2Fiel5%2F90%2F4359146%2F04457986.pdf%3Farnumber%3D44
57986.
[88] M. Ergen, “802.11 Tutorial,” Department of Electrical Engineering and Computer
Science, University of California Berkeley, California, June 2002.
[89] D. J. Widmer, “icapeople.epfl.ch,” [Online]. Available:
http://icapeople.epfl.ch/widmer/uwb/ns-2/noah/. [Accessed 4 April 2014].
[90] D. J. Widmer, “Dr. Joerg Widmer,” [Online]. Available:
http://www.joergwidmer.org/.
[91] G. Combs, “www.wireshark.com,” [Online]. Available:
http://www.wireshark.org/about.html. [Accessed 24 April 2014].
[92] wikipedia.org, “http://en.wikipedia.org/wiki/Comma-separated_values,”
wikipedia.org, 2014 April 22. [Online]. Available: http://en.wikipedia.org/wiki/Comma-
separated_values. [Accessed 2014 April 24].
[93] perl.org, “The Perl Programming Language,” perl.org, 2014. [Online]. Available:
http://www.perl.org/about.html. [Accessed 2014 April 24].
[94] J. Kaur, “Analysis of Available Bandwidth Measurement Techniques,” [Online].
Available: http://www.cs.unc.edu/~jasleen/Research-analysisof.htm. [Accessed 22 April
2014].
[95] nsnam, “40.5 Applications objects,” nsnam, 05 May 2011. [Online]. Available:
http://www.isi.edu/nsnam/ns/doc/node516.html. [Accessed 05 April 2014].
[96] V. G, “Linux Shell Scripting Tutorial v1.05r3,” nixCraft Technologies, 1999-2002.
[Online]. Available: http://www.freeos.com/guides/lsst/. [Accessed 17 April 2014].
[97] Free Software Foundation Inc., “The GNU Awk User's Guide,” 2013. [Online].
Available: http://www.gnu.org/software/gawk/manual/gawk.html. [Accessed 17 April
2014].
[98] T. Williams and C. Kelley, “gnuplot homepage,” Gnuplot, February 2014. [Online].
Available: http://gnuplot.info/. [Accessed 2014 April 24].
[99] MPagan, “Xgraph General Purpose 2-D Plotter,” Xgraph , 21 February 2014.
[Online]. Available: http://www.xgraph.org/. [Accessed 24 April 2014].
[100] M. Grey, “Marc Grey's Tutorial. IX. Running Wireless Simulations in ns,” nsnam,
[Online]. Available: http://www.isi.edu/nsnam/ns/tutorial/nsscript5.html . [Accessed 24
April 2014].
[101] Wikipedia, “Wikipedia - Reverse Polish Notation,” 10 April 2014. [Online].
Available: http://en.wikipedia.org/wiki/Reverse_Polish_notation. [Accessed 26 April
2014].
[102] A. R. L. Junior, “A Study for Multi-Objective Fitness Function for Time Series
Forecasting with Intelligent Techniques,” Proceedings of the 10th annual conference
companion on Genetic and evolutionary computation, pp. 1843-1846, 2008.
[103] G. Good, J. Hartmanis and J. v. Leeuwen, “Advances in Neural Networks,” Berlin,
Springer, 2007, pp. 606-608.
[104] T. A. E. Ferreira, G. C. Vasconcelos and P. J. L. Adeodato, “A New Hybrid Approach
for Enhanced Times Series Prediction,” in UNISINOS, São Leopoldo, 2005.
[105] B. Can, A. Beham and C. Heavey, “A comparative study of genetic algorithm
components in simulation-based optimisation,” Proceedings of the 2008 Winter
Simulation Conference, pp. 1829-1837, 2008.
[106] S. F. Galán, “A Novel Mating Approach for Genetic Algorithms,” 2007. [Online].
Available:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.154.6796&rep=rep1&type=pd
f. [Accessed 23 April 2014].
[107] S. Patil, “Indian ETD Repository,” 2012/2013. [Online]. Available:
http://hdl.handle.net/10603/6111. [Accessed 21 April 2014].
[108] T. Blickle and L. Thiele, “A Comparison of Selection Schemes used in Genetic
Algorithms,” 11 December 1995. [Online]. Available:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.11.509&rep=rep1&type=pdf.
[Accessed 23 April 2014].
[109] X. Yao, “Lecture 02 - Genetic Representation, Search, Operators, Selection Schemes
and Selection Pressure,” 5 October 2009. [Online]. Available:
http://www.cs.bham.ac.uk/~pkl/teaching/2009/ec/lecture_notes/l02-operators.pdf.
[Accessed 23 April 2014].
[110] B. Chakraborty and P. Chaudhuri, “On The Use of Genetic Algorithm with Elitism in
Robust and Nonparametric Multivariate Analysis,” AUSTRIAN JOURNAL OF
STATISTICS, vol. 32, no. 1&2, pp. 13-27, 2003.
[111] A. Ghosh, S. Roy, J. PalChoudhury, S. R. BhadraChaudhuri and S. Mandal, “A Novel
Approach of Genetic Algorithm in prediction of Time Series Data,” IJCA Special Issue on
Advanced Computing and Communication Technologies for HPC Applications, no. 1, pp.
16-20, 2012.
[112] G. Bee-Hua, “Evaluating the performance of combining neural networks and genetic
algorithms to forecast construction demand: the case of the Singapore residential sector,”
Construction Management and Economics, pp. 209-217, 2010.
[113] N. Mastorakis, V. Mladenov and V. T. Kontargyr, “Proceedings of the European
Computing Conference,” in Proceedings of the European Computing Conference: Volume
2, Springer Science + Business Media, LLC, 2009, pp. 9-12.
[114] A. Rexhepi, A. Maxhuni and A. Dika, “Analysis of the impact of parameters values
on the Genetic Algorithm for TSP,” IJCSI International Journal of Computer Science
Issues, vol. 10, no. 3, pp. 158-164, 2013.
[115] M. O. Odetayo, “OPTIMAL POPULATION SIZE FOR GENETIC ALGORITHMS:
AN INVESTIGATION,” in Genetic Algorithms for Control Systems Engineering, IEE
Colloquium on, London, 1993.
[116] I. C. Society, “Part 11: Wireless LAN Medium Access Control (MAC) and Physical
Layer (PHY) Specifications,” in IEEE Standard for Information technology--
Telecommunications and information exchange between systems Local and metropolitan
area networks, IEEE Computer Society, 2010, p. 92.
[117] Wikipedia, “OSI model,” wikipedia.org, 25 February 2014. [Online]. Available:
http://en.wikipedia.org/wiki/OSI_model. [Accessed 26 February 2014].
[118] R. M. (B.E.), “A Generic Parallel Genetic Algorithm,” October 2003. [Online].
Available: http://www.maths.tcd.ie/~rmurphy/Project/Report/report.html. [Accessed 28
February 2014].
[119] J. O. Pierini and E. A. Gómez, “TIDAL FORECASTING IN THE BAHIA BLANCA
ESTUARY, ARGENTINA,” INTERCIENCIA, vol. 34, no. 12, pp. 851-856, 2009.
Scenario implementation in NS2
The implementation of a wireless scenario in NS2 is performed by programming a Tcl script.
First, we define the different options for the scenario:
set val(chan) Channel/WirelessChannel ;# channel type
set val(prop) Propagation/TwoRayGround ;# radio-propagation model
set val(netif) Phy/WirelessPhy ;# network interface type
set val(mac) Mac/802_11 ;# MAC type
set val(ifq) Queue/DropTail/PriQueue ;# interface queue type
set val(ll) LL ;# link layer type
set val(ant) Antenna/OmniAntenna ;# antenna model
set val(ifqlen) 100 ;# max packets in ifq
set val(nn) 2 ;# number of mobile nodes
set val(rp) NOAH ;# routing protocol
Mac/802_11 set basicRate_ 6Mb ;# set the basic rate
Mac/802_11 set dataRate_ 54Mb ;# set the data rate
After the variables are set, the main program is set up, the variables are initialised and the PU
model file is loaded. A God (General Operations Director) is also created. According to [100], “a
God is the object that is used to store global information about the state of the environment,
network or nodes that an omniscient observer would have, but that should not be made known to
any participant in the simulation”.
set ns_ [new Simulator] ;# create a new simulator
set tracefd [open simple.tr w] ;# open a new trace file
$ns_ trace-all $tracefd
set topo [new Topography] ;# set the topography
$topo load_flatgrid 1200 1200
set pumap [new PUMap] ;# create the PUMap object
$pumap set_input_map "map_$val(punum)_$val(wload).txt"
create-god $val(nn) ;# create God
The next step is to configure the nodes with the parameters selected above, before setting their
positions. We create nn nodes in total.
# Put the configuration data into the ns2 node configuration.
# Agent and MAC traces are shown; router and movement traces are not.
$ns_ node-config -adhocRouting $val(rp) \
-llType $val(ll) \
-macType $val(mac) \
-ifqType $val(ifq) \
-ifqLen $val(ifqlen) \
-antType $val(ant) \
-propType $val(prop) \
-channelType $val(chan) \
-phyType $val(netif) \
-topoInstance $topo \
-agentTrace ON \
-routerTrace OFF \
-macTrace ON \
-movementTrace OFF
for {set i 0} {$i < $val(nn)} {incr i} {
set node_($i) [$ns_ node] ;# create a new node
$node_($i) random-motion 0 ;# disable random motion
$node_($i) node-CR-configure $pumap ;# load the PU model file
}
Then we set the node positions (x, y and z). This is fundamental because the position of the nodes
has an impact on the PU interference.
ns-random 1 ;# seed the random number generator
$node_(0) set X_ 200
$node_(0) set Y_ 300
$node_(0) set Z_ 0.0
$node_(1) set X_ 200
$node_(1) set Y_ 550
$node_(1) set Z_ 0.0
At this point the topology, the traffic specifications and the Agents are created. There are different
possibilities, depending on whether TCP or UDP is used:
• Configuration for TCP with an FTP application
set tcp_(0) [new Agent/TCP] ;# create a new TCP agent
set sink_(0) [new Agent/TCPSink] ;# create a new TCP sink agent
$ns_ attach-agent $node_(0) $tcp_(0) ;# attach the TCP agent to node 0
$ns_ attach-agent $node_(1) $sink_(0) ;# attach the TCP sink to node 1
$ns_ connect $tcp_(0) $sink_(0) ;# connect both agents
set ftp_(0) [new Application/FTP] ;# create a new FTP application
$ftp_(0) attach-agent $tcp_(0) ;# attach the FTP application to the TCP agent
$ns_ at 5.0 "$ftp_(0) start" ;# start the FTP transfer at second 5
• Configuration for UDP with a Constant Bit Rate (CBR) application
set udp_(0) [new Agent/UDP] ;# create a new UDP agent
set sink_(0) [new Agent/Null] ;# create a new sink agent
$ns_ attach-agent $node_(0) $udp_(0) ;# attach the UDP agent to node 0
$ns_ attach-agent $node_(1) $sink_(0) ;# attach the sink agent to node 1
$ns_ connect $udp_(0) $sink_(0) ;# connect both agents
set cbr_(0) [new Application/Traffic/CBR] ;# create a new CBR application
$cbr_(0) set packetSize_ 1400 ;# set the packet size to 1400 bytes
$cbr_(0) set rate_ 30000000 ;# set the CBR data rate (bit/s)
$cbr_(0) attach-agent $udp_(0) ;# attach the CBR application to the UDP agent
$ns_ at 5.0 "$cbr_(0) start" ;# start the CBR traffic at second 5
Both TCP and UDP agents can also play back real traffic by attaching a trace player as the
application, as described in Section 3.3.5.2. The following lines show an example of how two
additional TCP traffic nodes (playing back real traffic traces) are configured. These nodes are
placed inside the scenario and their activity affects the rest of the wireless medium in a totally
different way than primary users do.
• Adding two TCP nodes with a played-back trace application
set tcp0 [new Agent/TCP]
$ns_ attach-agent $node_(2) $tcp0 ;# attach the tcp0 agent to node 2
set tfile1 [new Tracefile] ;# create a trace file object
$tfile1 filename "values3.txt.if-1.bin" ;# select the file to be played back
set trac0 [new Application/Traffic/Trace] ;# create a new traffic trace application
$trac0 attach-tracefile $tfile1 ;# assign the file to the application
$trac0 attach-agent $tcp0 ;# attach the application to the tcp0 agent
set sink0 [new Agent/TCPSink]
$ns_ attach-agent $node_(3) $sink0 ;# attach the sink to node 3
$ns_ connect $tcp0 $sink0 ;# connect the cross-traffic agents
$ns_ at 5.0 "$trac0 start" ;# start playing the trace
When the simulation is completed, the function that calculates the total MAC busy time is called
for both nodes. This time is stored in a file.
$ns_ at 100.0 "$node_(0) compute-mac-busy"
$ns_ at 100.0 "$node_(1) compute-mac-busy"
The last commands tell the simulator to end all the agents, end the simulation and close ns2. Once
everything is set, the only step remaining is to start the NS2 simulation using the script above.
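The CBR settings in the UDP configuration above (packetSize_ 1400 and rate_ 30000000) fix how often packets are generated. The following Python sketch works out that arithmetic. It is only an illustration: the function names are hypothetical, and it assumes the usual CBR relation in which the inter-packet interval is the packet size in bits divided by the configured rate.

```python
# Values copied from the CBR configuration in the Tcl script.
PACKET_SIZE_BYTES = 1400        # packetSize_
CBR_RATE_BPS = 30_000_000       # rate_ (bit/s)
PHY_DATA_RATE_BPS = 54_000_000  # dataRate_ set on Mac/802_11


def cbr_interval_seconds(packet_size_bytes, rate_bps):
    """Time between two consecutive CBR packets (assumed: size in bits / rate)."""
    return packet_size_bytes * 8 / rate_bps


def packets_per_second(packet_size_bytes, rate_bps):
    """Number of CBR packets generated per second."""
    return rate_bps / (packet_size_bytes * 8)


interval = cbr_interval_seconds(PACKET_SIZE_BYTES, CBR_RATE_BPS)
pps = packets_per_second(PACKET_SIZE_BYTES, CBR_RATE_BPS)
offered_fraction = CBR_RATE_BPS / PHY_DATA_RATE_BPS

print(f"inter-packet interval: {interval * 1e6:.2f} us")
print(f"packets per second:    {pps:.1f}")
print(f"offered load / PHY data rate: {offered_fraction:.2f}")
```

The offered load is compared with the nominal PHY data rate only as a rough reference; the throughput actually achievable is lower because of MAC and PHY overheads.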
Physical and MAC layer parameters for 802.11
Parameter | Value | Comments
Packet payload | 12000 bits | Useful data content of the packet
MAC header | 224 bits | Size of the MAC header
Control frame bit/s | 11 Mbps | Frames used to facilitate data exchange between stations, such as RTS, CTS, ACK, etc. [12]
11g_PHY header | 136 bits @ 6 Mbps | N/A The PHY header is sent at 6 Mbps
ACK | 112 bits + 11g_PHY header | Acknowledgement message packet size
ACK frame bit/s | 24 Mbps | Bit rate for the ACK
Data frame bit/s | 54 Mbps | Bit rate for the payload
Propagation delay | 1 µs | Time to reach the destination (over the medium)
SIFS | 10 µs (16 µs between data and ACK) | Short Inter-frame Space
DIFS | 50 µs | DCF Inter-frame Space
Table 26: Frame parameters of 802.11g [15]
Parameter | Value | Comments
SlotTime | 9 µs | Basic unit of timing for the protocol
CCATime | 15 µs | Clear-channel assessment
RxTxTurnaroundTime | 2 µs | The time that elapses between the sending of the MAC transmission request to the PHY layer and the instant when the first bit is transmitted
SIFSTime | 16 µs | Short Inter-frame Space
PreambleLength | 96 bits / 128 bits (*) | Short preamble / Long preamble
PLCPHeaderLength | 40 bits (**) |
PLCPDataRate | 6 Mbps | Transmission rate for the PLCP (Preamble)
CWmin | 15 | Minimum Contention Window
CWmax | 1024 | Maximum Contention Window
Table 27: Timing parameters of IEEE 802.11g standard [15]
(*)Preamble length: The preamble is defined in the first part of the Physical Layer Convergence
Protocol/Procedure (PLCP) Protocol Data Unit (PDU) [116]. The PLCP layer is the interface with
the MAC protocol data units (MPDUs) to be transferred between MAC stations over the PMD.
PMD is the method of transmitting and receiving data through the wireless medium.
(**)The preamble contains a header that has information identifying the modulation scheme,
transmission rate, and transmission time of the whole data frame.
Short and Long preamble: The short PLCP preamble and header is defined as optional in 802.11g
[116]. The Short Preamble and header may be used to minimize overhead and, thus, maximize the
network data throughput.
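The parameters of Table 26 and Table 27 can be combined into a rough estimate of the time one data/ACK exchange occupies the medium. The Python sketch below is a simplified model and not the calculation used in the thesis: it assumes a single sender, no collisions, no OFDM symbol rounding, the 16 µs SIFS between data and ACK, and an optional mean backoff expressed in slots. All function names are hypothetical.

```python
# Table 26 values. Times in microseconds, sizes in bits,
# rates in Mbps (numerically equal to bits per microsecond).
PAYLOAD_BITS = 12000
MAC_HEADER_BITS = 224
PHY_HEADER_BITS = 136
PHY_HEADER_RATE = 6.0
ACK_BITS = 112
ACK_RATE = 24.0
DATA_RATE = 54.0
PROPAGATION_DELAY = 1.0
SIFS = 16.0
DIFS = 50.0
# Table 27 values.
SLOT_TIME = 9.0
CW_MIN = 15


def phy_header_time():
    """Duration of the PHY header, sent at the PHY header rate."""
    return PHY_HEADER_BITS / PHY_HEADER_RATE


def data_frame_time():
    """PHY header plus MAC header and payload at the data rate."""
    return phy_header_time() + (MAC_HEADER_BITS + PAYLOAD_BITS) / DATA_RATE


def ack_frame_time():
    """PHY header plus the ACK body at the ACK rate."""
    return phy_header_time() + ACK_BITS / ACK_RATE


def exchange_time(mean_backoff_slots=0.0):
    """Assumed sequence: DIFS, backoff, data, SIFS, ACK (plus propagation)."""
    return (DIFS + mean_backoff_slots * SLOT_TIME
            + data_frame_time() + PROPAGATION_DELAY
            + SIFS + ack_frame_time() + PROPAGATION_DELAY)


def throughput_mbps(mean_backoff_slots=0.0):
    """Payload bits delivered per microsecond of medium time."""
    return PAYLOAD_BITS / exchange_time(mean_backoff_slots)


print(f"exchange time without backoff: {exchange_time():.2f} us")
print(f"payload throughput without backoff: {throughput_mbps():.2f} Mbps")
print(f"payload throughput with mean backoff CWmin/2: "
      f"{throughput_mbps(CW_MIN / 2):.2f} Mbps")
```

Even in this idealised form, the fixed overheads keep the payload throughput well below the 54 Mbps data rate.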
Tests of TCP throughput over time and other parameters for different PU activity ON-OFF patterns
In this appendix, some examples using Scenario 1 for different ON-OFF patterns are
presented.
a. 50% ON time TCP available throughput results for different values
In order to analyse this case, we set alpha equal to two times beta, so that the PU is ON 50% of
the time and OFF the other 50%. The results show that the more closely the retransmission times
after a PU ON period coincide with the OFF periods, the greater the resulting available throughput,
until it drops to zero again. In this case, the alpha/beta combinations with zero throughput occur
where beta is a multiple of 0.7. There is a relation between the alpha/beta values and the
available TCP throughput. As the ON-OFF pattern is periodic, whenever a retransmission coincides
with the PU ON period, the retransmission time doubles with every retransmission and keeps
coinciding with PU ON activity. Therefore, the resulting throughput is zero, as the retransmissions
always fall within the PU ON activity time. This is depicted in Figure 75.
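The interaction between exponential RTO backoff and a periodic ON-OFF pattern can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the NS2 behaviour: the PU is assumed ON during the first `on_time` seconds of every `period`, a retransmission succeeds only if it falls in an OFF interval, and the RTO doubles after each failure. The names `period`, `on_time`, `loss_time` and `initial_rto` are hypothetical and are not mapped onto the thesis's alpha and beta.

```python
def pu_is_on(t, period, on_time):
    """True if the PU is active at time t (ON for the first on_time s of each period)."""
    return (t % period) < on_time


def retransmissions_until_success(loss_time, initial_rto, period, on_time,
                                  max_attempts=10):
    """Count retransmissions until one lands in an OFF interval.

    Returns the attempt number of the first successful retransmission,
    or None if every attempt up to max_attempts hits an ON interval.
    """
    t = loss_time
    rto = initial_rto
    for attempt in range(1, max_attempts + 1):
        t += rto                      # the timer expires and the segment is resent
        if not pu_is_on(t, period, on_time):
            return attempt
        rto *= 2                      # exponential backoff after a failed attempt
    return None


# RTO shorter than the period: the first retransmission falls in an OFF interval.
print(retransmissions_until_success(0.5, 1.0, period=2.0, on_time=1.0))
# RTO equal to the period: every doubled RTO is a multiple of the period,
# so each retransmission lands at the same phase of the ON interval.
print(retransmissions_until_success(0.5, 2.0, period=2.0, on_time=1.0))
```

In the second call every timer value is a whole multiple of the period, so the retransmissions never leave the ON interval, which mirrors the zero-throughput case described above.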
alpha 2.8 beta 1.4
Figure 75: Graphs of TCP throughput over time for 50% PU ON
b. TCP available throughput over time, Congestion window, Current RTO multiplicative factor, Smoothed RTT factor and Slow-start threshold
The following parameters are presented in graphs in order to be properly visualised:
• The TCP available throughput over time
• Congestion window over time
• Current RTO multiplicative factor over time
• Smoothed RTT factor over time
• Slow-start threshold over time
Table 28 clearly shows that the congestion window grows large where the throughput is also high.
When there are long periods in which the throughput is zero, the RTO multiplicative factor grows
to a large value, meaning that long retransmission times are experienced during these periods. The
slow-start threshold is halved with every timeout and only grows when packets are successfully
acknowledged. Similar results can be observed in Table 29, but on a smaller scale. As a
consequence, the congestion window never grows to values as large as in Table 28.
Table 30 clearly shows how the congestion window is reduced when a packet is dropped and a
timeout event therefore occurs, although the congestion window is high almost all the time. As the
congestion window is high because few packets are dropped, the slow-start threshold is also large
almost all the time and, in contrast, the RTO multiplicative factor is very low because there are
no consecutive retransmission attempt failures.
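The qualitative behaviour of these variables can be reproduced with a textbook model of TCP timeout handling. The Python sketch below is an assumption-laden illustration, not the NS2 TCP agent: on a timeout the slow-start threshold is set to half the current window (with a floor of two segments), the congestion window is reset to one segment and the RTO multiplicative factor doubles up to an assumed cap of 64; on a successful acknowledgement the factor returns to one and the window grows.

```python
def on_timeout(cwnd, ssthresh, backoff, max_backoff=64):
    """Update the state after a retransmission timeout (simplified model)."""
    new_ssthresh = max(cwnd / 2, 2)           # halve the window, floor of 2 segments
    new_backoff = min(backoff * 2, max_backoff)
    return 1, new_ssthresh, new_backoff       # cwnd restarts from one segment


def on_ack(cwnd, ssthresh, backoff):
    """Update the state after a successful acknowledgement (simplified model)."""
    if cwnd < ssthresh:
        cwnd += 1                             # slow start: one segment per ACK
    else:
        cwnd += 1 / cwnd                      # congestion avoidance
    return cwnd, ssthresh, 1                  # the backoff factor is reset


# Three consecutive timeouts followed by one acknowledged segment.
state = (20, 64, 1)                           # (cwnd, ssthresh, RTO factor)
history = [state]
for _ in range(3):
    state = on_timeout(*state)
    history.append(state)
state = on_ack(*state)
history.append(state)

for cwnd, ssthresh, backoff in history:
    print(f"cwnd={cwnd:<5} ssthresh={ssthresh:<5} rto_factor={backoff}")
```

Consecutive timeouts drive the RTO factor up while the window stays at one segment, which corresponds to the long zero-throughput periods described above.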
TCP available bandwidth over time for alpha 1.5 and beta 0.5
Congestion window over time for alpha 1.5 and beta 0.5
Current RTO multiplicative factor over time for alpha 1.5 and beta 0.5
Slow-start threshold over time for alpha 1.5 and beta 0.5
Table 28: Throughput over time, Congestion window, Current RTO multiplicative factor and Slow-start threshold for alpha 1.5 and beta 0.5
TCP available bandwidth over time for alpha 0.5 and beta 0.1
Congestion window over time for alpha 0.5 and beta 0.1
Current RTO multiplicative factor over time for alpha 0.5 and beta 0.1
Slow-start threshold for alpha 0.5 and beta 0.1
Table 29: Throughput over time, Congestion window, Current RTO multiplicative factor and Slow-start threshold for alpha 0.5 and beta 0.1
TCP available bandwidth over time for alpha 1.5 and beta 0.1
Congestion window over time for alpha 1.5 and beta 0.1
Current RTO multiplicative factor over time for alpha 1.5 and beta 0.1
Slow-start threshold for alpha 1.5 and beta 0.1
Table 30: Throughput over time, Congestion window, Current RTO multiplicative factor and Slow-start threshold for alpha 1.5 and beta 0.1
c. Analysis of a wide range of alpha and beta values and randomly generated patterns
In order to characterize the response of the available TCP throughput for different ON-OFF
traffic patterns of PU activity, we have carried out a set of tests with 28 alpha values combined
with 28 beta values. These tests are repeated ten times to give robustness and to measure the
effect of randomness in the PU activity pattern, resulting in a total of 7,840 simulation runs.
The fact that the patterns are generated with different random seeds is fundamental for the results.
As the ON-OFF pattern generated is totally different every time a simulation run is executed, the
comparison has to be made by averaging several tests with the same alpha and beta; otherwise the
results are also highly random and the comparison is unfair. Therefore, the MAC busy time will be
different for the same alpha/beta. The duration of the simulation is also important. The longer it
is, the more robust the results are, but due to the high computational demand, we could not run
very long simulations.
alpha | 0.0001, 0.001, 0.01 and from 0.1 to 2.5 with 0.1 increment
beta | 0.0001, 0.001, 0.01 and from 0.1 to 2.5 with 0.1 increment
Repetitions | 10
Simulation duration | 100 seconds
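The size of this parameter sweep can be checked with a few lines of Python. The sketch below only enumerates the combinations listed above; the variable and function names are hypothetical and no simulation is run.

```python
def sweep_values():
    """Values used for both alpha and beta in the sweep."""
    small = [0.0001, 0.001, 0.01]
    # 0.1 to 2.5 in steps of 0.1, rounded to avoid floating-point drift.
    linear = [round(0.1 * k, 1) for k in range(1, 26)]
    return small + linear


REPETITIONS = 10

values = sweep_values()
runs = [(alpha, beta, repetition)
        for alpha in values
        for beta in values
        for repetition in range(REPETITIONS)]

print(len(values))  # number of distinct alpha (and beta) values
print(len(runs))    # total number of simulation runs
```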
Figure 76: TCP throughput heat-map
Figure 76 shows the results of the set of tests carried out for TCP. The axes represent alpha, beta
and the average throughput, and the colour represents the standard deviation across the tests run
for the same combination of values.
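The two quantities plotted in the heat-map, the average throughput and its standard deviation per alpha/beta combination, can be obtained by grouping the repetitions. The Python sketch below shows one way to do this; the input tuples are made-up sample values rather than thesis results, the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise(results):
    """Group (alpha, beta, throughput) tuples and return mean and std per pair."""
    grouped = defaultdict(list)
    for alpha, beta, throughput_mbps in results:
        grouped[(alpha, beta)].append(throughput_mbps)
    return {key: (mean(vals), stdev(vals)) for key, vals in grouped.items()}


# Hypothetical throughput samples (Mbps) for two alpha/beta combinations.
sample_results = [
    (0.1, 0.1, 10.0), (0.1, 0.1, 12.0),
    (0.5, 0.1, 20.0), (0.5, 0.1, 20.0),
]

summary = summarise(sample_results)
for (alpha, beta), (avg, std) in sorted(summary.items()):
    print(f"alpha={alpha} beta={beta} mean={avg:.2f} std={std:.2f}")
```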
The graph shows that a higher throughput is achieved when beta is very close to zero and
therefore the PU ON state is very short. The maximum throughput is around 20 Mbps. The
throughput decreases as beta grows; however, if alpha is also large, the decrease is much smaller
than in the case where alpha is small. When beta is larger than alpha, the throughput is almost
zero because the probability of having PUs active is very high.
Regarding the standard deviation, when beta is higher than alpha, the throughput is very low and
the standard deviation is small. When the throughput is very low or very high, the standard
deviation is very close to zero because the results of the tests are very similar. The standard
deviation is very low when the throughput is maximum because there is almost no PU activity. In
the areas with very low available bandwidth, the standard deviation is low because the PUs are
either active all the time or have long ON and very short OFF periods, so there is no time to send
packets. The standard deviation is large when alpha and beta are both large because, as the
simulations are run for a limited period of time, there is a chance of getting a random combination
that gives some long OFF periods (several seconds) within the simulation time. In that case, the
TCP throughput will be higher because there will be enough time to raise the TCP congestion window
to the maximum possible. If the random combination gives no long OFF periods during the simulation
time, the throughput will be much lower.
We use several ON/OFF patterns in order to compare the response of the available UDP throughput
with that of TCP. These patterns are the same as for TCP, giving a total of 7,840 tests, 10 for
each combination. The set-up parameters are also the same as in the TCP analysis (earlier in this
section).
Figure 77: UDP throughput heat-map
As can be seen from Figure 77, the results for UDP are similar to the TCP results. The throughput
decreases as beta becomes larger; however, if alpha is also large, this decrease is much smaller
than in the case where alpha is small. When beta is higher than alpha, the throughput is almost
zero because the probability of having PUs active is very high. If alpha and beta are both high,
the throughput decrease is almost proportional to the ratio of alpha to beta.
Average of test results
This appendix presents one example of the averaging applied to the tests presented in section 4.
In this case, Table 31 shows the results of equation 2 obtained with the non-random
alpha = 2.2 and beta = 1.1, using the same initial population.
The table presents the following parameters:
MAE (Mean Absolute Error)
MSE (Mean Squared Error)
MAPE (Mean Absolute Percentage Error)
Fitness equation result
MAE_bp (MAE before prediction)
MSE_bp (MSE before prediction)
MAPE_bp (MAPE before prediction)
Loop (number of loops run during the GA)
The results shown in section 4 are the averages of the values in Table 31, which are presented in
Table 32 in this same appendix.
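For reference, the three error measures listed above can be computed as in the following Python sketch. It uses the standard textbook definitions, which are assumed here to match the ones used by the GA tool; the function names and the sample series are hypothetical. Note that MAPE is undefined when an actual value is zero, and becomes very large when actual values are close to zero.

```python
def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical throughput series (Mbps): measured values and forecasts.
actual = [2.0, 4.0]
predicted = [1.0, 5.0]

print(mae(actual, predicted))
print(mse(actual, predicted))
print(mape(actual, predicted))
```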
MAE MSE MAPE fit_eq MAE_bp MSE_bp MAPE_bp loop 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.38725 0.20236 401.76 0.29372 0.42624 0.2525 436.47 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 1.09 1.3648 1669.2 0.42044 1.1071 1.3785 1571.9 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 1.2841 1.7253 1752 0.35623 1.3072 1.8072 1650 100 0.12517 0.071437 2.4472 0.95208 0.11462 0.050327 2.1426 100 0.11258 0.048887 9.6118 0.3308 0.17542 0.11209 9.7925 100
164
1.1552 1.3602 1403 0.43342 1.1207 1.3073 1320.8 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.075527 0.028881 1.2754 0.95251 0.10371 0.049862 1.5681 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.1311 0.074822 3.9571 0.92631 0.1343 0.07955 3.8619 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 1.1552 1.3602 1403 0.43342 1.1207 1.3073 1320.8 100 1.0358 1.2866 1643.5 0.43792 1.0489 1.2835 1547.6 100 1.1552 1.3602 1403 0.43342 1.1207 1.3073 1320.8 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 1.1552 1.3602 1403 0.43342 1.1207 1.3073 1320.8 100 1.3915 2.2739 1240 0.30281 1.4061 2.3024 1168.1 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.075273 0.02611 5.959 0.95097 0.10921 0.051556 6.0353 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.23178 0.18826 7.3442 0.84474 0.23346 0.18379 7.0921 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100 0.82423 2.1023 27.443 0.27641 0.96686 2.6178 27.912 100
Table 31: Results of equation 2 with same initial population with non-random alpha 2.2 and beta 1.1
Average:
MAE | MSE | MAPE | fit_eq | MAE_bp | MSE_bp | MAPE_bp | loop
0.7881786 | 1.72625314 | 266.191194 | 0.3635312 | 0.8897832 | 2.0880695 | 253.29365 | 100
Table 32: Average results of equation 2 over 50 runs of the GA
General overview of GA results
This appendix presents the average results of the fitness evaluation in all the scenarios
analysed in chapter 4. In addition, the average results for the selection method and diversity with
the same initial population are shown.
Furthermore, the best and worst results (MAPE) of each individual set of tests of section 4 are
presented.
Table 33 shows the average results of the fitness evaluation in all the scenarios analysed. Table
33 (a) and (b) show the results for the cosine function with the same initial population and with
a random initial population, respectively. The corresponding results for the non-random traffic
with ON-OFF pattern are shown in Table 33 (c) and (d), and those for the random traffic with
ON-OFF pattern in Table 33 (e) and (f). Finally, Table 33 (g) and (h) show the results with the
same initial population and with a random initial population for real traffic played back in NS2.
[Table 33 consists of eight bar charts comparing the fitness equations Eq 1 to Eq 4 (vertical axis: percentage; series: MAPE and MAPE_bp unless noted):
(a) Same initial population, cosine function
(b) Random initial population, cosine function
(c) Same initial population, non-random traffic with ON-OFF pattern
(d) Random initial population, non-random traffic with ON-OFF pattern
(e) Same initial population, random traffic with ON-OFF pattern (series: MAPE, Error and MAPE_bp)
(f) Random initial population, random traffic with ON-OFF pattern
(g) Same initial population, real traffic
(h) Random initial population, real traffic]
Table 33: General fitness functions results
Table 34 shows the average results of the diversity and selection method in all the scenarios
analysed. The results with the same initial population for the cosine function are shown in Table
34 (a). In addition, the results with the same initial population for the non-random traffic with
ON-OFF pattern are shown in Table 34 (b). Then, the results with the same initial population for
the random traffic with ON-OFF pattern are shown in Table 34 (c). Finally, the results with the
same initial population for real traffic played back in NS2 are shown in Table 34 (d).
[Table 34 consists of four charts titled “Diversity and selection method effect” (vertical axis: percentage; series: MAPE and MAPE_bp; one chart also plots the number of generations):
(a) Same initial population, cosine function
(b) Same initial population, non-random traffic
(c) Same initial population, random traffic
(d) Same initial population, real traffic]
Table 34: General results for the diversity and selection method
Table 35 shows the best and the worst results in each individual set of tests for each scenario.
This helps to identify which combination of selection method and diversity for equation 1 yields
the best result.
Eq1 | Cosine function Best | Cosine function Worst | Non-random Best | Non-random Worst | Random Best | Random Worst | Real traffic Best | Real traffic Worst
real fit d=0 | 3.53% | 373.72% | 7.640% | 27.44% | 3.87% | 357.87% | 25.13% | 38.02%
ranking d=0 | 3.53% | 983.35% | 1.054% | 44.85% | 3.87% | 357.87% | 24.04% | 42.80%
exponential d=0 | 3.35% | 193.21% | 0.767% | 145.39% | 5.67% | 345.2% | 16.80% | 38.02%
real fit d=1 | 2.20% | 435.50% | 0.767% | 169.07% | 3.81% | 7.215% | 13.51% | 55.72%
ranking d=1 | 3.53% | 380.11% | 0.767% | 205.57% | 0.611% | 19.78% | 14.04% | 40.94%
exponential d=1 | 3.00% | 212.64% | 0.767% | 169.07% | 13.13% | 32.07% | 13.51% | 55.72%
Table 35: MAPE best and worst results