D5.2 – Engineering Release 1
(Secure NFV Subsystem, High Availability Framework,
Degradation Detection and Correction, Autonomic Rules
Generator and I/F, Network Resilience Framework)
Document Number D5.2
Status Completed
Work Package WP5
Deliverable Type Report
Date of Delivery 18.01.2017
Responsible Unit Fraunhofer
Editors Marius Corici (Fraunhofer)
Contributors
(alphabetic order)
Haytham Assem (IBM)
Daniele Bonadiman (UNITN)
Teodora Sandra Buda (IBM)
Eleonora Cau (Fraunhofer)
D5.2 – Network Security and Resilience – Engineering Release 1
CogNet Version 1.0 Page 2 of 78
Marius Corici (Fraunhofer)
Fabrizio Granelli (UNITN)
Imen Grida Ben Yahya (Orange)
Iryna Haponchyk (UNITN)
Diego Lopez (TID)
Daniel-Ilie Gheorghe Pop (Fraunhofer)
Alessandro Moschitti (UNITN)
Antonio Pastor (TID)
Benjamin Reichel (TUB)
Ranjan Shrestha (TUB)
Mikhail Smirnov (Fraunhofer)
Kateryna Tymoshenko (UNITN)
Joe Tynan (WIT)
Lei Xu (IBM)
Reviewers
(alphabetic order)
Alberto Mozzo (UPM)
Bruno Ordozgoiti (UPM)
Martin Tolan (WIT)
Dissemination level PU
Change History
Version Date Status Editor (Unit) Description
0.1 01.11.2016 Draft Marius Corici
(Fraunhofer)
Provided initial template and deliverable structure
0.2 14.11.2016 Draft Marius Corici
(Fraunhofer)
Added the input from Telefonica, WIT and IBM
0.3 19.12.2016 Draft Marius Corici
(Fraunhofer)
Added the input from TUB and FOKUS
0.4 26.12.2016 Draft Marius Corici
(Fraunhofer)
Added the first description of the mitigation actions
taxonomy
0.5 27.12.2016 Draft Marius Corici
(Fraunhofer)
Modified the security testbed and added the input from
UNITN
0.6 04.01.2017 Draft Marius Corici
(Fraunhofer)
Added introduction and conclusions
0.7 09.01.2017 Draft Marius Corici
(Fraunhofer)
Responded to the initial internal review
0.8 10.01.2017 Draft Marius Corici
(Fraunhofer)
Double-checked the relationship between the
repository links and the deliverable
0.9 13.01.2017 Draft Marius Corici
(Fraunhofer)
Added acronyms list
0.10 14.01.2017 Draft Marius Corici
(Fraunhofer)
Merged the responses to the second internal review
0.11 16.01.2017 Draft Marius Corici
(Fraunhofer)
Fixes in page alignment, font types, figure references.
0.12 17.01.2017 Draft Marius Corici
(Fraunhofer)
Double-checked links, editorial repairs
1.0 18.01.2017 Final Marius Corici
(Fraunhofer)
Cleaned up comments and changes. Verified links and
references. Formatted for an error-free PDF
Executive Summary
This deliverable represents the first report on the implementation within
WP5 of CogNet. It reports two types of activities, which are still to be
fully integrated:
- the development of testbeds which enable the acquisition of enough
data to train and later test the machine learning algorithms, as well
as to prove the feasibility of the different mechanisms;
- the development of machine learning mechanisms which use the acquired
data to give insight into specific management-related features.
As this is an implementation companion deliverable, it includes a short
description of the various components developed and integrated into the
testbeds, as well as the means to provide data to the machine learning
component, offering best-practice engineering guidance on how cognitive
components can be integrated into the NFV environment (in which network
components are deployed purely as software on top of a common
infrastructure, as described in D5.1).
Although the testbeds vary considerably in format, due to the need to
acquire different types of data and to the variation of the possible
mitigation actions, they follow the architecture described in WP2 and
applied in D5.1 for resilience- and security-related features.
Additionally, the machine learning algorithms considered are either
well-known algorithms, demonstrating that machine learning makes sense
for network management, or directly derived from the WP3 algorithms and
applied in the specific context, demonstrating that advances in machine
learning techniques provide additional benefit to network management.
The deliverable includes three testbeds used for the security-related
area, one testbed for resilience, and the initial machine learning
techniques developed. Additionally, the deliverable includes a set of
taxonomy-related considerations on the mitigation actions, which are
necessary for the later implementation within the testbeds.
Table of Contents
1. Introduction....................................................................................................................... 9
1.1. Motivation, Objective and Scope ............................................................................................................. 9
2. Distributed Security Enablement Testbeds .................................................................. 10
2.1. Distributed Security Enablement Testbed .......................................................................................... 10
2.1.1. Scope ....................................................................................................................................................... 10
2.1.2. Architecture .......................................................................................................................................... 10
2.1.3. Actual items implemented in CogNet ........................................................................................ 21
2.1.4. ML solution implementation .......................................................................................................... 21
2.1.5. Expected experimentation results ................................................................................................ 21
2.1.6. Roadmap of the testbed .................................................................................................................. 22
2.1.7. User Manual ......................................................................................................................................... 22
2.2. Honey net Testbed ...................................................................................................................................... 25
2.2.1. Scope of the testbed ......................................................................................................................... 25
2.2.2. Architecture of the testbed ............................................................................................................ 25
2.2.3. Actual items implemented in CogNet ........................................................................................ 27
2.2.4. ML solution implementation .......................................................................................................... 27
2.2.5. Expected experimentation results ................................................................................................ 27
2.2.6. Roadmap of the testbed .................................................................................................................. 28
2.3. NFV Security Anomaly Detection Testbed ......................................................................................... 28
2.3.1. Scope of the testbed ......................................................................................................................... 28
2.3.2. Architecture of the testbed ............................................................................................................ 29
2.3.3. Actual items implemented in CogNet ........................................................................................ 30
2.3.4. Expected experimentation results ................................................................................................ 30
2.3.5. Roadmap of the testbed .................................................................................................................. 30
2.3.6. User Manual ......................................................................................................................................... 31
2.4. Network traffic classification.................................................................................................................... 33
2.4.1. Architecture .......................................................................................................................................... 33
2.4.2. Download and Installation .............................................................................................................. 34
2.4.3. Deployment .......................................................................................................................................... 35
3. Dense urban area testbed .............................................................................................. 36
3.1. Testbed Description .................................................................................................................................... 36
3.1.1. Scope of the testbed ......................................................................................................................... 36
3.1.2. Architecture of the testbed ............................................................................................................ 36
3.1.3. Actual items implemented in CogNet ........................................................................................ 47
3.2. OpenSourceMANO OpenVIM and OpenBaton Integration ........................................................ 48
3.3. Anomaly detection ...................................................................................................................................... 51
3.3.1. Download and Installation .............................................................................................................. 52
3.3.2. Deployment .......................................................................................................................................... 53
3.3.3. Initial Results ........................................................................................................................................ 56
3.3.4. Development status .......................................................................................................................... 56
4. Taxonomy of Mitigation Actions .................................................................................. 57
4.1. SDN/NFV specific mitigation actions ................................................................................................... 57
4.2. Roles of the Cognitive System ................................................................................................................ 59
4.3. Development of System Experience ..................................................................................................... 61
5. Visualization GUI ............................................................................................................. 63
5.1. GUI Installation .............................................................................................................................................. 64
5.2. GUI Interactions ............................................................................................................................................ 66
6. Conclusions and Further Work ...................................................................................... 71
Glossary, Acronyms and Definitions .................................................................................... 72
References ............................................................................................................................... 75
Appendix A. Distributed Security Enablement Testbed ................................................. 76
A.1. API call to Create OpenFlow Firewall rule. ......................................................................................... 76
A.2. Sequence Diagrams .................................................................................................................................... 77
List of Figures:
Figure 1 – Distributed Security Enablement Testbed ........................................................................................ 10
Figure 2 - DES modules hosted on Docker framework .................................................................................... 14
Figure 3 - DSE information flow ................................................................................................................................ 16
Figure 4 - DSE Gateway................................................................................................................................................. 18
Figure 5 - DSE (L)CSE Prediction................................................................................................................................ 19
Figure 6 - DSE Firewall Engine .................................................................................................................................... 20
Figure 7 - CogNet common infrastructure deploy dash board .................................................................... 22
Figure 8 - CogNet DSE build and deploy dash board ...................................................................................... 23
Figure 9- Mouseworld scenario to replicate security attack traffic patterns ............................................ 26
Figure 10 – SDN integration in OpenStack ........................................................................................................... 29
Figure 11 Model architecture for network traffic classification ..................................................................... 33
Figure 12 – Dense Urban Area Testbed .................................................................................................................. 37
Figure 13 – Zabbix Active Check (Trapping) ......................................................................................................... 41
Figure 14 - Zabbix Passive Check (Polling) ............................................................................................................ 42
Figure 15 - Basic Zabbix Server and its relations with other entities .......................................................... 42
Figure 16 – OpenVIM – OpenBaton Integration Architecture ....................................................................... 50
Figure 17 Anomaly Detection Ensemble (ADE) approach for early anomaly detection. ..................... 51
Figure 18 Increasing AUC with number of rounds. ............................................................................................ 54
Figure 19 Decreasing AUC with number of rounds. ......................................................................................... 54
Figure 20 Predictions for different variations of ADE strategies utilizing xgboost. ............................... 55
Figure 21 – Policy Decision Model ........................................................................................................................... 60
Figure 22 – Policy Decision Model with Cognitive System ............................................................................. 60
Figure 23 – Experience Control Loop ...................................................................................................................... 61
Figure 24 – Visualization GUI architecture............................................................................................................. 64
Figure 25 - NFV Slice visualization............................................................................................................................ 67
Figure 26 – Software Network Overview ................................................................................................................ 68
Figure 27 – Slice Overview ........................................................................................................................................... 69
Figure 28 – Time Series Visualization ...................................................................................................................... 69
Figure 29 – Prediction and Anomaly Detection Visualization ........................................................................ 70
1. Introduction
1.1. Motivation, Objective and Scope
To prove the feasibility of machine learning techniques for the resilience and security of
the network, it is necessary to acquire the appropriate type of data and to target the
results of the algorithms towards specific mitigation actions.
This deliverable presents a set of testbeds which enable the acquisition of that data, as
well as initial versions of some of the machine learning algorithms, adopted from the
literature and from WP3, targeting such optimizations of the network management system.
Testbed/Component name | Scenario(s) | Main Language | Description
Distributed Security Enablement Testbed | Security Enablement | Java | Full SDN system for threat detection at data plane level
Honey Net Testbed | Security Enablement | Java | External/public acquired security attacks
Security Anomaly Detection | Security Enablement | C | ML-firewall at NFV infrastructure level
Network Traffic Classification | Security Enablement | Python | Network Traffic Classification Model
Dense Urban Area Testbed | Dense Urban Area Testbed | C and Java | Full NFV system
Anomaly Detection | Dense Urban Area Testbed | R | LSTM-based anomaly detection algorithm
Visualization GUI | All | JavaScript | Providing customized visualization for ML-based management in NFV
Table 1 List of testbeds and components described in this deliverable
Additionally, a set of considerations on the possible mitigation actions for security and
resilience was added to this deliverable as an introduction to the implementation that will
follow in the next engineering release.
2. Distributed Security Enablement
Testbeds
2.1. Distributed Security Enablement Testbed
2.1.1. Scope
The following sections describe the Distributed Security Enablement (DSE) testbed used for
the creation and ongoing maintenance of security-based Service Function Chains (SFCs) that
typically reside at a Service Provider's (SP) edge network. They also describe how Machine
Learning (ML) is to be used to assist the detection of threats in a tenant's data plane.
This includes the architectural and functional concepts, principles and components used in
the construction of composite security-zone services through a deployment of SFCs, which
can then be considered a proposed solution to track and respond to flood-based security
threats in a 5G multi-tenant network.
2.1.2. Architecture
2.1.2.1 Functional Architecture
The foundational architectural DSE concepts of a security zone and machine learning architecture
are characterised as follows:
Figure 1 – Distributed Security Enablement Testbed
Security Zone – A security zone is a collection of tenant-based
network segments that share security requirements. The security zone
contains Network Functions (NFs) that include services such as a layer
3 probe service that records the IP headers in a tenant's flow, and an
access control (802.1X) service that authorizes a tenant's end device
to access the network through the Network Access Server (NAS),
usually based on RADIUS. It also contains an IP services component
that makes it possible to redirect a flow, via a static route, to
applications outside of the service chain, for example to a DMZ or
quarantine service to further analyze the suspect flow's packet content.
The NAS service also adds value by contributing to the machine
learning dataset in the form of RADIUS records. All security-based
network functions are connected via an OpenFlow-enabled switched
fabric.
Security Service Function Chain - The security zone service chain
deployment and initial orchestration are constructed by NFV
Management & Orchestration (MANO) namely NFV Orchestrator
(NFVO) and VNF Manager (VNFM) [1]. The service chain, illustrated in
Figure 1, contains multiple steps that implement different security
services on a tenant's traffic. For instance:
Network Access Server. End device authentication and accounting
logs.
Layer 2 Firewall. OpenFlow’s match and action rules (OpenVSwitch).
Layer 3 Firewall. Access Control List, logging (ACL from pfSense).
IP services. IP forwarding, flow quarantine, etc.
Probe. sFlow and NetFlow IP header samples.
Service Function Paths. A security Service Function Path (SFP) is a
mechanism by which a tenant's data plane flow can be orchestrated
to switch traffic to different parts of the service chain, thereby
applying distinct security policies to each flow path. For example, the
default path through the security zone might initially have no access
control list applied to its flow path, but the flow must still
traverse the probing service for the gathering of monitoring
statistics. Another instance is where a tenant has a default set
of destination IP addresses blacklisted, resulting in the tenant's flow
including a layer 2 firewall service in its service function path.
Probes and Log storage. At this stage in the service chain probes
monitor the data plane via sFlow or NetFlow methods, producing
sample statistics that will be presented to machine learning methods
operating on both streamed and batch platforms. The processing of
these statistics will be discussed later in the machine learning section.
Validation. In order to produce a more accurate threat prediction, the
architecture includes an external validation component, whereby the
locally calculated threat prediction score can also be weighted to
include queries from external blacklist providers.
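As a sketch of this weighting step, the combination could look like the following; the weights, the class name and the provider-count inputs are illustrative assumptions, not CogNet code:

```java
// Hypothetical sketch of the Validation step: combining the locally calculated
// threat prediction score with external blacklist lookups. The weights and the
// provider-count inputs are illustrative assumptions.
public class ThreatScoreValidator {
    private final double localWeight;
    private final double externalWeight;

    public ThreatScoreValidator(double localWeight, double externalWeight) {
        this.localWeight = localWeight;
        this.externalWeight = externalWeight;
    }

    /** Weighted combination of the local ML score with the fraction of
     *  blacklist providers that list the flow's source IP. */
    public double validate(double localScore, int providersQueried, int providersListingIp) {
        double externalScore = providersQueried == 0
                ? 0.0
                : (double) providersListingIp / providersQueried;
        return (localWeight * localScore + externalWeight * externalScore)
                / (localWeight + externalWeight);
    }

    public static void main(String[] args) {
        ThreatScoreValidator v = new ThreatScoreValidator(0.7, 0.3);
        // local model score 0.5; 2 of 4 providers blacklist the source IP
        System.out.println(v.validate(0.5, 4, 2)); // prints 0.5
    }
}
```

With no providers reachable, the score falls back to the weighted local prediction alone, so the validation component degrades gracefully.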
Machine Learning. Here a range of algorithms will predict security-based
threats. The primary focus is to determine when a tenant's traffic
can be classed as a flood-based attack, namely DDoS attacks that
include SYN (both vertical and horizontal attacks), SPAM, ICMP and
DNS attacks. The architecture permits algorithms to be executed
in both batch and near-real-time analytics, thus allowing predictions to
be rendered from historical stored datasets as well as live streamed
datasets.
Security Zone Orchestration. The security zone’s service chain will
accept instructions from the security orchestrator. The instruction
includes the update of path flows, management of both layer 2 and
layer 3 access control lists and the authorization of an end device to
the network. The actuation part for the DSE is summarized in the later
section on Data Plane Actuation.
Object Storage. During the life cycle of a machine learning method
the current state can be serialized and stored as a snapshot for
subsequent functionality. These snapshots can be used to align the
machine learning method to a known state.
Continuous Integration. Not depicted in Figure 1, although of
significant importance, is the functional approach to Continuous
Integration (CI) and Continuous Deployment (CD) during the
development of the software release and integration cycles of the
DSE's modules. The CI component provides a mechanism to describe,
store, locate and orchestrate the testing of the DSE modules, whilst
also providing a method to deploy automatically into a production
environment during an automated build cycle.
Data plane Actuation. The DSE's service chain has a number of flow
actuations, depending on the anomaly detected and the firewall policy in
place.
Network Access Server. Disable/enable users network access.
Layer 2 Firewall. OpenFlow’s match and action firewall rule.
Layer 3 Firewall. Access Control List (ACL) pfSense.
IP services. IP forwarding, direct flow to remote quarantine.
2.1.2.2 DSE service state machine considerations
There are two channels available to the DSE module to record state in a snapshot. Firstly, is to
take a snapshot of the container the ML method is hosted on and secondly, is to take a snapshot
of the machine learning object at different stages. For example, DSE snapshots are milestones at
the untrained state, a trained state, events in production, etc.
DSE Docker container snapshot
docker commit -p 5842907ba04a DES_engine$buildNo
docker save -o /root/DES_engine$buildNo.tar DES_engine$buildNo  # export the image so it can later be re-loaded
docker load -i /root/DES_engine$buildNo.tar
Machine learning information storage
Incorporated in the DSE Prediction method is the ability to export to storage a serialized machine
learning object. It uses a combination of build number and time stamp to identify and store the
object. The serialized stored object could be imported at a later stage for continuous usage,
negating the costly requirement to retrain the Machine learning method.
public class storage {
    private String mongoDB_URL = "mongodb.cognet.5g-ppp.eu";
    private int mongoDB_port = 27017;
    private String mongoDB_DB_name = "CSE_DSE";
    private String build_no = "0";
    private DB db;
    private MongoClient mongo;

    public void write() {
        System.out.println("storage ML object write!");
        // the collection is named after the ML object's build number
        DBCollection table = db.getCollection(dse_ML_object.get_build_no());
        table.save(dse_ML_object);
        System.out.println("storage ML object complete!");
    }
}
2.1.2.3 DSE running instances (Docker containers).
Currently the three DSE executables reside in three Docker containers:
1. DSE Gateway
2. DSE LCSE Prediction
3. DSE Firewall Engine
These three software modules make up the machine learning and the security & NFVI
orchestration components of the Distributed Security Enablement Testbed (Figure 1).
Figure 2 - DES modules hosted on Docker framework
The Docker containers are built automatically by interpreting instructions contained in a
Dockerfile template. Shown below is the DSE Dockerfile that contains all the commands used
to create the DSE base image. It brings the container to a package level that can be used
in all of the DSE Docker-based containers. Linux binary packages in the base image include
Oracle Java 1.8, Maven 3.3 and Git.
vi ~/DCSE/Dockerfile
FROM ubuntu:14.04
MAINTAINER Joe Tynan WIT <[email protected]>
RUN apt-get update
RUN apt-get install software-properties-common -y
RUN add-apt-repository ppa:webupd8team/java -y
RUN apt-get update
RUN echo debconf shared/accepted-oracle-license-v1-1 select true | debconf-set-selections
RUN apt-get install oracle-java8-installer -y
RUN apt-get install oracle-java8-set-default -y
RUN apt-get install maven -y
RUN apt-get install git -y
The API call to deploy the DSE base image to Docker server: ($docker is a CI environment
variable)
curl -v -X POST -H "Content-Type:application/tar" --data-binary '@Dockerfile.tar.gz'
http://$docker:4243/build?t=wp5dse
To execute a DSE service in a Docker container ($docker is a CI environment variable), the
following is issued each time DSE-relevant code is checked into the CogNet source code
control repository.
curl -X POST -H "Content-Type: application/json"
http://$docker:4243/containers/create?name=dseML_container -d '
{
"Name": "wp5dseMLcontainer1",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Cmd": ["/bin/bash", "-c", "echo Starting DSE; git clone
https://CogNet5GPPP:[email protected]/CogNet-5GPPP/WP5-DSE-.git; cd WP5-DSE-
/code$; mvn clean install; java -cp target/dse-1.0-SNAPSHOT.jar:lib/* eu.cognet.lcse.ml.dse.App
; echo Stopping"],
"Image": "wp5dse:latest",
"DisableNetwork": false
}
'
2.1.2.4 Information Flows and Functional Description.
As part of an autonomic process we have adapted a Monitor, Analyse, Plan and Execute (MAPE)
loop to highlight how information flows through the Distributed Security Enablement task. The
red line in Figure 3 highlights the path that the DSE dataset information will take to traverse the
CogNet Common Infrastructure. The DSE framework sequence diagram is located in Appendix
A.
Figure 3 - DSE information flow
Monitor (sFlow + NetFlow probes and the DSE probe Gateway). Active
monitoring on the tenant data plane is delivered by OpenVSwitch
probes and the DSE Gateway. The DSE Gateway disassembles probed
packets and places them on the appropriate Denial of Service (DoS)
tenant Kafka queue for the next phase in the process. The probing
service is an element of the service chain and is labelled as the L3
probing service in Figure 1. The DSE's preferred monitoring protocol is
sFlow, as it has a minimal impact on the performance of the probing
switch, but it has the drawback of sampling the data plane's traffic
rather than providing a one-to-one flow sample. The following command
set enables probe services on OpenVSwitch:
-bash-3.00$ sudo ovs-vsctl add-br DSEbr  # create bridge
-bash-3.00$ sudo ovs-vsctl add-port DSEbr enp1s0f0  # add interface
-bash-3.00$ sudo ovs-vsctl add-port DSEbr enp1s0f1  # add interface
-bash-3.00$ sudo ovs-vsctl set-controller DSEbr tcp:162.13.119.228:6633  # add OF controller interface
-bash-3.00$ sudo ovs-vsctl -- --id=@sflow1 create sFlow agent=enp2s0
target=\"192.168.1.100:6343\" header=128 sampling=64 polling=10 -- set Bridge
DSEbr sflow=@sflow1  # enable monitoring interface
Analyze (DSE LCSE Prediction). The DSE Prediction Engine has three
approaches to creating an anomaly score forecast: the first is to make
a prediction on the rate of change over a time series; the second is to
use decision trees that classify samples extracted from the ring
buffer, then apply a Random Forest method to create a threat
prediction; and the third is to use a machine learning method from the
service catalogue described in D2.2. Samples are collected via a Kafka
queue from the relevant topic, then placed into a ring buffer. The ring
buffer provides a mechanism to store samples with a short lifespan in
a time series attribute. A full description of the prediction component
is given in the following sections. The DSE LCSE Prediction Engine is
deployed as a Docker container and is represented in Figure 1 as the ML
and Log Storage component.
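The ring buffer described above can be sketched as follows; the capacity, the [timestamp, packetCount] sample layout and the rate-of-change helper are illustrative assumptions, not the actual DSE code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of the Analyze-phase ring buffer: a bounded buffer that
// evicts the oldest probe sample once full, giving samples the short lifespan
// described above. Sample layout: { timestampMillis, packetCount }.
public class SampleRingBuffer {
    private final Deque<long[]> buffer = new ArrayDeque<>();
    private final int capacity;

    public SampleRingBuffer(int capacity) {
        this.capacity = capacity;
    }

    public void add(long timestampMillis, long packetCount) {
        if (buffer.size() == capacity) {
            buffer.removeFirst(); // evict the oldest sample
        }
        buffer.addLast(new long[] { timestampMillis, packetCount });
    }

    /** Rate of change between the oldest and newest retained samples
     *  (packets/ms), the first of the three prediction approaches above. */
    public double rateOfChange() {
        if (buffer.size() < 2) {
            return 0.0;
        }
        long[] oldest = buffer.peekFirst();
        long[] newest = buffer.peekLast();
        long dt = newest[0] - oldest[0];
        return dt == 0 ? 0.0 : (double) (newest[1] - oldest[1]) / dt;
    }

    public int size() {
        return buffer.size();
    }

    public static void main(String[] args) {
        SampleRingBuffer rb = new SampleRingBuffer(3);
        rb.add(0, 100);
        rb.add(10, 200);
        rb.add(20, 400);
        rb.add(30, 900); // evicts the sample at t=0
        System.out.println(rb.rateOfChange()); // (900-200)/(30-10) = 35.0
    }
}
```

Bounding the buffer keeps the memory footprint of each tenant topic constant while still exposing a short time-series window to the prediction methods.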
Plan (DSE Firewall Engine). The DSE Firewall Engine will implement the
corresponding actuation on predicted malicious data plane flows:
drop, log, forward, quarantine or ignore.
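A minimal sketch of how such a plan step might map a prediction to one of these actuations; the thresholds and class names are hypothetical, not the DSE firewall policy:

```java
// Hypothetical sketch of the Plan step: mapping a predicted anomaly score to
// one of the actuations listed above. The thresholds are illustrative
// assumptions, not the actual DSE firewall policy.
public class FirewallPlanner {
    public enum Action { IGNORE, LOG, FORWARD, QUARANTINE, DROP }

    /** Choose an actuation for a flow given its anomaly score in [0, 1]. */
    public static Action plan(double anomalyScore, boolean externallyBlacklisted) {
        if (externallyBlacklisted) {
            return Action.DROP; // policy overrides the prediction
        }
        if (anomalyScore >= 0.9) return Action.DROP;
        if (anomalyScore >= 0.7) return Action.QUARANTINE;
        if (anomalyScore >= 0.5) return Action.FORWARD; // e.g. redirect towards a DMZ
        if (anomalyScore >= 0.3) return Action.LOG;
        return Action.IGNORE;
    }

    public static void main(String[] args) {
        System.out.println(plan(0.75, false)); // prints QUARANTINE
    }
}
```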
Execute (OpenFlow firewall + iptables (pfSense) + RADIUS
Authentication + IP forwarding).
Actuation in the service chain appears in four zones.
1. The first is where user/device access is disabled in the RADIUS
configuration database, which in turn actuates the user session
instance at the NAS server (Figure 1) in the service chain.
sed -i -e 's/device1 Cleartext-Password := "password"/#device1 Cleartext-Password :=
"password"/g' /etc/freeradius/users.conf
2. An OpenFlow ACL rule is installed via an OpenDaylight API call. The
actuation is executed by an L2 firewall service (Figure 1) as a
match-action (drop) rule. The API call is demonstrated in Annex B:
OpenFlow Firewall rule section.
3. A flow can also be directed to an open source firewall
implementation (pfSense). This allows the service chain to also
firewall IPv6 traffic in the data plane. Depicted in Figure 1 as an L3
firewall service.
4. The IP forwarding service (Figure 1) allows a flow to terminate and
enter the IP routing realm. The flow can be routed to a DMZ or
quarantine zone for further inspection via a static route.
sudo route add -host 192.168.0.2 gw 192.168.1.1 # route suspect host to quarantine
2.1.2.5 Software modules currently under development
1. DSE Gateway – The DSE Gateway software module comprises a DSE
property file, probe listening port services and a Kafka queue producer. The
DSE Gateway sequence diagram is located in Appendix A: Distributed
Security Enablement Testbed.
Figure 4 - DSE Gateway
DSE property file:
The DSE property file defines which data fields need to be extracted from the probe's
protocol payload, the order of the tuples and the corresponding Kafka topic to which these
samples are posted. The current DSEproperty.prop contains the following:
#feature#kafkatopic#sampleproto#sip#dip#sport#dport#tcpproto#pktlen
DoS_dns,DSE_FLOOD_DNS_Q,sflow,sip,dip,dport,tcpproto
DoS_hor,DSE_FLOOD_SYNC_HOR_Q,sflow,sip,dip,dport
DoS_ver,DSE_FLOOD_SYNC_VER_Q,sflow,sip,dip,dport,tcpproto
DoS_spam,DSE_SPAM_SYNC_VER_Q,sflow,sip,dip,dport,tcpproto
DoS_icmp,DSE_ICMP,sflow,sip,dip,dport,tcpproto
DoS_amp,DSE_AMP,sflow,dip,dport,tcpproto,pktlen
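Such comma-separated lines can be parsed with a few lines of code; the sketch below (a hypothetical DsePropertyParser class, not part of the DSE release) splits each non-comment line into its feature name, Kafka topic, sample protocol and tuple fields.

```java
import java.util.*;

// Hypothetical parser for DSE property lines of the form:
//   feature,kafka_topic,sample_proto,tuple_field_1,...,tuple_field_n
public class DsePropertyParser {

    /** Parsed representation of a single property line. */
    public static class Entry {
        public final String feature;
        public final String topic;
        public final String proto;
        public final List<String> tuples;
        Entry(String feature, String topic, String proto, List<String> tuples) {
            this.feature = feature;
            this.topic = topic;
            this.proto = proto;
            this.tuples = tuples;
        }
    }

    /** Parse one line; returns null for comment or blank lines. */
    public static Entry parseLine(String line) {
        if (line.isEmpty() || line.startsWith("#")) return null;
        String[] parts = line.split(",");
        return new Entry(parts[0], parts[1], parts[2],
                Arrays.asList(Arrays.copyOfRange(parts, 3, parts.length)));
    }
}
```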
Sample Probe port:
The DSE Gateway module includes a sample probe component that listens on a port for
tenant data plane samples. For instance, probe messages encapsulated in sFlow
would arrive on port 6343, while NetFlow probe packets would arrive on port 2055.
DoS Dataset per topic producer:
As defined in the DSEproperty.prop property file, the Kafka producer sends samples to
queues based on topic and tuple parameters. These samples are forwarded for analysis
to the second stage of the information flow, namely the DSE LCSE Prediction component.
2. DSE LCSE Prediction
The individual software components that make up the DSE LCSE Prediction module
are the Timer, Condition Samples, Empty buffer, DSE Statistic Consumer, Score
Prediction and Score Producer (Figure 5). The module consumes data statistics from the DSE
Gateway and in turn produces a security score prediction. The Prediction Engine
sequence diagram is located in Appendix C: DSE Prediction Sequence Diagram.
Figure 5 - DSE LCSE Prediction
Timer: The ring buffer's timer thread allows the incoming statistics samples to have a time
component. The timer thread arranges which slot in the ring buffer is active for
writing/reading and selects which slot is to be emptied.
Condition Samples: This allows the system to rule out low frequency sample counts from
the threat score prediction calculations.
Empty buffer: This method clears all the statistics in the last slot of the ring buffer,
producing a clean slot for recording in the next recording interval.
DSE Statistic Consumer: The consumer software component listens for incoming
statistics from a Kafka queue. It accepts traffic on Kafka topics that are defined at
initiation time.
DSE Score Prediction: The Score Prediction software component uses the sample
statistics recorded in the ring buffer slot to make a threat prediction. It currently has two
modes implemented: (1) a simple rate-change prediction and (2) a prediction based on a
machine learning algorithm, namely Random Forest.
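The rate-change mode can be sketched as follows, assuming the threat score is the bounded relative growth of a per-key sample count between two consecutive ring-buffer slots (the class name and the scaling are illustrative, not the DSE implementation itself):

```java
// Hypothetical rate-of-change scorer: compares a per-key sample count in
// the current slot against the previous slot and maps the relative growth
// to a bounded threat score in [0, 1].
public class RateChangeScorer {

    /** Relative growth of the count between two consecutive intervals. */
    public static double rate(int previous, int current) {
        if (previous == 0) return current > 0 ? 1.0 : 0.0;
        return (double) (current - previous) / previous;
    }

    /** Clamp the growth rate into a [0, 1] threat score. */
    public static double score(int previous, int current) {
        return Math.max(0.0, Math.min(1.0, rate(previous, current)));
    }
}
```

A steady count thus scores 0, while a count that appears from nothing in one interval scores the maximum.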
DSE Score Producer: The Producer software component publishes threat scores on the
corresponding Kafka Score queue topic.
Thread Safe: Writing and reading to the ring buffer is implemented with thread safety in
mind. Both read and write methods are synchronized.
public synchronized Integer read_circularBuff(int slot_number, String key) {
    return statList.get(slot_number).get(key);
}

public synchronized void write_circularBuff(String key) {
    // increment the counter for this key in the active slot (slot 0)
    if (statList.get(0).containsKey(key)) {
        Integer read_value = (Integer) statList.get(0).get(key);
        read_value++;
        statList.get(0).put(key, read_value);
    } else {
        statList.get(0).put(key, 1);
    }
}
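The interplay of the Timer, the Empty buffer and the synchronized accessors can be sketched with a simplified, hypothetical SlotRing class (not the DSE implementation): the timer advances the active slot index and clears the slot being reused, so every interval starts from a clean slot.

```java
import java.util.*;

// Simplified, hypothetical sketch of the ring-buffer slot rotation
// described above: write/read act on the active slot, and the timer's
// advance() moves to the next slot and empties it for the new interval.
public class SlotRing {
    private final List<Map<String, Integer>> slots = new ArrayList<>();
    private int active = 0;

    public SlotRing(int n) {
        for (int i = 0; i < n; i++) slots.add(new HashMap<>());
    }

    /** Count one sample key in the currently active slot. */
    public synchronized void write(String key) {
        slots.get(active).merge(key, 1, Integer::sum);
    }

    /** Read the counter for a key in the currently active slot. */
    public synchronized int read(String key) {
        return slots.get(active).getOrDefault(key, 0);
    }

    /** Called by the timer: move to the next slot and empty it. */
    public synchronized void advance() {
        active = (active + 1) % slots.size();
        slots.get(active).clear();
    }
}
```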
3. DSE Firewall Module
Individual software components include Black list provider, DSE Firewall property, DSE
consumer, RADIUS authentication, OpenDaylight API and pfSense API. The Firewall
module sequence diagram is portrayed in Appendix A.
Figure 6 - DSE Firewall Engine
Black list provider: The Blacklist provider component provides external third-party threat
validation to the system and can also be used to add weight to the overall predicted
threat score. The service provider also hosts internal blacklists as part of the multitenant
security policy.
DSE Firewall property: Provides information on the location of services specific to the
DSE firewall component.
RADIUS authentication: The RADIUS authentication component controls end-device and
user access to the tenant's network (written in Ansible).
OpenDaylight API: The OpenDaylight API component here issues OpenFlow firewall
instructions to the DSE service chain.
pfSense API: The pfSense API component here issues L3 firewall instructions to the DSE
service chain.
2.1.3. Actual items implemented in CogNet
To date, the implemented DSE software components include a Jenkins build environment
with DSE jobs defined, and two data plane probes based on OpenVSwitch. Also implemented are
three software modules:
- the DSE Gateway component, which parses the probe protocol;
- the DSE Prediction component, which contains the ring buffer and the first
iteration of the Random Forest machine learning method; and
- the DSE Firewall Engine component, which contains methods that listen for
threat scores and issue firewall commands to the DSE service chain.
The aforementioned software components are all executable and hosted in Docker containers
on the CogNet common infrastructure. Also implemented is an instance of the OpNFV (B release)
framework with public IP address space.
2.1.4. ML solution implementation
The DSE prediction module is currently based on a Random Forest machine learning method.
The machine learning model is currently being parameterized and integrated so that the
behaviour is tuned to match DSE requirements.
2.1.5. Expected experimentation results
Because of the class and predictability of IoT traffic patterns and the nature of attacks under
investigation, it is envisaged that the accuracy of predicting a DoS based attack can be between
80 and 90 percent. The outcome will be to deliver an efficient use of a Service Providers
bandwidth and network resources. The traffic pattern under investigation is sourced from IoT
networks, this makes the type of traffic less varied and more predictable, for example the traffic
pattern would include a possible DNS lookup request and a phone home post at a regular time
interval.
2.1.6. Roadmap of the testbed
In the second stage of DSE development we plan to finalize the ML methods and then deploy
the DSE onto an NFV-based service chain hosted on OpNFV-based infrastructure, where we
will commission, as part of the testbed, a Kali threat server that includes an MHN
honeypot monitoring service. We also plan to instantiate the 802.1X protocol via RADIUS and
NAS services. Finally, we plan to integrate and evaluate the machine learning methods with the
CogNet common infrastructure.
2.1.7. User Manual
The current user manual comprises 1) the automated install of the testbed from the CI
server, 2) the automated install and usage of the DSE CI jobs and 3) how the DSE modules
can be launched from the command prompt, with a list of their corresponding arguments.
At the centre of the DSE is the continuous build server [2], which defines the compiling, testing
and roll-out of the DSE software components. Also defined [3] is the DSE Docker job that
provides the resources to build and deploy the associated DSE Docker base image. All
subsequent DSE build jobs depend on a successful outcome of the Docker build instance
job [3].
To install the testbed infrastructure on Jenkins:
Figure 7 - CogNet common infrastructure deploy dash board
To install the DSE software components onto the infrastructure via Jenkins CI
Figure 8 - CogNet DSE build and deploy dash board
The DSE Gateway has the following external variables, which include Kafka server attributes,
the sample protocol type and the property file location. To execute the DSE Gateway from the
command line:
$ java -jar target\featureExtDSE-1.0-SNAPSHOT.jar -h
-sp : server listening port
-sh : server listening interface
-st : sample type (sflow | netflow | ipfix)
-kp : Kafka port
-kh : Kafka host
-p : property file URL
$ java -jar target\featureExtDSE-1.0-SNAPSHOT.jar -sp 6343 -sh 0.0.0.0 -st sflow -kp 9092 -kh
162.13.119.237 -p DSEproperty.prop
The DSE LCSE Prediction module has the following external variables, which include Kafka
server attributes, the build number, the training and online consumer topics, and the MongoDB
location. To execute the DSE LCSE Prediction module from the command line:
$ java -jar target\ngramDSE-1.0-SNAPSHOT.jar -h
-b : build number
-kh : Kafka broker server IP
-kp : Kafka broker server port
-nc : number of consumers
-tt : training topic
-to : online topic
-m : MongoDB IP
-mp : MongoDB port
To create the Kafka training topic:
bin/kafka-topics.sh --create --topic trainingDSE --replication-factor 1 --partitions 3 --zookeeper 127.0.0.1:2181
To create the Kafka online topic:
bin/kafka-topics.sh --create --topic onlineDSE --replication-factor 1 --partitions 3 --zookeeper 127.0.0.1:2181
The Firewall Engine has the following external variables, which include Kafka server attributes
and the locations of elements in the service chain (OpenDaylight IP, L3 firewall IP, RADIUS IP).
To execute the Firewall Engine from the command line:
$ java -jar target\ofFirewallDSE-1.0-SNAPSHOT.jar -h
-o : OpenDaylight IP
-l : L3 firewall IP
-n : RADIUS IP
-t : topic
-g : group ID
-p : Kafka port
-k : Kafka server IP
To create the Kafka score topic:
bin/kafka-topics.sh --create --topic ScoreDSE --replication-factor 1 --partitions 3 --zookeeper 127.0.0.1:2181
$ java -jar target\ofFirewallDSE-1.0-SNAPSHOT.jar -o 162.13.119.228 -l 162.13.119.222 -t DSE_firewall -p
4643 -k 162.13.119.237
2.2. Honeynet Testbed
2.2.1. Scope of the testbed
NFV and SDN technologies, as an essential part of 5G networks, have the ambition to offer
security capabilities. One of the key goals of the new architectures is to eliminate unwanted
(or illegal) traffic from the data plane, or reduce it to acceptable levels, by means of the SDN
dynamic forwarding graph and the dynamic deployment or scale-out/in of security VNFs, such
as firewall VNF deployments or traffic redirection. Applying ML technologies to identify and
solve this problem is detailed in D5.1 as part of the Distributed Security Enablement use case.
In order to apply these solutions, a clear traffic pattern must be identified. In some cases these
patterns are clear, e.g. a DDoS attack based on volumetric traffic, but in others they are extremely
difficult (if not impossible) to identify, mainly because of the accepted tendency on the Internet
towards pervasive or opportunistic encryption. This situation limits the capability to interact with
the network to resolve security incidents or attacks. This problem has been identified in the
research literature and by standardisation bodies, and it is thoroughly analysed in a recent
document of the IETF [4].
The scope of this testbed is to identify and classify some security attack patterns in the data
plane, especially those related to encrypted traffic. The expectation is to be able to classify
different types of attack traffic after training ML algorithms that inspect data
packets from Layer 2 to Layer 4, avoiding payload analysis (encrypted or not) and thus improving
privacy.
In order to achieve this objective, this testbed is set up on Telefónica's CogNet Mouseworld,
described in CogNet deliverable D4.1. This lab environment has the capability of replicating
different types of realistic network traffic in a fully controlled environment, where ML algorithms
can be trained and tested.
2.2.2. Architecture of the testbed
The architecture of the testbed is based on the general architecture of the Mouseworld,
particularized for this scenario. Figure 9 shows how the traffic replication is based on hacking
tools, such as the Kali Linux distribution [5].
Figure 9- Mouseworld scenario to replicate security attack traffic patterns
The testbed is composed of several clients generating attack traffic and some servers running
vulnerable services.
The initial flows identified at this stage include the generation of different types of traffic from
clients to servers:
Brute-force attacks and session establishment to SSH servers. Brute-force tools like Hydra
[6] allow us to replicate multiple access attempts to a server. Successful accesses,
malware downloads and command executions can also be replicated. The generated traffic
uses different cipher suites to evaluate the ML algorithms' independence from the payload.
The servers are based on the Kippo honeypot. This honeypot not only accepts
SSH session establishment but also provides a shell emulation environment, generating an audit
log of all the commands issued, such as URL file downloads.
Web application vulnerability attacks. These include well-known attacks such as SQL injection
or remote file inclusion; the transport protocol in this case is HTTP or HTTPS. The server is
based on the Glastopf honeypot. Glastopf is a Python web application honeypot offering
different types of web server vulnerabilities. It has the capability to deal with known
and unknown attacks of several types, dynamically generating answers adapted to the
type of vulnerability; these contain so-called "dorks", strings in the answer that the
vulnerabilities trigger.
One key aspect in testing and validating the algorithms and identifying their performance is to
include the following conditions in the training process:
Normal traffic, unrelated to security attacks. In general terms, the Mouseworld can
replicate multiple types of traffic, like web browsing, video streaming or network
troubleshooting. Some of them are replicated in a closed environment, while others are
generated by real access to the Internet, such as Internet-wide DNS queries, access-speed
tests and realistic browser requests, to produce fully realistic traffic. All this traffic is
mixed with the attacks in the traffic captured by the Tstat probe.
Real attacks. During the last phase of the testbed, the ML algorithms will be tested against
real Internet traffic, obtained "in the wild" from a real Telefonica Honeynet, to validate
the detection capacity.
All the traffic captured is stored locally in the testbed servers. No private or personal
data (e.g. identities) are collected in these traffic dumps. The traffic is replicated, not real: for
example, all the IP address space used belongs to a private, non-routable network (RFC 1918),
and the traffic is encrypted with temporary keys.
2.2.3. Actual items implemented in CogNet
The current status of the testbed includes the Mouseworld lab with the capacity to capture
and process the traffic data.
In addition, a real Honeynet deployed in the Telefonica network is currently available for data
testing; it uses the same software that is being used in the Mouseworld for traffic
replication. This type of network has a key advantage from the point of view of privacy:
Honeynets collect illegal traffic and unrequested accesses, which limits to a great extent the
applicable requirements on privacy preservation and personal data management.
2.2.4. ML solution implementation
The ML solution is based on the application of supervised classification algorithms. First, we will
apply off-the-shelf techniques in order to obtain accuracy figures when state-of-the-art
algorithms are applied. After that, some architectures of deep (convolutional) neural networks will
be designed, trained and tested, in order to show that these complex models are able to capture
the essence of the data better than off-the-shelf techniques.
2.2.5. Expected experimentation results
The expected results include a success factor on identification of types of attack patterns. Above
80% of accuracy in different types of attacks will be considered as a successful KPI for an ISP. This
accuracy allows in real environments to reduce the traffic inspection capacity required to identify
suspicious traffic in nowadays networks, especially for traffics consuming a high amount of
bandwidth. This reduction in the required processing capacity will allow ISPs to deal with the
forecast 5G traffic patterns without reducing security standards for them, or even enhancing
these standards for specifically sensitive application environments or slices, to use a term that has
become common in 5G literature.
No initial results have been achieved during this phase because the activity has focused on
setting up the replication scenario and the initial training stage.
2.2.6. Roadmap of the testbed
The planning status and roadmap of the activities in this testbed are:
1/2016-12/2016. Setup of testbed inside Mouseworld for new type of
traffic: Threats detection
01/2017-03/2017. First ML algorithm for Network Threats detection
03/2017-04/2017. Integration of the testbed into the common
infrastructure of CogNet.
03/2017-06/2017. ER2. Testing and Final algorithms version, including
the description of the testing and the user manual
06/2017-12/2017. Policies enforcement into Open MANO
orchestration for Network threats
2.3. NFV Security Anomaly Detection Testbed
2.3.1. Scope of the testbed
OpenStack provides a rich environment of cloud-based services such as scalable processing,
storage and networking. The security anomaly detection testbed uses this cloud platform to
provide the required network infrastructure and implement the necessary functions in order to
detect the anomalies and to enforce appropriate actions when required.
OpenStack networking services are partially involved during the life cycle of a service, for
example in the binding of a network function to a virtual network. However, the basic
configuration and management is static and not suited to much dynamicity beyond scaling
during the life cycle of the service. For this reason we introduce OpenSDNCore, which
implements a rich set of functions to enforce requirements from upper layers, such as:
traffic flow classification
traffic flow relocation
dynamic firewall rule enforcement
virtual network management
The deployment of OpenStack is done with Devstack, which provides an automated method of
deploying OpenStack and is suitable for development and operational testing. It is not a general
installer, but it is easy to adapt to integrate individual git repositories.
The Fraunhofer OpenSDNCore tool is used only as an example of an SDN platform; any other
SDN platform could also be integrated. The main goal of CogNet is to prove and develop
machine learning techniques that can be used in carrier-grade managed networks; the
developments therefore relate to the acquisition of data, its processing through ML
mechanisms and the actuation in the active system, not to the development of the active
system itself.
2.3.2. Architecture of the testbed
Figure 10 shows OpenStack and its components. Neutron is the component that provides
network abstraction to all other components of OpenStack. Additional agents implement the
interface to the virtual network resources. In our case OpenSDNCore is used to provide a PoC
network service framework. To allow an interaction between Neutron and OpenSDNCore an
additional ofs-agent is provided.
Figure 10 – SDN integration in OpenStack
2.3.3. Actual items implemented in CogNet
The implementation concentrates on the integration of OpenSDNCore and OpenStack in order to
provide a basis for the development and enforcement of firewall rules coming from the machine
learning firewall module.
1. Implementation of ofs-agent:
- the agent is used to provide basic connectivity for virtual machines
- plug/unplug ports to the openflow switch
2. Implementation of the deployment environment based on devstack
- Devstack integration of OpenSDNCore
- OpenStack in combination with OpenSDNCore can be deployed with a simple install
script
A basic Layer 3 firewall was developed using the OpenSDNCore configurations and its
northbound interface, thereby providing the mechanisms needed for data flow monitoring,
filtering and processing.
2.3.4. Expected experimentation results
With such an implementation, combined with the Visualization GUI and with the monitoring
system from the dense urban area testbed, we expect to obtain the following main results:
- Detection of known and unknown threats at the network level by
analyzing the data traffic within the OpenStack system at the
infrastructure level. This functionality is similar to that of the
previous testbeds and may use the same (or swapped) machine
learning algorithms.
- Mitigation actions at the network level in the form of quarantining
the different users, as presented in the previous testbeds.
- Providing means for the infrastructure provider to mitigate multiple
parallel services with different customized firewall-like functionality.
This is possible only when the infrastructure is controlled by the
infrastructure provider, separate from the software networks, which is
not the case in the previously described testbeds.
2.3.5. Roadmap of the testbed
In the next release, the testbed will be integrated with the anomaly detection presented in
Section 3.3 and used to determine different unknown threats. A set of mitigation actions will be
defined, such as re-routing towards a quarantine network in case of malicious usage of the
network.
2.3.6. User Manual
OpenStack installation with OpenSDNCore
git clone https://github.com/CogNet-5GPPP/devstack.git
cd ./devstack
Configuration for devstack installation:
[[local|localrc]]
HORIZON_BRANCH=stable/newton
KEYSTONE_BRANCH=stable/newton
NOVA_BRANCH=stable/newton
NEUTRON_BRANCH=master
GLANCE_BRANCH=stable/newton
HOST_IP=192.168.178.200
IP_VERSION=4
ADMIN_PASSWORD=cognet
DATABASE_PASSWORD=stackdb
RABBIT_PASSWORD=stackqueue
SERVICE_PASSWORD=$ADMIN_PASSWORD
ENABLED_SERVICES=rabbit,mysql,key
ENABLED_SERVICES+=,n-api,n-crt,n-obj,n-cpu,n-cond,n-sch,n-novnc,n-cauth
ENABLED_SERVICES+=,g-api,g-reg
ENABLED_SERVICES+=,horizon
enable_plugin nova https://github.com/CogNet-5GPPP/nova master
enable_plugin neutron https://github.com/CogNet-5GPPP/neutron stable/newton
DISABLED_SERVICES=n-net
ENABLED_SERVICES+=,q-svc,q-agt,q-dhcp,q-l3,q-meta,q-metering,neutron
#Q_USE_SECGROUP=True
FLOATING_RANGE="172.18.161.0/24"
FIXED_RANGE="10.0.0.0/24"
Q_FLOATING_ALLOCATION_POOL=start=172.18.161.250,end=172.18.161.254
PUBLIC_NETWORK_GATEWAY="172.18.161.1"
#PUBLIC_INTERFACE=enx5855ca260b13
# OpenSDNCore provider networking configuration
Q_PLUGIN=ml2
Q_ML2_TENANT_NETWORK_TYPE=vxlan
Q_ML2_PLUGIN_MECHANISM_DRIVERS=ofs
Q_AGENT=ofs
Q_USE_PROVIDERNET_FOR_PUBLIC=False
OFS_PHYSICAL_BRIDGE=ofsbr-main
PUBLIC_BRIDGE=ofsbr-main
OFS_BRIDGE_MAPPINGS=extnet1:ofsbr-main
OFS_ENABLE_TUNNELING=False
# avoid vnc problems
NOVNC_BRANCH=v0.6.0
Based on this configuration the install process of OpenStack can be triggered:
$ ./stack.sh
Installation of OpenSDNCore:
The installation of OpenSDNCore needs to be done before devstack installation.
Management of Switches in OpenSDNCore:
The following commands allow the management of switching instances of OpenSDNCore:
$ sudo ofts.sh --help
--- OFS ---
add-br BRIDGE [DPID]
del-br BRIDGE
add-port BRIDGE PORT
delete-port BRIDGE PORT
get-port-id BRIDGE PORT
listbr
list port
is_connected BRIDGE
is_present BRIDGE
OpenSDNCore OpenFlow Controller API:
Example flow for DHCP-Request forwarding:
$ curl -X POST -H "Content-Type: application/json" -d '{
"id": 1,
"jsonrpc": "2.0",
"method": "ofc.send.flow_mod",
"params": {
"dpid": "0x0000000000000001",
"ofp_flow_mod": {
"command": "add",
"flags": [
"reset_counts",
"send_flow_rem"
],
"idle_timeout": 0,
"ofp_instructions": {
"write_actions": [
{
"output": {
"port_no": "0xfffffffb"
}
}
]
},
"ofp_match": [
{
"match_class": "openflow_basic",
"field": "udp_dst",
"value": "67"
}
],
"priority": 40,
"table_id": "0x00"
}
}
}' http://127.0.0.1:10010
2.4. Network traffic classification
As emphasized in Deliverable 5.1, the main goal of distributed security enablement is to classify
the traffic traversing the data plane into malicious and non-malicious. Such a traffic classification
task has a binary nature. In this Section, we extend this task to a more general setting, as a
multitude of scenarios may arise, e.g., scenarios requiring a differentiation between types of
malicious traffic, which can apply to any of the security testbeds described in the previous three
subsections.
We proposed a model for network traffic classification within the NetCla: ECML-PKDD
Network Classification Challenge (http://www.neteye-blog.com/netcla-the-ecml-pkdd-network-
classification-challenge/). The objective of the challenge was to predict, for a transmission, the
type of application that generated it. There were 20 target application types, thus a
multi-class classification problem. For each data point, the measurements of a number of
performance indicators and network parameters were provided. The data points were given in
chronological order, sequentially, as the corresponding transactions were registered in the
network.
In the following, we describe the classification architecture and give guidelines on how to run
the corresponding software. The tuned version of the proposed model obtained 1st place in
the official NetCla ranking.
It should be noted that our study on application classification allows us to acquire knowledge on
how to characterize applications by the network activities they generate. This enables the design
of more powerful models of anomalous applications, i.e., those behaving differently from the
others.
2.4.1. Architecture
At a high level, the model can be represented as in Figure 11. The raw data pass through a
three-stage feature generation and preparation process.
Figure 11 - Model architecture for network traffic classification
1. Feature Discretization
The attributes given in the data have values from inhomogeneous number ranges. Thus, we
suggested applying a feature discretization method to attributes with continuous values,
namely Multi-Interval Supervised Attribute Discretization (Fayyad & Irani, 1993), which is
implemented in the weka1 toolkit.
2. Feature Generation using Random Forests
On the other hand, the raw attributes are too few to deliver sufficient discriminative power.
To tackle this issue, we propose to generate additional feature combinations using Random
Forests (RF), which are very efficient in detecting non-linear patterns in the data. We trained
an RF classifier, again using weka, and then extracted all the paths from all the trees of the
resulting RF and used them as features.
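As a toy sketch of this idea (hypothetical code, independent of the weka implementation actually used), each root-to-leaf path can be treated as a binary feature that fires only when an example satisfies every attribute test along the path:

```java
import java.util.*;

// Toy sketch of Step 2: each root-to-leaf path of a decision tree becomes
// one binary feature that fires when an example follows that path.
public class TreePathFeature {

    /** One node test on a path: attribute index, threshold, branch taken. */
    public static class Split {
        final int attr;
        final double threshold;
        final boolean goRight;
        public Split(int attr, double threshold, boolean goRight) {
            this.attr = attr;
            this.threshold = threshold;
            this.goRight = goRight;
        }
    }

    /** 1 if the example satisfies every split on the path, else 0. */
    public static int fire(double[] example, List<Split> path) {
        for (Split s : path) {
            boolean right = example[s.attr] > s.threshold;
            if (right != s.goRight) return 0;
        }
        return 1;
    }
}
```

Concatenating one such indicator per path across all trees yields the extended, non-linear feature set.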
3. Adding Label Dependencies as Features
With the third step, we aim at making use of the structural information underlying the data, i.e.,
the temporal information of the data points. For each data point (example) (xi, yi), we add the
labels of the preceding N examples, yi−N, yi−N+1, …, yi−1, as features. At the training stage, these
correspond to the gold labels, while at test time we use the predictions of the classifier itself. This
way, we encode the contextual information of each instance, namely what type of traffic
preceded the current transaction.
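As a minimal illustration of this step (a hypothetical helper, not the modified LIBLINEAR code itself), the labels of the preceding N examples can be appended to a feature vector like this:

```java
import java.util.*;

// Hypothetical sketch of Step 3: extend each example's feature list with
// the labels of the N preceding examples. At training time the gold labels
// are used; at test time the classifier's own predictions.
public class LabelDependencyFeatures {

    /** Append the last n labels (most recent last) to the base features. */
    public static List<String> extend(List<String> baseFeatures,
                                      Deque<String> previousLabels, int n) {
        List<String> extended = new ArrayList<>(baseFeatures);
        previousLabels.stream()
                .skip(Math.max(0, previousLabels.size() - n))
                .forEach(l -> extended.add("prev_label=" + l));
        return extended;
    }
}
```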
Once we select and extract the features, we perform training with a linear SVM classifier. Since
the data is large, especially with the extended feature set, we adopted a multi-class linear
classifier from the LIBLINEAR library (http://www.csie.ntu.edu.tw/~cjlin/liblinear/), which is
specifically designed for large-scale computation.
Our code release provides the implementations of the training and testing phases of our network
traffic classification model, which include Step 3 of our feature generation process. The features
produced by Steps 1 and 2 are already pre-computed and given in the data file.
2.4.2. Download and Installation
We implemented Step 3 by making the necessary modifications to the original LIBLINEAR
implementation. More specifically, we modified LIBLINEAR so that, when classifying a test
instance xi, it uses its own predictions for xi−N, …, xi−1 as additional features for this data point,
as described in Step 3 of Section 2.4.1.
1. Download
Use the following command to clone LIBLINEAR-LD from the repository.
git clone https://github.com/CogNet-5GPPP/WP5-CSE.git
2. Compile the code of the LIBLINEAR-LD
$ cd liblinear-ld/
1 http://www.cs.waikato.ac.nz/ml/weka/
$ make clean
$ make
Alternatively, follow the original LIBLINEAR README file (in liblinear-ld/).
2.4.3. Deployment
The current distribution contains a sample dataset, comprising training and validation parts.
The data should comply with the SVMlight2 format, with each line denoting a feature vector
starting with a label (class ID: 0, 1, 8, etc.) followed by a list of sparse features, e.g.,
8 391:1 937:1 1296:1 1797:1 5100:1 7826:1 ... 87551:1
To train a model, issue the following command3:
$ ./train –s 2 –c 1 –l 20 ./data/train.data ./data/ld-s1c1l20.model
Here, -s and -c are the standard LIBLINEAR parameters: the first chooses the type of
SVM solver, and the second sets the SVM's regularization parameter C. With the
parameter -l, one may vary the number N of preceding labels to consider for each data point
(0 by default). These are followed by the training set file and the output model path.
To test the obtained model, run:
$ ./predict -l 20 ./data/valid.data ./data/ld-s1c1l20.model ./data/output-ld-s1c1l20
On success, the predictions can be found in ./data/output-ld-s1c1l20, which has a simple
format: each line contains one class label.
2 http://svmlight.joachims.org/
3 Before running this and the following command, you need to unpack the train and
validation files, train.zip and valid.zip, respectively, located in the liblinear-ld/data folder.
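The predictions in the output file can be compared against the gold labels in the validation file (the first token of each line). A small Python sketch, using the file paths of the commands above as an assumed example:

```python
def accuracy(gold_path, pred_path):
    """Fraction of lines whose predicted label matches the gold label."""
    with open(gold_path) as g, open(pred_path) as p:
        gold = [line.split()[0] for line in g if line.strip()]
        pred = [line.strip() for line in p if line.strip()]
    matches = sum(1 for a, b in zip(gold, pred) if a == b)
    return matches / len(gold)

# e.g. accuracy("./data/valid.data", "./data/output-ld-s1c1l20")
```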
3. Dense urban area testbed
3.1. Testbed Description
3.1.1. Scope of the testbed
The role of the dense urban area testbed is to define a reference testbed infrastructure for the
reliability issues which may be solved by using machine learning algorithms, as described in
the following subsections.
As presented in D5.1, the dense urban area testbed is composed of a set of components which
emulate a comprehensive telecoms system, starting with the emulation of a large number of
devices which connect to a realistic packet core network deployed on top of an NFV
environment, as is expected to happen in the near future in a carrier-grade operator network. The
NFV environment is completed with an NFV orchestrator which enables different actuation
actions specific to the life-cycle management of software-only network functions. In addition to
this, as presented in D5.1, a set of actions will be executed directly in the active system through a
minimal OSS implementation, thereby emulating operations which do not pertain strictly to
NFV management.
The following directions are taken for the data acquisition, algorithm implementation and
assessment of the mitigation actions:
1. Anomaly detection – determining whether the system maintains the appropriate
behaviour and whether compensation mitigation actions can be executed to bring it back
to the normal behaviour.
2. Clustering of users and network resources – simplifying the network management while
at the same time enabling customization, by clustering the users and the network
resources, depending on their specific usage patterns, into a reduced set of clusters with
the same behaviour.
3. Fault detection and correlation – in case of failures, determining the root cause across the
multiple fault levels including hardware infrastructure, virtualization layer and service
layer.
4. Correlation of consumed resources – determining in a specific given system which
resources are needed by the components in correlation with the usage and the resources
consumed by other components. Through this a customized scaling process can be
determined for the specific deployment.
3.1.2. Architecture of the testbed
The architecture of the testbed is illustrated in Figure 12. In the following, the functional
components which were integrated are functionally defined, followed by the specific work
developed in the CogNet project.
Figure 12 – Dense Urban Area Testbed
3.1.2.1 The Active System
The active system is based on the Fraunhofer FOKUS Open5GCore toolkit. Although the
Open5GCore is not open source, the decision to use it was taken because it represents a
reference platform for software packet core implementations, with a first release in 2009
(under the name of OpenEPC) and continuous functional and performance development since,
thereby ensuring a minimal stability and similarity to carrier-grade operator networks.
Additionally, the development of the active system is not a goal of CogNet, thus any system
providing enough maturity and relevance could be used for the proof-of-concept, while
giving the opportunity to concentrate on the management-plane-related functionality, enhanced
with machine learning features.
In the following, the Open5GCore is briefly described, as its various implementation features
may substantially affect the monitored metrics as well as the results obtained through the
machine learning algorithms.
The Open5GCore runs on top of a platform developed in-house by Fraunhofer FOKUS. The platform was
implemented specifically for running telecommunication network components, with support
for a large number of protocols such as 3GPP S1-AP, Non-Access Stratum (NAS), GPRS Tunnelling
Protocol (GTP) or Diameter. The platform is written in C and follows the modular design
of the SIP Express Router: all the functionality (protocols, interfaces and components) is
implemented as independent modules. The Open5GCore supports an efficient mechanism for
inter-module communication. Additionally, modules can use each other to exchange functionality
in a plugin model.
Several characteristics of the platform significantly influence the monitored time series and
thus require some specific customization of the machine learning algorithms; they are briefly
mentioned in the following, although they do not pertain to the CogNet developments.
In addition, the Open5GCore platform base code includes multi-process and
parallelization management, a pool of workers for processing the tasks, a FIFO task queue, precise
timer and task scheduling, etc. Being a telecom system, almost all the session processing can be
parallelized, the system scaling in an almost linear manner with the number of requests. However,
when the capacity limit is reached, the system misbehaves in a drastic manner (i.e. system failure),
which from an anomaly detection perspective makes the detection of the anomalies more
important (e.g. when such a state is reached), as well as the timeliness of the detection (e.g. if
detected too late, the result is unusable as the system is not able to efficiently recover).
All the memory is pre-allocated so as not to depend on the system allocation, which produces a
large number of interrupts within the system. Additionally, a wrapper of the isolated and shared
memory is offered, enabling fast development of new modules. This minimizes the
effects of memory swapping in the operating system, one of the most time-consuming operations
and a side effect in the delay of the processing of the requests. Because of this, a set of
previously determined side-effects (i.e. processing anomalies) of the operating system on the
software networks was removed.
Open5GCore features its own multi-level logging system, enabling the fast spotting of different
errors depending on the logging level. As the logging relies on pushing the output
towards the standard output device, it should be completely de-activated during performance
measurements – the penalty of logging is around 400% in processing time due to the multiple
interruptions at operating system level. Due to the logging system, two types of measurements
were obtained: high-performance ones (without logging), where it is almost not possible to fill the
capacity of the system, the surrounding system failing first due to less performant programming,
and low-performance ones (with logging), where the capacity is filled by the logging interruptions,
which are not significant from the network capacity measurement perspective.
To run Open5GCore, a set of scripts has to be executed. The scripts include the
configuration of the running nodes as pre-requisites to the actual running of the components
(installation of libraries, compilation, installation of configurations, provisioning of databases,
installation and starting of the services). The configuration scripts also automate the network
configuration of the component, depending on its type. To simplify the installation
of a testbed, Open5GCore has a specific directory including the default configuration scripts for
all the components. To install a new component, the following steps have to be executed:
1. Installation of the operating system
2. Preparation of the system for the specific component needs; careful attention has to be
given to having the necessary network interfaces available in both physical and virtual setups
3. Installation and configuration of the Open5GCore component
4. Restart and usage of the system
These steps were fully automated using Fraunhofer OpenBaton (www.openbaton.org), an open
source (Apache 2 license) ETSI NFV MANO compliant component able to remotely install, deploy
and configure a virtual network ecosystem, such as the one proposed in the testbed, based on a
virtual network function descriptor configuration file. OpenBaton is able to orchestrate the
network deployment as well as to manage the different benchmarks during the runtime of the
testbed. Because of this, the system can be deployed on demand when needed, not
requiring a pre-reservation of resources, making it easy to configure for different workloads and
for accumulating the monitored data.
3.1.2.2 Open5GCore Benchmarking Tool Functionality
The benchmarking tool was designed to assess the performance of EPC core networks for
different numbers of subscribers and eNBs and with different configurations,
enabling the quantitative evaluation of different core network solutions. The benchmarking
tool generates the load of the system in the CogNet testbed, including different types of synthetic
load as well as replaying realistic loads. In the following, the benchmarking tool features are
briefly described, as they represent the major limitations of the testbed from the perspective of
the workloads which can be introduced, drastically affecting the monitored data and, through this,
the training and the evaluation of the machine learning algorithms.
The BT includes:
Northbound API – a functional component which is able to receive the
benchmarking configuration from the test administrator.
Benchmarking Tool Rules (BTR) Module – performs the testing
process: based on the test configuration, it registers the defined UEs in
the network and requests the specified test operations through the
benchmarking tool module;
Benchmarking Tool (BT) Module - handles the EPS related
functionalities like UE creation, registration, operations and acts as a
singular or a group of eNBs that interact with the network;
UE Pool – represents a runtime subscriber database in which the state
information is maintained for each subscriber. As the expected
number of subscribers is in the order of 1 million, the state per
subscriber should be limited to a maximum of 1kB.
eNBs – at this moment the eNBs run as separate processes. The
maximum number of eNBs per BT is 10, each representing a mobility
cluster. During the initial performance measurements, this limitation
was lifted, due to the unexpectedly good performance results of the
eNBs as well as of the handover network support.
For the data traffic emulation, two different network functions were developed and integrated
into the benchmarking tool: the traffic generator and the traffic analyser. They include the
following functionality.
1. Packet Injector: The packet injector generates data traffic according to different patterns. The
currently supported patterns are:
a. Ping: A short message transmitted with very low frequency, enabling testing whether
connectivity through the testbed is established.
b. Max. UDP: Filling up the network link with UDP messages, maximizing the data
traffic over the specific network link.
c. File transfers: Emulating bulk data traffic on top of UDP or TCP connections. The packet
injector generates the data packets internally, following the packet size and frequency of
the specific pattern. The data packets include the following additional fields:
i. A mask at the beginning of the data packet: to avoid parsing and
mismatched protocol dissection in network sniffing tools such as Wireshark.
ii. A session ID: to identify which data flow the data packet pertains to.
iii. A sequence number: to identify order-oriented properties of the data packets.
iv. A timestamp, for the later correlation of the measurements. Similar to Iperf, the data
packets do not include a real payload, as real data does not affect in any way the
processing within the packet core.
2. GTP Encap/Decap: The Open5GCore GTP encapsulation/decapsulation module was added as
a part of the data path, to be able to emulate the data traffic as it is transmitted to and from
the eNB.
3. IP Connectivity: A data traffic steering module over the IP network was added, enabling the
destination of the data packets to be changed as needed for the specific experiment.
4. Packet Statistics: The packet statistics module receives the data packets of the packet injector
and generates specific statistics based on them. The packet injector and the packet statistics
modules may be co-located on the same network function, thus sharing the clock, or
may be synchronized via NTP. The clock information correlated with the timestamp in the data
packets gives the packet statistics module the possibility to compute the following statistics:
a. Capacity: The number of data packets sent and received at the other end, representing the
network capacity of the SUT.
b. Delay: The comparison of the two timestamps gives the opportunity to measure the delay
while communicating through the SUT. Selective comparison is enough.
c. Packet Loss: The number of data packets sent and not received at the destination, based
on the missing sequence numbers.
d. Jitter: The variation in the arrival times of data packets within the same session.
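The four statistics above can be sketched as follows; this is an illustrative Python model of the computation, with field names and units chosen by us rather than taken from the module's actual interface:

```python
def session_stats(sent, received):
    """sent / received: lists of (sequence_number, timestamp_ms) per session."""
    recv_by_seq = dict(received)
    delays = []
    for seq, tx in sent:
        rx = recv_by_seq.get(seq)
        if rx is not None:
            delays.append(rx - tx)          # one-way delay per delivered packet
    lost = len(sent) - len(delays)
    # jitter: mean absolute variation of delay between consecutive packets
    jitter = (sum(abs(a - b) for a, b in zip(delays, delays[1:]))
              / max(len(delays) - 1, 1))
    return {
        "capacity_pkts": len(delays),                       # a. packets delivered
        "delay_avg_ms": sum(delays) / max(len(delays), 1),  # b. mean delay
        "loss_per_10000": 10000 * lost / max(len(sent), 1), # c. packet loss
        "jitter_ms": jitter,                                # d. jitter
    }
```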
3.1.2.3 Monitoring
Zabbix is used for monitoring the system. Zabbix monitors numerous network parameters, such as
incoming/outgoing traffic through the various interfaces, as well as the health of the
system, including IT infrastructure metrics like CPU, memory and disk usage. It has a flexible
notification mechanism and offers excellent reporting and data visualization features based on
historical and current data. Zabbix is distributed under the GPLv2 license.
Features of Zabbix include: data gathering, flexible threshold definitions and highly configurable
alert mechanisms, real-time graphing and extensive visualization options, historical data storage,
network discovery and the availability of Zabbix API.
Zabbix consists of several major software components. The basic Zabbix architecture is shown in
Figure 15.
Server: The central repository in which all the configuration, statistical and operational data
are stored; it is the entity that actively alerts administrators when problems arise in any
monitored system. It mainly consists of the Zabbix backend server, the web frontend and the
database storage. The Zabbix server runs as a daemon process and can be started by executing
the zabbix_server script.
Database Storage: All the configuration information as well as the data gathered by Zabbix are
stored in a database (MySQL, PostgreSQL, Oracle, SQLite, etc.) with which the backend server and
the web frontend interact.
Web Interface: The frontend can be accessed from anywhere and is easy to use, with many
configurable options. This interface is part of the Zabbix server and runs on the same machine as
the server. It is written in PHP.
Proxy: On behalf of the Zabbix server, the proxy can collect performance and availability data. It
is an optional entity, but can be handy when distribution of the load is required. All the
collected data is buffered locally and then transferred to the Zabbix server the proxy belongs to.
Figure 13 – Zabbix Active Check (Trapping)
Zabbix Agent: The Zabbix agents are deployed on monitored targets or devices to actively
monitor local resources (hard drives, memory usage, processor performance and statistics) and
applications, and then report the gathered data to the Zabbix server for further processing. The
agent can perform passive and active checks on the system. In a passive check, the agent
responds to a query request sent by the Zabbix server. In an active check, the agent first retrieves
a list of items from the Zabbix server for independent processing and then periodically sends new
values to the server. The check mechanism is configurable. The two mechanisms are shown in
Figure 13 and Figure 14.
Figure 15 - Basic Zabbix Server and its relations with other entities
Data Flow within Zabbix: The data flow is quite easy to understand. A host has to be created in
order to create an item, and the item gathers data. After the item is created, a trigger can be
created, and an action can be defined once the trigger exists. Once all these elements are
created, the overall flow is easy to see. This can be easily done with the help of templates.
Zabbix Configuration
1. Hosts and Host Groups: The Zabbix hosts are the devices to be monitored (workstations,
servers, switches etc.). To begin with the monitoring, the first thing to be done is
creating a host. Hosts are organized into host groups.
Figure 14 - Zabbix Passive Check (Polling)
2. Items: Items are the metrics to be monitored. Once the host is configured, monitoring items
start gathering actual data. Many items can be quickly added by applying a pre-defined
template to the host. For each item, one should specify what type of data is expected when
gathered from the host; for that, the item key is used. For example, the item key system.cpu.load
gathers information about processor load, while the key net.if.in gathers incoming traffic data.
3. Trigger: A trigger defines an acceptable threshold or range. Its logical expression
evaluates the data gathered by the item and determines the current state. The trigger is fired
when the value goes beyond the acceptable range, changing the status to PROBLEM. It has
two states, OK and PROBLEM.
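For illustration, a classic Zabbix trigger expression of this kind could look as follows (the host name and threshold are hypothetical):

```
{mme-host:system.cpu.load[percpu,avg5].last()}>2
```

When the last reported 5-minute CPU load average for mme-host exceeds 2, the trigger switches to PROBLEM; when the value drops back into range, it returns to OK.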
4. Events: There are several types of events generated in Zabbix. The events are time-stamped,
making it easy to identify when they occurred. The various sources of events are trigger events,
discovery events, active agent auto-discovery events and internal events.
5. Visualization: Zabbix provides an excellent way to visualize data in terms of graphs, allowing
one to grasp the data flow, correlate problems and discover unusual patterns. It provides
built-in graphs of single-item data, complex customized graphs, and quick access to comparisons
of several items. It is also possible to visualize the networks using network graphs.
6. Templates: A template is a set of entities that can be conveniently applied to multiple hosts. The
entities may be items, triggers, graphs, applications, screens, low-level discovery rules or web
scenarios. A template contains all the entities described for a host. It is an excellent way of
reducing the workload and reduces possible errors during configuration.
Zabbix Agent Installation:
To monitor machines, the Zabbix agents have to be installed on them. These agents collect
metrics data and push them to the database. The first step is to install the Zabbix agent of the
appropriate version and configure the Server/ServerActive IP to point to the running Zabbix
server.
For Ubuntu 14.04:
$ MONITORING_IP="IP_TO_YOUR_ZABBIX_SERVER"
$ wget http://repo.zabbix.com/zabbix/3.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_3.0-1+trusty_all.deb
$ sudo dpkg -i zabbix-release_3.0-1+trusty_all.deb
$ sudo apt-get install -y zabbix-agent
$ sudo sed -i -e "s/ServerActive=127.0.0.1/ServerActive=$MONITORING_IP:10051/g" -e "s/Server=127.0.0.1/Server=$MONITORING_IP/g" -e "s/Hostname=Zabbix server/#Hostname=/g" /etc/zabbix/zabbix_agentd.conf
$ sudo service zabbix-agent restart
$ rm zabbix-release_3.0-1+trusty_all.deb
Zabbix APIs
The Zabbix server offers REST APIs to retrieve monitoring information about the hosts, items and
their values. The Open5G-GUI makes use of these REST APIs to periodically retrieve various
monitoring information from the Zabbix server and to store it, in its intended format, in its local
MongoDB database. The backend server of Open5G-GUI uses the request module to send POST
requests to the Zabbix server.
The first step is to retrieve the token, which is required for the further retrieval of other
resources. For that, the following POST request can be made:
$ curl -i -X POST -H 'Content-Type:application/json' -d' {"jsonrpc":"2.0","method":"user.login",
"params":{"user":"Admin", "password":"zabbix"},"id":1}'
http://zabbix_server_ip/zabbix/api_jsonrpc.php
where zabbix_server_ip is the accessible IP address of the Zabbix server.
Output: TOKEN ID
To retrieve the list of hosts, the POST query is:
$ curl -i -X POST -H 'Content-Type:application/json' -d'
{"jsonrpc":"2.0","method":"host.get","params": {"output":"extend"},"auth": "TOKEN","id":1}'
http://zabbix_server_ip/zabbix/api_jsonrpc.php
where TOKEN is the token string obtained from the previous query.
Output: hosts and host IDs.
To retrieve the list of items of a host, one of the previous host IDs is needed. The POST query is:
$ curl -i -X POST -H 'Content-Type:application/json' -d' {"jsonrpc":
"2.0","method":"item.get","params":{"hostids": "10105", "output": "extend"},"auth":"TOKEN", "id":
1}' http://zabbix_server_ip/zabbix/api_jsonrpc.php
where 10105 is the host ID of one of the hosts.
Output: List of items of that host.
To retrieve the current value of an item, the following POST query is used:
$ curl -X POST -H 'Content-Type:application/json' -d ' {"jsonrpc":"2.0","method":"item.get","params":
{"hostids":"10084","output":"extend", "search":{"key_":"system.time"}},"auth":"TOKEN","id":1}'
http://zabbix_server_ip/zabbix/api_jsonrpc.php
where system.time is the metric (item) name and 10084 is the ID of the host.
Output: Returns the JSON object that contains the latest value of the metric system.time of the
host with host ID 10084.
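The same JSON-RPC bodies can also be built programmatically instead of hand-quoting JSON in the shell; the following Python sketch shows the payload structure (the helper name is ours, not part of the Zabbix API):

```python
import json

def zabbix_payload(method, params, auth=None, req_id=1):
    """Build the JSON-RPC 2.0 body expected by api_jsonrpc.php."""
    body = {"jsonrpc": "2.0", "method": method, "params": params, "id": req_id}
    if auth is not None:
        body["auth"] = auth  # token returned by user.login; absent for login
    return json.dumps(body)

login_body = zabbix_payload("user.login",
                            {"user": "Admin", "password": "zabbix"})
hosts_body = zabbix_payload("host.get", {"output": "extend"}, auth="TOKEN")
```

The resulting string is what gets POSTed to http://zabbix_server_ip/zabbix/api_jsonrpc.php with the Content-Type: application/json header.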
The above curl requests are direct and can be tested easily in the terminal. The usage of the request
module in the Visualization GUI is slightly different and is carried out by the backend server of the
Visualization GUI. To get the token, the following options object is created:
var options = {
    headers: {
        "Content-Type": "application/json"
    },
    json: {
        "jsonrpc": zabbixParams.jsonrpcVer,
        "method": "user.login",
        "params": {
            "user": zabbixParams.username,
            "password": zabbixParams.password
        },
        "id": 1
    }
}
where the zabbixParams object information is retrieved from the GUI configuration file. After
adding a few other parameters to the above object, such as options.url, options.method and
options.authorization, the POST request is sent using the request module, which delivers
the result via a callback:
request(options, function(error, response, body){
// Result
});
The above request returns the Token.
To retrieve the host list, the options object is modified as:
var options = {
    headers: {
        "Content-Type": "application/json"
    },
    json: {
        "jsonrpc": zabbixParams.jsonrpcVer,
        "method": "host.get",
        "params": {"output": "extend"},
        "id": 1
    }
}
Note that the method is set to host.get. Using the previously received token, the request is
sent with the same request-module function shown above. The callback is invoked once the
result is available; the result consists of the list of hosts and host IDs.
Similarly, to retrieve the list of items for a host, the method in the above options is changed to
item.get and hostids is added. The hostids is the ID assigned to the selected host. The
same request-module function (as shown above) is used to send the POST request and, once the
result is available, the callback function is triggered, returning the list of items.
The next step is to retrieve the last value of the selected item. The method in the above options
is changed to item.get and hostids is set to the ID of the selected host. Furthermore, a
filter parameter is added that contains the list of metrics to retrieve:
filter: {
    key_: selectedMetrics
}
where selectedMetrics is an array containing the list of metrics being monitored.
The request is sent using the request-module function (as shown above) and the callback
returns the list of metrics with their corresponding last values.
List of Monitored Metrics
Service Independent KPIs
# | Zabbix Metric Name | Unit | Description
1 | system.cpu.util[,user] | % | Percentage of CPU resources used in user space
2 | system.cpu.load[percpu,avg5] | % | Percentage of CPU load average over 5 minutes
3 | vm.memory.size[available] | Bytes | Available RAM in the system
4 | vm.memory.size[total] | Bytes | Total RAM of the system
5 | net.if.out[mgmt] | bps | Outgoing data rate measured at the mgmt interface
6 | net.if.in[mgmt] | bps | Incoming data rate measured at the mgmt interface
End to End Measurement KPIs for BT
# | Metric Name | Unit | Description
1 | bt_procedure_delay_(attach|detach|handover)_(min|max|avg) | ms | Minimum, maximum and mean average time for the overall procedure to be completed (attachment, detachment, handover, service deactivation, service activation, paging, TAU etc.)
2 | bt_traffic_delay_packet_(min|max|avg) | ms | Minimum, maximum and mean average time for the overall UE data traffic between endpoints
3 | bt_ue_packet_delay_(min|max|avg) | ms | Minimum, maximum and mean average delay for UE data traffic between endpoints
4 | bt_packet_jitter | 10 μs | Mean packet delay variation of subsequent packets on the data path
5 | bt_packet_drop_rate_(min|max|avg) | Nr. per 10,000 packets | Minimum, maximum and mean average number of observed packets that failed to reach the destination
6 | bt_active_session_(min|max|avg) | Nr. | Minimum, maximum and mean average of emulated active sessions
7 | bt_attached_devices_(min|max|avg) | Nr. | Minimum, maximum and mean average of registered UEs in the network
8 | bt_idle_devices_(min|max|avg) | Nr. | Minimum, maximum and mean average number of idle UEs in the network
9 | bt_data_load_(min|max|avg) | Mbps | Minimum, maximum and mean average network load
10 | bt_max_procedure_complete_per_sec | Operations/sec | Maximum number of procedures completed per second successfully handled by the system
11 | bt_max_procedure_failed_per_sec | Operations/sec | Number of observed procedures that failed in the network
12 | bt_process_delay_(attach|detach|handover)_(min|max|avg) | ms | Minimum, maximum and mean average time for the local processing of a single VNF during the procedure
13 | bt_max_session_drop_per_sec | Sessions/sec | Maximum number of UE sessions per second dropped during the procedure
14 | bt_max_procedures_requested_per_sec | Nr. | Total number of procedures requested
15 | bt_internal_failed_procedures_per_sec | Nr. | Perceived unsuccessful procedures per 1000 subscribers
3.1.3. Actual items implemented in CogNet
For CogNet, the system was configured to be able to replay different traces, in order to generate
the appropriate amount of data required for the training and verification of the machine learning
algorithms, as well as for retrieving the specific metrics. For this, new configurations were devised
for the benchmarking tool, to emulate a predictable load on the system, as well as for the
visualization GUI.
3.1.3.1 Configuration file for benchmarking tool
The Benchmarking Tool is designed to test and assess the core network of the system. It provides
a simple functionality with configured tests to be run on the Open5GCore system. The test
configurations range from the number of UEs emulated to the number of processes and the
memory allocated for the tool. The Benchmarking Tool module creates an eNodeB, allowing the
virtual UEs to perform actions on the network. It provides an interface for register, attach and
detach procedures, as well as a callback handler for operation statistics feedback. It handles the
network operations according to its specific procedures. The benchmarking tool has the
following XML structure:
<Module binaryFile="modules/console/console.so" >
<![CDATA[
<WharfConsole>
<Prompt text=" BT >"/>
<Acceptor type="udp" port="10000" bind="192.168.254.100" />
<Acceptor type="tcp" port="10000" bind="192.168.254.100" />
</WharfConsole>]]>
</Module>
<Module binaryFile="modules/addressing/addressing.so" >
<![CDATA[
<WharfAddressingWRR type="WRR" timeout="15">
<Address ip="192.168.4.80" weight="1" />
</WharfAddressingWRR> ]]>
</Module>
<Module binaryFile="modules/gtp/gtp.so">
<![CDATA[
<GTP>
<Acceptor id="GTP-U" type="udp" port="2123" bind="192.168.4.100" />
</GTP>
]]>
</Module>
<!-- Routing -->
<Module binaryFile="modules/routing_gtpu/routing_gtpu.so" />
<Module binaryFile="modules/routing/routing.so" >
<![CDATA[
<WharfROUTING>
<Extension id="0" dst_table="teid" mod_name="routing_gtpu" ipv4="192.168.4.100" />
<!--<Extension id="1" src_table="teid" mod_name="routing_pdcp" ipv4="192.168.6.90" />-->
</WharfROUTING>]]>
</Module>
<Module binaryFile="modules/sctp/sctp.so"/>
<Module binaryFile="modules/S1AP/s1ap.so">
<![CDATA[
<WharfS1AP>
<Local addr="192.168.4.100" port="36412" />
</WharfS1AP>
]]>
</Module>
<Module binaryFile="modules/nas/nas.so" />
<Module binaryFile="modules/benchmarking_tool/benchmarking_tool.so" >
<![CDATA[
<WharfBT
hash_table_size="32"
s1="192.168.4.100"
tac="1"
cell_id="1"
mcc="1"
mnc="1"
default_apn="default"
>
</WharfBT>
]]>
</Module>
Using this configuration, a set of data was acquired, monitored from both the subscriber and the
system side according to the previously selected metrics; it constitutes the basis for the
machine learning algorithms described further on.
The data is available in the CogNet repository at:
git clone https://github.com/CogNet-5GPPP/WP5-CSE.git
cd ./WP5-CSE/ADE
3.2. OpenSourceMANO OpenVIM and OpenBaton Integration
One alternative to the common OpenStack solutions is to integrate the different components
within OpenSourceMANO, specifically the OpenVIM of Telefonica and the OpenBaton of
Fraunhofer, and through this to provide a comprehensive NFV management system. A proof of
concept for this alternative to the common NFV system was implemented and is presented in the
following sections.
The OpenMANO plugin works as a bridge between OpenVIM and OpenBaton by translating the
messages between them. For the OpenMANO plugin to function properly, OpenVIM and
OpenBaton have to be set up properly.
OpenVIM Installation and Configuration
For OpenVIM to run smoothly, two machines, a "Compute Node" and a "Controller Node", should
be set up. The VNFs are deployed on the Compute Node. It should preferably run Ubuntu Server
14.04, 64-bit, with KVM, qemu and libvirt installed. A user has to be created and the Ubuntu
image file has to be placed under some accessible path (e.g. the home directory). Open vSwitch
can be installed and the desired number of bridges created.
Setting up the "Controller Node" means setting up OpenVIM. To set up OpenVIM in another VM,
use the following script:
# wget https://github.com/nfvlabs/openvim/raw/v0.4/scripts/install-openvim.sh
# chmod +x install-openvim.sh
# ./install-openvim.sh
It installs all the required modules to run OpenVIM, internal database and etc.
A script is also available to install Floodlight v0.9.
# wget https://github.com/nfvlabs/openvim/raw/v0.4/scripts/install-floodlight.sh
# chmod +x install-floodlight.sh
# ./install-floodlight.sh
The OpenVIM configuration file, available at openvim/openvimd.cfg, should be edited to set the bridge and DHCP server parameters. The mode of operation can be set to 'development' or 'normal'. The folder openvim/test contains configuration files for setting up hosts, images, networks, servers and flavors; they have to be configured and created as needed. The scripts to start OpenVIM are in the bin folder; it can be added to $PATH and OpenVIM started by executing service-openvim start in the terminal.
OpenVIM tries to access the image stored on the Compute Node, so OpenVIM should have access to the Compute Node VM using the user that was created.
OpenVIM offers northbound REST APIs that allow CRUD operations on the various resources (VNFs, networks, etc.). The OpenMANO plugin uses these REST APIs to carry out its operations.
DHCP Server Installation
A DHCP server is required for OpenVIM to assign IP addresses to newly created VMs on the Compute Node.
# apt-get install dhcp3-server
Edit the file /etc/default/isc-dhcp-server to enable the DHCP server on the appropriate interface.
# vi /etc/default/isc-dhcp-server
INTERFACES="eth1"
Edit the file /etc/dhcp/dhcpd.conf to specify the subnet, netmask and range of IP addresses to be offered by the server.
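As an illustration, a minimal /etc/dhcp/dhcpd.conf entry might look as follows; the subnet, range and option values are examples that have to be adapted to the local network:

```
subnet 192.168.4.0 netmask 255.255.255.0 {
  range 192.168.4.100 192.168.4.200;
  option routers 192.168.4.1;
  option domain-name-servers 8.8.8.8;
}
```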
Then, restart the service.
# service isc-dhcp-server restart
OpenBaton Installation
To install OpenBaton in Linux environment, execute the following command.
# sh <(curl -s http://get.openbaton.org/bootstrap) release
For a more detailed description of the setup, the following link can be consulted.
http://openbaton.github.io/documentation/nfvo-installation-deb/
The configuration file is available at openbaton/nfvo/etc/openbaton.properties
OpenMANO Plugin
The OpenMANO plugin acts as a bridge between OpenVIM and OpenBaton, translating the messages between them. It uses the northbound REST APIs exposed by OpenVIM to access the resources. The source is available at
https://gitlab.fokus.fraunhofer.de/openbaton/openmano-plugin
It can be cloned as:
$ git clone https://gitlab.fokus.fraunhofer.de/openbaton/openmano-plugin.git
After compilation, the resulting JAR file should be placed in the folder openbaton/nfvo/plugins/vim-drivers. Then restart OpenBaton; it should load the OpenMANO plugin.
Developed functionality
As illustrated in Figure 16, when the NSD is deployed in OpenBaton, the function create_server is triggered in the plugin, which then sends a REST POST request to OpenVIM to create a server (VM) in the KVM hypervisor on the Compute Node. The deployed VM gets a private IP via the DHCP server. The dhcp_thread.py in OpenVIM reads the dhcp.leases file of the DHCP server to retrieve the private IP assigned to that VM based on its MAC address. The public IP of the VM is based on a bridge configured on the Compute Node. Similarly, network information can be retrieved using the northbound REST API.
Figure 16 – OpenVIM – OpenBaton Integration Architecture
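The translation step performed by the plugin for create_server can be sketched as the construction of an OpenVIM northbound REST request. The base URL, endpoint path and payload field names below are assumptions for illustration, not the actual OpenVIM API schema:

```python
import json

# Illustrative sketch of the OpenMANO plugin translation: an OpenBaton
# create_server call is mapped onto an OpenVIM northbound REST request.
# Base URL, path and field names are assumed, not the real OpenVIM schema.

OPENVIM_URL = "http://127.0.0.1:9080/openvim"  # assumed base URL

def build_create_server_request(name, image_id, flavor_id, net_ids):
    """Translate OpenBaton server parameters into an OpenVIM POST request."""
    body = {
        "server": {
            "name": name,
            "imageRef": image_id,
            "flavorRef": flavor_id,
            "networks": [{"uuid": n} for n in net_ids],
        }
    }
    return {
        "method": "POST",
        "url": OPENVIM_URL + "/servers",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }

req = build_create_server_request("vnf-1", "img-ubuntu", "flv-small", ["net-mgmt"])
```

An HTTP client would then send this request and parse the returned server identifier; retrieving network information follows the same request-building pattern against the corresponding resource path.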
3.3. Anomaly detection
Figure 17 Anomaly Detection Ensemble (ADE) approach for early anomaly detection.
Anomaly detection refers to the problem of finding patterns in data that do not conform to
expected behaviour. These non-conforming patterns are often referred to as anomalies, outliers,
discordant observations, exceptions, aberrations, surprises, peculiarities or contaminants in
different application domains [7]. The ability to discover anomalies within a dataset can have a significant impact in a variety of application areas, such as fraud detection for banking and financial industries, intrusion detection for discovering security threats, health-related problems,
performance degradation detection, traffic congestion detection and so on. For instance, a failure
within a data centre can be considered an anomaly.
Proactive anomaly detection refers to anticipating anomalies or abnormal patterns within a
dataset in a timely manner. Discovering anomalies such as failures or degradations before their
occurrence can lead to great benefits such as the ability to avoid the anomaly happening by
applying some corrective measures in advance (e.g., allocating more resources for a nearly
saturated system in a data centre). We address the proactive anomaly detection problem through
machine learning and in particular ensemble learning. We propose an early Anomaly Detection
Ensemble approach, ADE, presented in Figure 17.
The approach consists of the following steps:
Data preparation. Given a labeled dataset d, the data preparation
phase involves three steps: (i) Applying existing anomaly detection
techniques, (ii) Gathering the scores of each technique on the given
dataset, and (iii) Aggregating the results of each technique for training
purposes.
Anomaly window generation to be used as ground truth. In order to prioritize discovering anomalies in a timely manner, we utilize a weighted anomaly window as ground truth for training the model. Various strategies are explored for generating ground-truth windows. Results show that ADE achieves
improvements of at least 10% in earliest detection score compared to
each individual technique across all datasets considered. The
technique proposed detected anomalies in advance up to ~16h
before they actually occurred.
Training the ensemble model using the ground-truth window field generated in the prior step. The approach combines the results of state-of-the-art anomaly detection techniques in order to provide more accurate results than each single technique.
Applying the model on a new incoming or test dataset and gathering the results.
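The data-preparation step above, in which the scores of the individual techniques on dataset d are gathered column-wise into a feature matrix with the label appended, can be sketched as follows; the technique names and toy scores are illustrative, not project data:

```python
# Minimal sketch of the ADE data-preparation step: per-technique anomaly
# scores become feature columns, and the ground-truth label column is
# kept aside for training. Technique names and scores are illustrative.

def prepare_training_matrix(records):
    """records: list of dicts with per-technique scores and a 'gt' label."""
    techniques = [k for k in records[0] if k not in ("timestamp", "gt")]
    X = [[r[t] for t in techniques] for r in records]  # feature rows
    y = [r["gt"] for r in records]                     # labels
    return techniques, X, y

data = [
    {"timestamp": "10.12.2015", "T1": 0.1, "T2": 0.0, "gt": 0},
    {"timestamp": "11.12.2015", "T1": 0.9, "T2": 0.8, "gt": 1},
    {"timestamp": "12.12.2015", "T1": 0.2, "T2": 0.7, "gt": 1},
]
techniques, X, y = prepare_training_matrix(data)
```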
3.3.1. Download and Installation
Dependencies:
R
XGBoost – the standard R library for Extreme Gradient Boosting, an efficient implementation of the gradient boosting framework
The libraries, together with the associated test data, can be cloned by:
git clone https://github.com/CogNet-5GPPP/WP5-CSE.git
cd ./WP5-CSE/ADE
3.3.2. Deployment
All the data consumed in the training phase is given in the folder "data". The data used for training and validating our approach is from the Numenta Anomaly Benchmark (NAB)4, which provides a set of real-world time-series datasets, denoted by D (58 files). These datasets are labelled, i.e., they contain a field that is 1 or 0 depending on whether the record is an anomaly or not, respectively. We use these datasets for the evaluation of the ensemble approach. The NAB benchmark also compares Nupic with the Twitter Anomaly Detection R package and Etsy Skyline. In our evaluation we also compare against these three techniques and, in addition, against the IBM SPSS solution for anomaly detection. It is important to mention that for training the ensemble model we used the scores produced by the techniques, as shown in Figure 17. The repository contains the datasets available from the Numenta Anomaly Benchmark in the following path:
ADE/data/
We proposed several strategies for the anomaly detection ensemble engine based on variations of generating anomaly windows. For the ensemble model, we used the XGBoost library, an optimized distributed gradient boosting library for the R programming language5. The library provides parallel tree boosting (also known as GBDT or GBM) that is known for being efficient and accurate.
We devised different strategies for generating anomaly window fields in order to investigate their impact on the early detection of anomalies, corresponding to different variations of ADE. Some strategies focus on giving higher weights closer to the actual anomaly for improved precision and recall (i.e., XGB_gtl). Others focus on giving higher weights closer to the beginning of the window for earlier detection (i.e., XGB_earliest).
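As an illustration of the contrast between the two kinds of strategies (a sketch, not the project's actual R code), the following spreads a weighted window over the points preceding each labelled anomaly, with the weight either rising toward the anomaly or highest at the window start:

```python
# Illustrative window-labelling strategies: "late" weights points close
# to the anomaly more heavily (precision/recall oriented, cf. XGB_gtl);
# "earliest" weights the start of the window more heavily (cf. XGB_earliest).

def window_weights(window_len, strategy):
    if strategy == "late":      # weight grows toward the anomaly
        return [(i + 1) / window_len for i in range(window_len)]
    if strategy == "earliest":  # weight is highest at the window start
        return [(window_len - i) / window_len for i in range(window_len)]
    raise ValueError(strategy)

def apply_window(labels, window_len, strategy):
    """Spread a weighted window over the window_len points up to each anomaly."""
    w = window_weights(window_len, strategy)
    out = [0.0] * len(labels)
    for idx, lab in enumerate(labels):
        if lab == 1:
            for j in range(window_len):
                pos = idx - window_len + 1 + j
                if 0 <= pos < len(labels):
                    out[pos] = max(out[pos], w[j])
    return out

gt = [0, 0, 0, 0, 1, 0]                  # one anomaly at index 4
late = apply_window(gt, 3, "late")       # weights rise toward index 4
early = apply_window(gt, 3, "earliest")  # weights highest at index 2
```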
The R code for training the anomaly detection ensemble model needs to be invoked by
Rscript ADE_joint_ensemble.R
The script first runs a function to find the optimum number of rounds for training the model in
order to maximize the area under the curve (AUC) for ranking evaluation.
For instance, Figure 18 and Figure 19 show how the AUC on the test set initially increases and then decreases as the number of rounds grows.
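The round-selection step can be sketched as picking the candidate round count that maximises the validation AUC; the AUC curve below is a toy stand-in for the evaluation performed by the R script:

```python
# Sketch of the round-selection step: candidate numbers of boosting
# rounds are evaluated and the count maximising the validation AUC is
# kept; training beyond it starts to overfit. auc_of stands in for the
# cross-validated evaluation in the R script.

def best_num_rounds(auc_of, candidates):
    """Return the candidate round count with the highest validation AUC."""
    return max(candidates, key=auc_of)

# Toy AUC curve: rises with more rounds, peaks, then falls (overfitting).
curve = {10: 0.81, 20: 0.88, 40: 0.92, 80: 0.90, 160: 0.85}
best = best_num_rounds(lambda r: curve[r], sorted(curve))  # picks 40
```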
4 https://github.com/numenta/NAB
5 https://github.com/dmlc/xgboost
Figure 18 Increasing AUC with number of rounds.
Figure 19 Decreasing AUC with number of rounds.
Further, it trains the ensemble model with the optimum number of rounds previously retrieved, by calling the xgboost function with this parameter. The script then produces the predictions by calling predict(model, testMatrix) on the test matrix. The predictions of each ADE variation are then merged into a single CSV file.
Figure 20 shows an example of the output obtained after running ADE on this data; initially, as can be observed, the predictions are close to 0, suggesting there is no anomaly in the measurements.
Figure 20 Predictions for different variations of ADE strategies utilizing xgboost.
3.3.3. Initial Results
Table 2 Earliest detection for all anomalies that have been detected by at least one
technique across all testing datasets. First ranked detections are illustrated in green and
italic font. Second ranked detections are illustrated in orange and bold font.
For each technique we analyse the detection time and rank of all anomalies across all datasets used for testing purposes. A filtered table of these results, showing only the top performers of ADE, is presented in Table 2. Moreover, as the results show that XGB_window and XGB_earliest outperform the other ADE techniques in terms of earliest detection, we only present the results of these techniques in the table. As can be observed, XGB_window is ranked 1st in 5 out of 14 cases, 2nd in another 3 cases and 3rd in a further 3. The Twitter anomaly detector is the 2nd-best performer, with 4 discoveries ranked 1st, 3 ranked 2nd and 1 ranked 3rd. It is more interesting to observe the difference between the detection indexes for the discovered anomalies. For instance, even though XGB_window is ranked 2nd for the 3rd anomaly, the difference between its detection index and Numenta's, which is ranked 1st, is just 1 position.
3.3.4. Development status
The above libraries have been applied in [7] to validate our solution for early anomaly detection. They may be enhanced in further investigations by the IBM team, based on new requests and on the data available in the project.
4. Taxonomy of Mitigation Actions
The scope of this section is to provide a set of clarifications on the possible mitigation actions driven by the machine learning algorithms, as a basis for the further implementation of the policies of the systems presented in the previous sections.
The section gives a set of considerations on the actuation of the given system based on the
results of the cognitive process. It aims to provide a comprehensive overview on the possibilities
brought by the dynamic mitigation mechanisms based on programmability. More specifically, it
concentrates on the automation of the decision process based on experience accumulated
through machine learning as an evolution beyond the basic policy system currently deployed.
In SDN and NFV, a new intermediary virtualization layer is added between the infrastructure and the network functions. This layer makes the system considerably more flexible, freeing the different components from the limitations of single physical components and at the same time freeing the network from a specific cable structure. In combined SDN and NFV environments there is virtually no topology that has to be strictly adhered to.
In order to profit from this flexibility, a set of dynamic mechanisms can be added to the system to increase the overall resilience and security. In the following, these mitigation actions are briefly presented with examples, followed by a set of considerations on how they can be dynamically programmed by the cognitive system.
4.1. SDN/NFV specific mitigation actions
In this section, the new mitigation actions which can be added to a system are presented. Some of these actions are already implemented in existing default standard systems like OpenStack; some others still have to be considered depending on their feasibility for the environment use cases.
Scaling – the main characteristic of the NFV environment is that the different components can scale on demand. Scaling presumes the deployment of more components of the same type in parallel in order to handle the specific load and thus to better serve the subscribers.
Dynamic load balancing – due to scaling, the load can be split between the different components of the same type. When a new component is started, the load can be split between the existing components and the new one.
Dynamic hot standby – thanks to the dynamic deployment of components of the same type, the NFV environment makes possible a direct hot-standby mechanism where a set of components are deployed and configured only to be able to provide the service in case of a failure.
Adding supplementary VNFs – within a specific system a new set of components can be
added transparently in order to increase the functionality of the virtual network. For
example, a firewall with more functions can be added to a system in case a threat is
detected. In this case, even though the system becomes more complex, it is better
protected during the specific attack. The process can also be executed via the dynamic replacement of existing VNFs.
Correlated scaling – at the current moment, the NFV environment considers only the on-demand scaling of single components when more resources are needed. However, as the load in the system increases, multiple components will usually have to be scaled in a similar manner. Thus, a correlated scaling of multiple components would make sense in order to maintain a coherent resilience level.
Flexible topology – when deploying the software network components on the same hardware, any network connection can be established directly between two components of the virtual network, as this is done by the underlying network virtualization system which, in order to make the system work, has to connect all the various components. Thus, links between components can be created and torn down on demand. Topology changes will require changes in the routing system; changing the routing is necessary in order to benefit from the momentarily optimized topology.
Security zones – with the deployment of distributed firewall components within the same virtual network, it is very easy to deploy the VNF components in different security zones, which can be differentiated based on access rights and privacy levels. Whenever some area of the network requires new access rights, a new network of the same type can be deployed with these new access rights.
Cloning of the services – one of the main advantages of the NFV environment is that the
same network service can be deployed multiple times with different security and
reliability levels while from the perspective of the subscriber it is the same service (no
modifications needed in the end device).
For example, for reliability, if a network is offering a highly reliable service to one set of subscribers and a less reliable one to others, two networks with the same components but with different hot-standby levels can be deployed. However, in order to do this, the reliability achievable on top of the infrastructure where the service is deployed has to be understood. Here, machine learning can dynamically determine how well the service performs on top of the specific hardware and virtualization layer.
For example, for security, three networks can be deployed in parallel with the same functionality but with different goals from the operator perspective: a normal network with a minimal firewall and a set of monitoring tools to determine whether the behaviour of the subscribers is appropriate; a network with a more advanced firewall and a large amount of monitoring, into which potentially malicious subscribers are moved as a sort of quarantine to determine whether an attack is taking place; and honeynets into which confirmed attacks are moved for study (i.e. to determine the attack vectors), with no access to real private data.
Cloning of the service gives the option to experiment. Experimentation shortens the feedback loop and provides the appropriate adaptation mechanisms faster.
Predictions may work – a very large number of the failure events in the network can be traced back to previous anomalous states which were not considered detrimental to the system (mainly because they were not failures), such as an increase in processing in some key control-plane component for some subscribers. Such outstanding events can predict failures of the system at a later stage. However, in order to determine such predictions and their appropriate mitigation actions, a correlation mechanism is needed.
Dynamic adaptation to anomalies – the current system is based on a set of static policies, especially because the number of possible mitigations is very small. Using the previously described mitigation actions, the system has a multitude of possibilities to adapt. One of the most important types of adaptation is to unknown threats or failures: the machine learning system can determine unknown situations through anomaly detection, in which case the system can take appropriate (initially default) actions and accumulate experience as the situations repeat themselves.
For the next software engineering release the most relevant of these actions will be implemented
in the form of a prototype together with the appropriate mechanisms to present the machine
learning insight.
4.2. Roles of the Cognitive System
The cognitive system comes to complete the basic policy enforcement model which is currently
in use with more dynamicity. As illustrated in Figure21, the current model which is deployed for
more than 20 years includes a Policy Decision Point (PDP) which based on the policies introduced
by the system administrator, the events received from the active system and the conditions of the
active system makes decisions and enforces them on the Policy Enforcement Point (PEP). In this
case, the PDP has to be completely pre-configured with the comprehensive set of policies by the
administrator of the system.
Figure 21 – Policy Decision Model
Figure 22 – Policy Decision Model with Cognitive System
When adding a cognitive system to the policy decision model, as illustrated in Figure 22, the
cognitive system can have three different roles, depending on the degree of involvement with the
real system.
1) Immediate action – based on the insight generated by the cognition, the cognitive system sends an immediate action to the enforcement point to execute some operations on the active system. This type of behaviour is not beneficial in resilience and security situations, as it overlooks the complexity of the managed system and may deteriorate its behaviour compared to the PDP, which includes the necessary policies for an appropriate behaviour.
Another sort of immediate action is a policy which is transmitted to the PDP including the specific conditions, e.g. instead of having only the event included, it also includes a set of conditions which force the PDP into a specific behaviour.
Another sort of immediate action is an implicit policy, e.g. when the cognitive system transmits a "1" it means that a specific action has to be executed; this is even more problematic as it implies a complete correlation between the PDP and the cognitive system regarding events, conditions and the policy actions.
2) Policy triggering – through the insight of the cognitive system, an event is determined, be it a complex event or a prediction, which cannot be immediately derived from the monitored information and requires a trained model to be determined. In this case, the cognitive system transmits an event to the policy system which, in its turn, by checking the conditions, selects the appropriate mitigation actions.
In this situation, the policies are statically introduced by the administrator of the system, and machine learning has the role, based on the dynamic information, of tweaking events in such a manner as to optimize the behaviour of the system in the given network context.
3) New policy/policy modification – in this situation, the cognitive system takes the role of modifying the running policies based on the experience gained while analysing the specific data. It is considered that the system is delivered with a set of default policies which are then dynamically adapted, based on the specifics of the deployment, to the local network conditions and to the specific usage patterns. In this context machine learning is the most useful, as it can turn the default system into a customized one by using the specific dynamic statistics mechanisms.
However, in order to be able to make such policy modifications, the system has to be extended with a set of meta-policies of experience, as described in the next section.
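The "policy triggering" role above can be sketched as a minimal event-condition-action loop, in which the cognitive system injects a derived event and the PDP matches it against statically installed policies. The policy entries, event names and actions below are illustrative assumptions, not CogNet code:

```python
# Minimal event-condition-action sketch of the policy-triggering role:
# the cognitive system emits a derived event (e.g. a predicted overload)
# and the PDP matches it against statically installed policies.
# Policy entries, event names and actions are illustrative.

policies = [
    {"event": "predicted_overload",
     "condition": lambda ctx: ctx["load"] > 0.8,
     "action": "scale_out"},
    {"event": "anomaly_detected",
     "condition": lambda ctx: True,
     "action": "move_to_quarantine_slice"},
]

def decide(event, ctx):
    """PDP: return the actions of all policies matching the event and conditions."""
    return [p["action"] for p in policies
            if p["event"] == event and p["condition"](ctx)]

# A predicted overload at high load triggers the scaling mitigation.
actions = decide("predicted_overload", {"load": 0.9})
```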
4.3. Development of System Experience
As illustrated in Figure 23, in order to be able to modify the set of policies on which a system is running, a new dynamic control loop has to be added.
Figure 23 – Experience Control Loop
First, the basic policy control loop presumes that events are received from the running system. Based on the events, the policy system matches the appropriate actions to execute against the momentary conditions within the system. Then, the actions are enforced on the running system, adapting it dynamically to the new events. In this system, the policies are static and have to be pre-installed, including events, conditions and actions.
In the case of generating new policies based on experience, the input is also the events received from the running system. The result of the experience is new policies which are installed in the policy system. From this it follows that the action of the experience system is a set of policies. To be able to generate these policies, the experience system has to include a set of "meta-conditions" which can be matched on certain events. A main issue to study, and to implement as a proof-of-concept on the utility of machine learning, is these meta-conditions as a means to determine an increase in the experience of the management system with the specific local conditions. A basic implementation of such a system will be presented in the next deliverable.
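A minimal sketch of such a meta-condition is a rule that watches recurring events and, once an event has been mitigated successfully often enough by a default action, emits a new policy for installation in the policy system. The threshold, event names and field names below are illustrative assumptions:

```python
from collections import Counter

# Sketch of the experience control loop: a meta-condition counts how
# often an event was successfully mitigated by a default action and,
# past a threshold, generates a new policy to install in the PDP.
# Threshold, event names and policy fields are illustrative.

class ExperienceSystem:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.successes = Counter()
        self.new_policies = []

    def observe(self, event, action, succeeded):
        """Meta-condition: same event mitigated successfully `threshold` times."""
        if succeeded:
            self.successes[(event, action)] += 1
            if self.successes[(event, action)] == self.threshold:
                self.new_policies.append({"event": event, "action": action})

exp = ExperienceSystem(threshold=3)
for _ in range(3):
    exp.observe("cpu_spike_vnf_a", "scale_out", succeeded=True)
# After three successful mitigations, a new event->action policy is emitted.
```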
5. Visualization GUI
One of the main tools for understanding and appropriately assessing the value of a machine learning technique, as well as for enabling the administrator of the system to provide a human-knowledge perspective where the techniques are limited, is visualization. In this context, a new visualization GUI was developed as part of the CogNet project, extending the existing Open5GCore GUI with specific functionality for:
Visualization of batch data – until now, the available GUIs related to NFV management have been able to visualize only live data
Capability to scope on different data – extension of the GUI to give the administrator a perspective on specific time intervals of the data, with different granularity levels
Combination of external data and the monitored data – enabling the visualization, on the same time-series graphics, of data resulting from the ML algorithms (e.g. predictions) and of the active data
Specific metrics which make it possible to determine the state of the system, e.g. colour-based changes in case the system is in an abnormal state
In its current version, the Visualization GUI is meant for data visualization, especially in the form of time series; it is planned to be extended later with statistics values. To make it efficient and simple to use, a good selection of technologies and quite a large number of libraries are employed. These components work together in the GUI to provide a clear visual interpretation of the data. The GUI is made interactive by displaying a number of so-called 'toasts', or pop-up messages, to show the current state of the GUI. The main components of the Visualization GUI, shown in Figure 24, are divided into three sections.
1. Frontend: The frontend interface is developed using the AngularJS framework (version 1) empowered by Bootstrap 4 for complex CSS styling. The AngularJS framework is a very convenient tool for developing the frontend of the dashboard and is supported by a large number of MIT-licensed libraries.
2. Backend: The Node.js (version 6.x) framework is used in the backend to create an efficient and asynchronous server. npm is used for package management. The backend is supported by a large number of MIT-licensed modules, most notably async, socket.io, mongoose and tiny-worker. These modules/libraries help to make the backend server efficient. The frontend and backend components communicate using the socket.io module; the messages are exchanged based on events over TCP.
3. Database: MongoDB was chosen to store and retrieve data. It is a NoSQL database and uses JSON documents with formatted/non-formatted schema. The mongoose module in the backend interacts with this database to store and retrieve data.
5.1. GUI Installation
The dashboard installation is quite easy and quick. The GUI can be cloned from GitLab:
# git clone https://gitlab.fokus.fraunhofer.de/phoenix/open5g-gui.git
The GUI comes with a script (prereq.sh) that installs all the prerequisite modules; you just have to run it. It is well tested on Ubuntu 14.04 and 16.04.
# ./prereq.sh
In the same folder there is a config.json file, which has to be configured before running the backend server. You need to configure the orchestrator parameters, Zabbix server parameters, OpenFlow parameters, BT parameters and LWM2M parameters.
The backend server can be run using the script “runServer.sh”.
# ./runServer.sh
After the backend server starts running, the frontend GUI is available at
https://ip_of_server:8000
The config.json configuration file is described below:
Figure 24 – Visualization GUI architecture
{
"appConfig": {
"httpServer": {
"hostname": ip of the GUI(default:"localhost"),
"port": port(default:8000),
"keys": {
"key": "keys/server.key",
"cert": "keys/server.crt"
}
}
},
"open5gParams": {
"name": "open5gParameters",
"dbConfig": {
"databaseName": "Open5G_GUI",
"ipAddr": ip of mongodb machine(default:"127.0.0.1"),
"port": listening port(default:27017),
"reconnectTries": number of reconnect tries upon failure (default:10),
"maxConnSize": number of connections(default:10),
"reconnectInterval": request reconnect if failed to connect(in msec)
},
"orchestrator":{
"ipAddr": ip address of orchestrator,
"port": port,
"username":"admin",
"password":"openbaton",
"oauthTokenPpath":"/oauth/token",
"userToken":"openbatonOSClient",
"passToken":"secret",
"security": if security feature is enabled(true or false),
"grantType": "password",
"requestTimeout": request reconnect if failed to connect(in msec)
},
"zabbixServer":{
"ipAddr": ip address of zabbix server,
"port" : port(default: 80),
"username" : username(default:"Admin"),
"password" : password(default:"zabbix"),
"defaultPath" : "/zabbix/api_jsonrpc.php",
"jsonrpcVersion" : "2.0",
"startFetch": true/false for metric fetch at startup,
"defaultInterval": interval for metric fetch(in msec),
"requestTimeout": request reconnect if failed to connect(in msec)
},
"dbMySqlConfigFlowmon":{
"hostIP": ip address of flowmondb machine,
"hostPort": port(default:3306),
"user": username for db,
"password": password for db,
"database": name of the database,
"connectionLimit": max number of connections(default: 100),
"debug": false,
"startFetch":true/false for data fetch at startup,
"queryInterval": request reconnect if failed to connect(in msec)
},
"observeMetrics":{
"add": [{“host”:name of host, “metricsList”:[array of metrics to be monitored]}],
D5.2 – Network Security and Resilience – Engineering Release 1
CogNet Version 1.0 Page 66 of 78
"removeHosts":[array of hostnames]
},
"btConfig":{
"ipAddr": ip address of bt machine,
"port": listening port
},
"dbMySqlConfigLWM2MSrv":{
"hostIP": ip address of the lwm2m server,
"hostPort": port(default:3306),
"user": username for db,
"password": password for db,
"database": name of the database,
"connectionLimit": 100,
"debug": false,
"startFetch": true/false for data fetch at startup,
"queryInterval": request reconnect if failed to connect(in msec)
}
}
}
5.2. GUI Interactions
When the frontend is loaded in Google Chrome, the index page opens with a menu on the left side. On the top right, you can see the list of partners' logos. In the menu, you can see options like Infrastructure, Load Static Topology and Dashboard Settings.
Infrastructure: If the backend config.json is properly configured for the orchestrator, it loads the topology by processing the Network Service Record (NSR) obtained from the orchestrator. This is the dynamic topology.
Load Static Topology: In the absence of an orchestrator, it is possible to load static topologies. Once you click it, a modal pops up and allows the user to load a number of topology files. The static topology file should follow the specific format described below. The topologies are distinguished by different colours.
Dashboard Settings: It offers options to select the themes. Based on themes, it loads different
colours and partners' logos.
To load a static topology, click on “Load Static Topology”. The format of the static topology file is as follows:
[{
"name": name of the slice,
"hostCollection":[{
"host":name of the host,
"details":[{
"hostnames":[{
"hostname": hostname,
"id":unique hostname ID,
"floatingIps":array of floating IPs,
"metricFetch": true or false,
"keyword": some fixed keyword,
"ips":[{"netName": net name,"ipAddr":array of IPs} ]
}]
},
{"datacenterName":[name of datacenter]},
{"hostId":unique host ID},
{"relations":array of dependent hosts}
]
}]
}]
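A minimal static topology file following this format could look like the example below; the slice, host and datacentre names, IDs and IP addresses are invented for illustration.

```json
[{
  "name": "slice-dense-urban",
  "hostCollection": [{
    "host": "vnf-gateway",
    "details": [{
      "hostnames": [{
        "hostname": "vnf-gateway-01",
        "id": "hn-001",
        "floatingIps": ["198.51.100.5"],
        "metricFetch": true,
        "keyword": "vnf",
        "ips": [{ "netName": "mgmt-net", "ipAddr": ["10.0.0.5"] }]
      }]
    },
    { "datacenterName": ["dc-berlin"] },
    { "hostId": "host-001" },
    { "relations": ["vnf-firewall"] }
    ]
  }]
}]
```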
After a static topology file is loaded, the frontend checks that it is valid JSON with the fields in the proper order. Multiple topologies can be loaded from multiple files. If all conditions are met, the topology (or topologies) is loaded and displayed as shown in Figure 25.
The list of VNFs is displayed, with colour distinguishing the slices. A slice contains the VNFs belonging to the same datacentre. In Figure 25, all the VNFs are green, which means they belong to the same slice and the same datacentre. Clicking one of them loads the Slice Visualisation, Time Series frame and Benchmarking Tool frame, as shown in Figure 26.
Figure 25 - NFV Slice visualization
Slice Visualisation: This section shows all the VNFs connected according to their relations. They can be moved around to organize the view, and the positional structure is automatically stored in the database so that the same layout is displayed when the topology is loaded again. The two gauges on each VNF monitor CPU and RAM usage; the metrics are retrieved from the Zabbix Server by the GUI backend using its REST API. Below the visualisation are options such as show/hide flow monitoring, show/hide relations and import/export of CSV files; a few more options are associated with each VNF. Flow monitoring traffic is also displayed if it is configured in the backend config file and the backend is properly connected to the database that stores the flow information. The traffic is displayed using dotted lines; red bubbles show the traffic flow, and their size indicates the traffic volume. Right-clicking any VNF opens a menu, shown in Figure 27 for the dense urban area testbed. It allows you to export/import CSV-formatted files, view the Time Series of the VNF's metrics (particularly meant for anomaly detection), open the Metrics List containing all the metrics extracted from the Zabbix Server (from which metrics can be selected for monitoring), display the currently monitored metrics via Show Monitored Metrics, and view the list of IPs and interfaces associated with the VNF.
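The metric retrieval from the Zabbix Server can be sketched as below. Zabbix exposes its API as JSON-RPC 2.0 over HTTP; the endpoint URL, authentication token and item ID used here are hypothetical, and the actual backend may structure its requests differently.

```javascript
// Build a Zabbix JSON-RPC 2.0 request body (the Zabbix API lives at
// /api_jsonrpc.php on the Zabbix frontend).
function buildZabbixRequest(method, params, auth, id) {
  return { jsonrpc: '2.0', method: method, params: params, auth: auth, id: id };
}

// Example: request the latest history values for one item (hypothetical itemid).
const body = buildZabbixRequest(
  'history.get',
  { itemids: ['23296'], sortfield: 'clock', sortorder: 'DESC', limit: 10 },
  'hypothetical-auth-token',
  1
);

// The backend would POST this to the Zabbix endpoint, e.g.:
// fetch('http://zabbix.example.org/api_jsonrpc.php', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json-rpc' },
//   body: JSON.stringify(body)
// }).then(r => r.json()).then(res => console.log(res.result));
```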
Time Series: When a VNF is selected in the Slice Visualisation, the time series of the selected metrics is displayed. Clicking on the top of the frame opens an expanded view with controls to set the time range and interval duration, as shown in Figure 28.
Benchmarking Tool: A number of parameters can be set by moving the sliders; clicking 'start' sends the command to the BT machine, which runs in an environment accessible to the GUI backend (e.g. OpenStack, VMware or a physical machine). An interactive display is shown in the form of a pie diagram. The command is sent as a UDP message. The IP address and port of the BT machine have to be set in the configuration file (config.json) before running the backend. This is illustrated in Figure 26.
Figure 26 – Software Network Overview
Figure 27 – Slice Overview
Figure 28 – Time Series Visualization
Figure 29 shows the time series for anomaly detection. After the processed CSV data is imported into the GUI, selecting the 'timeseries' option in a host's right-click menu opens the window. For the selected host, a number of metrics can be chosen from the list to load the time series – Actual vs. Predicted and Anomaly Score vs. Anomaly Label.
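Loading the processed CSV into the four plotted series can be sketched as follows; the column names (timestamp, actual, predicted, anomaly_score, anomaly_label) are hypothetical, since the exact CSV layout is not specified here.

```javascript
// Parse the processed anomaly-detection CSV into time series for plotting.
// Assumed header: timestamp,actual,predicted,anomaly_score,anomaly_label
function parseAnomalyCsv(csvText) {
  const lines = csvText.trim().split('\n');
  const header = lines[0].split(',');
  const idx = (name) => header.indexOf(name);
  const series = { actual: [], predicted: [], anomalyScore: [], anomalyLabel: [] };
  for (const line of lines.slice(1)) {
    const cols = line.split(',');
    const t = cols[idx('timestamp')];
    series.actual.push([t, parseFloat(cols[idx('actual')])]);
    series.predicted.push([t, parseFloat(cols[idx('predicted')])]);
    series.anomalyScore.push([t, parseFloat(cols[idx('anomaly_score')])]);
    series.anomalyLabel.push([t, parseInt(cols[idx('anomaly_label')], 10)]);
  }
  return series;
}
```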
Figure 29 – Prediction and Anomaly Detection Visualization
6. Conclusions and Further Work
This deliverable presents a set of practical implementations of the specific systems where machine learning techniques can be applied with positive results, as well as the development of machine learning techniques based on the data acquired from the different testbeds. Until now, the testbeds were mainly used to acquire data and to train the machine learning algorithms. Through this, considerable insight was gained into the specific behaviour of the systems, which helped the administrators to configure them and to make them more reliable for the long-duration measurements needed to accumulate the data required by machine learning. In the next steps, this insight from the machine learning algorithms will be used to make dynamic decisions.
The next release will close the gap between the testbeds and the machine learning mechanisms by implementing the results of the machine learning algorithms in the form of mitigation actions, thereby proving the usability of machine learning as a technology for enhancing network management in SDN/NFV environments and for increasing network resilience and security. This includes the further development of the security and reliability testbeds up to comprehensive proof-of-concept level, enabling machine-learning-based network management according to the development plans presented for each of them in their respective sections.
Glossary, Acronyms and Definitions
5G 5th generation mobile networks
ACL Access Control List
ADE Anomaly Detection Ensemble
API Application Programming Interface
AUC Area Under Curve
BT Benchmarking tool
CD Continuous Deployment
CI Continuous Integration
CSE CogNet Cognitive Smart Engine
CPU Central Processing Unit
CSS Cascading Style Sheet
CSV Comma Separated Value
DDoS Distributed Denial of Service
DHCP Dynamic Host Configuration Protocol
DMZ De-Militarized Zone
DNS Domain Name System
DoS Denial of Service Attack
DSE Distributed Security Enablement
ER Engineering Release
FIFO First In, First Out Queuing
GBDT Gradient Boosted Decision Tree
GBM Gradient Boosting Machine
GTP GPRS Tunnelling Protocol
GUI Graphical User Interface
KPI Key Performance Indicator
ICMP Internet Control Message Protocol
IETF Internet Engineering Task Force
IoT Internet of Things
ISP Internet Service Provider
JAR Java Archive
KVM Kernel-based Virtual Machine
LSTM Long Short Term Memory
MAC Medium Access Control
MANO NFV Management & Orchestration
MAPE Monitor, Analyse, Plan, Execute autonomic process loop
ML Machine Learning
NAB Numenta Anomaly Benchmark
NAS Network Access Server
NFV Network Functions Virtualization
NFVM NFV Management
NFVO NFV Orchestrator
NF Network Function
NSD Network Service Descriptor
NTP Network Time Protocol
OPNFV Open Source Project for NFV
OSS Operations Support System
PDP Policy Decision Point
PEP Policy Enforcement Point
PoC Proof of Concept
RADIUS Remote Authentication Dial-In User Service
RAM Random Access Memory
REST Representational State Transfer
RF Random Forest
SPAM Unsolicited email
SDN Software Defined Networking
SFC Service Function Chain
SFP Service Function Path
SQL Structured Query Language
SP Service Providers
SPSS Statistical Package for the Social Sciences
SSH Secure Shell (connectivity)
SUT System Under Test
SVM Support Vector Machine
SYN Synchronize message to establish TCP connection
UE User Equipment
VIM Virtual Infrastructure Manager
VM Virtual Machine
VNF Virtual Network Function
XML Extensible Markup Language
Appendix A. Distributed Security
Enablement Testbed
A.1. API Call to Create an OpenFlow Firewall Rule
Script to post an OpenFlow firewall rule that drops packets on port 22 from source host IP address 10.0.0.2:
curl -X PUT -d @L3port22FW -H "Content-Type: application/xml" -H "Accept: application/xml" --user
admin:admin http://162.13.119.228:8181/restconf/config/opendaylight-
inventory:nodes/node/openflow:1/table/0/flow/20
root@opendaylight:~/code# more L3port22FW
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<flow xmlns="urn:opendaylight:flow:inventory">
<strict>false</strict>
<instructions>
<instruction>
<order>0</order>
<apply-actions>
<action>
<order>0</order>
<drop-action/>
</action>
</apply-actions>
</instruction>
</instructions>
<table_id>0</table_id>
<id>20</id>
<cookie_mask>255</cookie_mask>
<installHw>false</installHw>
<match>
<ethernet-match>
<ethernet-type>
<type>2048</type>
</ethernet-type>
</ethernet-match>
<!-- restrict the drop to TCP destination port 22, as described above -->
<ip-match>
<ip-protocol>6</ip-protocol>
</ip-match>
<tcp-destination-port>22</tcp-destination-port>
<ipv4-source>10.0.0.2/32</ipv4-source>
</match>
<cookie>2</cookie>
<flow-name>IricentL3port22FW</flow-name>
<priority>200</priority>
<barrier>false</barrier>
<hard-timeout>200</hard-timeout>
<idle-timeout>400</idle-timeout>
</flow>
A.2. Sequence Diagrams
Distributed Security Enablement Platform
Distributed Security Enablement Gateway
Distributed Security Enablement Prediction Engine
Distributed Security Enablement Firewall Module