D5.2 – Engineering Release 1
(Secure NFV Subsystem, High Availability Framework,
Degradation Detection and Correction, Autonomic Rules
Generator and I/F, Network Resilience Framework)
Document Number D5.2
Status Completed
Work Package WP5
Deliverable Type Report
Date of Delivery 18.01.2017
Responsible Unit Fraunhofer
Editors Marius Corici (Fraunhofer)
Contributors
(alphabetic order)
Haytham Assem (IBM)
Daniele Bonadiman (UNITN)
Teodora Sandra Buda (IBM)
Eleonora Cau (Fraunhofer)
D5.2 – Network Security and Resilience – Engineering Release 1
CogNet Version 1.0 Page 2 of 78
Marius Corici (Fraunhofer)
Fabrizio Granelli (UNITN)
Imen Grida Ben Yahya (Orange)
Iryna Haponchyk (UNITN)
Diego Lopez (TID)
Daniel-Ilie Gheorghe Pop (Fraunhofer)
Alessandro Moschitti (UNITN)
Antonio Pastor (TID)
Benjamin Reichel (TUB)
Ranjan Shrestha (TUB)
Mikhail Smirnov (Fraunhofer)
Kateryna Tymoshenko (UNITN)
Joe Tynan (WIT)
Lei Xu (IBM)
Reviewers
(alphabetic order)
Alberto Mozzo (UPM)
Bruno Ordozgoiti (UPM)
Martin Tolan (WIT)
Dissemination level PU
Change History
Version Date Status Editor (Unit) Description
0.1 01.11.2016 Draft Marius Corici
(Fraunhofer)
Provided initial template and deliverable structure
0.2 14.11.2016 Draft Marius Corici
(Fraunhofer)
Added the input from Telefonica, WIT and IBM
0.3 19.12.2016 Draft Marius Corici
(Fraunhofer)
Added the input from TUB and FOKUS
0.4 26.12.2016 Draft Marius Corici
(Fraunhofer)
Added the first description of the mitigation actions
taxonomy
0.5 27.12.2016 Draft Marius Corici
(Fraunhofer)
Modified the security testbed and added the input from
UNITN
0.6 04.01.2017 Draft Marius Corici
(Fraunhofer)
Added introduction and conclusions
0.7 09.01.2017 Draft Marius Corici
(Fraunhofer)
Responded to the initial internal review
0.8 10.01.2017 Draft Marius Corici
(Fraunhofer)
Double-checked the relationship between the
repository links and the deliverable
0.9 13.01.2017 Draft Marius Corici
(Fraunhofer)
Added acronyms list
0.10 14.01.2017 Draft Marius Corici
(Fraunhofer)
Merged the responses to the second internal review
0.11 16.01.2017 Draft Marius Corici
(Fraunhofer)
Fixes in page alignment, font types, figure references.
0.12 17.01.2017 Draft Marius Corici
(Fraunhofer)
Double-checked links, editorial repairs
1.0 18.01.2017 Final Marius Corici
(Fraunhofer)
Cleaned up comments and changes. Verified links and
references. Formatted for an error-free PDF
Executive Summary
This deliverable represents the first report on the implementation within
WP5 of CogNet. It reports two types of activities, which are still to be
fully integrated:
- the development of testbeds which enable the acquisition of enough
data to train and later test the machine learning algorithms, as well
as to prove the feasibility of the different mechanisms;
- the development of machine learning mechanisms which use the acquired
data to give insight into specific management-related features.
As this is an implementation companion deliverable, it includes a short
description of the various components developed and integrated into the
testbeds, as well as the means to provide data to the machine learning
component, offering best-practice engineering guidance on how cognitive
components can be integrated into the NFV environment (in which network
components are deployed purely as software on top of a common
infrastructure, as described in D5.1).
Although the testbeds vary considerably in format, due to the need to
acquire different types of data and to the variation of the possible
mitigation actions, they follow the architecture described in WP2 and
applied in D5.1 for resilience- and security-related features.
Additionally, the machine learning algorithms considered are either
well-known algorithms, demonstrating that machine learning makes sense
for network management, or directly derived from the WP3 algorithms and
applied in the specific context, demonstrating that advances in machine
learning techniques provide additional benefit to network management.
The deliverable includes three testbeds used for the security-related
area, one testbed for resilience, and the initial machine learning
techniques developed. Additionally, the deliverable includes a set of
taxonomy-related considerations on the mitigation actions, which are
necessary for the later implementation within the testbeds.
Table of Contents
1. Introduction....................................................................................................................... 9
1.1. Motivation, Objective and Scope ............................................................................................................. 9
2. Distributed Security Enablement Testbeds .................................................................. 10
2.1. Distributed Security Enablement Testbed .......................................................................................... 10
2.1.1. Scope ....................................................................................................................................................... 10
2.1.2. Architecture .......................................................................................................................................... 10
2.1.3. Actual items implemented in CogNet ........................................................................................ 21
2.1.4. ML solution implementation .......................................................................................................... 21
2.1.5. Expected experimentation results ................................................................................................ 21
2.1.6. Roadmap of the testbed .................................................................................................................. 22
2.1.7. User Manual ......................................................................................................................................... 22
2.2. Honey net Testbed ...................................................................................................................................... 25
2.2.1. Scope of the testbed ......................................................................................................................... 25
2.2.2. Architecture of the testbed ............................................................................................................ 25
2.2.3. Actual items implemented in CogNet ........................................................................................ 27
2.2.4. ML solution implementation .......................................................................................................... 27
2.2.5. Expected experimentation results ................................................................................................ 27
2.2.6. Roadmap of the testbed .................................................................................................................. 28
2.3. NFV Security Anomaly Detection Testbed ......................................................................................... 28
2.3.1. Scope of the testbed ......................................................................................................................... 28
2.3.2. Architecture of the testbed ............................................................................................................ 29
2.3.3. Actual items implemented in CogNet ........................................................................................ 30
2.3.4. Expected experimentation results ................................................................................................ 30
2.3.5. Roadmap of the testbed .................................................................................................................. 30
2.3.6. User Manual ......................................................................................................................................... 31
2.4. Network traffic classification.................................................................................................................... 33
2.4.1. Architecture .......................................................................................................................................... 33
2.4.2. Download and Installation .............................................................................................................. 34
2.4.3. Deployment .......................................................................................................................................... 35
3. Dense urban area testbed .............................................................................................. 36
3.1. Testbed Description .................................................................................................................................... 36
3.1.1. Scope of the testbed ......................................................................................................................... 36
3.1.2. Architecture of the testbed ............................................................................................................ 36
3.1.3. Actual items implemented in CogNet ........................................................................................ 47
3.2. OpenSourceMANO OpenVIM and OpenBaton Integration ........................................................ 48
3.3. Anomaly detection ...................................................................................................................................... 51
3.3.1. Download and Installation .............................................................................................................. 52
3.3.2. Deployment .......................................................................................................................................... 53
3.3.3. Initial Results ........................................................................................................................................ 56
3.3.4. Development status .......................................................................................................................... 56
4. Taxonomy of Mitigation Actions .................................................................................. 57
4.1. SDN/NFV specific mitigation actions ................................................................................................... 57
4.2. Roles of the Cognitive System ................................................................................................................ 59
4.3. Development of System Experience ..................................................................................................... 61
5. Visualization GUI ............................................................................................................. 63
5.1. GUI Installation .............................................................................................................................................. 64
5.2. GUI Interactions ............................................................................................................................................ 66
6. Conclusions and Further Work ...................................................................................... 71
Glossary, Acronyms and Definitions .................................................................................... 72
References ............................................................................................................................... 75
Appendix A. Distributed Security Enablement Testbed ................................................. 76
A.1. API call to Create OpenFlow Firewall rule. ......................................................................................... 76
A.2. Sequence Diagrams .................................................................................................................................... 77
List of Figures:
Figure 1 – Distributed Security Enablement Testbed ........................................................................................ 10
Figure 2 - DES modules hosted on Docker framework .................................................................................... 14
Figure 3 - DSE information flow ................................................................................................................................ 16
Figure 4 - DSE Gateway................................................................................................................................................. 18
Figure 5 - DSE (L)CSE Prediction................................................................................................................................ 19
Figure 6 - DSE Firewall Engine .................................................................................................................................... 20
Figure 7 - CogNet common infrastructure deploy dash board .................................................................... 22
Figure 8 - CogNet DSE build and deploy dash board ...................................................................................... 23
Figure 9- Mouseworld scenario to replicate security attack traffic patterns ............................................ 26
Figure 10 – SDN integration in OpenStack ........................................................................................................... 29
Figure 11 Model architecture for network traffic classification ..................................................................... 33
Figure 12 – Dense Urban Area Testbed .................................................................................................................. 37
Figure 13 – Zabbix Active Check (Trapping) ......................................................................................................... 41
Figure 14 - Zabbix Passive Check (Polling) ............................................................................................................ 42
Figure 15 - Basic Zabbix Server and its relations with other entities .......................................................... 42
Figure 16 – OpenVIM – OpenBaton Integration Architecture ....................................................................... 50
Figure 17 Anomaly Detection Ensemble (ADE) approach for early anomaly detection. ..................... 51
Figure 18 Increasing AUC with number of rounds. ............................................................................................ 54
Figure 19 Decreasing AUC with number of rounds. ......................................................................................... 54
Figure 20 Predictions for different variations of ADE strategies utilizing xgboost. ............................... 55
Figure 21 – Policy Decision Model ........................................................................................................................... 60
Figure 22 – Policy Decision Model with Cognitive System ............................................................................. 60
Figure 23 – Experience Control Loop ...................................................................................................................... 61
Figure 24 – Visualization GUI architecture............................................................................................................. 64
Figure 25 - NFV Slice visualization............................................................................................................................ 67
Figure 26 – Software Network Overview ................................................................................................................ 68
Figure 27 – Slice Overview ........................................................................................................................................... 69
Figure 28 – Time Series Visualization ...................................................................................................................... 69
Figure 29 – Prediction and Anomaly Detection Visualization ........................................................................ 70
1. Introduction
1.1. Motivation, Objective and Scope
To prove the feasibility of machine learning techniques for the resilience and security of
the network, it is necessary to acquire the appropriate type of data and to target the
results of the algorithms towards specific mitigation actions.
This deliverable presents a set of testbeds which enable the acquisition of that data, as
well as initial versions of some of the machine learning algorithms, adopted from the
literature and from WP3, targeting such optimizations of the network management system.
Testbed/Component name | Scenario(s) | Main Language | Description
Distributed Security Enablement Testbed | Security Enablement | Java | Full SDN system for threat detection at data plane level
Honey Net Testbed | Security Enablement | Java | External/public acquired security attacks
Security Anomaly Detection | Security Enablement | C | ML-firewall at NFV infrastructure level
Network Traffic Classification | Security Enablement | Python | Network Traffic Classification Model
Dense Urban Area Testbed | Dense Urban Area Testbed | C and Java | Full NFV system
Anomaly Detection | Dense Urban Area Testbed | R | LSTM-based anomaly detection algorithm
Visualization GUI | All | JavaScript | Providing customized visualization for ML-based management in NFV
Table 1 List of testbeds and components described in this deliverable
Additionally, a set of considerations on the possible mitigation actions for security and
resilience was added to this deliverable as an introduction to the implementation that will
follow in the next engineering release.
2. Distributed Security Enablement
Testbeds
2.1. Distributed Security Enablement Testbed
2.1.1. Scope
The following sections describe the Distributed Security Enablement (DSE) testbed used for
the creation and ongoing maintenance of security-based Service Function Chains (SFCs) that
typically reside at a Service Provider's (SP) edge network. They also describe how Machine
Learning (ML) is to be used to assist the detection of threats in a tenant's data plane.
This includes the architectural and functional concepts, principles and components used in
the construction of composite security-zone services through a deployment of SFCs, which
can then be considered a proposed solution to track and respond to flood-based security
threats in a 5G multi-tenant network.
2.1.2. Architecture
2.1.2.1 Functional Architecture
The foundational architectural DSE concepts of a security zone and machine learning architecture
are characterised as follows:
Figure 1 – Distributed Security Enablement Testbed
Security Zone – A security zone is a collection of tenant-based
network segments that share security requirements. The security zone
contains Network Functions (NFs) that include services such as a layer
3 probe service that records the IP headers in a tenant's flow, and an
access control (802.1X) service that authorizes a tenant's end device
to access the network through the Network Access Server (NAS),
usually based on RADIUS. It also contains an IP services component
that makes it possible to redirect a flow, via a static route, to
applications outside of the service chain, for example to a DMZ or
quarantine service to further analyze the suspect flow's packet content.
The NAS service also adds value by contributing to the machine
learning dataset in the form of RADIUS records. All security-based
network functions are connected via an OpenFlow-enabled switched
fabric.
Security Service Function Chain - The security zone service chain
deployment and initial orchestration are constructed by NFV
Management & Orchestration (MANO) namely NFV Orchestrator
(NFVO) and VNF Manager (VNFM) [1]. The service chain, illustrated in
Figure 1, contains multiple steps that implement different security
services on a tenant's traffic. For instance:
Network Access Server. End device authentication and accounting
logs.
Layer 2 Firewall. OpenFlow’s match and action rules (OpenVSwitch).
Layer 3 Firewall. Access Control List, logging (ACL from pfSense).
IP services. IP forwarding, flow quarantine, etc.
Probe. sFlow and NetFlow IP header samples.
Service Function Paths. A security Service Function Path (SFP) is a
mechanism by which a tenant's data plane flow can be orchestrated
to switch traffic to different parts of the service chain, thereby
applying distinct security policies to each flow path. For example, the
default path through the security zone might initially have no access
control list applied to its flow path, but the flow must still
traverse the probing service for the gathering of monitoring
statistics. Another instance is where a tenant has a default set
of destination IP addresses blacklisted, resulting in the tenant's flow
including a layer 2 firewall service in its service function path.
Probes and Log storage. At this stage in the service chain probes
monitor the data plane via sFlow or NetFlow methods, producing
sample statistics that will be presented to machine learning methods
operating on both streamed and batch platforms. The processing of
these statistics will be discussed later in the machine learning section.
Validation. In order to produce a more accurate threat prediction, the
architecture includes an external validation component, whereby the
locally calculated threat prediction score can also be weighted to
include queries from external blacklist providers.
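As a sketch of this weighting step, the combination could look like the following; the weights, the class name and the provider-count inputs are illustrative assumptions, not CogNet code:

```java
// Hypothetical sketch of the Validation step: combining the locally calculated
// threat prediction score with external blacklist lookups. The weights and the
// provider-count inputs are illustrative assumptions.
public class ThreatScoreValidator {
    private final double localWeight;
    private final double externalWeight;

    public ThreatScoreValidator(double localWeight, double externalWeight) {
        this.localWeight = localWeight;
        this.externalWeight = externalWeight;
    }

    /** Weighted combination of the local ML score with the fraction of
     *  blacklist providers that list the flow's source IP. */
    public double validate(double localScore, int providersQueried, int providersListingIp) {
        double externalScore = providersQueried == 0
                ? 0.0
                : (double) providersListingIp / providersQueried;
        return (localWeight * localScore + externalWeight * externalScore)
                / (localWeight + externalWeight);
    }

    public static void main(String[] args) {
        ThreatScoreValidator v = new ThreatScoreValidator(0.7, 0.3);
        // local model score 0.5; 2 of 4 providers blacklist the source IP
        System.out.println(v.validate(0.5, 4, 2)); // prints 0.5
    }
}
```

With no providers reachable, the score falls back to the weighted local prediction alone, so the validation component degrades gracefully.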
Machine Learning. Here a range of algorithms will predict security-based
threats. The primary focus is to determine when a tenant's traffic
can be classed as a flood-based attack, namely DDoS attacks that
include SYN (both vertical and horizontal attacks), SPAM, ICMP and
DNS attacks. The architecture permits algorithms to be executed
in both batch and near-real-time analytics, thus allowing predictions to
be rendered from historical stored datasets as well as live streamed
datasets.
Security Zone Orchestration. The security zone’s service chain will
accept instructions from the security orchestrator. The instruction
includes the update of path flows, management of both layer 2 and
layer 3 access control lists and the authorization of an end device to
the network. The actuation part for the DSE is summarized in the later
section on Data Plane Actuation.
Object Storage. During the life cycle of a machine learning method
the current state can be serialized and stored as a snapshot for
subsequent functionality. These snapshots can be used to align the
machine learning method to a known state.
Continuous Integration. Not depicted in Figure 1, although of
significant importance, is the functional approach to Continuous
Integration (CI) and Continuous Deployment (CD) during the
development of the software release and integration cycles of the
DSE's modules. The CI component provides a mechanism to describe,
store, locate and orchestrate the testing of the DSE modules, whilst
also providing a method to deploy automatically into a production
environment during an automated build cycle.
Data plane Actuation. The DSE's service chain has a number of flow
actuations, depending on the anomaly detected and the firewall policy in
place.
Network Access Server. Disable/enable users network access.
Layer 2 Firewall. OpenFlow’s match and action firewall rule.
Layer 3 Firewall. Access Control List (ACL) pfSense.
IP services. IP forwarding, direct flow to remote quarantine.
2.1.2.2 DSE service state machine considerations
There are two channels available to the DSE module to record state in a snapshot. Firstly, is to
take a snapshot of the container the ML method is hosted on and secondly, is to take a snapshot
of the machine learning object at different stages. For example, DSE snapshots are milestones at
the untrained state, a trained state, events in production, etc.
DSE Docker container snapshot
docker commit -p 5842907ba04a DES_engine$buildNo
docker save -o /root/DES_engine$buildNo.tar DES_engine$buildNo  # export the image so it can later be re-loaded
docker load -i /root/DES_engine$buildNo.tar
Machine learning information storage
Incorporated in the DSE Prediction method is the ability to export to storage a serialized machine
learning object. It uses a combination of build number and time stamp to identify and store the
object. The serialized stored object could be imported at a later stage for continuous usage,
negating the costly requirement to retrain the Machine learning method.
public class storage {
    private String mongoDB_URL = "mongodb.cognet.5g-ppp.eu";
    private int mongoDB_port = 27017;
    private String mongoDB_DB_name = "CSE_DSE";
    private String build_no = "0";
    private DB db;
    private MongoClient mongo;

    public void write() {
        System.out.println("storage ML object write!");
        // the collection is named after the ML object's build number
        DBCollection table = db.getCollection(dse_ML_object.get_build_no());
        table.save(dse_ML_object);
        System.out.println("storage ML object complete!");
    }
}
2.1.2.3 DSE running instances (Docker containers).
Currently the three DSE executables reside in three Docker containers:
1. DSE Gateway
2. DSE LCSE Prediction
3. DSE Firewall Engine
These three software modules make up the machine learning and the security & NFVI
orchestration components of the Distributed Security Enablement Testbed (Figure 1).
Figure 2 - DES modules hosted on Docker framework
The Docker containers are built automatically by interpreting instructions contained in a
Dockerfile template. Shown below is the DSE Dockerfile that contains all the commands used
to create the DSE base image. It brings the container to a package level that can be used
in all of the DSE Docker-based containers. Linux binary packages in the base image include
Oracle Java 1.8, Maven 3.3 and Git.
vi ~/DCSE/Dockerfile
FROM ubuntu:14.04
MAINTAINER Joe Tynan WIT <[email protected]>
RUN apt-get update
RUN apt-get install software-properties-common -y
RUN add-apt-repository ppa:webupd8team/java -y
RUN apt-get update
RUN echo debconf shared/accepted-oracle-license-v1-1 select true | debconf-set-selections
RUN apt-get install oracle-java8-installer -y
RUN apt-get install oracle-java8-set-default -y
RUN apt-get install maven -y
RUN apt-get install git -y
The API call to deploy the DSE base image to Docker server: ($docker is a CI environment
variable)
curl -v -X POST -H "Content-Type:application/tar" --data-binary '@Dockerfile.tar.gz'
http://$docker:4243/build?t=wp5dse
To execute a DSE service in a Docker container ($docker is a CI environment variable), the
following is issued each time DSE-relevant code is checked into the CogNet source code
control repository.
curl -X POST -H "Content-Type: application/json"
http://$docker:4243/containers/create?name=dseML_container -d '
{
"Name": "wp5dseMLcontainer1",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Cmd": ["/bin/bash", "-c", "echo Starting DSE; git clone
https://CogNet5GPPP:[email protected]/CogNet-5GPPP/WP5-DSE-.git; cd WP5-DSE-
/code$; mvn clean install; java -cp target/dse-1.0-SNAPSHOT.jar:lib/* eu.cognet.lcse.ml.dse.App
; echo Stopping"],
"Image": "wp5dse:latest",
"DisableNetwork": false
}
'
2.1.2.4 Information Flows and Functional Description.
As part of an autonomic process we have adapted a Monitor, Analyse, Plan and Execute (MAPE)
loop to highlight how information flows through the Distributed Security Enablement task. The
red line in Figure 3 highlights the path that the DSE dataset information will take to traverse the
CogNet Common Infrastructure. The DSE framework sequence diagram is located in Appendix
A.
Figure 3 - DSE information flow
Monitor (sFlow + NetFlow probes and the DSE probe Gateway). Active
monitoring on the tenant data plane is delivered by OpenVSwitch
probes and the DSE Gateway. The DSE Gateway disassembles probed
packets and places them on the appropriate Denial of Service (DoS)
tenant Kafka queue for the next phase in the process. The probing
service is an element of the service chain and is labelled as the L3
probing service in Figure 1. The DSE's preferred monitoring protocol is
sFlow, as it has a minimal impact on the performance of the probing
switch, but it has the drawback of sampling the data plane's traffic
rather than providing a one-to-one flow sample. The following command
set enables probe services on OpenVSwitch:
-bash-3.00$ sudo ovs-vsctl add-br DSEbr  # create bridge
-bash-3.00$ sudo ovs-vsctl add-port DSEbr enp1s0f0  # add interface
-bash-3.00$ sudo ovs-vsctl add-port DSEbr enp1s0f1  # add interface
-bash-3.00$ sudo ovs-vsctl set-controller DSEbr tcp:162.13.119.228:6633  # add OF controller interface
-bash-3.00$ sudo ovs-vsctl -- --id=@sflow1 create sFlow agent=enp2s0
target=\"192.168.1.100:6343\" header=128 sampling=64 polling=10 -- set Bridge
DSEbr sflow=@sflow1  # enable monitoring interface
Analyze (DSE LCSE Prediction). The DSE Prediction Engine has three
approaches to creating an anomaly score forecast: the first is to make
a prediction on the rate of change over a time series; the second is to
use decision trees that classify samples extracted from the ring
buffer, then apply a Random Forest method to create a threat
prediction; and the third is to use a machine learning method from the
service catalogue described in D2.2. Samples are collected via a Kafka
queue from the relevant topic, then placed into a ring buffer. The ring
buffer provides a mechanism to store samples with a short lifespan in
a time series attribute. A full description of the prediction component
is given in the following sections. The DSE LCSE Prediction Engine is
deployed as a Docker container and is represented in Figure 1 as the ML
and Log Storage component.
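The ring buffer described above can be sketched as follows; the capacity, the [timestamp, packetCount] sample layout and the rate-of-change helper are illustrative assumptions, not the actual DSE code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of the Analyze-phase ring buffer: a bounded buffer that
// evicts the oldest probe sample once full, giving samples the short lifespan
// described above. Sample layout: { timestampMillis, packetCount }.
public class SampleRingBuffer {
    private final Deque<long[]> buffer = new ArrayDeque<>();
    private final int capacity;

    public SampleRingBuffer(int capacity) {
        this.capacity = capacity;
    }

    public void add(long timestampMillis, long packetCount) {
        if (buffer.size() == capacity) {
            buffer.removeFirst(); // evict the oldest sample
        }
        buffer.addLast(new long[] { timestampMillis, packetCount });
    }

    /** Rate of change between the oldest and newest retained samples
     *  (packets/ms), the first of the three prediction approaches above. */
    public double rateOfChange() {
        if (buffer.size() < 2) {
            return 0.0;
        }
        long[] oldest = buffer.peekFirst();
        long[] newest = buffer.peekLast();
        long dt = newest[0] - oldest[0];
        return dt == 0 ? 0.0 : (double) (newest[1] - oldest[1]) / dt;
    }

    public int size() {
        return buffer.size();
    }

    public static void main(String[] args) {
        SampleRingBuffer rb = new SampleRingBuffer(3);
        rb.add(0, 100);
        rb.add(10, 200);
        rb.add(20, 400);
        rb.add(30, 900); // evicts the sample at t=0
        System.out.println(rb.rateOfChange()); // (900-200)/(30-10) = 35.0
    }
}
```

Bounding the buffer keeps the memory footprint of each tenant topic constant while still exposing a short time-series window to the prediction methods.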
Plan (DSE Firewall Engine). The DSE Firewall Engine will implement the
corresponding actuation on predicted malicious data plane flows:
drop, log, forward, quarantine or ignore.
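A minimal sketch of how such a plan step might map a prediction to one of these actuations; the thresholds and class names are hypothetical, not the DSE firewall policy:

```java
// Hypothetical sketch of the Plan step: mapping a predicted anomaly score to
// one of the actuations listed above. The thresholds are illustrative
// assumptions, not the actual DSE firewall policy.
public class FirewallPlanner {
    public enum Action { IGNORE, LOG, FORWARD, QUARANTINE, DROP }

    /** Choose an actuation for a flow given its anomaly score in [0, 1]. */
    public static Action plan(double anomalyScore, boolean externallyBlacklisted) {
        if (externallyBlacklisted) {
            return Action.DROP; // policy overrides the prediction
        }
        if (anomalyScore >= 0.9) return Action.DROP;
        if (anomalyScore >= 0.7) return Action.QUARANTINE;
        if (anomalyScore >= 0.5) return Action.FORWARD; // e.g. redirect towards a DMZ
        if (anomalyScore >= 0.3) return Action.LOG;
        return Action.IGNORE;
    }

    public static void main(String[] args) {
        System.out.println(plan(0.75, false)); // prints QUARANTINE
    }
}
```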
Execute (OpenFlow firewall + iptables (pfSense) + RADIUS
Authentication + IP forwarding).
Actuation in the service chain appears in four zones.
1. The first is where user/device access is disabled in the RADIUS
configuration database, which in turn actuates the user session
instance at the NAS server (Figure 1) in the service chain.
sed -i -e 's/device1 Cleartext-Password := "password"/#device1 Cleartext-Password :=
"password"/g' /etc/freeradius/users.conf
2. An OpenFlow ACL rule is installed via an OpenDaylight API call. The
actuation is executed by an L2 firewall service (Figure 1) as a
match-action (drop) rule. The API call is demonstrated in Annex B:
OpenFlow Firewall rule section.
3. A flow can also be directed to an open source firewall
implementation (pfSense). This allows the service chain to also
firewall IPv6 traffic in the data plane. Depicted in Figure 1 as an L3
firewall service.
4. The IP forwarding service (Figure 1) allows a flow to terminate and
enter the IP routing realm. The flow can be routed to a DMZ or
quarantine zone for further inspection via a static route.
sudo route add -host 192.168.0.2 gw 192.168.1.1 # route suspect host to quarantine
2.1.2.5 Software modules currently under development
1. DSE Gateway – The DSE Gateway software module comprises a DSE
property file, probe listening port services and a Kafka queue producer. The
DSE Gateway sequence diagram is located in Appendix A: Distributed
Security Enablement Testbed.
Figure 4 - DSE Gateway
DSE property file:
The DSE property file defines which data fields need to be extracted from the probe's
protocol payload, the order of the tuples and the corresponding Kafka topic to which these
samples are posted. The current DSEproperty.prop contains the following:
#feature#kafkatopic#sampleproto#sip#dip#sport#dport#tcpproto#pktlen
DoS_dns,DSE_FLOOD_DNS_Q,sflow,sip,dip,dport,tcpproto
DoS_hor,DSE_FLOOD_SYNC_HOR_Q,sflow,sip,dip,dport
DoS_ver,DSE_FLOOD_SYNC_VER_Q,sflow,sip,dip,dport,tcpproto
DoS_spam,DSE_SPAM_SYNC_VER_Q,sflow,sip,dip,dport,tcpproto
DoS_icmp,DSE_ICMP,sflow,sip,dip,dport,tcpproto
DoS_amp,DSE_AMP,sflow,dip,dport,tcpproto,pktlen
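Such comma-separated lines can be parsed with a few lines of code; the sketch below (a hypothetical DsePropertyParser class, not part of the DSE release) splits each non-comment line into its feature name, Kafka topic, sample protocol and tuple fields.

```java
import java.util.*;

// Hypothetical parser for DSE property lines of the form:
//   feature,kafka_topic,sample_proto,tuple_field_1,...,tuple_field_n
public class DsePropertyParser {

    /** Parsed representation of a single property line. */
    public static class Entry {
        public final String feature;
        public final String topic;
        public final String proto;
        public final List<String> tuples;
        Entry(String feature, String topic, String proto, List<String> tuples) {
            this.feature = feature;
            this.topic = topic;
            this.proto = proto;
            this.tuples = tuples;
        }
    }

    /** Parse one line; returns null for comment or blank lines. */
    public static Entry parseLine(String line) {
        if (line.isEmpty() || line.startsWith("#")) return null;
        String[] parts = line.split(",");
        return new Entry(parts[0], parts[1], parts[2],
                Arrays.asList(Arrays.copyOfRange(parts, 3, parts.length)));
    }
}
```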
Sample Probe port:
The DSE Gateway module includes a sample probe component that listens on a port for
tenant data plane samples. For instance, probe messages encapsulated in sFlow
would arrive on port 6343, while NetFlow probe packets would arrive on port 2055.
DoS Dataset per topic producer:
As defined in the DSEproperty.prop property file, the Kafka producer sends samples to
queues based on topic and tuple parameters. These samples are forwarded for analysis
to the second stage of the information flow, namely the DSE LCSE Prediction component.
2. DSE LCSE Prediction
The individual software components that make up the DSE LCSE Prediction module
are the Timer, Condition Samples, Empty buffer, DSE Statistic Consumer, Score
Prediction and Score Producer (Figure 5). The module consumes data statistics from the DSE
Gateway and in turn produces a security score prediction. The Prediction Engine
sequence diagram is located in Appendix C: DSE Prediction Sequence Diagram.
Figure 5 - DSE LCSE Prediction
Timer: The ring buffer's timer thread allows the incoming statistics samples to have a time
component. The timer thread arranges which slot in the ring buffer is active for
writing/reading and selects which slot is to be emptied.
Condition Samples: This allows the system to rule out low frequency sample counts from
the threat score prediction calculations.
Empty buffer: This method clears all the statistics in the last slot of the ring buffer,
producing a clean slot for recording in the next recording interval.
DSE Statistic Consumer: The consumer software component listens for incoming
statistics from a Kafka queue. It accepts traffic on Kafka topics that are defined at
initiation time.
DSE Score Prediction: The Score Prediction software component uses the sample
statistics recorded in the ring buffer slot to make a threat prediction. It currently has two
modes implemented: (1) a simple rate-change prediction and (2) a prediction based on a
machine learning algorithm, namely Random Forest.
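The rate-change mode can be sketched as follows, assuming the threat score is the bounded relative growth of a per-key sample count between two consecutive ring-buffer slots (the class name and the scaling are illustrative, not the DSE implementation itself):

```java
// Hypothetical rate-of-change scorer: compares a per-key sample count in
// the current slot against the previous slot and maps the relative growth
// to a bounded threat score in [0, 1].
public class RateChangeScorer {

    /** Relative growth of the count between two consecutive intervals. */
    public static double rate(int previous, int current) {
        if (previous == 0) return current > 0 ? 1.0 : 0.0;
        return (double) (current - previous) / previous;
    }

    /** Clamp the growth rate into a [0, 1] threat score. */
    public static double score(int previous, int current) {
        return Math.max(0.0, Math.min(1.0, rate(previous, current)));
    }
}
```

A steady count thus scores 0, while a count that appears from nothing in one interval scores the maximum.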
DSE Score Producer: The Producer software component publishes threat scores on the
corresponding Kafka Score queue topic.
Thread Safe: Writing and reading to the ring buffer is implemented with thread safety in
mind. Both read and write methods are synchronized.
public synchronized Integer read_circularBuff(int slot_number, String key) {
    return statList.get(slot_number).get(key);
}

public synchronized void write_circularBuff(String key) {
    // increment the counter for this key in the active slot (slot 0)
    if (statList.get(0).containsKey(key)) {
        Integer read_value = (Integer) statList.get(0).get(key);
        read_value++;
        statList.get(0).put(key, read_value);
    } else {
        statList.get(0).put(key, 1);
    }
}
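The interplay of the Timer, the Empty buffer and the synchronized accessors can be sketched with a simplified, hypothetical SlotRing class (not the DSE implementation): the timer advances the active slot index and clears the slot being reused, so every interval starts from a clean slot.

```java
import java.util.*;

// Simplified, hypothetical sketch of the ring-buffer slot rotation
// described above: write/read act on the active slot, and the timer's
// advance() moves to the next slot and empties it for the new interval.
public class SlotRing {
    private final List<Map<String, Integer>> slots = new ArrayList<>();
    private int active = 0;

    public SlotRing(int n) {
        for (int i = 0; i < n; i++) slots.add(new HashMap<>());
    }

    /** Count one sample key in the currently active slot. */
    public synchronized void write(String key) {
        slots.get(active).merge(key, 1, Integer::sum);
    }

    /** Read the counter for a key in the currently active slot. */
    public synchronized int read(String key) {
        return slots.get(active).getOrDefault(key, 0);
    }

    /** Called by the timer: move to the next slot and empty it. */
    public synchronized void advance() {
        active = (active + 1) % slots.size();
        slots.get(active).clear();
    }
}
```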
3. DSE Firewall Module
Individual software components include Black list provider, DSE Firewall property, DSE
consumer, RADIUS authentication, OpenDaylight API and pfSense API. The Firewall
module sequence diagram is portrayed in Appendix A.
Figure 6 - DSE Firewall Engine
Black list provider: The Blacklist provider component provides external third-party threat
validation to the system and can also be used to add weight to the overall predicted
threat score. The service provider also hosts internal blacklists as part of the multitenant
security policy.
DSE Firewall property: Provides information on the location of services specific to the
DSE firewall component.
RADIUS authentication: The RADIUS authentication component controls end-device and
user access to the tenant's network (written in Ansible).
OpenDaylight API: The OpenDaylight API component here issues OpenFlow firewall
instructions to the DSE service chain.
pfSense API: The pfSense API component here issues L3 firewall instructions to the DSE
service chain.
2.1.3. Actual items implemented in CogNet
To date, the implemented DSE software components include a Jenkins build environment
with DSE jobs defined, and two data plane probes based on OpenVSwitch. Also implemented are
three software modules:
- the DSE Gateway component, which parses the probe protocol;
- the DSE Prediction component, which contains the ring buffer and the first
iteration of the Random Forest machine learning method; and
- the DSE Firewall Engine component, which contains methods that listen for
threat scores and issue firewall commands to the DSE service chain.
The aforementioned software components are all executable and hosted in Docker containers
on the CogNet common infrastructure. Also implemented is an instance of the OpNFV (B release)
framework with public IP address space.
2.1.4. ML solution implementation
The DSE prediction module is currently based on a Random Forest machine learning method.
The machine learning model is currently being parameterized and integrated so that the
behaviour is tuned to match DSE requirements.
2.1.5. Expected experimentation results
Because of the class and predictability of IoT traffic patterns and the nature of attacks under
investigation, it is envisaged that the accuracy of predicting a DoS based attack can be between
80 and 90 percent. The outcome will be to deliver an efficient use of a Service Providers
bandwidth and network resources. The traffic pattern under investigation is sourced from IoT
networks, this makes the type of traffic less varied and more predictable, for example the traffic
pattern would include a possible DNS lookup request and a phone home post at a regular time
interval.
2.1.6. Roadmap of the testbed
In the second stage of DSE development we plan to finalize the ML methods and then deploy
the DSE onto an NFV-based service chain hosted on OpNFV-based infrastructure, where we
will commission, as part of the testbed, a Kali threat server that includes an MHN
honeypot monitoring service. We also plan to instantiate the 802.1X protocol via RADIUS and
NAS services. Finally, we plan to integrate and evaluate the machine learning methods with the
CogNet common infrastructure.
2.1.7. User Manual
The current user manual comprises 1) the automated install of the testbed from the CI
server, 2) the automated install and usage of the DSE CI jobs and 3) how the DSE modules
can be launched from the command prompt, with a list of their corresponding arguments.
At the centre of the DSE is the continuous build server [2], which defines the compiling, testing
and roll-out of the DSE software components. Also defined [3] is the DSE Docker job that
provides the resources to build and deploy the associated DSE Docker base image. All
subsequent DSE build jobs depend on a successful outcome of the Docker build instance
job [3].
To install the testbed infrastructure on Jenkins:
Figure 7 - CogNet common infrastructure deploy dash board
To install the DSE software components onto the infrastructure via Jenkins CI
Figure 8 - CogNet DSE build and deploy dash board
The DSE Gateway has the following external variables, which include Kafka server attributes,
the sample protocol type and the property file location. To execute the DSE Gateway from the
command line:
$ java -jar target\featureExtDSE-1.0-SNAPSHOT.jar -h
-sp : server listening port
-sh : server listening interface
-st : sample type (sflow | netflow | ipfix)
-kp : Kafka port
-kh : Kafka host
-p : property file URL
$ java -jar target\featureExtDSE-1.0-SNAPSHOT.jar -sp 6343 -sh 0.0.0.0 -st sflow -kp 9092 -kh
162.13.119.237 -p DSEproperty.prop
The DSE LCSE Prediction module has the following external variables, which include Kafka
server attributes, the build number, the training and online consumer topics, and the MongoDB
location. To execute the DSE LCSE Prediction module from the command line:
$ java -jar target\ngramDSE-1.0-SNAPSHOT.jar -h
-b : build number
-kh : Kafka broker server IP
-kp : Kafka broker server port
-nc : number of consumers
-tt : training topic
-to : online topic
-m : MongoDB IP
-mp : MongoDB port
To create the Kafka training topic:
bin/kafka-topics.sh --create --topic trainingDSE --replication-factor 1 --partitions 3 --zookeeper 127.0.0.1:2181
To create the Kafka online topic:
bin/kafka-topics.sh --create --topic onlineDSE --replication-factor 1 --partitions 3 --zookeeper 127.0.0.1:2181
The Firewall Engine has the following external variables, which include Kafka server attributes
and the locations of elements in the service chain (OpenDaylight IP, L3 firewall IP, RADIUS IP).
To execute the Firewall Engine from the command line:
$ java -jar target\ofFirewallDSE-1.0-SNAPSHOT.jar -h
-o : OpenDaylight IP
-l : L3 firewall IP
-n : RADIUS IP
-t : topic
-g : group ID
-p : Kafka port
-k : Kafka server IP
To create the Kafka score topic:
bin/kafka-topics.sh --create --topic ScoreDSE --replication-factor 1 --partitions 3 --zookeeper 127.0.0.1:2181
$ java -jar target\ofFirewallDSE-1.0-SNAPSHOT.jar -o 162.13.119.228 -l 162.13.119.222 -t DSE_firewall -p
4643 -k 162.13.119.237
2.2. Honeynet Testbed
2.2.1. Scope of the testbed
NFV and SDN technologies, as an essential part of 5G networks, have the ambition to offer
security capabilities. One of the key goals of the new architectures is to eliminate unwanted
(or illegal) traffic from the data plane, or reduce it to acceptable levels, by means of the SDN
dynamic forwarding graph and the dynamic deployment or scale-out/in of security VNFs, such
as firewall VNF deployments or traffic redirection. Applying ML technologies to identify and
solve this problem is detailed in D5.1 as part of the Distributed Security Enablement use case.
In order to apply these solutions, a clear traffic pattern must be identified. In some cases these
patterns are clear, e.g. a DDoS attack based on volumetric traffic, but in others they are extremely
difficult (if not impossible) to identify, mainly because of the accepted tendency on the Internet
towards pervasive or opportunistic encryption. This situation limits the capability to interact with
the network to resolve security incidents or attacks. This problem has been identified in the
research literature and by standardisation bodies, and it is thoroughly analysed in a recent
document of the IETF [4].
The scope of this testbed is to identify and classify some security attack patterns in the data
plane, especially those related to encrypted traffic. The expectation is to be able to classify
different types of attack traffic after training ML algorithms that inspect data
packets from Layer 2 to Layer 4, avoiding payload analysis (encrypted or not) and thus improving
privacy.
In order to achieve this objective, this testbed is set up on Telefónica's CogNet Mouseworld,
described in CogNet deliverable D4.1. This lab environment has the capability of replicating
different types of realistic network traffic in a fully controlled environment, where ML algorithms
can be trained and tested.
2.2.2. Architecture of the testbed
The architecture of the testbed is based on the general architecture of the Mouseworld,
particularized for this scenario. Figure 9 shows how the traffic replication is based on hacking
tools, such as the Kali Linux distribution [5].
Figure 9- Mouseworld scenario to replicate security attack traffic patterns
The testbed is composed of several clients generating attack traffic and some servers running
vulnerable services.
The initial flows identified at this stage include the generation of different types of traffic from
clients to servers:
Brute-force attacks and session establishment to SSH servers. Brute-force tools like Hydra
[6] allow us to replicate multiple access attempts to a server. Successful accesses,
malware downloads and command executions can also be replicated. The generated traffic
uses different cipher suites to evaluate the ML algorithms' independence from the payload.
The servers are based on the Kippo honeypot. This honeypot not only accepts
SSH session establishment but also provides a shell emulation environment, generating an audit
log of all the commands issued, such as URL file downloads.
Web application vulnerability attacks. These include well-known attacks such as SQL injection
or remote file inclusion; the transport protocol in this case is HTTP or HTTPS. The server is
based on the Glastopf honeypot. Glastopf is a Python web application honeypot offering
different types of web server vulnerabilities. It has the capability to deal with known
and unknown attacks of several types, dynamically generating answers adapted to the
type of vulnerability; these contain so-called "dorks", strings in the answer that the
vulnerabilities trigger.
One key aspect in testing and validating the algorithms and identifying their performance is to
include the following conditions in the training process:
Normal traffic, unrelated to security attacks. In general terms, the Mouseworld can
replicate multiple types of traffic, like web browsing, video streaming or network
troubleshooting. Some of them are replicated in a closed environment, while others are
generated by real access to the Internet, such as Internet-wide DNS queries, access-speed
tests and realistic browser requests, to produce fully realistic traffic. All this traffic is
mixed with the attacks in the traffic captured by the Tstat probe.
Real attacks. During the last phase of the testbed, the ML algorithms will be tested against
real Internet traffic, obtained "in the wild" from a real Telefonica Honeynet, to validate
the detection capacity.
All the traffic captured is stored locally in the testbed servers. No private or personal
data (e.g. identities) are collected in these traffic dumps. The traffic is replicated, not real: for
example, all the IP address space used belongs to a private, non-routable network (RFC 1918),
and the traffic is encrypted with temporary keys.
2.2.3. Actual items implemented in CogNet
The current status of the testbed includes the Mouseworld lab with the capacity to capture
and process the traffic data.
In addition, a real Honeynet deployed in the Telefonica network is currently available for data
testing; it uses the same software that is being used in the Mouseworld for traffic
replication. This type of network has a key advantage from the point of view of privacy:
Honeynets collect illegal traffic and unrequested accesses, which limits to a great extent the
applicable requirements on privacy preservation and personal data management.
2.2.4. ML solution implementation
The ML solution is based on the application of supervised classification algorithms. First, we will
apply off-the-shelf techniques in order to obtain accuracy figures when state-of-the-art
algorithms are applied. After that, some architectures of deep (convolutional) neural networks will
be designed, trained and tested, in order to show that these complex models are able to capture
the essence of the data better than off-the-shelf techniques.
2.2.5. Expected experimentation results
The expected results include a success factor on identification of types of attack patterns. Above
80% of accuracy in different types of attacks will be considered as a successful KPI for an ISP. This
accuracy allows in real environments to reduce the traffic inspection capacity required to identify
suspicious traffic in nowadays networks, especially for traffics consuming a high amount of
bandwidth. This reduction in the required processing capacity will allow ISPs to deal with the
forecast 5G traffic patterns without reducing security standards for them, or even enhancing
these standards for specifically sensitive application environments or slices, to use a term that has
become common in 5G literature.
No initial results have been achieved during this phase because the activity has focused on
setting up the replication scenario and the initial training stage.
2.2.6. Roadmap of the testbed
The planning status and roadmap of the activities in this testbed are:
1/2016-12/2016. Setup of testbed inside Mouseworld for new type of
traffic: Threats detection
01/2017-03/2017. First ML algorithm for Network Threats detection
03/2017-04/2017. Integration of the testbed into the common
infrastructure of CogNet.
03/2017-06/2017. ER2. Testing and Final algorithms version, including
the description of the testing and the user manual
06/2017-12/2017. Policies enforcement into Open MANO
orchestration for Network threats
2.3. NFV Security Anomaly Detection Testbed
2.3.1. Scope of the testbed
OpenStack provides a rich environment of cloud-based services such as scalable processing,
storage and networking. The security anomaly detection testbed uses this cloud platform to
provide the required network infrastructure and implement the necessary functions in order to
detect the anomalies and to enforce appropriate actions when required.
OpenStack networking services are partially involved during the life cycle of a service, for
example in the binding of a network function to a virtual network. However, the basic
configuration and management is static and not suited to much dynamicity beyond scaling
during the life cycle of the service. For this reason we introduce OpenSDNCore, which
implements a rich set of functions to enforce requirements from upper layers, such as:
traffic flow classification
traffic flow relocation
dynamic firewall rule enforcement
virtual network management
The deployment of OpenStack is done with Devstack, which provides an automated method of
deploying OpenStack and is suitable for development and operational testing. It is not a general
installer, but it is easy to adapt to integrate individual git repositories.
The Fraunhofer OpenSDNCore tool is used only as an example of an SDN platform; any other
SDN platform could also be integrated. The main goal of CogNet is to prove and develop
machine learning techniques that can be used in carrier-grade managed networks; the
developments therefore relate to the acquisition of data, its processing through ML
mechanisms and the actuation in the active system, not to the development of the active
system itself.
2.3.2. Architecture of the testbed
Figure 10 shows OpenStack and its components. Neutron is the component that provides
network abstraction to all other components of OpenStack. Additional agents implement the
interface to the virtual network resources. In our case OpenSDNCore is used to provide a PoC
network service framework. To allow an interaction between Neutron and OpenSDNCore an
additional ofs-agent is provided.
Figure 10 – SDN integration in OpenStack
2.3.3. Actual items implemented in CogNet
The implementation concentrates on the integration of OpenSDNCore and OpenStack in order to
provide a basis for the development and enforcement of firewall rules coming from the machine
learning firewall module.
1. Implementation of ofs-agent:
- the agent is used to provide basic connectivity for virtual machines
- plug/unplug ports to the openflow switch
2. Implementation of the deployment environment based on devstack
- Devstack integration of OpenSDNCore
- OpenStack in combination with OpenSDNCore can be deployed with a simple install
script
A basic Layer 3 firewall was developed using the OpenSDNCore configurations and its
northbound interface, thereby providing the mechanisms needed for data flow monitoring,
filtering and processing.
2.3.4. Expected experimentation results
With such an implementation, combined with the Visualization GUI and with the monitoring
system from the dense urban area testbed, we expect to obtain the following main results:
- Detection of known and unknown threats at the network level by
analyzing the data traffic within the OpenStack system at the
infrastructure level. This functionality is similar to that of the
previous testbeds and may use the same (or swapped) machine
learning algorithms.
- Mitigation actions at the network level in the form of quarantining
the different users, as presented in the previous testbeds.
- Providing means for the infrastructure provider to mitigate multiple
parallel services with different customized firewall-like functionality.
This is possible only when the infrastructure is controlled by the
infrastructure provider, separate from the software networks, which is
not the case in the previously described testbeds.
2.3.5. Roadmap of the testbed
In the next release, the testbed will be integrated with the anomaly detection presented in
Section 3.3 and used to determine different unknown threats. A set of mitigation actions will be
defined, such as re-routing towards a quarantine network in case of malicious usage of the
network.
2.3.6. User Manual
OpenStack installation with OpenSDNCore
git clone https://github.com/CogNet-5GPPP/devstack.git
cd ./devstack
Configuration for devstack installation:
[[local|localrc]]
HORIZON_BRANCH=stable/newton
KEYSTONE_BRANCH=stable/newton
NOVA_BRANCH=stable/newton
NEUTRON_BRANCH=master
GLANCE_BRANCH=stable/newton
HOST_IP=192.168.178.200
IP_VERSION=4
ADMIN_PASSWORD=cognet
DATABASE_PASSWORD=stackdb
RABBIT_PASSWORD=stackqueue
SERVICE_PASSWORD=$ADMIN_PASSWORD
ENABLED_SERVICES=rabbit,mysql,key
ENABLED_SERVICES+=,n-api,n-crt,n-obj,n-cpu,n-cond,n-sch,n-novnc,n-cauth
ENABLED_SERVICES+=,g-api,g-reg
ENABLED_SERVICES+=,horizon
enable_plugin nova https://github.com/CogNet-5GPPP/nova master
enable_plugin neutron https://github.com/CogNet-5GPPP/neutron stable/newton
DISABLED_SERVICES=n-net
ENABLED_SERVICES+=,q-svc,q-agt,q-dhcp,q-l3,q-meta,q-metering,neutron
#Q_USE_SECGROUP=True
FLOATING_RANGE="172.18.161.0/24"
FIXED_RANGE="10.0.0.0/24"
Q_FLOATING_ALLOCATION_POOL=start=172.18.161.250,end=172.18.161.254
PUBLIC_NETWORK_GATEWAY="172.18.161.1"
#PUBLIC_INTERFACE=enx5855ca260b13
# OpenSDNCore provider networking configuration
Q_PLUGIN=ml2
Q_ML2_TENANT_NETWORK_TYPE=vxlan
Q_ML2_PLUGIN_MECHANISM_DRIVERS=ofs
Q_AGENT=ofs
Q_USE_PROVIDERNET_FOR_PUBLIC=False
OFS_PHYSICAL_BRIDGE=ofsbr-main
PUBLIC_BRIDGE=ofsbr-main
OFS_BRIDGE_MAPPINGS=extnet1:ofsbr-main
OFS_ENABLE_TUNNELING=False
# avoid vnc problems
NOVNC_BRANCH=v0.6.0
Based on this configuration the install process of OpenStack can be triggered:
$ ./stack.sh
Installation of OpenSDNCore:
The installation of OpenSDNCore needs to be done before devstack installation.
Management of Switches in OpenSDNCore:
The following commands allow the management of switching instances of OpenSDNCore:
$ sudo ofts.sh --help
--- OFS ---
add-br BRIDGE [DPID]
del-br BRIDGE
add-port BRIDGE PORT
delete-port BRIDGE PORT
get-port-id BRIDGE PORT
listbr
list port
is_connected BRIDGE
is_present BRIDGE
OpenSDNCore OpenFlow Controller API:
Example flow for DHCP-Request forwarding:
$ curl -X POST -H "Content-Type: application/json" -d '{
"id": 1,
"jsonrpc": "2.0",
"method": "ofc.send.flow_mod",
"params": {
"dpid": "0x0000000000000001",
"ofp_flow_mod": {
"command": "add",
"flags": [
"reset_counts",
"send_flow_rem"
],
"idle_timeout": 0,
"ofp_instructions": {
"write_actions": [
{
"output": {
"port_no": "0xfffffffb"
}
}
]
},
"ofp_match": [
{
"match_class": "openflow_basic",
"field": "udp_dst",
"value": "67"
}
],
"priority": 40,
"table_id": "0x00"
}
}
}' http://127.0.0.1:10010
2.4. Network traffic classification
As emphasized in Deliverable 5.1, the main goal of distributed security enablement is to classify
the traffic traversing the data plane into malicious and non-malicious. Such a traffic classification
task has a binary nature. In this Section, we extend this task to a more general setting, as a
multitude of scenarios may arise, e.g., scenarios requiring a differentiation between types of
malicious traffic, which can apply to any of the security testbeds described in the previous three
subsections.
We proposed a model for network traffic classification within the NetCla: ECML-PKDD
Network Classification Challenge (http://www.neteye-blog.com/netcla-the-ecml-pkdd-network-
classification-challenge/). The objective of the challenge was to predict, for a transmission, the
type of application that generated it. There were 20 target application types, thus a
multi-class classification problem. For each data point, the measurements of a number of
performance indicators and network parameters were provided. The data points were given in
chronological order, sequentially, as the corresponding transactions were registered in the
network.
In the following, we describe the classification architecture and give guidelines on how to run
the corresponding software. The tuned version of the proposed model obtained 1st place in
the official NetCla ranking.
It should be noted that our study on application classification allows us to acquire knowledge on
how to characterize applications by the network activities they generate. This enables the design
of more powerful models of anomalous applications, i.e., those behaving differently from the
others.
2.4.1. Architecture
At a high level, the model can be represented as in Figure 11. The raw data pass through a
three-stage feature generation and preparation process.
Figure 11 - Model architecture for network traffic classification
1. Feature Discretization
The attributes given in the data have values from inhomogeneous number ranges. Thus, we
suggested applying a feature discretization method to attributes with continuous values,
namely Multi-Interval Supervised Attribute Discretization (Fayyad & Irani, 1993), which is
implemented in the weka1 toolkit.
2. Feature Generation using Random Forests
On the other hand, the raw attributes are too few to deliver sufficient discriminative power.
To tackle this issue, we propose to generate additional feature combinations using Random
Forests (RF), which are very efficient in detecting non-linear patterns in the data. We trained
an RF classifier, again using weka, and then extracted all the paths from all the trees of the
resulting RF and used them as features.
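As a toy sketch of this idea (hypothetical code, independent of the weka implementation actually used), each root-to-leaf path can be treated as a binary feature that fires only when an example satisfies every attribute test along the path:

```java
import java.util.*;

// Toy sketch of Step 2: each root-to-leaf path of a decision tree becomes
// one binary feature that fires when an example follows that path.
public class TreePathFeature {

    /** One node test on a path: attribute index, threshold, branch taken. */
    public static class Split {
        final int attr;
        final double threshold;
        final boolean goRight;
        public Split(int attr, double threshold, boolean goRight) {
            this.attr = attr;
            this.threshold = threshold;
            this.goRight = goRight;
        }
    }

    /** 1 if the example satisfies every split on the path, else 0. */
    public static int fire(double[] example, List<Split> path) {
        for (Split s : path) {
            boolean right = example[s.attr] > s.threshold;
            if (right != s.goRight) return 0;
        }
        return 1;
    }
}
```

Concatenating one such indicator per path across all trees yields the extended, non-linear feature set.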
3. Adding Label Dependencies as Features
With the third step, we aim at making use of the structural information underlying the data, i.e.,
the temporal information of the data points. For each data point (example) (xi, yi), we add the
labels of the preceding N examples, yi−N, yi−N+1, …, yi−1, as features. At the training stage, these
correspond to the gold labels, while at test time we use the predictions of the classifier itself. This
way, we encode the contextual information of each instance, namely what type of traffic
preceded the current transaction.
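As a minimal illustration of this step (a hypothetical helper, not the modified LIBLINEAR code itself), the labels of the preceding N examples can be appended to a feature vector like this:

```java
import java.util.*;

// Hypothetical sketch of Step 3: extend each example's feature list with
// the labels of the N preceding examples. At training time the gold labels
// are used; at test time the classifier's own predictions.
public class LabelDependencyFeatures {

    /** Append the last n labels (most recent last) to the base features. */
    public static List<String> extend(List<String> baseFeatures,
                                      Deque<String> previousLabels, int n) {
        List<String> extended = new ArrayList<>(baseFeatures);
        previousLabels.stream()
                .skip(Math.max(0, previousLabels.size() - n))
                .forEach(l -> extended.add("prev_label=" + l));
        return extended;
    }
}
```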
Once we select and extract the features, we perform training with a linear SVM classifier. Since
the data is large, especially with the extended feature set, we adopted a multi-class linear
classifier from the LIBLINEAR library (http://www.csie.ntu.edu.tw/~cjlin/liblinear/), which is
specifically designed for large-scale computation.
Our code release provides the implementations of the training and testing phases of our network
traffic classification model, which include Step 3 of our feature generation process. The features
produced by Steps 1 and 2 are already pre-computed and given in the data file.
2.4.2. Download and Installation
We implemented Step 3 by making the necessary modifications to the original LIBLINEAR
implementation. More specifically, we modified LIBLINEAR so that, when classifying a test
instance xi, it uses its own predictions for xi−N, …, xi−1 as additional features for this data point,
as described in Step 3 of Section 2.4.1.
1. Download
Use the following command to clone LIBLINEAR-LD from the repository.
git clone https://github.com/CogNet-5GPPP/WP5-CSE.git
2. Compile the code of the LIBLINEAR-LD
$ cd liblinear-ld/
1 http://www.cs.waikato.ac.nz/ml/weka/
$ make clean
$ make
Alternatively, follow the original LIBLINEAR README file (in liblinear-ld/).
2.4.3. Deployment
The current distribution contains a sample dataset, comprising training and validation parts.
The data should comply with the SVMlight2 format, with each line denoting a feature vector
starting with a label (class ID: 0, 1, 8, etc.) followed by a list of sparse features, e.g.,
8 391:1 937:1 1296:1 1797:1 5100:1 7826:1 ... 87551:1
To train a model, issue the following command3:
$ ./train –s 2 –c 1 –l 20 ./data/train.data ./data/ld-s1c1l20.model
Here, -s and -c are the standard LIBLINEAR parameters: the first chooses the type of
SVM solver, and the second sets the SVM's regularization parameter C. With the
parameter -l, one may vary the number N of preceding labels to consider for each data point
(0 by default). These are followed by the training set file and the output model path.
To test the obtained model, run:
$ ./predict -l 20 ./data/valid.data ./data/ld-s1c1l20.model ./data/output-ld-s1c1l20
On success, the predictions can be found in ./data/output-ld-s1c1l20, which has a simple
format: each line contains one class label.
2 http://svmlight.joachims.org/
3 Before running this and the following command, you need to unpack the train and
validation files, train.zip and valid.zip, respectively, located in the liblinear-ld/data folder.
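The predictions in the output file can be compared against the gold labels in the validation file (the first token of each line). A small Python sketch, using the file paths of the commands above as an assumed example:

```python
def accuracy(gold_path, pred_path):
    """Fraction of lines whose predicted label matches the gold label."""
    with open(gold_path) as g, open(pred_path) as p:
        gold = [line.split()[0] for line in g if line.strip()]
        pred = [line.strip() for line in p if line.strip()]
    matches = sum(1 for a, b in zip(gold, pred) if a == b)
    return matches / len(gold)

# e.g. accuracy("./data/valid.data", "./data/output-ld-s1c1l20")
```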
3. Dense urban area testbed
3.1. Testbed Description
3.1.1. Scope of the testbed
The role of the dense urban area testbed is to define a reference testbed infrastructure for the
reliability issues which may be solved by using machine learning algorithms, as described in
the following subsections.
As presented in D5.1, the dense urban area testbed is composed of a set of components which
emulate a comprehensive telecoms system, starting with the emulation of a large number of
devices which connect to a realistic packet core network deployed on top of an NFV
environment, as is expected to happen in the near future in a carrier-grade operator network. The
NFV environment is completed with an NFV orchestrator which enables different actuation
actions specific to the life-cycle management of software-only network functions. In addition to
this, as presented in D5.1, a set of actions will be executed directly in the active system through a
minimal OSS implementation, thereby emulating operations which do not pertain strictly to
NFV management.
The following directions are taken for the data acquisition, algorithm implementation and
assessment of the mitigation actions:
1. Anomaly detection – determining whether the system maintains the appropriate
behaviour and whether compensation mitigation actions can be executed to bring it back
to the normal behaviour.
2. Clustering of users and network resources – simplifying the network management while
at the same time enabling customization, by clustering the users and the network
resources, depending on their specific usage patterns, into a reduced set of clusters with
the same behaviour.
3. Fault detection and correlation – in case of failures, determining the root cause across the
multiple fault levels including hardware infrastructure, virtualization layer and service
layer.
4. Correlation of consumed resources – determining in a specific given system which
resources are needed by the components in correlation with the usage and the resources
consumed by other components. Through this a customized scaling process can be
determined for the specific deployment.
3.1.2. Architecture of the testbed
The architecture of the testbed is illustrated in Figure 12. In the following, the functional
components which were integrated are functionally defined, followed by the specific work
developed in the CogNet project.
Figure 12 – Dense Urban Area Testbed
3.1.2.1 The Active System
The active system is based on the Fraunhofer FOKUS Open5GCore toolkit. Although the
Open5GCore is not open source, the decision to use it was taken because it represents a
reference platform for software packet core implementations, with a first release in 2009
(under the name of OpenEPC) and continuous functional and performance development since,
thereby ensuring a minimal stability and similarity to carrier-grade operator networks.
Additionally, the development of the active system is not a goal of CogNet, thus any system
providing enough maturity and relevance could be used for the proof-of-concept, while
giving the opportunity to concentrate on the management-plane-related functionality, enhanced
with machine learning features.
In the following, the Open5GCore is briefly described, as its various implementation features
may substantially affect the monitored metrics as well as the results obtained through the
machine learning algorithms.
The Open5GCore runs on top of a platform developed in-house by Fraunhofer FOKUS. The platform was
implemented specifically for running telecommunication network components, with support
for a large number of protocols such as 3GPP S1-AP, Non-Access Stratum (NAS), GPRS Tunnelling
Protocol (GTP) or Diameter. The platform is written in C and follows the modular design
of the SIP Express Router: all the functionality (protocols, interfaces and components) is
implemented as independent modules. The Open5GCore supports an efficient mechanism for
inter-module communication. Additionally, modules can use each other to exchange functionality
in a plugin model.
Several characteristics of the platform significantly influence the monitored time series and
thus require some specific customization of the machine learning algorithms; they are briefly
mentioned in the following, although they do not pertain to the CogNet developments.
In addition, the Open5GCore platform base code includes multi-process and
parallelization management, a pool of workers for processing the tasks, a FIFO task queue, precise
timer and task scheduling, etc. Being a telecom system, almost all the session processing can be
parallelized, the system scaling in an almost linear manner with the number of requests. However,
when the capacity limit is reached, the system misbehaves in a drastic manner (i.e. system failure),
which from an anomaly detection perspective makes the detection of the anomalies more
important (e.g. when such a state is reached), as well as the timeliness of the detection (e.g. if
detected too late, the result is unusable as the system is not able to efficiently recover).
All the memory is pre-allocated so as not to depend on the system allocation, which produces a
large number of interrupts within the system. Additionally, a wrapper of the isolated and shared
memory is offered, enabling fast development of new modules. This minimizes the
effects of memory swapping in the operating system, one of the most time-consuming operations
and a side effect in the delay of the processing of the requests. Because of this, a set of
previously determined side-effects (i.e. processing anomalies) of the operating system on the
software networks was removed.
Open5GCore features its own multi-level logging system, enabling the fast spotting of different
errors depending on the logging level. As the logging relies on pushing the output
towards the standard output device, it should be completely de-activated during performance
measurements – the penalty of logging is around 400% in processing time due to the multiple
interruptions at operating system level. Due to the logging system, two types of measurements
were obtained: high-performance ones (without logging), where it is almost not possible to fill the
capacity of the system, the surrounding system failing first due to less performant programming,
and low-performance ones (with logging), where the capacity is filled by the logging interruptions,
which are not significant from the network capacity measurement perspective.
To run Open5GCore, a set of scripts has to be executed. The scripts include the
configuration of the running nodes as pre-requisites to the actual running of the components
(installation of libraries, compilation, installation of configurations, provisioning of databases,
installation and starting of the services). The configuration scripts also automate the network
configuration of the component, depending on its type. To simplify the installation
of a testbed, Open5GCore has a specific directory including the default configuration scripts for
all the components. To install a new component, the following steps have to be executed:
1. Installation of the operating system
2. Preparation of the system for the specific component needs; careful attention has to be
given to having the necessary network interfaces available in both physical and virtual setups
3. Installation and configuration of the Open5GCore component
4. Restart and usage of the system
These steps were fully automated using Fraunhofer OpenBaton (www.openbaton.org), an open
source (Apache 2 license) ETSI NFV MANO compliant component able to remotely install, deploy
and configure a virtual network ecosystem, such as the one proposed in the testbed, based on a
virtual network function descriptor configuration file. OpenBaton is able to orchestrate the
network deployment as well as to manage the different benchmarks during the runtime of the
testbed. Because of this, the system can be deployed on demand when needed, not
requiring a pre-reservation of resources, making it easy to configure for different workloads and
for accumulating the monitored data.
3.1.2.2 Open5GCore Benchmarking Tool Functionality
The benchmarking tool was designed to assess the performance of EPC core networks for
different numbers of subscribers and eNBs and with different configurations,
enabling the quantitative evaluation of different core network solutions. The benchmarking
tool generates the load of the system in the CogNet testbed, including different types of synthetic
load as well as replaying realistic loads. In the following, the benchmarking tool features are
briefly described, as they represent the major limitations of the testbed from the perspective of
the workloads which can be introduced, drastically affecting the monitored data and, through this,
the training and the evaluation of the machine learning algorithms.
The BT includes:
Northbound API – a functional component which is able to receive the
benchmarking configuration from the test administrator.
Benchmarking Tool Rules (BTR) Module – performs the testing
process: based on the test configuration, it registers the defined UEs in
the network and requests the specified test operations through the
benchmarking tool module;
Benchmarking Tool (BT) Module - handles the EPS related
functionalities like UE creation, registration, operations and acts as a
singular or a group of eNBs that interact with the network;
UE Pool – represents a runtime subscriber database in which the state
information is maintained for each subscriber. As the expected
number of subscribers is in the order of 1 million, the state per
subscriber should be limited to a maximum of 1kB.
eNBs – at this moment the eNBs run as separate processes. The
maximum number of eNBs per BT is 10, each representing a mobility
cluster. During the initial performance measurements, this limitation
was lifted, due to the unexpectedly good performance results of the
eNBs as well as of the handover network support.
For the data traffic emulation, two different network functions were developed and integrated
into the benchmarking tool: the traffic generator and the traffic analyser. They include the
following functionality.
1. Packet Injector: The packet injector generates data traffic according to different patterns. The
currently supported patterns are:
a. Ping: A short message transmitted with very low frequency, enabling testing whether
connectivity through the testbed is established.
b. Max. UDP: Filling up the network link with UDP messages, maximizing the data
traffic over the specific network link.
c. File transfers: Emulating bulk data traffic on top of UDP or TCP connections. The packet
injector generates the data packets internally, following the packet size and frequency of
the specific pattern. The data packets include the following additional fields:
i. A mask at the beginning of the data packet: to avoid parsing and
mismatched protocol dissection in network sniffing tools such as Wireshark.
ii. A session ID: to identify which data flow the data packet pertains to.
iii. A sequence number: to identify order-oriented properties of the data packets.
iv. A timestamp, for the later correlation of the measurements. Similar to Iperf, the data
packets do not include a real payload, as real data does not affect in any way the
processing within the packet core.
2. GTP Encap/Decap: The Open5GCore GTP encapsulation/decapsulation module was added as
a part of the data path, to be able to emulate the data traffic as it is transmitted to and from
the eNB.
3. IP Connectivity: A data traffic steering module over the IP network was added, enabling the
destination of the data packets to be changed as needed for the specific experiment.
4. Packet Statistics: The packet statistics module receives the data packets of the packet injector
and generates specific statistics based on them. The packet injector and the packet statistics
modules may be co-located on the same network function, thus sharing the clock, or
may be synchronized via NTP. The clock information correlated with the timestamp in the data
packets gives the packet statistics module the possibility to compute the following statistics:
a. Capacity: The number of data packets sent and received at the other end, representing the
network capacity of the SUT.
b. Delay: The comparison of the two timestamps gives the opportunity to measure the delay
while communicating through the SUT. Selective comparison is enough.
c. Packet Loss: The number of data packets sent and not received at the destination, based
on the missing sequence numbers.
d. Jitter: The variation in the arrival times of data packets within the same session.
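The four statistics above can be sketched as follows; this is an illustrative Python model of the computation, with field names and units chosen by us rather than taken from the module's actual interface:

```python
def session_stats(sent, received):
    """sent / received: lists of (sequence_number, timestamp_ms) per session."""
    recv_by_seq = dict(received)
    delays = []
    for seq, tx in sent:
        rx = recv_by_seq.get(seq)
        if rx is not None:
            delays.append(rx - tx)          # one-way delay per delivered packet
    lost = len(sent) - len(delays)
    # jitter: mean absolute variation of delay between consecutive packets
    jitter = (sum(abs(a - b) for a, b in zip(delays, delays[1:]))
              / max(len(delays) - 1, 1))
    return {
        "capacity_pkts": len(delays),                       # a. packets delivered
        "delay_avg_ms": sum(delays) / max(len(delays), 1),  # b. mean delay
        "loss_per_10000": 10000 * lost / max(len(sent), 1), # c. packet loss
        "jitter_ms": jitter,                                # d. jitter
    }
```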
3.1.2.3 Monitoring
Zabbix is used for monitoring the system. Zabbix monitors numerous network parameters, such as
incoming/outgoing traffic through the various interfaces, as well as the health of the
system, including IT infrastructure metrics like CPU, memory and disk usage. It has a flexible
notification mechanism and offers excellent reporting and data visualization features based on
historical and current data. Zabbix is distributed under the GPLv2 license.
Features of Zabbix include: data gathering, flexible threshold definitions and highly configurable
alert mechanisms, real-time graphing and extensive visualization options, historical data storage,
network discovery and the availability of Zabbix API.
Zabbix consists of several major software components. The basic Zabbix architecture is shown in
Figure 15.
Server: The central repository in which all the configuration, statistical and operational data
are stored; it is the entity that actively alerts administrators when problems arise in any
monitored system. It mainly consists of the Zabbix backend server, the web frontend and the
database storage. The Zabbix server runs as a daemon process and can be started by executing
the zabbix_server script.
Database Storage: All the configuration information as well as the data gathered by Zabbix are
stored in a database (MySQL, PostgreSQL, Oracle, SQLite, etc.) with which the backend server and
the web frontend interact.
Web Interface: The frontend can be accessed from anywhere and is easy to use, with many
configurable options. This interface is part of the Zabbix server and runs on the same machine as
the server. It is written in PHP.
Proxy: On behalf of the Zabbix server, the proxy can collect performance and availability data. It
is an optional entity, but can be handy when distribution of the load is required. All the
collected data is buffered locally and then transferred to the Zabbix server the proxy belongs to.
Figure 13 – Zabbix Active Check (Trapping)
Zabbix Agent: The Zabbix agents are deployed on monitored targets or devices to actively
monitor local resources (hard drives, memory usage, processor performance and statistics) and
applications, and then report the gathered data to the Zabbix server for further processing. The
agent can perform passive and active checks on the system. In a passive check, the agent
responds to a query request sent by the Zabbix server. In an active check, the agent first retrieves
a list of items from the Zabbix server for independent processing and then periodically sends new
values to the server. The check mechanism is configurable. The two mechanisms are shown in
Figure 13 and Figure 14.
Figure 15 - Basic Zabbix Server and its relations with other entities
Data Flow within Zabbix: The data flow is quite easy to understand. A host has to be created in
order to create an item, and the item gathers data. After the item is created, a trigger can be
created, and an action can be defined once the trigger exists. Once all these elements are
created, the overall flow is easy to see. This can be easily done with the help of templates.
Zabbix Configuration
1. Hosts and Host Groups: The Zabbix hosts are the devices to be monitored (workstations,
servers, switches etc.). To begin with the monitoring, the first thing to be done is
creating a host. Hosts are organized into host groups.
Figure 14 - Zabbix Passive Check (Polling)
2. Items: Items are the metrics to be monitored. Once the host is configured, monitoring items
start gathering actual data. Many items can be quickly added by applying a pre-defined
template to the host. For each item, one should specify what type of data is expected when
gathered from the host; for that, the item key is used. For example, the item key system.cpu.load
gathers information about processor load, while the key net.if.in gathers incoming traffic data.
3. Trigger: A trigger defines an acceptable threshold or range. Its logical expression
evaluates the data gathered by the item and determines the current state. The trigger is fired
when the value goes beyond the acceptable range, changing the status to PROBLEM. It has
two states, OK and PROBLEM.
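For illustration, a classic Zabbix trigger expression of this kind could look as follows (the host name and threshold are hypothetical):

```
{mme-host:system.cpu.load[percpu,avg5].last()}>2
```

When the last reported 5-minute CPU load average for mme-host exceeds 2, the trigger switches to PROBLEM; when the value drops back into range, it returns to OK.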
4. Events: There are several types of events generated in Zabbix. The events are time-stamped,
making it easy to identify when they occurred. The various sources of events are trigger events,
discovery events, active agent auto-discovery events and internal events.
5. Visualization: Zabbix provides an excellent way to visualize data in terms of graphs, allowing
one to grasp the data flow, correlate problems and discover unusual patterns. It provides
built-in graphs of single-item data, complex customized graphs, and quick access to comparisons
of several items. It is also possible to visualize the networks using network graphs.
6. Templates: A template is a set of entities that can be conveniently applied to multiple hosts. The
entities may be items, triggers, graphs, applications, screens, low-level discovery rules or web
scenarios. A template contains all the entities described for a host. It is an excellent way of
reducing the workload and reduces possible errors during configuration.
Zabbix Agent Installation:
To monitor machines, the Zabbix agents have to be installed on them. These agents collect
metrics data and push them to the database. The first step is to install the Zabbix agent of the
appropriate version and configure the Server/ServerActive IP to point to the running Zabbix
server.
For Ubuntu 14.04:
$ MONITORING_IP="IP_TO_YOUR_ZABBIX_SERVER"
$ wget http://repo.zabbix.com/zabbix/3.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_3.0-1+trusty_all.deb
$ sudo dpkg -i zabbix-release_3.0-1+trusty_all.deb
$ sudo apt-get install -y zabbix-agent
$ sudo sed -i -e "s/ServerActive=127.0.0.1/ServerActive=$MONITORING_IP:10051/g" -e "s/Server=127.0.0.1/Server=$MONITORING_IP/g" -e "s/Hostname=Zabbix server/#Hostname=/g" /etc/zabbix/zabbix_agentd.conf
$ sudo service zabbix-agent restart
$ rm zabbix-release_3.0-1+trusty_all.deb
Zabbix APIs
The Zabbix server offers REST APIs to retrieve monitoring information about the hosts, items and
their values. The Open5G-GUI makes use of these REST APIs to periodically retrieve various
monitoring information from the Zabbix server and to store it, in its intended format, in its local
MongoDB database. The backend server of Open5G-GUI uses the request module to send POST
requests to the Zabbix server.
The first step is to retrieve the token, which is required for the further retrieval of other
resources. For that, the following POST request can be made:
$ curl -i -X POST -H 'Content-Type:application/json' -d' {"jsonrpc":"2.0","method":"user.login",
"params":{"user":"Admin", "password":"zabbix"},"id":1}'
http://zabbix_server_ip/zabbix/api_jsonrpc.php
where zabbix_server_ip is the accessible IP address of the Zabbix server.
Output: TOKEN ID
To retrieve the list of hosts, the POST query is:
$ curl -i -X POST -H 'Content-Type:application/json' -d'
{"jsonrpc":"2.0","method":"host.get","params": {"output":"extend"},"auth": "TOKEN","id":1}'
http://zabbix_server_ip/zabbix/api_jsonrpc.php
where TOKEN is the token string obtained from the previous query.
Output: hosts and host IDs.
To retrieve the list of items of a host, one of the previous host IDs is needed. The POST query is:
$ curl -i -X POST -H 'Content-Type:application/json' -d' {"jsonrpc":
"2.0","method":"item.get","params":{"hostids": "10105", "output": "extend"},"auth":"TOKEN", "id":
1}' http://zabbix_server_ip/zabbix/api_jsonrpc.php
where 10105 is the host ID of one of the hosts.
Output: List of items of that host.
To retrieve the current value of an item, the following POST query is used:
$ curl -X POST -H 'Content-Type:application/json' -d ' {"jsonrpc":"2.0","method":"item.get","params":
{"hostids":"10084","output":"extend", "search":{"key_":"system.time"}},"auth":"TOKEN","id":1}'
http://zabbix_server_ip/zabbix/api_jsonrpc.php
where system.time is the metric (item) name and 10084 is the ID of the host.
Output: Returns the JSON object that contains the latest value of the metric system.time of the
host with host ID 10084.
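The same JSON-RPC bodies can also be built programmatically instead of hand-quoting JSON in the shell; the following Python sketch shows the payload structure (the helper name is ours, not part of the Zabbix API):

```python
import json

def zabbix_payload(method, params, auth=None, req_id=1):
    """Build the JSON-RPC 2.0 body expected by api_jsonrpc.php."""
    body = {"jsonrpc": "2.0", "method": method, "params": params, "id": req_id}
    if auth is not None:
        body["auth"] = auth  # token returned by user.login; absent for login
    return json.dumps(body)

login_body = zabbix_payload("user.login",
                            {"user": "Admin", "password": "zabbix"})
hosts_body = zabbix_payload("host.get", {"output": "extend"}, auth="TOKEN")
```

The resulting string is what gets POSTed to http://zabbix_server_ip/zabbix/api_jsonrpc.php with the Content-Type: application/json header.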
The above curl requests are direct and can be tested easily in the terminal. The usage of the request
module in the Visualization GUI is slightly different and is carried out by the backend server of the
Visualization GUI. To get the token, the following options object is created:
var options = {
    headers: {
        "Content-Type": "application/json"
    },
    json: {
        "jsonrpc": zabbixParams.jsonrpcVer,
        "method": "user.login",
        "params": {
            "user": zabbixParams.username,
            "password": zabbixParams.password
        },
        "id": 1
    }
}
where the zabbixParams object information is retrieved from the GUI configuration file. After
adding a few other parameters to the above object, such as options.url, options.method and
options.authorization, the POST request is sent using the request module, which delivers
the result via a callback:
request(options, function(error, response, body){
// Result
});
The above request returns the Token.
To retrieve the host list, the options object is modified as:
var options = {
    headers: {
        "Content-Type": "application/json"
    },
    json: {
        "jsonrpc": zabbixParams.jsonrpcVer,
        "method": "host.get",
        "params": {"output": "extend"},
        "id": 1
    }
}
Note that the method is set to host.get. Using the previously received token, the request is
sent with the same request-module function shown above. The callback is invoked once the
result is available; the result consists of the list of hosts and host IDs.
Similarly, to retrieve the list of items for a host, the method in the above options is changed to
item.get and hostids is added. The hostids is the ID assigned to the selected host. The
same request-module function (as shown above) is used to send the POST request and, once the
result is available, the callback function is triggered, returning the list of items.
The next step is to retrieve the last value of the selected item. The method in the above options
is changed to item.get and hostids is set to the ID of the selected host. Furthermore, a
filter parameter is added that contains the list of metrics to retrieve:
filter: {
    key_: selectedMetrics
}
where selectedMetrics is an array containing the list of metrics being monitored.
The request is sent using the request-module function (as shown above) and the callback
returns the list of metrics with their corresponding last values.
List of Monitored Metrics
Service Independent KPIs
# | Zabbix Metric Name | Unit | Description
1 | system.cpu.util[,user] | % | Percentage of CPU resources used in user space
2 | system.cpu.load[percpu,avg5] | % | Percentage of CPU load average over 5 minutes
3 | vm.memory.size[available] | Bytes | Available RAM in the system
4 | vm.memory.size[total] | Bytes | Total RAM of the system
5 | net.if.out[mgmt] | bps | Outgoing data rate measured at the mgmt interface
6 | net.if.in[mgmt] | bps | Incoming data rate measured at the mgmt interface
End to End Measurement KPIs for BT
# | Metric Name | Unit | Description
1 | bt_procedure_delay_(attach|detach|handover)_(min|max|avg) | ms | Minimum, maximum and mean average time for the overall procedure to be completed (attachment, detachment, handover, service deactivation, service activation, paging, TAU etc.)
2 | bt_traffic_delay_packet_(min|max|avg) | ms | Minimum, maximum and mean average time for the overall UE data traffic between endpoints
3 | bt_ue_packet_delay_(min|max|avg) | ms | Minimum, maximum and mean average delay for UE data traffic between endpoints
4 | bt_packet_jitter | 10 μs | Mean packet delay variation of subsequent packets on the data path
5 | bt_packet_drop_rate_(min|max|avg) | Nr. per 10,000 packets | Minimum, maximum and mean average number of observed packets that failed to reach the destination
6 | bt_active_session_(min|max|avg) | Nr. | Minimum, maximum and mean average of emulated active sessions
7 | bt_attached_devices_(min|max|avg) | Nr. | Minimum, maximum and mean average of registered UEs in the network
8 | bt_idle_devices_(min|max|avg) | Nr. | Minimum, maximum and mean average number of idle UEs in the network
9 | bt_data_load_(min|max|avg) | Mbps | Minimum, maximum and mean average network load
10 | bt_max_procedure_complete_per_sec | Operations/sec | Maximum number of procedures completed per second successfully handled by the system
11 | bt_max_procedure_failed_per_sec | Operations/sec | Number of observed procedures that failed in the network
12 | bt_process_delay_(attach|detach|handover)_(min|max|avg) | ms | Minimum, maximum and mean average time for the local processing of a single VNF during the procedure
13 | bt_max_session_drop_per_sec | Sessions/sec | Maximum number of UE sessions per second dropped during the procedure
14 | bt_max_procedures_requested_per_sec | Nr. | Total number of procedures requested
15 | bt_internal_failed_procedures_per_sec | Nr. | Perceived unsuccessful procedures per 1000 subscribers
3.1.3. Actual items implemented in CogNet
For CogNet, the system was configured to be able to replay different traces, in order to generate
the appropriate amount of data required for the training and verification of the machine learning
algorithms, as well as for retrieving the specific metrics. For this, new configurations were devised
for the benchmarking tool, to emulate a predictable load on the system, as well as for the
visualization GUI.
3.1.3.1 Configuration file for benchmarking tool
The Benchmarking Tool is designed to test and assess the core network of the system. It provides
a simple functionality with configured tests to be run on the Open5GCore system. The test
configurations range from the number of UEs emulated to the number of processes and the
memory allocated for the tool. The Benchmarking Tool module creates an eNodeB, allowing the
virtual UEs to perform actions on the network. It provides an interface for register, attach and
detach procedures, as well as a callback handler for operation statistics feedback. It handles the
network operations according to its specific procedures. The benchmarking tool has the
following XML structure:
<Module binaryFile="modules/console/console.so" >
<![CDATA[
<WharfConsole>
<Prompt text=" BT >"/>
<Acceptor type="udp" port="10000" bind="192.168.254.100" />
<Acceptor type="tcp" port="10000" bind="192.168.254.100" />
</WharfConsole>]]>
</Module>
<Module binaryFile="modules/addressing/addressing.so" >
<![CDATA[
<WharfAddressingWRR type="WRR" timeout="15">
<Address ip="192.168.4.80" weight="1" />
</WharfAddressingWRR> ]]>
</Module>
<Module binaryFile="modules/gtp/gtp.so">
<![CDATA[
<GTP>
<Acceptor id="GTP-U" type="udp" port="2123" bind="192.168.4.100" />
</GTP>
]]>
</Module>
<!-- Routing -->
<Module binaryFile="modules/routing_gtpu/routing_gtpu.so" />
<Module binaryFile="modules/routing/routing.so" >
<![CDATA[
<WharfROUTING>
<Extension id="0" dst_table="teid" mod_name="routing_gtpu" ipv4="192.168.4.100" />
<!--<Extension id="1" src_table="teid" mod_name="routing_pdcp" ipv4="192.168.6.90" />-->
</WharfROUTING>]]>
</Module>
<Module binaryFile="modules/sctp/sctp.so"/>
<Module binaryFile="modules/S1AP/s1ap.so">
<![CDATA[
<WharfS1AP>
<Local addr="192.168.4.100" port="36412" />
</WharfS1AP>
]]>
</Module>
<Module binaryFile="modules/nas/nas.so" />
<Module binaryFile="modules/benchmarking_tool/benchmarking_tool.so" >
<![CDATA[
<WharfBT
hash_table_size="32"
s1="192.168.4.100"
tac="1"
cell_id="1"
mcc="1"
mnc="1"
default_apn="default"
>
</WharfBT>
]]>
</Module>
Using this configuration, a set of data was acquired, monitored from both the subscriber and the
system side according to the previously selected metrics; it constitutes the basis for the
machine learning algorithms described further on.
The data is available in the CogNet repository at:
git clone https://github.com/CogNet-5GPPP/WP5-CSE.git
cd ./WP5-CSE/ADE
3.2. OpenSourceMANO OpenVIM and OpenBaton Integration
One alternative to the common OpenStack solutions is to integrate the different components
within OpenSourceMANO, specifically the OpenVIM of Telefonica and the OpenBaton of
Fraunhofer, and through this to provide a comprehensive NFV management system. A proof of
concept for this alternative to the common NFV system was implemented and is presented in the
following sections.
The OpenMANO plugin works as a bridge between OpenVIM and OpenBaton by translating the
messages between them. For the OpenMANO plugin to function properly, OpenVIM and
OpenBaton have to be set up properly.
OpenVIM Installation and Configuration
For OpenVIM to run smoothly, two machines, a "Compute Node" and a "Controller Node", should
be set up. The VNFs are deployed on the Compute Node. It should preferably run Ubuntu Server
14.04, 64-bit, with KVM, qemu and libvirt installed. A user has to be created and the Ubuntu
image file has to be placed under some accessible path (e.g. the home directory). Open vSwitch
can be installed and the desired number of bridges created.
Setting up the "Controller Node" means setting up OpenVIM. To set up OpenVIM in another VM,
use the following script:
# wget https://github.com/nfvlabs/openvim/raw/v0.4/scripts/install-openvim.sh
# chmod +x install-openvim.sh
# ./install-openvim.sh
It installs all the required modules to run OpenVIM, internal database and etc.
A script is also available to install Floodlight v0.9.
# wget https://github.com/nfvlabs/openvim/raw/v0.4/scripts/install-floodlight.sh
# chmod +x install-floodlight.sh
# ./install-floodlight.sh
The OpenVIM configuration file, available at openvim/openvimd.cfg, should be edited to set the bridge and DHCP server parameters. The mode of operation can be set to 'development' or 'normal'. The folder openvim/test contains configuration files for setting up hosts, images, networks, servers and flavors; they have to be configured and created as needed. The scripts to start OpenVIM are in the bin folder; it can be added to $PATH and OpenVIM started by executing service-openvim start in the terminal.
OpenVIM tries to access the image stored on the Compute Node, so OpenVIM should have access to the Compute Node VM using the user that was created.
OpenVIM offers northbound REST APIs that allow CRUD operations on the various resources (VNFs, networks, etc.). The OpenMANO plugin uses these REST APIs to carry out its operations.
DHCP Server Installation
A DHCP server is required for OpenVIM to assign IP addresses to newly created VMs on the Compute Node.
# apt-get install dhcp3-server
Edit the file /etc/default/isc-dhcp-server to enable the DHCP server on the appropriate interface.
# vi /etc/default/isc-dhcp-server
INTERFACES="eth1"
Edit the file /etc/dhcp/dhcpd.conf to specify the subnet, netmask and range of IP addresses to be offered by the server.
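As an illustration, a minimal /etc/dhcp/dhcpd.conf entry might look as follows; the subnet, range and option values are examples that have to be adapted to the local network:

```
subnet 192.168.4.0 netmask 255.255.255.0 {
  range 192.168.4.100 192.168.4.200;
  option routers 192.168.4.1;
  option domain-name-servers 8.8.8.8;
}
```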
Then, restart the service.
# service isc-dhcp-server restart
OpenBaton Installation
To install OpenBaton in Linux environment, execute the following command.
# sh <(curl -s http://get.openbaton.org/bootstrap) release
For a more detailed description of the setup, the following link can be consulted.
http://openbaton.github.io/documentation/nfvo-installation-deb/
The configuration file is available at openbaton/nfvo/etc/openbaton.properties
OpenMANO Plugin
The OpenMANO plugin acts as a bridge between OpenVIM and OpenBaton, translating the messages between them. It uses the northbound REST APIs exposed by OpenVIM to access the resources. The source is available at
https://gitlab.fokus.fraunhofer.de/openbaton/openmano-plugin
It can be cloned as:
$ git clone https://gitlab.fokus.fraunhofer.de/openbaton/openmano-plugin.git
After compilation, the resulting JAR file should be placed in the folder openbaton/nfvo/plugins/vim-drivers. Then restart OpenBaton; it should load the OpenMANO plugin.
Developed functionality
As illustrated in Figure 16, when the NSD is deployed in OpenBaton, the function create_server is triggered in the plugin, which then sends a REST POST request to OpenVIM to create a server (VM) in the KVM hypervisor on the Compute Node. The deployed VM gets a private IP via the DHCP server. The dhcp_thread.py in OpenVIM reads the dhcp.leases file of the DHCP server to retrieve the private IP assigned to that VM based on its MAC address. The public IP of the VM is based on a bridge configured on the Compute Node. Similarly, network information can be retrieved using the northbound REST API.
Figure 16 – OpenVIM – OpenBaton Integration Architecture
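The translation step performed by the plugin for create_server can be sketched as the construction of an OpenVIM northbound REST request. The base URL, endpoint path and payload field names below are assumptions for illustration, not the actual OpenVIM API schema:

```python
import json

# Illustrative sketch of the OpenMANO plugin translation: an OpenBaton
# create_server call is mapped onto an OpenVIM northbound REST request.
# Base URL, path and field names are assumed, not the real OpenVIM schema.

OPENVIM_URL = "http://127.0.0.1:9080/openvim"  # assumed base URL

def build_create_server_request(name, image_id, flavor_id, net_ids):
    """Translate OpenBaton server parameters into an OpenVIM POST request."""
    body = {
        "server": {
            "name": name,
            "imageRef": image_id,
            "flavorRef": flavor_id,
            "networks": [{"uuid": n} for n in net_ids],
        }
    }
    return {
        "method": "POST",
        "url": OPENVIM_URL + "/servers",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }

req = build_create_server_request("vnf-1", "img-ubuntu", "flv-small", ["net-mgmt"])
```

An HTTP client would then send this request and parse the returned server identifier; retrieving network information follows the same request-building pattern against the corresponding resource path.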
3.3. Anomaly detection
Figure 17 Anomaly Detection Ensemble (ADE) approach for early anomaly detection.
Anomaly detection refers to the problem of finding patterns in data that do not conform to
expected behaviour. These non-conforming patterns are often referred to as anomalies, outliers,
discordant observations, exceptions, aberrations, surprises, peculiarities or contaminants in
different application domains [7]. The ability to discover anomalies within a dataset can have a significant impact in a variety of application areas, such as fraud detection for banking and financial industries, intrusion detection for discovering security threats, health-related problems,
performance degradation detection, traffic congestion detection and so on. For instance, a failure
within a data centre can be considered an anomaly.
Proactive anomaly detection refers to anticipating anomalies or abnormal patterns within a
dataset in a timely manner. Discovering anomalies such as failures or degradations before their
occurrence can lead to great benefits such as the ability to avoid the anomaly happening by
applying some corrective measures in advance (e.g., allocating more resources for a nearly
saturated system in a data centre). We address the proactive anomaly detection problem through
machine learning and in particular ensemble learning. We propose an early Anomaly Detection
Ensemble approach, ADE, presented in Figure 17.
The approach consists of the following steps:
Data preparation. Given a labeled dataset d, the data preparation
phase involves three steps: (i) Applying existing anomaly detection
techniques, (ii) Gathering the scores of each technique on the given
dataset, and (iii) Aggregating the results of each technique for training
purposes.
Anomaly window generation to be used as ground truth. In order to prioritize discovering anomalies in a timely manner, we utilize a weighted anomaly window as ground truth for training the model. Various strategies are explored for generating ground-truth windows. Results show that ADE achieves
improvements of at least 10% in earliest detection score compared to
each individual technique across all datasets considered. The
technique proposed detected anomalies in advance up to ~16h
before they actually occurred.
Training the ensemble model using the ground-truth window field generated in the prior step. The approach combines the results of state-of-the-art anomaly detection techniques in order to provide more accurate results than each single technique.
Applying the model on a new incoming or test dataset and gathering the results.
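The data-preparation step above, in which the scores of the individual techniques on dataset d are gathered column-wise into a feature matrix with the label appended, can be sketched as follows; the technique names and toy scores are illustrative, not project data:

```python
# Minimal sketch of the ADE data-preparation step: per-technique anomaly
# scores become feature columns, and the ground-truth label column is
# kept aside for training. Technique names and scores are illustrative.

def prepare_training_matrix(records):
    """records: list of dicts with per-technique scores and a 'gt' label."""
    techniques = [k for k in records[0] if k not in ("timestamp", "gt")]
    X = [[r[t] for t in techniques] for r in records]  # feature rows
    y = [r["gt"] for r in records]                     # labels
    return techniques, X, y

data = [
    {"timestamp": "10.12.2015", "T1": 0.1, "T2": 0.0, "gt": 0},
    {"timestamp": "11.12.2015", "T1": 0.9, "T2": 0.8, "gt": 1},
    {"timestamp": "12.12.2015", "T1": 0.2, "T2": 0.7, "gt": 1},
]
techniques, X, y = prepare_training_matrix(data)
```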
3.3.1. Download and Installation
Dependencies:
R
XGBoost – the standard R library for Extreme Gradient Boosting, an efficient implementation of the gradient boosting framework
The libraries, together with the associated test data, can be cloned by:
git clone https://github.com/CogNet-5GPPP/WP5-CSE.git
cd ./WP5-CSE/ADE
3.3.2. Deployment
All the data consumed in the training phase is given in the folder "data". The data used for training and validating our approach is from the Numenta Anomaly Benchmark (NAB)4, which provides a set of real-world time-series datasets, denoted by D (58 files). These datasets are labelled, i.e., they contain a field that is 1 or 0 depending on whether the record is an anomaly or not, respectively. We use these datasets for the evaluation of the ensemble approach. The NAB benchmark also compares Nupic with the Twitter Anomaly Detection R package and Etsy Skyline. In our evaluation we also compare against these three techniques and, in addition, against the IBM SPSS solution for anomaly detection. It is important to mention that for training the ensemble model we used the scores produced by the techniques, as shown in Figure 17. The repository contains the datasets available from the Numenta Anomaly Benchmark in the following path:
ADE/data/
We proposed several strategies for the anomaly detection ensemble engine based on variations of generating anomaly windows. For the ensemble model, we used the XGBoost library, an optimized distributed gradient boosting library for the R programming language5. The library provides parallel tree boosting (also known as GBDT or GBM) that is known for being efficient and accurate.
We devised different strategies for generating anomaly window fields in order to investigate their impact on the early detection of anomalies, corresponding to different variations of ADE. Some strategies focus on giving higher weights closer to the actual anomaly for improved precision and recall (i.e., XGB_gtl). Others focus on giving higher weights closer to the beginning of the window for earlier detection (i.e., XGB_earliest).
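As an illustration of the contrast between the two kinds of strategies (a sketch, not the project's actual R code), the following spreads a weighted window over the points preceding each labelled anomaly, with the weight either rising toward the anomaly or highest at the window start:

```python
# Illustrative window-labelling strategies: "late" weights points close
# to the anomaly more heavily (precision/recall oriented, cf. XGB_gtl);
# "earliest" weights the start of the window more heavily (cf. XGB_earliest).

def window_weights(window_len, strategy):
    if strategy == "late":      # weight grows toward the anomaly
        return [(i + 1) / window_len for i in range(window_len)]
    if strategy == "earliest":  # weight is highest at the window start
        return [(window_len - i) / window_len for i in range(window_len)]
    raise ValueError(strategy)

def apply_window(labels, window_len, strategy):
    """Spread a weighted window over the window_len points up to each anomaly."""
    w = window_weights(window_len, strategy)
    out = [0.0] * len(labels)
    for idx, lab in enumerate(labels):
        if lab == 1:
            for j in range(window_len):
                pos = idx - window_len + 1 + j
                if 0 <= pos < len(labels):
                    out[pos] = max(out[pos], w[j])
    return out

gt = [0, 0, 0, 0, 1, 0]                  # one anomaly at index 4
late = apply_window(gt, 3, "late")       # weights rise toward index 4
early = apply_window(gt, 3, "earliest")  # weights highest at index 2
```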
The R code for training the anomaly detection ensemble model needs to be invoked by
Rscript ADE_joint_ensemble.R
The script first runs a function to find the optimum number of rounds for training the model in
order to maximize the area under the curve (AUC) for ranking evaluation.
For instance, Figure 18 and Figure 19 show how the AUC on the test set initially increases and then decreases as the number of rounds grows.
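The round-selection step can be sketched as picking the candidate round count that maximises the validation AUC; the AUC curve below is a toy stand-in for the evaluation performed by the R script:

```python
# Sketch of the round-selection step: candidate numbers of boosting
# rounds are evaluated and the count maximising the validation AUC is
# kept; training beyond it starts to overfit. auc_of stands in for the
# cross-validated evaluation in the R script.

def best_num_rounds(auc_of, candidates):
    """Return the candidate round count with the highest validation AUC."""
    return max(candidates, key=auc_of)

# Toy AUC curve: rises with more rounds, peaks, then falls (overfitting).
curve = {10: 0.81, 20: 0.88, 40: 0.92, 80: 0.90, 160: 0.85}
best = best_num_rounds(lambda r: curve[r], sorted(curve))  # picks 40
```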
4 https://github.com/numenta/NAB
5 https://github.com/dmlc/xgboost
Figure 18 Increasing AUC with number of rounds.
Figure 19 Decreasing AUC with number of rounds.
Further, it trains the ensemble model with the optimum number of rounds previously retrieved, by calling the xgboost function with this parameter. The script then produces the predictions by calling predict(model, testMatrix) on the test matrix. The predictions of each ADE variation are then merged into a single CSV file.
Figure 20 shows an example of the output obtained after running ADE on this data; initially, as can be observed, the predictions are close to 0, suggesting there is no anomaly in the measurements.
Figure 20 Predictions for different variations of ADE strategies utilizing xgboost.
3.3.3. Initial Results
Table 2 Earliest detection for all anomalies that have been detected by at least one
technique across all testing datasets. First ranked detections are illustrated in green and
italic font. Second ranked detections are illustrated in orange and bold font.
For each technique we analyse the detection time and rank of all anomalies across all datasets used for testing purposes. A filtered table of these results, showing only the top performers of ADE, is presented in Table 2. Moreover, as the results show that XGB_window and XGB_earliest outperform the other ADE techniques in terms of earliest detection, we only present the results of these techniques in the table. As can be observed, XGB_window is ranked 1st in 5 out of 14 cases, 2nd in another 3 cases and 3rd in a further 3. The Twitter anomaly detector is the 2nd-best performer, with 4 discoveries ranked 1st, 3 ranked 2nd and 1 ranked 3rd. It is more interesting to observe the difference between the detection indexes for the discovered anomalies. For instance, even though XGB_window is ranked 2nd for the 3rd anomaly, the difference between its detection index and Numenta's, which is ranked 1st, is just 1 position.
3.3.4. Development status
The above libraries have been applied in [7] to validate our solution for early anomaly detection. They may be enhanced in further investigations by the IBM team, based on new requests and on the data available in the project.
4. Taxonomy of Mitigation Actions
The scope of this section is to provide a set of clarifications on the possible mitigation actions driven by the machine learning algorithms, as a basis for the further implementation of the policies of the systems presented in the previous sections.
The section gives a set of considerations on the actuation of the given system based on the
results of the cognitive process. It aims to provide a comprehensive overview on the possibilities
brought by the dynamic mitigation mechanisms based on programmability. More specifically, it
concentrates on the automation of the decision process based on experience accumulated
through machine learning as an evolution beyond the basic policy system currently deployed.
In SDN and NFV, a new intermediary virtualization layer is added between the infrastructure and the network functions. This layer makes the system considerably more flexible, freeing the different components from the limitations of single physical components and at the same time freeing the network from a specific cable structure. In combined SDN and NFV environments there is virtually no topology that has to be strictly adhered to.
In order to profit from this flexibility, a set of dynamic mechanisms can be added to the system to increase the overall resilience and security. In the following, these mitigation actions are briefly presented with examples, followed by a set of considerations on how they can be dynamically programmed by the cognitive system.
4.1. SDN/NFV specific mitigation actions
In this section, the new mitigation actions which can be added to a system are presented. Some of these actions are already implemented in existing default standard systems like OpenStack; some others still have to be considered depending on their feasibility for the environment use cases.
Scaling – the main characteristic of the NFV environment is that the different components can scale on demand. Scaling presumes the deployment of more components of the same type in parallel in order to handle the specific load and thus to better serve the subscribers.
Dynamic load balancing – due to scaling, the load can be split between the different components of the same type. When a new component is started, the load can be split between the existing components and the new one.
Dynamic hot standby – thanks to the dynamic deployment of components of the same type, the NFV environment makes possible a direct hot-standby mechanism where a set of components are deployed and configured only to be able to provide the service in case of a failure.
Adding supplementary VNFs – within a specific system a new set of components can be
added transparently in order to increase the functionality of the virtual network. For
example, a firewall with more functions can be added to a system in case a threat is
detected. In this case, even though the system becomes more complex, it is better
protected during the specific attack. The process can also be executed via the dynamic replacement of existing VNFs.
Correlated scaling – at the current moment, the NFV environment considers only the on-demand scaling of single components when more resources are needed. However, as the load in the system increases, multiple components will usually have to be scaled in a similar manner. Thus, a correlated scaling of multiple components would make sense in order to maintain a coherent resilience level.
Flexible topology – when deploying the software network components on the same hardware, any network connection can be established directly between two components of the virtual network, as this is done by the underlying network virtualization system which, in order to make the system work, has to connect all the various components. Thus, links between components can be created and torn down on demand. Topology changes will require changes in the routing system; changing the routing is necessary in order to benefit from the momentarily optimized topology.
Security zones – with the deployment of distributed firewall components within the same virtual network, it is very easy to deploy the VNF components in different security zones, which can be differentiated based on access rights and privacy levels. Whenever some area of the network requires new access rights, a new network of the same type can be deployed with these new access rights.
Cloning of the services – one of the main advantages of the NFV environment is that the
same network service can be deployed multiple times with different security and
reliability levels while from the perspective of the subscriber it is the same service (no
modifications needed in the end device).
For example, for reliability, if a network is offering a highly reliable service to one set of subscribers and a less reliable one to others, two networks with the same components but with different hot-standby levels can be deployed. However, in order to do this, the reliability achievable on top of the infrastructure where the service is deployed has to be understood. Here, machine learning can dynamically determine how well the service performs on top of the specific hardware and virtualization layer.
For example, for security, three networks can be deployed in parallel with the same functionality but with different goals from the operator perspective: a normal network with a minimal firewall and a set of monitoring tools to determine whether the behaviour of the subscribers is appropriate; a network with a more advanced firewall and a large amount of monitoring, into which potentially malicious subscribers are moved as a sort of quarantine to determine whether an attack is taking place; and honeynets into which confirmed attacks are moved for study (i.e. to determine the attack vectors), with no access to real private data.
Cloning of the service gives the option to experiment. Experimentation shortens the feedback loop and provides the appropriate adaptation mechanisms faster.
Predictions may work – a very large number of the failure events in the network can be traced back to previous anomalous states which were not considered detrimental to the system (mainly because they were not failures), such as an increase in processing in some key control-plane component for some subscribers. Such outstanding events can predict failures of the system at a later stage. However, in order to determine such predictions and their appropriate mitigation actions, a correlation mechanism is needed.
Dynamic adaptation to anomalies – the current system is based on a set of static policies, especially because the number of possible mitigations is very small. Using the previously described mitigation actions, the system has a multitude of possibilities to adapt. One of the most important types of adaptation is to unknown threats or failures: the machine learning system can determine unknown situations through anomaly detection, in which case the system can take appropriate (initially default) actions and accumulate experience as the situations repeat themselves.
For the next software engineering release the most relevant of these actions will be implemented
in the form of a prototype together with the appropriate mechanisms to present the machine
learning insight.
4.2. Roles of the Cognitive System
The cognitive system comes to complete the basic policy enforcement model which is currently
in use with more dynamicity. As illustrated in Figure21, the current model which is deployed for
more than 20 years includes a Policy Decision Point (PDP) which based on the policies introduced
by the system administrator, the events received from the active system and the conditions of the
active system makes decisions and enforces them on the Policy Enforcement Point (PEP). In this
case, the PDP has to be completely pre-configured with the comprehensive set of policies by the
administrator of the system.
Figure 21 – Policy Decision Model
Figure 22 – Policy Decision Model with Cognitive System
When adding a cognitive system to the policy decision model, as illustrated in Figure 22, the
cognitive system can have three different roles, depending on the degree of involvement with the
real system.
1) Immediate action – based on the insight generated by the cognition, the cognitive system sends an immediate action to the enforcement point to execute some operations on the active system. This type of behaviour is not beneficial in resilience and security situations, as it overlooks the complexity of the managed system and may deteriorate its behaviour compared to the PDP, which includes the necessary policies for an appropriate behaviour.
Another sort of immediate action is a policy which is transmitted to the PDP including the specific conditions, e.g. instead of having only the event included, it also includes a set of conditions which force the PDP into a specific behaviour.
Another sort of immediate action is an implicit policy, e.g. when the cognitive system transmits a "1" it means that a specific action has to be executed; this is even more problematic as it implies a complete correlation between the PDP and the cognitive system regarding events, conditions and the policy actions.
2) Policy triggering – through the insight of the cognitive system, an event is determined, be it a complex event or a prediction, which cannot be immediately derived from the monitored information and requires a trained model to be determined. In this case, the cognitive system transmits an event to the policy system which, in its turn, by checking the conditions, selects the appropriate mitigation actions.
In this situation, the policies are statically introduced by the administrator of the system, and machine learning has the role, based on the dynamic information, of tweaking events in such a manner as to optimize the behaviour of the system in the given network context.
3) New policy/policy modification – in this situation, the cognitive system takes the role of modifying the running policies based on the experience gained while analysing the specific data. It is considered that the system is delivered with a set of default policies which are then dynamically adapted, based on the specifics of the deployment, to the local network conditions and to the specific usage patterns. In this context machine learning is the most useful, as it can turn the default system into a customized one by using the specific dynamic statistics mechanisms.
However, in order to be able to make such policy modifications, the system has to be extended with a set of meta-policies of experience, as described in the next section.
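The "policy triggering" role above can be sketched as a minimal event-condition-action loop, in which the cognitive system injects a derived event and the PDP matches it against statically installed policies. The policy entries, event names and actions below are illustrative assumptions, not CogNet code:

```python
# Minimal event-condition-action sketch of the policy-triggering role:
# the cognitive system emits a derived event (e.g. a predicted overload)
# and the PDP matches it against statically installed policies.
# Policy entries, event names and actions are illustrative.

policies = [
    {"event": "predicted_overload",
     "condition": lambda ctx: ctx["load"] > 0.8,
     "action": "scale_out"},
    {"event": "anomaly_detected",
     "condition": lambda ctx: True,
     "action": "move_to_quarantine_slice"},
]

def decide(event, ctx):
    """PDP: return the actions of all policies matching the event and conditions."""
    return [p["action"] for p in policies
            if p["event"] == event and p["condition"](ctx)]

# A predicted overload at high load triggers the scaling mitigation.
actions = decide("predicted_overload", {"load": 0.9})
```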
4.3. Development of System Experience
As illustrated in Figure 23, in order to be able to modify the set of policies on which a system is running, a new dynamic control loop has to be added.
Figure 23 – Experience Control Loop
First, the basic policy control loop presumes that events are received from the running system. Based on the events, the policy system matches the appropriate actions to execute against the momentary conditions within the system. Then, the actions are enforced on the running system, adapting it dynamically to the new events. In this system, the policies are static and have to be pre-installed, including events, conditions and actions.
In the case of generating new policies based on experience, the input is also the events received from the running system. The result of the experience is new policies which are installed in the policy system. From this it follows that the action of the experience system is a set of policies. To be able to generate these policies, the experience system has to include a set of "meta-conditions" which can be matched on certain events. A main issue to study, and to implement as a proof-of-concept on the utility of machine learning, is these meta-conditions as a means to determine an increase in the experience of the management system with the specific local conditions. A basic implementation of such a system will be presented in the next deliverable.
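A minimal sketch of such a meta-condition is a rule that watches recurring events and, once an event has been mitigated successfully often enough by a default action, emits a new policy for installation in the policy system. The threshold, event names and field names below are illustrative assumptions:

```python
from collections import Counter

# Sketch of the experience control loop: a meta-condition counts how
# often an event was successfully mitigated by a default action and,
# past a threshold, generates a new policy to install in the PDP.
# Threshold, event names and policy fields are illustrative.

class ExperienceSystem:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.successes = Counter()
        self.new_policies = []

    def observe(self, event, action, succeeded):
        """Meta-condition: same event mitigated successfully `threshold` times."""
        if succeeded:
            self.successes[(event, action)] += 1
            if self.successes[(event, action)] == self.threshold:
                self.new_policies.append({"event": event, "action": action})

exp = ExperienceSystem(threshold=3)
for _ in range(3):
    exp.observe("cpu_spike_vnf_a", "scale_out", succeeded=True)
# After three successful mitigations, a new event->action policy is emitted.
```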
5. Visualization GUI
One of the main tools for understanding and appropriately assessing the value of a machine learning technique, as well as for enabling the administrator of the system to provide a human-knowledge perspective where the techniques are limited, is visualization. In this context, a new visualization GUI was developed as part of the CogNet project, extending the existing Open5GCore GUI with specific functionality for:
Visualization of batch data – until now, the available GUIs related to NFV management have been able to visualize only live data
Capability to scope on different data – extension of the GUI to give the administrator a perspective on specific time intervals of the data, with different granularity levels
Combination of external data and the monitored data – enabling the visualization, on the same time-series graphics, of data resulting from the ML algorithms (e.g. predictions) and of the active data
Specific metrics which make it possible to determine the state of the system, e.g. colour-based changes in case the system is in an abnormal state
In its current version, the Visualization GUI is meant for data visualization, especially in the form of time series; it is planned to be extended later with statistics values. To make it efficient and simple to use, a good selection of technologies and quite a large number of libraries are employed. These components work together in the GUI to provide a clear visual interpretation of the data. The GUI is made interactive by displaying a number of so-called 'toasts', or pop-up messages, to show the current state of the GUI. The main components of the Visualization GUI, shown in Figure 24, are divided into three sections.
1. Frontend: The frontend interface is developed using the AngularJS framework (version 1) empowered by Bootstrap 4 for complex CSS styling. The AngularJS framework is a very convenient tool for developing the frontend of the dashboard and is supported by a large number of MIT-licensed libraries.
2. Backend: The Node.js (version 6.x) framework is used in the backend to create an efficient and asynchronous server. npm is used for package management. The backend is supported by a large number of MIT-licensed modules, most notably async, socket.io, mongoose and tiny-worker. These modules/libraries help to make the backend server efficient. The frontend and backend components communicate using the socket.io module; the messages are exchanged based on events over TCP.
3. Database: MongoDB was chosen to store and retrieve data. It is a NoSQL database and uses JSON documents with formatted/non-formatted schema. The mongoose module in the backend interacts with this database to store and retrieve data.
5.1. GUI Installation
The dashboard installation is quite easy and quick. The GUI can be cloned from GitLab:
# git clone https://gitlab.fokus.fraunhofer.de/phoenix/open5g-gui.git
The GUI comes with a script (prereq.sh) that installs all the prerequisite modules; you just have to run it. It is well tested on Ubuntu 14.04 and 16.04.
# ./prereq.sh
In the same folder there is a config.json file, which has to be configured before running the backend server. You need to configure the orchestrator parameters, Zabbix server parameters, OpenFlow parameters, BT parameters and LWM2M parameters.
The backend server can be run using the script “runServer.sh”.
# ./runServer.sh
After the backend server starts running, the frontend GUI is available at
https://ip_of_server:8000
The config.json configuration file is described below:
Figure 24 – Visualization GUI architecture
{
"appConfig": {
"httpServer": {
"hostname": ip of the GUI(default:"localhost"),
"port": port(default:8000),
"keys": {
"key": "keys/server.key",
"cert": "keys/server.crt"
}
}
},
"open5gParams": {
"name": "open5gParameters",
"dbConfig": {
"databaseName": "Open5G_GUI",
"ipAddr": ip of mongodb machine(default:"127.0.0.1"),
"port": listening port(default:27017),
"reconnectTries": number of reconnect tries upon failure (default:10),
"maxConnSize": number of connections(default:10),
"reconnectInterval": request reconnect if failed to connect(in msec)
},
"orchestrator":{
"ipAddr": ip address of orchestrator,
"port": port,
"username":"admin",
"password":"openbaton",
"oauthTokenPpath":"/oauth/token",
"userToken":"openbatonOSClient",
"passToken":"secret",
"security": if security feature is enabled(true or false),
"grantType": "password",
"requestTimeout": request reconnect if failed to connect(in msec)
},
"zabbixServer":{
"ipAddr": ip address of zabbix server,
"port" : port(default: 80),
"username" : username(default:"Admin"),
"password" : password(default:"zabbix"),
"defaultPath" : "/zabbix/api_jsonrpc.php",
"jsonrpcVersion" : "2.0",
"startFetch": true/false for metric fetch at startup,
"defaultInterval": interval for metric fetch(in msec),
"requestTimeout": request reconnect if failed to connect(in msec)
},
"dbMySqlConfigFlowmon":{
"hostIP": ip address of flowmondb machine,
"hostPort": port(default:3306),
"user": username for db,
"password": password for db,
"database": name of the database,
"connectionLimit": max number of connections(default: 100),
"debug": false,
"startFetch":true/false for data fetch at startup,
"queryInterval": request reconnect if failed to connect(in msec)
},
"observeMetrics":{
"add": [{“host”:name of host, “metricsList”:[array of metrics to be monitored]}],
D5.2 – Network Security and Resilience – Engineering Release 1
CogNet Version 1.0 Page 66 of 78
"removeHosts":[array of hostnames]
},
"btConfig":{
"ipAddr": ip address of bt machine,
"port": listening port
},
"dbMySqlConfigLWM2MSrv":{
"hostIP": ip address of the lwm2m server,
"hostPort": port(default:3306),
"user": username for db,
"password": password for db,
"database": name of the database,
"connectionLimit": 100,
"debug": false,
"startFetch": true/false for data fetch at startup,
"queryInterval": request reconnect if failed to connect(in msec)
}
}
}
5.2. GUI Interactions
When the frontend is loaded in Google Chrome, the index page opens with a menu on the left side. On the top right, you can see the list of partners' logos. In the menu, you can see options like Infrastructure, Load Static Topology and Dashboard Settings.
Infrastructure: If the backend config.json is properly configured for the orchestrator, it loads the topology by processing the Network Service Record (NSR) obtained from the orchestrator. This is the dynamic topology.
Load Static Topology: In the absence of an orchestrator, it is possible to load static topologies. Once you click it, a modal pops up and allows the user to load a number of topology files. The static topology file should follow the specific format described below. The topologies are distinguished by different colours.
Dashboard Settings: It offers options to select the themes. Based on themes, it loads different
colours and partners' logos.
To load a static topology, click on “Load Static Topology”. The format of the static topology file is as follows:
[{
"name": name of the slice,
"hostCollection":[{
"host":name of the host,
"details":[{
"hostnames":[{
"hostname": hostname,
"id":unique hostname ID,
"floatingIps":array of floating IPs,
"metricFetch": true or false,
"keyword": some fixed keyword,
"ips":[{"netName": net name,"ipAddr":array of IPs} ]
}]
},
{"datacenterName":[name of datacenter]},
{"hostId":unique host ID},
{"relations":array of dependent hosts}
]
}]
}]
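A minimal static topology file following this format could look like the example below; the slice, host and datacentre names, IDs and IP addresses are invented for illustration.

```json
[{
  "name": "slice-dense-urban",
  "hostCollection": [{
    "host": "vnf-gateway",
    "details": [{
      "hostnames": [{
        "hostname": "vnf-gateway-01",
        "id": "hn-001",
        "floatingIps": ["198.51.100.5"],
        "metricFetch": true,
        "keyword": "vnf",
        "ips": [{ "netName": "mgmt-net", "ipAddr": ["10.0.0.5"] }]
      }]
    },
    { "datacenterName": ["dc-berlin"] },
    { "hostId": "host-001" },
    { "relations": ["vnf-firewall"] }
    ]
  }]
}]
```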
After a static topology file is loaded, the frontend checks that it is valid JSON with the fields in the proper order. Multiple topologies can be loaded from multiple files. If all conditions are met, the topology (or topologies) is loaded and displayed as shown in Figure 25.
The list of VNFs is displayed, with colour distinguishing the slices. A slice contains the VNFs belonging to the same datacentre. In Figure 25, all the VNFs are green, which means they belong to the same slice and the same datacentre. Clicking one of them loads the Slice Visualisation, Time Series frame and Benchmarking Tool frame, as shown in Figure 26.
Figure 25 - NFV Slice visualization
Slice Visualisation: This section shows all the VNFs connected according to their relations. They can be moved around to organize the view, and the positional structure is automatically stored in the database so that the same layout is displayed when the topology is loaded again. The two gauges on each VNF monitor CPU and RAM usage; the metrics are retrieved from the Zabbix Server by the GUI backend using its REST API. Below the visualisation are options such as show/hide flow monitoring, show/hide relations and import/export of CSV files; a few more options are associated with each VNF. Flow monitoring traffic is also displayed if it is configured in the backend config file and the backend is properly connected to the database that stores the flow information. The traffic is displayed using dotted lines; red bubbles show the traffic flow, and their size indicates the traffic volume. Right-clicking any VNF opens a menu, shown in Figure 27 for the dense urban area testbed. It allows you to export/import CSV-formatted files, view the Time Series of the VNF's metrics (particularly meant for anomaly detection), open the Metrics List containing all the metrics extracted from the Zabbix Server (from which metrics can be selected for monitoring), display the currently monitored metrics via Show Monitored Metrics, and view the list of IPs and interfaces associated with the VNF.
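The metric retrieval from the Zabbix Server can be sketched as below. Zabbix exposes its API as JSON-RPC 2.0 over HTTP; the endpoint URL, authentication token and item ID used here are hypothetical, and the actual backend may structure its requests differently.

```javascript
// Build a Zabbix JSON-RPC 2.0 request body (the Zabbix API lives at
// /api_jsonrpc.php on the Zabbix frontend).
function buildZabbixRequest(method, params, auth, id) {
  return { jsonrpc: '2.0', method: method, params: params, auth: auth, id: id };
}

// Example: request the latest history values for one item (hypothetical itemid).
const body = buildZabbixRequest(
  'history.get',
  { itemids: ['23296'], sortfield: 'clock', sortorder: 'DESC', limit: 10 },
  'hypothetical-auth-token',
  1
);

// The backend would POST this to the Zabbix endpoint, e.g.:
// fetch('http://zabbix.example.org/api_jsonrpc.php', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json-rpc' },
//   body: JSON.stringify(body)
// }).then(r => r.json()).then(res => console.log(res.result));
```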
Time Series: When a VNF is selected in the Slice Visualisation, the time series of the selected metrics is displayed. Clicking on the top of the frame opens an expanded view with controls to set the time range and interval duration, as shown in Figure 28.
Benchmarking Tool: A number of parameters can be set by moving the sliders; clicking 'start' sends the command to the BT machine, which runs in an environment accessible to the GUI backend (e.g. OpenStack, VMware or a physical machine). An interactive display is shown in the form of a pie diagram. The command is sent as a UDP message. The IP address and port of the BT machine have to be set in the configuration file (config.json) before running the backend. This is illustrated in Figure 26.
Figure 26 – Software Network Overview
Figure 27 – Slice Overview
Figure 28 – Time Series Visualization
Figure 29 shows the time series for anomaly detection. After the processed CSV data is imported into the GUI, selecting the 'timeseries' option in a host's right-click menu opens the window. For the selected host, a number of metrics can be chosen from the list to load the time series – Actual vs. Predicted and Anomaly Score vs. Anomaly Label.
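Loading the processed CSV into the four plotted series can be sketched as follows; the column names (timestamp, actual, predicted, anomaly_score, anomaly_label) are hypothetical, since the exact CSV layout is not specified here.

```javascript
// Parse the processed anomaly-detection CSV into time series for plotting.
// Assumed header: timestamp,actual,predicted,anomaly_score,anomaly_label
function parseAnomalyCsv(csvText) {
  const lines = csvText.trim().split('\n');
  const header = lines[0].split(',');
  const idx = (name) => header.indexOf(name);
  const series = { actual: [], predicted: [], anomalyScore: [], anomalyLabel: [] };
  for (const line of lines.slice(1)) {
    const cols = line.split(',');
    const t = cols[idx('timestamp')];
    series.actual.push([t, parseFloat(cols[idx('actual')])]);
    series.predicted.push([t, parseFloat(cols[idx('predicted')])]);
    series.anomalyScore.push([t, parseFloat(cols[idx('anomaly_score')])]);
    series.anomalyLabel.push([t, parseInt(cols[idx('anomaly_label')], 10)]);
  }
  return series;
}
```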
Figure 29 – Prediction and Anomaly Detection Visualization
6. Conclusions and Further Work
This deliverable presents a set of practical implementations of the specific systems where machine learning techniques can be applied with positive results, as well as the development of machine learning techniques based on the data acquired from the different testbeds. Until now, the testbeds were mainly used to acquire data and to train the machine learning algorithms. Through this, considerable insight was gained into the specific behaviour of the systems, which helped the administrators to configure them and to make them more reliable for the long-duration measurements needed to accumulate the data required by machine learning. In the next steps, this insight from the machine learning algorithms will be used to make dynamic decisions.
The next release will close the gap between the testbeds and the machine learning mechanisms by implementing the results of the machine learning algorithms in the form of mitigation actions, thereby proving the usability of machine learning as a technology for enhancing network management in SDN/NFV environments and for increasing network resilience and security. This includes the further development of the security and reliability testbeds up to comprehensive proof-of-concept level, enabling machine-learning-based network management according to the development plans presented for each of them in their respective sections.
Glossary, Acronyms and Definitions
5G 5th generation mobile networks
ACL Access Control List
ADE Anomaly Detection Ensemble
API Application Programming Interface
AUC Area Under Curve
BT Benchmarking tool
CD Continuous Deployment
CI Continuous Integration
CSE CogNet Cognitive Smart Engine
CPU Central Processing Unit
CSS Cascading Style Sheet
CSV Comma Separated Value
DDoS Distributed Denial of Service
DHCP Dynamic Host Configuration Protocol
DMZ De-Militarized Zone
DNS Domain Name System
DoS Denial of Service Attack
DSE Distributed Security Enablement
ER Engineering Release
FIFO First In, First Out Queuing
GBDT Gradient Boosted Decision Tree
GBM Gradient Boosting Machine
GTP GPRS Tunnelling Protocol
GUI Graphical User Interface
KPI Key Performance Indicator
ICMP Internet Control Message Protocol
IETF Internet Engineering Task Force
IoT Internet of Things
ISP Internet Service Provider
JAR Java Archive
KVM Kernel-based Virtual Machine
LSTM Long Short Term Memory
MAC Medium Access Control
MANO NFV Management & Orchestration
MAPE Monitor, Analyse, Plan, Execute autonomic process loop
ML Machine Learning
NAB Numenta Anomaly Benchmark
NAS Network Access Server
NFV Network Functions Virtualization
NFVM NFV Management
NFVO NFV Orchestrator
NF Network Function
NSD Network Service Descriptor
NTP Network Time Protocol
OPNFV Open Source Project for NFV
OSS Operations Support System
PDP Policy Decision Point
PEP Policy Enforcement Point
PoC Proof of Concept
RADIUS Remote Authentication Dial-In User Service
RAM Random Access Memory
REST Representational State Transfer
RF Random Forest
SPAM Unsolicited email
SDN Software Defined Networking
SFC Service Function Chain
SFP Service Function Path
SQL Structured Query Language
SP Service Providers
SPSS Statistical Package for the Social Sciences
SSH Secure Shell (connectivity)
SUT System Under Test
SVM Support Vector Machine
SYN Synchronize message to establish TCP connection
UE User Equipment
VIM Virtual Infrastructure Manager
VM Virtual Machine
VNF Virtual Network Function
XML Extensible Markup Language
Appendix A. Distributed Security
Enablement Testbed
A.1. API Call to Create an OpenFlow Firewall Rule
Script to post an OpenFlow firewall rule that drops packets on port 22 from source host IP address 10.0.0.2:
curl -X PUT -d @L3port22FW -H "Content-Type: application/xml" -H "Accept: application/xml" --user
admin:admin http://162.13.119.228:8181/restconf/config/opendaylight-
inventory:nodes/node/openflow:1/table/0/flow/20
root@opendaylight:~/code# more L3port22FW
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<flow xmlns="urn:opendaylight:flow:inventory">
<strict>false</strict>
<instructions>
<instruction>
<order>0</order>
<apply-actions>
<action>
<order>0</order>
<drop-action/>
</action>
</apply-actions>
</instruction>
</instructions>
<table_id>0</table_id>
<id>20</id>
<cookie_mask>255</cookie_mask>
<installHw>false</installHw>
<match>
<ethernet-match>
<ethernet-type>
<type>2048</type>
</ethernet-type>
</ethernet-match>
<!-- restrict the drop to TCP destination port 22, as described above -->
<ip-match>
<ip-protocol>6</ip-protocol>
</ip-match>
<tcp-destination-port>22</tcp-destination-port>
<ipv4-source>10.0.0.2/32</ipv4-source>
</match>
<cookie>2</cookie>
<flow-name>IricentL3port22FW</flow-name>
<priority>200</priority>
<barrier>false</barrier>
<hard-timeout>200</hard-timeout>
<idle-timeout>400</idle-timeout>
</flow>
A.2. Sequence Diagrams
Distributed Security Enablement Platform
Distributed Security Enablement Gateway
Distributed Security Enablement Prediction Engine
Distributed Security Enablement Firewall Module