Faculty of Sciences Master’s dissertation submitted in order to obtain the academic degree of Master of Science in Computer Science Containerised cybersecurity lab for rapid and secure evaluation of threat mitigation tactics Thibault Van Geluwe de Berlaere Supervisor(s): Prof. Dr. Bruno Volckaert, Prof. Dr. ir. Filip De Turck Counsellor(s): Dr. ir. Tim Wauters, Andres Felipe Ocampo Palacio Academic year 2017-2018


Preface and acknowledgments

This thesis was one of the biggest undertakings of my educational career. I hope that I will inspire like-minded students (and researchers) to perform more research in this area, since it is a very interesting and challenging topic.

I couldn’t have done this without the continuous support of my supervisors, Bruno Volckaert and Filip De Turck, and my counsellors, Tim Wauters and Andres Felipe Ocampo Palacio. Their time, support and insights proved indispensable during this last year.

Moreover, I wish to thank Brecht Vermeulen, who helped me set up the experiments and configure the technical infrastructure on the Virtual Wall.

Finally, I would like to thank my friends and family for their unceasing support, love and friendship. Their support was imperative to the success of this thesis, even though it was not always academic in nature: providing me with breakfast in the early mornings, or another cup of coffee late at night when a stubborn bug prevented my sleep.


Permission for usage

The author gives permission to make this master dissertation available for consultation and to copy parts of this

master dissertation for personal use. In the case of any other use, the copyright terms have to be respected,

in particular with regard to the obligation to state expressly the source when quoting results from this master

dissertation.

Thibault Van Geluwe de Berlaere, June 2018

Contents

1 Introduction
2 Related Work
2.1 Network Modeling
2.1.1 Software Defined Networking (SDN)
2.1.2 Virtual Switching
2.1.3 Mininet
2.2 Network Attacks
2.2.1 Taxonomy & General Network Attacks
2.2.2 Man-in-the-middle (MiTM) attacks
2.2.3 Distributed Denial-of-Service (DDoS) attacks
2.3 Intrusion Detection Systems
2.3.1 Taxonomy
2.3.2 Anomaly-based Network Intrusion Detection Systems
2.3.3 Anatomy of an IDS
2.4 Data Collection
2.4.1 Network Traffic Storage
2.4.2 Metric Storage
2.5 Data Processing
2.5.1 Taxonomy
2.5.2 Frameworks
2.5.3 Using Big Data Frameworks in Network Traffic Analysis
3 Design
3.1 Anatomy of a Computer Network
3.2 Requirements
3.2.1 General Requirements
3.2.2 Virtualization
3.2.3 Networking
3.2.4 User Simulation
3.2.5 Infection
3.2.6 Data Extraction
3.2.7 Data Storage
3.2.8 Data Processing
3.2.9 Data Visualization
4 Implementation
4.1 Technologies
4.1.1 Programming Language
4.1.2 Hypervisor
4.1.3 Virtual Networking
4.1.4 User Simulation
4.1.5 Data Storage
4.1.6 Data Processing
4.1.7 Data Visualization
4.2 Architecture
4.2.1 Building Blocks
4.2.2 Client
4.2.3 Server
4.2.4 Guest Additions
4.3 User Simulation
4.4 Data Storage & Processing
4.5 Ansible
5 Evaluation
5.1 Requirements
5.1.1 General Requirements
5.1.2 Virtualization
5.1.3 Networking
5.1.4 User Simulation
5.1.5 Infection
5.1.6 Data Extraction
5.1.7 Data Storage
5.1.8 Data Processing
5.1.9 Data Visualization
5.2 Performance
5.2.1 Virtualization Performance
5.2.2 Network Performance
5.2.3 Data Storage Performance
5.2.4 Data Processing
5.2.5 Guest Additions
6 Conclusion
7 Future Work
7.1 User Interface
7.2 Virtualization
7.3 Networking
7.4 Guest Additions
7.5 Data Processing
A Abstract (NL)
B Non-Technical Synopsis
C Screenshots
D Configuration
D.1 XML Files
D.1.1 Client XML Configuration File
D.1.2 Scenario XML Definition
D.1.3 Procedure XML Definition
D.1.4 HostTemplate XML Definition
D.2 Program Arguments
D.2.1 Client Arguments
D.2.2 Server Arguments
D.2.3 User Simulation Arguments
D.3 Ansible Playbooks

Abstract

Just a generation ago, ‘cyber security’ was an infrequently heard term. Nowadays, not a week passes without some news report about a major breach in a vital computerized system. Sometimes the breach is the result of a random exploit, but more often it is a targeted (possibly state-sponsored) attack on vital infrastructure.

More and more aspects of our daily lives are becoming computerized, but much less attention goes to how these systems will affect our lives when they misbehave (either through malice or through deficiency), or to how they will decrease our quality of life if they become unavailable, even temporarily. For example, Distributed Denial of Service (DDoS) attacks are highly disruptive, both for the victim and for the users. To research methods and techniques that counter these threats, security researchers clearly need to reproduce these attacks in a closed-loop, isolated environment.

In this thesis, we have built ChiefNet. ChiefNet is a framework designed for security analysts, featuring an easy-to-use, rapid-prototyping, scale-out fabric for constructing isolated, secure environments through virtualization and virtual networks. ChiefNet allows researchers to rapidly evaluate different threat detection and mitigation tactics and perform in-depth, post-attack analysis of network traffic.

ChiefNet emulates a real-life network by mimicking regular user behavior: sending emails, opening attachments, browsing the web, transferring files, etc. This allows for a more authentic analysis, since most network traffic isn’t malicious, just like in a real network.


List of Figures

2.1 A framework for SDN-based Security Services
2.2 The ARP Protocol
2.3 ARP Poisoning
2.4 Distribution of DDoS attacks by type, Q4 2017
2.5 The defensive life cycle
2.6 General architecture of an Intrusion Detection System
2.7 General idea of a micro-batch processing framework
2.8 Integration of Apache Samza and Apache Kafka
3.1 A ‘real world’ computer network
4.1 Used technologies in storing and processing data
4.2 Client-Server connections in ChiefNet
4.3 Actions performed when the ChiefNet client starts
4.4 Creating a VM in ChiefNet
4.5 Creating a vSwitch in ChiefNet
4.6 Visual explanation of snapshotting using qcow2’s copy-on-write feature
4.7 Guest Communication: Method of Operation
5.1 Topology of the performed network tests
C.1 The ScenarioBuilder interface
C.2 The ScheduleManager interface
C.3 The VirtualSwitchManager interface
C.4 The configuration options for adding a new scheduled action
C.5 The interface for creating or editing a Procedure


List of Tables

3.1 Translation of network entities to ChiefNet components
4.1 List of the different endpoints and their category in the server component of ChiefNet
5.1 Evaluation of the requirements listed in Chapter 3
5.2 Network Performance Test Results


List of Abbreviations

AI Artificial Intelligence

API Application Programming Interface

ARP Address Resolution Protocol

CSV Comma-Separated Values

DDoS Distributed Denial-of-Service attack

DFS Distributed File System

DMZ Demilitarized Zone

DNS Domain Name System

DSC PowerShell Desired State Configuration

FTP File Transfer Protocol

GUI Graphical User Interface

HDFS Hadoop Distributed File System

HTTP HyperText Transfer Protocol

IDS Intrusion Detection System

IMAP Internet Message Access Protocol

IP Internet Protocol

IPS Intrusion Prevention System

IRS Intrusion Response System

KVM Kernel-based Virtual Machine

LAN Local Area Network

MAC Media Access Control Address


MiTM Man-in-the-Middle

ML Machine Learning

NIC Network Interface Card

OS Operating System

OVS Open vSwitch

PCAP Packet Capture Files

POP3 Post Office Protocol (version 3)

PXE Preboot Execution Environment

RDD Resilient Distributed Dataset

RMI Remote Method Invocation

SDN Software Defined Networking

SDWN Software Defined Wireless Networking

SMB Server Message Block

SMTP Simple Mail Transfer Protocol

TCP Transmission Control Protocol

TCP/IP Transmission Control Protocol/Internet Protocol network

TSDB Time-Series Database

VLAN Virtual Local Area Network

VM Virtual Machine

XML Extensible Markup Language

YARN Yet Another Resource Negotiator


Chapter 1

Introduction

In order to efficiently research different threat detection and mitigation techniques, security researchers need an

easy-to-use, versatile and powerful tool to reproduce real-life threats in a secure, isolated environment.

This tool must give the operator the maximum possible freedom, without being needlessly complicated. The

software should be extensible, versatile and flexible, but also easy-to-use and allow for quick deployment.

To this end, I developed ChiefNet. ChiefNet is a framework which allows researchers to easily set up isolated, secure

environments through virtualization and virtual networks. The goal of ChiefNet is to allow researchers to rapidly

evaluate different threat detection and mitigation tactics and perform in-depth analysis of the impact of different

threats on the network. From this analysis, intrusion detection systems can be built and deployed in actual, physical

networks.

Moreover, ChiefNet emulates an authentic network by mimicking regular user behavior: sending emails, opening attachments, browsing the web, transferring files, etc. We’ll need these ‘users’ to establish a baseline, against which the intrusion detection systems can detect anomalies.

We’ll start by looking at the current state-of-the-art of security research tools in our literature study, found in

Chapter 2. After this, we describe the software requirements of ChiefNet in Chapter 3. Next, in Chapter 4, we move

on to the implementation of the aforementioned requirements. Subsequently, the performance of the framework

(and its overhead) is evaluated in Chapter 5. We describe our conclusions in Chapter 6. Finally, in Chapter 7, we

speculate on some possible extensions and future endeavors.


Chapter 2

Related Work

To research this topic, we analyze the current state of the art in security research tools. We’ll do this by performing a structured literature study of all essential (sub)topics.

We identify the following five fundamental building blocks of ChiefNet: network modeling, network attacks, intrusion detection systems, data collection and data processing. We provide a short introduction here, and cover these topics in more detail over the next few sections.

Network modeling. Network modeling has been covered in many works, as it is a complicated problem. However, we’re mainly interested in work where security is one of the principal components.

Network attacks. Network attacks are at the core of ChiefNet. After all, network-based threats are one of the biggest categories in the threat landscape. In order to build defenses, we need to know how these attacks work, where they’re strong and where they’re weak. We’ll cover popular existing network-based attacks and possible detection measures.

Intrusion Detection Systems (IDS). In order to build a framework that facilitates the development of Intrusion Detection Systems, we need to know the ins and outs of an IDS. Different techniques exist, each with its (dis)advantages in specific situations, so we want ChiefNet to support as many of them as possible. We’ll also cover IDS architecture and the defensive life cycle.

Data collection. Data collection is an invaluable part of our framework. The collected data will form the input to the IDS and provide insight into the inner workings of the network. ChiefNet will have to store different types of data, ranging from network capture files to runtime operating system metrics.

Data processing. Data processing will form the core of the platform on which the Intrusion Detection Systems will run. With scalability in mind, we’ll have to use special processing techniques to allow ChiefNet to process all the incoming data. Additionally, to keep the processing framework from becoming congested, we’ll have to investigate processing techniques based on a ‘streaming’ approach.
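As a rough illustration of what a ‘streaming’ approach means, the following minimal Python sketch counts packets per source over a sliding time window. It consumes records one at a time and only keeps the records inside the window in memory, rather than loading a whole capture first. The record layout, the function name `windowed_counts` and the window length are assumptions made for this example; they are not part of ChiefNet.

```python
from collections import Counter, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Assumed record layout: (timestamp in seconds, source address).
Packet = Tuple[float, str]


def windowed_counts(packets: Iterable[Packet], window: float) -> Iterator[Dict[str, int]]:
    """Yield per-source packet counts over the last `window` seconds.

    Packets are consumed one at a time, so memory use is bounded by the
    number of packets inside the window, not by the total input size.
    """
    recent: Deque[Packet] = deque()
    counts: Counter = Counter()
    for ts, src in packets:
        recent.append((ts, src))
        counts[src] += 1
        # Evict packets that have fallen out of the window.
        while recent and recent[0][0] <= ts - window:
            _, old_src = recent.popleft()
            counts[old_src] -= 1
            if counts[old_src] == 0:
                del counts[old_src]
        yield dict(counts)


if __name__ == "__main__":
    stream = [(0.0, "10.0.0.1"), (0.5, "10.0.0.2"), (1.2, "10.0.0.1"), (2.6, "10.0.0.1")]
    for snapshot in windowed_counts(stream, window=2.0):
        print(snapshot)
```

Because the function is a generator, a downstream consumer can react to each snapshot as soon as it is produced, which is the property that keeps a processing pipeline from backing up.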

2.1 Network Modeling

To understand what a network contains and how it works, we dive into select research papers. The first subject

we’ll cover is Software Defined Networking (SDN). Next, we’ll look into the state-of-the-art in virtual switching.

To finish up, we cover Mininet, a popular SDN emulator designed for rapid prototyping.

2.1.1 Software Defined Networking (SDN)

Software Defined Networking (SDN) is a relatively new approach to network configuration, aiming to centralize

network intelligence in one network component (the SDN controller), while disassociating the forwarding process

of network packets (the ‘Data Plane’) from the routing process (the ‘Control Plane’) [1].

Essentially, SDN moves the logic of switching and routing (what to do with incoming packets) to a centralized entity (the controller). These controllers are considered the brain of the network. This centralized approach facilitates network management and monitoring, but it also suffers from the drawbacks of centralization: security, scalability and elasticity are the main issues of SDN.
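The split between control plane and data plane can be sketched in a few lines of Python. This is a toy model under stated assumptions, not an OpenFlow implementation: the class names `Controller` and `Switch`, the `routes` policy and the per-destination flow table are all hypothetical. The switch only looks up rules; on a table miss it asks the controller once and caches the answer.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Controller:
    """Control plane: decides where traffic for each destination should go."""

    routes: Dict[str, int]  # destination address -> output port (assumed policy)

    def decide(self, dst: str) -> Optional[int]:
        # None means "no route known, drop the packet".
        return self.routes.get(dst)


@dataclass
class Switch:
    """Data plane: forwards packets using rules installed by the controller."""

    controller: Controller
    flow_table: Dict[str, Optional[int]] = field(default_factory=dict)
    controller_queries: int = 0

    def forward(self, dst: str) -> Optional[int]:
        if dst not in self.flow_table:
            # Table miss: ask the controller once and cache its decision.
            self.controller_queries += 1
            self.flow_table[dst] = self.controller.decide(dst)
        return self.flow_table[dst]


if __name__ == "__main__":
    switch = Switch(Controller(routes={"10.0.0.2": 1}))
    print(switch.forward("10.0.0.2"))   # first packet triggers a controller query
    print(switch.forward("10.0.0.2"))   # second packet is served from the flow table
    print(switch.controller_queries)
```

The sketch also hints at the drawback mentioned above: every table miss depends on the single controller, so its availability and capacity bound the whole network.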

SDN and Security

The centralized nature of SDN allows for some interesting security services: since the controllers act as the brains of the network, they can manipulate the traffic flows. The authors of [2] propose some techniques for incorporating SDN into a security framework. They introduce two examples: a centralized firewall system and a centralized DDoS mitigation system. We’ll cover DDoS attacks in section 2.2.3.

Figure 2.1 shows the generic overview of such a security service. The security application (a firewall, a DDoS mitigation system) manages the SDN controller, which in turn controls the physical switches. The advantage of such an architecture is its flexibility: one is not limited to a particular hardware vendor, and the security application only needs to be able to communicate using an SDN-compatible communication protocol.
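The layering described above (security application, then controller, then switches) can be illustrated with a small Python sketch. It is not the system proposed in [2]; the classes `Switch`, `Controller` and `DDoSMitigator`, as well as the simple per-source threshold policy, are assumptions chosen to keep the example short. The point is that the application only ever talks to the controller, which then programs every switch.

```python
from collections import Counter
from typing import List, Set


class Switch:
    """Stand-in for a physical switch; it only enforces what it is told."""

    def __init__(self) -> None:
        self.blocked: Set[str] = set()

    def allows(self, src: str) -> bool:
        return src not in self.blocked


class Controller:
    """Stand-in for an SDN controller that programs every switch it manages."""

    def __init__(self, switches: List[Switch]) -> None:
        self.switches = switches

    def block_source(self, src: str) -> None:
        for switch in self.switches:
            switch.blocked.add(src)


class DDoSMitigator:
    """Security application: talks to the controller, never to switches directly."""

    def __init__(self, controller: Controller, threshold: int) -> None:
        self.controller = controller
        self.threshold = threshold  # packets per source before blocking (assumed policy)
        self.seen: Counter = Counter()

    def observe(self, src: str) -> None:
        self.seen[src] += 1
        if self.seen[src] > self.threshold:
            self.controller.block_source(src)


if __name__ == "__main__":
    switches = [Switch(), Switch()]
    app = DDoSMitigator(Controller(switches), threshold=3)
    for _ in range(4):
        app.observe("198.51.100.7")
    print([s.allows("198.51.100.7") for s in switches])
```

Swapping the switch hardware would only require a different `Controller` implementation; the security application itself would stay unchanged.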


Figure 2.1: A framework for SDN-based Security Services

Image by Jeong et al. [2]

2.1.2 Virtual Switching

Virtual switching is the concept of switching (data transfer) in a virtualized environment. Virtual switches are generally feature-equivalent with their hardware counterparts, but much more flexible in terms of ease of deployment.

Since we’re building a software package that will depend heavily on virtualization, virtual switching is an

important factor in ChiefNet.

Open vSwitch (OVS)

In [3], Pettit et al. discuss the needs of virtual switching and introduce the most popular open-source virtual switching library to date: Open vSwitch (OVS).

Open vSwitch is a production quality, multilayer virtual switch licensed under the open source

Apache 2.0 license. It is designed to enable massive network automation through programmatic

extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow,

IPFIX, RSPAN, CLI, LACP, 802.1ag). In addition, it is designed to support distribution across multiple

physical servers similar to VMware’s vNetwork distributed vSwitch or Cisco’s Nexus 1000V [4].

The above extract from the Open vSwitch website shows that this is a very versatile, production-ready library,

which can certainly be interesting in the development of ChiefNet. In addition, it supports the Software-Defined

Networking principle.

Open vSwitch can be easily extended, as described in [5]. In this article, Gorja and Kurapati extend OVS with support for Layer 4-7 services (e.g. load balancers, proxies, firewalls).


A short presentation about combining OVS with Docker can be found in [6].

2.1.3 Mininet

Mininet is one of the most prominent software packages for network emulation.

Mininet is an emulator for deploying large networks on the limited resources of a single machine. It was created to enable research in SDN and OpenFlow, and allows creating very large topologies of up to thousands of nodes [7] [8]. OpenFlow is a standard protocol that provides communication between the ‘control plane’ and the ‘data plane’. See section 2.1.1 for more information about the control and data plane.

Inner workings of Mininet

Mininet allows for extremely large deployments on very limited resources through a simple mechanism. Instead of traditional virtualization through Type-1 or Type-2 hypervisors (see [9]), Mininet uses a process-based approach built on namespaces: every virtual host in the topology is a separate bash process running in a separate namespace [10]. This feature is only available in Linux kernels (version 2.2.26 and above).

Limitations

This results in some (very obvious) limitations: Mininet does not support non-Linux hosts (e.g. Windows, Mac OSX, BSD). This is by design: Mininet is not a virtualization platform. In a network emulation suite, the type of host in the network doesn’t really matter; the operator simply needs access to some simple commands (e.g. tcpdump, ping, …).

Another limitation of Mininet is its scalability: although the topologies can be very large on limited resources, scaling Mininet is only possible in a ‘scale-up’ manner. Scale up (also called ‘vertical scaling’) means adding resources to (or removing resources from) a single node in a system, typically by adding CPUs or memory to a single computer [11]. The other category of scaling is called ‘scale out’ (also called ‘horizontal scaling’). Scale out means adding more nodes to (or removing nodes from) a system, such as adding a new computer to a distributed software application [11]. The authors of [12] overcame this limitation by extending Mininet to support distributed topologies (over multiple physical nodes), thereby achieving ‘scale out’ scalability. They aptly called the extension ‘MaxiNet’.
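The difference between the two scaling modes can be made concrete with a small Python sketch that distributes emulated hosts over physical workers. The round-robin placement below is only an illustrative assumption; it is not the partitioning strategy used in [12], and the names `place_hosts` and `worker1` are hypothetical.

```python
from typing import Dict, List


def place_hosts(hosts: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Assign emulated hosts to physical workers in round-robin order."""
    if not workers:
        raise ValueError("at least one worker is required")
    placement: Dict[str, List[str]] = {w: [] for w in workers}
    for i, host in enumerate(hosts):
        placement[workers[i % len(workers)]].append(host)
    return placement


hosts = [f"h{i}" for i in range(1, 7)]

# Scale up: a single machine carries the whole topology, so the only way to
# grow is to give that machine more CPU or memory.
print(place_hosts(hosts, ["worker1"]))

# Scale out: adding machines reduces the number of hosts each one carries.
print(place_hosts(hosts, ["worker1", "worker2", "worker3"]))
```

In a real distributed emulator the placement also has to account for the links between hosts, since traffic that crosses workers must travel over the physical network.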

Although Mininet has some limitations, it is also very easily extended, as demonstrated in [13]: Fontes et al. extended Mininet with a WiFi plugin, which introduces Software Defined Wireless Networking (SDWN). SDWN aims to provide programmatic, centralized control of the network outside the wireless boxes (APs), which enforce the instructions received from the controller and remain responsible for the transmission and reception of the wireless traffic [13].


2.2 Network Attacks

The vast majority of computer attacks fall under the category ‘Network attacks’.

A network attack can be defined as any method, process, or means used to maliciously attempt to

compromise network security. [14]

In the next section we’ll cover the taxonomy of network attacks and look briefly at some popular network attacks.

We then continue by providing a more detailed view of two very common network attacks: Man-in-the-Middle

(MiTM) attacks and Distributed Denial-of-Service (DDoS) attacks.

For a very thorough list of computer attacks, we refer the reader to Table 2 in Anwar et al.’s work “From Intrusion

Detection to an Intrusion Response System: Fundamentals, Requirements, and Future Directions” [15].

2.2.1 Taxonomy & General Network Attacks

A network attack can take many forms. We can divide network attacks into three categories: active attacks, passive attacks and disruptive attacks.

Active attacks cover most network attacks. They are focused, targeted attacks on a specific part of the network,

or even on a single network entity. Trojan Horse, Packet forging, Port scanning and Man-in-the-middle attacks are

all examples of active attacks. These attacks are easier to detect than passive attacks, and are also more easily

counteracted.

Passive attacks are more ‘quiet’ attacks. They can be targeted, but are mostly very broad attacks, covering a

large part of the network. Examples of passive attacks are packet capturing, password sniffing, DNS spoofing, ARP

spoofing, Cross-Site Request Forgery, SQL injection and Fingerprinting attacks. Because of the ‘low key’ nature of these attacks, they are much harder to detect, and they vary in how difficult they are to neutralize.

Disruptive attacks are targeted, catastrophic attacks, designed to disturb or even damage the victim. The main attack in this category is a Distributed Denial-of-Service (DDoS) attack. We’ll cover DDoS attacks in more detail in section 2.2.3.

There are many more aspects to network attacks, but these are out of scope for this thesis. We refer the interested reader to “Network attacks: Taxonomy, tools and systems” [16] by Hoque et al.

The next two sections cover two specific attacks: Man-in-the-middle attacks and Distributed Denial-of-Service

(DDoS) attacks.

2.2.2 Man-in-the-middle (MiTM) attacks

A man-in-the-middle (MiTM) attack is a very popular network attack in which the attacker secretly relays and possibly alters the communication between two parties who believe they are directly communicating with each other [17].

The goal of a MiTM attack can be either active (e.g. denial of service by filtering packets) or passive (e.g. eavesdropping).

Example of a MiTM attack: ARP poisoning

An ARP poisoning (or ‘spoofing’) attack is a kind of MiTM attack. It operates on Layer 2 (Data Link Layer) of the

OSI model [18].

ARP Protocol The ARP protocol (or Address Resolution Protocol) is used by hosts to map a particular IP address

to a hardware address (e.g. the MAC address), so that packets can be transmitted from the source host to the

destination host. The reader can follow the process in Figure 2.2.

In normal operation (i.e. without any ARP poisoning), the ARP protocol works as follows:

1. Host A wishes to send a packet to Host B. Host A knows the IP address of Host B, but doesn’t know the MAC

address.

2. Host A sends a broadcast ARP request to the network, asking for the MAC address that belongs to Host B’s IP address.

3. Host B sees the request from Host A (because it was broadcast), and answers with an ARP reply containing Host B’s MAC address. Unlike the request, this reply is normally sent directly (unicast) to Host A.

4. Host A receives the response from Host B and now knows the MAC address of Host B. Host A updates its ARP cache table with the newly acquired MAC address of Host B.

5. Host A can now send its packet.

We mentioned the ‘ARP cache’. Every host keeps this cache in its memory. The cache is used to avoid an ARP

lookup for every packet. It contains a list of known (IP,MAC) tuples, avoiding the need to constantly request the

MAC addresses when regularly communicating with the same destination IP address. The entries in the table are

periodically refreshed, to ensure up-to-date information. The refresh rate depends on the operating system.

Figure 2.2: The ARP Protocol

ARP Poisoning To perform an ARP poisoning attack, the attacker has to place himself between the sender and

the receiver by faking an answer to the ARP request from the sender. In this case, Host C will answer the ARP

request from Host A. The reader can follow the process in Figure 2.3.

An ARP poisoning attack thus works as follows:

1. Host A wishes to send a packet to Host B. Host A knows the IP address of Host B, but doesn’t know the MAC

address.

2. Host A sends a broadcast ARP request to the network, asking for the MAC address that belongs to Host B’s IP address.

3. Host B sees the request from Host A (because it was broadcast), and answers with an ARP reply containing Host B’s MAC address.

4. Host A receives the response from Host B and updates its ARP cache.

5. However, Host C also sees the request from Host A (because it was broadcast), and answers it too, with a forged reply that pairs Host C’s MAC address with Host B’s IP address.

6. Because Host C answered after Host B, Host A also sees Host C’s response and (again) updates its ARP cache.

Host A now thinks it knows the MAC address of Host B, but in reality this MAC address belongs to Host C.

7. Host A sends the packet to the received MAC address (Host C’s MAC address).

8. Host C receives the packet and has the full choice of what to do with this packet. Host C can store & forward

this packet (sniffing), it can drop the packet (filtering) or can modify the packet (altering).

All subsequent packets flowing from Host A to Host B will pass through Host C, because of the existence of the tuple (IP_HostB, MAC_HostC) in the ARP cache of Host A. If Host C wishes to intercept traffic from Host B to Host A, a similar poisoning is required.
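The steps above can be mimicked with a small toy model. The sketch below sends no real packets; it only models Host A's ARP cache as a dictionary that trusts the most recent reply. The host names, IP addresses and MAC addresses are made up for illustration.

```python
# Toy model of ARP poisoning: no real packets are sent.
# Host names, IP and MAC addresses are made up for illustration.
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    ip: str
    mac: str
    arp_cache: dict = field(default_factory=dict)  # maps IP -> MAC

    def handle_arp_reply(self, sender_ip: str, sender_mac: str) -> None:
        # Like a naive ARP implementation, trust the most recent reply.
        self.arp_cache[sender_ip] = sender_mac


host_a = Host("A", "10.0.0.1", "aa:aa:aa:aa:aa:aa")
host_b = Host("B", "10.0.0.2", "bb:bb:bb:bb:bb:bb")
host_c = Host("C", "10.0.0.3", "cc:cc:cc:cc:cc:cc")  # the attacker

# The genuine reply from Host B reaches Host A first.
host_a.handle_arp_reply(host_b.ip, host_b.mac)
cache_before = dict(host_a.arp_cache)

# Host C's forged reply claims Host B's IP with Host C's MAC.
host_a.handle_arp_reply(host_b.ip, host_c.mac)

# Host A resolves Host B's IP and ends up addressing Host C.
next_hop_mac = host_a.arp_cache[host_b.ip]
print(cache_before[host_b.ip], "->", next_hop_mac)
```

The only 'vulnerability' modelled here is that the cache accepts whichever reply arrives last, which is exactly what the attack relies on.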

Figure 2.3: ARP Poisoning

Detection of a MiTM attack Various detection techniques for MiTM attacks have been proposed in the past.

Trabelsi and Shuaib listed multiple methods in their work “NIS04-4: Man in the middle intrusion detection” [19],

including detection of hosts with enabled IP packet routing and detection of ARP cache poisoning. They also enumerate tools which detect ARP poisoning attacks, such as arpwatch [20] and the popular open-source IDS/IPS

snort [21].
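One way to detect ARP cache poisoning is to watch for an IP address whose associated MAC address suddenly changes. The sketch below illustrates that idea only; it is not how arpwatch or snort are implemented, and the function name and sample addresses are hypothetical.

```python
def find_mapping_changes(observations):
    """Return (ip, old_mac, new_mac) for every IP whose MAC changes.

    `observations` is an iterable of (ip, mac) pairs in arrival order,
    e.g. taken from observed ARP replies.
    """
    known = {}
    alerts = []
    for ip, mac in observations:
        previous = known.get(ip)
        if previous is not None and previous != mac:
            alerts.append((ip, previous, mac))
        known[ip] = mac
    return alerts


observed = [
    ("10.0.0.2", "bb:bb:bb:bb:bb:bb"),
    ("10.0.0.1", "aa:aa:aa:aa:aa:aa"),
    ("10.0.0.2", "cc:cc:cc:cc:cc:cc"),  # same IP, different MAC
]
alerts = find_mapping_changes(observed)
```

A legitimate change (for example, a replaced network card) would raise the same alert, so a check like this can only point an operator at candidates for closer inspection.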

2.2.3 Distributed Denial-of-Service (DDoS) attacks

A Distributed Denial-of-Service (DDoS) attack is an attack with the purpose of preventing legitimate

users from using a specified network resource (e.g. a website, a file server). It’s a coordinated attack

on the availability of services of a given target system or network, that is launched indirectly through

many (often compromised) computing systems. The services under attack are called the ‘primary

victim’, while the compromised systems are called the ‘secondary victims’ [22].

DDoS attacks remain an extremely popular attack in today’s environment. They are a daily hassle for systems

administrators around the globe. Arbor Networks published some statistics in their article 2017 DDoS Attack Activity

[23]: in the 272 recorded days of 2017, 6.1 million DDoS attacks were detected. That averages to about 22,426 attacks per day, or roughly 15 attacks per minute.
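The averages quoted above follow directly from the two reported figures. A quick check, using only the numbers given in the text:

```python
attacks = 6_100_000   # DDoS attacks reported for 2017 [23]
recorded_days = 272   # number of recorded days in 2017 [23]

per_day = attacks / recorded_days
per_minute = per_day / (24 * 60)

print(f"{per_day:.0f} per day, {per_minute:.1f} per minute")
```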

One can view the currently ongoing DDoS attacks on Digital Attack Map (www.digitalattackmap.com)

[24].

Taxonomy Specht and Lee [22] describe the two different types of DDoS attacks: Bandwidth depletion attacks

and Resource depletion attacks.

Bandwidth depletion attacks can be further divided into Flood attacks and Amplification attacks. A flood attack

involves zombies sending large volumes of traffic to a victim system, to congest the victim system’s network bandwidth with IP traffic [22]. UDP flooding and ICMP flooding are examples of flood attacks. An amplification attack

involves the attacker or the zombies sending messages to a broadcast IP address, using this to cause all systems

in the subnet reached by the broadcast address to send a reply to the victim system [22]. Bandwidth depletion

attacks aren’t very popular anymore today, since they are easier to thwart.

Resource depletion attacks are carefully created attacks which misuse the network protocol and thereby compromise the target system. Examples include TCP SYN attacks, TCP PUSH+ACK attacks and HTTP attacks.

Most current DDoS attacks are listed in the above examples, as shown in Figure 2.4. In Figure 2.4, we can see

that the TCP SYN attack is by far the most popular, followed by the UDP flooding and TCP PUSH+ACK attack.

Figure 2.4: Distribution of DDoS attacks by type, Q4 2017

Image by Kaspersky Labs [25]

Specht and Lee also describe some simple DDoS countermeasures, such as disabling the Command & Control

center, egress filtering and preventing secondary victim infection [22].

2.3 Intrusion Detection Systems

There are many ways to categorize Intrusion Detection Systems; we’ll cover the most important below.

2.3.1 Taxonomy

By information source Intrusion Detection Systems (IDS) can be classified into two big categories, depending

on the type of information considered.

A host-based IDS analyzes events such as process identifiers and system calls, mainly related to OS information.

A network-based IDS analyzes network related events: traffic volume, IP addresses, service ports, protocol usage, etc. [26].

By type of analysis We can also classify Intrusion Detection Systems by the type of analysis carried out.

Signature-based schemes, also called misuse schemes, seek defined patterns, or signatures, within the analyzed data. For this purpose, a signature database corresponding to known attacks is specified beforehand.

On the other hand, anomaly-based detectors estimate the ‘normal’ state of a system, and generate an alarm

whenever the recorded state deviates from this prerecorded ‘normal’ state [26].

By detection method Yet another classification can be made by looking at the detection method of the IDS.

Three main classes are distinguished:

Statistical-based methods use statistics & mathematical functions for detecting signatures and anomalies. For

example, in an anomaly-based IDS, the source events are captured and a profile (the ‘normal state’) representing

its stochastic behavior, is created. Two datasets of events are considered during the anomaly detection process:

one corresponds to the currently observed profile, and the other is for the previously trained statistical profile.

As the events occur, the current profile is constructed and an anomaly score is estimated by comparing the two

behaviors. The score normally indicates the degree of irregularity for a specific event, such that the IDS will flag

the occurrence of an anomaly when the score surpasses a certain threshold [26].
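As a minimal sketch of this scoring scheme, assume the trained profile is just the mean and standard deviation of one observed quantity, and the anomaly score is the distance from that mean in standard deviations. The sample values and the threshold are hypothetical.

```python
from statistics import mean, stdev


def build_profile(training_values):
    """Summarise 'normal' behaviour as a mean and standard deviation."""
    return mean(training_values), stdev(training_values)


def anomaly_score(value, profile):
    """Distance from the trained mean, in standard deviations."""
    mu, sigma = profile
    return abs(value - mu) / sigma


# Hypothetical packets-per-second samples captured during normal operation.
normal_rates = [98, 102, 100, 97, 103, 101, 99, 100]
profile = build_profile(normal_rates)

THRESHOLD = 3.0  # assumed cut-off; tuning it trades false alarms for misses


def is_anomaly(value):
    return anomaly_score(value, profile) > THRESHOLD
```

With this profile, a rate of 101 scores well below the threshold, while a rate of 250 is flagged.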

Knowledge-based methods use a set of rules. These rules are structured like an IF-THEN programmatic statement: each rule specifies a list of requirements for the rule to be activated, after which the result of the rule is applied to the source event. The main drawback of knowledge-based methods is that the development of high-quality knowledge (i.e. high-quality rules) is often difficult and time-consuming. An example of a knowledge-based IDS is the popular open-source IDS snort [21, 27].

Machine learning-based methods use Artificial Intelligence (AI) and Machine Learning (ML) approaches to establish a ‘baseline’ model. This ‘baseline’ model corresponds to the ‘normal state’ in statistical-based methods. The big difference between statistical-based and machine learning-based methods is that ML methods have the ability to change their execution strategy as they acquire new information. In other words, the ‘normal state’ is forever-learning: a persistent change in the environment will initially be flagged as an anomaly, but will eventually be incorporated into the ‘normal’ state.
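The ‘forever-learning’ behaviour can be illustrated with a deliberately simple stand-in for a real learning algorithm: an exponentially weighted moving average that is updated with every observation. The class name, learning rate and tolerance below are assumptions chosen for illustration.

```python
class AdaptiveBaseline:
    """Running estimate of the 'normal' value that keeps learning."""

    def __init__(self, initial, alpha=0.2, tolerance=0.5):
        self.level = float(initial)
        self.alpha = alpha          # learning rate (assumed value)
        self.tolerance = tolerance  # allowed relative deviation (assumed)

    def observe(self, value):
        """Flag the value against the current baseline, then learn from it."""
        flagged = abs(value - self.level) > self.tolerance * self.level
        self.level += self.alpha * (value - self.level)
        return flagged


baseline = AdaptiveBaseline(initial=100)
# Traffic shifts permanently from about 100 to 200 units.
flags = [baseline.observe(200) for _ in range(10)]
```

The first observations after the shift are flagged, after which the baseline has moved far enough that the new level counts as normal. A model trained once would keep flagging the new level indefinitely.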

As ChiefNet is aimed towards anomaly-based IDSs, we will cover these in more detail in the next section.

2.3.2 Anomaly-based Network Intrusion Detection Systems

We now focus on anomaly-based network intrusion detection systems.

Statistical based

Statistical models can be used in anomaly detection, for example, by capturing a live network stream and building a profile. This profile, called the ‘normal state’, can then be compared to other network streams to classify

them as ‘normal’ or ‘abnormal’ (or ‘anomaly’). Statistical models can be further classified into univariate models,

multivariate models and time series models.

• Univariate models represent the parameters as independent Gaussian random variables, thus defining an acceptable range for every variable [26].

• Multivariate models also consider correlations between two or more variables. These models perform better because experimental data has shown that a better level of discrimination can be obtained from combinations of related variables, rather than individual parameter comparisons [26].

• Time series models use an interval timer, together with an event counter and take into account the order and the inter-arrival times of the observations as well as their values. Thus, an observed event will be labelled as abnormal if its probability of occurrence is too low at a given time [26].
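To see why combining related variables can discriminate better than judging each variable alone, consider the sketch below. It compares per-variable z-scores with the Mahalanobis distance for two variables; the Mahalanobis distance is our choice of multivariate score for illustration, not one prescribed by [26], and the training samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical training samples: (packets per second, kilobytes per second).
samples = [(10, 20), (20, 40), (30, 61), (40, 79), (50, 100)]
xs = [x for x, _ in samples]
ys = [y for _, y in samples]
mx, my = mean(xs), mean(ys)
vx, vy = variance(xs), variance(ys)
# Sample covariance between the two variables.
cxy = sum((x - mx) * (y - my) for x, y in samples) / (len(samples) - 1)


def univariate_scores(x, y):
    """Per-variable z-scores; each variable is judged on its own."""
    return abs(x - mx) / sqrt(vx), abs(y - my) / sqrt(vy)


def multivariate_score(x, y):
    """Mahalanobis distance for two variables, which accounts for correlation."""
    dx, dy = x - mx, y - my
    det = vx * vy - cxy * cxy
    return sqrt((vy * dx * dx - 2 * cxy * dx * dy + vx * dy * dy) / det)


# Many packets but very little data: each value alone looks ordinary.
suspicious = (45, 30)
```

For the suspicious point, both z-scores stay below 1, so a univariate model would accept it. The multivariate score is very large, because the point breaks the strong correlation between the two variables seen during training.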

In addition, Garcia-Teodoro et al. lay out some major drawbacks to statistical-based detection methods in [26]. First, this kind of IDS is susceptible to being wrongly trained by the attacker (to make the ‘normal state’ contain malicious

entries). Second, not all variables in a network stream can be modeled by stochastic methods. Furthermore, these

schemes all rely on the assumption that the ‘normal state’ of a network won’t change in the future, which is not

very realistic.

Knowledge based

The knowledge-based detection method, or ‘expert system approach’, is the most widely used detection method

today. This prevalence can be attributed to the simplicity of the method: a set of rules makes up the detection

strategy. These rules are structured much like an IF-THEN programming statement, for example:

IF

(the number of packets from this IP address) > $threshold

THEN

(mark this flow as a DDoS attack)
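Expressed as a small program, the rule above could look as follows. This is a sketch under stated assumptions: the threshold value, the observation window and the function name are hypothetical, and the example addresses come from documentation address ranges.

```python
from collections import Counter

THRESHOLD = 1000  # assumed packets per observation window


def flag_ddos_sources(source_ips, threshold=THRESHOLD):
    """Apply the IF-THEN rule to every source address in one window."""
    counts = Counter(source_ips)
    return {ip for ip, n in counts.items() if n > threshold}


# Hypothetical window: one noisy source, one quiet source.
window = ["203.0.113.7"] * 1500 + ["198.51.100.4"] * 20
flagged = flag_ddos_sources(window)
```

The rule is trivial to write, but choosing a threshold that separates attacks from busy legitimate clients is the hard part.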

As previously stated, the main drawback of knowledge-based methods is that the development of high-quality knowledge (i.e. high-quality rules) is often difficult and time-consuming. Another disadvantage of knowledge-based techniques is their inability to detect unknown attacks: their performance depends on the knowledge available to the operator.

Machine learning based

Machine learning is a field of computer science that uses statistical techniques to give computer systems the ability to “learn” (i.e. progressively improve performance) with data, without being explicitly programmed. [28]

In many cases, machine learning techniques are very similar to statistical techniques. The big difference lies

in the execution strategy: statistical models are trained once, and are as such invariant. Machine learning models

are trained throughout the execution process, which enables them to modify their model according to (steady)

changes in the environment.

Many different machine learning algorithms exist; we will only cover the most common here.

• Bayesian networks are models that encode probable relationships between the variables of interest. This scheme has the advantage that it doesn’t treat input variables as independent; instead, it searches for correlations and inter-dependencies.

Bayesian networks perform very well in intrusion detection systems, as researched by Hamid, Sugumaran,

and Journaux in “Machine Learning Techniques for Intrusion Detection: A Comparative Analysis” [29]. In

their experiments, Bayesian networks achieved a correct classification rate of 99.670%.

However, as noted by Kruegel et al. in [30], Bayesian networks have major disadvantages. For instance, they require a very high computational effort, while the results are highly dependent on the assumptions about the behavior of the source system. The computation time was not a performance metric in the aforementioned study; this should be taken into account when comparing the results.

• Markov models are based on the assumption that the input system has a set of states, but the way of

transitioning between these states is unknown. Markov models estimate these transitions by constructing

a graph, where the vertices represent system states and the edges represent transitions. Each edge has a

probability, which represents the probability to move from one vertex (state) to the other.

Markov models have been extensively used in IDS research, both in host-based and in network-based systems [31, 32]. The accuracy (correct classification rate) is measured at around 97% [33].

Nonetheless, Markov models suffer from the same deficiencies as Bayesian networks: they are highly dependent on the assumptions about the accepted behavior for the source system.

• Genetic algorithms are inspired by evolutionary biology: they use techniques such as inheritance, mutation,

selection and recombination. These techniques are well known for their capability of deriving classification

rules and selecting appropriate parameters for the detection process [26].

These algorithms solve the deficiencies of Bayesian networks and Markov models: they require far fewer assumptions about the source system. Genetic algorithms ‘learn’ the importance of input parameters [34].

The main flaw in genetic algorithms is the high resource consumption involved.

• A neural network is a series of algorithms that attempts to identify underlying relationships in a set of data

by using a process that mimics the way the human brain operates [35]. An IDS architecture is proposed by

Zhang et al. in “HIDE: a hierarchical network intrusion detection system using statistical preprocessing and

neural network classification” [36], containing neural networks as the primary classification technique.

The accuracy of neural networks is measured at 98.75% [33].

Because of the way a neural network works, there is one fundamental limitation to these techniques: a

neural network cannot provide a descriptive reason as to why a particular detection decision has been made.

In other words, a neural network can tell you that some input event is detected as malicious, but can’t tell

you why.
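Of the techniques listed above, the Markov model is the easiest to sketch in a few lines. The example below assumes a plain first-order Markov chain: transition probabilities are estimated from sequences observed during normal operation, and a new sequence is scored by the probability the model assigns to it. The state names and training sequences are hypothetical and heavily simplified.

```python
from collections import defaultdict


def train_transitions(sequences):
    """Estimate P(next state | current state) from observed state sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def sequence_probability(seq, transitions):
    """Probability of a sequence under the model; unseen transitions give 0."""
    prob = 1.0
    for current, nxt in zip(seq, seq[1:]):
        prob *= transitions.get(current, {}).get(nxt, 0.0)
    return prob


# Hypothetical, simplified connection states seen in normal traffic.
normal = [
    ["SYN", "SYN-ACK", "ACK", "DATA", "FIN"],
    ["SYN", "SYN-ACK", "ACK", "DATA", "DATA", "FIN"],
]
model = train_transitions(normal)
```

A sequence of repeated `SYN` states receives probability zero under this model, because that transition never occurred in the training data. This also shows the dependence on assumptions noted above: anything absent from the training sequences is treated as abnormal.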

2.3.3 Anatomy of an IDS

The above section provided us with a clear overview of the different techniques used in intrusion detection systems. Detection techniques are the main aspect of IDSs, but certainly aren’t the only one. The next sections cover select other components of an IDS: the defensive life cycle, the difference between intrusion detection, prevention and response systems, general architecture and some mitigation strategies.

The Defensive Life Cycle The defensive life cycle represents the course of actions when an event is submitted

to the IDS.

• The Prevention phase tries to stop attacks before they happen. This is usually done by an Intrusion Prevention System (IPS), as we will see in the next paragraph.

• Detection is performed by the IDS, using one or more of the techniques we covered previously.

• Mitigation and Response is only performed if an attack was executed (and detected). This is performed by

an Intrusion Response System (IRS), as we will see in the next paragraph.

• In the last phase, Update, the system examines the applied response. In other words, if the attack was

successfully thwarted, no further action is required. If not, additional defensive measures can be activated.

A graphical representation of the defensive life cycle can be seen in Figure 2.5.

Figure 2.5: The defensive life cycle

Image by Anwar et al. [15]

Intrusion Detection, Prevention and Response Systems We’ll cover three types of systems that complement intrusion detection: Intrusion Tolerant Systems, Intrusion Prevention Systems (IPS) and Intrusion Response Systems (IRS) [15].

An Intrusion Tolerant System has the capability to maintain its integrity, confidentiality, and availability even

when some of its components are infected or offline.

An Intrusion Prevention System is a proactive system: it tries to stop attacks before they occur. Examples

include firewalls and DDoS mitigation systems.

The Intrusion Response System is only activated when the IPS fails, and the attack is executed (and detected).

However, this response must be adequately suited to the nature of the attack: an incoming DDoS attack won’t

be mitigated by a malware scan. A thorough list of attacks (and proposed responses) can be found in Anwar et

al.’s work “From Intrusion Detection to an Intrusion Response System: Fundamentals, Requirements, and Future

Directions” [15].

Figure 2.6: General architecture of an Intrusion Detection System

Image by Kizza [37]

Architecture The general architecture of an IDS is shown in Figure 2.6.

Lots of end-devices (smartphones, laptops, desktops) are connected to a server through a computer network

(e.g. WiFi, Ethernet). This server acts as the first gateway for these end-devices. The server thus has the ability to

modify or filter traffic to these end-devices, which makes it an ideal candidate to function as an IRS.

The server is connected to an IDS node. We will call this node the secondary IDS node, for reasons which

will become clear in the next paragraph. This node is located behind the main firewall, and performs intrusion

detection for all nodes inside the network. Since this IDS node is located between the firewall and the server, all

traffic to/from end-devices flows through here. The secondary IDS node is connected to an IRS: detected threats

will be passed to the IRS and subsequently mitigated.

Connected to the secondary IDS (a network-based IDS), the host-based IDS system is actively monitoring all

devices in this network.

The secondary IDS node is connected to the main firewall. The firewall is an Intrusion Prevention System (IPS), filtering traffic as it flows between the internal network and the (untrusted) internet.

The last node in the network is the primary IDS node. This node is directly connected to the internet. It might

seem strange to have two network-based IDS systems, but the reason is fairly simple: because the firewall acts as

an IPS, the secondary IDS node might not detect attacks that were mitigated by the firewall. Therefore, we place a

similar IDS before the firewall, which will detect these thwarted attacks. The primary IDS is not connected to any

IPS or IRS.

Mitigation In [15], the authors list some simple response options which can be used by an IRS to mitigate an

attack:

• Do nothing

• Generate an alarm

• Isolate the affected node

• Completely disconnect the node from the network

• Relocate the affected node to a secure environment (e.g. a DMZ)

• Disable the affected service

• Start a virus- and/or malware scan

• Create a backup of the infected device
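Because a response has to suit the nature of the attack, an IRS needs some policy that maps a detected attack type to one or more of the options above. The sketch below shows one possible shape for such a policy. The attack-type names and the pairings in `POLICY` are purely hypothetical and are not taken from [15].

```python
from enum import Enum


class Response(Enum):
    DO_NOTHING = "do nothing"
    ALARM = "generate an alarm"
    ISOLATE = "isolate the affected node"
    DISCONNECT = "disconnect the node from the network"
    RELOCATE = "relocate the node to a secure environment"
    DISABLE_SERVICE = "disable the affected service"
    SCAN = "start a virus and/or malware scan"
    BACKUP = "create a backup of the infected device"


# Hypothetical policy: these pairings are illustrative, not taken from [15].
POLICY = {
    "arp_poisoning": [Response.ALARM, Response.ISOLATE],
    "malware": [Response.ISOLATE, Response.SCAN, Response.BACKUP],
}


def select_responses(attack_type):
    """Look up the responses for a detected attack; default to an alarm."""
    return POLICY.get(attack_type, [Response.ALARM])
```

Falling back to an alarm for unknown attack types is a design choice made for this sketch: it keeps a human in the loop when no suitable automated response is known.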

2.4 Data Collection

In order to provide a ‘big data approach’ to intrusion detection systems, ChiefNet needs to collect and store very

large amounts of data. In this section, we’ll briefly cover a limited selection of popular ways of storing vast amounts

of data.

In ChiefNet, we identify two big categories of data to collect: network traffic and metrics.

2.4.1 Network Traffic Storage

Network traffic is certainly the largest data chunk in terms of storage. ChiefNet will capture and store network

traffic flowing in the virtual network. This network traffic will be processed in real time, so speed is an important

factor.

This presents us with two challenges: a) storing vast amounts of data, and b) defining a format to store the data (in our case, network traffic capture files).

We’ll cover these challenges in the next two sections.

Storage

In order to store very large amounts of data, which could very well exceed the boundaries of what a single storage

node could provide, we need a way of splitting up this data and storing it on a collection of storage nodes.

A distributed file system (DFS) or network file system is any file system that allows access to files from multiple hosts via a computer network. This makes it possible for multiple users on multiple machines to share files

and storage resources [38].
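The basic idea of splitting data and spreading it over several storage nodes can be sketched as follows. This is a deliberately naive illustration with made-up node names and a tiny block size; real distributed file systems add replication, metadata management and far more sophisticated placement, and none of them is claimed to work this way.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes):
    """Assign blocks to storage nodes round-robin; returns node -> block indices."""
    placement = {node: [] for node in nodes}
    for index in range(len(blocks)):
        placement[nodes[index % len(nodes)]].append(index)
    return placement


capture = bytes(range(10)) * 10          # 100 bytes standing in for a large file
blocks = split_into_blocks(capture, 32)  # 4 blocks: 32 + 32 + 32 + 4 bytes
placement = place_blocks(blocks, ["node-1", "node-2", "node-3"])
```

Reading the file back means fetching the blocks from their nodes and concatenating them in index order.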

There are a number of popular DFS solutions freely available; we’ll list some here:

• CephFS Ceph is a free-software storage platform that implements object storage on a distributed computer cluster, and provides interfaces for object-, block- and file-level storage. Ceph aims primarily for

completely distributed operation without a single point of failure, scalable to the exabyte level. Ceph’s file

system (CephFS) runs on top of the same object storage system that provides object storage and block device

interfaces [39].

• GlusterFS GlusterFS is a scale-out network-attached storage file system. It has found applications including cloud computing, streaming media services, and content delivery networks. GlusterFS was developed

originally by Gluster, Inc. and then by Red Hat, Inc., as a result of Red Hat acquiring Gluster in 2011 [40].

• HDFS Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many

computers to solve problems involving massive amounts of data and computation. It provides a software

framework for distributed storage and processing of big data using the MapReduce programming model.

Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system written in Java for

the Hadoop framework. It can store very large files (typically in the range of gigabytes to terabytes) across

multiple machines [41].

A more complete list of DFS software can be found in [38].

Format

We now have some options available to us as to how we’re going to store very large files across multiple machines,

but the question of how we’re going to structure data in this filesystem remains unanswered.

In the case of network traffic capture, three main file formats are available: structured text files, Apache Parquet

and raw PCAP.

• Text Text files, in the form of formatted CSV files, are the simplest option of the three. They store

packets in a line-by-line fashion, where each line contains the captured properties of that packet. Research

has determined that text may be suitable for the analysis of network data on Hadoop, especially when older

routers generate network flow data in plain text format. However, these studies have also determined that

a more sophisticated data format such as SequenceFiles or Parquet may be more beneficial for network

analytics to yield better compression and easier parsing [42, 43].

• Parquet Parquet is a columnar storage format built for the Hadoop ecosystem based off Google’s Dremel

system. It was optimized both for large-scale query processing and storage through multiple supported

compression formats [44, 43]. An issue with Parquet is that little research has been done to study the

processing required for converting the raw PCAP data into Parquet formats.

• PCAP Packet Capture files (PCAP) are the native format in which most network capture tools will capture the

network traffic. PCAP is a simple format, where all packets are stored in a sequentially ordered fashion.

PCAP thus has the advantage that no additional conversion must be performed before the processing can

begin, allowing more rapid data processing. However, research shows that PCAP may be too CPU intensive

for processing [45].
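To make the difference between the formats concrete, the sketch below builds a tiny capture in the classic libpcap file layout in memory and converts it into CSV lines, one per packet. It handles only little-endian files with microsecond timestamps, and the chosen CSV columns are our own assumption; a real conversion would also decode protocol fields from the packet bytes.

```python
import csv
import io
import struct

GLOBAL_HEADER = struct.Struct("<IHHiIII")  # classic libpcap file header
RECORD_HEADER = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len


def make_pcap(packets):
    """Build a tiny little-endian PCAP file in memory from (ts_sec, payload) pairs."""
    out = io.BytesIO()
    out.write(GLOBAL_HEADER.pack(0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
    for ts_sec, payload in packets:
        out.write(RECORD_HEADER.pack(ts_sec, 0, len(payload), len(payload)))
        out.write(payload)
    return out.getvalue()


def pcap_to_csv(pcap_bytes):
    """Walk the records sequentially and emit one CSV line per packet."""
    stream = io.BytesIO(pcap_bytes)
    magic = GLOBAL_HEADER.unpack(stream.read(GLOBAL_HEADER.size))[0]
    if magic != 0xA1B2C3D4:
        raise ValueError("only little-endian, microsecond PCAP is handled here")
    text = io.StringIO()
    writer = csv.writer(text, lineterminator="\n")
    writer.writerow(["ts_sec", "ts_usec", "captured_len", "original_len"])
    while True:
        header = stream.read(RECORD_HEADER.size)
        if len(header) < RECORD_HEADER.size:
            break
        ts_sec, ts_usec, incl_len, orig_len = RECORD_HEADER.unpack(header)
        stream.read(incl_len)  # skip the packet bytes; only metadata is kept
        writer.writerow([ts_sec, ts_usec, incl_len, orig_len])
    return text.getvalue()


csv_text = pcap_to_csv(make_pcap([(1, b"\x00" * 60), (2, b"\x00" * 1500)]))
```

The conversion step discards the packet contents and keeps only what fits the chosen columns. This illustrates the trade-off between the formats: a fixed schema is easier to query, but information that does not fit the schema is lost.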

In Saavedra and Yu’s work “A Comparison between Text, Parquet, and PCAP Formats for Use in Distributed

Network Flow Analysis on Hadoop” [43], these three formats are compared and evaluated for use in an Apache

Hive [46] system. Apache Hive is a data warehouse software platform [46]. Data warehousing is different from data analysis: warehousing depends much more on efficient storage to allow greater performance when analysis is done later on. Data analysis, like we will be using, is less dependent on this optimized storage format: data flows

in, gets processed and analytics flow out. Efficient storage is not a hard requirement because of the ‘single read’

nature. Nevertheless, Saavedra and Yu’s conclusions may interest us.

In their study, Saavedra and Yu determine that the Parquet and text formats greatly outperform the use of raw

PCAP files, at the expense of large data loss due to the need to create a well-defined schema for processing and

the conversion time needed to shift from one format to another [43].

2.4.2 Metric Storage

A metric is a measurement of some kind, expressing the value of a sensor or detector at some point in time. In our

use case, the ‘sensors’ will be virtual sensors, which monitor several aspects of the virtual network and the hosts

within that network. Examples include CPU usage, memory usage, a list of active processes, etc.

Metric storage is commonly achieved using a time-series database (TSDB). A time-series database is a software

system that is optimized for handling time series data, arrays of numbers indexed by time (a datetime or a datetime

range) [47]. Metrics are, by definition, time-series data.
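The core operation a time-series database optimizes is storing values indexed by time and retrieving them by time range. A minimal in-memory sketch of that idea is shown below; the class and metric names are hypothetical, and it says nothing about how any particular TSDB is implemented.

```python
from bisect import bisect_left, bisect_right, insort


class MetricSeries:
    """One metric as a list of (timestamp, value) points kept sorted by time."""

    def __init__(self, name):
        self.name = name
        self.points = []

    def add(self, timestamp, value):
        insort(self.points, (timestamp, value))

    def query(self, start, end):
        """Return all points with start <= timestamp <= end."""
        lo = bisect_left(self.points, (start, float("-inf")))
        hi = bisect_right(self.points, (end, float("inf")))
        return self.points[lo:hi]


cpu = MetricSeries("cpu_usage_percent")  # hypothetical metric name
for ts, value in [(30, 41.0), (10, 12.5), (20, 80.2), (40, 15.3)]:
    cpu.add(ts, value)
recent = cpu.query(20, 30)
```

Keeping the points sorted by timestamp means a range query only needs two binary searches, regardless of the order in which measurements arrive.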

Several popular TSDB software suites include:

• InfluxDB InfluxDB is an open-source time series database developed by InfluxData. It is written in Go and

optimized for fast, high-availability storage and retrieval of time series data in fields such as operations

monitoring, application metrics, Internet of Things sensor data, and real-time analytics [48].

• Graphite Graphite is a free open-source software tool that monitors and graphs numeric time-series data

such as the performance of computer systems. Graphite was developed by Orbitz and released as open

source software in 2008 [49].

• OpenTSDB OpenTSDB, or Open Time Series Database, is a distributed, horizontally scalable Time Series

Database written on top of HBase [50]. OpenTSDB was written to address a common need: store, index

and serve metrics collected from computer systems (network gear, operating systems, applications) at a

large scale, and make this data easily accessible and graphable [51].


Thanks to HBase’s scalability, OpenTSDB allows you to collect thousands of metrics from tens of thousands

of hosts and applications, at a high rate (every few seconds). OpenTSDB will never delete or downsample

data and can easily store hundreds of billions of data points [51].

A more complete list of time-series databases can be found in [47].

2.5 Data Processing

To process the aforementioned data, we’ll need special techniques and software, specialized for ‘big data’ environments.

There are many big data processing frameworks available today, of which the most popular reside

under the Apache Incubator program. We can’t possibly cover them all here, but we’ll cover the most popular. A

full list of projects in the ‘big data’ category in the Apache Incubator program can be found in [52].

We’ll start by categorizing them according to their default processing engine and way of processing data, after

which we’ll cover some popular frameworks: Apache Hadoop, Apache Samza, Apache Flink, Apache Storm and

Apache Spark.

2.5.1 Taxonomy

Processing frameworks and processing engines are responsible for computing over data in a data

system. While there is no authoritative definition setting apart ‘engines’ from ‘frameworks’, it is

sometimes useful to define the former as the actual component responsible for operating on data

and the latter as a set of components designed to do the same [53].

To categorize these systems, we group the processing frameworks by the way they handle chunks of data.

We identify three main categories: Batch Processing Systems, Stream Processing Systems and Hybrid Processing

Systems.

Batch Processing Systems

Batch processing involves operating over a large, static dataset and returning the result at a later time when

the computation is complete [53]. Datasets in batch processing systems are typically bounded, which allows for

processing all available input data before returning a result. Because batch processing excels at handling large

volumes of persistent data, it frequently is used with historical data.

The main drawback is found in the nature of batch processing: the system only produces a result after all

available input data has been read. This input data can be very large, which results in very long computation times.

Batch processing thus isn’t ideal in situations where obtaining fast results is important.


Stream Processing Systems

Stream processing systems compute over data as it enters the system. This requires a different processing model

than the batch paradigm. Instead of defining operations to apply to an entire dataset, stream processors define

operations that will be applied to each individual data item as it passes through the system [53].

Stream processing systems divide up the dataset into smaller batches. The full dataset (which may not be fully

available when computation starts) is called the total dataset. The smaller parts are called working datasets.

Stream processing systems can handle a nearly unlimited amount of data, but they only process one (true

stream processing) or very few (micro-batch processing) items at a time, with minimal state being maintained in

between records [53]. Figure 2.7 shows the general idea of a micro-batch processing framework.

Figure 2.7: General idea of a micro-batch processing framework.

Original image from [54]

Stream processing excels at near real-time processing, with both true stream processing and

micro-batch processing techniques. It is able to process (possibly previously unknown) input data with very low

latency. Stream processing systems are thus very popular in environments where immediate feedback is desired.
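The difference between the batch, true stream and micro-batch models can be sketched with a running mean. This is a simplified illustration in plain Python, assuming a small list of hypothetical sensor readings; the function names are chosen for this example and do not belong to any framework.

```python
from itertools import islice


def batch_mean(dataset):
    """Batch: the whole bounded dataset is read before a result exists."""
    return sum(dataset) / len(dataset)


def stream_means(items):
    """True streaming: update a small state per item, emit a result each time."""
    count, total = 0, 0.0
    for item in items:
        count += 1
        total += item
        yield total / count


def micro_batch_means(items, batch_size):
    """Micro-batching: collect a few items, then process them together."""
    iterator = iter(items)
    count, total = 0, 0.0
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        count += len(chunk)
        total += sum(chunk)
        yield total / count


readings = [2.0, 4.0, 6.0, 8.0]
batch_result = batch_mean(readings)
stream_results = list(stream_means(readings))
micro_results = list(micro_batch_means(readings, batch_size=2))
```

The batch function returns one value after reading everything, the streaming function yields an updated value after every item, and the micro-batch function yields one value per small group of items.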

Hybrid Processing Systems

Some processing frameworks can handle both batch and stream workloads. These frameworks simplify diverse

processing requirements by allowing the same or related components and APIs to be used for both types of data

[53].

While projects focused on one processing type may be a close fit for specific use-cases, the hybrid frameworks

attempt to offer a general solution for data processing. They not only provide methods for processing over data,

they have their own integrations, libraries, and tooling for doing things like graph analysis, machine learning, and

interactive querying [53].

2.5.2 Frameworks

The previous section explained the three kinds of processing systems. We can now look at some popular processing

frameworks, which may clarify the above definitions.


Apache Hadoop

Apache Hadoop is a processing framework that exclusively provides batch processing. Hadoop was the first big data

framework to gain significant traction in the open-source community. Based on several papers and presentations

by Google about how they were dealing with tremendous amounts of data at the time, Hadoop re-implemented

the algorithms and component stack to make large scale batch processing more accessible [53].

Hadoop provides an ecosystem of tools and techniques that work together to process data. The most important

components are:

• HDFS We covered the Hadoop Distributed File System in section 2.4.1. HDFS is the distributed filesystem layer

that coordinates storage and replication across the cluster nodes [53].

• YARN Yet Another Resource Negotiator (YARN) is the cluster coordinating component of the Hadoop stack.

It is responsible for coordinating and managing the underlying resources and scheduling jobs to be run [53].

• MapReduce MapReduce is the default processing engine of Hadoop. The detailed workings of MapReduce

are out of the scope of this document, but it’s important to remark that MapReduce reads out all data,

divides up the total dataset among the processing nodes and combines the results of each of these parts

into a single result.
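The divide-and-combine idea behind MapReduce can be sketched in a few lines of plain Python. This is not Hadoop's API: the functions `map_phase`, `shuffle` and `reduce_phase`, the sample lines and the split over two "nodes" are all assumptions made for this word-count illustration.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of the input."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across the chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


lines = ["alert tcp any", "alert udp any", "drop tcp any"]
# Divide the total dataset among two hypothetical processing nodes.
chunks = [lines[:2], lines[2:]]
counts = reduce_phase(shuffle(map_phase(chunk) for chunk in chunks))
```

In a real cluster each chunk would be mapped on a different node; here the chunks are simply processed one after another.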

Apache Samza

Apache Samza is a stream processing framework that is tightly tied to the Apache Kafka [55] messaging system.

While Kafka can be used by many stream processing systems, Samza is designed specifically to take advantage of

Kafka’s unique architecture and guarantees. It uses Kafka to provide fault tolerance, buffering, and state storage

[53]. A visual representation of the integration of Samza and Kafka can be found in Figure 2.8.


Figure 2.8: Integration of Apache Samza and Apache Kafka.

Image from [56]

Because Kafka represents an immutable log, Samza deals with immutable streams. This means that any

transformations create new streams that are consumed by other components without affecting the initial stream

[53].

A big advantage of Samza is the ability to deal with backpressure. Backpressure occurs when the source

system provides a sudden influx of data, which the processing system can’t handle in real time, leading to delays

and possible data loss. Samza can cope with backpressure by using the Kafka messaging system, which is designed

to retain data for long periods of time without any data loss. Kafka simply acts as a ‘buffer’ for the sudden influx

of data, which gets processed by Samza at a later time.
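The buffering effect described above can be simulated with a toy append-only log and a consumer that tracks its own read offset. The class `AppendOnlyLog` is a stand-in written for this sketch and is not the Kafka client API; the arrival pattern and the capacity of three records per tick are assumed values.

```python
class AppendOnlyLog:
    """A toy stand-in for a retained message log (not the Kafka API)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def read(self, offset, max_records):
        """Return up to max_records starting at offset; nothing is removed."""
        return self._records[offset:offset + max_records]

    def __len__(self):
        return len(self._records)


def run(arrivals_per_tick, capacity_per_tick):
    """Simulate a bursty producer and a consumer with fixed capacity."""
    log, offset = AppendOnlyLog(), 0
    processed, lag_history = [], []
    next_id = 0
    for arrivals in arrivals_per_tick:
        for _ in range(arrivals):
            log.append(next_id)
            next_id += 1
        batch = log.read(offset, capacity_per_tick)
        processed.extend(batch)
        offset += len(batch)
        lag_history.append(len(log) - offset)  # records still waiting
    return processed, lag_history


# A burst of 8 records in the second tick exceeds the capacity of 3 per tick.
processed, lag = run([2, 8, 0, 0, 0], capacity_per_tick=3)
```

The lag grows during the burst and drains afterwards, while every record is eventually processed in its original order.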

The major disadvantage of Samza is also related to the Kafka messaging system: the input data must be

publishable in Kafka, which may not always be possible or practical. Raw or unstructured data is complicated to

publish into Kafka because of its disorganized nature.

Apache Flink

Apache Flink is a stream processing framework that can also handle batch tasks. It considers batches to simply be

data streams with finite boundaries, and thus treats batch processing as a subset of stream processing [53, 57].

Flink’s batch processing model in many ways is just an extension of the stream processing model. Instead of

reading from a continuous stream, it reads a bounded dataset off of persistent storage as a stream. Flink uses the

exact same runtime for both of these processing models [53].
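The idea that a batch is just a stream with an end can be illustrated with Python generators: one operator is written once and applied to a bounded and an unbounded source. This sketch does not use Flink's API; the function names and packet sizes are assumptions for illustration.

```python
from itertools import count, islice


def flag_large_packets(stream, threshold):
    """One operator, written once, applied to any iterable of packet sizes."""
    for size in stream:
        if size > threshold:
            yield size


def bounded_source():
    """A 'batch': a finite dataset read from storage, exposed as a stream."""
    yield from [64, 1500, 512, 9000]


def unbounded_source():
    """A stream without an end: sizes 0, 1000, 2000, ... forever."""
    return (n * 1000 for n in count())


batch_result = list(flag_large_packets(bounded_source(), threshold=1000))
# An unbounded stream never finishes, so only a prefix of the output is taken.
stream_prefix = list(islice(flag_large_packets(unbounded_source(), 1000), 3))
```

The operator itself does not know or care whether its input ever ends, which is the essence of treating batch processing as a special case of stream processing.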

Flink is a good all-round data processing framework, which can handle both streaming and batch processing.

Since its first release in 2014, Flink has seen 45 releases to date, the latest being 1.5.0-rc1. Flink is

quickly becoming the most popular stream processing framework, but it is still relatively young and the community


around Flink isn’t as well established as some of the alternatives.

Apache Storm

Apache Storm is a stream processing framework that focuses on extremely low latency and is perhaps the best

option for workloads that require near real-time processing. It can handle very large quantities of data and

deliver results with less latency than some other solutions [53].

Storm uses Directed Acyclic Graphs (DAGs) to represent the steps that need to be taken in the processing. Storm provides

an at-least-once processing guarantee, which means that every incoming piece of data will be processed at least

once. Storm does not provide any guarantees that data is processed at most once, only at least once. Also, Storm

does not guarantee that data will be processed in order. Much like Flink, Storm can also do micro-batch processing.
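The practical consequence of an at-least-once guarantee is that a message may be processed more than once, so downstream logic should tolerate duplicates. The sketch below simulates this in plain Python; it does not use Storm's API, and the failure model (the first acknowledgement of selected messages is lost) is an assumption made for illustration.

```python
def deliver_at_least_once(messages, fails_first_ack):
    """Replay every message until it is acknowledged.

    `fails_first_ack` is a set of message ids whose first acknowledgement is
    lost, which forces a replay (a hypothetical failure model).
    """
    deliveries = []
    pending_failures = set(fails_first_ack)
    for message_id, payload in messages:
        acked = False
        while not acked:
            deliveries.append((message_id, payload))  # processing happens here
            if message_id in pending_failures:
                pending_failures.discard(message_id)  # ack lost: replay
            else:
                acked = True
    return deliveries


def count_bytes_idempotent(deliveries):
    """Tolerate duplicates by remembering which ids were already counted."""
    seen, total = set(), 0
    for message_id, payload in deliveries:
        if message_id not in seen:
            seen.add(message_id)
            total += payload
    return total


# Hypothetical messages: (id, number of bytes observed).
messages = [(1, 100), (2, 250), (3, 50)]
deliveries = deliver_at_least_once(messages, fails_first_ack={2})
naive_total = sum(payload for _, payload in deliveries)
safe_total = count_bytes_idempotent(deliveries)
```

The naive sum counts message 2 twice, whereas the idempotent variant produces the correct total despite the replay.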

Storm is one of the best solutions currently available for near real-time processing. It’s easy to use, fast, fault-

tolerant, reliable and scalable [58]. Storm is also one of the more mature frameworks available, with its first

release in 2010.

Apache Spark

Apache Spark is a fast and reliable general-purpose batch cluster computing engine. Spark

provides APIs for Python, Java and Scala. Spark’s speed primarily comes from in-memory processing, which

provides much faster data access than disk-based engines such as Hadoop’s MapReduce. Another

speed optimization is found in the general execution model, which is based on graphs [59]. This allows Spark to

optimize tasks across libraries (e.g. two map operations on the same dataset will be merged). Spark claims to be

up to 100x faster than Hadoop [60].
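The benefit of merging two map operations can be shown with a small lazy pipeline that records transformations and applies them in a single pass. This is a plain-Python sketch of the general idea, not Spark's API or its actual optimizer; `LazyDataset` and `tracked` are hypothetical names introduced here.

```python
class LazyDataset:
    """Records transformations and only runs them when collect() is called."""

    def __init__(self, data, functions=()):
        self._data = data
        self._functions = tuple(functions)

    def map(self, function):
        # No work happens here: the function is appended to the plan.
        return LazyDataset(self._data, self._functions + (function,))

    def collect(self):
        """Run all recorded maps in a single pass over the data."""
        results = []
        for item in self._data:
            for function in self._functions:
                item = function(item)
            results.append(item)
        return results


passes = []


def tracked(data):
    """Yield the items of data and record that one full pass was started."""
    passes.append(1)
    yield from data


dataset = LazyDataset(tracked([1, 2, 3]))
result = dataset.map(lambda x: x * 2).map(lambda x: x + 1).collect()
```

Although two map operations were requested, the underlying data is traversed only once, which avoids materializing an intermediate dataset.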

While Spark Core is a batch processing framework, the very popular add-on Spark Streaming [61] extends Spark

Core to provide micro-batch stream processing APIs. Data can be ingested from many sources like Kafka, Flume, Kinesis,

or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map,

reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards

[61].

Spark also provides native machine learning [62] and graph processing [63] libraries, which are highly optimized

for use in distributed computing.

Spark has a programming model similar to Hadoop’s MapReduce, but extends it with a data-sharing abstraction

called Resilient Distributed Datasets (RDDs) [64]. RDDs are fault-tolerant collections of objects partitioned across

a cluster that can be manipulated in parallel. RDDs integrate nicely with existing big data software solutions, such

as HDFS and Apache Kafka.

2.5.3 Using Big Data Frameworks in Network Traffic Analysis

After reviewing the different available data processing frameworks, it is interesting to research prior works in

network analysis using these processing frameworks.


Hadoop and Hive

A traffic analysis system based on Apache Hadoop and Apache Hive was introduced in 2016 in the work “Hobbits:

Hadoop and Hive based Internet traffic analysis” [65] by Hendawi et al.

They worked on a preprovided dataset, in PCAP format, which allowed them to base their ‘Hobbits’ solution on

libpcap.

They compared their solution with the existing p3 solution. p3, proposed by Lee and Lee in [66] is another

Hadoop-based traffic analysis algorithm in MapReduce. The ‘Hobbits’ solution outperformed the p3 algorithm by

25%, which is a significant improvement. The lower efficiency of p3 can be attributed to the need to convert the

input PCAP files to another format, while ‘Hobbits’ works directly with the PCAP files.

Apache Storm

Manzoor and Morgan proposed using Apache Storm in a network traffic processing framework in their work “Network

intrusion detection system using apache storm” [67]. Their results are promising: they achieved a throughput

of 13,600 packets per second on a single general purpose machine. They also used a machine learning technique

called Support Vector Machines to analyze traffic, which unfortunately did not perform as hoped. However, they

pointed out that class imbalance may lie at the core of the inaccuracy. Class imbalance refers to the uneven

representation of classes in the training dataset. In other words, the dataset is unevenly distributed in the different

categories, which causes incorrect classification in the detection phase.
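Class imbalance and one common countermeasure, class weighting, can be sketched as follows. The labels are hypothetical and not taken from [67]; the weighting rule (number of samples divided by the number of classes times the class count) is a widely used heuristic and is an assumption here, not the method used by the cited authors.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Hypothetical training labels: 95 benign flows and 5 malicious flows.
labels = ["benign"] * 95 + ["malicious"] * 5
weights = balanced_class_weights(labels)

# A classifier that always answers "benign" is right 95% of the time,
# yet it detects no malicious flow at all.
always_benign_accuracy = labels.count("benign") / len(labels)
```

With these weights, an error on a rare malicious sample costs far more during training than an error on a benign one, which counteracts the bias towards the majority class.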

Apache Spark

In [68], the authors compared various machine learning algorithms in Apache Spark. They used the built-in machine

learning library of Spark, MLlib [62]. Of the five tested machine learning algorithms (Logistic Regression,

Support Vector Machines, Naive Bayes, Random Forests and Gradient-Boosted Trees), Random Forests turned out to be the best

performing. With an accuracy of 99%, specificity of 89%, sensitivity of 91% and a training time of only 175 seconds,

it outperformed the other algorithms significantly.
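For reference, the three quality measures mentioned above can be computed from a confusion matrix as sketched below. The counts are hypothetical and are not the results reported in [68].

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall) and specificity from a confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # share of attacks that were detected
        "specificity": tn / (tn + fp),   # share of normal traffic left alone
    }


# Hypothetical counts: 100 attack samples and 200 normal samples.
metrics = classification_metrics(tp=90, fp=20, tn=180, fn=10)
```

Here `tp`, `fp`, `tn` and `fn` stand for true positives, false positives, true negatives and false negatives, with an attack counted as the positive class.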

Gupta and Kulariya continue this analysis [69], but add a preprocessing step: feature selection. Feature selection

algorithms identify the important aspects of the data and filter out the unimportant aspects. These aspects

are called ‘features’.

They applied two well-known feature selection algorithms: correlation-based feature selection and Chi-squared

feature selection. They conclude that this additional preprocessing step, which of course requires additional computation

time, is still beneficial to the total computation time: the removal of unimportant features affects accuracy

only slightly, but decreases training time significantly.
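A minimal sketch of Chi-squared feature selection is given below. It computes the statistic for categorical features against the class label and keeps the features that exceed a threshold. The data, the feature names and the threshold of 3.84 (the 5% critical value for one degree of freedom) are assumptions for this example, not the setup used in [69].

```python
from collections import Counter


def chi_squared(feature, labels):
    """Chi-squared statistic of a categorical feature against the class label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    statistic = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            # Expected count if feature and label were independent.
            expected = f_count * l_count / n
            statistic += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return statistic


# Hypothetical flows: label 1 = attack, 0 = normal.
labels   = [1, 1, 1, 1, 0, 0, 0, 0]
uses_tcp = [1, 0, 1, 0, 1, 0, 1, 0]   # independent of the label
syn_only = [1, 1, 1, 1, 0, 0, 0, 0]   # perfectly aligned with the label

scores = {"uses_tcp": chi_squared(uses_tcp, labels),
          "syn_only": chi_squared(syn_only, labels)}
selected = [name for name, score in scores.items() if score > 3.84]
```

The feature that carries no information about the label scores zero and is dropped, so the classifier has fewer inputs to train on.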


Chapter 3

Design

In this chapter, we list the requirements for ChiefNet. We also present possible solutions to the challenges that

arise from these requirements.

In the first section, we describe the in-depth anatomy of a computer network. This is important, because we

want to emulate a ‘real life’ network as accurately as possible.

After understanding what a computer network entails, we cover all requirements of ChiefNet. These requirements

are diverse. First, we’ll see some general requirements. Besides the general requirements, there are more

specific requirements or technologies, which describe the different components of ChiefNet. The subsequent sections

provide details on the specific technologies or components of ChiefNet. The components include Virtualization,

Networking, User Simulation, Infection, Data Extraction, Data Storage, Data Processing and Data Visualization.

3.1 Anatomy of a Computer Network

To be able to emulate an authentic ‘real world’ network, we need to a) understand what a network really is and

b) figure out a way to imitate this network in an isolated environment.

Figure 3.1: A ‘real world’ computer network.


Network Entity | Component          | Description
Users          | User Simulation    | Simulating users to do regular activity
End Devices    | Virtualization     | Emulating end devices in a virtualized environment through Virtual Machines (VMs)
Switches       | Virtual Networking | Networking within the virtual environment
Routers        | Virtual Networking | Networking within the virtual environment
Durability     | Infection          | Scheduled infection of an end device

Table 3.1: Translation of network entities to ChiefNet components

Figure 3.1 shows a typical (quite simple) company computer network.

End devices, such as laptops, desktops and smartphones, are the main entities in the network. They provide

the ‘interface’ for the users to use the network. Users use the network for communicating with each other, storing

files, accessing information, etc.

The end devices are interconnected through (possibly multiple) switches. These switches are responsible for

handling communication between two end devices. The other devices on the network, which we’ll call secondary

devices, include printers, desk phones, etc.

The last important entity in the network is called the gateway. The gateway (often also acting as the main

router) is responsible for communication between the Local Area Network (LAN) and the rest of the world. A

company typically has multiple physical locations (such as two separate stores in different towns), but still needs

access from one store’s LAN to another’s. This is commonly achieved using some kind of connection (e.g. a VPN

tunnel) between the two LANs, represented by the bolt in Figure 3.1.

There’s one more important aspect to a computer network, which can’t be shown visually. I’ll call this element

the durability of the network. Durability represents the continuum of a network: barring any hardware or software

failure, the network will behave as it’s supposed to. I mention this aspect because it’s important for understanding

how an infection alters the network. We’ll cover this in more detail in Section 3.2.5.

3.2 Requirements

We now know the ins and outs of a computer network and are able to translate these aspects into different

components of ChiefNet.

By looking at the above definition of a computer network, we can identify a few key entities: users, end devices,

switches, routers and durability. We can now translate these entities into components of ChiefNet. Table 3.1 shows

an overview of the different entities and their respective components. We will cover each of the components in

Table 3.1 in the following sections.


3.2.1 General Requirements

First off, some general requirements that don’t really fit into a specific component.

[Req. 1] ChiefNet must be easy to use: a simple graphical user interface should provide all the tools to the

operator in a logical, structured manner. Creating network topologies should be effortless and intuitive,

preferably using a drag-and-drop approach.

[Req. 2] Extensibility is a core aspect of ChiefNet, therefore we’ll need to use an open, cross-platform programming

language. APIs should be used wherever possible, so that future developers can easily add

or modify the functionality.

[Req. 3] As we have seen, many of the detection techniques have a variety of parameters and settings to

optimize performance and accuracy. Researchers will want to be able to tweak these variables by

iteration: repeating the same experiment over and over while adjusting the parameters. Therefore,

ChiefNet should have some way of storing and loading experiments to and from storage, so that researchers

can simply save and reload their experiments.

[Req. 4] These experiments can become very large, too large for a single machine. Scalability is key: ChiefNet

should thus allow for distributed computing in a scale-out manner.

[Req. 5] Finally, since malicious code will be executed in the network, it should be (by default) completely

isolated from the outside world. In the improbable event where an operator might want the network

to be able to communicate with the outside world (e.g. the internet), manual adjustment should allow

such a scenario.

3.2.2 Virtualization

We’ll use virtualization to run multiple hosts on a single physical machine. There are some additional requirements

for this component, to comply with the general requirements above. In the remainder of this document, we will

use the terms ‘host’ and ‘guest’ interchangeably, both referring to the virtual machines which run the end devices

in our emulated network.

[Req. 6] In order to be as versatile as possible, ChiefNet should support a large range of operating systems.

The most popular operating system, Microsoft Windows, must definitely be supported. Other popular

operating systems such as Linux and MacOS should also be supported. Alternative operating systems,

such as BSD or Android, are a bonus.

[Req. 7] jFed is a Java-based framework for testbed federation [70]. jFed makes it possible to learn the

testbed federation architecture, workflows and APIs, and also makes it easy to develop Java-based

client tools for testbed federation [71]. This study is performed at Ghent University, department

IDLab (which is closely associated with imec), hence experiments are performed on the imec ‘Virtual


Wall’ [72]. A bonus feature of ChiefNet is to interact with jFed and use the existing APIs for running the

experiments on the Virtual Wall.

[Req. 8] As a consequence of the scale of the environment, manual configuration of the guests would be

a nuisance. As such, the virtualization layer should provide select automated configuration possibilities,

ideally with a configuration management tool such as Ansible [73], Puppet [74] or Chef [75]. For

Windows systems, PowerShell Desired State Configuration (DSC) [76] can be used.

[Req. 9] Somewhat related to the previous requirement is the need for automated installation of the operating

system in the guests. This can be done in two ways: (a) by performing unattended installations of the

operating systems from installation media (such as disc image files or PXE servers) or (b) by letting

the operator install the operating system beforehand, configure most of the settings and then convert

this system into a ‘template’ to be cloned into multiple guests.

3.2.3 Networking

To complement the virtual machines in the previous section, virtual networking is required to enable communication

between the different guests. The counterparts of physical switches are simply called ‘virtual switches’, and

they behave much like them.

[Req. 10] Related to [Req. 5], the networking layer is where the actual segregation will take place. The virtual

networking stack should support some way of sequestering the different network segments in such a

way that data can’t ‘escape’ its segment. If the operator wishes to connect different segments (e.g.

the internet and the LAN), a router with very fine-grained access rules should be required.

[Req. 11] Associated with the previous requirement and requirement [Req. 8] is the ability to still communicate

with the guests, but without accessing the LAN of the isolated environment. This will become the

primary ‘gateway’ to the isolated environment. A possibility for this is the use of a ‘control network’,

which is a second LAN in the isolated environment, but with some externally reachable entity to ‘talk’

with the guests. However, having an externally reachable entity connected to the isolated environment

would defeat the requirement of being completely isolated. Therefore, ChiefNet will have to support

a way of communicating with the guest operating systems without the traditional luxury of a TCP/IP

network.

3.2.4 User Simulation

Since our network won’t contain any real users, we’ll have to simulate them in software. To this end, we desire

some software to simulate regular user activity. This software should (as convincingly as possible) generate traffic

to help train intrusion detection techniques (e.g. a machine learning technique’s baseline).


[Req. 12] First and foremost, the user simulation software should be cross platform and extensible. This

allows us to run the software on many operating systems (including Microsoft Windows and Linux).

Future developers must easily be able to extend the user simulation suite with other ‘user actions’.

[Req. 13] As a first ‘user action’, we would like to simulate sending and receiving emails. Because some malware

(e.g. spambots) uses email protocols in its execution, this should be as transparent as possible.

This transparency fools the malware into thinking the email was actually sent, enabling researchers to

perform better malware analysis.

If there are any attachments in the received emails, an authentic user might open or execute those. We

wish the user simulation software to do the same.

[Req. 14] The second user action we would like to support is web browsing. Simulating a user surfing the web

is hard, because surfing usually involves more user interaction. A possible way to circumvent this is by

implementing a ‘web crawler’, which navigates to a random link on the current page.

Similarly to email attachments, if a user stumbles onto a file, he or she might open/execute this file.

We wish the user simulation software to do the same.
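The ‘web crawler’ approach of [Req. 14] can be sketched as a random walk over links. The example below is an illustration under stated assumptions, not ChiefNet's implementation: real HTTP requests are replaced by a hypothetical in-memory dictionary of pages, and `LinkExtractor`, `random_walk` and `PAGES` are names introduced for this sketch.

```python
import random
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def random_walk(fetch, start_url, steps, rng):
    """Visit start_url, then repeatedly follow a random link on the page."""
    visited, url = [], start_url
    for _ in range(steps):
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(fetch(url))
        if not parser.links:
            break  # dead end: a real simulator could return to the start page
        url = rng.choice(parser.links)
    return visited


# A hypothetical in-memory 'web' replaces real HTTP requests in this sketch.
PAGES = {
    "/home": '<a href="/news">News</a> <a href="/mail">Mail</a>',
    "/news": '<a href="/home">Home</a>',
    "/mail": "<p>No links here</p>",
}
visited = random_walk(PAGES.__getitem__, "/home", steps=5, rng=random.Random(7))
```

Passing the `fetch` function as a parameter keeps the walk logic independent of how pages are retrieved, so an HTTP client could be substituted later.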

[Req. 15] Finally, as a third user action, we would like to support transferring files from and to a central

storage location. Since there are many file transfer protocols, only a select few protocols should be

supported. The most popular file transfer protocol, Microsoft’s SMB, should be supported (if possible).

Alternatively, the File Transfer Protocol (FTP) is a more ‘open’ protocol than SMB, while still very popular

today.

3.2.5 Infection

The infection event is a predefined action to infect one or more hosts in the network. This infection ‘breaks’ the

durability of the network, resulting in an abnormal state. The functionality to enable this can be described in a

single requirement:

[Req. 16] ChiefNet should include functionality to schedule predefined actions. Examples of these actions

include executing a command within a guest, uploading and executing files in the guest, sending emails,

etc.
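One possible shape for the scheduling functionality of [Req. 16] is a priority queue of timed actions. The following is a minimal sketch under stated assumptions, not ChiefNet's actual design: `ActionScheduler`, the two action functions, the guest names and the file name are hypothetical, and time is a plain number of seconds since the start of the experiment.

```python
import heapq
import itertools


class ActionScheduler:
    """Run predefined actions in order of their scheduled time."""

    def __init__(self):
        self._queue = []
        self._tie_breaker = itertools.count()  # keeps insertion order stable

    def schedule(self, at_second, action, *args):
        entry = (at_second, next(self._tie_breaker), action, args)
        heapq.heappush(self._queue, entry)

    def run_until(self, now):
        """Execute every action whose scheduled time is <= now."""
        results = []
        while self._queue and self._queue[0][0] <= now:
            at_second, _, action, args = heapq.heappop(self._queue)
            results.append((at_second, action(*args)))
        return results


# Hypothetical actions; a real implementation would talk to the guests.
def run_command(guest, command):
    return f"{guest}: run {command}"


def upload_and_execute(guest, filename):
    return f"{guest}: upload and execute {filename}"


scheduler = ActionScheduler()
scheduler.schedule(600, upload_and_execute, "guest-3", "sample.bin")
scheduler.schedule(60, run_command, "guest-1", "whoami")
early = scheduler.run_until(300)
late = scheduler.run_until(900)
```

Actions are executed in order of their scheduled time regardless of the order in which they were registered, so an infection can be planned well before the experiment starts.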

3.2.6 Data Extraction

To ‘feed’ our malware analysis techniques, we need to extract information from the network. There are two interesting

spots to extract information from: the switches and the hosts. This translates into two requirements:

[Req. 17] We wish to extract certain information from the hosts in the network, preferably using the same

communication mechanism as in [Req. 11]. The implementation should include some basic metrics (e.g.


CPU usage, memory usage or process list), but also be extensible to allow future developers to add

their own metrics.
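The extensibility asked for in [Req. 17] could, for instance, be achieved with a registry of metric collectors. The sketch below is one assumed design, not the one used in ChiefNet; the registry `COLLECTORS`, the decorator `metric` and the three example metrics are hypothetical and only use information available from the Python standard library.

```python
import os

COLLECTORS = {}


def metric(name):
    """Decorator that registers a function as the collector for one metric."""
    def register(function):
        COLLECTORS[name] = function
        return function
    return register


@metric("process_id")
def process_id():
    return os.getpid()


@metric("cpu_count")
def cpu_count():
    return os.cpu_count()


# A future developer adds a metric without touching the collection loop.
@metric("environment_variable_count")
def environment_variable_count():
    return len(os.environ)


def collect_all():
    """Call every registered collector and return one sample per metric."""
    return {name: collector() for name, collector in COLLECTORS.items()}


sample = collect_all()
```

Adding a new metric only requires writing one decorated function; the collection loop itself stays unchanged.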

[Req. 18] Additionally, network information should be collected from the virtual switches. This information

can be both raw network data streams and network metadata (e.g. total bytes transferred, destination

IP addresses).

3.2.7 Data Storage

The collected data can be either processed (and eventually stored) or directly stored for later analysis. The storage

system should support both these options, and should be able to keep up with the influx of incoming data. Data

loss due to overloading of the storage system is not acceptable.

[Req. 19] To ensure that ChiefNet is able to cope with both large and small networks, the storage framework

should be horizontally scalable.

[Req. 20] Associated with the data extraction requirements, the storage system(s) should be able to store

both raw (possibly aggregated) data, as well as structured metrics.

3.2.8 Data Processing

As with the data storage system, the data processing framework should be able to cope with both small and large

networks.

[Req. 21] To ensure that ChiefNet is able to cope with both large and small networks, the processing framework

should be horizontally scalable.

3.2.9 Data Visualization

In a last step, it may prove useful to visualize the data in the storage system. This data can be both raw data (e.g.

metrics from guests), or results from the data processing framework (e.g. number of infected hosts).

[Req. 22] A data visualization system should be in place, supporting the databases in the data storage system.


Chapter 4

Implementation

After having defined the requirements in the previous chapter, this chapter covers the implementation of ChiefNet.

We start with the technologies used in ChiefNet, after which we cover the software architecture. Next, we continue

with discussing the provided solutions for user simulation. Then we cover the data storage & processing solutions.

Finishing up this chapter, we talk about the given tools & utilities to configure the ChiefNet ecosystem, which are

in the form of Ansible Playbooks.

4.1 Technologies

The technologies used in ChiefNet are all open-source or freely available. We won’t cover the different licenses of

each component, but the reader can easily find these via the references. Note that these technologies are our preferred

choice, but they are not strictly required. Many of the chosen technologies have APIs or connectors for

using other systems than the ones we choose here. We therefore structure this section as a list of key components.

The reader should have no issues changing the technologies within a component (e.g. swapping Apache Spark for

Apache Storm).

An overview of the technologies used in data storage, data processing and data visualization can be found in

Figure 4.1.

4.1.1 Programming Language

The main programming language in ChiefNet is Java. Java is one of the most popular programming languages

to date, mainly because of its “write once, run anywhere” philosophy. Java is versatile [Req. 6], extensible [Req. 2]

and cross-platform [Req. 12], which makes it a great choice for our requirements. Moreover, due to the large

community around Java, many libraries are available for connecting other technologies (including our other chosen

technologies).

The graphical framework used in ChiefNet is JavaFX. JavaFX is the successor of Swing, the older (but still

popular) GUI framework of Java.


4.1.2 Hypervisor

The choice of hypervisor was a duel between two kinds of virtualization: container-based or virtual-machine-based.

Container-based virtualization

Container-based virtualization, or ‘containerization’, refers to an operating system feature in which the kernel allows

the existence of multiple isolated user-space instances [77]. The most popular containerization software

today is Docker [78]. The main advantage of containerization is scalability: containers don’t consume nearly as

many resources as a ‘normal’ VM. However, containers are (by definition) forced to run the same operating system

as their host, which makes them unsuitable for our purposes. Specifically, we wish to run different operating systems

such as Microsoft Windows or Android. Some work is being done on extending Docker with Windows

containers [79], but this is still in its early stages and isn’t ready for production usage.

VM-based virtualization

On the other side of the arena is ‘regular’ VM-based virtualization. This has the advantage of being compatible

with any operating system, but VMs do consume more resources than containers. Many hypervisors exist, the most

popular being Microsoft Hyper-V [80], VMware’s vSphere [81], Citrix’s XenServer [82] and the open-source Kernel-based

Virtual Machine (KVM) [83]. Since we’re aiming for extensibility and versatility, the choice was made to

continue with KVM as the main hypervisor in ChiefNet.

libvirt

Looking back at our main programming language, Java, and the newly made choice to use KVM as the hypervisor,

we’ll need a way of connecting these two. libvirt is an open-source API, daemon and management tool for managing

platform virtualization [84]. Its support for KVM and its bindings for Java make it the ideal library to use in ChiefNet.

libvirt also supports XenServer, vSphere and others, allowing future developers to (if needed) swap the hypervisor

for another. This doesn’t require many changes in ChiefNet’s source code, since libvirt provides abstractions

for these concepts.

Operating System of the Host

We will use the Ubuntu Server operating system in the Long-Term-Support version 16.04.1. Ubuntu Server has

official packages for KVM and other required software components.

4.1.3 Virtual Networking

The world of open-source virtual networking is dominated by Open vSwitch [4], and with good reason. OVS is the

default networking suite in XenServer and has support for many other hypervisors (including KVM).


Open vSwitch supports our requirement of isolation [Req. 10] through the use of Virtual LANs (VLANs). A VLAN

is a broadcast domain that is partitioned and isolated in a computer network at the data link layer (OSI layer 2)

[85]. OVS also supports traffic capturing and SDN.

The Linux kernel implementation of Open vSwitch was merged into the kernel mainline in 2012, and

official packages for various Linux distributions (including Ubuntu) are available.

4.1.4 User Simulation

Unfortunately, there does not seem to be any existing user simulation software available at the time of writing.

This does not pose an unconquerable problem, since our requirements for user simulation are straightforward. As

a result, we implement the user simulation features ourselves.

Nonetheless, existing software helps us build the user simulation software. INetSim is a software suite for

simulating common internet services in a lab environment [86]. INetSim supports many protocols, such as HTTP(s),

SMTP/POP3, DNS and FTP.

The open-source implementation of SMB, Samba [87], will also prove useful.

4.1.5 Data Storage

Data storage requirements are key in ChiefNet, specifically the scalability and performance of the storage system.

To overcome the limitations of the storage capabilities of a single machine, a distributed file system is needed.

That’s why ChiefNet uses the Hadoop Filesystem (HDFS). A description of HDFS can be found in section 2.4.1. HDFS

is only used to store raw files, not structured data.

Structured data, such as metrics and other time-driven events, is stored in a time-series database. We use

OpenTSDB as our database, because of its scalability and the interoperability between OpenTSDB and our processing

framework.

Data is not inserted directly into OpenTSDB; instead, we use Apache Kafka as a go-between. As we will see

later, the server extracts metrics from the guest machine and publishes them in Kafka. This allows us to directly

read (and process) this data in our processing framework, without the need to query the OpenTSDB database.

The OpenTSDB sink for Kafka is used to collect the published metrics and store them in OpenTSDB, for long-term

(archival) storage.

4.1.6 Data Processing

To process incoming data, we use the Apache Spark processing framework, coupled with the Spark Streaming plugin.

The choice of a batch processing framework may seem odd because of its major disadvantage of requiring

all input data before producing a result, but the use of the Spark Streaming plugin lets us process the incoming

data in a semi-streaming way (micro-batches). We chose Spark because of numerous advantages:

1. Spark is faster than most other processing frameworks, because of its use of memory caching.


2. Spark’s built-in libraries for machine learning and graph processing will prove very useful.

3. Spark has comfortable APIs for Java, Scala and Python.

4. Additional plugins and libraries (such as Spark SQL) allow Spark to interface with many other software

systems.

These advantages of Spark far outweigh the disadvantage of its batch-style engine.

4.1.7 Data Visualization

To visualize the saved data, we use the Grafana [88] software package.

Grafana is directly compatible with OpenTSDB and allows for easy importing and exporting of graphs, making

it a very versatile platform for data visualization.

Figure 4.1: Used technologies in storing and processing data

4.2 Architecture

The general architecture of ChiefNet consists of two main entities: the client and the server. The client runs on the

operator’s local machine. The server runs on all hypervisors, controlling the virtualization layer and the networking

layer. Both of these components have additional responsibilities, which we cover in the following sections.

Additionally, to allow communication with the guest operating system, a software package called the ‘guest

additions’ can be installed within the guest OS.


4.2.1 Building Blocks

Before we continue with describing the key components of ChiefNet, we will briefly cover the different ‘building

blocks’ (or models) of ChiefNet.

• The Scenario is the main object in ChiefNet. It describes the current experiment and contains all the

necessary information to run or save the details of said experiment. The containment of all information

in a single object allows us to easily save the experiment: if we save this single object, all specifications,

attributes and details of the experiment are saved with it.

• We model each guest as a VirtualMachine, each of which can have zero, one or more

NetworkInterfaces, connected to VirtualSwitches. To the operator, a VirtualMachine is identified by its hostname,

which is a human-readable, user-defined string. Internally, ChiefNet identifies VirtualMachines by their

UUID.

• Each VirtualMachine is constructed from a HostTemplate, which contains the required information

to run this virtual machine (e.g. hard disk location, amount of memory, number of CPUs, etc.).

• Guest configuration is performed through the use of Procedures. A Procedure is an ordered list of

actions, which are executed sequentially on the guest operating system. These actions range from executing

a command to uploading/downloading files. Procedures are executed (either manually or automatically)

on the guests. A VirtualMachine with the guest additions installed can have a Procedure defined

to configure the state of this guest. Procedures are also used in the scheduler, where they form the

predefined actions (e.g. infecting a guest).
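The relationships between these building blocks can be sketched as a handful of plain Java classes. This is only an illustration of the description above, not ChiefNet's actual source: all field names, the string-based action list and the constructor signatures are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

/** Information needed to run a guest (field names are illustrative). */
final class HostTemplate {
    final String name;
    final String diskPath;
    final int memoryMiB;
    final int cpuCount;

    HostTemplate(String name, String diskPath, int memoryMiB, int cpuCount) {
        this.name = name;
        this.diskPath = diskPath;
        this.memoryMiB = memoryMiB;
        this.cpuCount = cpuCount;
    }
}

final class VirtualSwitch {
    final String name;

    VirtualSwitch(String name) {
        this.name = name;
    }
}

final class NetworkInterface {
    final VirtualSwitch connectedTo;

    NetworkInterface(VirtualSwitch connectedTo) {
        this.connectedTo = connectedTo;
    }
}

/** Ordered list of actions, executed sequentially on the guest. */
final class Procedure {
    private final List<String> actions = new ArrayList<>();

    Procedure then(String action) {
        actions.add(action);
        return this;
    }

    List<String> actions() {
        return Collections.unmodifiableList(actions);
    }
}

final class VirtualMachine {
    final UUID id = UUID.randomUUID();                            // internal identifier
    final String hostname;                                        // human-readable, user-defined
    final HostTemplate template;
    final List<NetworkInterface> interfaces = new ArrayList<>();  // zero, one or more
    Procedure configuration;                                      // optional

    VirtualMachine(String hostname, HostTemplate template) {
        this.hostname = hostname;
        this.template = template;
    }
}

/** Single object holding everything needed to run or save an experiment. */
final class Scenario {
    final List<VirtualMachine> machines = new ArrayList<>();
    final List<VirtualSwitch> switches = new ArrayList<>();
}
```

Because every guest, switch and procedure hangs off one Scenario instance, saving that instance is enough to capture the whole experiment.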

We are now ready to describe the two key components in ChiefNet: the client and the server.

4.2.2 Client

The client is the most important component of ChiefNet, providing the graphical user interface (GUI) to the operator.

As laid out in the previous section, JavaFX is the GUI library powering this graphical interface. Screenshots of the

client can be found in Appendix C.

Connections to the host

The client must be provided an XML configuration file, specifying the required options for connecting to the servers

and hypervisors. The details of this configuration file can be found in Appendix D. A visual portrayal can be found

in Figure 4.2. The connection to the hosts is done in four different ways:

1. The primary connection is to the server component of ChiefNet, using Java Remote Method Invocation (RMI)

[89]. The RMI interface allows us to invoke methods on the other side of the connection, while not having

to worry about serialization or error handling; RMI does this for us.


Figure 4.2: Client-Server connections in ChiefNet

The client connects to 7 different components of the server, which all have various responsibilities. A list of

the different components to which the client connects can be found in section 4.2.3 or in Table 4.1.

2. Secondly, the client connects to the libvirt interface for KVM. This allows the client to directly control the

virtualization & networking layer.

3. The third connection is to connect to the console of the virtual machines. KVM exposes the console of the

virtual machine as a VNC connection, allowing the operator to directly control the keyboard & mouse of the

VM, as well as viewing the screen of the VM.

4. A last connection to the server is used to communicate with the guest operating system. We’ll explain the

guest communication feature in more detail in section 4.2.4, but for now it suffices to know that the server

exposes TCP ports which are directly piped to the guest operating system. The client connects to these TCP

ports to interact with the guest operating system, for example to download log files or execute commands

inside the guest operating system.

Primary Actions

In order to clarify how ChiefNet works, some typical actions performed in ChiefNet are laid out below.

Starting the client This may seem like a trivial action, but it is still noteworthy because of what happens

behind the scenes. The reader can follow the description provided below in Figure 4.3.

When the client starts, it connects to the ScenarioManager of the different servers and looks for

an already-running scenario. If such a scenario is found, the client simply resumes operations of that

scenario.

However, if such a scenario is not found, a new scenario is created. The most noteworthy operation

performed here is the deletion of any and all existing networking configuration and virtual machines on

the hosts. In other words, the client deletes any possible leftovers from previously failed experiments.

This deletion is implemented to ensure maximum stability and compatibility: ChiefNet can’t possibly

take into account all the possible conflicts or collisions with other running software. Therefore, in order

to avoid these conflicts, the client deletes all existing configuration.

After all existing configuration is cleared, Open vSwitch is configured: the specified (hardware) network

interface is configured as an uplink.
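The start-up decision described above (resume a running scenario if one exists, otherwise wipe the hosts and configure the uplink) can be sketched as follows. The ServerHandle interface, its method names and the RecordingServer stand-in are hypothetical; in ChiefNet these calls would go through the server endpoints and libvirt rather than an in-memory object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical client-side view of one ChiefNet server. */
interface ServerHandle {
    Optional<String> runningScenario();   // name of a saved scenario, if any
    void clearVirtualMachines();
    void clearNetworking();
    void configureUplink(String nic);
}

final class ClientStartup {
    /** Returns the resumed scenario, or empty when a fresh one was prepared. */
    static Optional<String> start(List<ServerHandle> servers, String uplinkNic) {
        // 1. Look for an already-running scenario on any server.
        for (ServerHandle server : servers) {
            Optional<String> saved = server.runningScenario();
            if (saved.isPresent()) {
                return saved;
            }
        }
        // 2. None found: remove leftovers from earlier experiments on every host.
        for (ServerHandle server : servers) {
            server.clearVirtualMachines();
            server.clearNetworking();
        }
        // 3. Only then configure the hardware interface as the uplink.
        for (ServerHandle server : servers) {
            server.configureUplink(uplinkNic);
        }
        return Optional.empty();
    }
}

/** In-memory stand-in that records the calls it receives. */
final class RecordingServer implements ServerHandle {
    final List<String> calls = new ArrayList<>();
    private final String saved;

    RecordingServer(String saved) {
        this.saved = saved;
    }

    public Optional<String> runningScenario() { return Optional.ofNullable(saved); }
    public void clearVirtualMachines() { calls.add("clear-vms"); }
    public void clearNetworking() { calls.add("clear-net"); }
    public void configureUplink(String nic) { calls.add("uplink " + nic); }
}
```

Calling `ClientStartup.start` with servers that hold no saved scenario triggers the clean-up on each of them; as soon as one server reports a scenario, nothing is deleted.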

Creating and powering on a VM Perhaps the most commonly performed process in ChiefNet is the creation

of a new guest. The GUI part is described below, but essentially the operator must specify a template

and certain hardware configuration options to build a VM. We refer the reader to Figure 4.4 for a visual

aid. The VM is created as follows:

1. After the operator specifies which HostTemplate he or she wishes to instantiate, the first

operation the client performs is finding a suitable hypervisor to run this VM on. The same

HostTemplate can be present on different hypervisors. The choice between the hypervisors

which house the chosen template is currently at-random. In the future, a possible extension is

to use a resource manager that takes into account the current number of VMs on that hypervisor,

the load, etc.

2. After the choice of hypervisor is made, the client prepares the hard drives (which are specified in

the template) for this specific VM. It does so by connecting to the HDDManager of the chosen

hypervisor. The workings of the HDDManager are specified in section 4.2.3, but for now it suffices

to know that the HDDManager creates the hard drive files needed for running this VM.

3. When the hard drives are prepared, the client prepares the XML definition of this VM. Libvirt uses

an XML definition for each VM, which specifies the attributes of this VM. The client constructs the

XML from the details of the HostTemplate, the newly created hard drives, the options specified

by the operator and some defaults which are hard coded into ChiefNet.

This XML file is everything libvirt needs to build a new domain (a ‘domain’ is the libvirt alias

of a VM). The XML specifies CPU count, memory configuration, network interfaces, hard drive

locations, VNC configuration for the console, configuration of the virtual serial port etc. The VNC

configuration is performed automatically: an open TCP port is automatically chosen to host the

VNC connection. The same applies for the virtual serial port connection.

4. Following the XML build process, the client connects to KVM through the libvirt API and instanti-

ates the new domain. The VM is now powered up and running.
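To make step 3 more concrete, the sketch below assembles a minimal libvirt domain definition as a string. The element names follow libvirt's domain XML format, but the selection of elements, the virtio disk target, the loopback serial binding and the builder itself are assumptions for illustration; the XML that ChiefNet actually generates is not shown in this chapter. Here the VNC port is left to libvirt (`autoport='yes'`), which is one possible way to have an open port chosen automatically.

```java
/** Minimal, illustrative builder for a libvirt domain definition. */
final class DomainXml {
    static String build(String hostname, String uuid, int memoryMiB, int vcpus,
                        String qcow2Disk, String network, int serialTcpPort) {
        StringBuilder x = new StringBuilder();
        x.append("<domain type='kvm'>\n");
        x.append("  <name>").append(hostname).append("</name>\n");
        x.append("  <uuid>").append(uuid).append("</uuid>\n");
        x.append("  <memory unit='MiB'>").append(memoryMiB).append("</memory>\n");
        x.append("  <vcpu>").append(vcpus).append("</vcpu>\n");
        x.append("  <os><type arch='x86_64'>hvm</type></os>\n");
        x.append("  <devices>\n");
        // Hard drive prepared earlier for this VM.
        x.append("    <disk type='file' device='disk'>\n");
        x.append("      <driver name='qemu' type='qcow2'/>\n");
        x.append("      <source file='").append(qcow2Disk).append("'/>\n");
        x.append("      <target dev='vda' bus='virtio'/>\n");
        x.append("    </disk>\n");
        // NIC attached to a libvirt network that represents the vSwitch.
        x.append("    <interface type='network'>\n");
        x.append("      <source network='").append(network).append("'/>\n");
        x.append("    </interface>\n");
        // Console: let libvirt pick a free VNC port.
        x.append("    <graphics type='vnc' port='-1' autoport='yes'/>\n");
        // Virtual serial port exposed as a TCP listener on the hypervisor.
        x.append("    <serial type='tcp'>\n");
        x.append("      <source mode='bind' host='127.0.0.1' service='")
         .append(serialTcpPort).append("'/>\n");
        x.append("      <protocol type='raw'/>\n");
        x.append("    </serial>\n");
        x.append("  </devices>\n");
        x.append("</domain>\n");
        return x.toString();
    }
}
```

The resulting string is what would be handed to libvirt in step 4 to instantiate the domain. The builder does no XML escaping, so it is only suitable for trusted, simple values.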

Adding a vSwitch Virtual switches are the foundation of the virtual networking layer. Adding a vSwitch may

seem dead simple in ChiefNet, but much more goes on behind the scenes. We refer the reader to Figure

4.5 for a visual aid.

1. When the operator closes the ‘Add a new vSwitch’ window of the VirtualSwitchManager (as

described below in section 4.2.2), the first action performed by the client is a simple validation:


Figure 4.3: Actions performed when the ChiefNet client starts


Figure 4.4: Creating a VM in ChiefNet


the name of the vSwitch is required to be unique. If an existing vSwitch has the same name as

entered by the operator, an error message is displayed and no further operations are performed.

2. After the given name is validated to be unique, the client connects to the VirtualBridgeManagers

on every server, requesting a new virtual bridge in the Open vSwitch configuration. Important

here is that these virtual bridges all have the same VLAN ID, so they are logically connected to

the same LAN.

3. After each host created a virtual bridge for this vSwitch, the client connects to the KVM daemon

on each host, through the libvirt API. Here, a new virtual switch is created, which connects to the

virtual bridge we created in the previous step. This virtual switch currently has no attached NICs.

The vSwitch is now fully operational and ready to attach NICs from virtual machines.

Of key importance here is that the networking configuration is identical on every host. As we saw in the previous

paragraph (”Starting the client”), the client removes any and all existing networking configuration

when starting a new experiment. The client adds new networking configuration on every host, even

when that host won’t need it because of the absence of VMs in the new network segment. In this implementation,

the client is responsible for synchronizing networking configuration across hypervisors.

We discuss this limitation in more detail in Chapter 7.
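The vSwitch steps above boil down to a uniqueness check followed by a fan-out of the same VLAN ID to every host. A minimal sketch, assuming a hypothetical BridgeHost callback in place of the real VirtualBridgeManager and libvirt calls, and an arbitrary starting VLAN ID of 100:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for "create the bridge and libvirt network on one host". */
interface BridgeHost {
    void createBridge(String vSwitchName, int vlanId);
}

final class VSwitchRegistry {
    private static final int MAX_VLAN_ID = 4094;   // highest usable 802.1Q VLAN ID
    private final Map<String, Integer> vlanByName = new LinkedHashMap<>();
    private int nextVlanId = 100;                  // assumption: arbitrary first ID

    /** Adds a vSwitch on every host and returns the VLAN ID it was given. */
    int add(String name, List<BridgeHost> hosts) {
        // Step 1: the name must be unique, otherwise nothing else happens.
        if (vlanByName.containsKey(name)) {
            throw new IllegalArgumentException("vSwitch name already in use: " + name);
        }
        if (nextVlanId > MAX_VLAN_ID) {
            throw new IllegalStateException("no free VLAN IDs left");
        }
        int vlanId = nextVlanId++;
        // Steps 2 and 3: every host gets the bridge with the same VLAN ID,
        // even if it will never run a VM in this segment.
        for (BridgeHost host : hosts) {
            host.createBridge(name, vlanId);
        }
        vlanByName.put(name, vlanId);
        return vlanId;
    }
}
```

Keeping the VLAN bookkeeping in one place on the client mirrors the design described above, in which the client is the party that keeps the hosts synchronized.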

Connecting a NIC to a vSwitch Connecting a virtual machine to a virtual switch is done through the use of a

virtual network interface (NIC). Attaching a NIC to a vSwitch is mainly performed by KVM, which hooks

up the vNIC to the vSwitch automatically when the virtual machine is powered on.

Adding a scheduled action A scheduled action is a predefined action which will execute upon the activation of

a trigger. This trigger is activated because of a change in the environment. Currently, the implemented

triggers are related to timing: the available triggers are ‘periodic’, ‘run-once’ and ‘fixed time’ triggers.

– Periodic triggers are triggered at regular intervals (e.g. every 1 minute)

– Run-once triggers execute after a specified delay (e.g. in 10 minutes)

– Fixed-time triggers execute at a specific date and time (e.g. at 10 AM on the first of July 2018)

The operator must choose one of these available triggers, and specify a Procedure to be executed

when the trigger is activated. Procedures are described in section 4.2.1.

After the options for the scheduled action are chosen, the client connects to (a random) host’s

SchedulerManager. The SchedulerManager is described in section 4.2.3 but in summary, it

controls a scheduling daemon for that host. The client adds the scheduled action to the host’s schedule.

The scheduling daemon starts processing triggers when the scenario is launched.
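The three trigger types map naturally onto a timer service. ChiefNet uses the quartz scheduler for this (section 4.2.3); the sketch below instead uses the JDK's ScheduledExecutorService as a stand-in so that it has no dependencies, and represents a Procedure as a plain Runnable. Class and method names are illustrative.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Illustrative scheduler offering periodic, run-once and fixed-time triggers. */
final class TriggerScheduler {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    /** Periodic trigger: fires at regular intervals (e.g. every minute). */
    ScheduledFuture<?> periodic(Runnable procedure, Duration interval) {
        long ms = interval.toMillis();
        return timer.scheduleAtFixedRate(procedure, ms, ms, TimeUnit.MILLISECONDS);
    }

    /** Run-once trigger: fires once after the given delay. */
    ScheduledFuture<?> runOnce(Runnable procedure, Duration delay) {
        return timer.schedule(procedure, delay.toMillis(), TimeUnit.MILLISECONDS);
    }

    /** Fixed-time trigger: fires once at a specific date and time. */
    ScheduledFuture<?> fixedTime(Runnable procedure, Instant when) {
        return runOnce(procedure, delayUntil(Instant.now(), when));
    }

    /** Time left until 'when'; zero if that moment has already passed. */
    static Duration delayUntil(Instant now, Instant when) {
        Duration remaining = Duration.between(now, when);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }

    /** Stops the timer so that no further triggers can fire. */
    void shutdown() {
        timer.shutdownNow();
    }
}
```

A fixed-time trigger is simply a run-once trigger whose delay is computed from the current time, which is why a target time in the past fires immediately in this sketch.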

Accessing the console of a VM When the operator wishes to directly control a VM, and guest additions are not

available, the console is the only other option. Accessing the console is very easy: when right clicking

a VM in the ScenarioBuilder window, the option ”Access Console” is presented in a context menu.


Figure 4.5: Creating a vSwitch in ChiefNet


We mention this action because of the way the client connects to the console. Namely, when the operator

clicks the ”Access Console” button, the client looks up the connection details for the VNC connection:

the IP address of the hypervisor running the VM and the TCP port which was specified in the libvirt XML

configuration file. It then launches the VNC viewer (and thus a compatible VNC viewer is required to be

installed beforehand).

Launching the experiment When the operator has finished all configuration of the hosts and networking in

the scenario, the last step is to launch the scenario. The scenario is launched with the button ”Start

Scenario” in the ScenarioBuilder window.

When the scenario is launched, all virtual switches are configured and brought online, and all VMs are

started. The timer on the schedulers in the hypervisors is also started, which starts the countdown on

the ‘periodic’ and ‘run-once’ triggers.

Tearing down the experiment When the experiment has completed, it can be torn down. Tearing down the

experiment is equivalent to destroying the running scenario, which is done through the ”File” menu in

the ScenarioBuilder window.

When a scenario is destroyed, the following actions are performed:

1. All running VMs are powered off.

2. All operational virtual switches are shut down.

3. The scheduling daemons on the hosts are stopped, so no further triggers can be activated.

4. The saved experiment in every server’s ScenarioManager is cleared.

5. Finally, the client exits.

Windows

To help the operator find his/her way in the client, this section briefly covers the different graphical

windows, along with their primary function.

ScenarioBuilder The main window is called the ScenarioBuilder. Consisting of 3 panes, it controls the

current Scenario. A screenshot of the ScenarioBuilder can be found in Figure C.1 (Appendix C).

The left pane contains the available HostTemplates. A HostTemplate is a template from which

a guest can be constructed.

The right pane shows the network topology of the current Scenario: it includes the virtual switches,

guests and their interconnection. Adding guests is performed in a drag-and-drop manner from the left

pane to the right pane. Upon dropping a HostTemplate into the right pane, a window appears with

options to customize the (template-provided) default hardware settings. This includes the hostname,

CPU count, memory amount and an optional configuration Procedure. After confirming these details,

the popup closes and a new item is added to the pane, representing the newly created host.


The bottom pane contains a single button, which starts the Scenario as it is currently configured.

Another aspect of the ScenarioBuilder is the menu at the top. It contains three items: ”File”,

”Networking” and ”Scheduling”. The ”Networking” and ”Scheduling” items allow one to launch, respec-

tively, the VirtualSwitchManager and the ScheduleManager. The ”File” item has some other

options:

Save to storage This button allows the operator to save the current scenario, with all its con-

figuration parameters, to an XML file. This XML file can be used later on to restore the

current state of the scenario.

Load from storage When the operator has a previously saved scenario available (in the form

of an XML file), he or she can load it into the ScenarioBuilder with this option. The

current configuration options of the scenario are replaced with the ones specified in the

XML file.

Save to hypervisors When clicking this option, the current scenario is saved to the hypervisors.

This allows the operator to close the client software, while the running scenario is still

active and can be recovered later on.

Destroy experiment This option destroys the running experiment. How the experiment is destroyed

is described in section 4.2.2.

VirtualSwitchManager The VirtualSwitchManager forms the control panel for controlling the virtual

networking layer. The operator can add and remove virtual switches from this panel. Adding a vir-

tual switch is described in section 4.2.2. Removing a vSwitch is simply the reverse of adding one: the

attached NICs get detached, the KVM network is removed, the Open vSwitch bridge is destroyed and

the VLAN ID is released. A screenshot of the VirtualSwitchManager can be found in Figure C.3

(Appendix C).

ScheduleManager The ScheduleManager lists all scheduled actions for the defined hosts in this scenario.

A scheduled action is performed on a single host, so it logically follows that at least one host must be

present in the scenario, before a scheduled action can be added. A screenshot of the ScheduleManager

can be found in Figure C.2 (Appendix C).

The operation of adding a scheduled action is described in section 4.2.2. Of key importance here is

the use of Procedures. When adding a scheduled action, the operator defines a Procedure to be

executed whenever the trigger is activated. This Procedure contains an ordered list of commands,

which will be executed inside the guest operating system. Adding a Procedure to a scheduled action

is done through the ProcedureManager. The interface for adding a scheduled action is illustrated

in Figure C.4 (Appendix C).

ProcedureManager The ProcedureManager controls the procedures of this scenario. When adding or edit-

ing procedures (e.g. when adding a scheduled action or creating a new VM), the ProcedureManager


is used to provide a graphical interface for modifying a Procedure. A screenshot of the

ProcedureManager can be found in Figure C.5 (Appendix C).

The ProcedureManager is built up of three main components: the list of actions, buttons to add

new actions and the top-menu.

The list of actions contains all actions currently defined in the Procedure. Actions can be deleted

with the ”Delete” button on the right. New actions can be added with the ”Add Action” buttons, which

are also on the right.

To the right of the list of actions are the buttons to add new actions. Clicking one of these buttons will

open a popup, querying the operator for the necessary information to complete the action (e.g. which

file to upload and where to store it). Upon completion of this popup, the new action is added to the

bottom of the list of actions. Since new actions are added to the bottom of the list, the operator must

add actions in a first-in-first-out manner.

Above both the list of actions and all the buttons is the menu bar. The menu bar contains a single item,

which offers the option to load and save Procedures to storage. Indeed, a Procedure can be saved,

and loaded back in at a later time. Procedures are saved, just like Scenarios, in an XML file. The

XML file is structured logically, which allows the operator to edit the XML file offline (or out-of-band).

Further information about the format of the XML files is available in Appendix D.

4.2.3 Server

The server consists mainly of endpoints for the client to connect to. The different endpoints are listed below, each

with their own uses and responsibilities.

Endpoints are exposed over the Java RMI interface, allowing the client to remotely activate operations which

are contained within this host. Most endpoints fall under one of two categories: VM-related support endpoints or

guest communication endpoints. A compact overview of the different components can be found in Table 4.1, but

we’ll cover them in more detail below.
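As an illustration of what exposing an endpoint over Java RMI involves, the sketch below defines a remote interface for a simple store/retrieve endpoint in the spirit of the ScenarioManager, an in-memory implementation, and a helper that publishes it in an RMI registry. The interface name, method names, the use of a String payload and the binding name are assumptions; only the `java.rmi` classes are real JDK API.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

/** Hypothetical remote contract: every method may fail with RemoteException. */
interface ScenarioStore extends Remote {
    void save(String scenarioXml) throws RemoteException;
    String load() throws RemoteException;    // null when nothing is saved
    void clear() throws RemoteException;
}

/** Server-side implementation that keeps the running experiment in memory. */
final class InMemoryScenarioStore implements ScenarioStore {
    private String saved;

    public synchronized void save(String scenarioXml) { saved = scenarioXml; }
    public synchronized String load() { return saved; }
    public synchronized void clear() { saved = null; }
}

final class EndpointPublisher {
    /** Exports the store and binds it under an illustrative name. */
    static Registry publish(ScenarioStore store, int registryPort) throws RemoteException {
        Registry registry = LocateRegistry.createRegistry(registryPort);
        // Port 0 lets the RMI runtime choose the port for the exported object.
        registry.rebind("ScenarioManager", UnicastRemoteObject.exportObject(store, 0));
        return registry;
    }
}
```

A client would obtain the stub with `LocateRegistry.getRegistry(host, port).lookup("ScenarioManager")` and call `load()` as if it were a local method; arguments and return values must be serializable, which is why a String is used here.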


Category              Endpoint               Description
VM Support            HDDManager             Controls the files containing the VM hard drives on this host
VM Support            HostTemplateManager    Controls the VM templates on this host
VM Support            VirtualBridgeManager   Controls the virtual networking layer (Open vSwitch) on this host
Guest Communication   NetCollector           Controls the processes responsible for collecting network traffic for the VMs on this host
Guest Communication   GuestCollector         Controls the processes responsible for collecting guest information for the VMs on this host
N/A                   ScenarioManager        Simple store/retrieve endpoint for saving the running experiment

Table 4.1: List of the different endpoints and their category in the server component of ChiefNet

HDDManager The HDDManager is responsible for managing the virtual disks of the VMs. When creating a

VM, it is not allowed to simply connect the template’s disks to the VM, because the VM would then

modify the template. Instead, a copy of the virtual disks defined in the template must be provided.

KVM supports the following virtual disk formats [90]:

• RAW: A plain file.

• bochs: Bochs disk image format.

• cloop: Compressed loopback disk image format.

• cow: User Mode Linux disk image format

• dmg: Mac disk image format

• iso: CDROM disk image format

• qcow: QEMU v1 disk image format

• qcow2: QEMU v2 disk image format (most popular format)

• qed: QEMU Enhanced Disk image format

• vmdk: VMware disk image format

• vpc: VirtualPC disk image format

In summary, when the client requests a new virtual disk for a VM, the HDDManager copies the virtual

disks defined in the VM’s template to a new location. The resulting disks are then available for the

client to use in the VM’s libvirt definition. When the VM is destroyed, the created virtual disks aren’t

needed anymore and are deleted from disk.

An important remark here is the use of qcow2’s linking feature. If the virtual disks are in qcow2 format,

the HDDManager doesn’t simply copy the source file into the destination file; instead, it creates a ‘link’.

This ‘link’ is achieved using qcow2’s copy-on-write feature: a snapshot is created of the template’s

disks in their current state. This snapshot is now a ‘linked clone’ of the template’s disks (the base


image). When a read request occurs, the system will first check if the requested block is available in

the snapshot. If this is not the case, the system will then access the base image. When a write request

occurs, only the snapshot is modified. This preserves the state of the base image. Figure 4.6 represents

a visual explanation of linking.

Figure 4.6: Visual explanation of snapshotting using qcow2’s copy-on-write feature

Image by Technoscoop [91]

The huge advantage of linking is speed: copying a virtual disk file will take some moments, depending

on the size of the source file and the speed of the underlying storage system. In contrast, creating a

link is almost instant and doesn’t depend on the size of the base image.
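The read and write rules of a linked clone can be modelled in a few lines. The sketch below is a conceptual simulation with in-memory block maps, not the qcow2 on-disk format; class and method names are illustrative. On the command line, an overlay of this kind is typically created with `qemu-img create` and its backing-file option (`-b`), although this chapter does not state which mechanism the HDDManager uses.

```java
import java.util.HashMap;
import java.util.Map;

/** Conceptual model of a copy-on-write overlay on top of a read-only base image. */
final class LinkedClone {
    private final Map<Long, byte[]> base;                        // template disk, never written
    private final Map<Long, byte[]> overlay = new HashMap<>();   // blocks written by this VM

    /** Creating the clone copies nothing, so its cost is independent of the base size. */
    LinkedClone(Map<Long, byte[]> base) {
        this.base = base;
    }

    /** Reads check the overlay first and fall back to the base image. */
    byte[] read(long block) {
        byte[] own = overlay.get(block);
        return own != null ? own : base.get(block);
    }

    /** Writes only ever touch the overlay, preserving the base image. */
    void write(long block, byte[] data) {
        overlay.put(block, data.clone());
    }

    int overlayBlocks() {
        return overlay.size();
    }
}
```

Several clones can share one base map, which corresponds to several VMs being instantiated from the same template without modifying it.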

HostTemplateManager The HostTemplateManager performs a simple function: it searches for VM tem-

plates on the hypervisor and exposes them via the RMI interface.

Finding HostTemplates is relatively simple: when starting the server, the operator has to specify

a directory containing the XML files (which each represent a VM template). More details about these

XML files is available in Appendix D. The HostTemplateManager then converts these XML files into

HostTemplate objects, which are passed through to the client upon request.

A HostTemplate is identified by its name (a string). The name doesn’t have to be unique across hosts: if two hosts each have a template with the same name, these templates are assumed to be identical, which allows a VM to be instantiated on either host. The HostTemplate name does have to be unique within a single hypervisor.

VirtualBridgeManager Since Open vSwitch doesn’t expose a management interface over the network, a wrapper around the management commands is needed. This is the purpose of the VirtualBridgeManager: it exposes methods to list, create, view and delete OVS bridges. These bridges are only modified when the client starts and creates a new scenario (see section 4.2.2). OVS doesn’t expose a programmatic API, so interfacing with OVS is done by executing the ovs-vs management commands from the JVM.

The VirtualBridgeManager also exposes functions to attach the SDN controller to, and detach it from, the Open vSwitch bridges.
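As an illustration of wrapping the command-line tools from Java, the sketch below builds the argument lists for a handful of bridge operations and runs them as child processes. It assumes the management commands are the standard ovs-vsctl subcommands (add-br, del-br, list-br, set-controller, del-controller). The class name OvsCommands and its method names are hypothetical and are not taken from the ChiefNet source.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical wrapper; class and method names are illustrative, not ChiefNet's.
class OvsCommands {
    static List<String> createBridge(String bridge) {
        return List.of("ovs-vsctl", "add-br", bridge);
    }

    static List<String> deleteBridge(String bridge) {
        return List.of("ovs-vsctl", "del-br", bridge);
    }

    static List<String> listBridges() {
        return List.of("ovs-vsctl", "list-br");
    }

    // Points the bridge at an OpenFlow controller reachable over TCP.
    static List<String> setController(String bridge, String host, int port) {
        return List.of("ovs-vsctl", "set-controller", bridge, "tcp:" + host + ":" + port);
    }

    static List<String> removeController(String bridge) {
        return List.of("ovs-vsctl", "del-controller", bridge);
    }

    // Runs a command as a child process of the JVM and returns its exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

Passing the arguments as a list rather than a single string avoids shell quoting issues when bridge names come from user input. A call such as OvsCommands.run(OvsCommands.createBridge("lab0")) would require Open vSwitch to be installed and sufficient privileges on the hypervisor.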

SchedulerManager The SchedulerManager controls the host’s scheduling daemon. Under the hood, it

uses the quartz [92] scheduler. The methods of the SchedulerManager are simply wrappers

around the quartz functions.

The scheduling daemon is mainly used for scheduled actions, as described in section 4.2.2.

Scheduling is started when the scenario is started (section 4.2.2) and is stopped when the scenario

is destroyed. The activation of triggers is performed by quartz, as is the ‘administration’ around

scheduled tasks.

NetCollector The NetCollector is used for capturing live network traffic from the guests. It saves the

captured traffic in a pcap-formatted file in the Hadoop Distributed Filesystem (HDFS).

KVM exposes the virtual NICs of the VMs as local interfaces on the hypervisor. This allows us to capture

network traffic without needing access to the guest machine. Each virtual interface is named ‘$VM hostname-$NIC number’ (e.g. mydesktop-0).

tcpdump is used to capture traffic passing through these interfaces. Its output is the captured network traffic in pcap format, which we store on HDFS. Since tcpdump outputs the traffic in real time (apart from a small buffering delay), the file is written to HDFS in a streaming manner. This allows the data processing system (Spark) to read the file in a similar streaming fashion and process the data in near real time.
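A minimal sketch of this capture pipeline is given below. It assumes tcpdump is invoked with -i (interface), -U (flush each packet as it is captured) and -w - (write the pcap stream to standard output). The class name CaptureSketch is hypothetical, and a generic OutputStream stands in for the HDFS file, since the HDFS client code is not shown in the text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical sketch of the capture side of the NetCollector.
class CaptureSketch {
    // Interface naming convention from the text: "<VM hostname>-<NIC number>".
    static String interfaceName(String vmHostname, int nicNumber) {
        return vmHostname + "-" + nicNumber;
    }

    // -i selects the interface, -U flushes per packet, -w - writes pcap data to stdout.
    static List<String> tcpdumpCommand(String iface) {
        return List.of("tcpdump", "-i", iface, "-U", "-w", "-");
    }

    // Copies the pcap stream to a sink as it arrives and returns the number of bytes copied.
    static long stream(InputStream pcap, OutputStream sink) {
        try {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = pcap.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
                sink.flush();   // push data out immediately so readers can follow along
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real deployment, the input stream would be the standard output of the tcpdump process (for example via ProcessBuilder) and the sink would be an output stream opened on the distributed filesystem.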

Another option would be to split the pcap file into smaller chunks, processing each file as it is written

to HDFS. Depending on the chunk size (the buffer size), this approach may increase the delay between

the passing of the traffic through the interface and the processing of the data by Spark. However, it

may increase performance, since HDFS isn’t designed to sustain many small writes. This is covered in

Chapter 5.

Raw network traffic isn’t the only information collected in the NetCollector. Additionally, packet

metadata (such as source IP, destination IP, protocol, etc.) can be collected and published to Apache Kafka. This metadata doesn’t require the scalability of HDFS, because it doesn’t contain the packet’s

content.

GuestCommunicationsManager A special technique enables communication with the guest operating system. A detailed explanation is available in section 4.2.4, but for now it is sufficient to know that communication with the guest is performed in a half-duplex manner. ‘Half-duplex’ means that the medium allows the transmission of signals in both directions, but not simultaneously.

In other words, multiple entities can’t communicate with the guest in parallel. We need some ‘manager’ to accept communication requests from entities and act as the go-between. This is what the GuestCommunicationsManager does: it exposes methods to queue a communiqué to the guest and passes the response back to the requester. The queue is handled on a first-in-first-out basis, one communiqué at a time.
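One way to obtain this behaviour in Java is a single-threaded executor, which runs submitted tasks strictly one at a time and in submission order. The sketch below is an assumption about how such a manager could be built, not the actual implementation: the class name FifoGuestChannel is hypothetical, and a Supplier stands in for "send a communiqué over the serial port and wait for the response".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch of first-in-first-out access to a half-duplex channel.
class FifoGuestChannel {
    // One worker thread: exchanges never overlap and run in the order they were queued.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Queues an exchange with the guest; the future completes with the guest's response.
    <T> CompletableFuture<T> queue(Supplier<T> exchange) {
        return CompletableFuture.supplyAsync(exchange, worker);
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

Several requesters can call queue concurrently; each receives its own future, while the channel itself only ever carries one exchange at a time.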

This communiqué is called a GuestCommand. GuestCommands are further described in section 4.2.4.

Note that these commands are simple Java objects, allowing the requester to describe the demanded

command as a Java method. This method is then executed inside the guest operating system, which

can store the response in local variables. The Java object is then returned to the requester, with the

state of the object as it was left by the guest.

The use of the GuestCommunicationsManager is described in section 4.2.4 and in Figure 4.7.

GuestCollector We now cover the extraction of information from inside the guest operating system.

As we discussed above, the GuestCommands are plain old Java objects. The operations needed to

collect the required information can thus be expressed as simple Java statements (e.g. the number of

CPU cores with Runtime.getRuntime().availableProcessors()).

Other operations can of course also be executed, as Java allows for creating child processes. For example, a shell command (e.g. the who command, which lists the logged-in users) can be executed with Runtime.getRuntime().exec(…).

ScenarioManager The ScenarioManager serves as a simple store for the current Scenario. Its only two methods are saveScenario and clearSavedScenario. The Scenario is kept in memory and written to storage, allowing the client to later request the running scenario and resume the experiment.

The ScenarioManager isn’t part of either of the two categories of endpoints (VM Support or Guest

Communication).

4.2.4 Guest Additions

As stated in [Req. 11] (Chapter 3), we desire some way of communicating with the guest operating systems, mainly

to exchange data back and forth. A normal TCP/IP Ethernet network won’t suffice, because malware will be spreading in this network and we want a completely isolated system. Hence, we need another medium for these communications.

Virtual Serial Port

Our proposed solution works with a virtual serial port.

A serial port is a serial communication interface through which information transfers in or out one bit

at a time (in contrast to a parallel port). Throughout most of the history of personal computers, data

was transferred through serial ports to devices such as modems, terminals, and various peripherals.

A virtual serial port is an emulation of the standard serial port [93].

The virtual serial port provides a communication medium where the guest operating system can read data from the

port, execute any required operations and write the result back into the port. Libvirt supports the use of (emulated)

virtual serial ports [94].

Most communication mediums today are full duplex (they support transferring data in both ways at the

same time). As stated in the above quote, a serial port is a serial communication interface. This makes it half-

duplex (supports transferring data in both ways, but only one way at a time). This results in the following lim-

itation: while the sender is sending data to the receiver, the receiver can’t send any data (back) to the sender.

Access control to the communication medium is thus required, which is performed by the previously covered

GuestCommunicationsManager (section 4.2.3).

The name ‘guest additions’ may seem familiar. Many virtualization providers today provide an extra (optional,

but highly recommended) software package to install in every guest machine. This software often provides in-

creased performance and additional features. Our proposed solution also needs some software on the guest side

of the serial port. This software handles incoming requests and executes the commands in these requests, relaying

the output back into the serial port.

Phases

The execution of a GuestCommand is performed in three phases:

Phase 1 The HostPre phase. This phase is executed on the requesting entity, collecting information which may

be needed for the guest to successfully complete the command. The main example is uploading a file

to the guest: this phase reads out the file from the file system of the requesting entity.

Phase 2 The Guest phase. This phase is executed inside the guest machine. The guest has all required information to execute the required operation. The implementation of the guest command (how to execute the command) is part of the guest additions source code. The command itself (which command to execute and its parameters) is contained within the request.

Phase 3 The HostPost phase. It is mainly used for post-processing the data received from the guest, typically by saving it to some sort of storage (e.g. if a file is to be downloaded from the guest, this is the phase where the file gets saved onto the requester’s local filesystem).

Each phase is required to be defined in the executor’s source code. This results in a two-way dependency: if

future developers wish to add a GuestCommand, the source code for this Java object needs to be present in both

the requesting entity and in the guest. This will require an update of the guest additions software inside the guest

machine.
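The three phases can be pictured as three methods on one Java object whose fields carry state from one phase to the next. The sketch below is illustrative only: the interface PhasedCommand, the class UploadSketch and the two maps standing in for the requester's and the guest's file systems are assumptions, and the real GuestCommand interface may look different. In particular, it assumes that the HostPost phase runs on the requesting entity.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface; the real GuestCommand signature is not shown in the text.
interface PhasedCommand extends Serializable {
    void hostPre();   // Phase 1: runs on the requesting entity
    void guest();     // Phase 2: runs inside the guest
    void hostPost();  // Phase 3: assumed to run on the requesting entity again
}

// Upload example; two maps stand in for the requester's and the guest's file systems.
class UploadSketch implements PhasedCommand {
    static final Map<String, String> HOST_FS = new HashMap<>();
    static final Map<String, String> GUEST_FS = new HashMap<>();

    private final String source;
    private final String destination;
    private String content;   // state carried from the requester to the guest
    private boolean stored;   // state carried back from the guest to the requester

    UploadSketch(String source, String destination) {
        this.source = source;
        this.destination = destination;
    }

    @Override
    public void hostPre() {
        content = HOST_FS.get(source);        // read the file on the requester
    }

    @Override
    public void guest() {
        GUEST_FS.put(destination, content);   // write (or overwrite) the file in the guest
        stored = true;
    }

    @Override
    public void hostPost() {
        content = null;                       // an upload has nothing to save afterwards
    }

    boolean wasStored() {
        return stored;
    }
}
```

Because both sides execute methods of the same class, the class must be present on the requester and in the guest, which is the two-way dependency described above.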

Method of Operation

The complete flow, from request to response, is described below. Note that the reader can follow along with the

steps below in Figure 4.7.

1. An entity (possibly the operator, possibly an automated action) wishes to retrieve/store some information

from/to a guest. The entity fills out all required information to execute the operation (e.g. which file to

transfer back).

2. The requesting software (possibly the client, possibly the server) performs Phase 1.

3. The requester notifies the GuestCommunicationsManager of their desire to execute some operation

on a specific guest.

4. The GuestCommunicationsManager queues the request, to be executed at a later time.

5. Some time later (possibly long, depending on the length of the queue), the GuestCommunicationsManager

selects the request to be executed.

6. The GuestCommunicationsManager passes the GuestCommand to the guest machine.

7. The guest executes the command (Phase 2).

8. The guest updates the state of the command (e.g. fills in local variables containing the output of a shell command).

9. The guest sends the GuestCommand back to the GuestCommunicationsManager.

10. The GuestCommunicationsManager relays the guest response to the requester.

Supported Commands

At the time of writing, six GuestCommands are available. Additional commands can easily be added by implementing the GuestCommand interface.

• The CliGuestCommand executes a shell command inside the guest operating system and returns the

stdout and stderr of the command back to the requester.

• The DownloadFileGuestCommand downloads a file from the guest to the requester. If the destination

file already exists, the file is overwritten.

• The UploadFileGuestCommand uploads a file from the requesting entity to the guest machine. If the

destination file already exists, the file is overwritten.

• The UploadAndExecuteFileGuestCommand uploads a file (e.g. a script), executes it and returns the

output back to the requester.

• The AnsiblePlaybookGuestCommand uploads an Ansible Playbook [73] and executes it on the guest

machine. The output of the ansible-playbook command is returned to the requesting entity.

• Finally, the GetMetricsGuestCommand is used by the GuestCollector (section 4.2.3) to extract in-

formation about the guest operating system. Currently, the only metric returned is a list of processes, but

this can be easily extended.

Figure 4.7: Guest Communication: Method of Operation

Automated Configuration

To complete the section about the guest additions, we briefly go back to our requirements and design.

In [Req. 8], we laid out that some form of automated configuration is highly desired. Using the guest additions, this is now possible: when the guest operating system has booted, it informs the hypervisor that it’s ready to start receiving commands (we call this the GuestHeartbeat). The hypervisor can then immediately start the predefined configuration procedure (see section 4.2.2).

4.3 User Simulation

The client & server are the main components of ChiefNet, primarily tasked with running the virtualized environment.

In this section, we’ll cover how we managed to simulate users in this confined network.

As with the previous components, the intent is to create a cross-platform solution ([Req. 12]). Therefore, the

main programming language used is (again) Java. The Quartz library is (again) used to schedule tasks, such as

repeated actions.

The simulation software, much like the guest additions, has to be installed in the guest operating system. However, this can easily be achieved using the guest additions: just upload the desired simulation application and execute it with the required arguments.

The next sections cover the simulations one by one, describing their implementation and use.

Email

The first requirement, [Req. 13], is to simulate sending and receiving emails. Additionally, if received emails have

attachments, these need to be executed.

On the client side, the Apache Commons Email library [95] is used to facilitate interfacing with email servers.

The Commons Email library supports POP3(s), IMAP(s) and SMTP(s), making it a very versatile library to connect to

email systems.

On the server side, INetSim [86] is used. INetSim was designed for this purpose: simulating common internet

services in a lab environment. INetSim’s protocol support includes SMTP(s) and POP3(s). Moreover, INetSim stores

emails in a very accessible format (mbox). No authentication is performed, since emails aren’t actually sent (they are simply saved to storage). INetSim does support authentication methods, but the resulting username/password is simply logged, not validated. This might prove useful when analyzing malware that uses email (e.g. a spambot). To start the email simulation, the operator has to set up an INetSim server and pass the specified properties to the Java program.

To specify which emails can be received, the operator has to supply INetSim with an mbox file. This file contains a list of emails, of which INetSim will pick some at random each time the user requests emails. The same applies to sending emails: the operator has to supply an mbox file from which a randomly selected email will periodically be submitted to the mail server.

An important remark here is the use of INetSim’s DNS spoofing feature. INetSim includes a DNS server which always responds with the IP address of the INetSim server. This allows us to specify an external mail server (e.g. ‘mail.example.org’) which will resolve to the INetSim server. With the right Certification Authority configuration in the guests, one can even use secure protocols such as SMTPS and POP3S.

Web

The second simulation, as specified in [Req. 14], simulates a user surfing the web. There are two different ‘modes’

to simulate web traffic.

The first mode is called crawling mode. In crawling mode, the client just moves from webpage to webpage, not executing any files. This mode is thus mainly used as a traffic generator. The client starts at the home page and follows a random link, repeating indefinitely. When an error occurs, or when a page has no hyperlinks, we return to the home page. We’ve implemented this by downloading a Wikipedia dump (11 GiB) and hosting it locally on the INetSim server. Of course, any website will work; Wikipedia was just easy to obtain and contains lots of hyperlinks on each page. Note that INetSim’s DNS redirection feature can be used here as well.
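The crawling loop amounts to a random walk with a reset. The sketch below runs it over an in-memory link graph rather than over HTTP, and for a fixed number of steps instead of indefinitely, so that it can be tried in isolation. The class name CrawlSketch and the map-based representation of the website are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of crawling mode over an in-memory link graph.
class CrawlSketch {
    // Follows random links for a fixed number of steps and returns the pages visited.
    static List<String> crawl(Map<String, List<String>> links, String home, int steps, Random random) {
        List<String> visited = new ArrayList<>();
        String current = home;
        for (int i = 0; i < steps; i++) {
            visited.add(current);
            List<String> outgoing = links.get(current);
            if (outgoing == null || outgoing.isEmpty()) {
                current = home;   // unknown page or no hyperlinks: back to the home page
            } else {
                current = outgoing.get(random.nextInt(outgoing.size()));
            }
        }
        return visited;
    }
}
```

In the real client, looking up the outgoing links would correspond to fetching the page and extracting its hyperlinks, and a failed request would be treated like a page without links.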

The second mode is called faking mode. In faking mode, the web server returns files (html, png, exe, etc.)

depending on what the client requests. For example: if the client requests

http://example.org/someprogram.exe, the server will answer with a real .exe file. Faking mode ‘fakes’

the response, but always returns a valid file for the extension the client requested. The client then executes the file

(if it’s executable). This is useful for simulating foolish users, who just execute random files from the internet. Of course, INetSim allows for selecting which file is linked to a specific extension (e.g. ‘sample.exe’ should be linked to the ‘.exe’ extension).
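The extension-to-file mapping behind faking mode can be illustrated with a small lookup. This is not INetSim's code or configuration format; the class FakeResponder and the sample file names are hypothetical, and the sketch only shows the selection logic: whatever path is requested, the extension alone decides which file is served.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the selection logic in faking mode.
class FakeResponder {
    private final Map<String, String> sampleByExtension;

    FakeResponder(Map<String, String> sampleByExtension) {
        this.sampleByExtension = sampleByExtension;
    }

    // Lower-cased extension of the requested path, or "" if there is none.
    static String extensionOf(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return "";
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash || dot == path.length() - 1) {
            return "";
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // The sample file to serve for this request, if one is configured for its extension.
    Optional<String> sampleFor(String url) {
        return Optional.ofNullable(sampleByExtension.get(extensionOf(url)));
    }
}
```

With a mapping from ‘exe’ to ‘sample.exe’, a request for http://example.org/someprogram.exe would be answered with sample.exe, regardless of the file name the client asked for.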

File Transfer

The final simulation is described in [Req. 15]. This simulation is also primarily aimed to be a traffic generator: it

transfers files from and to a central file server. This is a common task in a corporate network, so it makes sense to

simulate it here.

Using the JCIFS library [96], we implemented the client much like the crawling mode of the Web simulation.

First, the client connects to the server and lists the root directory. When encountering a new file or directory, it is

queued for processing at a later time. Processing a directory is as simple as listing the contents of that directory

and (recursively) adding its contents to the queue. When a file is processed, we first check if the file is executable.

If it is, we transfer and execute the file. If it’s not, we simply download the file and subsequently remove the local

copy. Uploading files is performed as well, in a similar fashion to downloading files.
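The queue-based traversal described above can be sketched without an SMB server by using an in-memory directory listing. The sketch does not use JCIFS: the class ShareCrawlSketch, the convention that directory paths end in ‘/’, and the check for a ‘.exe’ suffix as the test for executability are all assumptions made for illustration. Instead of transferring files, it records which action would be taken.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the file-share crawler over an in-memory listing.
class ShareCrawlSketch {
    static List<String> crawl(Map<String, List<String>> listing, String root) {
        List<String> actions = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String entry = queue.remove();
            if (entry.endsWith("/")) {
                // Directory: list it and queue its contents for later processing.
                for (String child : listing.getOrDefault(entry, List.of())) {
                    queue.add(entry + child);
                }
            } else if (isExecutable(entry)) {
                actions.add("download+execute " + entry);
            } else {
                actions.add("download+delete " + entry);
            }
        }
        return actions;
    }

    // Simplifying assumption: only the extension decides whether a file is executable.
    static boolean isExecutable(String path) {
        return path.toLowerCase().endsWith(".exe");
    }
}
```

Because entries are taken from the front of the queue, the share is walked breadth-first: everything in the root directory is handled before the contents of its subdirectories.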

Unfortunately, INetSim does not support the SMB protocol. It does support the FTP protocol, but since SMB is

much more widely used, we opted to support the SMB protocol and find another solution for the server side. The

current implementation uses the Samba [87] project, an open-source implementation of the SMB protocol. The

Samba server can be located on the INetSim server, in case the operator wishes to enable some kind of interaction

between the different simulations (e.g. email attachments get saved to the Samba server). This is currently not

implemented, but is easily achieved using INetSim.

4.4 Data Storage & Processing

This section covers the techniques used in, and experience gained with, the data storage and processing components. Because this thesis is centered around the development of ChiefNet, not the data processing aspect, we will not elaborate in detail.

HDFS

In chapter 3, Design, the use of the Hadoop Distributed Filesystem (HDFS) is mentioned. This filesystem is primarily

used for storing network capture data files, which can get very large, depending on the activity in the network.

HDFS unfortunately adds an overhead to storage systems, leading to decreased performance.

To achieve optimal performance and minimize overhead, the storage system should be as close as possible to the data source. The data source is the virtual machine; it is therefore highly recommended to place the HDFS system on the hypervisors. As explained in detail in section 5.2, this results in nearly the same performance as the raw storage system underneath HDFS.

In order to be able to provide the data to the processing framework as fast as possible, we chose not to use a

more efficient format (as described in section 2.4.1). Using Parquet or Text files would require preprocessing of the

pcap files, which introduces a non-trivial delay between the ingestion and processing of the data (see [43]).

OpenTSDB

For storing structured data, such as metrics, we use OpenTSDB. OpenTSDB runs on Apache HBase, which itself runs on HDFS. To minimize maintenance and simplify configuration, we store the HBase (and thus the OpenTSDB) databases on the same HDFS system.

Proof-of-Concept Data Processing Task

Included in the source code of ChiefNet is a proof-of-concept Spark task for connecting to HDFS and processing

the pcap traffic capture files. Note that the same optimization as in HDFS can be applied here: the closer the

processing is performed to the data, the better the performance. Therefore, it is recommended to place the Spark

workers on the hypervisors (which are also the HDFS data nodes).

The task processes the pcap files in a streaming manner, using Spark Streaming. The packets are read out by the hadoop-pcap library, after which the Spark task performs a simple count by destination IP address. The results are printed to the console.
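The aggregation itself is a group-and-count. The sketch below shows it with plain Java streams over an in-memory list, deliberately leaving out Spark and the hadoop-pcap library so that it runs on its own. The class DestinationCount and the representation of a packet as a {source, destination} string pair are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java stand-in for the count-by-destination aggregation (no Spark involved).
class DestinationCount {
    // Each packet is a {source IP, destination IP} pair; the result maps destination IP to packet count.
    static Map<String, Long> countByDestination(List<String[]> packets) {
        return packets.stream()
                .collect(Collectors.groupingBy(packet -> packet[1], TreeMap::new, Collectors.counting()));
    }
}
```

In a streaming job, the same grouping would be applied to each batch of packets as it arrives, with the counts either printed per batch or accumulated across batches.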

We mention this proof-of-concept task here because it is important to note that the implementation uses the hadoop-pcap [97] library from RIPE-NCC. This library has been out of active development

for quite a while, so we modified it to run on the newest version of Spark to date (v2.3.0). The updated library

can be found at https://github.com/tvgdb/hadoop-pcap.

4.5 Ansible

To finish this chapter, we make the reader aware of the existence of numerous Ansible Playbooks. These playbooks

configure the lab environment for use with ChiefNet. The playbooks are included in the source code of ChiefNet,

under the ansible directory. We provide the following playbooks:

• Core: A playbook for installing the requirements and deploying KVM, libvirt and the server component of

ChiefNet.

• User Simulation: A playbook for configuring the user simulation programs.

• User Simulation: A playbook for configuring INetSim.

• User Simulation: A playbook for configuring the Samba server.

• Data Storage: A playbook to install the HDFS role.

• Data Storage: A playbook for installing the Apache HBase server.

• Data Storage: A playbook for installing the OpenTSDB server.

• Data Storage: A playbook for installing and configuring the Apache Kafka server.

• Data Processing: A playbook for installing and configuring the Apache Spark nodes.

• Data Visualization: A playbook for configuring Grafana and connecting it to the OpenTSDB server.

We will not go into further detail on these playbooks, but it may prove useful to know of their existence.

Chapter 5

Evaluation

This chapter compares the implementation (Chapter 4) with the requirements (as listed in Chapter 3). It also reviews the performance of this implementation and some limitations of the chosen design decisions.

5.1 Requirements

Chapter 3 lists the requirements which our implementation should fulfill. We now go over them again, adding for each the solution we presented (or the reason we failed to realize the requirement). A compact representation can be found in Table 5.1.

5.1.1 General Requirements

[Req. 1] ChiefNet must be easy to use.

By providing a simple GUI application to the operator, this requirement is met. Hosts are added to the network by drag-and-drop, and the network topology is represented visually (see Figure C.1).

[Req. 2] ChiefNet should be extensible.

We used an extremely popular programming language in every aspect of ChiefNet: the client, server

and user emulation packages are all written in Java. Additionally, we provided an explanation of the

major components of each package in Chapter 4. Future developers should have no problem modifying

the source code of ChiefNet.

[Req. 3] It should be easy to repeat the same experiment.

As described in section 4.2.2, we provide a simple way of saving and loading the experiment: we use

human-readable XML files to portray experiments, allowing easy storing and reloading of the same

experiment.

| Category | Requirement | Met? | Solution |
|---|---|---|---|
| General | Easy to use | ✓ | Drag-and-drop GUI |
| General | Extensible | ✓ | Java programming language, documentation in this paper |
| General | Repeating the same experiment | ✓ | Saving and loading from storage |
| General | Scalability | ✓ | The use of scalable technologies (KVM, Open vSwitch, Spark, HDFS) |
| General | Isolation | ✓ | Network isolation through VLANs, storage isolation through virtual disk lifecycle |
| Virtualization | Many supported OSs | ✓ | The use of VM-based virtualization |
| Virtualization | jFed Integration | × | N/A |
| Virtualization | Automated configuration | ✓ | Automatically executing commands through the guest additions |
| Virtualization | Automated OS installation | ✓ | VM templates |
| Networking | Isolation | ✓ | The use of VLANs to segregate networks |
| Networking | Guest Communication | ✓ | The use of a virtual serial port |
| User Simulation | Extensible and cross-platform | ✓ | The use of Java and generic objects |
| User Simulation | Email simulation | ✓ | INetSim email server and DNS redirection |
| User Simulation | Web simulation | ✓ | INetSim web server and DNS redirection |
| User Simulation | File transfer simulation | ✓ | Samba’s SMB server and the JCIFS library |
| Infection | Predefined actions | ✓ | Schedulers on the hypervisors |
| Data Extraction | From hosts | ✓ | Using the GuestCollector we extract information from inside the guest OS |
| Data Extraction | From switches | ✓ | Using the NetCollector we capture live traffic streams and packet metadata |
| Data Storage | Scalability | ✓ | Using scalable technologies (HDFS, OpenTSDB) |
| Data Storage | Structured and raw data | ✓ | Using HDFS for raw data and OpenTSDB for structured metrics |
| Data Processing | Scalability | ✓ | Using scalable technologies (Spark) |
| Data Visualization | Visualization system | ✓ | Using Grafana |

Table 5.1: Evaluation of the requirements listed in Chapter 3

[Req. 4] ChiefNet should be scalable.

By using multiple hypervisor servers and scalable networking technologies (such as Open vSwitch), we

achieve high scalability. ChiefNet should be able to run tens to hundreds of hosts in the virtual network.

[Req. 5] The guests of ChiefNet should be isolated from the rest of the world.

As described in section 4.1.3, we use Open vSwitch’s VLAN feature to isolate the virtual network. If

desired, the operator could manually assign another uplink to the virtual switches, allowing commu-

nication with the outside world.

Storage-wise, these VMs are also completely isolated: their disks are created from a template when the VM is powered on, and deleted from the storage system when the VM is destroyed.

5.1.2 Virtualization

[Req. 6] ChiefNet should support a large number of operating systems.

By using the traditional VM-based virtualization, ChiefNet supports almost all x86 operating systems.

Examples include Microsoft Windows, Linux, FreeBSD, etc. Ports of non-x86 operating systems to the

x86 architecture are also supported, such as Android x86 and modified MacOS images.

[Req. 7] ChiefNet should integrate with jFed.

Unfortunately, this was not achieved. jFed’s design requires the administrator to perform major modi-

fications to the template’s operating system, which would limit ChiefNet’s ability to be as versatile as

possible. Therefore, we currently do not support the use of jFed in ChiefNet.

[Req. 8] Some form of automated configuration deployment should be available.

Using the guest additions (section 4.2.4), ChiefNet supports executing commands and up-/downloading

files to the guest. This allows the operator to, for example, execute an Ansible playbook upon the initial

boot of the guest machine.

[Req. 9] An automated way of installing the operating system is required.

We use templates, which deploy the virtual machine in exactly the state in which the template was constructed. The operating system (and possibly some initial configuration) is deployed as part of the template.

5.1.3 Networking

[Req. 10] The network should provide total isolation, even within the virtual network.

Through the use of VLANs in Open vSwitch, multiple virtual networks are created which are completely

segregated from each other.

[Req. 11] ChiefNet will have to support a way of communicating with the guest operating systems without

the traditional luxury of a TCP/IP network.

We used a per-VM virtual serial port to communicate with the guest operating system. A prerequisite

is the installation of the guest additions in the guest OS, but with the use of templating, this should

pose no issue to the operator.

5.1.4 User Simulation

[Req. 12] The user simulation software should be cross platform and extensible.

By using the Java programming language and generic simulated actions, we introduced a very powerful

way of simulating users and generating authentic traffic. See section 4.3 for more details.

[Req. 13] As a first ‘user action’, we would like to simulate sending and receiving emails.

We described this user action in section 4.3. Our current implementation supports sending emails over SMTP, receiving emails over POP3 and, optionally, the secure variants of these protocols (SMTPS and POP3S). Through INetSim’s DNS redirection feature, this is completely transparent to the guest machine. Email attachments are also executed, which makes malware propagation via email possible.

[Req. 14] The second user action we would like to support is web browsing.

We described this user action in section 4.3. We support two modes: crawling and faking. Crawling

mode is a simple traffic generator, walking from one webpage to another. Faking mode generates

authentic responses based on the request (e.g. if the client requests an image, a jpg file is returned).

Much like in the previous requirement, executable files are executed when encountered.

[Req. 15] Finally, as a third user action, we would like to support transferring files from and to a central

storage location.

We implemented this using the widely used SMB file-sharing protocol. We used the open source

Samba implementation to act as our server, allowing clients to upload and download files from this

central location. The simulated user ‘crawls’ the server, looking for downloadable files. We direct the

reader to section 4.3 for more information.

5.1.5 Infection

[Req. 16] ChiefNet should include functionality to schedule predefined actions.

We described the implemented scheduling functions in section 4.2.3. The implementation allows for

executing an operation inside a guest operating system, through the use of the guest additions.

5.1.6 Data Extraction

[Req. 17] We wish to extract metrics from the hosts.

As described in section 4.2.3, the GuestCollector regularly communicates with the guest operating

system and requests relevant information. We implemented a simple metric (a process list), but this

can easily be extended to just about any piece of data available.

[Req. 18] Network information should be collected from the virtual switches.

The NetCollector, as described in section 4.2.3, collects traffic streams and packet metadata, and

saves this information to HDFS and Kafka.

5.1.7 Data Storage

[Req. 19] The storage framework should be horizontally scalable.

We used the Hadoop Distributed Filesystem (HDFS), which is designed to be scalable to hundreds of

nodes, allowing capacities up to petabytes or even exabytes.

[Req. 20] The storage system(s) should be able to store both raw data as well as structured data.

We use HDFS for storing raw data (network traffic capture files) and OpenTSDB to store the structured

data, such as metrics collected from the guests and packet metadata.

5.1.8 Data Processing

[Req. 21] The processing framework should be horizontally scalable.

We use the Apache Spark processing framework, which is horizontally scalable by design.

5.1.9 Data Visualization

[Req. 22] A data visualization system should be in place.

Grafana supports many forms of graphs and visualizations, and has built-in support for OpenTSDB.

5.2 Performance

An important factor of ChiefNet is the overhead it causes in the performance of the virtual environment. Therefore,

we performed some experiments of which the results are available below.

It is important to remark that the data storage and processing technologies used in ChiefNet are horizontally scalable. We performed these experiments on a simple two-node cluster (described below); future users of ChiefNet can increase performance by adding more nodes to the cluster.

Physical machines used in the experiment

We used two separate physical machines, which are part of the Virtual Wall. Their specifications are listed below. They were connected by a single, direct 1 Gbit link.

• Node 1 (SuperMicro X9DRW-3LN4F+)

– CPU: 2x Intel Xeon E5-2650 v2 (2.60GHz, 8 cores, 16 threads)

– RAM: 128 GiB

– Network: 1Gbit connection

– Storage: 8x 300GB SAS 10K RPM drives in RAID-5

• Node 2 (SuperMicro X9DRW-3LN4F+)

– CPU: 2x Intel Xeon E5-2650 v2 (2.60GHz, 8 cores, 16 threads)

– RAM: 48 GiB

– Network: 1Gbit connection

– Storage: 8x 600GB SAS 10K RPM drives in RAID-0

As we can see, Node 1 has more memory, allowing it to run more VMs, but suffers from the degraded perfor-

mance of RAID-5. Node 2 has less memory, but the RAID-0 array should provide considerable performance. Both

nodes have ample CPU resources, each containing 16 cores (or 32 threads).

5.2.1 Virtualization Performance

It’s difficult to define metrics that represent the performance after virtualization. While experimenting, I found the

virtual machines to be very snappy and responsive. For more information about performance statistics of KVM, we

redirect the reader to [98].

Storage-wise, the results are very similar. We performed two tests by deploying the same template twice but with different deployment configurations. The first test used a raw copy of the template’s virtual disks. The second used a qcow2 linked clone. Both tests achieved nearly identical performance (approximately 95% of the host’s performance). Therefore, we can conclude that the use of qcow2’s linking feature isn’t detrimental to performance.

5.2.2 Network Performance

KVM and libvirt support many types of virtual network interfaces. In our tests, we chose the e1000 (virtual) chipset. This chipset is a very popular option and accurately represents the real world, since most hardware NICs today are also 1 Gbit chipsets.

The only exception to this is when the guest uses the Windows XP operating system. XP doesn’t natively include the e1000 drivers, and with no connection to the outside world (thus no internet), it doesn’t have any way of downloading said drivers. Therefore, Windows XP defaults to the rtl8139 virtual NIC chipset (which is only 100 Mbit/s).

Figure 5.1: Topology of the performed network tests

We performed three tests, which are described below and in Table 5.2. It’s important to note that we disabled

traffic capturing for these tests, to avoid bottlenecks.

1. The first test was performed as follows: two guests were constructed and put in the same virtual network.

We then copied a 5 GB file from one node to the other. The network topology can be seen in Figure 5.1. The

virtual machines were forcibly placed on the same hypervisor. Placing these VMs in the same hypervisor

ensures that no network traffic will actually leave this physical machine, since the VMs are connected to the

same (internal) virtual network.

We observed that the network speed was almost at the chipset rate: 119.2 MB/s (about 0.952 Gbps). We conclude that the usage of virtualization (Open vSwitch and KVM) only slightly degrades performance.

2. Following the previous experiment, we set out to test the limits of Open vSwitch’s virtual switches. We performed tests with 10, 20 and 25 nodes, always using 1 Gbps vNICs. We could not find the limit: at an aggregate of 25 Gbps, we had still not reached it and concluded that Open vSwitch is very fast. Further experiments would be of little use anyway, since the interconnection link between different hypervisors is only 1 Gbps.

3. The third experiment was to measure inter-hypervisor network performance. The same network topology

as in the first test was used, but this time, the nodes were forcibly placed on different hypervisors.

The results showed that a minor overhead was introduced by having to transfer the traffic out of one hypervisor and into another: the network speed was measured at 113.4 MB/s (about 0.9072 Gbps). Compared to the theoretical limit, this is an overhead of about 9%; compared to the first test, requiring the traffic to exit one hypervisor and enter another costs roughly 5% more. This is, of course, still more than adequate for our purposes.

Benchmark                              Performance (raw)   Performance (%)
Theoretical limit                      1 Gbps              100 %
Test 1: VMs on the same hypervisor     0.952 Gbps          95.2 %
Test 3: VMs on different hypervisors   0.9072 Gbps         90.72 %

Table 5.2: Network Performance Test Results
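The percentages in Table 5.2 follow from a simple unit conversion. The sketch below is purely illustrative: the function names are our own, and it assumes decimal units (1 MB = 10^6 bytes, 1 Gbps = 10^9 bits/s), which is the convention the reported figures appear to follow.

```python
LINK_GBPS = 1.0  # nominal rate of the e1000 vNIC and of the interconnect link


def mb_per_s_to_gbps(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mb_per_s * 8 / 1000


def overhead_percent(measured_gbps: float, reference_gbps: float = LINK_GBPS) -> float:
    """Throughput lost relative to a reference rate, in percent."""
    return (1 - measured_gbps / reference_gbps) * 100


same_host = 0.952                      # Test 1, as reported in Table 5.2
cross_host = mb_per_s_to_gbps(113.4)   # Test 3: 113.4 MB/s

print(f"Test 3 rate: {cross_host:.4f} Gbps")
print(f"Test 1 overhead vs. link:   {overhead_percent(same_host):.1f} %")
print(f"Test 3 overhead vs. link:   {overhead_percent(cross_host):.1f} %")
print(f"Test 3 overhead vs. Test 1: {overhead_percent(cross_host, same_host):.1f} %")
```

Read this way, crossing hypervisors costs a little under 5 % on top of the same-host case, for a total of about 9 % relative to the nominal link rate.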

5.2.3 Data Storage Performance

To test the performance of our data storage system, we will test both components (HDFS and OpenTSDB) individ-

ually.

OpenTSDB

To test OpenTSDB, which stores metrics from the guest operating systems, we will need two things: a substantial

amount of guests and an interval to collect metrics from these guests.

Since OpenTSDB is most probably more than adequate to handle our workload, we only performed one simple

test. The test included 35 small guests and was configured with a collection interval of 5 seconds. As previously

stated, the only metric currently collected is a process list. We confirmed the average number of processes in our

chosen guest template is around 55. Thus, 1925 (35× 55) events will be submitted to OpenTSDB every 5 seconds.

The result was as expected: OpenTSDB handled our load without any issue. The CPU usage increased slightly

when we submitted the metrics, but only for a moment.
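The load in this test can be estimated with a short calculation. The helper names below are hypothetical; the input numbers are the ones quoted above.

```python
def events_per_interval(guests: int, processes_per_guest: int) -> int:
    """Number of process-list entries published in one collection round."""
    return guests * processes_per_guest


def events_per_second(guests: int, processes_per_guest: int, interval_s: float) -> float:
    """Average submission rate, assuming one collection round per interval."""
    return events_per_interval(guests, processes_per_guest) / interval_s


burst = events_per_interval(35, 55)     # entries arriving together each round
rate = events_per_second(35, 55, 5)     # the same load averaged over the interval
print(burst, rate)
```

Averaged over the interval this is 385 events per second, although in practice they arrive as a single burst of 1925 every 5 seconds.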

We note that the metrics aren’t submitted to OpenTSDB directly; instead, they are published to Apache Kafka. OpenTSDB maintains a connection to the Kafka server, saving metrics as they are published. The use of Kafka carries two advantages: (1) our data processing framework (Spark) doesn’t have to query OpenTSDB for real-time metrics, and (2) Kafka is very good at acting as a ‘buffer’ for sudden influxes of data. Since we publish all metrics at the same time (after a specified interval), thereby creating a sudden influx after every interval, this is certainly beneficial.

HDFS

The HDFS component is, in our opinion, the most probable location of a bottleneck in the system. The performance

of HDFS mainly depends on two items: the performance of the underlying storage system and the performance of

the connection between the HDFS nodes. In our experiments, the storage system should be decently fast (8x 10K

RPM disks), but the connection speed is limited to 1Gbps.

We performed two tests:

1. The first test was performed to confirm the bottleneck of the connection speed. We used the same setup as

the last networking test (2 virtual machines, each on different hypervisors). This time though, we enabled

the capturing of network traffic between the nodes.

We configured HDFS with a replication factor of 2. In other words, each byte of data will be stored on 2

different HDFS nodes, providing redundancy in case one node fails. This forces HDFS to synchronize the

traffic capture files across both nodes, using the 1Gbps link as the interconnect.

The results are as expected: network performance was severely degraded. The file copy speed was measured at 28.6 MB/s (0.2288 Gbps). This seriously impacts the performance of ChiefNet, of course. Using this setup, the total maximum speed of the virtual network would be limited to the measured speed (0.2288 Gbps). Fortunately, a solution is presented in the following test.

2. Following the degraded performance of the previous test, we came up with a very simple solution to our

problem. The fix is applied by simply setting the replication factor to 1, which allows HDFS to only use

the local storage of that hypervisor, eliminating the need to synchronize files and therefore not using the

interconnection link. The local storage system is heaps faster than 1Gbps, which should result in much better

traffic capture performance.

Indeed, when repeating the previous test with this setting, our results are much better. The file transfer speed was measured at 94.7 MB/s (0.7576 Gbps). This is still lower than the network test’s result, but is certainly an acceptable overhead.
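To put the two HDFS measurements side by side, the sketch below expresses each as a share of the capture-free inter-hypervisor throughput from the third network test (113.4 MB/s). The variable and function names are illustrative only.

```python
BASELINE_MB_S = 113.4  # inter-hypervisor copy without traffic capture (network test 3)

measurements_mb_s = {
    "replication factor 2": 28.6,
    "replication factor 1": 94.7,
}


def retained_percent(measured: float, baseline: float = BASELINE_MB_S) -> float:
    """Share of the capture-free throughput that remains, in percent."""
    return measured / baseline * 100


for label, speed in measurements_mb_s.items():
    print(f"{label}: {speed} MB/s, {retained_percent(speed):.1f} % of baseline")
```

With a replication factor of 2 only about a quarter of the baseline throughput remains, whereas a replication factor of 1 retains roughly 84 %.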

We note that our solution does have a disadvantage: when setting the replication factor to 1, HDFS provides no

redundancy. In other words, when a hypervisor node fails, the saved network traffic on that node will be lost.

Another solution, which would allow a replication factor greater than 1, would be to provide a faster connection

link between the HDFS nodes (for example a 10Gbps or even 100Gbps link).

5.2.4 Data Processing

The performance of our data processing system (Apache Spark) is hugely dependent on the implemented Spark

job. For example, a machine-learning based task will be much slower than a simple statistical-based task.

Nonetheless, we tested our proof-of-concept Spark task. This task simply reads the network traffic files from HDFS and counts the number of packets sent to each destination IP address.

We repeated the second HDFS experiment, capturing data at a rate of 0,75 Gbps. In our experiment, Spark had

no trouble keeping up with the ingress of data at this rate.
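The aggregation performed by the proof-of-concept task can be illustrated in a few lines. Note that this is a plain-Python stand-in and not the Spark job itself: the Packet type and its field names are assumptions made for the example, and the real task reads capture files from HDFS rather than an in-memory list.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    """Minimal packet metadata; the field names are illustrative."""
    src_ip: str
    dst_ip: str


def count_by_destination(packets: Iterable[Packet]) -> Counter:
    """Count how many packets were sent to each destination IP address."""
    return Counter(p.dst_ip for p in packets)


sample = [
    Packet("10.0.0.2", "10.0.0.1"),
    Packet("10.0.0.3", "10.0.0.1"),
    Packet("10.0.0.2", "10.0.0.4"),
]
counts = count_by_destination(sample)
print(counts.most_common())
```

In a distributed setting the same result would typically be obtained by mapping each packet to a (destination, 1) pair and summing per key.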

5.2.5 Guest Additions

Another limitation of ChiefNet lies in the use of a virtual serial port in the guest additions package. Since serial

ports are an old technology, their speeds aren’t comparable to modern day connections.

In ChiefNet, the virtual serial port operates at a baud rate of 9600 bits per second (bps) with 8 data bits and 1 stop bit. No parity bits are used. This (roughly) delivers a speed of 9600 bps × (8/9) ≈ 8533 bps = 8.533 kbps. To put this number into perspective, transferring a 1 MB file would take over 15 minutes.

It is therefore safe to say that this isn’t a great way to transfer large files. Instead, it is used to transfer small

files, often configuration files or metrics of some kind.

Higher performance can be achieved by increasing the baud rate of the virtual serial port, for example to 115200 bits/s. This would make transfers twelve times faster. Unfortunately, documentation on virtual serial ports in libvirt and KVM is scarce, and we haven’t found any way of setting the speed to something other than 9600 bits/s.
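The serial-port figures in this section can be reproduced with the short sketch below. The function names are our own, and 1 MB is taken as 10^6 bytes. As an additional assumption not made in the text, the second calculation also counts the start bit that conventional asynchronous serial framing adds to every byte, which lowers the estimate slightly further.

```python
def payload_bps(baud: int, data_bits: int = 8, frame_bits: int = 9) -> float:
    """Useful bits per second when each frame carries `data_bits` of payload."""
    return baud * data_bits / frame_bits


def transfer_minutes(size_bytes: int, bps: float) -> float:
    """Time needed to move `size_bytes` at `bps` payload bits per second."""
    return size_bytes * 8 / bps / 60


ONE_MB = 1_000_000

# Framing as counted in the text: 8 data bits + 1 stop bit.
text_rate = payload_bps(9600)
# Assumption: one start bit as well, giving 10 bits on the wire per byte.
with_start_bit = payload_bps(9600, frame_bits=10)

print(f"{text_rate:.0f} bps -> {transfer_minutes(ONE_MB, text_rate):.1f} min")
print(f"{with_start_bit:.0f} bps -> {transfer_minutes(ONE_MB, with_start_bit):.1f} min")
print(f"speed-up at 115200 baud: {115200 / 9600:.0f}x")
```

Either way, a 1 MB transfer takes more than 15 minutes at 9600 baud, and moving to 115200 baud would shorten it by a factor of twelve.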

Chapter 6

Conclusion

The goal of this thesis was to create a framework for security researchers to build upon and use in their research.

ChiefNet should provide researchers with everything they need to develop scalable threat detection and mitigation

systems based on Big Data processing techniques.

We started with a thorough look at the current state-of-the-art, looking into topics we’ll certainly need such

as network modeling, network attacks, intrusion detection systems, data collection and data processing. With this

knowledge, we listed 22 formal requirements of ChiefNet in Chapter 3. These requirements lay out all the necessary

core features of ChiefNet, which future developers and security researchers can build upon. Therefore, one of the

important non-formal requirements is extensibility: the possibility to allow future developers to add and modify

ChiefNet. Another important factor was isolation: the constructed network should be isolated from the outside

world, negating the possibility of malware spreading outside the experiment’s network.

We then implemented these design criteria in Chapter 4, which discusses aspects such as the technologies used, the architecture and a proof-of-concept detection technique. Some creative, atypical solutions were sometimes required, for example the use of a virtual serial port to provide a communication medium with the guest operating system. An important ingredient of ChiefNet is the user simulation software. We implemented three basic user simulation programs: sending and receiving emails, browsing the web and transferring files. Each of these simulations has different parameters, which are all described in Appendix D. Appendix D also includes more technical documentation, such as a description of the configuration files and example commands on how to run ChiefNet.

In Chapter 5, we showed that 21 of the 22 requirements were fulfilled. We can therefore conclude that the main goal is achieved. Chapter 5 also inspects the performance of the different components of ChiefNet. We found that many of the technologies used (Spark, HDFS, OpenTSDB, etc.) are very scalable, often even horizontally scalable. The core virtualization components (KVM, libvirt, Open vSwitch) were also found to be very capable, adding minimal overhead to the system. Some limitations were also presented, such as the limited write speed of HDFS and the overhead of capturing live network traffic. We offered (partial) solutions to these limitations and included the disadvantages of using these, but overall these performance limitations can often be mitigated by using more or faster hardware (e.g. SSDs instead of spinning hard drives).

Chapter 7

Future Work

As mentioned, ChiefNet isn’t meant to be finished by the end of this thesis. Instead, ChiefNet should be extensible, so that future developers and interested parties can contribute to its source code. This document provides documentation and my thoughts on some design decisions, but these aren’t meant to be interpreted as strict rules. We note that ChiefNet is still in an alpha stage, which means that bugs are likely to be present.

We will now cover some possible future extensions and improvements.

7.1 User Interface

To improve the general overview of the network, it might prove useful to provide the operator with the ability to define a scheduled action which checks whether a guest machine has been infected with the threat currently under investigation. To this end, a simple procedure which executes a command (e.g. “Is process virus.exe running?”) will be required. If the procedure returns a positive result, the guest could be marked as infected in the ScenarioBuilder network overview (e.g. by coloring the name in red).
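A minimal sketch of such a check is given below, assuming the guest's process list (the metric ChiefNet already collects) is available as a list of process names. The function name, the example guests and the indicator are hypothetical and not part of ChiefNet.

```python
from typing import Iterable


def is_infected(process_names: Iterable[str], indicator: str) -> bool:
    """Return True if a process matching `indicator` appears in the list.

    The comparison ignores case, since Windows process names are
    case-insensitive.
    """
    wanted = indicator.lower()
    return any(name.lower() == wanted for name in process_names)


# Hypothetical process lists, as a guest might report them.
guests = {
    "host0": ["explorer.exe", "svchost.exe", "VIRUS.EXE"],
    "host1": ["explorer.exe", "svchost.exe"],
}
infected = sorted(h for h, procs in guests.items() if is_infected(procs, "virus.exe"))
print(infected)
```

The resulting list of host names is what a user interface could use to highlight infected guests.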

7.2 Virtualization

Hiding the VM environment

Some malware is constructed to modify its behavior when it detects it’s running in a virtual environment. For example, the malware might withhold its malicious payload if it detects it’s running in a VM.

KVM has a number of options to hide this virtualization from the guest machine, including masking the NIC

drivers, hiding the CPU information and modifying the BIOS properties such as manufacturer, motherboard model,

etc.

Containers

In Chapter 4, we made the decision to use traditional VM-based virtualization. This provided advantages such as

the flexibility to use any x86 operating system. However, the drawback is of course scalability. A typical hypervisor

can handle tens (maybe hundreds) of guests.

When using containers however, one can expand to thousands of guests. The disadvantage is the limited

options for guest operating systems. If the choice of guest operating system isn’t important, and scalability is, it

might prove useful to use container-based virtualization.

Docker could be a great candidate, but this will require some major modifications around the ChiefNet source

code. In addition, Docker’s integration with Open vSwitch has not been researched yet.

jFed

The only requirement from Chapter 3 we couldn’t fulfill is the use of jFed in ChiefNet. A bonus feature of ChiefNet

was to interact with jFed and use the existing APIs for running the experiments on the Virtual Wall. However,

this would limit our choice of guest operating systems, as jFed requires manual configuration of operating system

templates.

If this limitation of OS choice is of no concern to the researcher, the use of jFed may prove useful for scalability

and integration with other existing tools. Therefore, it could be useful to modify ChiefNet to support jFed APIs.

Fault Tolerance

Currently, ChiefNet offers no redundancy in case a hypervisor node fails. When a node fails (e.g. due to power

outage or hardware failure), the VMs on that node will stop executing. The experiment may fail because of this.

KVM and libvirt don’t support fault tolerance out of the box, but some solutions exist that work atop KVM to provide fault tolerance (e.g. Kemari [99]).

7.3 Networking

As stated in section 4.2.2, the client is currently responsible for synchronizing the networking configuration across

all hypervisor hosts. This responsibility is hard to get right, because of the non-transactional nature of the oper-

ation. Suppose a new vSwitch is added, and the client starts configuring Open vSwitch and KVM on the different

hypervisors. If one hypervisor fails to apply this change (suppose the last one processed), all changes to the other

hypervisors have to be rolled back (in order for the configuration to stay synchronized).

In the current state of ChiefNet, this rollback is not supported. If the client encounters an error while configuring the virtual network, the program simply displays an error message and leaves the networking configuration in its non-synchronized state.
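The rollback described above could be approached as sketched below: every configuration step is paired with an action that reverts it, and a failure triggers the reverts in reverse order. This is a sketch under stated assumptions, not ChiefNet code: the helper names and the simulated hypervisors are hypothetical, and the approach is best-effort only, since an undo action can itself fail.

```python
from typing import Callable, List, Sequence, Tuple

# A step pairs an "apply" action with the "undo" action that reverts it.
Step = Tuple[Callable[[], None], Callable[[], None]]


def apply_all_or_nothing(steps: Sequence[Step]) -> bool:
    """Apply every step; if one fails, undo the completed ones in reverse order."""
    done: List[Callable[[], None]] = []
    for apply, undo in steps:
        try:
            apply()
        except Exception:
            for revert in reversed(done):
                revert()
            return False
        done.append(undo)
    return True


# Hypothetical demonstration: three hypervisors, the last one rejects the change.
state = {"host0": False, "host1": False, "host2": False}


def add_switch(host: str, fail: bool = False) -> Callable[[], None]:
    def action() -> None:
        if fail:
            raise RuntimeError(f"{host}: could not create vSwitch")
        state[host] = True
    return action


def remove_switch(host: str) -> Callable[[], None]:
    def action() -> None:
        state[host] = False
    return action


steps = [
    (add_switch("host0"), remove_switch("host0")),
    (add_switch("host1"), remove_switch("host1")),
    (add_switch("host2", fail=True), remove_switch("host2")),
]
ok = apply_all_or_nothing(steps)
print(ok, state)
```

After the failed run, no hypervisor holds the new switch, so the configuration stays synchronized.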

Other virtual networking platforms, such as VMware vNetwork or Cisco Nexus 1000V, solve this problem by means of a “virtual distributed switch”. Unfortunately, Open vSwitch does not have similar functionality.

7.4 Guest Additions

Parallel Port

The guest additions currently use a virtual serial port to bridge the gap between the hypervisor and the guest.

Another possibility, one which we only just discovered, is the use of a parallel port. The name refers to the way data is sent across the medium: parallel ports send multiple bits at once, in parallel, as opposed to serial interfaces that send bits one at a time.

As a result of this parallel sending of bits, parallel ports are much faster than serial ports: the old (hardware)

parallel ports have a data transfer speed of 150 000 bits/s, but the newer Enhanced Parallel Ports support speeds

of up to 2 MB/s.

Priority Queue

In section 4.2.3, we described the current procedure of submitting a request to communicate with a specific guest

machine. We mentioned that the current implementation uses a simple queue, on a first-in-first-out basis.

A possible extension here is to use a priority queue. A priority queue attributes a priority to each item in the

queue. Higher priorities get handled first, lower priorities can be pushed back to make room for higher priority

items.

This priority could be used to indicate the importance of the communiqué. For example, automated communi-

cations could have a lower priority than communiqués requested by an operator.
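The proposed priority queue could look roughly as follows. The class name, priority levels and request strings are hypothetical. The sketch relies on Python's heapq module and uses an arrival counter so that requests of equal priority are still served first-in-first-out, as in the current implementation.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class RequestQueue:
    """Priority queue that keeps first-in-first-out order within a priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._counter = itertools.count()  # tie-breaker preserving arrival order

    def submit(self, request: Any, priority: int) -> None:
        # heapq is a min-heap, so negate the priority: higher values come out first.
        heapq.heappush(self._heap, (-priority, next(self._counter), request))

    def next_request(self) -> Any:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


AUTOMATED, OPERATOR = 0, 10  # hypothetical priority levels

queue = RequestQueue()
queue.submit("collect process list (host0)", AUTOMATED)
queue.submit("collect process list (host1)", AUTOMATED)
queue.submit("run operator command (host1)", OPERATOR)

order = [queue.next_request() for _ in range(len(queue))]
print(order)
```

The operator's request is served first even though it was submitted last, while the two automated requests keep their original order.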

Online Update of the Guest Additions Package

When a developer wishes to add a GuestCommand, he/she is required to update the guest additions program on

all guests the operator wishes to use. This is a major hassle if we’re talking about tens or even hundreds of VM

templates.

A possible solution to this problem is the use of a second virtual serial port to serve as a simple ‘update channel’.

The second serial port is connected to a simple program, which only allows updating the primary guest additions

package. The update could, for example, be performed immediately after the guest’s first heartbeat, before the

configuration takes place.

IP redirection

We talked about INetSim’s DNS redirection feature, which provides transparent DNS redirection to the guest machines. This lets the guest machine think it’s connecting to the actual mail server (e.g. mail.example.org), while actually connecting to our INetSim server.

In reality, some malware may have the mail server’s IP address hard-coded in its source code. Since this malware does not use DNS lookups, it won’t connect to our INetSim instance.

A possible solution for this is the use of port-based IP redirection. This is a feature of the gateway router, which

reroutes packets destined for email ports (such as TCP ports 25, 587, 465 and 993) to the IP of our INetSim instance.

This feature is not natively supported in INetSim, but some open source routing distributions (e.g. VyOS [100]) support it.
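The decision such a gateway would make can be modelled as follows. This sketch only illustrates the rule itself: a real router would implement it as a destination NAT rule rather than in Python, and the INetSim address used here is a made-up placeholder.

```python
from typing import NamedTuple

MAIL_PORTS = frozenset({25, 465, 587, 993})  # the TCP ports listed in the text
INETSIM_IP = "10.0.0.254"                    # hypothetical address of the INetSim host


class Destination(NamedTuple):
    ip: str
    port: int


def redirect(dst: Destination) -> Destination:
    """Rewrite the destination IP of mail traffic; leave other traffic alone."""
    if dst.port in MAIL_PORTS:
        return Destination(INETSIM_IP, dst.port)
    return dst


print(redirect(Destination("203.0.113.7", 25)))
print(redirect(Destination("203.0.113.7", 80)))
```

Because the rule matches on the port rather than on a hostname, it also catches malware that connects to a hard-coded IP address.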

7.5 Data Processing

We implemented a simple proof-of-concept task to approximate an intrusion detection system. In reality, intrusion

detection techniques are much more complex.

We therefore propose future work on implementing different Spark tasks. For example, the techniques outlined in [19] could prove useful for detecting man-in-the-middle attacks (see section 2.2.2).

Another possibility is the detection of incoming distributed denial of service attacks (DDoS). We discussed DDoS

attacks in section 2.2.3. Possible countermeasures are defined in [22].

Appendices

Appendix A

Abstract (NL)

No more than a generation ago, ‘cyber security’ was a rarely heard term. Nowadays, not a week goes by without a news report about a ‘hack’ of an important computer system. Sometimes this is the result of chance, but more often it is a targeted, frequently state-sponsored, attack on national or international infrastructure.

It is no secret that more and more aspects of our daily lives are being automated. However, far less thought goes into how these new systems will affect our daily lives when they misbehave (through defect or deliberate intent). Nor is enough consideration given to how they will affect our quality of life when they become unavailable, even temporarily. For example, Distributed Denial of Service (DDoS) attacks are highly disruptive, both for the victim and for the users of the service. To research how to thwart these threats, security experts clearly need to be able to simulate these attacks in an isolated, closed-off environment.

In this thesis, we introduce ChiefNet.

ChiefNet is a framework designed for security experts. ChiefNet was built with scalability and ease of use in mind. It allows researchers to evaluate different security solutions quickly and easily by testing them against existing threats in a closed-off environment.

To obtain this isolated network, ChiefNet uses virtualization and virtual networks. However, we are still missing one crucial part of a network, namely its users. To mimic an authentic, real-world network, ChiefNet simulates these users. The simulated users imitate the behavior of real users: they send and receive emails, they browse the web, they open files, etc.

Appendix B

Non-Technical Synopsis

We understand the need for a short non-technical summary of this thesis, in order to convey the information and insights we gained to the general public. Therefore, the next few paragraphs lay out this thesis in layman’s terms.

In the security research community, some researchers came to the conclusion that they needed some way of pur-

posefully infecting computers with a virus in order to study the workings of said virus. However, due to the scale

and the malicious nature of these threats, they can’t just buy some physical computers and put them in a network.

Therefore, they needed some tool to help them set up an isolated network in order to research these viruses.

That’s what we did: we built a tool (called ChiefNet) which allows researchers to set up these isolated environments. ChiefNet also offers some useful tools and utilities to aid these researchers in their investigation. We

mentioned in the abstract that ChiefNet uses virtualization and virtual networks. Virtualization really boils down

to running multiple computers on a single physical box. This helps with the scalability of the task: real world

company networks often have hundreds or even thousands of computers. It wouldn’t be realistic if these security

researchers had to buy and set up all these machines in order to run their experiment, and then toss the machines

in the trash. To this end, ChiefNet uses ‘disposable’ machines, called Virtual Machines (VMs). The virtual networking

part is simply the fabric that connects all these virtual machines, so that they can talk to each other.

Additionally, a network has users. These users are at the core of the network: they use the network to perform

their tasks. Because many viruses today exploit this fact (think of a virus that injects itself in every Word

document you create), ChiefNet also allows the researcher to simulate users in the network.

Finally, we wish to stress that ChiefNet itself is not an anti-virus tool. It’s actually the opposite: it’s a pro-virus

tool, but developed with the aim to learn about these viruses and therefore be better at creating ‘antidotes’ for

them. Computer viruses are not that different from biological viruses: there is some point of infection, there are

symptoms, the virus tries to infect other entities in the vicinity of the infected entity, etc. To understand these biological viruses, many research laboratories have been built all over the world. These laboratories also purposefully

infect living creatures (think rats, birds, etc.) to study the virus’s behavior. ChiefNet is just like these laboratories: it

provides researchers with the means and tools to study viruses (the computer kind), without causing an outbreak.

Appendix C

Screenshots

This appendix contains a few screenshots to illustrate the design of the application.

Figure C.1: The ScenarioBuilder interface. Some icons are sourced from [101].

Figure C.2: The ScheduleManager interface

Figure C.3: The VirtualSwitchManager interface

Figure C.4: The configuration options for adding a new scheduled action

Figure C.5: The interface for creating or editing a Procedure

Appendix D

Configuration

D.1 XML Files

ChiefNet is primarily configured through configuration files, all of which are written in XML. We explain the structure of these files below.

D.1.1 Client XML Configuration File

The client XML configuration file defines the list of hypervisors to connect to. It also includes the URL of the SDN

controller. The example file can be found in Listing D.1.

Listing D.1: Example client XML configuration file

<config>
    <sdn-controller></sdn-controller>
    <hosts>
        <host>
            <address>host0.example.org</address>
            <port>16509</port>
            <username>libvirtadmin</username>
            <password>SuperSecurePassword</password>
            <interconnect-link>eth0</interconnect-link>
        </host>
        <host>
            <address>host1.example.org</address>
            <port>16509</port>
            <username>libvirtadmin</username>
            <password>SuperSecurePassword</password>
            <interconnect-link>eth0</interconnect-link>
        </host>
    </hosts>
</config>

• Each host has an address to which the client connects. This can be an IP address or a DNS domain name.

• The port element is the libvirt port to connect to (default is 16509).

• The username and password options are the authentication parameters for libvirt.

• Finally, the interconnect-link element specifies the interface that connects all hypervisors. This interface will be attached to the Open vSwitch bridge and is therefore used as the uplink of the virtual machines.
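
The structure described above can be read with any XML library. The following is a minimal Python sketch, not ChiefNet's own loader (ChiefNet is distributed as Java archives): the names `Hypervisor`, `parse_client_config` and `DEFAULT_LIBVIRT_PORT` are hypothetical, and falling back to port 16509 when the port element is omitted is an assumption based on the stated default.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Sample input following the structure of Listing D.1. The second host omits
# <port> to exercise the (assumed) fallback to the default libvirt port.
CLIENT_CONFIG = """<config>
  <sdn-controller></sdn-controller>
  <hosts>
    <host>
      <address>host0.example.org</address>
      <port>16509</port>
      <username>libvirtadmin</username>
      <password>SuperSecurePassword</password>
      <interconnect-link>eth0</interconnect-link>
    </host>
    <host>
      <address>host1.example.org</address>
      <username>libvirtadmin</username>
      <password>SuperSecurePassword</password>
      <interconnect-link>eth0</interconnect-link>
    </host>
  </hosts>
</config>"""

DEFAULT_LIBVIRT_PORT = 16509  # default named in the text above


@dataclass
class Hypervisor:
    address: str
    port: int
    username: str
    password: str
    interconnect_link: str


def parse_client_config(xml_text):
    """Return (sdn_controller_url, [Hypervisor, ...]) from a client config."""
    root = ET.fromstring(xml_text)
    # An empty <sdn-controller/> yields an empty string.
    sdn_controller = (root.findtext("sdn-controller") or "").strip()
    hypervisors = []
    for host in root.findall("./hosts/host"):
        hypervisors.append(Hypervisor(
            address=host.findtext("address", default="").strip(),
            port=int(host.findtext("port", default=str(DEFAULT_LIBVIRT_PORT))),
            username=host.findtext("username", default=""),
            password=host.findtext("password", default=""),
            interconnect_link=host.findtext("interconnect-link", default=""),
        ))
    return sdn_controller, hypervisors


sdn_controller, hypervisors = parse_client_config(CLIENT_CONFIG)
for hv in hypervisors:
    print(f"{hv.username}@{hv.address}:{hv.port} via {hv.interconnect_link}")
```

Keeping the hypervisor entries in a small data class makes it easy to validate the file before any connection is attempted.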

D.1.2 Scenario XML Definition

This section illustrates the XML file that is generated when saving a Scenario from the application. The XML file is human-readable and can therefore easily be changed manually or by a third-party application. The example file can be found in Listing D.2.

Listing D.2: Generated XML file when saving a Scenario

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<scenario>
  <hosts>
    <virtualCPUs>2</virtualCPUs>
    <memory>2048</memory>
    <hostname>host0</hostname>
    <networkInterfaces>
      <virtualSwitch>
        <name>vswitch0</name>
        <vlan>1</vlan>
      </virtualSwitch>
      <model>e1000</model>
      <macAddress>7c:e1:6e:68:c4:e0</macAddress>
      <interfaceNumber>0</interfaceNumber>
    </networkInterfaces>
    <template name="Windows 7"/>
  </hosts>
  <hosts>
    <virtualCPUs>2</virtualCPUs>
    <memory>2048</memory>
    <hostname>host1</hostname>
    <networkInterfaces>
      <virtualSwitch>
        <name>vswitch0</name>
        <vlan>1</vlan>
      </virtualSwitch>
      <model>e1000</model>
      <macAddress>ac:c4:f1:15:d8:6b</macAddress>
      <interfaceNumber>0</interfaceNumber>
    </networkInterfaces>
    <template name="Windows 7"/>
  </hosts>
  <scenarioId>0.31692483515090597</scenarioId>
  <scheduler>
    <scheduledProcedureTask>
      <triggerType>PERIODIC</triggerType>
      <intervalInSeconds>10</intervalInSeconds>
      <name>some scheduled task</name>
      <procedure/>
      <vm>
        <virtualCPUs>2</virtualCPUs>
        <memory>2048</memory>
        <hostname>host0</hostname>
        <networkInterfaces>
          <virtualSwitch>
            <name>vswitch0</name>
            <vlan>1</vlan>
          </virtualSwitch>
          <model>e1000</model>
          <macAddress>7c:e1:6e:68:c4:e0</macAddress>
          <interfaceNumber>0</interfaceNumber>
        </networkInterfaces>
        <template name="Windows 7"/>
      </vm>
    </scheduledProcedureTask>
  </scheduler>
</scenario>

• A Scenario has zero or more hosts elements. These are not to be confused with the host element from the client XML configuration file: a hosts element describes a virtual machine, not a hypervisor.

• The Scenario also contains a scheduler element, which in turn may contain a number of scheduledProcedureTasks.

• The hosts element contains some (self-explanatory) elements, such as virtualCPUs and memory (in

MB).

• A host can contain a number of networkInterfaces, each with its own properties, such as its macAddress, the connected virtualSwitch and the model.

• The Scenario also contains a scenarioId, which is simply a random number uniquely identifying this

Scenario.
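
Because the Scenario file is plain XML, a third-party tool can inspect it with a few lines of code. The sketch below is illustrative only: `parse_vm` and `parse_scenario` are hypothetical helper names, the sample input is a trimmed version of Listing D.2, and the fallback values used for missing elements are assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed sample following Listing D.2 (one VM, one scheduled task).
SCENARIO_XML = """<scenario>
  <hosts>
    <virtualCPUs>2</virtualCPUs>
    <memory>2048</memory>
    <hostname>host0</hostname>
    <networkInterfaces>
      <virtualSwitch><name>vswitch0</name><vlan>1</vlan></virtualSwitch>
      <model>e1000</model>
      <macAddress>7c:e1:6e:68:c4:e0</macAddress>
      <interfaceNumber>0</interfaceNumber>
    </networkInterfaces>
    <template name="Windows 7"/>
  </hosts>
  <scenarioId>0.31692483515090597</scenarioId>
  <scheduler>
    <scheduledProcedureTask>
      <triggerType>PERIODIC</triggerType>
      <intervalInSeconds>10</intervalInSeconds>
      <name>some scheduled task</name>
      <procedure/>
      <vm><hostname>host0</hostname></vm>
    </scheduledProcedureTask>
  </scheduler>
</scenario>"""


def parse_vm(element):
    """Summarise one <hosts> element (a virtual machine)."""
    nics = []
    for nic in element.findall("networkInterfaces"):
        nics.append({
            "switch": nic.findtext("virtualSwitch/name"),
            "vlan": int(nic.findtext("virtualSwitch/vlan", default="0")),
            "model": nic.findtext("model"),
            "mac": nic.findtext("macAddress"),
        })
    template = element.find("template")
    return {
        "hostname": element.findtext("hostname"),
        "cpus": int(element.findtext("virtualCPUs", default="0")),
        "memory_mb": int(element.findtext("memory", default="0")),
        "template": template.get("name") if template is not None else None,
        "nics": nics,
    }


def parse_scenario(xml_text):
    """Return the scenario id, its VMs and its scheduled tasks."""
    root = ET.fromstring(xml_text)
    tasks = []
    for task in root.findall("scheduler/scheduledProcedureTask"):
        interval = task.findtext("intervalInSeconds")
        tasks.append({
            "name": task.findtext("name"),
            "trigger": task.findtext("triggerType"),
            "interval_s": int(interval) if interval is not None else None,
            "vm": task.findtext("vm/hostname"),
        })
    return {
        "id": root.findtext("scenarioId"),
        "vms": [parse_vm(host) for host in root.findall("hosts")],
        "tasks": tasks,
    }


scenario = parse_scenario(SCENARIO_XML)
for vm in scenario["vms"]:
    print(vm["hostname"], vm["cpus"], "vCPU", vm["memory_mb"], "MB", vm["template"])
for task in scenario["tasks"]:
    print(task["name"], task["trigger"], task["interval_s"], "->", task["vm"])
```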

D.1.3 Procedure XML Definition

Procedures can be saved to XML files too. We’ll cover the XML definition below, along with an example in Listing

D.3.

Listing D.3: Generated XML file when saving a Procedure

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<procedure>
  <cliGuestCommand>
    <command>whoami</command>
    <timeout>0</timeout>
  </cliGuestCommand>
  <uploadFileGuestCommand>
    <srcFile>/opt/file.txt</srcFile>
    <destFile>/tmp/file.txt</destFile>
  </uploadFileGuestCommand>
  <downloadFileGuestCommand>
    <sourceFile>/var/log/apache2/error.log</sourceFile>
    <destinationFile>/tmp/apache_error.log</destinationFile>
  </downloadFileGuestCommand>
  <ansiblePlaybookGuestCommand>
    <srcFile>/opt/playbook.yaml</srcFile>
  </ansiblePlaybookGuestCommand>
  <uploadAndExecuteFileGuestCommand>
    <srcFile>/opt/sample.exe</srcFile>
  </uploadAndExecuteFileGuestCommand>
</procedure>

• The Procedure simply has a list of GuestCommands, which will be executed in the same order as their

definition in the XML file. Some examples of GuestCommands can be found in Listing D.3.

• The cliGuestCommand executes a command on the guest. The timeout element specifies the number of seconds the command may run, after which the command is forcibly killed and an exception is thrown. A value of 0 means that no timeout is applied.

• The uploadFileGuestCommand, uploadAndExecuteFileGuestCommand and downloadFileGuestCommand are pretty self-explanatory.

• The ansiblePlaybookGuestCommand uploads and executes an Ansible Playbook.
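
The ordering rule and the meaning of the timeout value can be sketched as follows. This is an illustration in Python rather than ChiefNet's Java implementation: `load_guest_commands` and `effective_timeout` are hypothetical names, and representing "no timeout" as `None` is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET

# Sample input following the structure of Listing D.3.
PROCEDURE_XML = """<procedure>
  <cliGuestCommand>
    <command>whoami</command>
    <timeout>0</timeout>
  </cliGuestCommand>
  <uploadFileGuestCommand>
    <srcFile>/opt/file.txt</srcFile>
    <destFile>/tmp/file.txt</destFile>
  </uploadFileGuestCommand>
  <downloadFileGuestCommand>
    <sourceFile>/var/log/apache2/error.log</sourceFile>
    <destinationFile>/tmp/apache_error.log</destinationFile>
  </downloadFileGuestCommand>
</procedure>"""


def load_guest_commands(xml_text):
    """Return the GuestCommands as (tag, fields) pairs in document order."""
    root = ET.fromstring(xml_text)
    # Iterating over an element yields its children in the order they appear
    # in the file, which is the order in which they are to be executed.
    return [(child.tag, {field.tag: field.text or "" for field in child})
            for child in root]


def effective_timeout(raw_value):
    """Map the <timeout> value to seconds; 0 means no timeout (None here)."""
    seconds = int(raw_value)
    return None if seconds == 0 else seconds


commands = load_guest_commands(PROCEDURE_XML)
for tag, fields in commands:
    if tag == "cliGuestCommand":
        print(tag, fields["command"], "timeout:", effective_timeout(fields["timeout"]))
    else:
        print(tag, fields)
```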

D.1.4 HostTemplate XML Definition

The HostTemplates, from which the guests are created, are defined on each hypervisor. The XML file contains

simple information needed to create a virtual machine from this template. An example can be found in Listing D.4.

Listing D.4: HostTemplate XML definition

<host>
  <name>Windows 7</name>
  <os>WINDOWS_7</os>
  <harddrives>
    <harddrive>/var/lib/libvirt/images/win7.qcow2</harddrive>
  </harddrives>
  <features>
    <guest-additions>true</guest-additions>
  </features>
  <defaults>
    <cpu>2</cpu>
    <memory>2048</memory>
    <nic-model>e1000</nic-model>
  </defaults>
</host>

• Each HostTemplate has a user-friendly name, which is shown in the Graphical User Interface.

• The os element specifies the operating system family of the template. Currently, the values LINUX, WINDOWS_XP, WINDOWS_7 and OTHER are supported. This is used to display a matching operating system icon in the ScenarioBuilder (see Figure C.1).

• The most notable part of the template is the harddrives element, which contains a list of locations for the virtual disk drives of this template. Each of these virtual disks is cloned when a virtual machine is instantiated.

• The features element contains a list of ‘features’, which this template supports. At this time, the only

feature is the guest-additions element, which represents the ability of the template to communicate

with the hypervisor through the Guest Additions (see section 4.2.4).

• The defaults element has a list of sensible defaults for this template, including the CPU count (cpu), the memory size (memory) and the model for virtual network interfaces (nic-model). These are all optional; if omitted, the defaults are 1 CPU, 512 MB of memory and the NIC model e1000.
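
The fallback behaviour of the defaults element can be illustrated with a short Python sketch. The function name `parse_host_template` is hypothetical, and treating a missing guest-additions element as "not supported" is an assumption; the fallback values of 1 CPU, 512 MB and e1000 are those stated above.

```python
import xml.etree.ElementTree as ET

# Full template following Listing D.4.
FULL_TEMPLATE = """<host>
  <name>Windows 7</name>
  <os>WINDOWS_7</os>
  <harddrives>
    <harddrive>/var/lib/libvirt/images/win7.qcow2</harddrive>
  </harddrives>
  <features>
    <guest-additions>true</guest-additions>
  </features>
  <defaults>
    <cpu>2</cpu>
    <memory>2048</memory>
    <nic-model>e1000</nic-model>
  </defaults>
</host>"""

# Hypothetical minimal template without features or defaults.
MINIMAL_TEMPLATE = """<host>
  <name>Generic Linux</name>
  <os>LINUX</os>
  <harddrives>
    <harddrive>/var/lib/libvirt/images/linux.qcow2</harddrive>
  </harddrives>
</host>"""


def parse_host_template(xml_text):
    """Read a HostTemplate and apply the documented fallback values."""
    root = ET.fromstring(xml_text)
    defaults = root.find("defaults")

    def default(tag, fallback):
        # Compare against None explicitly: an element without children is falsy.
        if defaults is None:
            return fallback
        return defaults.findtext(tag, default=fallback)

    guest_additions = root.findtext("features/guest-additions", default="false")
    return {
        "name": root.findtext("name"),
        "os": root.findtext("os"),
        "harddrives": [d.text for d in root.findall("harddrives/harddrive")],
        "guest_additions": guest_additions.strip().lower() == "true",
        "cpu": int(default("cpu", "1")),
        "memory_mb": int(default("memory", "512")),
        "nic_model": default("nic-model", "e1000"),
    }


full = parse_host_template(FULL_TEMPLATE)
minimal = parse_host_template(MINIMAL_TEMPLATE)
print(full)
print(minimal)
```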

D.2 Program Arguments

The configuration files above specify the most important options for ChiefNet. However, some configuration is also

done through the use of command line parameters (arguments).

This is primarily important for the user simulation packages, where uploading a configuration file may be a

hassle we wish to avoid.

D.2.1 Client Arguments

The client only has a single argument: the XML configuration file we covered above. An example command for running the client is shown in Listing D.5. A list of the arguments is shown in Listing D.6.

Listing D.5: Running the ChiefNet client

java -jar chiefnet-client.jar -c configuration.xml

Listing D.6: ChiefNet client arguments

usage: chiefnet

-c,--config-file <arg> Configuration file (config.xml)

D.2.2 Server Arguments

The server has a couple of arguments, all of which are shown in Listing D.8. Additionally, an example command for running the server is shown in Listing D.7.

Listing D.7: Running the ChiefNet server

java -jar chiefnet-server.jar -hdfs "hdfs://192.168.0.1:9000" -kafkaserver "192.168.0.2" -kafkazookeeperserver "192.168.0.2" -t "/var/lib/libvirt/templates/"

Listing D.8: ChiefNet server arguments

usage: chiefnet-server

 -hdfs,--hdfs-path <arg>  Path to the HDFS system where I'll store the collected PCAPs (eg. 'hdfs://192.168.0.1:9000'). Don't add the trailing slash!

 -kafkaserver,--kafka-server <arg>  Kafka endpoint to publish metrics to

 -kafkazookeeperserver,--kafka-zookeeper-server <arg>  The Zookeeper instance Kafka is running on

 -t,--template-directory <arg>  Template directory

D.2.3 User Simulation Arguments

The user simulation packages don’t use any XML configuration files, because we want these to be as portable as

possible. The three implemented simulations (web, email and file transfer) are each described below.

Note: the ‘interval’ of a simulation is not fixed. Each interval is drawn from a normal distribution, specified by its mean and standard deviation.
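
As an illustration of such normally distributed intervals, the following Python sketch draws waiting times from a mean and a standard deviation. The function name `next_interval` is hypothetical, and clamping negative samples to zero is an assumption of this sketch; the text does not state how the simulation packages treat negative draws.

```python
import random


def next_interval(mean, stddev, rng):
    """Draw the next waiting time in seconds from a normal distribution.

    A normal distribution can produce negative values, which make no sense
    as a waiting time, so they are clamped to 0 here (an assumption).
    """
    return max(0.0, rng.gauss(mean, stddev))


# Values taken from the web example: -int_mean 15 -int_stddev 5.
rng = random.Random(42)  # fixed seed so that runs are repeatable
samples = [next_interval(15, 5, rng) for _ in range(1000)]
print(f"average interval: {sum(samples) / len(samples):.2f} s")
print(f"shortest: {min(samples):.2f} s, longest: {max(samples):.2f} s")
```

With a mean of 15 and a standard deviation of 5, roughly two thirds of the intervals fall between 10 and 20 seconds, which gives the simulated user a less mechanical rhythm than a fixed delay.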

Web

The arguments of the web user simulation and an example command for running it are provided in Listing D.10 and Listing D.9, respectively.

Listing D.9: Running the ‘web’ user simulation

java -jar chiefnet-user-emulation-web.jar -server http://example.org/ -port 80 -int_stddev 5 -int_mean 15

Listing D.10: User Simulation (Web) arguments

usage: chiefnet-user-emulation-web

 -disable_cert_check,--disable-cert-check  Disable SSL certificate checking (default = checking enabled) (optional)

 -enable_ssl,--enable-ssl  Enable SSL (default = false) (remember to set port to 443 if required) (optional)

 -int_mean,--interval-mean <arg>  Browse to the next website every X seconds

 -int_stddev,--interval-stddev <arg>  Standard deviation in the interval (for randomness)

 -port,--port <arg>  HTTP port of the server to connect to (default = 80) (optional)

 -server,--server <arg>  HTTP address of the server to connect to

Email

The arguments of the email user simulation and an example command for running it are provided in Listing D.12 and Listing D.11, respectively.

Listing D.11: Running the ‘email’ user simulation

java -jar chiefnet-user-emulation-email.jar -pop3_server mail.example.org -smtp_server mail.example.org -username john -password "SuperSecurePassword" -mbox sample.mbox -int_mean 30 -int_stddev 15

Listing D.12: User Simulation (Email) arguments

usage: chiefnet-user-emulation-email

 -int_mean,--interval-mean <arg>  Execute tasks every X seconds (both sending mail as checking for received mail)

 -int_stddev,--interval-stddev <arg>  Standard deviation in the interval (for randomness)

 -mbox,--mbox <arg>  .mbox file. Reads emails from this mbox and re-sends them.

 -open_attachments,--open-attachments  Should I open attachments in received emails? (optional)

 -password,--password <arg>  SMTP/POP3 password

 -pop3_port,--pop3-port <arg>  The port of the POP3 server (default = 110) (optional)

 -pop3_server,--pop3-server <arg>  The POP3 server where I should check for new mail.

 -smtp_port,--smtp-port <arg>  SMTP server port (default = 25) (optional)

 -smtp_server,--smtp-server <arg>  SMTP server address

 -username,--username <arg>  SMTP/POP3 username

File Transfer

The arguments of the file transfer user simulation and an example command for running it are provided in Listing D.14 and Listing D.13, respectively.

Listing D.13: Running the ‘file transfer’ user simulation

java -jar chiefnet-user-emulation-smb.jar -server 192.168.0.1 -int_mean 60 -int_stddev 30 -execute_chance 5 -download_chance 90 -upload_chance 15 -upload_file document.docx -upload_file picture.jpg -upload_file sound.mp3

Listing D.14: User Simulation (File Transfer) arguments

usage: chiefnet-user-emulation-smb

 -domain,--domain <arg>  Domain. If not set, we'll use the guest account. (optional)

 -download_chance,--download-chance <arg>  Chance for a file to be downloaded (0-100%)

 -execute_chance,--execute-chance <arg>  When encountering an executable file, this is the chance that file is executed (0-100%)

 -int_mean,--interval-mean <arg>  Process files/directories every X seconds

 -int_stddev,--interval-stddev <arg>  Standard deviation in the interval (for randomness)

 -password,--password <arg>  Password. If not set, we'll use the guest account. (optional)

 -port,--port <arg>  SMB port to connect to (default = 445) (optional)

 -server,--server <arg>  IP address of the SMB server to connect to

 -upload_chance,--upload-chance <arg>  Chance for uploading a file to a directory (0-100 %)

 -upload_file,--upload-file <arg>  Path to a file to upload (this argument can be passed multiple times)

 -username,--username <arg>  Username. If not set, we'll use the guest account. (optional)
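
The three percentage arguments can be read as independent probabilities (in Listing D.13 they do not sum to 100). The Python sketch below shows one possible way to turn them into decisions; it is not the logic of the actual simulation package. The names `roll` and `plan_actions` and the shape of the `entries` list are hypothetical.

```python
import random


def roll(chance_percent, rng):
    """Return True with the given probability, expressed as 0-100."""
    return rng.random() * 100.0 < chance_percent


def plan_actions(entries, download_chance, upload_chance, execute_chance,
                 upload_files, rng):
    """Decide what a simulated user does with each entry on the share.

    `entries` is a list of (name, kind) pairs with kind being "file",
    "executable" or "directory"; this representation is hypothetical.
    """
    actions = []
    for name, kind in entries:
        if kind == "directory":
            # Possibly upload one of the configured files into the directory.
            if upload_files and roll(upload_chance, rng):
                actions.append(("upload", rng.choice(upload_files), name))
        else:
            if roll(download_chance, rng):
                actions.append(("download", name))
            # Only executables are candidates for execution.
            if kind == "executable" and roll(execute_chance, rng):
                actions.append(("execute", name))
    return actions


ENTRIES = [("share", "directory"), ("report.txt", "file"), ("setup.exe", "executable")]
UPLOAD_FILES = ["document.docx"]

# Values from Listing D.13: download 90 %, upload 15 %, execute 5 %.
for action in plan_actions(ENTRIES, 90, 15, 5, UPLOAD_FILES, random.Random(7)):
    print(action)

# The extremes make the behaviour easy to check: 100 always acts, 0 never does.
always = plan_actions(ENTRIES, 100, 100, 100, UPLOAD_FILES, random.Random(0))
never = plan_actions(ENTRIES, 0, 0, 0, UPLOAD_FILES, random.Random(0))
```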

D.3 Ansible Playbooks

The provided Ansible playbooks are too large to cover here, but they should be self-explanatory.

Bibliography

[1] Software-defined networking - Wikipedia. https://en.wikipedia.org/wiki/Software-defined_networking. (Accessed on 04/25/2018).

[2] Jaehoon Jeong et al. “A framework for security services based on software-defined networking”. In: Advanced Information Networking and Applications Workshops (WAINA), 2015 IEEE 29th International Conference on. IEEE. 2015, pp. 150–153.

[3] Justin Pettit et al. Virtual switching in an era of advanced edges. 2010.

[4] Open vSwitch. https://www.openvswitch.org/. (Accessed on 04/25/2018).

[5] Prasad Gorja and Rakesh Kurapati. “Extending open vSwitch to L4-L7 service aware OpenFlow switch”. In: Advance Computing Conference (IACC), 2014 IEEE International. IEEE. 2014, pp. 343–347.

[6] Goran Cetusic. Building a network emulator with Docker and Open vSwitch. https://www.slideshare.net/GoranCetusic/building-a-network-emulator-with-docker-and-open-vswitch. (Accessed on 10/21/2017). 2015.

[7] Karamjeet Kaur, Japinder Singh, and Navtej Singh Ghumman. “Mininet as software defined networking testing platform”. In: International Conference on Communication, Computing & Systems (ICCCS). 2014, pp. 139–42.

[8] Vitaly Antonenko and Ruslan Smelyanskiy. “Global network modelling based on mininet approach.” In: Proceedings of the second ACM SIGCOMM workshop on Hot topics in software defined networking. ACM. 2013, pp. 145–146.

[9] Hypervisor - Wikipedia. https://en.wikipedia.org/wiki/Hypervisor. (Accessed on 04/25/2018).

[10] GitHub: mininet. https://github.com/mininet/mininet/blob/master/README.md. (Accessed on 04/25/2018).

[11] Scalability - Wikipedia. https://en.wikipedia.org/wiki/Scalability. (Accessed on 04/25/2018).

[12] Philip Wette et al. “Maxinet: Distributed emulation of software-defined networks”. In: Networking Conference, 2014 IFIP. IEEE. 2014, pp. 1–9.

[13] Ramon R Fontes et al. “Mininet-WiFi: Emulating software-defined wireless networks”. In: Network and Service Management (CNSM), 2015 11th International Conference on. IEEE. 2015, pp. 384–389.

[14] Network Attacks. http://www.tech-faq.com/network-attacks.html. (Accessed on 04/28/2018).

[15] Shahid Anwar et al. “From Intrusion Detection to an Intrusion Response System: Fundamentals, Requirements, and Future Directions”. In: Algorithms 10.2 (2017), p. 39.

[16] Nazrul Hoque et al. “Network attacks: Taxonomy, tools and systems”. In: Journal of Network and Computer Applications 40 (2014), pp. 307–324.

[17] Wikipedia contributors. Man-in-the-middle attack - Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Man-in-the-middle_attack&oldid=837490400. [Online; accessed 28-April-2018]. 2018.

[18] OSI model - Wikipedia. https://en.wikipedia.org/wiki/OSI_model. (Accessed on 04/28/2018).

[19] Zouheir Trabelsi and Khaled Shuaib. “NIS04-4: Man in the middle intrusion detection”. In: Global Telecommunications Conference, 2006. GLOBECOM’06. IEEE. IEEE. 2006, pp. 1–6.

[20] arpwatch - Wikipedia. https://en.wikipedia.org/wiki/Arpwatch. (Accessed on 04/28/2018).

[21] Snort - Network Intrusion Detection & Prevention System. https://www.snort.org/. (Accessed on 04/28/2018).

[22] Stephen M Specht and Ruby B Lee. “Distributed Denial of Service: Taxonomies of Attacks, Tools, and Countermeasures.” In: ISCA PDCS. 2004, pp. 543–550.

[23] Arbor Networks. 2017 DDoS Attack Activity. https://www.arbornetworks.com/blog/insight/2017-ddos-attack-activity/. (Accessed on 04/28/2018).

[24] Digital Attack Map. http://www.digitalattackmap.com/. (Accessed on 04/28/2018).

[25] Kaspersky Labs. DDoS attacks in Q4 2017. https://securelist.com/ddos-attacks-in-q4-2017/83729/. (Accessed on 04/28/2018).

[26] Pedro Garcia-Teodoro et al. “Anomaly-based network intrusion detection: Techniques, systems and challenges”. In: computers & security 28.1 (2009), pp. 18–28.

[27] Christopher Gerg and Kerry J Cox. “Managing security with Snort and IDS tools”. In: (2004).

[28] Wikipedia contributors. Machine learning - Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Machine_learning. [Online; accessed 29-April-2018]. 2018.

[29] Yasir Hamid, M Sugumaran, and Ludovic Journaux. “Machine Learning Techniques for Intrusion Detection: A Comparative Analysis”. In: Proceedings of the International Conference on Informatics and Analytics. ACM. 2016, p. 53.

[30] Christopher Kruegel et al. “Bayesian event classification for intrusion detection”. In: Computer Security Applications Conference, 2003. Proceedings. 19th Annual. IEEE. 2003, pp. 14–23.

[31] Dit-Yan Yeung and Yuxin Ding. “Host-based intrusion detection using dynamic and static behavioral models”. In: Pattern recognition 36.1 (2003), pp. 229–243.

[32] Matthew V Mahoney and Philip K Chan. “Learning nonstationary models of normal network traffic for detecting novel attacks”. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM. 2002, pp. 376–385.

[33] Luca Boero et al. “Statistical fingerprint-based intrusion detection system (SF-IDS)”. In: International Journal of Communication Systems 30.10 (2017).

[34] Susan M Bridges, Rayford B Vaughn, et al. “Fuzzy data mining and genetic algorithms applied to intrusion detection”. In: Proceedings of 12th Annual Canadian Information Technology Security Symposium. 2000, pp. 109–122.

[35] Neural Network. https://www.investopedia.com/terms/n/neuralnetwork.asp. (Accessed on 04/29/2018).

[36] Zheng Zhang et al. “HIDE: a hierarchical network intrusion detection system using statistical preprocessing and neural network classification”. In: Proc. IEEE Workshop on Information Assurance and Security. 2001, pp. 85–90.

[37] Joseph Migga Kizza. “Understanding Computer Network Security”. In: A Guide to Computer Network Security (2009), pp. 43–59.

[38] Comparison of distributed file systems - Wikipedia. https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems. (Accessed on 05/02/2018).

[39] Wikipedia contributors. Ceph (software) - Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Ceph_(software)&oldid=838357071. [Online; accessed 2-May-2018]. 2018.

[40] Wikipedia contributors. Gluster - Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Gluster&oldid=838628301. [Online; accessed 2-May-2018]. 2018.

[41] Wikipedia contributors. Apache Hadoop - Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Apache_Hadoop&oldid=834490530. [Online; accessed 2-May-2018]. 2018.

[42] Xiaofeng Zhou et al. “Exploring Netflow data using hadoop”. In: Proceedings of the Second ASE International Conference on Big Data Science and Computing. 2014.

[43] Miguel Zenon Nicanor L Saavedra and William Emmanuel S Yu. “A Comparison between Text, Parquet, and PCAP Formats for Use in Distributed Network Flow Analysis on Hadoop”. In: analysis 4 (), p. 7.

[44] Apache Parquet. http://parquet.apache.org/documentation/latest/. (Accessed on 05/02/2018).

[45] Maarten Wullink et al. “ENTRADA: A high-performance network traffic data streaming warehouse”. In: Network Operations and Management Symposium (NOMS), 2016 IEEE/IFIP. IEEE. 2016, pp. 913–918.

[46] Apache Hive TM. https://hive.apache.org/. (Accessed on 05/02/2018).

[47] Wikipedia contributors. Time series database - Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Time_series_database&oldid=838220098. [Online; accessed 2-May-2018]. 2018.

[48] Wikipedia contributors. InfluxDB - Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=InfluxDB&oldid=838245678. [Online; accessed 2-May-2018]. 2018.

[49] Wikipedia contributors. Graphite (software) - Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Graphite_(software)&oldid=831443947. [Online; accessed 2-May-2018]. 2018.

[50] Apache HBase – Apache HBase Home. https://hbase.apache.org/. (Accessed on 05/02/2018).

[51] GitHub - OpenTSDB/opentsdb: A scalable, distributed Time Series Database. https://github.com/OpenTSDB/opentsdb. (Accessed on 05/02/2018).

[52] Apache Projects List. https://projects.apache.org/projects.html?category. (Accessed on 05/03/2018).

[53] DigitalOcean. Hadoop, Storm, Samza, Spark, and Flink: Big Data Frameworks Compared. https://www.digitalocean.com/community/tutorials/hadoop-storm-samza-spark-and-flink-big-data-frameworks-compared. (Accessed on 10/21/2017). 2016.

[54] SnappyDataInc/snappydata. Stream Processing using SQL. https://github.com/SnappyDataInc/snappydata/blob/master/docs/programming_guide/stream_processing_using_sql.md. (Accessed on 05/03/2018).

[55] Apache Kafka. https://kafka.apache.org/. (Accessed on 05/03/2018).

[56] Real-Time Streaming Analytics for Big Data: A Survey and Decision. http://www.wingerath.org/rt/index2.html. (Accessed on 05/03/2018).

[57] Apache Flink: Scalable Stream and Batch Data Processing. https://flink.apache.org/. (Accessed on 05/03/2018).

[58] Muhammad Hussain Iqbal and Tariq Rahim Soomro. “Big data analysis: Apache storm perspective”. In: International journal of computer trends and technology (2015), pp. 9–14.

[59] Abdul Ghaffar Shoro and Tariq Rahim Soomro. “Big data analysis: Apache spark perspective”. In: Global Journal of Computer Science and Technology 15.1 (2015).

[60] Apache Spark - Unified Analytics Engine for Big Data. https://spark.apache.org/. (Accessed on 05/03/2018).

[61] Spark Streaming - Spark 2.3.0 Documentation. https://spark.apache.org/docs/latest/streaming-programming-guide.html. (Accessed on 05/03/2018).

[62] MLlib: Main Guide - Spark 2.3.0 Documentation. https://spark.apache.org/docs/latest/ml-guide.html. (Accessed on 05/03/2018).

[63] GraphX - Spark 2.3.0 Documentation. https://spark.apache.org/docs/latest/graphx-programming-guide.html. (Accessed on 05/03/2018).

[64] Matei Zaharia et al. “Apache Spark: A Unified Engine for Big Data Processing”. In: Commun. ACM 59.11 (Oct. 2016), pp. 56–65. ISSN: 0001-0782. DOI: 10.1145/2934664. URL: http://doi.acm.org/10.1145/2934664.

[65] Abdeltawab M Hendawi et al. “Hobbits: Hadoop and Hive based Internet traffic analysis”. In: Big Data (Big Data), 2016 IEEE International Conference on. IEEE. 2016, pp. 2590–2599.

[66] Yeonhee Lee and Youngseok Lee. “Toward scalable internet traffic measurement and analysis with hadoop”. In: ACM SIGCOMM Computer Communication Review 43.1 (2013), pp. 5–13.

[67] Muhammad Asif Manzoor and Yasser Morgan. “Network intrusion detection system using apache storm”. In: Probe 4107 (2017), p. 4166.

[68] Manish Kulariya et al. “Performance analysis of network intrusion detection schemes using Apache Spark”. In: Communication and Signal Processing (ICCSP), 2016 International Conference on. IEEE. 2016, pp. 1973–1977.

[69] Govind P Gupta and Manish Kulariya. “A framework for fast and efficient cyber security network intrusion detection using apache spark”. In: Procedia Computer Science 93 (2016), pp. 824–831.

[70] imec. jFed. https://jfed.ilabt.imec.be/. (Accessed on 05/06/2018).

[71] imec. jFed: Features. https://jfed.ilabt.imec.be/features/. (Accessed on 05/06/2018).

[72] iLab-t. Virtual Wall iLab-t testbeds 1.0.0 documentation. http://doc.ilabt.iminds.be/ilabt-documentation/virtualwallfacility.html. (Accessed on 05/06/2018).

[73] Ansible is Simple IT Automation. https://www.ansible.com/. (Accessed on 05/06/2018).

[74] Puppet. Puppet. https://puppet.com/. (Accessed on 05/06/2018).

[75] Chef. Chef - Automate IT Infrastructure. https://www.chef.io/chef/. (Accessed on 05/06/2018).

[76] Microsoft Docs. Windows PowerShell Desired State Configuration Overview. https://docs.microsoft.com/en-us/powershell/dsc/overview. (Accessed on 05/20/2018).

[77] Wikipedia contributors. Operating-system-level virtualization - Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Operating-system-level_virtualization&oldid=840250771. [Online; accessed 9-May-2018]. 2018.

[78] Docker - Build, Ship, and Run Any App, Anywhere. https://www.docker.com/. (Accessed on 05/09/2018).

[79] Docker. Docker For Windows. https://www.docker.com/docker-windows. (Accessed on 05/09/2018).

[80] Microsoft Docs. Hyper-V Technology Overview. https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/hyper-v-technology-overview. (Accessed on 05/09/2018).

[81] VMware. VMware vSphere. https://www.vmware.com/be/products/vsphere.html. (Accessed on 05/09/2018).

[82] Citrix. XenServer Open Source Server Virtualization. https://xenserver.org/. (Accessed on 05/09/2018).

[83] KVM. https://www.linux-kvm.org/page/Main_Page. (Accessed on 05/09/2018).

[84] Wikipedia contributors. Libvirt - Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Libvirt&oldid=823942538. [Online; accessed 9-May-2018]. 2018.

[85] Wikipedia contributors. Virtual LAN - Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Virtual_LAN&oldid=838035739. [Online; accessed 9-May-2018]. 2018.

[86] Thomas Hungenberg and Matthias Eckert. INetSim: Internet Services Simulation Suite. http://www.inetsim.org/index.html. (Accessed on 05/09/2018).

[87] Samba - opening windows to a wider world. https://www.samba.org/. (Accessed on 05/09/2018).

[88] Grafana - The open platform for analytics and monitoring. https://grafana.com/. (Accessed on 05/09/2018).

[89] Wikipedia contributors. Java remote method invocation - Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Java_remote_method_invocation&oldid=829082654. [Online; accessed 10-May-2018]. 2018.

[90] libvirt: Storage Management. https://libvirt.org/storage.html. (Accessed on 05/11/2018).

[91] Technoscoop. Copy on write. https://technoscooop.wordpress.com/tag/copy-on-write/. (Accessed on 05/11/2018).

[92] Quartz Enterprise Job Scheduler. http://www.quartz-scheduler.org/. (Accessed on 05/11/2018).

[93] Wikipedia contributors. Serial port - Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Serial_port&oldid=838649939. [Online; accessed 11-May-2018]. 2018.

[94] libvirt. Domain XML format. https://libvirt.org/formatdomain.html. (Accessed on 05/11/2018).

[95] The Apache Software Foundation. Commons Email. https://commons.apache.org/proper/commons-email/. (Accessed on 05/11/2018).

[96] The JCIFS Project. JCIFS. https://jcifs.samba.org/. (Accessed on 05/11/2018).

[97] RIPE-NCC. hadoop-pcap: Hadoop library to read packet capture (PCAP) files. https://github.com/RIPE-NCC/hadoop-pcap. (Accessed on 05/11/2018).

[98] Level1Techs Forums - GrayBoltWolf. KVM Host vs Virtual Machine Performance. https://forum.level1techs.com/t/how-fast-is-kvm-host-vs-virtual-machine-performance/110192. (Accessed on 05/13/2018).

[99] QEMU. Features/FaultTolerance. https://wiki.qemu.org/Features/FaultTolerance. (Accessed on 05/14/2018).

[100] VyOS. VyOS Homepage. https://vyos.io/. (Accessed on 05/14/2018).

[101] rsmudge. armitage: Automatically exported from code.google.com/p/armitage. https://github.com/rsmudge/armitage. (Accessed on 05/18/2018).
