On the Validation of Traffic Classification Algorithms

Géza Szabó, Dániel Orincsay, Szabolcs Malomsoky, István Szabó

Traffic Lab, Ericsson Research Hungary


On the Validation of Traffic Classification Algorithms, 2008-04-29

Aim & Contents

Aim:

– Introduce our novel validation method, which makes it possible to measure the accuracy of traffic classification methods

Contents:

– Requirements – How should validation be done?

– Related work – How is it currently done?

– Our proposal – What have we proposed?

– Working mechanism – How does our proposal work?

– Validation of a state-of-the-art traffic classification method – What have we learnt from the validation?

– Future work – What else can be done with the proposed method?


Requirements – How should validation be done?

Objective of traffic classification:

– Identify applications in passively observed traffic

Validation of classification method by active test


Related work – How is it currently done?

• Header traces → port based method
• Impossible to validate by others
• Impossible to repeat with same conditions
• Non-realistic environment
• Dynamically allocated ports
• Proprietary protocols
• Encryption
• Be up2date
• Lot of flows
• Simultaneous applications
• Previously well-classified traces
• Just hint

S. Sen and J. Wang: Analyzing Peer-to-peer Traffic Across Large Networks

T. Karagiannis, K. Papagiannaki and M. Faloutsos: BLINC: Multilevel Traffic Classification in the Dark

J. Erman, M. Arlitt and A. Mahanti: Traffic Classification Using Clustering Algorithms

L. Bernaille et al.: Traffic Classification On The Fly

CURRENTLY

• Weak and ad hoc validation
• No reliable and widely accepted validation technique
• No reference packet trace with well-defined content is available


OUR PROPOSAL


The proposed method for validation

Principle:

– Packets are collected into flows at the traffic generating terminal

– Flows are marked with the identifier of the application that generated the packets of the flow

The main requirements on the realization of the method:

– It should not deteriorate the performance of the terminal

– The byte overhead of marking should be negligible

The preferred realization is a driver that can be easily installed on terminals

[Figure: The position of the proposed driver within the terminal. User mode: IExplorer, Outlook, Skype. Kernel mode: TCP/UDP, IP, NDIS with the NDIS hook driver (the packet marking driver), network drivers. Also shown: the Internet and the measurement point.]

Network connections – Process ID association:

Protocol  Local Address:Port  Foreign Address:Port  State        Process ID
TCP       192.168.0.1:2154    82.99.36.186:80       Established  5126
TCP       192.168.0.1:2189    86.101.125.82:110     Established  1932
UDP       0.0.0.0:2196        212.19.63.112:9612    Established  2056
...

Process ID – Application name association:

Process ID  Application
5126        IExplorer.exe
1932        Outlook.exe
2056        Skype.exe
...
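The two associations shown on this slide can be chained to map a connection to the application that owns it. The following is a minimal Python sketch of that lookup, using only the example rows from the slide. The names `Connection`, `CONNECTIONS`, `PROCESSES` and `application_for` are hypothetical and are not part of the authors' driver, which runs in kernel mode and obtains this information from the operating system.

```python
from typing import NamedTuple, Optional


class Connection(NamedTuple):
    """One row of the connection-to-process table (illustrative type)."""
    protocol: str
    local: str    # "address:port"
    foreign: str  # "address:port"
    pid: int


# Example rows copied from the slide.
CONNECTIONS = [
    Connection("TCP", "192.168.0.1:2154", "82.99.36.186:80", 5126),
    Connection("TCP", "192.168.0.1:2189", "86.101.125.82:110", 1932),
    Connection("UDP", "0.0.0.0:2196", "212.19.63.112:9612", 2056),
]

# Process ID to application name, also from the slide.
PROCESSES = {5126: "IExplorer.exe", 1932: "Outlook.exe", 2056: "Skype.exe"}


def application_for(protocol: str, local: str, foreign: str) -> Optional[str]:
    """Chain both tables: connection -> process ID -> application name."""
    for conn in CONNECTIONS:
        if (conn.protocol, conn.local, conn.foreign) == (protocol, local, foreign):
            return PROCESSES.get(conn.pid)
    return None  # unknown connection


if __name__ == "__main__":
    print(application_for("TCP", "192.168.0.1:2154", "82.99.36.186:80"))
```

A linear scan is enough for an illustration; a real implementation would more likely index the connections by their identifying fields.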


Working mechanism

1. The packet is examined to determine whether it is an incoming or an outgoing packet
2. In case of an outgoing packet, the size of the packet is examined. Processing continues only with packets that are smaller than the MTU minus the size of the marking
3. The process continues only with TCP or UDP packets
4. Based on the five-tuple identifier of the packet, it is checked whether information is already available about which application the flow belongs to
5. If not, the operating system is queried
6. It is decided whether the packet needs marking. Marking options: "Randomly", "Only first", "Leave the first", "No mark"

[Figure: The working mechanism of the introduced driver. Flowchart of a packet passing through the interface, with the decision nodes "Outgoing packet?", "Proper size?", "Protocol" (TCP/UDP or other), "Exist info?" and "Need to mark?", and the actions "Get info", "Mark" and "Send".]
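The decision steps of the driver can be sketched as a small Python class. This is an illustration under stated assumptions, not the authors' driver code: the names `Packet`, `Marker`, `lookup_app` and the policy strings are hypothetical, the "Only first" and "Leave the first" options are interpreted here as referring to the first packet of a flow, and the 0.5 probability for the "Randomly" option is an arbitrary choice.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# (protocol, local address, local port, foreign address, foreign port)
FiveTuple = Tuple[str, str, int, str, int]
MARK_SIZE = 4  # bytes added by the marking


@dataclass
class Packet:
    outgoing: bool
    size: int          # packet size in bytes
    protocol: str      # "TCP", "UDP", or anything else
    five_tuple: FiveTuple


class Marker:
    def __init__(self, mtu: int, lookup_app: Callable[[FiveTuple], str],
                 policy: str = "only_first"):
        self.mtu = mtu
        self.lookup_app = lookup_app  # stands in for the operating system query
        self.policy = policy
        self.flow_app: Dict[FiveTuple, str] = {}
        self.flow_packets: Dict[FiveTuple, int] = {}

    def process(self, pkt: Packet) -> Optional[str]:
        """Return the application label to mark with, or None to send unmarked."""
        if not pkt.outgoing:                      # step 1
            return None
        if pkt.size >= self.mtu - MARK_SIZE:      # step 2: no room for the marking
            return None
        if pkt.protocol not in ("TCP", "UDP"):    # step 3
            return None
        if pkt.five_tuple not in self.flow_app:   # steps 4 and 5
            self.flow_app[pkt.five_tuple] = self.lookup_app(pkt.five_tuple)
        seen = self.flow_packets.get(pkt.five_tuple, 0)
        self.flow_packets[pkt.five_tuple] = seen + 1
        if self.policy == "only_first":           # step 6
            need = seen == 0
        elif self.policy == "leave_first":
            need = seen > 0
        elif self.policy == "randomly":
            need = random.random() < 0.5          # assumed probability
        else:                                     # "no_mark"
            need = False
        return self.flow_app[pkt.five_tuple] if need else None
```

Caching the application per five-tuple means the operating system is queried at most once per flow in this sketch, which is in line with the requirement that the terminal's performance should not deteriorate.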



Place of marking

The original IP packet is extended with one option field:

– The Router Alert option field is used, which is transparent both for the routers on the path and for the receiver host (according to RFC 2113 [3])

– The first two characters of the corresponding executable file name are added

– The size of the packet is increased by 4 bytes

– The packet size field in the IP header is also increased by 4 bytes

– The header checksum is recalculated

[Figure: A marked packet of the BitTorrent protocol]
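The marking steps listed above can be illustrated on a raw IPv4 packet held in a byte string. The sketch below is a user-space Python illustration, not the authors' kernel driver. The function names `ipv4_checksum` and `mark_packet` are hypothetical. The option type 0x94 and length 4 follow RFC 2113. Incrementing the header length (IHL) field is not mentioned on the slide, but it is needed for the header to stay consistent once 4 option bytes are appended.

```python
import struct

ROUTER_ALERT_TYPE = 0x94  # Router Alert option type (148) from RFC 2113
ROUTER_ALERT_LEN = 4      # type byte + length byte + 2-byte value


def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over the 16-bit words of an IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mark_packet(packet: bytes, exe_name: str) -> bytes:
    """Append a Router Alert option carrying the first two characters of exe_name."""
    ihl = packet[0] & 0x0F                 # header length in 32-bit words
    if ihl >= 15:
        raise ValueError("no room left for another IP option")
    tag = exe_name[:2].encode("ascii")
    if len(tag) != 2:
        raise ValueError("executable name needs at least two characters")
    header = bytearray(packet[:ihl * 4])
    payload = packet[ihl * 4:]
    header += bytes([ROUTER_ALERT_TYPE, ROUTER_ALERT_LEN]) + tag
    header[0] = (header[0] & 0xF0) | (ihl + 1)           # one more 32-bit word
    total_len = struct.unpack("!H", header[2:4])[0] + 4  # packet size field + 4
    header[2:4] = struct.pack("!H", total_len)
    header[10:12] = b"\x00\x00"                          # zero before recomputing
    header[10:12] = struct.pack("!H", ipv4_checksum(bytes(header)))
    return bytes(header) + payload
```

For example, `mark_packet(pkt, "BitTorrent.exe")` would append the option bytes `94 04 42 69` ("Bi") to the header of `pkt`. A capture tool at the measurement point can then read the two characters back from the option value.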


PROOF-OF-CONCEPT


Reference measurement

Available at http://pics.etl.hu/~szabog/measurement.tar

– Performed in a separated access network
– Our driver has been installed onto all computers on this network
– Duration of the measurement: 43 hours
– Captured data volume: 6 Gbytes, containing 12 million packets

The measurement contains the traffic of the most popular:

– P2P protocols: BitTorrent, eDonkey, Gnutella, DirectConnect
– VoIP and chat applications: Skype, MSN Live
– FTP sessions
– Download manager
– E-mail sending, receiving sessions
– Web based e-mail (e.g., Gmail)
– SSH sessions
– SCP sessions
– FPS, MMORPG gaming sessions
– Streaming: Radio, Video, Web based

[Figure: The traffic mix of the measurement]


Validation results (1) – Success

The validated method is the combined traffic classification method described in [1], with the addition that the classification of VoIP applications has been extended with ideas from [2].

Accurately identified:
– E-mail
– File transfer
– Streaming
– Secure channel
– Gaming traffic

Success due to protocols that:
– Are well documented
– Are open standards
– Do not constantly change

Difficulties in case of…?
– Encryption. But the session initiation phase is critical, as this phase can be identified accurately
– Success: SSH or SCP

[1] G. Szabo, I. Szabo and D. Orincsay: Accurate Traffic Classification
[2] M. Perenyi and S. Molnar: Enhanced Skype Traffic Identification

[Figure: The results of the classification [1] compared to the reference measurement]


Validation results (2) – P2P

Difficulties:

– Many TCP flows contain only 1-2 SYN packets, probably sent to disconnected peers
    – No payload in these packets => the signature based methods cannot work
    – Dynamically allocated source ports towards not well-known destination ports => the port based methods fail
    – The server search and P2P communication heuristic methods [1] also fail, as there are no other successful flows to such IPs

– Also, some small non-P2P flows were misclassified into the P2P class
    – The content of the port-application database is not fully accurate
    – Creating too many port-application associations easily increases the misclassification ratio

– The constant change of P2P protocols
    – New features are added to P2P clients day by day
    – A working mechanism can be typical of a selected client, not of the whole protocol itself



Validation results (3) – Philosophy

Traffic that is derived from other traffic:
– E.g., DNS traffic
– MSN: the HTTP protocol is used for transmitting chat messages
– The MSN client transmits advertisements over HTTP, but this cannot be recognized as deliberate web browsing

Hit := the classification outcome and the generating application type (the validation outcome) agree
– E.g., the chat on DirectConnect hubs, which was classified as chat, could have been considered correct, but in this comparison it was counted as a misclassification
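The "hit" definition above translates directly into a per-class hit ratio. The Python sketch below shows one way to compute it from pairs of (validated class, classified class). The function name `hit_ratios` and the sample flows are made up for illustration and are not results from the reference measurement.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def hit_ratios(flows: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Per-class share of flows where the classifier agrees with the marking.

    Each flow is a pair (validated_class, classified_class); the validated
    class is the one derived from the packet marking.
    """
    totals: Counter = Counter()
    hits: Counter = Counter()
    for validated, classified in flows:
        totals[validated] += 1
        if validated == classified:
            hits[validated] += 1
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Made-up sample: a DirectConnect chat flow marked as P2P but classified as
# chat counts as a miss, as in the example on the slide.
sample = [("P2P", "P2P"), ("P2P", "chat"), ("VoIP", "VoIP"), ("e-mail", "e-mail")]

if __name__ == "__main__":
    print(hit_ratios(sample))
```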



Validation results (4) – VoIP: MSN, Skype

The high VoIP hit ratio is due to the successful identification of:
– MSN Messenger
– Skype

Skype is difficult to identify:
– Same problem as in the case of P2P
– Proprietary protocol designed to ensure secure communication
– Characteristic feature from [2]: the application sends packets at an exact 20 sec interval even when there is no ongoing call
– In [1]: a P2P identification heuristic designed to track any message which has a periodicity in packet sending
– The extension of [1] was therefore straightforward

The validation showed:
– The deficiency of the classification of Skype
    – Simple extension of the algorithm
– The idea of [1] has been validated, as it proved to be robust when extended with the recognition of a new application
– The validation mechanism also proved to be useful
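The periodicity idea mentioned above can be illustrated with a simple check on packet timestamps. This Python sketch is not the heuristic of [1] or [2]. The function name `is_periodic`, the tolerance, the minimum packet count and the required share of matching gaps are all assumptions chosen for illustration. Only the 20 second period comes from the slide.

```python
from typing import Sequence


def is_periodic(timestamps: Sequence[float], period: float = 20.0,
                tolerance: float = 0.5, min_packets: int = 4,
                min_share: float = 0.8) -> bool:
    """Return True if most inter-packet gaps are close to the given period.

    timestamps: packet send times in seconds, in increasing order.
    """
    if len(timestamps) < min_packets:
        return False  # too few packets to judge
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    matching = sum(1 for gap in gaps if abs(gap - period) <= tolerance)
    return matching / len(gaps) >= min_share


if __name__ == "__main__":
    keepalives = [0.0, 20.0, 40.1, 60.0, 80.0]
    print(is_periodic(keepalives))
```

Requiring only a share of the gaps to match, rather than all of them, keeps the check tolerant of an occasional lost or delayed packet.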



Summary

– We introduced a new active measurement method which can help in the validation of traffic classification methods
– The introduced method is a network driver
    – It marks the outgoing packets of the clients with an application specific marking
– With the introduced method we created a measurement and used it to validate the method presented in [1]
    – The method has been proved to work accurately
    – Some deficiencies were found in the classification of P2P applications and Skype


Benefits:


Further work

Use the marking method at the measurement side for online traffic classification:
– Assumptions:
    – The terminals accessing an operator’s network all have the proposed driver installed
    – The driver is made tamper-proof to prevent users from forging the marking
– Online clustering of the traffic into QoS classes based on the resource requirements of the generating application
– Can be used by operators to charge users on the basis of the application used

Extension of the marking with other information about the traffic generating application:
– E.g., version number: the operator could track the security risks of an old application


Questions, discussion…

Thank you very much for your kind attention!

Contact: – E-mail: [email protected]
