On the Validation of Traffic Classification Algorithms
Géza Szabó, Dániel Orincsay, Szabolcs Malomsoky, István Szabó
Traffic Lab, Ericsson Research Hungary
On the Validation of Traffic Classification Algorithms, 2008-04-29
Aim & Contents
Aim:
– Introduce our novel validation method, which makes it possible to measure the accuracy of traffic classification methods
Contents:
– Requirements – How should validation be done?
– Related work – How is it currently done?
– Our proposal – What have we proposed?
– Working mechanism – How does our proposal work?
– Validating a state-of-the-art traffic classification method – What have we learnt from the validation?
– Future work – What else can be done with the proposed method?
Requirements – How should validation be done?
Objective of traffic classification:
– Identify applications in passively observed traffic
Validation of classification method by active test
Related work – How is it currently done?
• Header traces → port based method
• Impossible to validate by others
• Impossible to repeat with same conditions
• Non-realistic environment
• Dynamically allocated ports
• Proprietary protocols
• Encryption
• Need to be kept up to date
• Lot of flows
• Simultaneous applications
• Previously well-classified traces
• Just hint
S. Sen and J. Wang: Analyzing Peer-to-peer Traffic Across Large Networks
T. Karagiannis, K. Papagiannaki and M. Faloutsos: BLINC: Multilevel Traffic Classification in the Dark
J. Erman, M. Arlitt and A. Mahanti: Traffic Classification Using Clustering Algorithms
L. Bernaille et al.: Traffic Classification On The Fly
CURRENTLY
• Weak and ad hoc validation
• No reliable and widely accepted validation technique
• No reference packet trace with well-defined content is available
OUR PROPOSAL
The proposed method for validation
Principle:
– Packets are collected into flows at the traffic generating terminal
– Flows are marked with the identifier of the application that generated the packets of the flow
The main requirements on the realization of the method:
– It should not deteriorate the performance of the terminal
– The byte overhead of marking should be negligible
The preferred realization is a driver that can be easily installed on terminals
[Figure labels: Internet; User mode (IExplorer, Outlook, Skype); Kernel mode (TCP, UDP; IP; NDIS; NDIS hook driver; Network drivers); Packet marking driver; Measurement point]
Network connections - Process ID association
Protocol | Local Address:Port | Foreign Address:Port | State | Process ID
TCP | 192.168.0.1:2154 | 82.99.36.186:80 | Established | 5126
TCP | 192.168.0.1:2189 | 86.101.125.82:110 | Established | 1932
UDP | 0.0.0.0:2196 | 212.19.63.112:9612 | Established | 2056
...
Process ID - Application name association
Process ID | Application
5126 | IExplorer.exe
1932 | Outlook.exe
2056 | Skype.exe
...
The position of the proposed driver within the terminal
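The two association tables above can be chained to resolve a flow to the name of the application that generated it. The following is a minimal sketch of that lookup, not the driver's actual code: the rows are the ones shown on the slide, while the names `Connection`, `CONNECTIONS`, `PROCESSES` and `application_for` are hypothetical and introduced only for illustration.

```python
from typing import NamedTuple, Optional


class Connection(NamedTuple):
    protocol: str
    local: str    # "address:port"
    foreign: str  # "address:port"
    pid: int


# Rows copied from the "Network connections - Process ID" table on the slide.
CONNECTIONS = [
    Connection("TCP", "192.168.0.1:2154", "82.99.36.186:80", 5126),
    Connection("TCP", "192.168.0.1:2189", "86.101.125.82:110", 1932),
    Connection("UDP", "0.0.0.0:2196", "212.19.63.112:9612", 2056),
]

# Rows copied from the "Process ID - Application name" table on the slide.
PROCESSES = {5126: "IExplorer.exe", 1932: "Outlook.exe", 2056: "Skype.exe"}


def application_for(protocol: str, local: str, foreign: str) -> Optional[str]:
    """Resolve a flow to an application name via its process ID (hypothetical helper)."""
    for conn in CONNECTIONS:
        if (conn.protocol, conn.local, conn.foreign) == (protocol, local, foreign):
            return PROCESSES.get(conn.pid)
    return None  # no matching connection entry is known


print(application_for("TCP", "192.168.0.1:2189", "86.101.125.82:110"))
```

A real driver would obtain both tables from the operating system instead of from static lists.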
Working mechanism
1. The packet is examined to determine whether it is an incoming or an outgoing packet
2. In case of an outgoing packet, the size of the packet is examined
   The process continues only with packets that are smaller than the MTU decreased by the size of the marking
3. The process continues only with TCP or UDP packets
4. According to the five-tuple identifier of the packet, it is checked whether information is already available about which application the flow belongs to
5. Query the operating system
6. Need marking:
   Randomly
   Only first
   Leave the first
   No mark
[Flowchart labels: Packet passing through the interface; Outgoing packet? (YES/NO); Proper size? (YES/NO); Protocol (TCP/UDP or Other); Exist info? (YES/NO); Get info; Need to mark? (YES/NO); Mark; Send]
The working mechanism of the introduced driver
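The decision steps listed above can be sketched as a single function. This is only an illustration under stated assumptions, not the driver's implementation: `Packet`, `decide`, `query_os` and `need_mark` are hypothetical names, the marking size of 4 bytes is taken from the "Place of marking" slide, and the marking policy (randomly, only first, leave the first, no mark) is left to a caller-supplied callback because the slides do not define the policies further.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

MARK_SIZE = 4  # bytes added by the marking, per the "Place of marking" slide

# (protocol, source address, source port, destination address, destination port)
FlowKey = Tuple[str, str, int, str, int]


@dataclass
class Packet:
    outgoing: bool
    size: int       # packet size in bytes
    protocol: str   # e.g. "TCP", "UDP", "ICMP"
    flow: FlowKey


def decide(
    packet: Packet,
    mtu: int,
    flow_cache: Dict[FlowKey, str],
    query_os: Callable[[FlowKey], str],
    need_mark: Callable[[Packet], bool],
) -> str:
    """Return "mark" if the packet should be marked before sending, else "send"."""
    if not packet.outgoing:                 # step 1: only outgoing packets are considered
        return "send"
    if packet.size >= mtu - MARK_SIZE:      # step 2: must be smaller than MTU minus marking
        return "send"
    if packet.protocol not in ("TCP", "UDP"):   # step 3: TCP or UDP only
        return "send"
    if packet.flow not in flow_cache:       # step 4: is the application already known?
        flow_cache[packet.flow] = query_os(packet.flow)  # step 5: ask the operating system
    return "mark" if need_mark(packet) else "send"       # step 6: apply the marking policy


cache: Dict[FlowKey, str] = {}
flow = ("TCP", "192.168.0.1", 2154, "82.99.36.186", 80)
action = decide(Packet(True, 500, "TCP", flow), 1500, cache,
                query_os=lambda key: "IExplorer.exe", need_mark=lambda p: True)
print(action, cache[flow])
```

Caching the application per five-tuple means the operating system is queried once per flow rather than once per packet, which fits the requirement that the driver should not deteriorate the performance of the terminal.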
Place of marking
Extending the original IP packet with one option field
– Router Alert option field
   Transparent for both the routers on the path and the receiver host (according to RFC 2113 [3])
   The first two characters of the corresponding executable file name are added
– The size of the packet is increased by 4 bytes
– The packet size field in the IP header is also increased by 4 bytes
– Header checksum is recalculated
A marked packet of the BitTorrent protocol
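The marking steps above can be sketched on a raw IPv4 packet as follows. This is a minimal illustration, not the driver's code. It assumes that the two characters are carried in the two-octet value field of the Router Alert option (type 148, length 4 in RFC 2113), which is consistent with the 4-byte increase but is not stated on the slide. It also increments the IPv4 header length field (IHL), which is needed for the receiver to see the option. The function names and the executable name `BitTorrent.exe` are hypothetical.

```python
import struct
from typing import Optional

ROUTER_ALERT_TYPE = 0x94  # option type 148 (RFC 2113)
OPTION_LENGTH = 4         # type + length + two-octet value


def checksum(header: bytes) -> int:
    """IPv4 header checksum: one's complement of the one's-complement sum of 16-bit words."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mark_packet(packet: bytes, executable: str) -> bytes:
    """Append a Router Alert option carrying the first two characters of the executable name."""
    ihl = packet[0] & 0x0F
    if ihl >= 15:
        raise ValueError("no room left for another option in the IPv4 header")
    header, payload = packet[:ihl * 4], packet[ihl * 4:]
    option = bytes([ROUTER_ALERT_TYPE, OPTION_LENGTH]) + executable[:2].encode("ascii")
    new_header = bytearray(header + option)
    new_header[0] = (packet[0] & 0xF0) | (ihl + 1)        # header grows by one 32-bit word
    total_length = struct.unpack("!H", header[2:4])[0] + OPTION_LENGTH
    new_header[2:4] = struct.pack("!H", total_length)     # packet size field increased by 4
    new_header[10:12] = b"\x00\x00"                       # zero the checksum before recomputing
    new_header[10:12] = struct.pack("!H", checksum(bytes(new_header)))
    return bytes(new_header) + payload


def read_mark(packet: bytes) -> Optional[str]:
    """Return the two-character mark at the measurement point, or None if absent."""
    options = packet[20:(packet[0] & 0x0F) * 4]
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:          # End of Option List
            break
        if kind == 1:          # No-Operation is a single byte
            i += 1
            continue
        if i + 1 >= len(options) or options[i + 1] < 2:
            break              # malformed option, stop parsing
        length = options[i + 1]
        if kind == ROUTER_ALERT_TYPE and length == OPTION_LENGTH:
            return options[i + 2:i + 4].decode("ascii", errors="replace")
        i += length
    return None


# A bare 20-byte IPv4 header followed by 8 payload bytes (illustrative values only).
plain = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 6, 0,
                    bytes([192, 168, 0, 1]), bytes([82, 99, 36, 186])) + b"\x00" * 8
marked = mark_packet(plain, "BitTorrent.exe")
print(len(plain), len(marked), read_mark(marked))
```

`read_mark` shows the other half of the method: at the measurement point the option is parsed back out, which gives the reference label for the flow.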
PROOF-OF-CONCEPT
Reference measurement
Available at http://pics.etl.hu/˜szabog/measurement.tar
In a separated access network
Our driver has been installed onto all computers on this network
Duration of the measurement: 43 hours
Captured data volume: 6 Gbytes, containing 12 million packets
The measurement contains the traffic of the most popular
– P2P protocols: BitTorrent, eDonkey, Gnutella, DirectConnect
– VoIP and chat applications: Skype, MSN Live
– FTP sessions
– Download manager
– E-mail sending, receiving sessions
– Web based e-mail (e.g., Gmail)
– SSH sessions
– SCP sessions
– FPS, MMORPG gaming sessions
– Streaming: Radio, Video, Web based
The traffic mix of the measurement
Validation results (1) – Success
Combined traffic classification method (described in [1]), with the addition that the classification of VoIP applications has been extended with ideas from [2]
Accurately identified:
– E-mail
– File transfer
– Streaming
– Secure channel
– Gaming traffic
Success due to:
– Well-documented protocols
– Open standards
– Protocols that do not constantly change
Difficulties in case of…?
– Encryption:
   But: the session initiation phase is critical, as this phase can be identified accurately
   Success: SSH or SCP
[1] G. Szabo, I. Szabo and D. Orincsay: Accurate Traffic Classification
[2] M. Perenyi and S. Molnar: Enhanced Skype Traffic Identification
The results of the classification [1] compared to the reference measurement
Validation results (2) – P2P
Difficulties:
Many TCP flows containing 1-2 SYN packets, probably to disconnected peers
– No payload in these packets => the signature based methods cannot work
– Dynamically allocated source ports towards not well-known destination ports => the port based methods fail
– Server search and P2P communication heuristic methods [1] also fail, because there are no other successful flows to such IPs
Also, some small non-P2P flows were misclassified into the P2P class
– The content of the port-application database is not fully accurate
– Creating too many port-application associations easily raises the misclassification ratio
The constant change of P2P protocols
– New features are added to P2P clients day by day
– A working mechanism can be typical of a selected client, not of the whole protocol itself
[1] G. Szabo, I. Szabo and D. Orincsay: Accurate Traffic Classification
[2] M. Perenyi and S. Molnar: Enhanced Skype Traffic Identification
The results of the classification [1] compared to the reference measurement
Validation results (3) – Philosophy
Traffic that is derived from other traffic:
– E.g., DNS traffic
– MSN: HTTP protocol for transmitting chat messages
– MSN client transmits advertisements over HTTP, but this cannot be recognized as deliberate web browsing
Hit := the classification outcome and the generating application type (the validation outcome) agreed
– E.g., the chat on the DirectConnect hubs, which was classified as chat, could have been considered correct, but in this comparison it was counted as a misclassification
The results of the classification [1] compared to the reference measurement
[1] G. Szabo, I. Szabo and D. Orincsay: Accurate Traffic Classification
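The hit definition above can be turned into a per-class hit ratio as sketched below. This assumes the comparison is made flow by flow; the slides do not say whether flows, packets or bytes are counted. The function name `hit_ratios` is hypothetical and the sample labels are invented for illustration, they are not results from the measurement.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hit_ratios(flows: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Compute the hit ratio per traffic class.

    Each item is (validation_class, classified_class): the class derived from the
    marking of the generating application, and the class chosen by the classifier.
    A hit is counted only when the two agree.
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for validation_class, classified_class in flows:
        totals[validation_class] += 1
        if validation_class == classified_class:
            hits[validation_class] += 1
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Invented sample: the second flow mirrors the DirectConnect hub chat case,
# generated by a P2P application but classified as chat, so it counts as a miss.
sample = [("P2P", "P2P"), ("P2P", "chat"), ("E-mail", "E-mail")]
print(hit_ratios(sample))
```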
Validation results (4) – VoIP: MSN, Skype
High VoIP hit ratio is due to the successful identification of
– MSN Messenger
– Skype
Skype is difficult to identify
– Same problem as in the case of P2P
– Proprietary protocol designed to ensure secure communication
– Characteristic feature [2]: the application sends packets at an exact 20 sec interval even when there is no ongoing call
– In [1]: a P2P identification heuristic designed to track any message that is sent periodically
– Extension of [1] was straightforward
The validation showed:
– The deficiency of the classification of Skype
   Simple extension of the algorithm
– The idea of [1] has been validated, as it proved to be robust to the extension with new application recognition
– The validation mechanism also proved to be useful
[1] G. Szabo, I. Szabo and D. Orincsay: Accurate Traffic Classification
[2] M. Perenyi and S. Molnar: Enhanced Skype Traffic Identification
The results of the classification [1] compared to the reference measurement
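The periodicity feature mentioned above (packets sent at a 20 sec interval even without an ongoing call) could be checked on the packet timestamps of a flow roughly as follows. This is a simple stand-in and not the heuristic of [1] or [2], whose details are not given on the slides. The function name `looks_periodic`, the tolerance and the minimum packet count are assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def looks_periodic(
    timestamps: Sequence[float],
    period: float = 20.0,     # seconds, the interval named on the slide
    tolerance: float = 0.5,   # assumed allowed deviation in seconds
    min_packets: int = 4,     # assumed minimum evidence before deciding
) -> bool:
    """Return True if consecutive packets are spaced close to `period` seconds apart."""
    if len(timestamps) < min_packets:
        return False
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    # Both the average gap and its spread must stay within the tolerance.
    return abs(mean(gaps) - period) <= tolerance and pstdev(gaps) <= tolerance


print(looks_periodic([0.0, 20.0, 40.1, 60.0, 80.0]))
print(looks_periodic([0.0, 3.0, 41.0, 45.0, 90.0]))
```

Requiring a small spread as well as a matching mean keeps bursts that merely average 20 seconds from being accepted.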
Summary
We introduced a new active measurement method which can help in the validation of traffic classification methods.
The introduced method is a network driver
– It marks the outgoing packets from the clients with an application specific marking
With the introduced method we created a measurement and used it to validate the method presented in [1]
– The method has been proved to work accurately
– Some deficiencies in the classification
   P2P applications
   Skype
[1] G. Szabo, I. Szabo and D. Orincsay: Accurate Traffic Classification
Benefits:
Further work
Use the marking method at the measurement side for online traffic classification
– Assumptions:
   The terminals accessing an operator’s network are all installed with the proposed driver
   The driver is made tamper-proof to prevent users from forging the marking
– Online clustering of the traffic into QoS classes based on the resource requirements of the generating application
– Used by operators to charge on the basis of the application used by the user
Extension of the marking with other information about the traffic generating application
– E.g., version number
   The operator could track the security risks of an old application
Questions, discussion…
Thank you very much for your kind attention!
Contact: – E-mail: [email protected]