AMeta’AnalysisApproachforFeatureSelectionin...
Transcript of AMeta’AnalysisApproachforFeatureSelectionin...
![Page 1: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/1.jpg)
A Meta-Analysis Approach for Feature Selection in Network Traffic Research
Daniel C. Ferreira, Félix Iglesias Vázquez, Gernot Vormayr, Maximilian Bachl, Tanja Zseby
Institute of TelecommunicationsTU Wien
![Page 2: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/2.jpg)
Network Traffic Analysis
August 20172
Feature Vector
Traffic Analysis
Post Processing
Traffic Observation
Packet/Flow Data
Results
Reproducibility Workshop
Feature Selection:Select most suitable features
![Page 3: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/3.jpg)
Well-chosen Features è Simplified Analysis
August 20173
diameter
length
magnetic
weight
Reproducibility Workshop
![Page 4: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/4.jpg)
Agree to Disagree
August 20174
Source: Iglesias, Zseby: "Analysis of network traffic features for anomaly detection";; Machine Learning, 101 (2015), 1;; 59 - 84.
Reproducibility Workshop
![Page 5: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/5.jpg)
Why a Meta Analysis?• Meta-Analysis common in other disciplines– Structures the state of art– Combines existing results– Identifies agreements/disagreements in the community
– Provides basis for gap analysis• Provides information about– Availability of data and tools – Parameter settings– Validation Methods– Terminology and notation
è Supports reproducibility and comparability
August 20175Reproducibility Workshop
![Page 6: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/6.jpg)
Data Structure
August 20176Reproducibility Workshop
![Page 7: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/7.jpg)
Example: Features• Base features• Operations on base features• Flow keys
August 20177
Standard IPFIX Information Element
Non-IPFIX feature
Reproducibility Workshop
![Page 8: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/8.jpg)
Example: Data Set
August 20178Reproducibility Workshop
Dataset available
![Page 9: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/9.jpg)
Example: Algorithms
August 20179
Tool available
Parameters not provided
Link to tools provided
Reproducibility Workshop
![Page 10: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/10.jpg)
Initial Results• 71 Papers from years 2005 to 2017
August 2017Reproducibility Workshop
10
pcapAnomaly Detection
Analysis Chain
![Page 11: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/11.jpg)
Initial Results• Flow Definitions– 64.6% of papers that define a flow-key useclassical 5-tuple sIP, dIP, sPort, dPort, Protocol
– 70.8% use bi-directional flows– 83.1% use flow-based features
• Data Sets– 46.5% use at least one public data set
• Most Common Features– Number of papers that use a specific base feature– Number of papers weighted with their citations log10(citations)
August 2017Reproducibility Workshop
11
![Page 12: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/12.jpg)
Most Recurrent Base Features
August 201712Reproducibility Workshop
![Page 13: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/13.jpg)
Summary• Meta Analysis for Network Traffic Analysis– Supports comparability and reproducibility– Focus on feature selection, but much more information collected
• JSON files– Structured, searchable state of art– Fast extraction of relevant information from papers
• Initial results– Most common features– Flow definitions– Usage of public data sets
• è Data allows for many further analysis opportunities
August 201713Reproducibility Workshop
![Page 14: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/14.jpg)
Discussion• Manual data curation è Errors– Involve authors (check and correct)
• Analysis just shows “preferred” features, methods– è not necessarily the best!
• Incentives to fill data base– Conferences can require to add accepted papers– Students can add data when exploring state of art– Searchable data base may increase citations for papers included
• All data, documentation, paper data base available at: www.cn.tuwien.ac.at/meta
August 201714Reproducibility Workshop
![Page 15: AMeta’AnalysisApproachforFeatureSelectionin …conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute’of’Telecommunications TU’Wien. Network"Traffic"Analysis](https://reader036.fdocuments.us/reader036/viewer/2022070923/5fbbb15b20968a36f6144710/html5/thumbnails/15.jpg)
Thank you!Contact: [email protected]