On the Utility of Anonymized Flow Traces for Anomaly Detection
Author: Martin BURKHART, Daniela BRAUCKHOFF, Martin MAY
Journal: ITC SS 2008
Advisor: Yuh-Jye Lee
Reporter: Yi-Hsiang Yang
Email: [email protected]
2011/2/14
Contributions
• Introduce a generic methodology for evaluating the impact of anonymization
• Quantify the utility of anonymized data on a three-week-long data set
• Present an overall estimate of the impact of anonymization
Outline
• Introduction
• Methodology
• Measurement Results
• Conclusion
Introduction
• Sharing of traffic data is hindered
  • Releasing data introduces a threat to users' privacy
• Anomaly detection
  • Anomaly detection methods have been evaluated with anonymized data
• Focus on the anonymization of IP addresses
  • Blackmarking
  • Truncation
  • Random permutation
  • (Partial) prefix-preserving permutation
Utility of Anonymized Data for Anomaly Detection
• The granularity design space has two dimensions
  • Subset size: the size of the network (subnet) that is to be analyzed
  • Resolution: the address granularity at which the traffic is analyzed
• Assume the whole design space is available
• Cell 1 [00,00]: select all traffic and set the resolution to the minimum
• Cell 5 [00,16]: select all traffic and set the resolution to /16 networks
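One cell of the design space can be sketched in Python as below. This is a minimal illustration, assuming each flow is reduced to a single IPv4 address and reading a cell label [s,r] as [subset prefix length, resolution prefix length]. The function name `design_space_cell` and its input format are hypothetical: the subset is a network such as `0.0.0.0/0` (all traffic), and every selected address is collapsed to its /r prefix.

```python
import ipaddress
from collections import Counter


def design_space_cell(flow_addrs, subset, resolution):
    """Count flows per /resolution prefix, restricted to the subset network.

    flow_addrs: one IPv4 address string per flow (hypothetical input format)
    subset:     network to analyze, e.g. "0.0.0.0/0" for all traffic
    resolution: prefix length to which addresses are aggregated (0..32)
    """
    subset_net = ipaddress.IPv4Network(subset)
    counts = Counter()
    for addr in flow_addrs:
        ip = ipaddress.IPv4Address(addr)
        if ip in subset_net:
            # Collapse the address to the /resolution network that contains it.
            prefix = ipaddress.IPv4Network(f"{ip}/{resolution}", strict=False)
            counts[str(prefix)] += 1
    return counts


flows = ["10.1.2.3", "10.1.9.9", "10.2.0.1"]
cell_1 = design_space_cell(flows, "0.0.0.0/0", 0)   # one aggregate for all traffic
cell_5 = design_space_cell(flows, "0.0.0.0/0", 16)  # one counter per /16 network
```

Under this reading, a resolution of `0` reproduces Cell 1 (a single counter for all traffic), while `16` reproduces Cell 5; shrinking the subset (for example to `10.1.0.0/16`) moves along the other axis of the design space.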
IP address anonymization techniques
• Blackmarking (BM): blindly replaces all IP addresses in a trace with the same value
• Truncation (TR{t}): replaces the t least significant bits of an IP address with 0
• Random permutation (RP): translates IP addresses using a random permutation
• Partial prefix-preserving permutation (PPP{p}): permutes the host and network parts of IP addresses independently
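The four techniques above can be sketched in Python as follows. This is illustrative only: the constant used for blackmarking (`0.0.0.0`), the lazily built random permutation, and the reading of p in PPP{p} as the length of the network part are assumptions, and all function and class names are hypothetical.

```python
import ipaddress
import random


def blackmark(addr: str) -> str:
    """BM: every address becomes the same value (0.0.0.0 is an assumed constant)."""
    return "0.0.0.0"


def truncate(addr: str, t: int) -> str:
    """TR{t}: set the t least significant bits to 0."""
    value = int(ipaddress.IPv4Address(addr))
    mask = (0xFFFFFFFF << t) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(value & mask))


class LazyPermutation:
    """Random bijection on [0, 2**bits), built on demand: each new input is
    assigned a fresh random output that no earlier input has received."""

    def __init__(self, bits: int, seed: int):
        self.bits = bits
        self.rng = random.Random(seed)
        self.forward = {}
        self.used = set()

    def __call__(self, x: int) -> int:
        if x not in self.forward:
            while True:
                y = self.rng.getrandbits(self.bits)
                if y not in self.used:
                    break
            self.forward[x] = y
            self.used.add(y)
        return self.forward[x]


def random_permutation(addr: str, perm: LazyPermutation) -> str:
    """RP: map the whole 32-bit address through one random permutation."""
    return str(ipaddress.IPv4Address(perm(int(ipaddress.IPv4Address(addr)))))


def partial_prefix_preserving(addr: str, p: int, net_perm: LazyPermutation,
                              host_perm: LazyPermutation) -> str:
    """PPP{p} (assumed reading): permute the first p bits and the remaining
    32 - p bits with two independent random permutations."""
    if not 0 < p < 32 or net_perm.bits != p or host_perm.bits != 32 - p:
        raise ValueError("permutation widths must be p and 32 - p")
    value = int(ipaddress.IPv4Address(addr))
    host_bits = 32 - p
    net = value >> host_bits
    host = value & ((1 << host_bits) - 1)
    return str(ipaddress.IPv4Address((net_perm(net) << host_bits) | host_perm(host)))
```

The lazy table avoids materializing all 2^32 entries; mappings stay consistent across a trace as long as the same permutation object is reused. With PPP{p}, all hosts of one real /p network land in one anonymized /p network, which is the structure that plain RP destroys.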
• Prefix-preserving permutation (PP): permutes IP addresses so that two addresses sharing a common real prefix of length k are mapped to anonymized addresses that also share a common prefix of length k
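A prefix-preserving permutation can be built bit by bit: output bit i is input bit i XORed with a pseudorandom function of the first i input bits. Addresses that share their first k bits therefore receive identical flips for those bits and a differing bit k, so the outputs share exactly a k-bit prefix. Crypto-PAn uses this construction with AES as the pseudorandom function; the sketch below swaps in HMAC-SHA256 from the Python standard library to stay self-contained, and the function name `prefix_preserving` is hypothetical.

```python
import hashlib
import hmac
import ipaddress


def prefix_preserving(addr: str, key: bytes) -> str:
    """Prefix-preserving permutation of an IPv4 address (HMAC-SHA256 as PRF)."""
    value = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        # The first i bits of the real address (0 when i == 0).
        prefix = value >> (32 - i)
        # Include the length i so that prefixes of different lengths never collide.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (value >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))
```

Because each output bit depends only on the real prefix above it, the mapping is a bijection and preserves the subnet structure of the trace, which is what keeps subnet-level aggregation meaningful after anonymization.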
Methodology
• Data captured from the four border routers of the Swiss Academic and Research Network
  • The IP address range contains about 2.4 million IP addresses
  • Traffic volume varies between 60 and 140 million NetFlow records per hour
• Analyzed a three-week period (from August 19th to September 10th 2007)
  • 713 terabytes
  • Un-sampled and non-anonymized flow data
Methodology-Ground Truth
• Visual inspection of metric timeseries
  • Computed the timeseries for five well-known metrics: byte, packet, and flow counts, unique IP address counts, and the Shannon entropy of flows per IP address
  • At 15-minute intervals: 2016 data points per metric
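The five metrics can be computed per 15-minute interval as sketched below. The `Flow` record, its field names, and the choice of plain (unnormalized) entropy in bits are assumptions for illustration; only the destination side is shown, and the source side would be computed the same way. The constant at the top checks the point count: 21 days of 96 intervals each give 2016 points.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass

INTERVAL_SECONDS = 15 * 60
# 21 days of 15-minute intervals give 21 * 96 = 2016 points per metric.
POINTS_PER_METRIC = 21 * 24 * 60 // 15


@dataclass
class Flow:
    """Hypothetical minimal flow record (real NetFlow records have more fields)."""
    start: int      # flow start time in seconds
    src: str
    dst: str
    packets: int
    bytes: int


def shannon_entropy(counts: Counter) -> float:
    """Entropy in bits of the distribution given by the counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def interval_metrics(flows):
    """Five metrics per 15-minute interval, destination side only.
    Intervals without any flows are simply absent from the result."""
    bins = defaultdict(list)
    for f in flows:
        bins[f.start // INTERVAL_SECONDS].append(f)
    metrics = {}
    for index, fs in sorted(bins.items()):
        flows_per_dst = Counter(f.dst for f in fs)
        metrics[index] = {
            "bytes": sum(f.bytes for f in fs),
            "packets": sum(f.packets for f in fs),
            "flows": len(fs),
            "unique_dst_ips": len(flows_per_dst),
            "dst_ip_entropy": shannon_entropy(flows_per_dst),
        }
    return metrics
```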
• Assigning ground truth to each interval
  • If the analyzed metric timeseries exposed an unusual event, the interval was classified as anomalous
• Identifying the anomaly type: the anomalous events were assigned to different types
  • Volume: a sharp increase or decrease in the volume-based metrics
  • (D)DoS: a drop in the destination IP address entropy
  • Scan: an increase in the destination IP address count and entropy
  • Network fluctuation: an increase or decrease in the IP address counts at the highest resolution
  • Unknown
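The ground truth itself was assigned by visual inspection. Purely to make the listed signatures concrete, the hypothetical rules below map the relative metric changes of an interval already judged anomalous to one of the types. The `IntervalChange` fields, the threshold of 0.3 (a 30% deviation from some baseline), and the order of the checks are all assumptions, not the authors' procedure.

```python
from dataclasses import dataclass


@dataclass
class IntervalChange:
    """Hypothetical relative changes versus a baseline (+0.5 means 50% above it)."""
    volume: float                # byte, packet, or flow counts
    dst_ip_count: float          # unique destination IP addresses
    dst_ip_entropy: float        # entropy of flows per destination IP address
    ip_count_highest_res: float  # IP address counts at the highest resolution


def classify_anomaly(change: IntervalChange, threshold: float = 0.3) -> str:
    """Map an interval already judged anomalous to one of the listed types."""
    if change.dst_ip_count > threshold and change.dst_ip_entropy > threshold:
        return "Scan"            # more destinations, spread more evenly
    if change.dst_ip_entropy < -threshold:
        return "(D)DoS"          # flows concentrate on few destinations
    if abs(change.ip_count_highest_res) > threshold:
        return "Network fluctuation"
    if abs(change.volume) > threshold:
        return "Volume"
    return "Unknown"             # anomalous, but no listed signature matches
```

The feature-based signatures are checked before the volume rule because scans and (D)DoS attacks typically also change volume; checking volume first would hide them.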
Methodology-Anomaly Detection
• Use a Kalman filter
  • An efficient recursive filter that estimates the state of a system from a series of noisy measurements
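The slides do not give the filter's state model, so the sketch below assumes the simplest one: a scalar local-level (random-walk) model. Each new value is compared with the filter's prediction, and an interval is flagged when the innovation (prediction error) exceeds `k` standard deviations of its predicted variance. The function name and the parameters `q`, `r`, and `k` are hypothetical choices, not the paper's configuration.

```python
import math


def kalman_anomalies(series, q=1.0, r=1.0, k=3.0):
    """Flag points whose innovation exceeds k standard deviations.

    Assumed model: x_t = x_{t-1} + w_t, w_t ~ N(0, q)   (state)
                   y_t = x_t + v_t,     v_t ~ N(0, r)   (measurement)
    """
    x = series[0]   # initial state estimate
    p = r           # initial variance of the state estimate
    flags = [False]
    for y in series[1:]:
        # Predict: the random walk keeps x and adds process noise to p.
        p_pred = p + q
        # Innovation and its variance.
        innovation = y - x
        s = p_pred + r
        flags.append(abs(innovation) > k * math.sqrt(s))
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p_pred / s
        x = x + gain * innovation
        p = (1 - gain) * p_pred
    return flags
```

Because each step only needs the previous estimate and its variance, one pass over 2016 points per metric is cheap, which is what makes the filter attractive for monitoring many metrics at once.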
• The 60 studied metrics are different variants of:
  • Three volume-based metrics (vbm): byte, packet, and flow counts
  • Two feature-based metrics (fbm): unique IP address count and Shannon entropy of flows per IP address
• Total: (3 [vbm] + (2 [fbm] × 2 [src/dst] × 3 [res])) × 2 [in/out] × 2 [udp/tcp] = 60 detection metrics
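The count can be checked by enumerating the combinations: 3 + 2 × 2 × 3 = 15 base metrics, each split by direction and protocol (× 4). The metric labels below are hypothetical, and `res1` to `res3` are placeholders because the three resolutions are not named on this slide.

```python
from itertools import product

VOLUME = ["bytes", "packets", "flows"]
FEATURE = ["unique_ip_count", "ip_entropy"]
SIDES = ["src", "dst"]
RESOLUTIONS = ["res1", "res2", "res3"]  # placeholders for the three resolutions
DIRECTIONS = ["in", "out"]
PROTOCOLS = ["udp", "tcp"]


def detection_metrics():
    """All metric variants as tuples, e.g. ("flows", "in", "tcp")."""
    # Volume metrics have no side or resolution; feature metrics have both.
    base = [(m,) for m in VOLUME] + list(product(FEATURE, SIDES, RESOLUTIONS))
    return [b + (d, p) for b, d, p in product(base, DIRECTIONS, PROTOCOLS)]
```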
Measurement Results
• Volume anomalies
  • Exposed by volume-based metrics
  • For TCP, blackmarking and random permutation perform slightly better
• Scanning and denial-of-service anomalies
  • Exposed by feature-based metrics
• Network fluctuations
  • Exposed by feature-based metrics at lower resolutions
Measurement Results-AUC (area under the ROC curve)
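AUC values are typically obtained by sweeping a detector's alarm threshold and integrating the resulting ROC curve. An equivalent rank-based formulation is sketched below: the AUC equals the probability that a randomly chosen anomalous interval receives a higher anomaly score than a randomly chosen normal one, with ties counted as one half. The function name and the idea of scoring each interval (for example, by the magnitude of a Kalman residual) are assumptions for illustration.

```python
def auc_from_scores(scores, labels):
    """AUC via pairwise comparison of anomalous (label true) and normal intervals."""
    positives = [s for s, l in zip(scores, labels) if l]
    negatives = [s for s, l in zip(scores, labels) if not l]
    if not positives or not negatives:
        raise ValueError("need at least one anomalous and one normal interval")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ordering
    return wins / (len(positives) * len(negatives))
```

An AUC of 1.0 means every anomalous interval outranks every normal one, and 0.5 is no better than chance; comparing a metric's AUC on original and anonymized traces is one way to express how much detection utility anonymization removes.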
• Blackmarking
  • Decreases the utility for detecting anomalies in UDP and TCP traffic, except for volume anomalies
• Random permutation
  • Performs very poorly at detecting anomalies in UDP traffic
  • Preserves the utility for TCP traffic
• Truncation of 8 or 16 bits
  • Decreases the utility for detecting anomalies in TCP traffic by roughly 10 percent
  • Performs well for UDP traffic
• (Partial) prefix-preserving permutation
  • No significant negative impact on detecting anomalies in UDP and TCP traffic
Implicit Traffic Aggregation
• Analyzed the number of additional flows attributed to each of 170 webservers after truncation
  • Truncating a single bit: around 10% of the webservers see a resulting traffic increase of 100% or more, and 50% see no additional traffic
  • Unaffected servers: 20% for 2 bits, 5% for 4 bits, and even 0% for 8 bits
  • At least a doubling of traffic: 25% for 2 bits, 55% for 4 bits, and 89% for 8 bits
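Implicit aggregation arises because TR{t} maps every address in a block of 2^t addresses to the same value, so the flows of neighboring hosts become indistinguishable from the server's own. The per-server measurement can be sketched as below, assuming per-address flow counts are available; the function name `relative_increase` and its input format are hypothetical. A result of 0 means the server is unaffected, and 1.0 or more means its traffic at least doubles.

```python
import ipaddress
from collections import Counter


def relative_increase(flow_counts, servers, t):
    """flow_counts: address -> flow count before truncation (hypothetical input).
    Returns server -> (additional flows after TR{t}) / (server's own flows)."""
    def block(addr):
        # Dropping the t least significant bits identifies the truncated address.
        return int(ipaddress.IPv4Address(addr)) >> t

    per_block = Counter()
    for addr, n in flow_counts.items():
        per_block[block(addr)] += n
    result = {}
    for server in servers:
        own = flow_counts[server]  # assumed to be > 0
        result[server] = (per_block[block(server)] - own) / own
    return result
```

The shares on the slide would then be the fraction of the 170 servers with a value of 0 (unaffected) and with a value of at least 1.0 (doubling), computed separately for each t.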
Conclusion
• Anonymization techniques impact statistical anomaly detection
• Introduced the detection granularity design space
• Analyzed the utility of anonymized traces
Thanks for your attention
Q&A