SketchLearn: Relieving User Burdens in Approximate...
Transcript of SketchLearn: Relieving User Burdens in Approximate...
![Page 1: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/1.jpg)
SketchLearn: Relieving User Burdens in Approximate Measurement withAutomated Statistical Inference
Qun Huang, Patrick P. C. Lee, Yungang Bao
1
![Page 2: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/2.jpg)
Typical Approximate Measurement
2
MeasurementAlgorithmConfiguration
Expected errors
Querythresholds
Domainknowledge
MeasurementResults
Network traffic Query
Network statistics
Our goal: identify and relieve user burdens for approximate measurement
![Page 3: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/3.jpg)
User Burden 1
3
MeasurementAlgorithmConfiguration
Expected errors
Querythresholds
Domainknowledge
MeasurementResults
Network traffic Query
Network statistics
1: Hard to specify
Errors need to be specified:• bias: an answer deviates true answer by ε• failure probability: fail to produce small-error answer with a probability δ
![Page 4: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/4.jpg)
User Burden 2
4
MeasurementAlgorithmConfiguration
Expected errors
Querythresholds
Domainknowledge
MeasurementResults
Network traffic Query
Network statistics
1: Hard to specify 2: Hard to know in advance
Large-threshold configurations fail to work for small-threshold queriesNo guideline for sufficiently small threshold• Vary across management operations and traffic characteristics
![Page 5: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/5.jpg)
User Burden 3
5
MeasurementAlgorithmConfiguration
Expected errors
Querythresholds
Domainknowledge
MeasurementResults
Network traffic Query
Network statistics
1: Hard to specify 2: Hard to know in advance
3: Hard to follow theory to tune
Domain knowledge is not always available• Theories usually show worst-case results• Configuration for worst case not practically efficient
![Page 6: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/6.jpg)
User Burden 4
6
MeasurementAlgorithmConfiguration
Expected errors
Querythresholds
Domainknowledge
MeasurementResults
Network traffic Query
Network statistics
1: Hard to specify 2: Hard to know in advance
3: Hard to follow theory to tune
4: Hard to redefine flow-keys
Algorithm works on a particular flow-key• Flow-key definition must be fixed in deployment
![Page 7: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/7.jpg)
User Burden 5
7
MeasurementAlgorithmConfiguration
Expected errors
Querythresholds
Domainknowledge
MeasurementResults
Network traffic Query
Query Results
1: Hard to specify 2: Hard to know in advance
3: Hard to follow theory to tune
4: Hard to redefine flow-keys
5: Hard to quantify actual errors
Infeasible to track errors for a particular flow• Configuration only tells worst-case and overall errors
![Page 8: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/8.jpg)
All User Burdens
8
MeasurementAlgorithmConfiguration
Expected errors
Querythresholds
Domainknowledge
MeasurementResults
Network traffic Query
Query Results
1: Hard to specify 2: Hard to know in advance
3: Hard to follow theory to tune
4: Hard to redefine flow-keys
5: Hard to quantify actual errors
![Page 9: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/9.jpg)
Our Work
Relieve five user burdens
Performance• Catch up with underlying packet forwarding speed
Memory efficiency• Consume only limited memory
Accuracy• Preserve high accuracy of sketches
Generality• One design and one configuration for multiple tasks• Deployable in both software and hardware
9
SketchLearn: Sketch-based measurement system with limited user burdens
![Page 10: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/10.jpg)
Root Cause: Resource Conflicts
Previous work: perfect configuration to eliminate conflicts• Determined by many factors
10
Algorithm Flow definitionQuery parameters
Resource conflicts
Input data Configuration
![Page 11: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/11.jpg)
High-Level Idea Not pursue perfect configurations to mitigate resource conflicts
• Hard to identify right trade-offs
Characterize resource conflicts in an “imperfect” configuration
11
Network queries
Data structure withimperfect configuration Conflict modelStatistical
Properties
Expected errors
Query parameters
Flow definitions
Not necessary
Domain knowledge
![Page 12: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/12.jpg)
How to Realize?
12
Network queries
Data structure withimperfect configuration Conflict modelStatistical
Properties
![Page 13: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/13.jpg)
Design Data Structure
13
Network queries
Data structure withimperfect configuration Conflict modelStatistical
Properties
![Page 14: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/14.jpg)
Multi-level Sketch [Cormode, ToN 05] L: # of bits considered
Data structure: L+1 levels (from 0 to L), each of which is a counter matrix
Level 0 is always updated
Level k is updated iff k-th bit in a flowkey is 1
All levels share same hash functions
14
![Page 15: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/15.jpg)
Statistical Properties of Resource Conflicts
15
Network queries
Conflict modelStatisticalProperties
Multi-levelSketch
![Page 16: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/16.jpg)
Properties
Does a flow update level k?• Depends on inherent distribution of flow keys
Does a flow update bucket (i, j)?• Depends on hash functions of sketch
A theory should characterize the above two factors
16
![Page 17: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/17.jpg)
Main Theorem
17
Main Theorem: if no large flows, Ri,j k follows Gaussian distribution with mean p[k]
p[k]: probability that k-th bit is 1
Vi,j 𝑘𝑘 : counter of bucket (i,j)Vi,j 0 : counter of bucket (i,j)
random variable Ri,j k =Vi,j 𝑘𝑘Vi,j 0
![Page 18: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/18.jpg)
Build Conflict Model
18
Network queries
Conflict model
MainTheorem
Multi-levelSketch
![Page 19: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/19.jpg)
Statistical Model Inference
Goals• Extract all large flows• Guarantee remaining flows in sketches are small• Estimate Gaussian distributions for each level
Challenges• No guidelines to distinguish large and small flows
19
![Page 20: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/20.jpg)
Self-Adaptive Inference Algorithm
20
Multi-LevelSketch
ImperfectGaussian Distribution
Large Flows
Estimate Gaussian distributions
Extract large flows
Remove extracted large flows(Flow extracted)
Use smaller threshold(No flow extracted)
![Page 21: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/21.jpg)
Self-Adaptive Inference Algorithm
21
Multi-LevelSketch
ImperfectGaussian Distribution
Large Flows
Estimate Gaussian distributions
Extract large flows
Remove extracted large flows(Flow extracted)
Use smaller threshold(No flow extracted)
![Page 22: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/22.jpg)
Large Flow Extraction
Intuition: a large flow• (i) results in extremely (large or small) Ri,j k , or• (ii) at least deviates Ri,j k from its expectation p[k] significantly
22
Key idea: examine Ri,j k and its difference from p[k], then• Determine k-th bit of a large flow (assuming it exists)• Estimate frequency• Associate flow confidence
More details in paper!
![Page 23: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/23.jpg)
Guarantee
23
Theorem: w is sketch width, flows exceeding 1/w of total must be extracted• Empirical results: even flows that are smaller than 1/w can also be extracted!
64KB: flows above 0.6% of total traffic can be extracted (by theorem)
Practice: >99% flows exceeding 0.3% are also extracted with 64KB
![Page 24: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/24.jpg)
Multi-levelSketch
How to Perform Network-wide Queries?
24
Self-adaptiveModel Inference
DecoupledComponents
Extracted Large Flows
ResidualSketch
Conflict Characteristics
FlowConfidence
GaussianDistributions
MainTheorem
Network queries
![Page 25: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/25.jpg)
Supported Network Queries
Per-flow byte count
Heavy hitter detection
Heavy changer detection
Cardinality estimation
Flow size distribution estimation
Entropy estimation
25
![Page 26: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/26.jpg)
Extended Query Model
Allow query for arbitrary flowkeys• Only use corresponding levels
Estimate actual query errors• Use attached errors for each large flow• Use Gaussian distributions
26
![Page 27: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/27.jpg)
Multi-levelSketch
Putting It Together
27
Self-adaptiveModel Inference
DecoupledComponents
Extracted Large Flows
ResidualSketch
Conflict Characteristics
FlowConfidence
GaussianDistributions
Network-wideQuery Runtime
MainTheorem
![Page 28: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/28.jpg)
(Slight) User Burdens Users now just need to configure the multi-level sketch
28
Expected errors
Query parameters
Flow definitions
Domain knowledge
Multi-levelSketch
Not necessary
One row suffices
Columns: based on minimum flow or memory budget
Example: 400 KB memoryRequire flows exceeding 0.1% 1000 columns
![Page 29: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/29.jpg)
Implementation
Challenge: updating L+1 levels is time consuming
Solution: parallel updating• L+1 levels are independent
Software• Based on OpenVSwitch + DPDK• Parallelism with SIMD
Hardware• Based on P4 programmable switches• Parallelism with P4 pipeline stages
29
![Page 30: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/30.jpg)
Evaluation
Platforms• Software: OpenVSwitch + DPDK• Hardware: Tofino swtich• Large-scale simulation
Traces• Caida 2017• Data center traffic (IMC 2010)
30
![Page 31: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/31.jpg)
Fitting Theorem
31
Guaranteed boundary of flow extraction
Flows that are guaranteed to be extracted
Flows that are not guaranteed(but are likely) to be extracted
![Page 32: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/32.jpg)
Arbitrary Flow Keys
Query for three flow keys
32
Heavy hitter detection Traffic size distribution
![Page 33: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/33.jpg)
More Experiments
Resource consumption
Generality for various measurement tasks
Efficiency of attached query errors
Network-wide measurement
33
![Page 34: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/34.jpg)
Conclusion Analyze 5 user burdens in existing approximate measurement
SketchLearn framework• Multi-level data structure design• Theory: counters follow Gaussian distributions when no large flows• Self-adaptive model inference algorithm• Extended query models
Prototype and evaluations
Source code available at: https://github.com/huangqundl/SketchLearn
34
![Page 35: SketchLearn: Relieving User Burdens in Approximate ...conferences.sigcomm.org/sigcomm/2018/files/slides/paper_10.4.pdf · User Burden 4 6 Measurement Algorithm. Configuration. Expected](https://reader035.fdocuments.us/reader035/viewer/2022063001/5f1c28611e022324d529fc5d/html5/thumbnails/35.jpg)
Limitations and Future Work
Less pipeline consumptions
Quantify Gaussian distributions and convergence rate
More applications
35