Data Mining Challenges for Network Management
Nick Feamster, Georgia Tech
Dave Andersen, CMU
http://www.datapository.net/
(joint with Jay Lepreau and Emulab)
Reactive Operation
• Problems cause downtime
• Problems often not immediately apparent
What happens if I tweak this policy…?
[Figure: Configure → Observe → Desired Effect? No: Revert. Yes: Wait for Next Problem.]
Proactive Techniques
Better: Proactive Operation
• Idea: Analyze configuration before deployment
[Figure: Configure → rcc (Detect Faults), Predict Traffic Flow → Deploy]
Many faults can be detected with static analysis.
Dynamics for Network Management
• Problem: Many faults can’t be detected from static configuration analysis of a single AS
• Dependencies on neighboring ASes
  – Contract violations
  – Route hijacks
  – BGP “wedgies”
  – Filtering
• Dependencies on route arrivals
  – Simple network configurations can oscillate, but operators can’t tell until the routes actually arrive.
Threshold-based “anomaly detection” schemes cannot detect these problems.
Network Management Challenges
• Infrastructure support for data management
  – Heterogeneous data
    • DB support for longest-prefix match would make correlation of routing and traffic data (“joint analysis”) much easier
  – Large volumes
  – Need for real-time analysis (e.g., for anomalies/intrusion detection)
• Algorithmic support for data mining
  – Support for joint analysis
  – Threshold-based schemes don’t work for
    • Small traffic blips
    • Small routing blips
• Support for proactive, offline analysis of routing dynamics
  – Analyzing configuration changes, etc.
• Support for online control
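The longest-prefix-match bullet above can be made concrete with a small sketch: map each flow's destination address to the most specific routing-table prefix that covers it, then aggregate traffic per route. This uses only Python's standard `ipaddress` module and a linear scan. The route table, next-hop labels, and flow records are invented for illustration, and a real database would use an index rather than a scan.

```python
import ipaddress

def longest_prefix_match(routes, address):
    """Return the (network, next_hop) entry with the longest prefix
    covering `address`, or None if no route covers it."""
    addr = ipaddress.ip_address(address)
    best = None
    for network, next_hop in routes:
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return best

# Hypothetical routing table: (prefix, next-hop label).
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "AS100"),
    (ipaddress.ip_network("10.1.0.0/16"), "AS200"),
    (ipaddress.ip_network("0.0.0.0/0"), "AS1"),
]

# Hypothetical flow records: (destination IP, byte count).
FLOWS = [("10.1.2.3", 1500), ("10.9.9.9", 400), ("192.0.2.1", 60)]

def bytes_per_next_hop(routes, flows):
    """Joint analysis in miniature: attribute traffic volume to routes."""
    totals = {}
    for dst, nbytes in flows:
        match = longest_prefix_match(routes, dst)
        if match is not None:
            totals[match[1]] = totals.get(match[1], 0) + nbytes
    return totals
```

Calling `bytes_per_next_hop(ROUTES, FLOWS)` attributes each flow to the most specific covering prefix, which is the join that a database without longest-prefix-match support makes awkward.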
Challenge 1: Infrastructure Support
• Separate: collection, storage, analysis
• Collection: abstract type, format, and access method
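One way to read the separation above is as three layers that only share a normalized record type. The following is a minimal sketch under that assumption, not the Datapository design: the `Record` type, the two input line formats, and the `Store` class are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """Normalized record shared by all layers (hypothetical schema)."""
    source: str       # e.g. "bgp" or "netflow"
    timestamp: float
    fields: dict

# Collection: each collector hides its own input type and format.
def collect_bgp(lines):
    # Assumed line format: "<timestamp> <A|W> <prefix>"
    for line in lines:
        ts, kind, prefix = line.split()
        yield Record("bgp", float(ts), {"kind": kind, "prefix": prefix})

def collect_netflow(lines):
    # Assumed line format: "<timestamp>,<dst_ip>,<bytes>"
    for line in lines:
        ts, dst, nbytes = line.split(",")
        yield Record("netflow", float(ts), {"dst": dst, "bytes": int(nbytes)})

# Storage: knows nothing about where records came from.
class Store:
    def __init__(self):
        self._records = []

    def add(self, records):
        self._records.extend(records)

    def query(self, start, end, source=None):
        """Records with start <= timestamp < end, optionally from one source."""
        return [r for r in self._records
                if start <= r.timestamp < end
                and (source is None or r.source == source)]

# Analysis: works on query results only.
def count_by_source(records):
    counts = {}
    for r in records:
        counts[r.source] = counts.get(r.source, 0) + 1
    return counts
```

Adding a new data source then means writing one more collector; storage and analysis code stay unchanged.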
Challenge 2: Algorithmic Support
Blips across signals may be more operationally interesting than any spike in one.
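A minimal sketch of why blips across signals matter: each signal is scored against its own baseline, and the scores are combined so that small simultaneous deviations stand out even when no single signal crosses a per-signal threshold. The combination rule (sum of squared z-scores), the thresholds, and the two time series are illustrative assumptions, not a method taken from the slides.

```python
import statistics

def zscores(series, baseline_len):
    """Z-scores of every point relative to the first `baseline_len` points."""
    baseline = series[:baseline_len]
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline) or 1.0  # guard against a flat baseline
    return [(x - mean) / stdev for x in series]

def per_signal_alarms(signals, baseline_len, threshold):
    """Time indices where any single signal exceeds the z-score threshold."""
    alarms = set()
    for series in signals.values():
        for t, z in enumerate(zscores(series, baseline_len)):
            if abs(z) > threshold:
                alarms.add(t)
    return sorted(alarms)

def joint_alarms(signals, baseline_len, joint_threshold):
    """Time indices where the sum of squared z-scores across all signals
    exceeds `joint_threshold`. Assumes equal-length, time-aligned series."""
    scored = [zscores(s, baseline_len) for s in signals.values()]
    return [t for t, zs in enumerate(zip(*scored))
            if sum(z * z for z in zs) > joint_threshold]

# Made-up data: a small blip at index 6 in both traffic and routing.
SIGNALS = {
    "traffic_mbps": [100, 102, 98, 101, 99, 100, 103, 100],
    "bgp_updates":  [10, 12, 8, 11, 9, 10, 13, 10],
}
```

With these numbers, `per_signal_alarms(SIGNALS, 6, 3.0)` raises nothing, while `joint_alarms(SIGNALS, 6, 9.0)` flags index 6, where both signals deviate modestly at the same time.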
Challenge 3: Proactive Fault Detection
[Figure: Configure → Static Fault Detection → Construct Network Model → Dynamic Analysis in Emulation → Deploy. The stages before deployment are labeled “Proactive Techniques”; input: Existing Routes (e.g., from Datapository).]
A possibility: detect configuration faults by observing “playback” of routing dynamics
“What-if” analysis in a safe sandbox.
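The playback idea can be sketched as a toy replay: feed recorded updates through a candidate policy and count how often the selected route changes. This is a deliberately simplified model, not Emulab or rcc. The decision process is reduced to local preference, then AS-path length, then neighbor name, and the update trace and policy tables are invented.

```python
def replay(updates, local_pref):
    """Replay (neighbor, prefix, as_path) updates under a candidate policy.

    `as_path` is a list of AS numbers, or None for a withdrawal.
    `local_pref` maps neighbor -> preference (default 100, higher wins).
    Returns (best, changes): the final best neighbor per prefix and the
    number of best-route changes observed per prefix.
    """
    rib = {}      # prefix -> {neighbor: as_path}
    best = {}     # prefix -> currently selected neighbor (or None)
    changes = {}  # prefix -> count of best-route changes
    for neighbor, prefix, as_path in updates:
        routes = rib.setdefault(prefix, {})
        if as_path is None:
            routes.pop(neighbor, None)
        else:
            routes[neighbor] = as_path
        # Simplified decision: highest local-pref, shortest path, then name.
        new_best = min(
            routes,
            key=lambda n: (-local_pref.get(n, 100), len(routes[n]), n),
            default=None,
        )
        if new_best != best.get(prefix):
            changes[prefix] = changes.get(prefix, 0) + 1
            best[prefix] = new_best
    return best, changes

# Hypothetical recorded trace: neighbor B's route flaps.
TRACE = [
    ("A", "10.0.0.0/8", [1, 2, 3]),
    ("B", "10.0.0.0/8", [4, 3]),
    ("B", "10.0.0.0/8", None),
    ("B", "10.0.0.0/8", [4, 3]),
]
```

Replaying `TRACE` with an empty policy follows every flap from B (four best-route changes), while a candidate policy of `{"A": 200}` pins the route to A after the first update. Comparing the two runs offline is the kind of "what-if" question the sandbox is meant to answer before deployment.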
Challenge 4: Support for Online Control
[Figure: Inputs (Probes, BGP updates, IGP updates, Netflow, Router Configs) → Compute Engine (input processing) → Storage and DB → Anomaly Detection → Network-Wide Route Selection, Filter deployment, etc.]
Given a system to monitor, why not also use it for control?