Post on 23-Feb-2016
Change Is Hard: Adapting Dependency Graph Models For Unified Diagnosis in Wired/Wireless Networks
Lenin Ravindranath, Victor Bahl, Ranveer Chandra, David A. Maltz, Jitendra Padhye, Parveen Patel
Enterprise Network (of the Near Future)
• Stationary servers hosted in wired cloud/DC
• Nomadic users connect via wireless, VPN, RAS, etc.
[Figure: enterprise network showing the Inter-Building Network, Data Center Network, Servers, RAS, Firewalls, Internet, Remote user via VPN, Campus user, and Access Points]
End-to-end performance issues are a result of wired and wireless components
URL fetch time: wired desktop client and nomadic laptop client
Hard to figure out which component to blame
Existing solutions
Diagnose end-to-end application performance
Unified wired and wireless
Consider effects of wireless mobility
Ease of deployment
Recovery
Jigsaw, DAIR, WIT, Airtight √
Sherlock √ √
SMARTS √ √
No existing scheme works end-to-end in mixed wired/wireless environments
MnM Takeaways
1. Unified view of the wireless/wired network
2. User location needs to be a first class consideration
3. A system architecture that can deal with constantly changing dependencies, is easy to deploy and takes corrective action
MnM’s hammer: Dynamic Dependency Graphs
• Dependency graphs
  – Link observations to root causes
  – Use a fault inference algorithm, e.g., Sherlock
• Deal with frequent topology changes due to mobility
  – Constantly monitor end-systems to detect changes
  – Apply differences to existing dependency graph
• Consider location as a first-class component
  – Bootstrap the location system without help from static infrastructure
  – Use white-box monitoring to determine location
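The "apply differences to existing dependency graph" step can be pictured with a minimal Python sketch. The class `DependencyGraph`, the method `apply_diff`, and the node names (`AP-1`, `AP-2`, `LocalGateway`) are all hypothetical and chosen for illustration; the slides do not describe MnM's actual data structures.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DependencyGraph:
    """Maps each node to the set of nodes it currently depends on."""
    deps: dict[str, set[str]] = field(default_factory=dict)

    def apply_diff(self, node: str, observed: set[str]) -> tuple[set[str], set[str]]:
        """Update one node's dependencies in place and return (added, removed)."""
        current = self.deps.get(node, set())
        added = observed - current
        removed = current - observed
        self.deps[node] = set(observed)
        return added, removed


graph = DependencyGraph()
# First observation: client C reaches the web server through AP-1 (assumed names).
graph.apply_diff("Path:C->WebSrv", {"AP-1", "LocalGateway"})
# Client C roams to another access point; only the difference is applied.
added, removed = graph.apply_diff("Path:C->WebSrv", {"AP-2", "LocalGateway"})
```

Here the rest of the graph is untouched when the client moves; only the edge to the access point changes.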
Example scenario: client accesses http://foo
[Figure: Client C, DNS Server, Kerberos Server, Web Server]
Stationary dependency graph
Dynamic dependency graphs
[Figure: dependency graph for "Client C accesses http://foo". Network services: Name Resolution (C → DNS), Certificate Fetch (C → Kerberos), HTTP Get (C → WebSrv). Paths: C → DNS, C → Kerberos, C → WebSrv. Other nodes: Web server, Kerberos server, DNS server, Access Point, Local Gateway RTT, Location, Internet Path, Remote Gateway RTT, RAS Server, Routers, ...]
MnM System Architecture
[Figure: components that run on every monitored machine, and components that run on a central server]
Incrementally building a dependency graph
[Figure: the graph is built node by node. Root node: Type: Http.Request, Instance: http://foo, Client: C. Child nodes of Type: NetworkService: Name Resolution (C → DNS), Certificate Fetch (C → Kerberos), HTTP Get (C → WebSrv). Paths: C → DNS, C → Kerberos, C → WebSrv. Other nodes: Web server, Kerberos server, DNS server, Access Point, Local Gateway RTT, Location, Internet Path, Remote Gateway RTT, RAS Server, Routers, ...]
Experts: HTTPExpert, ServiceExpert, NetExpert, WiFiExpert, RASExpert, LocationExpert
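One way to read the expert names above is that each expert knows how to expand one type of node into its dependencies. The sketch below illustrates that reading only; the functions `http_expert`, `service_expert`, and `build_graph`, the `EXPERTS` registry, and the placeholder path names are assumptions, not MnM's API.

```python
from collections import deque


def http_expert(node):
    # Assumed expansion: an HTTP request depends on three network services.
    return [
        ("NetworkService", "Name Resolution (C -> DNS)"),
        ("NetworkService", "Certificate Fetch (C -> Kerberos)"),
        ("NetworkService", "HTTP Get (C -> WebSrv)"),
    ]


def service_expert(node):
    # Assumed expansion: each network service depends on one network path.
    return [("Path", "path for " + node[1])]


# Hypothetical registry: node type -> expert that expands nodes of that type.
EXPERTS = {"Http.Request": http_expert, "NetworkService": service_expert}


def build_graph(root):
    """Expand nodes breadth-first, asking the matching expert for children."""
    graph = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node in graph:
            continue  # already expanded
        expert = EXPERTS.get(node[0])
        children = expert(node) if expert else []
        graph[node] = children
        queue.extend(children)
    return graph


graph = build_graph(("Http.Request", "http://foo"))
```

Adding support for a new node type in this sketch means registering one more function, which is one possible interpretation of "extensible".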
Example: end-to-end diagnosis
[Figure: interaction between the Agent and the Inference Engine. Agent components: RTT Monitor, HTTP Actuator, HTTP Expert, WiFi Actuator, WiFi Expert. Labels: Measurement, Response, Analysis, Fault Observation, Observation, State, Root-cause Analysis. Example outcome: RC: AP, Location; Recovery: Change AP]
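To make the root-cause analysis step concrete, here is a minimal Python sketch that ranks components by the share of failed observations among all observations that depended on them. This is a simple ratio heuristic swapped in for illustration; it is not the Sherlock algorithm and not MnM's inference engine. The component names and the function `rank_root_causes` are hypothetical.

```python
from collections import defaultdict

# Hypothetical observations: (components the request depended on, did it fail?)
observations = [
    ({"AP-1", "Router-1", "WebSrv"}, True),
    ({"AP-1", "Router-1", "DNS"}, True),
    ({"AP-2", "Router-1", "WebSrv"}, False),
    ({"AP-2", "Router-1", "DNS"}, False),
]


def rank_root_causes(observations):
    """Score each component by failed / total observations that touched it."""
    failed = defaultdict(int)
    total = defaultdict(int)
    for components, is_failure in observations:
        for component in components:
            total[component] += 1
            if is_failure:
                failed[component] += 1
    scores = {c: failed[c] / total[c] for c in total}
    # Highest score first; ties broken by name for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_root_causes(observations)
```

In this toy data the successful requests through AP-2 lower the score of Router-1, so AP-1 ranks first. Without such positive observations the router and the access point would tie.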
Evaluation
• Controlled experiments
  – Verified accuracy of MnM diagnosis
• Two-week study on 27 user laptops and 10 servers
Location Profiling Techniques
• AP-based location, default
• Outlook calendar-based, if available
• Cluster similar looking WiFi signatures to identify unnamed locations, e.g., a coffee shop
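The clustering of WiFi signatures could look like the following Python sketch. It assumes a signature is the set of access point identifiers seen in one scan and groups scans greedily by Jaccard similarity. The similarity measure, the 0.5 threshold, and the names `jaccard` and `cluster_signatures` are assumptions; the slides do not say which method MnM uses.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_signatures(signatures, threshold=0.5):
    """Greedy clustering: join the first cluster whose representative is similar."""
    clusters = []  # each cluster is a list; its first element is the representative
    for sig in signatures:
        for cluster in clusters:
            if jaccard(sig, cluster[0]) >= threshold:
                cluster.append(sig)
                break
        else:
            clusters.append([sig])  # no similar cluster found: new location
    return clusters


# Hypothetical scans: each is the set of access points visible at that moment.
scans = [
    {"ap:a", "ap:b", "ap:c"},
    {"ap:a", "ap:b"},
    {"ap:x", "ap:y"},
    {"ap:a", "ap:b", "ap:c", "ap:d"},
]
clusters = cluster_signatures(scans)
```

Each resulting cluster stands for one unnamed location, such as the coffee shop in the bullet above.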
Calendar-based Location Profiles
Location Priors
Impact of Using Location Priors
Conclusion
• End-to-end performance diagnosis in mixed wired/wireless environments requires special considerations
  – The system needs to cope with constantly changing dependencies
  – Location needs to be a first-class component
• MnM is an extensible system architecture for diagnosing performance faults using dynamic dependency graphs
Backup
Accuracy Results
Target Root Cause | % the target Root Cause is first | Other Root Causes in top two | Reasons for other root causes
Location | 55 | Machine, Server, AP | Location error, real congestion at the server
AP | 100 | First-hop router | Few positive observations through the first-hop router
AP Handoff | 86 | Location, Machine, AP | Location error, AP failures
Server | 100 | Last-hop router | Few positive observations for the last-hop router
Simultaneous faults | 100 | AP, First-hop router | Few positive observations for the first-hop router