Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks
![Page 1: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/1.jpg)
Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks
Lili Qiu, Paramvir Bahl, Ananth Rao, and Lidong Zhou
Microsoft Research
Presented by Maitreya Natu
![Page 2: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/2.jpg)
Network Management
[Diagram labels: Faulty network, Root cause, Faults directory, Corrective measure, Healthy network]
![Page 3: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/3.jpg)
Tasks involved in Network Management
• Continuously monitoring the functioning of the network
• Collecting information about the nodes and the links
• Removing inconsistencies and noise from the reported information
• Analyzing the information
• Taking appropriate actions to improve network reliability and performance
![Page 4: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/4.jpg)
Challenges in wireless networks
• Dynamic and unpredictable topology
  • Link errors due to fluctuating environment conditions
  • Node mobility
• Limited capacity
  • Scarcity of resources
• Link attacks
![Page 5: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/5.jpg)
Proposed framework
Reproduce, inside a simulator, the real-world events that took place in the network
Use online trace-driven simulation to detect faults and analyze their root causes
![Page 6: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/6.jpg)
Network Management
[Diagram labels: Healthy network, Types of faults, Network model, Faults directory, Creating a network model]
![Page 7: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/7.jpg)
Network Management
[Diagram labels: Faulty network, Types of faults, Network model, Detected faults, Fault diagnosis, Faults directory]
![Page 8: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/8.jpg)
Network Management
[Diagram labels: Types of faults, Network model, What-if analysis, Detected faults, Faults directory, Corrective measures]
![Page 9: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/9.jpg)
Key issues
How to accurately reproduce, inside a simulator, what happened in the network
How to build fault diagnosis on top of a simulator to perform root cause analysis
![Page 10: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/10.jpg)
Accurate modeling
• Use real traces from the diagnosed network
  • Removes dependency on generic theoretical models
  • Captures nuances of the hardware, software, and environment of the particular network
• Collect good quality data
  • By developing a technique to effectively rule out erroneous data
![Page 11: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/11.jpg)
Fault diagnosis
Performance data emitted by the trace-driven simulation is used as the baseline
Any significant deviation from this baseline indicates a potential fault
The simulator selectively injects sets of suspected faults and searches for the set that best reproduces the observed performance
An efficient algorithm is designed to determine root causes
![Page 12: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/12.jpg)
System Overview
[Diagram labels: Topology changes, Routing update, Link RSS, Link load, Traffic simulator, Interference injection, Simulator, Faults directory, Link/node failure; Expected loss rate, throughput, noise; Loss rate, throughput, noise; Error (+/-)]
1. Receive Cleaned Data
2. Drive Simulation
3. Compute Expected Performance
4. Compare Expected & Average Performance
5. Discrepancy Found
6. Search for set of faults that result in best explanation
7. Report the cause of failure
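Steps 3 to 5 above can be sketched as a comparison of per-link metrics. This is a minimal illustration, assuming the metrics are held in dictionaries keyed by link; the function name `find_discrepancies`, the metric names, and the threshold value are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Metrics = Dict[str, float]  # e.g. {"loss_rate": 0.02}


def find_discrepancies(
    expected: Dict[str, Metrics],
    observed: Dict[str, Metrics],
    thresholds: Metrics,
) -> List[Tuple[str, str, float]]:
    """Return (link, metric, difference) for every metric whose observed
    value deviates from the simulated expectation by more than its threshold."""
    flagged = []
    for link, expected_metrics in expected.items():
        observed_metrics = observed.get(link, {})
        for metric, expected_value in expected_metrics.items():
            if metric not in observed_metrics or metric not in thresholds:
                continue  # nothing to compare against
            diff = observed_metrics[metric] - expected_value
            if abs(diff) > thresholds[metric]:
                flagged.append((link, metric, diff))
    return flagged


# Made-up numbers: the simulator expects low loss on both links,
# but link 1-2 shows much higher loss in the real network.
expected = {"1-2": {"loss_rate": 0.02}, "1-4": {"loss_rate": 0.03}}
observed = {"1-2": {"loss_rate": 0.25}, "1-4": {"loss_rate": 0.04}}
flagged = find_discrepancies(expected, observed, {"loss_rate": 0.10})
print(flagged)
```

Only links that end up in `flagged` would need to be passed on to the fault search in step 6.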
![Page 13: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/13.jpg)
Why Simulation Based Diagnosis?
• Gives much better insight into network behavior than any heuristic or theoretical technique
• Highly customizable and applicable to a large class of networks
• Ability to perform what-if analysis
  • Helps to foresee the consequences of a corrective action
• Recent advances in simulators have made it possible to use them for real-time analysis
![Page 14: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/14.jpg)
Accurate modeling
[Diagram labels: Healthy network, Types of faults, Network model, Faults directory]
![Page 15: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/15.jpg)
Current network models
Bayesian networks to map symptom-fault dependencies
Context-free grammars
Correlation matrix
![Page 16: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/16.jpg)
Can online simulations be used as the core tool?
![Page 18: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/18.jpg)
Building confidence in simulator accuracy
• Problem
  • Hard to accurately model the physical layer and the RF propagation
  • Traffic demands on the router are hard to predict
• Solution
  • "After the fact" simulation
  • Agents periodically report information about the link conditions and traffic patterns to the link simulators
![Page 19: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/19.jpg)
Simulations when the RF condition of the link is good
Modeling the overheads of the protocol stack, such as parity bits, MAC-layer back-off, IEEE 802.11 inter-frame spacing and ACK, and headers.
Modeling the contention from flows within the interference and communication ranges.
![Page 20: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/20.jpg)
Simulations with varying received signal strength
Throughput matches the simulator's estimate closely when signal quality is good
The simulator's estimate deviates from the real throughput when signal strength is poor
![Page 21: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/21.jpg)
Why do simulation results deviate when signal strength is poor?
• Lack of an accurate model of packet loss as a function of packet size, RSS, and ambient noise
  • Depends on the signal processing hardware and the RF antenna within the wireless cards
• Lack of an accurate model of auto-rate control
  • WLAN cards adjust their sending rate based on the transmission conditions
![Page 22: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/22.jpg)
How to model auto-rate control done by WLAN cards?
• Use trace-driven simulation
• When auto-rate is in use
  • Collect the rate at which the wireless card is operating and provide the reported rate to the simulator
• Otherwise
  • Data rate is known to the simulator
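The rule on this slide amounts to a two-way choice of which rate to feed the simulator. The following is a small sketch of that choice only; the function name `simulator_data_rate` and its parameters are hypothetical, and how the rate is actually read from the wireless card is not covered.

```python
from typing import Optional


def simulator_data_rate(
    auto_rate_enabled: bool,
    reported_rate_mbps: Optional[float],
    configured_rate_mbps: float,
) -> float:
    """Pick the data rate the simulated link should use."""
    if auto_rate_enabled:
        # With auto-rate, the card changes its rate on its own, so the
        # simulator must be driven by the rate reported in the trace.
        if reported_rate_mbps is None:
            raise ValueError("auto-rate is on but the trace has no reported rate")
        return reported_rate_mbps
    # With a fixed rate, the configured value is already known.
    return configured_rate_mbps


print(simulator_data_rate(True, 24.0, 54.0))
print(simulator_data_rate(False, None, 54.0))
```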
![Page 23: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/23.jpg)
How to model accurate packet loss as a function of packet size, RSS, and ambient noise?
• Use offline analysis
• Calibrate the wireless cards and create a database associating environmental factors with expected performance
  • E.g., a mapping from signal strength and noise to loss rate
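One possible shape for such a calibration database is a small lookup table with interpolation. The sketch below is an assumption-heavy illustration: it keys the table on the difference between signal strength and noise, ignores packet size, and uses invented table values; the names `CALIBRATION` and `expected_loss_rate` are hypothetical.

```python
import bisect
from typing import List, Tuple

# Hypothetical calibration table: (signal minus noise in dB, loss rate).
# The values are invented for illustration, not measurements from the paper.
CALIBRATION: List[Tuple[float, float]] = [
    (5.0, 0.60),
    (10.0, 0.30),
    (15.0, 0.10),
    (20.0, 0.02),
    (25.0, 0.00),
]


def expected_loss_rate(rss_dbm: float, noise_dbm: float) -> float:
    """Look up the expected loss rate by linear interpolation over the table."""
    snr = rss_dbm - noise_dbm
    snrs = [s for s, _ in CALIBRATION]
    if snr <= snrs[0]:
        return CALIBRATION[0][1]   # clamp below the calibrated range
    if snr >= snrs[-1]:
        return CALIBRATION[-1][1]  # clamp above the calibrated range
    hi = bisect.bisect_right(snrs, snr)
    (s0, l0), (s1, l1) = CALIBRATION[hi - 1], CALIBRATION[hi]
    return l0 + (l1 - l0) * (snr - s0) / (s1 - s0)


print(expected_loss_rate(-82.5, -95.0))
```

A real table would be built per card model, since the slide notes that loss behavior depends on the card's hardware.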
![Page 24: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/24.jpg)
Experiment to model the loss rates due to poor signal strength
• Collect another set of traces
  • Slowly send out packets
  • Place packet sniffers near both the sender and the receiver, and derive the loss rate from the packet-level trace
• Seed the wireless link in the simulator with a Bernoulli loss rate that matches the loss rate in the real traces
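The two steps of this experiment can be sketched as follows, assuming each sniffer trace reduces to a list of packet identifiers. The helper names `loss_rate_from_traces` and `bernoulli_link` are hypothetical, and the traces are synthetic.

```python
import random
from typing import Iterable


def loss_rate_from_traces(sent_ids: Iterable[int], received_ids: Iterable[int]) -> float:
    """Derive the loss rate from the sender-side and receiver-side sniffer traces."""
    sent = set(sent_ids)
    if not sent:
        raise ValueError("sender trace is empty")
    received = set(received_ids) & sent  # ignore packets the sender never logged
    return 1.0 - len(received) / len(sent)


def bernoulli_link(num_packets: int, loss_rate: float, rng: random.Random) -> int:
    """Count packets that survive a link dropping each packet independently
    with probability loss_rate (a Bernoulli loss model)."""
    return sum(1 for _ in range(num_packets) if rng.random() >= loss_rate)


# Synthetic trace: every fourth packet is missing at the receiver.
sent_trace = range(100)
received_trace = [i for i in range(100) if i % 4 != 0]
measured = loss_rate_from_traces(sent_trace, received_trace)

# Seed the simulated link with the measured rate.
delivered = bernoulli_link(10_000, measured, random.Random(0))
print(measured, delivered / 10_000)
```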
![Page 25: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/25.jpg)
Estimated and measured throughput when compensating for the loss rate due to poor signal strength
• Even though the match is not perfect, this is not expected to be a problem, because many routing protocols try to avoid the use of poor quality links
  • Poor quality links are used only when certain parts of the mesh network have poor connectivity to the rest of the network
  • In a well-engineered network, not many nodes depend on such bad links for routing
• Loss rate and measured throughput do not decrease monotonically with signal strength, due to the effect of auto-rate
![Page 26: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/26.jpg)
Stability of channel conditions
How rapidly do channel conditions change, and how often should a trace be collected?
![Page 27: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/27.jpg)
Temporal fluctuation in RSS
• Fluctuation magnitude is not significant
• Relative quality of signals across different numbers of walls remains stable
![Page 28: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/28.jpg)
Stability of channel conditions
How rapidly do channel conditions change, and how often should a trace be collected?
• When the environment is generally static, nodes may report only the average and standard deviation of the RSS to the manager every few minutes
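A node-side summary of this kind is small enough to show in full. The sketch below assumes RSS samples are collected as a list of dBm values over one reporting interval; the function name `summarize_rss` is hypothetical, and the choice of population standard deviation is an assumption.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def summarize_rss(samples_dbm: List[float]) -> Tuple[float, float]:
    """Reduce one reporting interval of RSS samples to (average, standard deviation)."""
    if not samples_dbm:
        raise ValueError("no RSS samples in this interval")
    return mean(samples_dbm), pstdev(samples_dbm)


average, deviation = summarize_rss([-70.0, -72.0, -68.0, -70.0])
print(average, deviation)
```

Sending two numbers per link every few minutes, rather than every sample, is what keeps the reporting overhead low.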
![Page 29: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/29.jpg)
Dealing with imperfect data
• By neighborhood monitoring
  • Each node reports performance and traffic statistics for its incoming and outgoing links
  • And for other links in its communication range
    • Possible when the node is in promiscuous mode
  • Thus multiple reports are sent for each link
• Redundant reports can be used to detect inconsistency
  • Find the minimum set of nodes that can explain the inconsistency in the reports
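One way to read the last bullet is as a covering problem: every pair of reporters that disagree about a link must include at least one suspect. The sketch below uses a greedy heuristic for that covering problem, which does not guarantee a minimum set; the heuristic, the tolerance value, and the names `inconsistent_pairs` and `suspect_nodes` are assumptions for illustration, not the paper's stated procedure.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def inconsistent_pairs(
    reports: Dict[str, Dict[Link, int]], tolerance: int
) -> List[Tuple[str, str]]:
    """Pairs of reporters whose packet counts for a shared link differ by more than tolerance."""
    pairs = []
    nodes = sorted(reports)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            shared = reports[a].keys() & reports[b].keys()
            if any(abs(reports[a][link] - reports[b][link]) > tolerance for link in shared):
                pairs.append((a, b))
    return pairs


def suspect_nodes(pairs: List[Tuple[str, str]]) -> Set[str]:
    """Greedily pick nodes until every inconsistent pair contains a suspect."""
    remaining = list(pairs)
    suspects: Set[str] = set()
    while remaining:
        counts = Counter(node for pair in remaining for node in pair)
        # Most inconsistencies first; ties broken by name for determinism.
        worst = min(counts, key=lambda node: (-counts[node], node))
        suspects.add(worst)
        remaining = [pair for pair in remaining if worst not in pair]
    return suspects


# Nodes A and B agree on link A-B; observer C reports a very different count.
reports = {
    "A": {("A", "B"): 100},
    "B": {("A", "B"): 100},
    "C": {("A", "B"): 40},
}
suspects = suspect_nodes(inconsistent_pairs(reports, tolerance=5))
print(suspects)
```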
![Page 30: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/30.jpg)
Summary
• How to accurately model the real behavior?
  • Solution: Use trace-based simulation
• Problem: Simulation results are good for strong signals but deviate for bad RF conditions
  • Need to model the auto-rate control
    • Use trace-driven data
  • Need to model the loss rate due to poor signal strength
    • Use offline analysis
• How often should a trace be collected?
  • Very little data (average and standard deviation of RSS), at fairly low time granularity, as channels are relatively stable
• How to deal with imperfect data?
  • By neighborhood monitoring
![Page 31: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/31.jpg)
Fault diagnosis
[Diagram labels: Faulty network, Types of faults, Network model, Detected faults, Faults directory]
![Page 32: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/32.jpg)
Current fault diagnosis approaches
• AI techniques
  • Rule-based systems
  • Neural networks
• Model traversing techniques
  • Dependency graphs
  • Causality graphs
  • Bayesian networks
![Page 33: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/33.jpg)
Fault Isolation and Diagnosis
Establish the expected performance in the simulation
Find difference between expected and observed performance
Search over the fault space to determine which set of faults can reproduce performance similar to what has been observed
![Page 34: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/34.jpg)
Collecting data from traces
• Trace data collection
  • Network topology
    • Each node reports its neighbor and routing tables
  • Traffic statistics
    • Each node maintains counters of traffic sent to and received from immediate neighbors
  • Physical medium
    • Each node reports signal strength of wireless links to neighbors
  • Network performance
    • Includes both link and end-to-end performance, which can be measured through loss rate, delay, and throughput
    • Focus is on link-level performance
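The four kinds of trace data listed above could be carried in a single per-node record. The following is a hypothetical layout, not the format used by the authors; the class `NodeReport`, its field names, and the helper `link_loss_rate` are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeReport:
    node_id: str
    neighbors: List[str] = field(default_factory=list)               # network topology
    routing_table: Dict[str, str] = field(default_factory=dict)      # destination -> next hop
    packets_sent: Dict[str, int] = field(default_factory=dict)       # neighbor -> count
    packets_received: Dict[str, int] = field(default_factory=dict)   # neighbor -> count
    rss_dbm: Dict[str, float] = field(default_factory=dict)          # neighbor -> signal strength


def link_loss_rate(sender: NodeReport, receiver: NodeReport) -> float:
    """Link-level loss rate from the sender's and the receiver's traffic counters."""
    sent = sender.packets_sent.get(receiver.node_id, 0)
    if sent == 0:
        raise ValueError("no traffic was sent on this link")
    received = receiver.packets_received.get(sender.node_id, 0)
    return 1.0 - received / sent


node1 = NodeReport("1", neighbors=["2"], packets_sent={"2": 200})
node2 = NodeReport("2", neighbors=["1"], packets_received={"1": 150})
loss = link_loss_rate(node1, node2)
print(loss)
```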
![Page 35: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/35.jpg)
Simulating the network performance
• Traffic load simulation
  • Link-based traffic simulation
  • Adjust application sending rate to match the observed link-level traffic counts
• Route simulation
  • Use actual routes taken by packets as input to the simulator
• Wireless signal
  • Use real measurement of signal strength
• Fault injection
  • Random packet dropping
  • External noise sources
  • MAC misbehavior
![Page 36: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/36.jpg)
Fault diagnosis algorithm
General approach
[Diagram: Network settings → Simulator → Expected performance]
[Diagram: Network settings + Faults set → Simulator → Observed performance]
How to find the faults set?
![Page 37: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/37.jpg)
How to search the faults efficiently?
• Different types of faults often change only one or a few metrics
  • E.g., random dropping only affects link loss rate
• Thus, use the metrics in which observed and expected performance differ significantly to guide the search
![Page 38: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/38.jpg)
Scenario where faults do not have strong interactions
Consider large deviation from expected performance as anomaly
Use decision tree to determine the type of fault
Fault type determines the metric to quantify performance difference
Locate faults by finding the set of nodes and links with large difference between expected and observed performance
![Page 39: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/39.jpg)
Scenario where faults have strong interactions
• Get the initial diagnosis set from the decision tree algorithm
• Iteratively refine the fault set
  • Adjust the magnitudes of faults in the fault set
    • Translate the difference in performance into a change in the faults' magnitudes
    • This maps the impact of a fault into its magnitude
  • Remove faults whose magnitude is too small
  • Add new faults that can explain large differences between the expected and observed performance
• Iterate until the change in the fault set is negligible
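The refinement loop above can be sketched against a toy simulator. Everything concrete here is an assumption made for illustration: faults are restricted to per-link packet dropping, the performance difference is added directly to the fault magnitude, and `toy_simulate`, `refine_faults`, and all thresholds are hypothetical stand-ins for the real simulator and the authors' mapping.

```python
from typing import Callable, Dict

Faults = Dict[str, float]       # link -> fault magnitude (here: drop probability)
Performance = Dict[str, float]  # link -> loss rate


def refine_faults(
    observed: Performance,
    simulate: Callable[[Faults], Performance],
    initial: Faults,
    min_magnitude: float = 0.01,
    add_threshold: float = 0.05,
    tolerance: float = 1e-3,
    max_iterations: int = 20,
) -> Faults:
    """Iteratively adjust, remove, and add faults until the fault set stabilizes."""
    faults = dict(initial)
    for _ in range(max_iterations):
        simulated = simulate(faults)
        largest_change = 0.0
        for link, observed_loss in observed.items():
            diff = observed_loss - simulated[link]
            old = faults.get(link, 0.0)
            if link in faults:
                new = old + diff   # translate the performance gap into a magnitude change
            elif diff > add_threshold:
                new = diff         # add a new fault to explain a large gap
            else:
                continue
            new = min(max(new, 0.0), 1.0)
            largest_change = max(largest_change, abs(new - old))
            if new < min_magnitude:
                faults.pop(link, None)  # drop faults that became negligible
            else:
                faults[link] = new
        if largest_change < tolerance:
            break
    return faults


# Toy simulator: each link has a small base loss, combined with any injected drop rate.
BASE_LOSS = {"1-2": 0.02, "1-4": 0.02, "2-3": 0.02}


def toy_simulate(faults: Faults) -> Performance:
    return {
        link: 1.0 - (1.0 - base) * (1.0 - faults.get(link, 0.0))
        for link, base in BASE_LOSS.items()
    }


observed = toy_simulate({"1-2": 0.3, "1-4": 0.2})  # the "real" network
initial = {"1-2": 0.1, "2-3": 0.05}                # imperfect initial diagnosis
result = refine_faults(observed, toy_simulate, initial)
print(result)
```

In this toy run the wrongly suspected link 2-3 is removed, the missed link 1-4 is added, and the magnitudes move toward the injected values.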
![Page 40: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/40.jpg)
Example scenario
[Diagram: example topology with nodes 1 to 5]
![Page 42: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/42.jpg)
Example scenario
[Diagram: example topology with nodes 1 to 5]
Observed performance
• Increased loss rate at 1-4 and 1-2
• No increase in the sending rate of 1-4, 1-2
• No increase in noise experienced by neighbors
Inference (decision tree)
• Increased sending rate? Y: Too low CW. N: check noise.
• Increased noise? Y: Noise. N: check loss.
• Increased loss? Y: Packet drop. N: Normal.
Result: Packet dropping at node 1
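The decision tree on this slide is a chain of three yes/no checks, which can be written directly as code. The sketch below assumes the threshold comparisons have already been made and are passed in as booleans; the names `Deviation` and `classify_fault` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Deviation:
    """Whether each metric is significantly higher in the real network than in simulation."""
    sending_rate_increased: bool
    noise_increased: bool
    loss_increased: bool


def classify_fault(deviation: Deviation) -> str:
    """Walk the decision tree from the slide: sending rate, then noise, then loss."""
    if deviation.sending_rate_increased:
        return "too low CW"   # contention window set too low (MAC misbehavior)
    if deviation.noise_increased:
        return "noise"
    if deviation.loss_increased:
        return "packet drop"
    return "normal"


# The example scenario: loss increased on links 1-4 and 1-2,
# with no increase in sending rate or in noise at the neighbors.
diagnosis = classify_fault(
    Deviation(sending_rate_increased=False, noise_increased=False, loss_increased=True)
)
print(diagnosis)
```

Because both affected links share node 1, the slide attributes the packet dropping to that node.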
![Page 43: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/43.jpg)
Accuracy of fault diagnosis
• Correctness of the model
  • Complete information
  • Consistent information
  • Timely information
• Correctness of the reported symptoms
  • Right size of the threshold to report a symptom
  • Difference in the behavior of faults
  • Timely reporting of symptoms
![Page 44: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/44.jpg)
System implementation
• Windows XP
  • Agents run on every wireless node and report the information collected on demand
  • Managers collect and analyze information
  • Collected information is cast into performance counters supported by Windows
  • The manager is connected to a backend simulator; collected information is converted to a script to drive the simulation
• Testbed:
  • Multihop wireless testbed built using IEEE 802.11a cards
  • A commercially available network sniffer called AiroPeek is used for data collection
  • Native 802.11 NICs provide a rich set of networking information
![Page 45: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/45.jpg)
Evaluation: Data collection overhead
[Figure panels: Management traffic overhead; Performance of FTP flow with and without data collection]
• No data cleaning: each link is reported only once
• With data cleaning: each link is reported by all observers for consistency check
• Overhead < 800 bits/s/node
• Data collection traffic has little effect
![Page 46: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/46.jpg)
Data cleaning effectiveness
Higher accuracy with denser networks
Higher accuracy with client-server traffic
Coverage greater than 80% in all cases
Higher accuracy with grid topology
Higher coverage when using history
![Page 47: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/47.jpg)
Evaluation: Fault diagnosis
Detecting random dropping
• Symptom: Significant difference in loss rates on links
• Less than 20% of faulty links are left undetected
• No-effect faults are faulty links sending fewer than the threshold (250) packets of data
Detecting external noise
• Symptom: Significant difference in noise level at nodes
• Noise sources are correctly identified with at most one or two false positives
• Inference error in the magnitudes of the noise is within 4%
![Page 48: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/48.jpg)
Evaluation: Fault diagnosis
Detecting MAC misbehavior / Detecting combinations of all
• Symptom: Significant discrepancy in throughput on links
• Coverage is mostly around 80% or higher
• False positives within 2
![Page 49: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/49.jpg)
what-if analysis
[Diagram labels: Types of faults, Network model, Detected faults, Faults directory, Corrective measures]
![Page 50: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/50.jpg)
[Diagram labels: What-if analysis, Diagnosis, Topology, Corrective measures]
![Page 51: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/51.jpg)
Limitations
• Limited by the accuracy of the simulator
• Time to detect faults is acceptable for long-term faults but not for transient faults
• The choice of traces to drive the simulation has important implications
• Focus has only been on faults resulting in different behavior
![Page 52: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/52.jpg)
Conclusion
• Used trace data for modeling the network
• Data collection techniques are presented to collect network information and detect a deviation from the expected performance
• A fault diagnosis algorithm is proposed to detect the root causes of failure
• A scheme for what-if analysis is proposed to evaluate alternative network configurations for efficient network operation
![Page 53: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/53.jpg)
Future work
• Validation on a large test-bed
• Performance analysis in the presence of mobility
• Detecting malicious attacks
• Diagnosis in the presence of incomplete network information
• Investigating the potential of what-if analysis more deeply
![Page 54: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/54.jpg)
References
L. Qiu, P. Bahl, A. Rao, L. Zhou, Fault Detection, Isolation, and Diagnosis in Multihop Wireless Networks, Microsoft Research Technical Report MSR-TR-2004-11, Dec. 2003
M. Steinder, A. Sethi, A survey of fault localization techniques in computer networks, Technical Report 2001, CIS Dept., Univ of Delaware, Feb 2001
M. Steinder, Probabilistic inference for diagnosing service failures in communication systems, PhD thesis, Univ. of Delaware, 2003
![Page 55: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812c90550346895d913e35/html5/thumbnails/55.jpg)
Questions
What is the proposed solution to model the throughput when the signal strength is poor?
In Table 2, the simulated throughput decreases monotonically with the loss rate while the measured throughput does not. Why?
What could cause false positives in the fault diagnosis results? When can the false positive ratio increase?
http://www.cis.udel.edu/~natu/861/861.html