Automatically Classifying Benign and Harmful Data Races Using Replay Analysis
S. Narayanasamy, Z. Wang, J. Tigani, A. Edwards, B. Calder
UCSD and Microsoft
PLDI 2007
Data races are hard to debug
◦ Difficult to detect
◦ Even more difficult to reproduce
Data race detectors help with detection
◦ LockSet, Happens-Before, and Atomicity Violation
But they tend to over-report
◦ Up to 90% false alarms
◦ Especially with LockSet
We need a tool that detects and reliably classifies all harmful data races
Motivation
Offline Dynamic Happens-Before Data Race Detection
◦ Step 1: Trace Capturing
◦ Step 2: Offline Happens-Before Analysis
◦ Step 3: Replay Critical Segments
◦ Step 4: Auto Classify harmful vs. benign races
Algorithm Overview
iDNA captures the execution of an application
It simply records the initial state,
◦ Registers and PC
load values,
◦ Only those that are absolutely needed
◦ 1st load after a store, DMA, etc.
and a global clock (sequencers)
◦ Inserted in the thread's replay log for synchronization events and system calls
1. Trace Capturing & Replaying
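The idea of logging only the load values that are needed can be sketched as follows. This is a minimal illustration, not iDNA's actual implementation: it assumes a per-thread shadow copy of memory, and the names `record` and `replay` are hypothetical. A load is logged only when the replayer could not predict its value from the thread's own earlier loads and stores.

```python
_MISSING = object()  # sentinel: address never seen by this thread


def record(events):
    """Return the load log for one thread.

    events: list of ("load", addr, value) or ("store", addr, value).
    A load is logged only if the shadow memory does not already hold its value,
    i.e. it is the first access to the address or something else (another
    thread, DMA, a system call) changed the location in between.
    """
    shadow = {}
    log = []
    for index, (kind, addr, value) in enumerate(events):
        if kind == "load" and shadow.get(addr, _MISSING) != value:
            log.append((index, value))
        shadow[addr] = value  # both loads and stores update the shadow copy
    return log


def replay(ops, log):
    """Reconstruct every load value from the instruction stream plus the log.

    ops: list of ("load", addr) or ("store", addr, value).
    """
    shadow = {}
    logged = dict(log)
    loads = []
    for index, op in enumerate(ops):
        if op[0] == "store":
            shadow[op[1]] = op[2]
        else:
            if index in logged:  # value was unpredictable, take it from the log
                shadow[op[1]] = logged[index]
            loads.append(shadow[op[1]])
    return loads


# The last load sees 9 because another thread wrote to 0x10 in the meantime.
events = [("load", 0x10, 5), ("load", 0x10, 5), ("store", 0x10, 7),
          ("load", 0x10, 7), ("load", 0x10, 9)]
log = record(events)
ops = [e[:2] if e[0] == "load" else e for e in events]
recovered = replay(ops, log)
```

Only two of the four loads end up in the log here, which is the kind of saving the slide attributes to recording load values selectively.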
2. Offline Happens-Before Analysis
Good old Happens-Before
◦ Two conflicting accesses
◦ At least one write
◦ Not ordered
Detects only the data races that happened
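The race condition above can be sketched as a small predicate. This is an illustration under stated assumptions, not the paper's code: the `Access` record and its fields are hypothetical, and ordering is approximated by the global sequencer values that bracket each access in its thread's log (two accesses count as ordered only when one bracket ends before the other begins).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    thread: int
    addr: int
    is_write: bool
    region_start: int  # last sequencer logged by this thread before the access
    region_end: int    # next sequencer logged by this thread after the access


def ordered(a, b):
    """True if the sequencers prove that one access happened before the other."""
    return a.region_end < b.region_start or b.region_end < a.region_start


def is_race(a, b):
    """Conflicting accesses: different threads, same address, at least one write, not ordered."""
    return (a.thread != b.thread
            and a.addr == b.addr
            and (a.is_write or b.is_write)
            and not ordered(a, b))


def find_races(accesses):
    return [(a, b) for a, b in combinations(accesses, 2) if is_race(a, b)]


w = Access(thread=1, addr=0x100, is_write=True, region_start=3, region_end=7)
r_overlap = Access(thread=2, addr=0x100, is_write=False, region_start=5, region_end=9)
r_later = Access(thread=2, addr=0x100, is_write=False, region_start=8, region_end=12)
other = Access(thread=2, addr=0x200, is_write=True, region_start=5, region_end=9)
races = find_races([w, r_overlap, r_later, other])
```

Only the pair `(w, r_overlap)` is reported: `r_later` is ordered after the write, and `other` touches a different address. Because the check runs on a recorded trace, it can only report races between accesses that actually appear in that trace.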
When a data race is detected, replay the affected segments twice
◦ 1st with the actual order, given by the load values
◦ 2nd with the racing accesses reversed
Store the replay result
◦ No-State-Change: if all live-outs are the same
◦ State-Change: if at least 1 live-out changed
◦ Replay Failure: if disaster is encountered (load from a null or unencountered address, branch to someplace else)
3. Replay Critical Segments
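The two replays and the three outcomes can be sketched as below. This is a simplified model, not the replayer from the paper: each thread's segment is a hypothetical Python function over a memory dictionary, whole segments are swapped rather than individual racing instructions, the entire memory stands in for the live-outs, and a `KeyError` stands in for a load from an unencountered address.

```python
class ReplayFailure(Exception):
    """The alternate order touched state that the recorded trace never saw."""


def run_segments(segments, live_in):
    memory = dict(live_in)  # every replay starts from the same live-in state
    for segment in segments:
        try:
            segment(memory)
        except KeyError as exc:
            raise ReplayFailure(str(exc)) from exc
    return memory


def replay_instance(first, second, live_in):
    """Replay in the recorded order and in the reversed order, then compare."""
    original = run_segments([first, second], live_in)
    try:
        alternate = run_segments([second, first], live_in)
    except ReplayFailure:
        return "Replay Failure"
    return "No-State-Change" if alternate == original else "State-Change"


def set_flag(m):
    m["flag"] = 1


def increment(m):
    m["x"] = m["x"] + 1


def reset(m):
    m["x"] = 0


def publish(m):
    m["ptr"] = "buf"


def use(m):
    m[m["ptr"]] = m[m["ptr"]] + 1  # follows the pointer written by publish


redundant = replay_instance(set_flag, set_flag, {"flag": 0})
lost_update = replay_instance(increment, reset, {"x": 5})
bad_pointer = replay_instance(publish, use, {"ptr": None, "buf": 0})
```

In this toy model the redundant write yields No-State-Change, the increment/reset pair yields State-Change, and dereferencing the pointer before it is published yields Replay Failure.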
3. Replay Critical Segments cont.
Replay Failure: treated as a Potentially Harmful Data Race
Repeat step 3 for each instance of a data race
Potentially Benign Data Race
◦ Every replay results in No-State-Change
Potentially Harmful Data Race
◦ ≥1 replay results in State-Change or Replay Failure
State-Change shows that something would be different if the execution had taken the other order
Replay Failure indicates that the program changed so much that the other state cannot be simulated
◦ Concrete proof that something definitely changed
◦ Easier for programmers to accept
4. Automatic Classification
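The classification rule above reduces to a single check over the replay results of all instances of a race. A minimal sketch, with a hypothetical function name and the outcome labels taken from the slides:

```python
def classify_race(instance_results):
    """Classify one unique race from the replay outcome of each of its instances."""
    if not instance_results:
        raise ValueError("a race needs at least one replayed instance")
    if all(result == "No-State-Change" for result in instance_results):
        return "Potentially Benign"
    # At least one State-Change or Replay Failure is enough to flag the race.
    return "Potentially Harmful"


benign = classify_race(["No-State-Change"] * 4)
harmful = classify_race(["No-State-Change", "State-Change", "No-State-Change"])
failed = classify_race(["Replay Failure"])
```

The rule is deliberately conservative: a single differing instance is enough to report the race to the programmer.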
18 different executions of various services in Windows Vista and Internet Explorer
Happens-Before returns 16,642 data races
◦ 68 unique
Trace capture
◦ 0.8 bits per instruction: 96 MB per 1,000,000,000 instructions, with only 1st loads and synchronizers captured
◦ 0.3 bits per instruction if compressed with zip
Evaluation
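As a sanity check on the log-size figures, the conversion from bits per instruction to megabytes can be worked out directly. The helper name is hypothetical, and the rates used are the rounded values from the slide, so the result only approximates the quoted 96 MB.

```python
INSTRUCTIONS = 1_000_000_000


def log_size_mib(bits_per_instruction, instructions=INSTRUCTIONS):
    """Trace size in MiB (2**20 bytes) for a given logging rate."""
    return bits_per_instruction * instructions / 8 / 2**20


raw_mib = log_size_mib(0.8)     # roughly 95 MiB with the rounded 0.8 rate
zipped_mib = log_size_mib(0.3)  # roughly 36 MiB with the rounded 0.3 rate
```

The small gap between this estimate and 96 MB is presumably due to the rate being rounded to one decimal on the slide.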
Results for Internet Explorer
◦ P4 Xeon 2.2 GHz, 1 GB of RAM
Start adding…
◦ 6x for capturing
◦ 10x for replaying (unnecessary)
◦ 45x for offline Happens-Before data race detection
◦ 280x for replay analysis
2,196 dynamic data races
Slowdowns
                   Potentially Benign            Potentially Harmful
                   Real Benign   Real Harmful    Real Benign   Real Harmful    Total
No-State-Change    32            0               -             -               32
State-Change       -             -               15            2               17
Replay Failure     -             -               14            5               19
Total              32            0               29            7               68
Data Race Classifications
Potentially Benign / Potentially Harmful: automatically classified
Real Benign / Real Harmful: manually classified
-: impossible state
All harmful races identified correctly: 0 false negatives
About half of the benign races identified correctly
The other half are still reported as potentially harmful
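The two takeaways can be recomputed from the classification counts. The dictionary layout and variable names below are illustrative; the numbers are the ones reported in the table.

```python
# (replay outcome, manual classification) -> number of unique races
counts = {
    ("No-State-Change", "benign"): 32, ("No-State-Change", "harmful"): 0,
    ("State-Change", "benign"): 15, ("State-Change", "harmful"): 2,
    ("Replay Failure", "benign"): 14, ("Replay Failure", "harmful"): 5,
}


def flagged(outcome):
    """A race is reported as potentially harmful unless every replay matched."""
    return outcome != "No-State-Change"


real_harmful = sum(n for (o, truth), n in counts.items() if truth == "harmful")
harmful_flagged = sum(n for (o, truth), n in counts.items()
                      if truth == "harmful" and flagged(o))
false_negatives = real_harmful - harmful_flagged

real_benign = sum(n for (o, truth), n in counts.items() if truth == "benign")
benign_filtered = sum(n for (o, truth), n in counts.items()
                      if truth == "benign" and not flagged(o))
benign_filtered_fraction = benign_filtered / real_benign
```

All 7 harmful races are flagged, and 32 of the 61 benign races (about 52%) are filtered out automatically.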
32 real benign races classified as such
◦ Every instance must return No-State-Change
◦ The more instances, the more confidence in the classification
True Negatives
7 Real bugs, correctly identified
At least 1 State-Change or Replay Failure required
True Positives
Dangerous Zone
29 benign races incorrectly classified as harmful
◦ Approximate computation (23/29): statistics etc.
◦ Replayer limitation (6/29): at least 1 instance caused a replay failure, but the final outcome is the same
False Positives
User Constructed
◦ Garbage collector does not use locks
Double Checks
◦ if (a) { lock(…); if (a) {…} }
Both Values Valid
◦ Use cache? High perf?
Redundant Writes
◦ Rewrite the same value
Disjoint bit manipulation
◦ Modify different bits in the same variable

Category                              # Races
User Constructed Synchronization      8
Double Checks                         3
Both Values Valid                     5
Redundant Writes                      13
Disjoint bit manipulation             9
Approx. Computation                   23
False Positives (cont.)
23 false positives that were not caused by replay failure
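The Double Checks pattern, `if (a) { lock(…); if (a) {…} }`, can be illustrated as follows. This is a generic sketch of the idiom, not code from the evaluated programs; the names `needs_init`, `init_count`, and `ensure_initialized` are made up. The first check reads the flag without holding the lock, so a detector reports it as racing with the write, but the second check under the lock keeps the outcome the same in either order.

```python
import threading

_lock = threading.Lock()
needs_init = True
init_count = 0


def ensure_initialized():
    global needs_init, init_count
    if needs_init:              # unsynchronized read: the reported race
        with _lock:
            if needs_init:      # re-check under the lock decides the outcome
                init_count += 1
                needs_init = False


threads = [threading.Thread(target=ensure_initialized) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread wins, the initialization body runs exactly once, which is why such a race would replay as No-State-Change.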
Interesting approach to identifying benign races
It would be interesting to apply it to LockSet
◦ LockSet has far more false positives
◦ But it can detect bugs that did not happen in production runs
A grand total for the overhead is missing
Conclusion
Questions?
Thank You!!!