Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof....
-
Upload
warren-phillips -
Category
Documents
-
view
220 -
download
0
Transcript of Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof....
![Page 1: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/1.jpg)
Hardware Mechanisms for Distributed Dynamic
Software Analysis
Joseph L. Greathouse
Advisor: Prof. Todd Austin
May 10, 2012
![Page 2: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/2.jpg)
NIST: Software errors cost U.S. ~$60 billion/year
FBI: Security Issues cost U.S. $67 billion/year >⅓ from viruses, network intrusion, etc.
Software Errors Abound
2
![Page 3: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/3.jpg)
Example of a Modern Bug
3
Thread 2mylen=large
Thread 1mylen=small
ptr∅
Nov. 2010 OpenSSL Security FlawNov. 2010 OpenSSL Security Flaw
if(ptr == NULL) { len=thread_local->mylen; ptr=malloc(len); memcpy(ptr, data, len);}
![Page 4: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/4.jpg)
Example of a Modern Bug
4
if(ptr==NULL)if(ptr==NULL)
if(ptr==NULL)if(ptr==NULL)
memcpy(ptr, data2, len2)memcpy(ptr, data2, len2)
ptrLEAKED
Thread 2mylen=large
Thread 1mylen=small
∅
len2=thread_local->mylen;len2=thread_local->mylen;
ptr=malloc(len2);ptr=malloc(len2);
len1=thread_local->mylen;len1=thread_local->mylen;
ptr=malloc(len1);ptr=malloc(len1);
memcpy(ptr, data1, len1)memcpy(ptr, data1, len1)
![Page 5: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/5.jpg)
In spite of proposed hardware solutions
Hardware Plays a Role in this Problem
5
Hardware Data
Race RecordingDeterministic Execution/Replay
Atomicity Violation
DetectorsBug-Free Memory Models
Bulk Memory Commits
TRANSACTIONALMEMORY
![Page 6: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/6.jpg)
Taint Analysis
Dynamic Bounds Checking
Data Race Detection(e.g. Inspector XE)
Memory Checking(e.g. MemCheck)
Dynamic Software Analyses
6
2-200x2-200x
2-80x2-80x5-50x5-50x
2-300x2-300x
Analyze the program as it runs+ Find errors on any executed path– LARGE overheads, only test one path at a time
![Page 7: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/7.jpg)
Goals of this Thesis Allow high quality dynamic software analyses
Find difficult bugs that weaker analyses miss
Distribute the tests to large populations Must be low overhead or users will get angry
Sampling + Hardware to accomplished this Each user only tests a small part of the program Each test should be helped by hardware
7
![Page 8: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/8.jpg)
Meeting These Goals - Thesis Overview
8
Allow high quality dynamic software analyses
Allow high quality dynamic software analyses
Dat
aflo
wA
naly
sis Dataflow
Analysis
Data Race Detection
Software Support Hardware Support
Dat
a R
ace
Det
ectio
n
![Page 9: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/9.jpg)
Meeting These Goals - Thesis Overview
9
Dataflow Analysis Sampling (CGO’11)
Dataflow Analysis Sampling (CGO’11)
Dataflow Analysis Sampling
(MICRO’08)
Dataflow Analysis Sampling
(MICRO’08)Dat
aflo
wA
naly
sis
Dat
a R
ace
Det
ectio
n
Software Support Hardware Support
Unlimited Watchpoint
System (ASPLOS’12)
Unlimited Watchpoint
System (ASPLOS’12)
Hardware-Assisted Demand-DrivenRace Detection (ISCA’11)
Hardware-Assisted Demand-DrivenRace Detection (ISCA’11)
Distribute the tests + Sample the analyses
Distribute the tests + Sample the analyses
Tests must below overhead
Tests must below overhead
![Page 10: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/10.jpg)
Outline
Problem Statement
Distributed Dynamic Dataflow Analysis
Demand-Driven Data Race Detection
Unlimited Watchpoints
10
![Page 11: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/11.jpg)
Outline
Problem Statement
Distributed Dynamic Dataflow Analysis
Demand-Driven Data Race Detection
Unlimited Watchpoints
11
SWDataflowSampling
SWDataflowSampling
HW Dataflow Sampling
HW Dataflow SamplingWatc
hpoints
Watch
points
Demand-Driven Race DetectionDemand-Driven Race Detection
![Page 12: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/12.jpg)
Distributed Dynamic Dataflow Analysis
12
Split analysis across large populations Observe more runtime states Report problems developer never thought to test
Instrumented Program Potential
problems
![Page 13: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/13.jpg)
Taint Analysis(e.g.TaintCheck)
Dynamic Bounds Checking
Data Race Detection(e.g. Thread Analyzer)
Memory Checking(e.g. MemCheck)
The Problem: OVERHEADS
13
2-200x2-200x
2-80x2-80x5-50x5-50x
2-300x2-300x
Analyze the program as it runs+ System state, find errors on any executed path– LARGE runtime overheads, only test one path
![Page 14: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/14.jpg)
Current Options Limited
14
CompleteAnalysis
CompleteAnalysis
NoAnalysis
NoAnalysis
![Page 15: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/15.jpg)
Solution: Sampling
15
Lower overheads by skipping some analyses
CompleteAnalysis
CompleteAnalysis
NoAnalysis
NoAnalysis
![Page 16: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/16.jpg)
Fe
w U
sers
Ma
ny
Use
rs
Lower overheads mean more users
CompleteAnalysis
CompleteAnalysis
NoAnalysis
NoAnalysis
Sampling Allows Distribution
16
Developers
Beta Testers
End Users
Many users testing at little overhead see more errors than
one user at high overhead.
Many users testing at little overhead see more errors than
one user at high overhead.
![Page 17: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/17.jpg)
a += ya += y z = y * 75z = y * 75
y = x * 1024y = x * 1024 w = x + 42w = x + 42 Check wCheck w
Example Dynamic Dataflow Analysis
17
validate(x)validate(x)x = read_input()x = read_input() Clear
a += ya += y z = y * 75z = y * 75
y = x * 1024y = x * 1024
x = read_input()x = read_input()
Propagate
Associate
InputInput
Check aCheck a
Check zCheck z
Data
Meta-data
![Page 18: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/18.jpg)
Sampling must be aware of meta-data
Remove meta-data from skipped dataflows
Sampling Dataflows
18
![Page 19: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/19.jpg)
Dataflow Sampling
19
Sampling ToolSampling Tool
Analysis ToolAnalysis ToolApplicationApplication Analysis ToolAnalysis Tool
Meta-Data DetectionMeta-Data DetectionMeta-dataMeta-data
Clear meta-dataClear meta-dataOH ThresholdOH Threshold
![Page 20: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/20.jpg)
Finding Meta-Data
20
No additional overhead when no meta-data Needs hardware support
Take a fault when touching shadowed data Solution: Virtual Memory Watchpoints
V→P V→PFAULT
![Page 21: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/21.jpg)
Prototype Setup Xen+QEMU Taint analysis sampling system
Network packets untrusted
Performance Tests – Network Throughput Example: ssh_receive
Sampling Accuracy Tests Real-world Security Exploits
21
Xen HypervisorXen Hypervisor
OS and ApplicationsOS and Applications
AppApp AppApp AppApp…
LinuxLinux
ShadowPage Table
ShadowPage Table
Admin VMAdmin VM
Taint Analysis QEMU
Taint Analysis QEMU
Net StackNet
Stack OHMOHM
![Page 22: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/22.jpg)
ssh_receive
Performance of Dataflow Sampling
22
Throughput with no analysis
![Page 23: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/23.jpg)
Accuracy with Background Tasks ssh_receive running in background
23
![Page 24: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/24.jpg)
Outline
Problem Statement
Distributed Dynamic Dataflow Analysis
Demand-Driven Data Race Detection
Unlimited Watchpoints
24
SWDataflow Sampling
SWDataflow Sampling
HW Dataflow Sampling
HW Dataflow SamplingWatc
hpoints
Watchpoint
s
Demand-Driven Race DetectionDemand-Driven Race Detection
![Page 25: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/25.jpg)
Dynamic Data Race Detection
Add checks around every memory access
Find inter-thread sharing
Synchronization between write-shared accesses? No? Data race.
25
![Page 26: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/26.jpg)
SW Race Detection is Slow
26
Phoenix PARSEC
![Page 27: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/27.jpg)
Inter-thread Sharing is What’s Important
27
if(ptr==NULL)if(ptr==NULL)
if(ptr==NULL)if(ptr==NULL)
len1=thread_local->mylen;len1=thread_local->mylen;
ptr=malloc(len1);ptr=malloc(len1);
memcpy(ptr, data1, len1)memcpy(ptr, data1, len1)
len2=thread_local->mylen;len2=thread_local->mylen;
ptr=malloc(len2);ptr=malloc(len2);
memcpy(ptr, data2, len2)memcpy(ptr, data2, len2)
Thread-local dataNO SHARING
Shared dataNO INTER-THREAD SHARING EVENTS
![Page 28: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/28.jpg)
Very Little Dynamic Sharing
28
Phoenix PARSEC
![Page 29: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/29.jpg)
Run the Analysis On Demand
29
Multi-threadedApplication
Multi-threadedApplication
SoftwareRace Detector
SoftwareRace Detector
Local Access
Inter-thread sharing
Inter-thread sharing
Inter-thread Sharing MonitorInter-thread Sharing Monitor
![Page 30: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/30.jpg)
Finding Inter-thread Sharing Virtual Memory Watchpoints?
– ~100% of accesses cause page faults
Granularity Gap Per-process not per-thread
30
FAULTFAULT
Inter-Thread Sharing
![Page 31: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/31.jpg)
Hardware Sharing Detector HITM in Cache Memory: W→R Data Sharing
Hardware Performance Counters
31
Read YRead YSS
IISS
IIHITM
Core 1 Core 2
PipelinePipeline
CacheCache
00
00
00
00
Perf. Ctrs
11
11-1-1 FAULT
Write Y=5Write Y=5 MMY=5Y=5
![Page 32: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/32.jpg)
Potential Accuracy & Perf. Problems Limitations of Performance Counters
Intel HITM only finds W→R Data Sharing
Limitations of Cache Events SMT sharing can’t be counted Cache eviction causes missed events
Events go through the kernel
32
![Page 33: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/33.jpg)
On-Demand Analysis on Real HW
33
Execute Instruction
SW Race Detection
Enable Analysis
Disable Analysis
HITMInterrupt?
HITMInterrupt?
Sharing Recently?
AnalysisEnabled?
NO
NO
NOYES
YES
YES
> 97%
< 3%
![Page 34: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/34.jpg)
Performance Increases
34
Phoenix PARSEC
51x51x
Accuracy vs. Continuous Analysis:
97%
Accuracy vs. Continuous Analysis:
97%
![Page 35: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/35.jpg)
Outline
Problem Statement
Distributed Dynamic Dataflow Analysis
Demand-Driven Data Race Detection
Unlimited Watchpoints
35
SWDataflow Sampling
SWDataflow Sampling
HW Dataflow Sampling
HW Dataflow SamplingWatc
hpoints
Watchpoint
s
Demand-Driven Race DetectionDemand-Driven Race Detection
![Page 36: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/36.jpg)
Bounds Checking
Watchpoints Work for Many Analyses
36
Taint Analysis
Data Race Detection
Deterministic Execution
Transactional Memory
Speculative Parallelization
![Page 37: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/37.jpg)
Desired Watchpoint Capabilities Large Number
Store in memory Cache on chip
Fine-grained Watch full VA
Per Thread Cached per HW thread
Ranges Range Cache
37
V W X YZ???
WP Fault False Fault
False Faults
![Page 38: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/38.jpg)
Range Cache
38
0x0 0xffff_ffff Not Watched
Start Address End Address Watchpoint?
1
0
0
Valid
Set Addresses 0x5 – 0x2000R-Watched
Set Addresses 0x5 – 0x2000R-Watched
0x4
0x20000x5 R Watched 1
0x2001 0xffff_ffff Not Watched 1
Load Address 0x400
Load Address 0x400
≤ 0x400? ≥ 0x400?
R Watched
WPInterrupt
![Page 39: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/39.jpg)
Memory
Watchpoint System Design
39
Store Ranges in Main Memory Per-Thread Ranges, Per-Core Range Cache Software Handler on RC miss or overflow Write-back RC works as a write filter Precise, user-level watchpoint faults
T1 Memory T2 Memory Core 1 Core 2
WP Changes
![Page 40: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/40.jpg)
Experimental Evaluation Setup
Trace-based timing simulator using Pin
Taint analysis on SPEC INT2000 Race Detection on Phoenix and PARSEC
Comparing only shadow value checks
40
![Page 41: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/41.jpg)
Watchpoint-Based Taint Analysis
41
19x10x 30x 206x 423x 23x1429x
128 entry RC –or– 64 entry RC + 2KB Bitmap
20%Slowdown
28x
RC+Bitmap
![Page 42: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/42.jpg)
Watchpoint-Based Data Race Detection
42
+10% +20%
RC+Bitmap
![Page 43: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/43.jpg)
Future Directions Dataflow Tests find bugs on executed code
What about code that is never executed?
Sampling + Demand-Driven Race Detection Good synergy between the two, like taint analysis
Further watchpoint hardware studies: Clear microarchitectural analysis More software systems, different algorithms
43
![Page 44: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/44.jpg)
SoftwareDataflow Analysis Sampling
SoftwareDataflow Analysis Sampling
Conclusions Sampling allows distributed dataflow analysis
Existing hardware can speed up race detection
Watchpoint hardware useful everywhere
44
HardwareDataflow Analysis Sampling
HardwareDataflow Analysis Sampling
Unlimited Watchpoint System
Unlimited Watchpoint System
Hardware-Assisted Demand-DrivenData Race Detection
Hardware-Assisted Demand-DrivenData Race Detection
Distributed DynamicSoftware Analysis
Distributed DynamicSoftware Analysis
![Page 45: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/45.jpg)
45
Thank You
![Page 46: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/46.jpg)
BACKUP SLIDES
46
![Page 47: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/47.jpg)
Finding Errors Brute Force
Code review, fuzz testing,whitehat/grayhat hackers
– Time-consuming, difficult
Static Analysis Automatically analyze source,
formal reasoning, compiler checks– Intractable, requires expert input,
no system state
47
![Page 48: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/48.jpg)
Dynamic Dataflow Analysis
Associate meta-data with program values
Propagate/Clear meta-data while executing
Check meta-data for safety & correctness
Forms dataflows of meta/shadow information
48
![Page 49: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/49.jpg)
Demand-Driven Dataflow Analysis
49
Only Analyze Shadowed Data
NativeApplication
NativeApplication
InstrumentedApplication
InstrumentedApplication
Meta-Data DetectionMeta-Data Detection
Non-Shadowed
Data
Shadowed Data
Shadowed Data
![Page 50: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/50.jpg)
Results by Ho et al. lmbench Best Case Results:
Results when everything is tainted:
50
System Slowdown
Taint Analysis 101.7x
On-Demand Taint Analysis 1.98x
![Page 51: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/51.jpg)
CompleteAnalysis
CompleteAnalysis
NoAnalysis
NoAnalysis
Sampling Allows Distribution
51
Developer
Beta TestersEnd Users
Many users testing at little overhead see more errors than
one user at high overhead.
Many users testing at little overhead see more errors than
one user at high overhead.
![Page 52: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/52.jpg)
a += ya += y z = y * 75z = y * 75
y = x * 1024y = x * 1024 w = x + 42w = x + 42
Cannot Naïvely Sample Code
52
Validate(x)Validate(x)x = read_input()x = read_input()
a += ya += y
y = x * 1024y = x * 1024
False PositiveFalse Positive
False NegativeFalse Negative
w = x + 42w = x + 42
validate(x)validate(x)x = read_input()x = read_input() Skip Instr.
InputInput
Check wCheck w
Check zCheck zSkip Instr.
Check aCheck a
![Page 53: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/53.jpg)
a += ya += y z = y * 75z = y * 75
y = x * 1024y = x * 1024 w = x + 42w = x + 42
Dataflow Sampling Example
53
validate(x)validate(x)x = read_input()x = read_input()
a += ya += y
y = x * 1024y = x * 1024
False NegativeFalse Negative
x = read_input()x = read_input()Skip Dataflow
InputInput
Check wCheck w
Check zCheck z
Skip Dataflow
Check aCheck a
![Page 54: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/54.jpg)
Benchmarks Performance – Network Throughput
Example: ssh_receive Accuracy of Sampling Analysis
Real-world Security Exploits
54
Name Error Description
Apache Stack overflow in Apache Tomcat JK Connector
Eggdrop Stack overflow in Eggdrop IRC bot
Lynx Stack overflow in Lynx web browser
ProFTPD Heap smashing attack on ProFTPD Server
Squid Heap smashing attack on Squid proxy server
![Page 55: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/55.jpg)
netcat_receive
Performance of Dataflow Sampling (2)
55
Throughput with no analysis
![Page 56: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/56.jpg)
ssh_transmit
Performance of Dataflow Sampling (3)
56
Throughput with no analysis
![Page 57: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/57.jpg)
Accuracy at Very Low Overhead Max time in analysis: 1% every 10 seconds Always stop analysis after threshold
Lowest probability of detecting exploits
57
Name Chance of Detecting Exploit
Apache 100%
Eggdrop 100%
Lynx 100%
ProFTPD 100%
Squid 100%
![Page 58: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/58.jpg)
netcat_receive running with benchmark
Accuracy with Background Tasks
58
![Page 59: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/59.jpg)
Outline Problem Statement
Proposed Solutions Distributed Dynamic Dataflow Analysis Testudo: Hardware-Based Dataflow Sampling Demand-Driven Data Race Detection
Future Work
Timeline
59
HW Dataflow Sampling
HW Dataflow SamplingWatc
hpoints
Watchpoint
s
DataflowSamplingDataflowSampling
Demand-Driven Race DetectionDemand-Driven Race Detection
![Page 60: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/60.jpg)
Virtual Memory Not Ideal
60
FAULTFAULT netcat_receive
![Page 61: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/61.jpg)
Word Accurate Meta-Data
What happens when the cache overflows? Increase the size of main memory? Store into virtual memory?
Use Sampling to Throw Away Data
61
Data CacheData Cache
Word AccurateMeta Data CacheWord Accurate
Meta Data Cache
PipelinePipeline
![Page 62: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/62.jpg)
On-Chip Sampling Mechanism
62
Avg
. #
of e
xecu
tions
512-entry cache 1024-entry cache17601
![Page 63: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/63.jpg)
Useful for Scaling to Complex AnalysesIf each shadow operation uses 1000 instructions:
63
Av
era
ge
% O
ve
rhea
d
1024-entry Sample Cache
Av
era
ge
% O
ve
rhea
d
telnet server benchmark
% %
3500
executions
17.3%
0.3%
169,000
executions
![Page 64: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/64.jpg)
Thread 2mylen=large
Thread 1mylen=small
if(ptr==NULL)if(ptr==NULL)
if(ptr==NULL)if(ptr==NULL)
len1=thread_local->mylen;len1=thread_local->mylen;
ptr=malloc(len1);ptr=malloc(len1);
memcpy(ptr, data1, len1)memcpy(ptr, data1, len1)
len2=thread_local->mylen;len2=thread_local->mylen;
ptr=malloc(len2);ptr=malloc(len2);
memcpy(ptr, data2, len2)memcpy(ptr, data2, len2)
Example of Data Race Detection
64
ptr write-shared?ptr write-shared?Interleaved Synchronization?
Interleaved Synchronization?
![Page 65: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/65.jpg)
Demand-Driven Analysis Algorithm
65
![Page 66: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/66.jpg)
Demand-Driven Analysis on Real HW
66
![Page 67: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/67.jpg)
Performance Difference
67
Phoenix PARSEC
![Page 68: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/68.jpg)
Demand-Driven Analysis Accuracy
68
1/11/1 2/42/4 3/33/3 4/44/4 3/33/3 4/44/4 4/44/42/42/4
Accuracy vs. Continuous Analysis:
97%
Accuracy vs. Continuous Analysis:
97%
![Page 69: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/69.jpg)
Accuracy on Real Hardware
69
kmeans facesim ferret freqmine vips x264 streamcluster
W→W 1/1 (100%)
0/1(0%)
- - 1/1 (100%)
- 1/1(100%)
R→W - 0/1(0%)
2/2 (100%)
2/2 (100%)
1/1 (100%)
3/3 (100%)
1/1(100%)
W→R - 2/2 (100%)
1/1 (100%)
2/2 (100%)
1/1 (100%)
3/3/ (100%)
1/1(100%)
SpiderMonkey-0
SpiderMonkey-1
SpiderMonkey-2
NSPR-1 Memcached-1 Apache-1
W→W 9/9(100%)
1/1(100%)
1/1(100%)
3/3(100%)
- 1/1(100%)
R→W 3/3(100%)
- 1/1 (100%)
1/1(100%)
1/1(100%)
7/7(100%)
W→R 8/8 (100%)
1/1(100%)
2/2 (100%)
4/4(100%)
- 2/2(100%)
kmeans facesim ferret freqmine vips x264 streamcluster
W→W 1/1 (100%)
0/1(0%)
- - 1/1 (100%)
- 1/1(100%)
R→W - 0/1(0%)
2/2 (100%)
2/2 (100%)
1/1 (100%)
3/3 (100%)
1/1(100%)
W→R - 2/2 (100%)
1/1 (100%)
2/2 (100%)
1/1 (100%)
3/3/ (100%)
1/1(100%)
![Page 70: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/70.jpg)
HW Interrupt when touching watched data
SW knows it’s touching important data AT NO OVERHEAD
Normally used for debugging
Hardware-Assisted Watchpoints
70
0 1 2 3 4 5 6 7
A B C D E F G H
LD 2R-Watch 2-4
DC E
WR X→7
X
W-Watch 6-7
G X
![Page 71: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/71.jpg)
Existing Watchpoint Solutions Watchpoint Registers
– Limited number (4-16), small reach (4-8 bytes)
Virtual Memory– Coarse-grained, per-process, only aligned ranges
ECC Mangling– Per physical address, all cores, no ranges
71
![Page 72: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/72.jpg)
Meeting These Requirements Unlimited Number of Watchpoints
Store in memory, cache on chip Fine-Grained
Watch full virtual addresses Per-Thread
Watchpoints cached per core/thread TID Registers
Ranges Range Cache
72
![Page 73: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/73.jpg)
The Need for Many Small Ranges Some watchpoints better suited for ranges
32b Addresses: 2 ranges x 64b each = 16B Some need large # of small watchpoints
51 ranges x 64b each = 408B Better stored as bitmap? 51 bits!
Taint analysis has good ranges Byte-accurate race detection does not..
73
![Page 74: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/74.jpg)
Watchpoint System Design II Make some RC entries point to bitmaps
74
-
Start Addr End Addr Pointer toWP Bitmap
R
-
W
1
V
1
B
Memory CoreRanges Bitmaps Range Cache Bitmap Cache
Accessed in ParallelAccessed in Parallel
![Page 75: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/75.jpg)
Watchpoint-Based Taint Analysis
75
19x10x 30x 206x 423x 23x1429x
128 entry Range Cache
20%Slowdown
28x
![Page 76: Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062309/5697bfc51a28abf838ca6b76/html5/thumbnails/76.jpg)
Width Test
76