SWAT Memory Leak Detection
1
SWAT Memory Leak Detection
Matthias Hauswirth
2
Agenda
Approaches to memory leak detection
SWAT infrastructure
Heap model
Staleness predicates
Leak analysis tool
3
Memory Leaks
time
object1
alloc access free
4
Memory Leaks
time
object1
alloc access
alloc access free
shutdown
object2
5
Memory Leaks
time
object1
alloc access
alloc access
reachable unreachable
alloc access free
shutdown
object2
object3
6
Approaches to Leak Detection
Survivors: Objects surviving until program termination
Unreachables: Objects unreachable at snapshot (GC)
Stales: Objects not recently accessed at snapshot (SWAT)
7
Survivors: Guess
time
o1
o2
startup shutdown
o3
o4
o5
leak
leak
leak
leak
-
8
Survivors: Reality
time
o1
o2
startup shutdown
o3
o4
o5
leak ?
leak
leak ?
leak
-
9
Unreachables: Guess
time
o1
o2
startup shutdown
o3
o4
o5
leak
snapshot
-
alive
-
alive
10
Unreachables: Reality
time
o1
o2
startup shutdown
o3
o4
o5
alive ?
snapshot
leak
-
alive
-
11
Stales (SWAT): Guess
time
o1
o2
startup shutdown
o3
o4
o5
leak
snapshot
leak
-
-
alive
12
Stales (SWAT): Reality
time
o1
o2
startup shutdown
o3
o4
o5
snapshot
-
-
alive
leak
leak
13
SWAT Infrastructure
instrument
winword.exe
winword.swat.exe
run
swatruntime.dll
source info
postprocess
snapshots
statistics
view
settings
14
Instrument
proc1
comp1
15
Bursty Tracing: Duplicate Basic Blocks
proc1 prof$proc1
comp1
16
Bursty Tracing: Insert Dispatch Checks
proc1 prof$proc1
comp1
17
Instrumentation: Patch Allocations & Frees
xalloc XallocWrapper
comp1 swatruntime.dll
18
Instrumentation: Instrument Loads & Stores
proc1 prof$proc1
comp1
RecordReference
swatruntime.dll
19
Bursty Tracing: Dispatch Check
[Diagram: dispatch check state machine. Nodes: OrigSrc, ProfSrc, DecOrig, StayOrig, OrigZero, StartProf, OrigTgt, DecProf, StayProf, StartOrig, ProfTgt. Edge conditions: cOrig==1, cOrig>1, cProf==0, cProf==1, cProf>1.]
Global Counters:
  cOrig # of StayOrig
  cProf # of StayProf
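The counter logic behind a dispatch check can be sketched roughly as follows. This is a minimal illustration, not the SWAT implementation: the class name, the method name, and the reload values n_orig and n_prof are assumptions, and the exact transition conditions on the slide (cOrig==1, cProf==0, and so on) may be arranged differently.

```python
class DispatchCheck:
    """Bursty tracing with two global counters (hypothetical names, sketch only)."""

    def __init__(self, n_orig, n_prof):
        self.n_orig = n_orig   # checks that stay in original code per cycle
        self.n_prof = n_prof   # checks that stay in profiled code per burst
        self.c_orig = n_orig   # remaining StayOrig decisions
        self.c_prof = 0        # remaining StayProf decisions

    def check(self):
        """Return 'prof' if the next basic blocks run instrumented, else 'orig'."""
        if self.c_prof > 0:
            self.c_prof -= 1
            if self.c_prof == 0:
                self.c_orig = self.n_orig   # burst is over: back to original code
            return "prof"
        self.c_orig -= 1
        if self.c_orig == 0:
            self.c_prof = self.n_prof       # start a profiling burst
        return "orig"
```

With these assumed reload values, the fraction of checks that run profiled code is n_prof / (n_orig + n_prof).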
20
Adaptive Bursty Tracing
Bursty tracing
  Sampling rate influences results
  Rate chosen at runtime
Adaptive bursty tracing
  Different sampling rate by dispatch check point
  Start at high rate
  Wait until average gets down to requested rate
  Start rate, delta & target rate chosen at runtime
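One way to picture the adaptive scheme is a per-dispatch-check reload interval that starts short (high sampling rate) and grows by a delta after every burst until it reaches the target. The sketch below rests on assumptions: the slides specify rates, whereas this code expresses them as intervals between bursts, and all names (AdaptiveDispatch, start_interval, burst_len) are hypothetical.

```python
class AdaptiveDispatch:
    """Per-dispatch-check cOrig counters plus one global cProf counter (sketch)."""

    def __init__(self, start_interval, delta, target_interval, burst_len):
        self.start = start_interval    # short interval = high initial rate
        self.delta = delta             # interval growth after each burst
        self.target = target_interval  # interval at the requested rate
        self.burst_len = burst_len
        self.interval = {}             # dcid -> current reload interval
        self.c_orig = {}               # dcid -> remaining StayOrig decisions
        self.c_prof = 0                # global remaining StayProf decisions

    def check(self, dcid):
        if self.c_prof > 0:
            self.c_prof -= 1
            return "prof"
        if dcid not in self.c_orig:
            self.interval[dcid] = self.start
            self.c_orig[dcid] = self.start
        self.c_orig[dcid] -= 1
        if self.c_orig[dcid] == 0:
            # This check triggers a burst, then samples less often next time.
            self.c_prof = self.burst_len
            self.interval[dcid] = min(self.interval[dcid] + self.delta, self.target)
            self.c_orig[dcid] = self.interval[dcid]
        return "orig"
```

Under this sketch a frequently executed check quickly decays to the target interval, while a rarely executed check still triggers a burst early because its own counter starts at the short interval.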
21
Why Adaptive Bursty Tracing?
Skewed Code Coverage
[Chart: # executions (billions, y-axis 0 to 2.5) per dispatch check; dispatch checks sorted by # executions, top 30 of 1200]
22
Adaptive Bursty Tracing: Dispatch Check
[Diagram: dispatch check state machine, parameterized by dcid. Nodes: OrigSrc, ProfSrc, DecOrig, StayOrig, OrigZero, StartProf, OrigTgt, DecProf, StayProf, StartOrig, ProfTgt. Edge conditions: cOrig[dcid]==1, cOrig[dcid]>1, cProf==0, cProf==1, cProf>1.]
Per-Dispatch Check Counter:
  cOrig[dcid] # of StayOrig
Global Counter:
  cProf # of StayProf
23
Effect of Adaptive Bursty Tracing on Coverage
Boosting Coverage of Rare Dispatch Checks
[Chart: % more # profiled executions with adaptive bursty tracing (y-axis -50% to 150%) per dispatch check; dispatch checks sorted by # executions, top 30 of 1200]
24
SWAT Heap Model
Requirements
  AllocateObject(eip, startAddress, size)
  FreeObject(eip, startAddress)
  FindObject(eip, address)
  GetObjectIterator()
Implementations
  Hash table (address→objectInfo)
  Hash table (startAddress→objectInfo)
  Hash table (address→offsetToStartAddress)
  Address tree
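To make the four required operations concrete, here is a minimal sketch of a heap model. It deliberately does not use the address tree or the hash tables listed above; it substitutes a sorted list of start addresses searched with bisect as a simple stand-in. The Python names, the now parameter, and the choice to record the access inside find_object are assumptions. The ObjectInfo fields follow the per-object data shown later in this transcript.

```python
import bisect
from dataclasses import dataclass


@dataclass
class ObjectInfo:
    start: int
    size: int
    alloc_site: int
    access_count: int = 0
    last_access_time: int = 0
    last_access_site: int = 0


class HeapModel:
    """Sorted-list stand-in for the heap model; assumes live objects never overlap."""

    def __init__(self):
        self._starts = []    # sorted start addresses of live objects
        self._objects = {}   # start address -> ObjectInfo

    def allocate_object(self, eip, start, size):
        bisect.insort(self._starts, start)
        self._objects[start] = ObjectInfo(start, size, alloc_site=eip)

    def free_object(self, eip, start):
        # eip (the free site) is kept for interface parity but unused here.
        i = bisect.bisect_left(self._starts, start)
        if i < len(self._starts) and self._starts[i] == start:
            del self._starts[i]
            del self._objects[start]

    def find_object(self, eip, address, now=0):
        # Find the last object starting at or before the address.
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return None
        obj = self._objects[self._starts[i]]
        if address >= obj.start + obj.size:
            return None   # address falls in a gap between objects
        obj.access_count += 1
        obj.last_access_time = now
        obj.last_access_site = eip
        return obj

    def get_object_iterator(self):
        return iter(self._objects.values())
```

Interior addresses resolve to the enclosing object, which is what the instrumented loads and stores need, since they may touch any byte of an allocation.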
25
SWAT Heap Model
[Diagram: binary address tree (edges labeled 0 and 1); leaf labels 0000, 0100, 1000, 1100]
Address: 0101
26
SWAT Heap Model
[Diagram: binary address tree (edges labeled 0 and 1); leaf labels 0000, 0100, 1000, 1100]
8 byte, 0101
27
SWAT Heap Model
[Diagram: binary address tree (edges labeled 0 and 1); leaf labels 0000, 0100, 1000, 1100]
28
SWAT Heap Model
[Diagram: reduced binary address tree (edges labeled 0 and 1); leaf labels 0000, 0100, 1000, 1100]
29
SWAT Heap Model
[Diagram: reduced binary address tree (edges labeled 0 and 1); leaf labels 0000, 0100, 1000, 1100]
30
SWAT Heap Model
[Diagram: further reduced binary address tree (edges labeled 0 and 1); leaf labels 0000, 0100, 1000, 1100]
31
SWAT Heap Model
[Diagram: reduced binary address tree (edges labeled 0 and 1) with the object info attached]
Start address: 0101
Size: 8
Access count: 19
Last access time: 19’000’000
Alloc site: EIP 0x400019
Last access site: EIP 0x400190
32
SWAT Heap Model
Space Overhead
  Address Tree Nodes: 0.03 … 0.35 allocated node bytes / allocated byte
  Overall: 0.12 … 3.4 times the allocated memory
Time
  FindObject(eip, address): Log(addressSpaceSize) --- (32 bits = 32 nodes)
33
Evaluation: Time Overhead
[Chart: run time in seconds (y-axis 0 to 8000) for benchmarks twolf and vpr under configs b 0.1%, a 0.1%, b 1%, a 1%, b 10%, a 10%; series: Uninstrumented, Excluding Snapshots, Total; leak (All)]
34
Staleness Predicates
Stale = object not needed anymore
Stale, if…
  Never accessed
  Idle time > t
  Idle time > n * active time
[Diagram: object timeline showing the active and idle periods, with the idle period compared against t and n*active]
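The three predicates can be written down directly. In this sketch the Lifetime record and the function names are hypothetical, and it assumes that active time runs from allocation to the last access and idle time from the last access to the snapshot, which is how the timeline on this slide appears to read.

```python
from dataclasses import dataclass


@dataclass
class Lifetime:
    alloc_time: int
    last_access_time: int   # assumed equal to alloc_time if never accessed
    access_count: int

    def active(self):
        # Assumption: active time spans allocation to last access.
        return self.last_access_time - self.alloc_time

    def idle(self, now):
        # Time since the last access, measured at the snapshot.
        return now - self.last_access_time


def never_accessed(obj):
    return obj.access_count == 0


def idle_longer_than(obj, now, t):
    return obj.idle(now) > t


def idle_exceeds_active(obj, now, n):
    return obj.idle(now) > n * obj.active()
```

The second predicate uses one absolute threshold for all objects, while the third scales the threshold with how long each object was in use.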
35
Evaluation
Inject leaks
  Randomly, at runtime, decide not to execute a free
Variables
  Sampling rate
  Adaptive or bursty
  Predicate
Measurement results per snapshot
  List of objects assumed leaked
    Some true, some false
  List of objects assumed alive
    Some true, some false
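Because the leaks are injected, the ground truth is known and each list can be scored per snapshot. The slides do not define the error metric, so the definition below is an assumption: the error of a list is the fraction of its entries that are misclassified. The function and parameter names are hypothetical.

```python
def list_errors(assumed_leaked, assumed_alive, true_leaks):
    """Return (error in leak list, error in alive list) for one snapshot."""
    leaked = set(assumed_leaked)
    alive = set(assumed_alive)
    truth = set(true_leaks)

    false_leaks = leaked - truth    # reported as leaked, but actually alive
    missed_leaks = alive & truth    # reported as alive, but actually leaked

    err_leak = len(false_leaks) / len(leaked) if leaked else 0.0
    err_alive = len(missed_leaks) / len(alive) if alive else 0.0
    return err_leak, err_alive
```

For example, if 1 of 4 objects in the leak list was never injected as a leak and 1 of 5 objects in the alive list was, the two errors are 25% and 20%.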
36
Comparing Predicates
[Chart: Error in Leak List and Error in Alive List (y-axis 0% to 50%) per predicate: Idle>100*Active, Idle>10*Active, Idle>1*Active, Idle>1000Mio, Idle>100Mio, Idle>10Mio; benchmark (All), leakage (All), # (All), config (All)]
37
Comparing Sampling Rates
[Chart: Error in Leak List and Error in Alive List (y-axis 0% to 45%) per config: b 0.1%, a 0.1%, b 1%, a 1%, b 10%, a 10%; benchmark (All), leakage (All), # (All), predicate (All)]
38
Lucky Omission Effect
maxIdleTime
time [# actual references]
Injected Leak
Question: At time of snapshot, is object a leak?
snapshot
39
Lucky Omission Effect
maxIdleTime
Low sampling rate
time [# actual references] snapshot
40
Lucky Omission Effect
maxIdleTime
Low sampling rate assumed leaked: true
time [# actual references] snapshot
41
Lucky Omission Effect
maxIdleTime
Low sampling rate
High sampling rate
assumed leaked: true
time [# actual references] snapshot
42
Lucky Omission Effect
maxIdleTime
Low sampling rate
High sampling rate
assumed leaked: true
assumed alive: false
time [# actual references] snapshot
43
Lucky Omission Effect
lucky omission window
maxIdleTime
Low sampling rate
High sampling rate
assumed leaked: true
assumed alive: false
time [# actual references] snapshot
44
Mitigation of Lucky Omission Effect
Reduce chance of leak happening during maxIdleTime
  snapshotInterval >> maxIdleTime
[Diagram: timeline (time in # actual references) showing two snapshots, two maxIdleTime windows, and the snapshotInterval between the snapshots]
45
Practical Sampling Rates & Useful Predicates
[Chart: Error in Leak List and Error in Alive List (y-axis 0% to 30%) for configs a 1% and b 1% under predicates Idle>1*Active and Idle>1000Mio; benchmark (All), leakage (All), # (All)]
46
Leak Analysis Tool
47
Ranking
Sort <alloc site, last access site> pairs
Old rankings:
  # of stale objects [currently used]
  # of stale bytes
  Drag caused by stale objects (bytes*idle time)
New ranking:
  # of predicates declaring an object stale
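The rankings above amount to grouping stale objects by site pair and sorting by an aggregate. The sketch below is an illustration under assumptions: the record layout and field names are hypothetical, and summing the per-object predicate votes per site pair is one possible reading of the new ranking, not something the slide spells out.

```python
from collections import defaultdict


def rank_site_pairs(stale_objects, key="objects"):
    """Rank <alloc site, last access site> pairs by 'objects', 'bytes', 'drag' or 'votes'.

    stale_objects: iterable of dicts with alloc_site, last_access_site,
    size, idle_time, and votes (# of predicates declaring the object stale).
    """
    totals = defaultdict(lambda: {"objects": 0, "bytes": 0, "drag": 0, "votes": 0})
    for o in stale_objects:
        t = totals[(o["alloc_site"], o["last_access_site"])]
        t["objects"] += 1
        t["bytes"] += o["size"]
        t["drag"] += o["size"] * o["idle_time"]   # bytes * idle time
        t["votes"] += o["votes"]
    # Highest aggregate first.
    return sorted(totals.items(), key=lambda kv: kv[1][key], reverse=True)
```

Different keys can put different site pairs on top: many small stale objects win on object count, while one large stale buffer wins on bytes or drag.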
48
Conclusions
Many approaches to leak detection
Predicting leaks by looking at past events:
  Important objects might never be used (boxsim)
  Lots of stale objects might indicate a space-inefficient algorithm
Leak Analysis Tool
  Made it easy to find several statically injected leaks
49
Future Work
Currently:
  Store source info compactly (at instrumentation time)
  Snapshots at runtime don’t use source info
  Post process snapshots to add source info
This week:
  Rank leaks
  Update Leak Analysis Tool to use ranking
  Run new version on winword.exe and mshtml.dll
Later:
  Combine “Unreachables” with “Stales” approach