Mining Windows Kernel API Rules
Jinlin Yang
09/28/2005 CS696
My Background
• Bounded exhaustive testing, 09/2001-01/2004
– D. Coppit, J. Yang, S. Khurshid, W. Le, and K. Sullivan. Software Assurance by Bounded Exhaustive Testing. IEEE Transactions on Software Engineering. April 2005
– K. Sullivan, J. Yang, D. Coppit, S. Khurshid, and D. Jackson. Software Assurance by Bounded Exhaustive Testing. ISSTA ’04
• Temporal properties inference, 01/2004-present
– J. Yang and D. Evans. Dynamically Inferring Temporal Properties. PASTE ’04
– J. Yang and D. Evans. Automatically Inferring Temporal Properties for Program Evolution. ISSRE ’04
– J. Yang and D. Evans. Automatically Discovering Temporal Properties for Program Verification. Submitted to FMSD
– J. Yang, D. Evans, D. Bhardwaj, T. Bhat, and M. Das. Terracotta: Mining Temporal API Rules from Imperfect Traces. Submitted to ICSE ’06
Overview
• Problem: specifications are often unavailable, which is a major obstacle to defect detection
• Solution: automatically infer specifications from execution traces
• Benefits: better understanding of legacy code and an opportunity to find more defects
– Experiments on finding kernel API rules
– Found one previously unknown bug in Windows
– Found interesting properties that should have been checked
Problem
• Defect detection technique
• Generic properties
– E.g. pointer and buffer usage
– PREfix [Bush et al, SP&E00], PREfast
– Very effective
• Application specific properties
– E.g. lock/unlock, resource creation/deletion
– SLAM/SDV [Ball et al, SPIN01], ESP [Das et al, PLDI02]
• Where do we get such properties?
My Approach
[Figure: pipeline. Program → Instrumentation → Instrumented Program; Instrumented Program + Test Suite → Running → Execution Traces; Execution Traces + Property Templates → Inference → Inferred Properties → Post-processing → Report]
J. Yang and D. Evans. Dynamically inferring temporal properties. PASTE ‘04.
An Example
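• Alternating template
(PS)*, P≠S. P and S are placeholders
• Trace: Lock::acq Lock::rel Lock::acq Lock::rel
– P=Lock::acq and S=Lock::rel: the trace reads PSPS, which matches (PS)*, so Lock::acq → Lock::rel is inferred
– P=Lock::rel and S=Lock::acq: the trace reads SPSP, which does not match (PS)*, so Lock::rel → Lock::acq is rejected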
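The instantiation step above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the trace is a plain list of event names, and the brute-force loop over event pairs is a simplification that is not claimed to be how Terracotta's inference engine works.

```python
import re
from itertools import permutations

def satisfies_alternating(trace, p, s):
    """Return True if the trace, projected onto events p and s, matches (PS)*."""
    # Keep only occurrences of p and s, renamed to the placeholders P and S.
    projected = "".join("P" if e == p else "S" for e in trace if e in (p, s))
    return re.fullmatch(r"(PS)*", projected) is not None

def infer_alternating(trace):
    """Try every ordered pair of distinct events (P != S) against the template."""
    events = sorted(set(trace))
    return [(p, s) for p, s in permutations(events, 2)
            if satisfies_alternating(trace, p, s)]

trace = ["Lock::acq", "Lock::rel", "Lock::acq", "Lock::rel"]
print(infer_alternating(trace))  # [('Lock::acq', 'Lock::rel')]
```

Only the instantiation P=Lock::acq, S=Lock::rel survives, because the reverse assignment projects the trace to SPSP.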
Implementation
• Terracotta
– Inference engine
– Context-aware trace analysis
– Heuristics for prioritizing and presenting properties
• Running time is linear in the length of the trace and in the number of distinct events
• More information
http://www.cs.virginia.edu/terracotta
Lessons
• Missing interesting properties
– Original algorithm requires 100% satisfaction
• Real world is never perfect
– Trace collected by sampling
– Object information unavailable
– Imperfect programs
• Can we develop better inference to handle this?
• Too much noise in the results
– Interesting properties are buried among many uninteresting ones
• Can we develop heuristics to select the interesting ones?
Refinement of Inference
• How to detect interesting properties in the face of imperfect traces?
• Example
– PS PS PS PS PS PS PS PS PS PPP
– The dominant behavior is that P and S alternate
– 10 subtraces, 90% satisfy Alternating
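The satisfaction percentage in the example can be reproduced with a short sketch. It assumes the trace has already been split into subtraces written over the placeholders P and S; the function name and the `>=` comparison against a threshold are illustrative assumptions, not details taken from the slides.

```python
import re

def satisfaction_rate(subtraces):
    """Fraction of subtraces that match the Alternating template (PS)*."""
    if not subtraces:
        return 0.0
    matching = sum(1 for t in subtraces if re.fullmatch(r"(PS)*", t))
    return matching / len(subtraces)

# The ten subtraces from the example: nine clean alternations and one outlier.
subtraces = ["PS"] * 9 + ["PPP"]
rate = satisfaction_rate(subtraces)
print(rate)  # 0.9

# Hypothetical acceptance rule: keep the property if the rate reaches a threshold.
THRESHOLD = 0.90
print(rate >= THRESHOLD)  # True
```

With a 100% requirement the single PPP subtrace would have discarded the property; with an approximate threshold it is kept.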
Refinement of Inference (2)
• How to pick out interesting properties?
• Which one is more likely to be interesting?
Case 1: A calls B
void A(){ ... B(); ...}
e.g. void KeSetTimer(){ KeSetTimerEx();}
Case 2: x calls C, then D
void x(){ C(); ... D();}
e.g. void x(){ ExAcquireFastMutexUnsafe(&m); ... ExReleaseFastMutexUnsafe(&m);}
– Heuristic: CD is often more interesting
– Compute the call graph for Windows binaries
– Keep AB only if B is not reachable from A
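A minimal sketch of the reachability filter follows. The dictionary-based call graph and the function names are assumptions made for illustration; the toy graph only encodes the two cases shown on the slide and is not extracted from any real binary.

```python
from collections import deque

def reachable(call_graph, src, dst):
    """Breadth-first search: can dst be reached from src by following calls?"""
    seen = {src}
    queue = deque([src])
    while queue:
        func = queue.popleft()
        for callee in call_graph.get(func, ()):
            if callee == dst:
                return True
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False

def filter_by_call_graph(properties, call_graph):
    """Keep a property (A, B) only if B is not reachable from A."""
    return [(a, b) for a, b in properties if not reachable(call_graph, a, b)]

# Toy call graph (caller -> callees) mirroring Case 1 and Case 2.
call_graph = {
    "KeSetTimer": ["KeSetTimerEx"],
    "x": ["ExAcquireFastMutexUnsafe", "ExReleaseFastMutexUnsafe"],
}
candidates = [
    ("KeSetTimer", "KeSetTimerEx"),
    ("ExAcquireFastMutexUnsafe", "ExReleaseFastMutexUnsafe"),
]
print(filter_by_call_graph(candidates, call_graph))
# [('ExAcquireFastMutexUnsafe', 'ExReleaseFastMutexUnsafe')]
```

The wrapper pair is dropped because the second event is just a callee of the first, while the acquire/release pair is kept.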
Refinement of Inference (3)
• Heuristic: the more similar two events are, the more likely the property is interesting
• Relative edit distance between A and B
– Partition A and B into words
– A has $w_A$ words, B has $w_B$ words, and they share $w$ common words
– $\mathit{dist}_{AB} = \dfrac{2w}{w_A + w_B}$
• For example:
– Ke Acquire In Stack Queued Spin Lock
– Ke Release In Stack Queued Spin Lock
– $w_A = w_B = 7$, $w = 6$, so Similarity = 12/14 = 85.7%
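The word-based similarity can be checked with a small sketch. The capital-letter splitting rule and the multiset treatment of common words are assumptions chosen to reproduce the example; identifiers containing acronyms would need a more careful tokenizer.

```python
import re
from collections import Counter

def split_words(name):
    """Split a CamelCase identifier into words, e.g. KeSetTimer -> Ke, Set, Timer."""
    return re.findall(r"[A-Z][a-z0-9]*", name)

def similarity(a, b):
    """Compute 2w / (wA + wB), where w counts words common to both names."""
    words_a, words_b = split_words(a), split_words(b)
    total = len(words_a) + len(words_b)
    if total == 0:
        return 0.0
    # Multiset intersection: a repeated word counts once per matched occurrence.
    common = sum((Counter(words_a) & Counter(words_b)).values())
    return 2 * common / total

score = similarity("KeAcquireInStackQueuedSpinLock",
                   "KeReleaseInStackQueuedSpinLock")
print(f"{score:.1%}")  # 85.7%
```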
Results: Kernel
• Approximation
– PAL threshold = 0.90
– 7611 properties
• Call-graph and edit distance based reduction
– Use the call-graph of ntoskrnl.exe, edit dist > 0.5
– 142 properties, a 53-fold reduction!
– Small enough for manual inspection
• 56 apparently interesting properties (40%)
– Locking discipline
– Resource allocation and deletion
Results: Kernel (2)
• Found interesting properties that should be checked
– Several types of kernel SpinLock
– The Static Driver Verifier should have checked them
• ESP found one previously unknown bug in ntfs.sys
– Double-acquire of FastMutex
– Confirmed and fixed by the responsible developers
M. Das, S. Lerner, and M. Seigle. ESP: Path-Sensitive Program Verification in Polynomial Time. PLDI ’02
Static Driver Verifier: Finding Bugs in Device Drivers at Compile-Time. WinHEC, April 2004.
Summary of Experiments
• We inferred interesting rules about kernel APIs!
– SDV already encodes some properties
http://download.microsoft.com/download/5/b/5/5b5bec17-ea71-4653-9539-204a672f11cf/SDV-intro.doc
– We inferred undocumented ones too
• Inference scales well to realistic traces
• Approximation is effective in tolerating imperfect traces and detecting dominant patterns
• Call-graph and edit distance based reduction is very effective
• Checking inferred properties with a defect detection tool is promising
• Other experiments: Vulcan APIs, Daisy file system
Conclusion
• Constructing interesting properties is important and difficult
• Automatic inference from execution traces is lightweight and effective
• Practical value
– Helps developers understand legacy code
– Gives us the opportunity to leverage sophisticated static analysis tools to find application specific defects
Q & A
• For more information
http://www.cs.virginia.edu/terracotta
• Great collaborators
– UVa: David Evans, Ed Mitchell
– Microsoft: Stephen Adams, Deepali Bhardwaj, Thirumalesh Bhat, Manuvir Das, Damian Hasse, Marne Staples, Rick Vicik, Jason Yang, Zhe Yang