Reverse Engineering State Machines by Interactive Grammar Inference
Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin
State Machines
• Used to model software behaviour
– Documentation
– Inspection / review
– Model-based testing
– Model checking
[Figure: example state machine with transitions load, exit, close, edit, save as, ok]
• Only useful if complete and up-to-date
• Usually not the case due to time constraints and software evolution
Reverse Engineering State Machines
• Static analysis – analysis of source code
– symbolic execution, flow analyses, ...
– Inevitably considers executions that are infeasible in practice
• Dynamic analysis – infer model from sample executions
– Favoured for accuracy
– States considered equal if subsequent trace is similar
– Variants of the k-tails algorithm [Biermann, Feldman 1972] are the most common reverse engineering algorithm
Traditional Approach
• For any point in a trace, its k-tail is the following sequence of k events or functions
– Point x is considered equivalent to y if the k-tails are equal
• Example trace: <load,edit,edit,edit,save_as,ok,edit,edit>
[Figure: the example trace drawn as a chain of states, one transition per event]
• Applied to the trace <load,edit,edit,edit,save_as,ok,edit,edit> with k = 2:
[Figure: machine after merging points with equal 2-tails; transitions labelled load, edit, save_as, ok]
• Remove non-determinism
[Figure: resulting machine; transitions labelled load, save_as, edit, ok]
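The k-tail equivalence described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular tool. The function names are hypothetical, and it assumes that the shorter tails at the end of the trace count as distinct states, a detail on which k-tails variants differ.

```python
from collections import defaultdict

def k_tails(trace, k):
    """Return, for each point 0..len(trace), the next k events after it."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) + 1)]

def merge_by_k_tail(trace, k):
    """Treat points with equal k-tails as one state and collect transitions."""
    tails = k_tails(trace, k)
    state_of = {}  # k-tail -> state number, in order of first appearance
    for tail in tails:
        state_of.setdefault(tail, len(state_of))
    transitions = defaultdict(set)  # (state, event) -> set of target states
    for i, event in enumerate(trace):
        source = state_of[tails[i]]
        target = state_of[tails[i + 1]]
        transitions[(source, event)].add(target)
    return state_of, dict(transitions)

trace = ["load", "edit", "edit", "edit", "save_as", "ok", "edit", "edit"]
state_of, transitions = merge_by_k_tail(trace, 2)

# A (state, event) pair with more than one target is non-deterministic.
nondeterministic = {key for key, targets in transitions.items() if len(targets) > 1}
```

For this trace and k = 2, the state reached after load has several edit successors, which is the non-determinism that has to be removed in the final step.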
Problems
• Too expensive if result is to be correct and complete:
– Need complete set of executions up to certain length
– Passive – all executions need to be presented at once
• If provided traces only partial (probable for non-trivial system) the resulting model is untrustworthy
– Difficult to tell how complete the model is – what’s missing?
[Figure: the inferred machine (transitions load, save_as, edit, ok) shown with the original example machine (transitions load, exit, close, edit, save as, ok)]
Regular Grammar Inference
• Given a set of valid and (optionally) invalid sentences from a language, infer its grammar.
• Regular grammars can be represented as deterministic finite state machines
• Problem of regular grammar inference equivalent to that of reverse engineering state machines
• Several sophisticated grammar inference techniques
– Effectively address many problems that arise with current reverse-engineering approaches
Benefits of Adapting Grammar Inference Techniques
• Active techniques
– Do not require set of executions to be presented at once
– Interact with an oracle to identify missing information
• More efficient
– Can efficiently process large sample sets
• Reasonably accurate given sparse sets of executions
– More sophisticated heuristics to accurately identify equivalent states
Query-Driven State Merging (QSM)
• Devised by Dupont et al.
• Combines benefits mentioned on previous slide
– Active, efficient, reasonably accurate for sparse sets of sample executions
• Guaranteed to produce correct machine if set of sample executions is characteristic:
– Must cover every transition in the target grammar
– Enough positive and negative samples to differentiate between different states (to prevent false merges)
– Questions aim to elicit characteristic sample from oracle
Query-Driven State Merging (QSM)
• Sample executions:
– <load, close, exit>
– <load, edit, edit, save_as, ok, close, exit>
– <load, edit, edit, edit, close, exit>
[Figure: tree of the three traces, sharing the common prefixes load and load, edit, edit]
• Generate “Prefix Tree Acceptor”
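A prefix tree acceptor for the three sample traces can be sketched as follows. The function names and the dictionary representation are illustrative assumptions. The sketch marks only the end of each trace as accepting, which is also an assumption, since the slides do not say whether prefixes of a valid trace count as valid.

```python
def build_pta(traces):
    """Build a prefix tree acceptor: one state per distinct trace prefix."""
    transitions = {}   # (state, event) -> state
    accepting = set()  # states reached at the end of a sample trace
    next_state = 1     # state 0 is the root (empty prefix)
    for trace in traces:
        state = 0
        for event in trace:
            key = (state, event)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting

def accepts(transitions, accepting, trace):
    """Follow the trace from the root; reject on a missing transition."""
    state = 0
    for event in trace:
        state = transitions.get((state, event))
        if state is None:
            return False
    return state in accepting

traces = [
    ["load", "close", "exit"],
    ["load", "edit", "edit", "save_as", "ok", "close", "exit"],
    ["load", "edit", "edit", "edit", "close", "exit"],
]
transitions, accepting = build_pta(traces)
```

The tree accepts exactly the sample traces and nothing else. Generalisation only comes from the state merges that follow.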
Query-Driven State Merging (QSM)
[Figure: prefix tree acceptor with a pair of states selected for merging]
• Attempt merge
• Produce questions (executions valid in this machine, but not in unmerged version)
– <close,exit>?
– <edit,edit...>?
– <load,load,close,exit>?
• If all questions answered yes, merge nodes
• Else add negative questions to graph
[Figure: updated graph, including a transition labelled close, edit]
• Active
• Efficient
• Accepts negative information about model
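The decision taken after a tentative merge can be sketched as below. This covers only the control flow around the questions, not the merging or the question generation itself. The function `attempt_merge` and the `oracle` rule are hypothetical stand-ins. In practice the oracle is a person or a test run against the system.

```python
def attempt_merge(questions, oracle, positives, negatives):
    """Ask every question; keep the merge only if all answers are yes.

    Answers are recorded either way, so the sample grows with each round.
    """
    all_yes = True
    for question in questions:
        if oracle(question):
            positives.append(question)
        else:
            negatives.append(question)  # negative information for the graph
            all_yes = False
    return all_yes

def oracle(trace):
    # Hypothetical rule for illustration only: a valid execution starts
    # with exactly one load.
    return len(trace) > 0 and trace[0] == "load" and trace.count("load") == 1

questions = [("close", "exit"), ("load", "load", "close", "exit")]
positives, negatives = [], []
merged = attempt_merge(questions, oracle, positives, negatives)
```

Under this assumed oracle both questions are answered no, so the merge is rejected and both traces are kept as negative samples.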
Implementation
• Use Eclipse TPTP to record traces
– Sequence of method calls → <load,edit...>
• Questions can either be answered manually
– OR as tests directly to the system
– Can vary number of questions generated
• QSM component accepts simple text files of strings (prefixed with “+” and “-”)
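A reader for such a sample file might look like the sketch below. The slides state only that strings are prefixed with “+” or “-”. The layout assumed here (one trace per line, events separated by whitespace), the function name `parse_samples`, and the example contents are all illustrative assumptions.

```python
def parse_samples(text):
    """Split '+'/'-' prefixed lines into positive and negative traces."""
    positive, negative = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sign, events = line[0], line[1:].split()
        if sign == "+":
            positive.append(events)
        elif sign == "-":
            negative.append(events)
        else:
            raise ValueError(f"expected '+' or '-' prefix: {line!r}")
    return positive, negative

# Hypothetical file contents.
example = """\
+ load close exit
+ load edit edit save_as ok close exit
- close exit
"""
positive, negative = parse_samples(example)
```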
Evaluation
• Used traces to generate JHotDraw case study
– Described in paper
• Generated random state machines
– Subject to certain constraints – minimal, deterministic etc.
– Three sets of 10 random machines (5, 25, 50 states)
– Random paths over these machines = initial set of traces
– Measured accuracy of final machine, and number of questions required
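Drawing random paths over a known machine to obtain an initial set of traces can be sketched as follows. The slides do not say how the paths were sampled. The uniform random walk, the function names, and the three-transition machine below are assumptions made for illustration.

```python
import random
from collections import defaultdict

def random_paths(machine, start, count, max_length, seed=0):
    """Sample `count` random walks of 1..max_length events from `start`."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    outgoing = defaultdict(list)
    for (state, event), target in sorted(machine.items()):
        outgoing[state].append((event, target))
    paths = []
    for _ in range(count):
        state, path = start, []
        for _ in range(rng.randint(1, max_length)):
            if not outgoing[state]:
                break  # dead end: no outgoing transition
            event, state = rng.choice(outgoing[state])
            path.append(event)
        paths.append(path)
    return paths

def is_path(machine, start, path):
    """Check that every step of the path is a transition of the machine."""
    state = start
    for event in path:
        if (state, event) not in machine:
            return False
        state = machine[(state, event)]
    return True

# Hypothetical deterministic machine: (state, event) -> next state.
machine = {(0, "load"): 1, (1, "edit"): 1, (1, "close"): 0}
paths = random_paths(machine, start=0, count=10, max_length=6)
```

Every sampled path is a valid execution of the machine by construction, so such paths serve as positive samples only.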
Current and Future Work
• Identify data constraints associated with states
– Can use tools such as Daikon
• Automatically answer queries
– Static analysis – using call graph analysis to automatically propose negative / impossible executions
– Automated test generation
• Heuristics – can certain questions be safely ignored?
Conclusions
• Preliminary results show technique is reasonably accurate and efficient
• Can potentially be almost entirely automated
– Automatically generates tests (questions), many of which can be eliminated by static analysis anyway
• Grammar Inference is useful source of ideas for dynamic analysis and reverse engineering