Modeling Regular Replacement for String Constraints Solving
NFM 2010 1
Modeling Regular Replacement for String Constraints Solving
Xiang Fu, Hofstra University
Chung-Chih Li, Illinois State University
04/13/2010
Background
[Figure: a Hacker and a Server, with the labels "malicious scripts" and "Cool page!"]
Problem? Lack of Sufficient Sanitization of Text Inputs
One Typical Error
<?php
$msg = $_POST["msg"];
// Strip script elements with a single pass of a reluctant pattern.
$sanitized = preg_replace(
    "/\<script.*?\>.*?\<\/script.*?\>/i",
    "",
    $msg);
savetodb($sanitized);
?>
Attacker's Input: <<script></script>script>alert('a')</script>
After Filtering: <script>alert('a')</script>
Note: .*? is the Reluctant Kleene Star
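The bypass on this slide can be reproduced outside PHP. The sketch below assumes Python's `re.sub` treats this particular pattern the same way `preg_replace` does; the names `SCRIPT_TAG` and `sanitize` are illustrative and do not come from the talk.

```python
import re

# Same pattern as the PHP filter: reluctant stars, case-insensitive.
SCRIPT_TAG = re.compile(r"<script.*?>.*?</script.*?>", re.IGNORECASE)

def sanitize(msg):
    """Delete script elements in one left-to-right pass (no re-scan)."""
    return SCRIPT_TAG.sub("", msg)

attack = "<<script></script>script>alert('a')</script>"
print(sanitize(attack))  # prints <script>alert('a')</script>
```

The filter deletes only the inner empty element, and the fragments on either side join into a fresh script tag.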
Bigger Picture
Objective: Automatic Discovery of Vulnerabilities
[Figure: tool chain with the components Bytecode, Symbolic Execution, Attack Pattern, String Constraint Solver (SUSHI), and Test Replayer]
Our Contribution
Atomic Replacement Constraints
Consider Two Semantics
  Greedy
  Reluctant
Modeling Using Finite State Transducer (FST)
Compact Representation of FST
Security Analysis
Finite State Transducer
Accepts Regular Relation
Closed under: Union, Concat, Composition
Not closed under: Intersection, Complement
Used for Modeling Rewriting Rules [Kaplan94, Karttunen96]
[Figure: FST A with states 1-4 and transitions labeled ε:1, a:2, b:3]
Example: (ab, 123) ∈ L(A)
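To make the example relation concrete, here is a minimal sketch of simulating a transducer like A. It assumes the states are chained 1 → 2 → 3 → 4 in the order of the labels (the slide only shows the labels), and it assumes there are no ε-cycles, since the search would not terminate with them. `TRANSITIONS` and `transduce` are hypothetical names.

```python
from collections import deque

# (source state, input symbol or "" for ε, output string, target state)
# The linear 1 -> 2 -> 3 -> 4 layout is an assumption about the figure.
TRANSITIONS = [
    (1, "", "1", 2),
    (2, "a", "2", 3),
    (3, "b", "3", 4),
]
START, FINAL = 1, {4}

def transduce(word):
    """Return every output string the transducer can emit for `word`."""
    outputs = set()
    queue = deque([(START, 0, "")])  # (state, input position, output so far)
    while queue:
        state, pos, out = queue.popleft()
        if state in FINAL and pos == len(word):
            outputs.add(out)
        for src, sym, emit, dst in TRANSITIONS:
            if src != state:
                continue
            if sym == "":
                queue.append((dst, pos, out + emit))      # ε-move: consume nothing
            elif word.startswith(sym, pos):
                queue.append((dst, pos + 1, out + emit))  # consume one symbol
    return outputs

print(transduce("ab"))  # prints {'123'}
```

A pair (w, o) is in L(A) exactly when `o in transduce(w)`.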
Hierarchical FST & Modeling Declarative Semantics
Goal: $S_{r \to \omega}$, where r is the Regular Search Pattern and ω is the Replacement
Example: $aaa_{a^+ \to b} = \{b, bb, bbb\}$
[Figure: FST with states 1-4 and transitions Id(Σ* - Σ* r Σ*), r : ω, ε:ε, Id(Σ* - Σ* r Σ*)]
Id: Identical Relation
Σ* - Σ* r Σ*: Any String not Containing pattern r
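The declarative semantics can be checked on small inputs by brute force. The sketch below is not the FST construction from the slide; it enumerates every way to split the word into pattern-free gaps and matched pieces, and replaces each matched piece. It assumes the pattern never matches the empty string, and `declarative_replace` is a hypothetical name.

```python
import re

def declarative_replace(s, pattern, repl):
    """All results of replacing matches of `pattern`, under any match choice."""
    rx = re.compile(pattern)
    results = set()

    def go(pos, out):
        for i in range(pos, len(s) + 1):
            gap = s[pos:i]            # unreplaced text; must not contain a match
            if rx.search(gap):
                break                 # any longer gap would contain it too
            if i == len(s):
                results.add(out + gap)
            for j in range(i + 1, len(s) + 1):
                if rx.fullmatch(s, i, j):   # s[i:j] is one matched piece
                    go(j, out + gap + repl)

    go(0, "")
    return results

print(sorted(declarative_replace("aaa", "a+", "b")))  # prints ['b', 'bb', 'bbb']
```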
Modeling Reluctant Semantics
Goal: $S^{-}_{r \to \omega}$
Example: $aaa^{-}_{a^+ \to b} = \{bbb\}$
2 Steps:
  1. Mark the beginning of pattern
  2. Do the replacement
Key: Left-Most Matching
Input Word: a a b b c d a b c a b d
Search Pattern: a+b+c → x
After Begin Marker: # a # a b b c d # a b c a b d
Output Word: x d x a b d
[Figure: replacement FST with states f1, s1, s2 and transitions labeled Id(Σ), #:ε, reluc(r)#' : ω, ε:ε]
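The two steps on this slide (mark where matches begin, then replace the shortest match at the leftmost marker) can be sketched procedurally. This is a plain scan over positions rather than the transducer from the talk, it assumes the pattern never matches the empty string, and all function names are hypothetical.

```python
import re

def begin_positions(s, rx):
    """Positions where some match of the pattern starts (the '#' markers)."""
    return [i for i in range(len(s)) if rx.match(s, i)]

def show_markers(s, positions):
    """Render the word with '#' in front of every marked position."""
    marks = set(positions)
    return "".join(("#" if i in marks else "") + ch for i, ch in enumerate(s))

def reluctant_replace(s, pattern, repl):
    rx = re.compile(pattern)
    begins = set(begin_positions(s, rx))
    out, i = [], 0
    while i < len(s):
        if i in begins:
            # Shortest match starting at the leftmost remaining marker.
            j = next(j for j in range(i + 1, len(s) + 1) if rx.fullmatch(s, i, j))
            out.append(repl)
            i = j                 # markers inside the match are skipped
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

word = "aabbcdabcabd"
print(show_markers(word, begin_positions(word, re.compile("a+b+c"))))
print(reluctant_replace(word, "a+b+c", "x"))
```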
The Challenge: Begin Marker
Input Word: a a b b c d a b c a b d
Search Pattern: a+b+c → x
[Figure: # markers placed over the input word]
Look-ahead Capability?
Non-determinism
3 Steps:
  (1) End marker
  (2) Generic end marker
  (3) Begin marker
Preliminary End Marker
Search Pattern: a+b+c → x
Reversed Pattern: cb+a+
Idea: Start with End Marker for Reverse of Search Pattern
[Figure: FST A1 with states 1-5 and transitions labeled c:c, b:b, b:b, a:a, a:a, ε:$]
Problem: Input tape accepts cb+a+ only!
Generic End Marker
Pattern: cb+a+
Input Word: c c b a a
Output Word: c c b a $ a $
[Figure: FST A2 with states 1 (1), 2 (2,1), 3 (3,1), 4 (4,1), 5 (5,1) and transitions labeled c:c, b:b, a:a, ε:$]
Deterministic!
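The input/output pair on this slide can be reproduced with a brute-force check: emit `$` after every position where some match of the pattern ends. This is a quadratic scan standing in for the deterministic transducer A2, not the construction itself, and `mark_ends` is a hypothetical name.

```python
import re

def mark_ends(s, pattern, marker="$"):
    """Copy `s`, inserting `marker` after each position where a match ends."""
    rx = re.compile(pattern)
    out = []
    for end in range(1, len(s) + 1):
        out.append(s[end - 1])
        # A match ends here if some substring s[start:end] is in the pattern.
        if any(rx.fullmatch(s, start, end) for start in range(end)):
            out.append(marker)
    return "".join(out)

print(mark_ends("ccbaa", "cb+a+"))  # prints ccba$a$
```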
Finally, the Begin Marker
Search Pattern: a+b+c → x
[Figure: FST A3 with states 0, 1 (1), 2 (2,1), 3 (3,1), 4 (4,1), 5 (5,1) and transitions labeled c:c, b:b, a:a, ε:#, ε:ε]
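The idea of obtaining begin markers from an end marker for the reversed pattern can be sketched on strings: reverse the word, mark where matches of the reversed pattern end, and reverse the result. The caller supplies the reversed pattern by hand (an assumption of this sketch), the marking is a brute-force scan rather than the transducer A3, and both function names are hypothetical.

```python
import re

def mark_ends(s, pattern, marker):
    """Insert `marker` after each position where a match of `pattern` ends."""
    rx = re.compile(pattern)
    out = []
    for end in range(1, len(s) + 1):
        out.append(s[end - 1])
        if any(rx.fullmatch(s, start, end) for start in range(end)):
            out.append(marker)
    return "".join(out)

def mark_begins(s, reversed_pattern, marker="#"):
    """Begin markers for a pattern, computed via its hand-reversed form."""
    return mark_ends(s[::-1], reversed_pattern, marker)[::-1]

# Search pattern a+b+c, reversed by hand to cb+a+.
print(mark_begins("aabbcdabcabd", "cb+a+"))  # prints #a#abbcd#abcabd
```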
Greedy Semantics
Goal: $S^{+}_{r \to \omega}$
Example: $aaa^{+}_{a^+ \to b} = \{b\}$
Challenge: Look-ahead longest match
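Most regex libraries expose both semantics through quantifier syntax, which gives a quick check of the two examples. The snippet uses Python's `re` module as a stand-in for the replacement operators in the talk.

```python
import re

word = "aaa"

# Greedy: a+ consumes the longest run, so there is a single replacement.
greedy = re.sub(r"a+", "b", word)

# Reluctant: a+? stops after one character, so each 'a' is replaced separately.
reluctant = re.sub(r"a+?", "b", word)

print(greedy, reluctant)  # prints b bbb
```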
Step 1: Begin Marker
Step 2: ND End Marker
Step 3: Pairing Markers
Step 4: Checking Match
Step 5: Check Longest
Step 6: Replacement
Search Pattern: a+ → x
Input Word: aabab
After Begin Marker: #a#ab#ab
Intermediate strings shown on the slide: #a#a$b#ab, #a$#a$b#a$b, #a#a$b#a$b, #aa$b#a$b, #a#ab#a$b, #aaba$b
Output Word: xbxb
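The net effect of the six steps can be sketched as a leftmost-longest scan: at each position collect every end that really closes a match, keep the longest, and replace. This collapses the marker-based pipeline into a direct loop, so it illustrates the result rather than the transducer construction; it assumes the pattern never matches the empty string, and `greedy_replace` is a hypothetical name.

```python
import re

def greedy_replace(s, pattern, repl):
    rx = re.compile(pattern)
    out, i = [], 0
    while i < len(s):
        # Candidate ends paired with begin position i that form a real match.
        ends = [j for j in range(i + 1, len(s) + 1) if rx.fullmatch(s, i, j)]
        if ends:
            out.append(repl)      # replace the longest match
            i = max(ends)
        else:
            out.append(s[i])      # no match begins here; copy the symbol
            i += 1
    return "".join(out)

print(greedy_replace("aabab", "a+", "x"))  # prints xbxb
```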
Applications
Solve String Constraints
Login Servlet
[Equation: string constraint over the inputs x and y; each input has single quotes filtered and is restricted to the range [0,16]; the fragments SELECT ... WHERE, uname, AND, pwd, and OR appear in the constraint]
Input: user name
After filtering single quote and length restriction
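The interaction between the two sanitization steps can be illustrated with a small sketch. It assumes the filter doubles each single quote and then keeps the first 16 characters; the slide only says that single quotes are filtered and the length is restricted, so the exact filter, the `sanitize` and `build_query` helpers, and the query text are hypothetical.

```python
def sanitize(value, limit=16):
    """Assumed filter: double every single quote, then truncate to `limit`."""
    return value.replace("'", "''")[:limit]

def build_query(uname, pwd):
    # Query shape follows the fragments on the slide; the elided part stays elided.
    return ("SELECT ... WHERE uname='" + sanitize(uname)
            + "' AND pwd='" + sanitize(pwd) + "'")

# Fifteen letters plus one quote: doubling yields 17 characters, and the
# truncation cuts the second quote, leaving an unbalanced one.
tricky = "a" * 15 + "'"
print(sanitize(tricky))
print(build_query(tricky, "secret"))
```

Finding inputs like `tricky` automatically is the kind of question a string constraint solver is meant to answer.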
Solving Atomic Constraint
Goal: $S_{r \to \omega} \equiv P$
Steps:
  1. Compose A1 with Id(P)
  2. Project to Input Tape
  3. Solution
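For intuition about what a solution set looks like, the constraint can be solved on tiny instances by bounded enumeration. This swaps the transducer composition and projection for an exhaustive search over short words, uses Python's greedy `re.sub` as the replacement operator, and treats P as a regular expression; `solve_atomic`, the alphabet, and the length bound are all assumptions of the sketch.

```python
import re
from itertools import product

def solve_atomic(pattern, repl, target, alphabet="ab", max_len=3):
    """Words S (up to max_len) whose replaced form matches `target` entirely."""
    rx_target = re.compile(target)
    solutions = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            candidate = "".join(letters)
            if rx_target.fullmatch(re.sub(pattern, repl, candidate)):
                solutions.add(candidate)
    return solutions

# Which words turn into exactly "bb" after replacing runs of 'a' with 'b'?
print(sorted(solve_atomic("a+", "b", "bb")))
```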
SUSHI Constraint Solver
Solves Simple Linear String Constraints (SISE)
Relies on
  dk.brics.automaton for FSA operations
  Self-made Java package for FST operations
Supports 16-bit Unicode
Compact Transition Representation
[Figure: transition types Type I, Type II, Type III, with the pairs (I,I), (II,I), (III,II)]
Efficiency of Solver
Benchmark Equations
[Table: four benchmark equations, numbered 1-4, over the variable x and the letters a and b with repetition bounds {n,n} and {2n,2n}; equations 3 and 4 also contain *]
Login Servlet
  1.4 Seconds on 2 GHz PC
Flex SDK XSS Attack
  Equation Size: 565
  74 Seconds
  Shorter than Security Track #1022748
Related Work
Forward String Analysis
  Christensen & Møller [SAS'03]
  Wasserman & Su [PLDI'07, ICSE'08]
  Bjørner & Tillmann [TACAS'09]
Backward String Analysis
  Kiezun & Ganesh [ISSTA'09]
  Yu & Bultan [SPIN'08, ASE'09]
  Fu [COMPSAC'07, TAVWEB'08]
Natural Language Processing
  * Kaplan and Kay [CL'1994]
Our Contribution: Precise Modeling of Various Regular Substitution Semantics
Limitations
SISE String Constraints
  All Variables Appear on LHS (Once)
  No Easy Solution for Equation System Yet
  No string length
Future Directions
  Encoding string length in automata
  Finite model on bit-vector
Questions?