Datalog-Based Program Analysis
Transcript of Datalog-Based Program Analysis
1. Motivation2. Introduction to Datalog3. Pointer Analysis via Datalog4. Taint Analysis via Datalog
Tian Tan @ Nanjing University 2
Contents
Contents
1. Motivation2. Introduction to Datalog3. Pointer Analysis via Datalog4. Taint Analysis via Datalog
Tian Tan @ Nanjing University 3
Imperative vs Declarative
Goal: select adults from a set of personsโข Imperative: how to do (~implementation)
Tian Tan @ Nanjing University 5
Set<Person> selectAdults(Set<Person> persons) {Set<Person> result = new HashSet<>();for (Person person : persons)if (person.getAge() >= 18)result.add(person);
return result;}
Imperative vs Declarative
Goal: select adults from a set of personsโข Imperative: how to do (~implementation)
โข Declarative: what to do (~specification)
Tian Tan @ Nanjing University 6
SELECT * FROM Persons WHERE Age >= 18;
Set<Person> selectAdults(Set<Person> persons) {Set<Person> result = new HashSet<>();for (Person person : persons)if (person.getAge() >= 18)result.add(person);
return result;}
How to Implement Program Analyses?
Tian Tan @ Nanjing University 7
Kind Statement Rule
New i: x = new T() ๐๐๐๐ โ ๐๐๐๐(๐ฅ๐ฅ)
Assign x = y ๐๐๐๐ โ ๐๐๐๐(๐ฆ๐ฆ)๐๐๐๐ โ ๐๐๐๐(๐ฅ๐ฅ)
Store x.f = y ๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ , ๐๐๐๐ โ ๐๐๐๐ ๐ฆ๐ฆ๐๐๐๐ โ ๐๐๐๐(๐๐๐๐ .๐๐)
Load y = x.f ๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ , ๐๐๐๐ โ ๐๐๐๐ ๐๐๐๐ .๐๐๐๐๐๐ โ ๐๐๐๐(๐ฆ๐ฆ)
Call l: r = x.k(a1,โฆ,an)
๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ , ๐๐ = Dispatch(๐๐๐๐ , k)๐๐๐ข๐ข โ ๐๐๐๐ ๐๐๐๐ , 1 โค ๐๐ โค ๐๐
๐๐๐ฃ๐ฃ โ ๐๐๐๐(๐๐๐๐๐๐๐๐)๐๐๐๐ โ ๐๐๐๐(๐๐๐๐๐ก๐๐๐ก๐ก)
๐๐๐ข๐ข โ ๐๐๐๐ ๐๐๐๐๐๐ , 1 โค ๐๐ โค ๐๐๐๐๐ฃ๐ฃ โ ๐๐๐๐(๐๐)
Specification
Pointer Analysis, Imperative Implementation
Tian Tan @ Nanjing University 8
Solve(๐๐๐๐๐๐๐๐๐๐๐๐)WL=[ ],PFG={},S={},RM={},CG={}AddReachable(๐๐๐๐๐๐๐๐๐๐๐๐)while WL is not empty do
remove ๐๐, ๐๐๐๐๐๐ from WLฮ = pts โ pt(n)Propagate(n, ฮ)if n represents a variable x then
foreach ๐๐๐๐ โ ฮ doforeach x.f = y โ ๐๐ do
AddEdge(y, ๐๐๐๐ . ๐๐)foreach y = x.f โ ๐๐ do
AddEdge(๐๐๐๐ . ๐๐, y)ProcessCall(x, ๐๐๐๐)
AddReachable(m)if m โ RM then
add m to RM๐๐ โช= ๐๐๐๐foreach i: x = new T() โ ๐๐๐๐ do
add ๐ฅ๐ฅ, {๐๐๐๐} to WLforeach x = y โ ๐๐๐๐ do
AddEdge(y, x)
AddEdge(s, t)if s โ t โ PFG then
add s โ t to PFGif ๐๐๐๐(๐๐) is not empty then
add ๐๐, ๐๐๐๐(๐๐) to WL
ProcessCall(x, ๐๐๐๐)foreach l: r = x.k(a1,โฆ,an) โ ๐๐ do๐๐ = Dispatch(๐๐๐๐, k)add ๐๐๐๐๐ก๐๐๐ก๐ก, {๐๐๐๐} to WLif l โ ๐๐ โ CG then
add l โ ๐๐ to CGAddReachable(๐๐)foreach parameter ๐๐๐๐ of ๐๐ do
AddEdge(๐๐๐๐, ๐๐๐๐)AddEdge(๐๐๐๐๐๐๐๐, ๐๐)
Propagate(n, pts)if pts is not empty then
pt(n) โ= ptsforeach n โ s โ PFG do
add ๐๐, pts to WL
Pointer Analysis, Imperative Implementation
Tian Tan @ Nanjing University 9
Solve(๐๐๐๐๐๐๐๐๐๐๐๐)WL=[ ],PFG={},S={},RM={},CG={}AddReachable(๐๐๐๐๐๐๐๐๐๐๐๐)while WL is not empty do
remove ๐๐, ๐๐๐๐๐๐ from WLฮ = pts โ pt(n)Propagate(n, ฮ)if n represents a variable x then
foreach ๐๐๐๐ โ ฮ doforeach x.f = y โ ๐๐ do
AddEdge(y, ๐๐๐๐ . ๐๐)foreach y = x.f โ ๐๐ do
AddEdge(๐๐๐๐ . ๐๐, y)ProcessCall(x, ๐๐๐๐)
AddReachable(m)if m โ RM then
add m to RM๐๐ โช= ๐๐๐๐foreach i: x = new T() โ ๐๐๐๐ do
add ๐ฅ๐ฅ, {๐๐๐๐} to WLforeach x = y โ ๐๐๐๐ do
AddEdge(y, x)
AddEdge(s, t)if s โ t โ PFG then
add s โ t to PFGif ๐๐๐๐(๐๐) is not empty then
add ๐๐, ๐๐๐๐(๐๐) to WL
ProcessCall(x, ๐๐๐๐)foreach l: r = x.k(a1,โฆ,an) โ ๐๐ do๐๐ = Dispatch(๐๐๐๐, k)add ๐๐๐๐๐ก๐๐๐ก๐ก, {๐๐๐๐} to WLif l โ ๐๐ โ CG then
add l โ ๐๐ to CGAddReachable(๐๐)foreach parameter ๐๐๐๐ of ๐๐ do
AddEdge(๐๐๐๐, ๐๐๐๐)AddEdge(๐๐๐๐๐๐๐๐, ๐๐)
Propagate(n, pts)if pts is not empty then
pt(n) โ= ptsforeach n โ s โ PFG do
add ๐๐, pts to WL
โข How to implement worklist?โข Array list or linked list?โข Which worklist entry should be
processed first?
Pointer Analysis, Imperative Implementation
Tian Tan @ Nanjing University 10
Solve(๐๐๐๐๐๐๐๐๐๐๐๐)WL=[ ],PFG={},S={},RM={},CG={}AddReachable(๐๐๐๐๐๐๐๐๐๐๐๐)while WL is not empty do
remove ๐๐, ๐๐๐๐๐๐ from WLฮ = pts โ pt(n)Propagate(n, ฮ)if n represents a variable x then
foreach ๐๐๐๐ โ ฮ doforeach x.f = y โ ๐๐ do
AddEdge(y, ๐๐๐๐ . ๐๐)foreach y = x.f โ ๐๐ do
AddEdge(๐๐๐๐ . ๐๐, y)ProcessCall(x, ๐๐๐๐)
AddReachable(m)if m โ RM then
add m to RM๐๐ โช= ๐๐๐๐foreach i: x = new T() โ ๐๐๐๐ do
add ๐ฅ๐ฅ, {๐๐๐๐} to WLforeach x = y โ ๐๐๐๐ do
AddEdge(y, x)
AddEdge(s, t)if s โ t โ PFG then
add s โ t to PFGif ๐๐๐๐(๐๐) is not empty then
add ๐๐, ๐๐๐๐(๐๐) to WL
ProcessCall(x, ๐๐๐๐)foreach l: r = x.k(a1,โฆ,an) โ ๐๐ do๐๐ = Dispatch(๐๐๐๐, k)add ๐๐๐๐๐ก๐๐๐ก๐ก, {๐๐๐๐} to WLif l โ ๐๐ โ CG then
add l โ ๐๐ to CGAddReachable(๐๐)foreach parameter ๐๐๐๐ of ๐๐ do
AddEdge(๐๐๐๐, ๐๐๐๐)AddEdge(๐๐๐๐๐๐๐๐, ๐๐)
Propagate(n, pts)if pts is not empty then
pt(n) โ= ptsforeach n โ s โ PFG do
add ๐๐, pts to WL
โข How to implement worklist?โข Array list or linked list?โข Which worklist entry should be
processed first?โข How to implement points-to set (pt)?
โข Hash set or bit vector?
Pointer Analysis, Imperative Implementation
Tian Tan @ Nanjing University 11
Solve(๐๐๐๐๐๐๐๐๐๐๐๐)WL=[ ],PFG={},S={},RM={},CG={}AddReachable(๐๐๐๐๐๐๐๐๐๐๐๐)while WL is not empty do
remove ๐๐, ๐๐๐๐๐๐ from WLฮ = pts โ pt(n)Propagate(n, ฮ)if n represents a variable x then
foreach ๐๐๐๐ โ ฮ doforeach x.f = y โ ๐๐ do
AddEdge(y, ๐๐๐๐ . ๐๐)foreach y = x.f โ ๐๐ do
AddEdge(๐๐๐๐ . ๐๐, y)ProcessCall(x, ๐๐๐๐)
AddReachable(m)if m โ RM then
add m to RM๐๐ โช= ๐๐๐๐foreach i: x = new T() โ ๐๐๐๐ do
add ๐ฅ๐ฅ, {๐๐๐๐} to WLforeach x = y โ ๐๐๐๐ do
AddEdge(y, x)
AddEdge(s, t)if s โ t โ PFG then
add s โ t to PFGif ๐๐๐๐(๐๐) is not empty then
add ๐๐, ๐๐๐๐(๐๐) to WL
ProcessCall(x, ๐๐๐๐)foreach l: r = x.k(a1,โฆ,an) โ ๐๐ do๐๐ = Dispatch(๐๐๐๐, k)add ๐๐๐๐๐ก๐๐๐ก๐ก, {๐๐๐๐} to WLif l โ ๐๐ โ CG then
add l โ ๐๐ to CGAddReachable(๐๐)foreach parameter ๐๐๐๐ of ๐๐ do
AddEdge(๐๐๐๐, ๐๐๐๐)AddEdge(๐๐๐๐๐๐๐๐, ๐๐)
Propagate(n, pts)if pts is not empty then
pt(n) โ= ptsforeach n โ s โ PFG do
add ๐๐, pts to WL
โข How to implement worklist?โข Array list or linked list?โข Which worklist entry should be
processed first?โข How to implement points-to set (pt)?
โข Hash set or bit vector?โข How to connect PFG nodes and pointers?
Pointer Analysis, Imperative Implementation
Tian Tan @ Nanjing University 12
Solve(๐๐๐๐๐๐๐๐๐๐๐๐)WL=[ ],PFG={},S={},RM={},CG={}AddReachable(๐๐๐๐๐๐๐๐๐๐๐๐)while WL is not empty do
remove ๐๐, ๐๐๐๐๐๐ from WLฮ = pts โ pt(n)Propagate(n, ฮ)if n represents a variable x then
foreach ๐๐๐๐ โ ฮ doforeach x.f = y โ ๐๐ do
AddEdge(y, ๐๐๐๐ . ๐๐)foreach y = x.f โ ๐๐ do
AddEdge(๐๐๐๐ . ๐๐, y)ProcessCall(x, ๐๐๐๐)
AddReachable(m)if m โ RM then
add m to RM๐๐ โช= ๐๐๐๐foreach i: x = new T() โ ๐๐๐๐ do
add ๐ฅ๐ฅ, {๐๐๐๐} to WLforeach x = y โ ๐๐๐๐ do
AddEdge(y, x)
AddEdge(s, t)if s โ t โ PFG then
add s โ t to PFGif ๐๐๐๐(๐๐) is not empty then
add ๐๐, ๐๐๐๐(๐๐) to WL
ProcessCall(x, ๐๐๐๐)foreach l: r = x.k(a1,โฆ,an) โ ๐๐ do๐๐ = Dispatch(๐๐๐๐, k)add ๐๐๐๐๐ก๐๐๐ก๐ก, {๐๐๐๐} to WLif l โ ๐๐ โ CG then
add l โ ๐๐ to CGAddReachable(๐๐)foreach parameter ๐๐๐๐ of ๐๐ do
AddEdge(๐๐๐๐, ๐๐๐๐)AddEdge(๐๐๐๐๐๐๐๐, ๐๐)
Propagate(n, pts)if pts is not empty then
pt(n) โ= ptsforeach n โ s โ PFG do
add ๐๐, pts to WL
โข How to implement worklist?โข Array list or linked list?โข Which worklist entry should be
processed first?โข How to implement points-to set (pt)?
โข Hash set or bit vector?โข How to connect PFG nodes and pointers?โข How to associate variables to the
relevant statements?
Pointer Analysis, Imperative Implementation
Tian Tan @ Nanjing University 13
Solve(๐๐๐๐๐๐๐๐๐๐๐๐)WL=[ ],PFG={},S={},RM={},CG={}AddReachable(๐๐๐๐๐๐๐๐๐๐๐๐)while WL is not empty do
remove ๐๐, ๐๐๐๐๐๐ from WLฮ = pts โ pt(n)Propagate(n, ฮ)if n represents a variable x then
foreach ๐๐๐๐ โ ฮ doforeach x.f = y โ ๐๐ do
AddEdge(y, ๐๐๐๐ . ๐๐)foreach y = x.f โ ๐๐ do
AddEdge(๐๐๐๐ . ๐๐, y)ProcessCall(x, ๐๐๐๐)
AddReachable(m)if m โ RM then
add m to RM๐๐ โช= ๐๐๐๐foreach i: x = new T() โ ๐๐๐๐ do
add ๐ฅ๐ฅ, {๐๐๐๐} to WLforeach x = y โ ๐๐๐๐ do
AddEdge(y, x)
AddEdge(s, t)if s โ t โ PFG then
add s โ t to PFGif ๐๐๐๐(๐๐) is not empty then
add ๐๐, ๐๐๐๐(๐๐) to WL
ProcessCall(x, ๐๐๐๐)foreach l: r = x.k(a1,โฆ,an) โ ๐๐ do๐๐ = Dispatch(๐๐๐๐, k)add ๐๐๐๐๐ก๐๐๐ก๐ก, {๐๐๐๐} to WLif l โ ๐๐ โ CG then
add l โ ๐๐ to CGAddReachable(๐๐)foreach parameter ๐๐๐๐ of ๐๐ do
AddEdge(๐๐๐๐, ๐๐๐๐)AddEdge(๐๐๐๐๐๐๐๐, ๐๐)
Propagate(n, pts)if pts is not empty then
pt(n) โ= ptsforeach n โ s โ PFG do
add ๐๐, pts to WL
โข How to implement worklist?โข Array list or linked list?โข Which worklist entry should be
processed first?โข How to implement points-to set (pt)?
โข Hash set or bit vector?โข How to connect PFG nodes and pointers?โข How to associate variables to the
relevant statements?โข โฆ
So many implementation details
Pointer Analysis, Declarative Implementation (via Datalog)
Tian Tan @ Nanjing University 14
VarPointsTo(this, o),Reachable(m),CallGraph(l, m) <-
VCall(l, x, k),VarPointsTo(x, o),Dispatch(o, k, m),ThisVar(m, this).
VarPointsTo(pi, o) <-CallGraph(l, m),Argument(l, i, ai),Parameter(m, i, pi),VarPointsTo(ai, o).
VarPointsTo(r, o) <-CallGraph(l, m),MethodReturn(m, ret),VarPointsTo(ret, o),CallReturn(l, r),
VarPointsTo(x, o) <-Reachable(m),New(x, o, m).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
VarPointsTo(x, o) <-Reachable(m),New(x, o, m).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
Pointer Analysis, Declarative Implementation (via Datalog)
Tian Tan @ Nanjing University 15
VarPointsTo(this, o),Reachable(m),CallGraph(l, m) <-
VCall(l, x, k),VarPointsTo(x, o),Dispatch(o, k, m),ThisVar(m, this).
VarPointsTo(pi, o) <-CallGraph(l, m),Argument(l, i, ai),Parameter(m, i, pi),VarPointsTo(ai, o).
VarPointsTo(r, o) <-CallGraph(l, m),MethodReturn(m, ret),VarPointsTo(ret, o),CallReturn(l, r),โข Succinct
โข Readable (logic-based specification)โข Easy to implement
1. Motivation2. Introduction to Datalog3. Pointer Analysis via Datalog4. Taint Analysis via Datalog
Tian Tan @ Nanjing University 16
Contents
Datalog
โข Datalog is a declarative logic programming language that is a subset of Prolog.
โข It emerged as a database language (mid-1980s)*โข Now it has a variety of applications
โข Program analysisโข Declarative networkingโข Big dataโข Cloud computingโข โฆ
Tian Tan @ Nanjing University 17
David Maier, K. Tuncay Tekle, Michael Kifer, and David S. Warren, โDatalog: Concepts, History, and Outlookโ. Chapter, 2018.
*
Datalog
โข No side-effectsโข No control flowsโข No functionsโข Not Turing-complete
Tian Tan @ Nanjing University 18
Datalog = Data + Logic(and, or, not)
Datalog
โข No side-effectsโข No control flowsโข No functionsโข Not Turing-complete
Tian Tan @ Nanjing University 19
Datalog = Data + Logic(and, or, not)
Predicates (Data)
โข In Datalog, a predicate (relation) is a set of statementsโข Essentially, a predicate is a table of data
Tian Tan @ Nanjing University 20
person age
Xiaoming 18
Xiaohong 23
Alan 16
Abao 31
Age Age is a predicate, which states the age of some persons.
Predicates (Data)
โข In Datalog, a predicate (relation) is a set of statementsโข Essentially, a predicate is a table of dataโข A fact asserts that a particular tuple (a row) belongs
to a relation (a table), i.e., it represents a predicate being true for a particular combination of values
Tian Tan @ Nanjing University 21
person age
Xiaoming 18
Xiaohong 23
Alan 16
Abao 31
Age is a predicate, which states the age of some persons. For Age:โข (โXiaomingโ,18) means
โXiaoming is 18โ, which is a factโข (โAbaoโ,23) means โAbao is 23โ,
which is not a fact
Age
Atoms
โข Atoms are basic elements of Datalog, which represent predicates of the form
โข Termsโข Variables: stand for any valuesโข Constants
Tian Tan @ Nanjing University 22
P(X1,X2,โฆ,Xn)
Name of predicate Arguments (terms)
Atoms
โข Atoms are basic elements of Datalog, which represent predicates of the form
โข Termsโข Variables: stand for any valuesโข Constants
โข Examplesโข Age(person,age)โข Age(โXiaomingโ,18)
Tian Tan @ Nanjing University 23
P(X1,X2,โฆ,Xn)
Name of predicate Arguments (terms)
Atoms (Cont.)
โข P(X1,X2,โฆ,Xn) is called relational atomโข P(X1,X2,โฆ,Xn) evaluates to true when predicate P
contains the tuple described by X1,X2,โฆ,Xn
Tian Tan @ Nanjing University 24
Atoms (Cont.)
โข P(X1,X2,โฆ,Xn) is called relational atomโข P(X1,X2,โฆ,Xn) evaluates to true when predicate P
contains the tuple described by X1,X2,โฆ,Xnโข Age(โXiaomingโ,18) is
Tian Tan @ Nanjing University 25
person age
Xiaoming 18
Xiaohong 23
Alan 16
Abao 31
Age
Atoms (Cont.)
โข P(X1,X2,โฆ,Xn) is called relational atomโข P(X1,X2,โฆ,Xn) evaluates to true when predicate P
contains the tuple described by X1,X2,โฆ,Xnโข Age(โXiaomingโ,18) is trueโข Age(โAlanโ,23) is
Tian Tan @ Nanjing University 26
person age
Xiaoming 18
Xiaohong 23
Alan 16
Abao 31
Age
Atoms (Cont.)
โข P(X1,X2,โฆ,Xn) is called relational atomโข P(X1,X2,โฆ,Xn) evaluates to true when predicate P
contains the tuple described by X1,X2,โฆ,Xnโข Age(โXiaomingโ,18) is trueโข Age(โAlanโ,23) is false
Tian Tan @ Nanjing University 27
person age
Xiaoming 18
Xiaohong 23
Alan 16
Abao 31
Age
Atoms (Cont.)
โข P(X1,X2,โฆ,Xn) is called relational atomโข P(X1,X2,โฆ,Xn) evaluates to true when predicate P
contains the tuple described by X1,X2,โฆ,Xnโข Age(โXiaomingโ,18) is trueโข Age(โAlanโ,23) is false
โข In addition to relational atoms, Datalog also has arithmetic atoms
โข E.g., age >= 18Tian Tan @ Nanjing University 28
person age
Xiaoming 18
Xiaohong 23
Alan 16
Abao 31
Age
Datalog Rules (Logic)
โข Rule is a way of expressing logical inferencesโข Rules also serve to specify how facts are deducedโข The form of a rule is
Tian Tan @ Nanjing University 29
H <- B1,B2,โฆ,Bn.
Datalog Rules (Logic)
โข Rule is a way of expressing logical inferencesโข Rules also serve to specify how facts are deducedโข The form of a rule is
Tian Tan @ Nanjing University 30
H <- B1,B2,โฆ,Bn.
Head (consequent)H is an atom
Body (antecedent)Bi is a (possibly negated) atomEach Bi is called a subgoal
The meaning of a rule is โhead is true if body is trueโ
Datalog Rules (Cont.)
โ,โ can be read as (logical) and, i.e., body B1,B2,โฆ,Bnis true if all subgoals B1, B2, โฆ, and Bn are true
For example, we can deduce adults via Datalog rule:
Tian Tan @ Nanjing University 31
H <- B1,B2,โฆ,Bn.
Adult(person) <-Age(person,age),age >= 18.
Datalog Rules (Cont.)
โ,โ can be read as (logical) and, i.e., body B1,B2,โฆ,Bnis true if all subgoals B1, B2, โฆ, and Bn are true
For example, we can deduce adults via Datalog rule:
Tian Tan @ Nanjing University 32
H <- B1,B2,โฆ,Bn.
Adult(person) <-Age(person,age),age >= 18.
How to interpret the rules?
Interpretation of Datalog Rules
โข Consider all possible combinations of values of the variables in the subgoals
โข If a combination makes all subgoals true, then the head atom (with corresponding values) is also true
โข The head predicate consists of all true atoms
Tian Tan @ Nanjing University 33
H(X1,X2) <- B1(X1,X3),B2(X2,X4),โฆ,Bn(Xm).
Rule Interpretation: An Example
Tian Tan @ Nanjing University 34
person age
Xiaoming 18
Xiaohong 23
Alan 16
Abao 31
AgeAdult(person) <-Age(person,age),age >= 18.
โข Consider all possible combinations of values of the variables in the subgoals
โข If a combination makes all subgoals true, then the head atom (with corresponding values) is also true
โข The head predicate consists of all true atoms
Rule Interpretation: An Example
Tian Tan @ Nanjing University 35
person age
Xiaoming 18
Xiaohong 23
Alan 16
Abao 31
AgeAdult(person) <-Age(person,age),age >= 18.
Age(โXiaomingโ,18),18>=18.Adult(โXiaomingโ) <-
โข Consider all possible combinations of values of the variables in the subgoals
โข If a combination makes all subgoals true, then the head atom (with corresponding values) is also true
โข The head predicate consists of all true atoms
Rule Interpretation: An Example
Tian Tan @ Nanjing University 36
person age
Xiaoming 18
Xiaohong 23
Alan 16
Abao 31
AgeAdult(person) <-Age(person,age),age >= 18.
Age(โXiaomingโ,18),18>=18.Age(โXiaohongโ,23),23>=18.
Adult(โXiaomingโ) <-Adult(โXiaohongโ) <-
โข Consider all possible combinations of values of the variables in the subgoals
โข If a combination makes all subgoals true, then the head atom (with corresponding values) is also true
โข The head predicate consists of all true atoms
Rule Interpretation: An Example
Tian Tan @ Nanjing University 37
person age
Xiaoming 18
Xiaohong 23
Alan 16
Abao 31
AgeAdult(person) <-Age(person,age),age >= 18.
Age(โXiaomingโ,18),18>=18.Age(โXiaohongโ,23),23>=18.Age(โAlanโ,16),16>=18.
Adult(โXiaomingโ) <-Adult(โXiaohongโ) <-
โข Consider all possible combinations of values of the variables in the subgoals
โข If a combination makes all subgoals true, then the head atom (with corresponding values) is also true
โข The head predicate consists of all true atoms
Rule Interpretation: An Example
Tian Tan @ Nanjing University 38
person age
Xiaoming 18
Xiaohong 23
Alan 16
Abao 31
AgeAdult(person) <-Age(person,age),age >= 18.
Age(โXiaomingโ,18),18>=18.Age(โXiaohongโ,23),23>=18.Age(โAlanโ,16),16>=18.Age(โAbaoโ,31),31>=18.
Adult(โXiaomingโ) <-Adult(โXiaohongโ) <-
Adult(โAbaoโ) <-
โข Consider all possible combinations of values of the variables in the subgoals
โข If a combination makes all subgoals true, then the head atom (with corresponding values) is also true
โข The head predicate consists of all true atoms
Rule Interpretation: An Example
Tian Tan @ Nanjing University 39
โข Consider all possible combinations of values of the variables in the subgoals
โข If a combination makes all subgoals true, then the head atom (with corresponding values) is also true
โข The head predicate consists of all true atoms
person age
Xiaoming 18
Xiaohong 23
Alan 16
Abao 31
AgeAdult(person) <-Age(person,age),age >= 18.
person
Xiaoming
Xiaohong
Abao
AdultDatalog program = Facts + Rules
Rule Interpretation: An Example
Tian Tan @ Nanjing University 40
โข Consider all possible combinations of values of the variables in the subgoals
โข If a combination makes all subgoals true, then the head atom (with corresponding values) is also true
โข The head predicate consists of all true atoms
person age
Xiaoming 18
Xiaohong 23
Alan 16
Abao 31
AgeAdult(person) <-Age(person,age),age >= 18.
person
Xiaoming
Xiaohong
Abao
AdultDatalog program = Facts + Rules
Where does initial data come from?
EDB and IDB PredicatesConventionally, predicates in Datalog are divided into two kinds:1. EDB (extensional database)
โข The predicates that are defined in a prioriโข Relations are immutableโข Can be seen as input relations
2. IDB (intensional database)โข The predicates that are established only by rulesโข Relations are inferred by rulesโข Can be seen as output relations
Tian Tan @ Nanjing University 41
EDB and IDB PredicatesConventionally, predicates in Datalog are divided into two kinds:1. EDB (extensional database)
โข The predicates that are defined in a prioriโข Relations are immutableโข Can be seen as input relations
2. IDB (intensional database)โข The predicates that are established only by rulesโข Relations are inferred by rulesโข Can be seen as output relations
Tian Tan @ Nanjing University 42
H <- B1,B2,โฆ,Bn.
โข H can only be IDBโข Bi can be EDB or IDB
Logical Or
There are two ways to express logical or in Datalog1. Write multiple rules with the same head
2. Use logical or operator โ;โ
Tian Tan @ Nanjing University 43
person hobby
Xiaoming cooking
Xiaoming singing
Xiaohong jogging
Abao sleeping
Alan swimming
โฆ โฆ
HobbySportFan(person) <- Hobby(person, โjoggingโ).SportFan(person) <- Hobby(person,โswimmingโ).
SportFan(person) <-Hobby(person,โjoggingโ);Hobby(person,โswimmingโ).
Logical Or
There are two ways to express logical or in Datalog1. Write multiple rules with the same head
2. Use logical or operator โ;โ
Tian Tan @ Nanjing University 44
person hobby
Xiaoming cooking
Xiaoming singing
Xiaohong jogging
Abao sleeping
Alan swimming
โฆ โฆ
Hobby
The precedence of โ;โ (or) is lower than โ,โ (and), so disjunctions may be enclosed by parentheses, e.g., H <- A,(B;C).
SportFan(person) <- Hobby(person, โjoggingโ).SportFan(person) <- Hobby(person,โswimmingโ).
SportFan(person) <-Hobby(person,โjoggingโ);Hobby(person,โswimmingโ).
Negation
โข In Datalog rules, a subgoal can be a negated atom, which negates its meaning
โข Negated subgoal is written as !B(โฆ), and read as not B(โฆ)
Tian Tan @ Nanjing University 45
H(X1,X2) <- B1(X1,X3),!B2(X2,X4),โฆ,Bn(Xm).
Negation
โข In Datalog rules, a subgoal can be a negated atom, which negates its meaning
โข Negated subgoal is written as !B(โฆ), and read as not B(โฆ)
โข For example, to compute the students who need to take a make-up exam, we can write
Tian Tan @ Nanjing University 46
H(X1,X2) <- B1(X1,X3),!B2(X2,X4),โฆ,Bn(Xm).
MakeupExamStd(student) <-Student(student),!PassedStd(student).
Where Student stores all students, and PassedStd stores the students who passed the exam.
Recursion
โข Datalog supports recursive rules, which allows that an IDB predicate can be deduced (directly/indirectly) from itself
Tian Tan @ Nanjing University 47
Recursion
โข Datalog supports recursive rules, which allows that an IDB predicate can be deduced (directly/indirectly) from itself
โข For example, we can compute the reachability information (i.e., transitive closure) of a graph with recursive rules:
Tian Tan @ Nanjing University 48
Reach(from, to) <-Edge(from, to).
Reach(from, to) <-Reach(from, node),Edge(node, to).
Where Edge(a,b) means that the graph has an edge from node a to node b,and Reach(a,b) means that b is reachable from a.
Recursion (Cont.)
โข Without recursion, Datalog can only express the queries of basic relational algebra
โข Basically a SQL with SELECT-FROM-WHERE
โข With recursion, Datalog becomes much more powerful, and is able to express sophisticated program analyses, such as pointer analysis
Tian Tan @ Nanjing University 49
Rule Safety
Are these rules ok?
Tian Tan @ Nanjing University 50
A(x) <- B(y), x > y. A(x) <- B(y), !C(x,y).
Rule Safety
Are these rules ok?
Tian Tan @ Nanjing University 51
A(x) <- B(y), x > y. A(x) <- B(y), !C(x,y).
For both rules, infinite values of x can satisfy the rule,which makes A an infinite relation.
Rule Safety
Are these rules ok?
โข A rule is safe if every variable appears in at least one non-negated relational atom
โข Above two rules are unsafeโข In Datalog, only safe rules are allowed
Tian Tan @ Nanjing University 52
A(x) <- B(y), x > y. A(x) <- B(y), !C(x,y).
For both rules, infinite values of x can satisfy the rule,which makes A an infinite relation.
Recursion and Negation
Is this rule ok?
Tian Tan @ Nanjing University 54
A(x) <- B(x), !A(x)
Suppose B(1) is true.If A(1) is false, then A(1) is true.If A(1) is true, A(1) should not be true.โฆ
Recursion and Negation
Is this rule ok?
The rule is contradictory and makes no sense
In Datalog, recursion and negation of an atom must be separated. Otherwise, the rules may contain contradiction and the inference fails to converge.
Tian Tan @ Nanjing University 55
A(x) <- B(x), !A(x)
Suppose B(1) is true.If A(1) is false, then A(1) is true.If A(1) is true, A(1) should not be true.โฆ
Execution of Datalog Programs
โข Datalog engine deduces facts by given rules and EDB predicates until no new facts can be deduced. Some modern Datalog engines
LogicBlox, Soufflรฉ, XSB, Datomic, Flora-2, โฆ
Tian Tan @ Nanjing University 56
EDB
RulesIDB
Datalog engine
Execution of Datalog Programs
โข Datalog engine deduces facts by given rules and EDB predicates until no new facts can be deduced. Some modern Datalog engines
LogicBlox, Soufflรฉ, XSB, Datomic, Flora-2, โฆ
โข Monotonicity: Datalog is monotone as facts cannot be deleted
Tian Tan @ Nanjing University 57
EDB
RulesIDB
Datalog engine
Execution of Datalog Programs
โข Datalog engine deduces facts by given rules and EDB predicates until no new facts can be deduced. Some modern Datalog engines
LogicBlox, Soufflรฉ, XSB, Datomic, Flora-2, โฆ
โข Monotonicity: Datalog is monotone as facts cannot be deleted
โข Termination: A Datalog program always terminates as1) Datalog is monotone2) Possible values of IDB predicates are finite (rule safety)
Tian Tan @ Nanjing University 58
EDB
RulesIDB
Datalog engine
1. Motivation2. Introduction to Datalog3. Pointer Analysis via Datalog4. Taint Analysis via Datalog
Tian Tan @ Nanjing University 59
Contents
Pointer Analysis via Datalog
โข EDB: pointer-relevant information that can be extracted from program syntactically
โข IDB: pointer analysis resultsโข Rules: pointer analysis rules
Tian Tan @ Nanjing University 60
Pointer Analysis via Datalog
โข EDB: pointer-relevant information that can be extracted from program syntactically
โข IDB: pointer analysis resultsโข Rules: pointer analysis rules
Tian Tan @ Nanjing University 61
New x = new T()
Assign x = y
Store x.f = y
Load y = x.f
Call r = x.k(a, โฆ) Then discuss method calls later
First focus on these statements(suppose the program has just one method)
Datalog Model for Pointer Analysis
Kind Statement
New i: x = new T()
Assign x = y
Store x.f = y
Load y = x.f
Tian Tan @ Nanjing University 62
New(x : V, o : O)
Assign(x : V, y : V)
Load(y : V, x : V, f : F)
Store(x : V, f : F, y : V)
Variables: VFields: FObjects: O
VarPointsTo(v: V, o : O)
FieldPointsTo(oi : O, f: F, oj : O)
EDB
IDB
e.g., fact VarPointsTo(๐ฅ๐ฅ,๐๐๐๐) represents ๐๐๐๐ โ ๐๐๐๐(๐ฅ๐ฅ)
e.g., fact FieldsPointsTo(๐๐๐๐,๐๐,๐๐๐๐) represents ๐๐๐๐ โ ๐๐๐๐(๐๐๐๐ .๐๐)
An Example
Tian Tan @ Nanjing University 63
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
Variables: VFields: FObjects: O
An Example
Tian Tan @ Nanjing University 64
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
New(x : V, o : O)
New
๐๐ ๐๐1๐๐ ๐๐3
Variables: VFields: FObjects: O
An Example
Tian Tan @ Nanjing University 65
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
New(x : V, o : O)
Assign(x : V, y : V)
New
๐๐ ๐๐1๐๐ ๐๐3
Assign
๐๐ ๐๐๐๐ ๐๐
Variables: VFields: FObjects: O
An Example
Tian Tan @ Nanjing University 66
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
New(x : V, o : O)
Assign(x : V, y : V)
Store(x : V, f : F, y : V)
New
๐๐ ๐๐1๐๐ ๐๐3
Assign
๐๐ ๐๐๐๐ ๐๐
Store
๐๐ ๐๐ ๐๐๐๐ ๐๐ ๐๐Variables: V
Fields: FObjects: O
An Example
Tian Tan @ Nanjing University 67
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
New(x : V, o : O)
Assign(x : V, y : V)
Load(x : V, y : V, f : F)
Store(x : V, f : F, y : V)
New
๐๐ ๐๐1๐๐ ๐๐3
Assign
๐๐ ๐๐๐๐ ๐๐
Store
๐๐ ๐๐ ๐๐๐๐ ๐๐ ๐๐
Load
๐๐ ๐๐ ๐๐
Variables: VFields: FObjects: O
An Example
Tian Tan @ Nanjing University 68
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
New(x : V, o : O)
Assign(x : V, y : V)
Load(x : V, y : V, f : F)
Store(x : V, f : F, y : V)
New
๐๐ ๐๐1๐๐ ๐๐3
Assign
๐๐ ๐๐๐๐ ๐๐
Store
๐๐ ๐๐ ๐๐๐๐ ๐๐ ๐๐
Load
๐๐ ๐๐ ๐๐
Variables: VFields: FObjects: O
Datalog Rules for Pointer Analysis
Tian Tan @ Nanjing University 69
VarPointsTo(x, o) <-New(x, o).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
Kind Statement Rule
New i: x = new T() ๐๐๐๐ โ ๐๐๐๐(๐ฅ๐ฅ)
Assign x = y ๐๐๐๐ โ ๐๐๐๐(๐ฆ๐ฆ)๐๐๐๐ โ ๐๐๐๐(๐ฅ๐ฅ)
Store x.f = y๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ๐๐๐๐ โ ๐๐๐๐ ๐ฆ๐ฆ๐๐๐๐ โ ๐๐๐๐(๐๐๐๐ .๐๐)
Load y = x.f๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ๐๐๐๐ โ ๐๐๐๐ ๐๐๐๐ .๐๐๐๐๐๐ โ ๐๐๐๐(๐ฆ๐ฆ)
Datalog Rules for Pointer Analysis
Tian Tan @ Nanjing University 70
VarPointsTo(x, o) <-New(x, o).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
Kind Statement Rule
New i: x = new T() ๐๐๐๐ โ ๐๐๐๐(๐ฅ๐ฅ)
Assign x = y ๐๐๐๐ โ ๐๐๐๐(๐ฆ๐ฆ)๐๐๐๐ โ ๐๐๐๐(๐ฅ๐ฅ)
Store x.f = y๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ๐๐๐๐ โ ๐๐๐๐ ๐ฆ๐ฆ๐๐๐๐ โ ๐๐๐๐(๐๐๐๐ .๐๐)
Load y = x.f๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ๐๐๐๐ โ ๐๐๐๐ ๐๐๐๐ .๐๐๐๐๐๐ โ ๐๐๐๐(๐ฆ๐ฆ)
Datalog Rules for Pointer Analysis
Tian Tan @ Nanjing University 71
VarPointsTo(x, o) <-New(x, o).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
Kind Statement Rule
New i: x = new T() ๐๐๐๐ โ ๐๐๐๐(๐ฅ๐ฅ)
Assign x = y ๐๐๐๐ โ ๐๐๐๐(๐ฆ๐ฆ)๐๐๐๐ โ ๐๐๐๐(๐ฅ๐ฅ)
Store x.f = y๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ๐๐๐๐ โ ๐๐๐๐ ๐ฆ๐ฆ๐๐๐๐ โ ๐๐๐๐(๐๐๐๐ .๐๐)
Load y = x.f๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ๐๐๐๐ โ ๐๐๐๐ ๐๐๐๐ .๐๐๐๐๐๐ โ ๐๐๐๐(๐ฆ๐ฆ)
Datalog Rules for Pointer Analysis
Tian Tan @ Nanjing University 72
VarPointsTo(x, o) <-New(x, o).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
Kind Statement Rule
New i: x = new T() ๐๐๐๐ โ ๐๐๐๐(๐ฅ๐ฅ)
Assign x = y ๐๐๐๐ โ ๐๐๐๐(๐ฆ๐ฆ)๐๐๐๐ โ ๐๐๐๐(๐ฅ๐ฅ)
Store x.f = y๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ๐๐๐๐ โ ๐๐๐๐ ๐ฆ๐ฆ๐๐๐๐ โ ๐๐๐๐(๐๐๐๐ .๐๐)
Load y = x.f๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ๐๐๐๐ โ ๐๐๐๐ ๐๐๐๐ .๐๐๐๐๐๐ โ ๐๐๐๐(๐ฆ๐ฆ)
Datalog Rules for Pointer Analysis
Tian Tan @ Nanjing University 73
VarPointsTo(x, o) <-New(x, o).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
Kind Statement Rule
New i: x = new T() ๐๐๐๐ โ ๐๐๐๐(๐ฅ๐ฅ)
Assign x = y ๐๐๐๐ โ ๐๐๐๐(๐ฆ๐ฆ)๐๐๐๐ โ ๐๐๐๐(๐ฅ๐ฅ)
Store x.f = y๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ๐๐๐๐ โ ๐๐๐๐ ๐ฆ๐ฆ๐๐๐๐ โ ๐๐๐๐(๐๐๐๐ .๐๐)
Load y = x.f๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ๐๐๐๐ โ ๐๐๐๐ ๐๐๐๐ .๐๐๐๐๐๐ โ ๐๐๐๐(๐ฆ๐ฆ)
An Example
Tian Tan @ Nanjing University 74
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
VarPointsTo(x, o) <-New(x, o).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
New
๐๐ ๐๐1๐๐ ๐๐3
Assign
๐๐ ๐๐๐๐ ๐๐
Store
๐๐ ๐๐ ๐๐๐๐ ๐๐ ๐๐
Load
๐๐ ๐๐ ๐๐
VarPointsTo FieldPointsTo
VarPointsTo(v:V, o:O)
FieldPointsTo(oi:O, f:F, oj:O)
New(x:V, o:O)
Assign(x:V, y:V) Load(x:V, y:V, f:F)
Store(x:V, f:F, y:V)
An Example
Tian Tan @ Nanjing University 75
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
VarPointsTo(x, o) <-New(x, o).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
New
๐๐ ๐๐1๐๐ ๐๐3
Assign
๐๐ ๐๐๐๐ ๐๐
Store
๐๐ ๐๐ ๐๐๐๐ ๐๐ ๐๐
Load
๐๐ ๐๐ ๐๐
VarPointsTo
๐๐ ๐๐1๐๐ ๐๐3
FieldPointsTo
VarPointsTo(v:V, o:O)
FieldPointsTo(oi:O, f:F, oj:O)
New(x:V, o:O)
Assign(x:V, y:V) Load(x:V, y:V, f:F)
Store(x:V, f:F, y:V)
VarPointsTo(๐๐,๐๐1) <-New(๐๐,๐๐1).
VarPointsTo(๐๐,๐๐3) <-New(๐๐,๐๐3).
An Example
Tian Tan @ Nanjing University 76
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
VarPointsTo(x, o) <-New(x, o).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
New
๐๐ ๐๐1๐๐ ๐๐3
Assign
๐๐ ๐๐๐๐ ๐๐
Store
๐๐ ๐๐ ๐๐๐๐ ๐๐ ๐๐
Load
๐๐ ๐๐ ๐๐
VarPointsTo
๐๐ ๐๐1๐๐ ๐๐3๐๐ ๐๐1
FieldPointsTo
VarPointsTo(v:V, o:O)
FieldPointsTo(oi:O, f:F, oj:O)
New(x:V, o:O)
Assign(x:V, y:V) Load(x:V, y:V, f:F)
Store(x:V, f:F, y:V)
VarPointsTo(๐๐,๐๐1) <-Assign(๐๐,๐๐),VarPointsTo(๐๐,๐๐1).
An Example
Tian Tan @ Nanjing University 77
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
VarPointsTo(x, o) <-New(x, o).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
New
๐๐ ๐๐1๐๐ ๐๐3
Assign
๐๐ ๐๐๐๐ ๐๐
Store
๐๐ ๐๐ ๐๐๐๐ ๐๐ ๐๐
Load
๐๐ ๐๐ ๐๐
VarPointsTo
๐๐ ๐๐1๐๐ ๐๐3๐๐ ๐๐1๐๐ ๐๐3
FieldPointsTo
VarPointsTo(v:V, o:O)
FieldPointsTo(oi:O, f:F, oj:O)
New(x:V, o:O)
Assign(x:V, y:V) Load(x:V, y:V, f:F)
Store(x:V, f:F, y:V)
An Example
Tian Tan @ Nanjing University 78
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
VarPointsTo(x, o) <-New(x, o).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
New
๐๐ ๐๐1๐๐ ๐๐3
Assign
๐๐ ๐๐๐๐ ๐๐
Store
๐๐ ๐๐ ๐๐๐๐ ๐๐ ๐๐
Load
๐๐ ๐๐ ๐๐
VarPointsTo
๐๐ ๐๐1๐๐ ๐๐3๐๐ ๐๐1๐๐ ๐๐3
VarPointsTo(v:V, o:O)
FieldPointsTo(oi:O, f:F, oj:O)
New(x:V, o:O)
Assign(x:V, y:V) Load(x:V, y:V, f:F)
Store(x:V, f:F, y:V)
FieldPointsTo
๐๐3 ๐๐ ๐๐1
FieldPointsTo(๐๐3,๐๐, ๐๐1) <-Store(๐๐,๐๐,๐๐),VarPointsTo(๐๐,๐๐3),VarPointsTo(๐๐,๐๐1).
An Example
Tian Tan @ Nanjing University 79
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
VarPointsTo(x, o) <-New(x, o).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
New
๐๐ ๐๐1๐๐ ๐๐3
Assign
๐๐ ๐๐๐๐ ๐๐
Store
๐๐ ๐๐ ๐๐๐๐ ๐๐ ๐๐
Load
๐๐ ๐๐ ๐๐
VarPointsTo
๐๐ ๐๐1๐๐ ๐๐3๐๐ ๐๐1๐๐ ๐๐3
VarPointsTo(v:V, o:O)
FieldPointsTo(oi:O, f:F, oj:O)
New(x:V, o:O)
Assign(x:V, y:V) Load(x:V, y:V, f:F)
Store(x:V, f:F, y:V)
FieldPointsTo
๐๐3 ๐๐ ๐๐1๐๐3 ๐๐ ๐๐3
An Example
Tian Tan @ Nanjing University 80
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
VarPointsTo(x, o) <-New(x, o).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
New
๐๐ ๐๐1๐๐ ๐๐3
Assign
๐๐ ๐๐๐๐ ๐๐
Store
๐๐ ๐๐ ๐๐๐๐ ๐๐ ๐๐
Load
๐๐ ๐๐ ๐๐
VarPointsTo
๐๐ ๐๐1๐๐ ๐๐3๐๐ ๐๐1๐๐ ๐๐3๐๐ ๐๐1๐๐ ๐๐3
VarPointsTo(v:V, o:O)
FieldPointsTo(oi:O, f:F, oj:O)
New(x:V, o:O)
Assign(x:V, y:V) Load(x:V, y:V, f:F)
Store(x:V, f:F, y:V)
FieldPointsTo
๐๐3 ๐๐ ๐๐1๐๐3 ๐๐ ๐๐3
An Example
Tian Tan @ Nanjing University 81
1 b = new C();2 a = b;3 c = new C();4 c.f = a;5 d = c;6 c.f = d;7 e = d.f;
New
๐๐ ๐๐1๐๐ ๐๐3
Assign
๐๐ ๐๐๐๐ ๐๐
Store
๐๐ ๐๐ ๐๐๐๐ ๐๐ ๐๐
Load
๐๐ ๐๐ ๐๐
VarPointsTo
๐๐ ๐๐1๐๐ ๐๐3๐๐ ๐๐1๐๐ ๐๐3๐๐ ๐๐1๐๐ ๐๐3
FieldPointsTo
๐๐3 ๐๐ ๐๐1๐๐3 ๐๐ ๐๐3
VarPointsTo(x, o) <-New(x, o).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
VarPointsTo(v:V, o:O)
FieldPointsTo(oi:O, f:F, oj:O)
New(x:V, o:O)
Assign(x:V, y:V) Load(x:V, y:V, f:F)
Store(x:V, f:F, y:V)
Handle Method Calls
Tian Tan @ Nanjing University 82
Kind Statement Rule
Call l: r = x.k(a1,โฆ,an)
๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ , ๐๐ = Dispatch(๐๐๐๐ , k)๐๐๐ข๐ข โ ๐๐๐๐ ๐๐๐๐ , 1 โค ๐๐ โค ๐๐
๐๐๐ฃ๐ฃ โ ๐๐๐๐(๐๐๐๐๐๐๐๐)๐๐๐๐ โ ๐๐๐๐(๐๐๐๐๐ก๐๐๐ก๐ก)
๐๐๐ข๐ข โ ๐๐๐๐ ๐๐๐๐๐๐ , 1 โค ๐๐ โค ๐๐๐๐๐ฃ๐ฃ โ ๐๐๐๐(๐๐)
EDBโข VCall(l:S, x:V, k:M)โข Dispatch(o:O, k:M, m:M)โข ThisVar(m:M, this:V)IDBโข Reachable(m:M)โข CallGraph(l:S, m:M)
Statements (Labels): S
Methods: M
Handle Method Calls
Tian Tan @ Nanjing University 83
Kind Statement Rule
Call l: r = x.k(a1,โฆ,an)
๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ , ๐๐ = Dispatch(๐๐๐๐ , k)๐๐๐ข๐ข โ ๐๐๐๐ ๐๐๐๐ , 1 โค ๐๐ โค ๐๐
๐๐๐ฃ๐ฃ โ ๐๐๐๐(๐๐๐๐๐๐๐๐)๐๐๐๐ โ ๐๐๐๐(๐๐๐๐๐ก๐๐๐ก๐ก)
๐๐๐ข๐ข โ ๐๐๐๐ ๐๐๐๐๐๐ , 1 โค ๐๐ โค ๐๐๐๐๐ฃ๐ฃ โ ๐๐๐๐(๐๐)
VarPointsTo(this, o),Reachable(m),CallGraph(l, m) <-
VCall(l, x, k),VarPointsTo(x, o),Dispatch(o, k, m),ThisVar(m, this).
EDBโข VCall(l:S, x:V, k:M)โข Dispatch(o:O, k:M, m:M)โข ThisVar(m:M, this:V)IDBโข Reachable(m:M)โข CallGraph(l:S, m:M)
Statements (Labels): S
Methods: M
Handle Method Calls
Tian Tan @ Nanjing University 84
Kind Statement Rule
Call l: r = x.k(a1,โฆ,an)
๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ , ๐๐ = Dispatch(๐๐๐๐ , k)๐๐๐ข๐ข โ ๐๐๐๐ ๐๐๐๐ , 1 โค ๐๐ โค ๐๐
๐๐๐ฃ๐ฃ โ ๐๐๐๐(๐๐๐๐๐๐๐๐)๐๐๐๐ โ ๐๐๐๐(๐๐๐๐๐ก๐๐๐ก๐ก)
๐๐๐ข๐ข โ ๐๐๐๐ ๐๐๐๐๐๐ , 1 โค ๐๐ โค ๐๐๐๐๐ฃ๐ฃ โ ๐๐๐๐(๐๐)
VarPointsTo(pi, o) <-CallGraph(l, m),Argument(l, i, ai),Parameter(m, i, pi),VarPointsTo(ai, o).
EDBโข Argument(l:S, i:N, ai:V)โข Parameter(m:M, i:N, pi:V)
Statements (Labels): S
Methods: M
Nature numbers(indexes)
N
Handle Method Calls
Tian Tan @ Nanjing University 85
Kind Statement Rule
Call l: r = x.k(a1,โฆ,an)
๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ , ๐๐ = Dispatch(๐๐๐๐ , k)๐๐๐ข๐ข โ ๐๐๐๐ ๐๐๐๐ , 1 โค ๐๐ โค ๐๐
๐๐๐ฃ๐ฃ โ ๐๐๐๐(๐๐๐๐๐๐๐๐)๐๐๐๐ โ ๐๐๐๐(๐๐๐๐๐ก๐๐๐ก๐ก)
๐๐๐ข๐ข โ ๐๐๐๐ ๐๐๐๐๐๐ , 1 โค ๐๐ โค ๐๐๐๐๐ฃ๐ฃ โ ๐๐๐๐(๐๐)
VarPointsTo(r, o) <-CallGraph(l, m),MethodReturn(m, ret),VarPointsTo(ret, o),CallReturn(l, r).
EDBโข MethodReturn(m:M, ret:V)โข CallReturn(l:S, r:V)
Statements (Labels): S
Methods: M
Handle Method Calls
Tian Tan @ Nanjing University 86
Kind Statement Rule
Call l: r = x.k(a1,โฆ,an)
๐๐๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ , ๐๐ = Dispatch(๐๐๐๐ , k)๐๐๐ข๐ข โ ๐๐๐๐ ๐๐๐๐ , 1 โค ๐๐ โค ๐๐
๐๐๐ฃ๐ฃ โ ๐๐๐๐(๐๐๐๐๐๐๐๐)๐๐๐๐ โ ๐๐๐๐(๐๐๐๐๐ก๐๐๐ก๐ก)
๐๐๐ข๐ข โ ๐๐๐๐ ๐๐๐๐๐๐ , 1 โค ๐๐ โค ๐๐๐๐๐ฃ๐ฃ โ ๐๐๐๐(๐๐)
VarPointsTo(this, o),Reachable(m),CallGraph(l, m) <-
VCall(l, x, k),VarPointsTo(x, o),Dispatch(o, k, m),ThisVar(m, this).
VarPointsTo(pi, o) <-CallGraph(l, m),Argument(l, i, ai),Parameter(m, i, pi),VarPointsTo(ai, o).
VarPointsTo(r, o) <-CallGraph(l, m),MethodReturn(m, ret),VarPointsTo(ret, o),CallReturn(l, r).
Whole-Program Pointer Analysis
Tian Tan @ Nanjing University 87
Reachable(m) <-EntryMethod(m).
VarPointsTo(x, o) <-Reachable(m),New(x, o, m).
VarPointsTo(x, o) <-Assign(x, y),VarPointsTo(y, o).
FieldPointsTo(oi, f, oj) <-Store(x, f, y),VarPointsTo(x, oi),VarPointsTo(y, oj).
VarPointsTo(y, oj) <-Load(y, x, f),VarPointsTo(x, oi),FieldPointsTo(oi, f, oj).
VarPointsTo(this, o),Reachable(m),CallGraph(l, m) <-
VCall(l, x, k),VarPointsTo(x, o),Dispatch(o, k, m),ThisVar(m, this).
VarPointsTo(pi, o) <-CallGraph(l, m),Argument(l, i, ai),Parameter(m, i, pi),VarPointsTo(ai, o).
VarPointsTo(r, o) <-CallGraph(l, m),MethodReturn(m, ret),VarPointsTo(ret, o),CallReturn(l, r).
1. Motivation2. Introduction to Datalog3. Pointer Analysis via Datalog4. Taint Analysis via Datalog
88
Contents
Tian Tan @ Nanjing University
Datalog Model for Taint Analysis
On top of pointer analysisโข EDB predicates
โข Source(m : M) // source methodsโข Sink(m : M) // sink methodsโข Taint(l : S, t : T) // associates each call site to
the tainted data from the call site
โข IDB predicateโข TaintFlow(t : T, m : M) // detected taint flows,
e.g., TaintFlow(๐๐,๐๐) denotes that tainted data ๐๐ may flow to sink method ๐๐
Tian Tan @ Nanjing University 89
Taint Analysis via Datalog
Tian Tan @ Nanjing University 90
Kind Statement Rule
Call l: r = x.k(a1,โฆ,an)๐๐ โ ๐๐ โ ๐ถ๐ถ๐ถ๐ถ๐๐ โ ๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐ โ ๐๐๐๐(๐๐)
โข Handles sources (generates tainted data)
โข Handles sinks (generates taint flow information)Kind Statement Rule
Call l: r = x.k(a1,โฆ,an)
๐๐ โ ๐๐ โ ๐ถ๐ถ๐ถ๐ถ๐๐ โ ๐๐๐๐๐๐๐๐๐๐
โ๐๐, 1 โค ๐๐ โค ๐๐: ๐๐๐๐ โ ๐๐๐๐(๐๐๐๐)๐๐๐๐ ,๐๐ โ ๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐
VarPointsTo(r, t) <-CallGraph(l, m),Source(m),CallReturn(l, r),Taint(l, t).
TaintFlow(t, m) <-CallGraph(l, m),Sink(m),Argument(l, _, ai),VarPointsTo(ai, t),Taint(_, t).
Datalog-Based Program Analysis
โข Prosโข Succinct and readableโข Easy to implementโข Benefit from off-the-shelf optimized Datalog engines
โข Consโข Restricted expressiveness, i.e., it is impossible or
inconvenient to express some logicsโข Cannot fully control performance
Tian Tan @ Nanjing University 91
The X You Need To Understand in This Lecture
โข Datalog language
โข How to implement pointer analysis via Datalog
โข How to implement taint analysis via Datalog