Logic and Probability - Stanford University (web.stanford.edu/~icard/esslli2014/lecture3.pdf, 2014)
Logic and Probability
Lecture 3: Beyond Boolean Logic

Wesley Holliday & Thomas Icard
UC Berkeley & Stanford

August 13, 2014, ESSLLI, Tübingen
Wesley Holliday & Thomas Icard: Logic and Probability, Lecture 3: Beyond Boolean Logic 1
Overview
I Last time: Nonmonotonic logic, Bayes nets, graphical models, . . .
I Today:
• Stochastic lambda calculus
• Probabilistic First-Order Logic
Part I: Stochastic Lambda Calculus
Stochastic Lambda Calculus
I Probabilistic programming languages give rise to more general formalisms for defining probability distributions, with graphical models as a basic special case.
I Stochastic Lambda Calculus: a perspicuous, simple formalism.
Stochastic Lambda Calculus
Review of basic Lambda Calculus
The λ-terms are given by a set of variables V and defined inductively:

I Each variable x ∈ V is a λ-term.
I If M and N are λ-terms, then so is M(N).
I If x ∈ V and M is a λ-term, then λx.M is a λ-term.

Main reduction rule, β-conversion:

(λx.M)(N) →β M[N/x]

Let →∗β be the reflexive, transitive closure of →β.
Stochastic Lambda Calculus

Church-Rosser for →∗β: if M →∗β N and M →∗β Q, then there is a term R such that N →∗β R and Q →∗β R.
Stochastic Lambda Calculus
Corollary. If a λ-term has a β-normal form, it is unique.
Stochastic Lambda Calculus
Boolean Logic in Lambda Calculus
> ≡ λx.λy.x    ⊥ ≡ λx.λy.y

and ≡ λa.λb.bab    if-then-else ≡ λa.λb.λc.abc

not ≡ λa.a⊥>
For M a boolean function of v1, . . . , vn, and T1, . . . , Tn ∈ {>,⊥}:

(λv1 . . . λvn.M)(T1) . . . (Tn) →∗β > or ⊥,

depending on whether the boolean formula is true or false on this input.
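As a quick sketch (ours, not the slides'), these Church encodings can be run directly in Python, with `to_bool` as an assumed decoding helper:

```python
TRUE = lambda x: lambda y: x   # > ≡ λx.λy.x
FALSE = lambda x: lambda y: y  # ⊥ ≡ λx.λy.y

# and ≡ λa.λb.bab ; if-then-else ≡ λa.λb.λc.abc ; not ≡ λa.a⊥>
AND = lambda a: lambda b: b(a)(b)
IF = lambda a: lambda b: lambda c: a(b)(c)
NOT = lambda a: a(FALSE)(TRUE)

def to_bool(t):
    """Decode a Church boolean back to a Python bool."""
    return t(True)(False)

print(to_bool(AND(TRUE)(FALSE)))       # False
print(to_bool(NOT(FALSE)))             # True
print(to_bool(IF(TRUE)(TRUE)(FALSE)))  # True
```

Since Church booleans are selector functions, `IF` is just application; no special conditional machinery is needed.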
Stochastic Lambda Calculus
Numbers in Lambda Calculus
0 ≡ λf.λx.x
1 ≡ λf.λx.f(x)
2 ≡ λf.λx.f(f(x))
...
n ≡ λf.λx.f^n(x)

succ ≡ λn.λf.λx.f(n(f)(x))
And so on . . .
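A runnable sketch of the numerals (our illustration; `church` and `to_int` are assumed helper names):

```python
ZERO = lambda f: lambda x: x                     # 0 ≡ λf.λx.x
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))  # succ ≡ λn.λf.λx.f(n(f)(x))

def church(n):
    """Build the numeral n ≡ λf.λx.f^n(x) by iterating succ."""
    c = ZERO
    for _ in range(n):
        c = SUCC(c)
    return c

def to_int(c):
    """Decode a numeral by counting applications of f."""
    return c(lambda k: k + 1)(0)

print(to_int(SUCC(SUCC(ZERO))))  # 2
print(to_int(church(5)))         # 5
```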
Stochastic Lambda Calculus
Fixed-Point Combinators

Θ ≡ (λx.λy.y(xxy))(λx.λy.y(xxy))

‘Turing’s combinator’ Θ has the feature that, for any term M,

ΘM →∗β M(ΘM).

This allows great flexibility, e.g., in defining recursive functions. But it also means trouble for logic, viz. Curry’s Paradox.
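Turing's Θ as written diverges under Python's eager evaluation, so this sketch (ours) uses the standard eta-expanded ("Z-style") variant, which still behaves as ΘM ⇒ M(ΘM) when applied:

```python
# Eta-expanded Turing-style combinator: the inner x(x)(y) is delayed
# behind a lambda so that eager evaluation does not loop forever.
THETA = (lambda x: lambda y: y(lambda *a: x(x)(y)(*a)))(
         lambda x: lambda y: y(lambda *a: x(x)(y)(*a)))

# Recursion with no explicit self-reference: the combinator feeds the
# function back to itself.
FACT = THETA(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))

print(FACT(5))  # 120
```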
Stochastic Lambda Calculus
Curry’s Paradox

Suppose we devise a proof system in λ-calculus with the following rules:

From A ⊃ B and A, infer B.    (modus ponens)

(A ⊃ (A ⊃ B)) ⊃ (A ⊃ B)    (contraction axiom)

From A and A =β B, infer B.

Define M ≡ λx.x ⊃ (x ⊃ ⊥), and let N ≡ ΘM. It is easy to show

N =β N ⊃ (N ⊃ ⊥).

But then we can evidently prove ⊥:

1. (N ⊃ (N ⊃ ⊥)) ⊃ (N ⊃ ⊥)    (contraction axiom)
2. N ⊃ (N ⊃ ⊥)    (from 1, since N =β N ⊃ (N ⊃ ⊥))
3. N    (from 2, since N =β N ⊃ (N ⊃ ⊥))
4. N ⊃ ⊥    (from 1 and 2, modus ponens)
5. ⊥    (from 3 and 4, modus ponens)
Stochastic Lambda Calculus
Adding Probability

Stochastic lambda calculus adds to lambda calculus a new operator:

M ⊕ N

with intended interpretation of probabilistic choice between M and N.

Write M ↦p N for: M reduces to normal form N with probability p.

In particular, then, M ⊕ N ↦1/2 M and M ⊕ N ↦1/2 N if M and N are in normal form, and more generally, summing p over all N with M ↦p N:

∑ p ≤ 1

Stochastic λ-terms are themselves random variables!
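A sampling reading of ⊕ can be sketched in Python (our illustration; thunks stand in for unevaluated terms so only the chosen branch is reduced):

```python
import random

def choose(m_thunk, n_thunk):
    """M ⊕ N: reduce to M or to N, each with probability 1/2."""
    return m_thunk() if random.random() < 0.5 else n_thunk()

# flip ≡ > ⊕ ⊥, decoded here as Python booleans for readability
def flip():
    return choose(lambda: True, lambda: False)

random.seed(0)
samples = [flip() for _ in range(10000)]
print(sum(samples) / len(samples))  # ≈ 0.5
```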
Stochastic Lambda Calculus
I Evidently, this allows us to define a term denoting a 50/50 ‘coin flip’:

flip ≡ > ⊕ ⊥

I What about other types of coin flips, i.e., Bernoulli variables?
Stochastic Lambda Calculus
Theorem (von Neumann)
Suppose p ∈ [0, 1] is computable, i.e., there is a λ-term N such that

N(k) →∗β > if the kth digit in the binary expansion of p is 1, and
N(k) →∗β ⊥ if the kth digit is 0.

Then there is a stochastic λ-term M such that: M ↦p > and M ↦1−p ⊥.

Proof.
Define E to be the following term:

E ≡ λe.λk. flip (N(k)(e(succ(k)))(⊥)) (N(k)(>)(e(succ(k))))

and let M ≡ ΘE, which returns > with probability p and ⊥ with probability 1 − p.
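The construction in the proof can be sketched as an ordinary loop: at index k, a fair coin either settles the outcome or moves on to the next digit. Here `digit(p, k)` is our assumed stand-in for the term N(k):

```python
import random

def digit(p, k):
    """kth digit (k = 1, 2, ...) of the binary expansion of p."""
    return int(p * 2**k) % 2

def bernoulli(p):
    """Return True with probability p, using only fair coin flips."""
    k = 1
    while True:
        if random.random() < 0.5:      # flip chose its first argument
            if digit(p, k) == 1:
                k += 1                 # e(succ(k)): recurse on next digit
            else:
                return False           # ⊥
        else:                          # flip chose its second argument
            if digit(p, k) == 1:
                return True            # >
            else:
                k += 1                 # e(succ(k)): recurse on next digit

random.seed(1)
est = sum(bernoulli(0.3) for _ in range(20000)) / 20000
print(est)  # ≈ 0.3
```

The outcome is decided at step k with probability 2^−k, so the chance of returning True is exactly the sum of 2^−k over the 1-digits of p, i.e. p itself.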
Stochastic Lambda Calculus
Theorem (von Neumann)
The same result holds for arbitrary domains, not just > and ⊥.
Stochastic Lambda Calculus
Defining Graphical Models
For exposition, assume we are working with binary variables, and let Tp be a term that returns > with probability p, and ⊥ with probability 1 − p.
Stochastic Lambda Calculus
Defining Graphical Models
X → Z ← Y

P(X) = 0.1    P(Y) = 0.2

P(Z | X, Y) = 0.9
P(Z | X, ¬Y) = 0.8
P(Z | ¬X, Y) = 0.6
P(Z | ¬X, ¬Y) = 0.2
Stochastic Lambda Calculus
Defining Graphical Models
X := T0.1
Y := T0.2
Z := if (X and Y ) then T0.9
elif (X and not Y ) then T0.8
elif (not X and Y ) then T0.6
elif (not X and not Y ) then T0.2
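The program above can be read directly as a sampler; a Python sketch (ours), with T_p as a biased coin:

```python
import random

def T(p):
    """T_p: True with probability p, False with probability 1 - p."""
    return random.random() < p

def sample():
    """One joint sample (X, Y, Z) from the network."""
    x = T(0.1)
    y = T(0.2)
    if x and y:
        z = T(0.9)
    elif x and not y:
        z = T(0.8)
    elif not x and y:
        z = T(0.6)
    else:
        z = T(0.2)
    return x, y, z

random.seed(2)
n = 50000
pz = sum(z for _, _, z in (sample() for _ in range(n))) / n
print(pz)  # ≈ 0.334 = 0.1*0.2*0.9 + 0.1*0.8*0.8 + 0.9*0.2*0.6 + 0.9*0.8*0.2
```

Querying the program by repeated sampling is exactly how probabilistic programming systems estimate marginals like P(Z).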
Stochastic Lambda Calculus
PCFGs
S := 〈NP,VP〉
VP := if T0.75 then 〈V,NP〉 else 〈V,NP,PP〉
NP := if T0.7 then 〈det,N〉 else 〈NP,PP〉
PP := 〈P,NP〉
...
N := if T0.02 then cat else . . .
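The PCFG rules become a recursive sampler; a Python sketch (ours), where the lexicon beyond "cat" is an invented placeholder:

```python
import random

def T(p):
    """T_p: True with probability p."""
    return random.random() < p

def NP():
    return ["det", N()] if T(0.7) else NP() + PP()

def VP():
    return ["V"] + NP() if T(0.75) else ["V"] + NP() + PP()

def PP():
    return ["P"] + NP()

def N():
    return "cat" if T(0.02) else "noun"   # placeholder lexicon

def S():
    return NP() + VP()

random.seed(3)
print(S())  # a sampled flat string of (pre)terminals
```

Because NP spawns an extra NP only with probability 0.3 (plus one via PP), the expected branching is below 1, so sampling terminates with probability one.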
Stochastic Lambda Calculus
Given (meta-)variable names for programs,
X1, . . . ,Xn ,
which we can think of as random variables, we can ask questions such as:
P(Xi = v)? P(Xi = v |Y1 = u1, . . . ,Ym = um)? . . .
for an extremely general class of random variables.
In principle one could import the same tools of probability-preserving deduction (of Suppes, Adams, etc., from Lecture 1) to this setting. But one must be careful, as witnessed by Curry’s Paradox.

(N.B. For more on actual programming languages based on these ideas, viz. Church, etc., see Noah Goodman’s class next week.)
Part II: Probabilistic First-Order Logic
Probabilistic First-Order Logic
Let L be a first-order logical language, given by:
I a set C of individual constants;
I a set V of individual variables;
I a set P of predicate symbols.
Terms and formulas of L are defined as usual:
ϕ ::= R(t1, . . . , tn) | ϕ ∧ ϕ | ¬ϕ | ∃xϕ | ∀xϕ
Probabilistic First-Order Logic
Define SL to be the set of sentences of L, i.e., formulas with no free variables, and S0L to be the set of quantifier-free sentences of L.
As before, a probability on L′ ⊆ SL is a function P : L′ → [0, 1], with

I P(ϕ) = 1, for any first-order tautology ϕ;
I P(ϕ ∨ ψ) = P(ϕ) + P(ψ), whenever ⊨ ¬(ϕ ∧ ψ).
Question: Given a probability P : S0L → [0, 1], is there a natural extension of P to all of SL, including quantified sentences?
(Cf. Markov Logic.)
Probabilistic First-Order Logic
Question: Given a probability P : S0L → [0, 1], is there a natural extension of P to all of SL, including quantified sentences?

If there are only finitely many constants c such that P(R(c)) > 0, then:

P(∃xR(x)) = P(∨c R(c))

What about the case where the number of such c is infinite?
Probabilistic First-Order Logic
Example
Consider a simple first-order arithmetical language L, with a constant n for each n ∈ N+. Let R(x) be a one-place predicate. Define a probability function P : S0L → [0, 1] on the quantifier-free sentences as follows:

I P(R(n)) = 1/2^(n+1), for all n ∈ N+;
I P(R(n1) ∧ . . . ∧ R(nk)) = ∏i≤k P(R(ni)).

Then, intuitively,

P(∃xR(x)) = ∑n≥2 1/2^n = 1/2.
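The displayed series is just the tail of the geometric series; a quick numeric check (ours):

```python
# Summing P(R(n)) = 1/2^(n+1) over n = 1, 2, ... is the tail sum
# sum over n >= 2 of 2^(-n), which equals 1/2.
partial = sum(2 ** -(n + 1) for n in range(1, 60))
print(partial)  # ≈ 0.5 (off by exactly 2^-60)
```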
Probabilistic First-Order Logic
Definition (Gaifman’s Condition)
A probability P∗ : SL → [0, 1] satisfies the Gaifman condition if for all formulas ϕ(x) with one free variable:

P∗(∃xϕ(x)) = sup { P∗(ϕ(c1) ∨ . . . ∨ ϕ(cn)) : c1, . . . , cn ∈ C },

or equivalently,

P∗(∀xϕ(x)) = inf { P∗(ϕ(c1) ∧ . . . ∧ ϕ(cn)) : c1, . . . , cn ∈ C }.
Theorem (Gaifman 1964)
Given P : S0L → [0, 1], there is exactly one extension P∗ of P to all of SL that satisfies the Gaifman condition.
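The sup in the Gaifman condition can be watched converging numerically. This sketch uses a simplifying assumption of our own (not the slides'): the events ϕ(ci) are mutually exclusive, so a finite disjunction's probability just adds.

```python
def p_atom(i):
    """P(ϕ(c_i)) = 1/2^(i+1), as in the running example."""
    return 2.0 ** -(i + 1)

def p_disjunction(k):
    """P(ϕ(c_1) ∨ ... ∨ ϕ(c_k)), assuming mutually exclusive atoms."""
    return sum(p_atom(i) for i in range(1, k + 1))

sup_approx = [p_disjunction(k) for k in (1, 2, 10, 50)]
print(sup_approx)  # increases toward 1/2, the Gaifman value of P*(∃xϕ(x))
```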
Probabilistic First-Order Logic
Probabilities on Models
I So far we have been focusing on probability functions P defined on a language L. But we can also start with a probability measure µ over models and induce a probability on formulas.
I Recall models of first-order languages:
M = 〈D, I , γ〉
• D is a set of individuals
• I is a function interpreting predicates, constants, etc.
• γ is an assignment function for variables.
Probabilistic First-Order Logic
I Consider a set of models M = ⋃i∈I {Mi}, and a σ-algebra E on M. We will require that each set

Mϕ := {M ∈ M : M |= ϕ},

for ϕ ∈ SL, is measurable, i.e., an element of E.

I We can define a measure µ : E → [0, 1], which induces an obvious probability Pµ : SL → [0, 1] on our first-order language L:

Pµ(ϕ) = µ(Mϕ).

It is then easy to show that Pµ is a probability in our previous sense.
Question: Does Pµ always satisfy the Gaifman condition?
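The induced Pµ is easy to illustrate for a finite class of models; the atoms and weights below are made up for the sketch (ours):

```python
# Each "model" fixes the truth of the atomic sentences; integer weights
# (summing to 10) play the role of the measure µ.
models = [
    ({"R(a)": True,  "R(b)": True},  1),
    ({"R(a)": True,  "R(b)": False}, 2),
    ({"R(a)": False, "R(b)": True},  3),
    ({"R(a)": False, "R(b)": False}, 4),
]

def P_mu(phi):
    """P_µ(ϕ) = µ(M_ϕ): total weight of the models satisfying ϕ."""
    return sum(w for m, w in models if phi(m)) / 10

print(P_mu(lambda m: m["R(a)"]))               # 0.3
print(P_mu(lambda m: m["R(a)"] or m["R(b)"]))  # 0.6
print(P_mu(lambda m: True))                    # 1.0
```

Finite additivity of P_µ is immediate here: disjoint sets of models have disjoint weight sums.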
Probabilistic First-Order Logic
Suppose that every element of every domain of every model is named by a constant, according to all interpretation functions. Then:

Fact
Given µ, the probability Pµ satisfies the Gaifman condition.

Proof.

Pµ(∃xϕ(x)) = µ({M ∈ M : M |= ∃xϕ(x)})
           = µ({M ∈ M : M |= ϕ(c), for some c ∈ C})
           = sup { µ({M ∈ M : M |= ϕ(c1) ∨ . . . ∨ ϕ(cn)}) : c1, . . . , cn ∈ C }
           = sup { Pµ(ϕ(c1) ∨ . . . ∨ ϕ(cn)) : c1, . . . , cn ∈ C }

Note that one may need to use countable additivity in this argument!
Probabilistic First-Order Logic
First-Order Probability Logic
I Add probability operators to the first-order language:
π(ϕ)
and corresponding atomic formulas, for a, b, c, . . . ∈ R, e.g.:

aπ(ϕ)² + bπ(ψ) = c
Probabilistic First-Order Logic
I We can also express, e.g., certain facts about conditional probability:
π(ϕ | ψ) ≥ 2/3 ≡ 3π(ϕ ∧ ψ) ≥ 2π(ψ)
...
I Models of this language are tuples:

〈M, E, µ, M〉

• M is a set of models, with E a σ-algebra over M;
• (M, E, µ) is a probability space;
• M ∈ M is a distinguished model in M.
I But note that in the background there are also real numbers!
Probabilistic First-Order Logic
Note on Terminology
Beware, we now have three different types of probabilities. We have mostly been using the following terminology:
µ(E ) — measure of event E , as part of a probability space
P(ϕ) — meta-language symbol for probability of a sentence ϕ
π(ϕ) — object language symbol for probability of a sentence ϕ
Tomorrow we will add a sentence-forming modal operator P>r to the object language, where P>r ϕ is not a term but a sentence.
Probabilistic First-Order Logic
Language and Interpretation

I Two sorts of variables: object variables x, y, z, . . . , and field variables x, y, z, . . . (set in a distinct font on the slides). Plus field constants 0 and 1, and plus and times.

I Field terms:

t ::= x | y | · · · | 0 | 1 | π(ϕ) | t ∗ t | t + t

I Then the language is given as follows:

ϕ ::= P(x1, . . . , xn) | x = y | t1 = t2 | ϕ ∧ ϕ | ¬ϕ | ∀xϕ | ∀xϕ

(the final two clauses quantify over object and field variables, respectively)

I Interpretation of field terms:

[[π(ϕ)]]〈M,E,µ,M〉 = µ({M′ ∈ M : M′ |= ϕ})
Probabilistic First-Order Logic
I This is clearly a very expressive language. For instance, all of the probability axioms come out as validities in the object language.

I If Γ ⊨ ϕ in this language, that means any probability distribution that satisfies all the probabilistic (and other) requirements in Γ also satisfies whatever requirement ϕ specifies.

I One might also like to study maximum entropy inference in this setting. See, e.g., Paskin (2002).
Probabilistic First-Order Logic
Complexity
I In general, the problem of determining validity in this language is undecidable. Indeed, the theory of models in this language is not even axiomatizable. (See Abadi & Halpern 1989.)

I However, if we assume the domain is finite, it is possible to give an axiomatization and prove decidability.

I One can also obtain a complete axiomatization by allowing non-standard reals (see Bacchus 1988).
Probabilistic First-Order Logic
Axiomatization
I The axioms and rules of classical first-order logic.

I The axioms for real closed fields, e.g., that multiplication and addition are commutative and associative, that every odd-degree polynomial has a root, etc. (Tarski)

I ϕ → (π(ϕ) = 1), provided every predicate symbol occurring in ϕ occurs only within subformulas of the form π(ψ).

I π(ϕ) ≥ 0

I π(ϕ) = π(ϕ ∧ ¬ψ) + π(ϕ ∧ ψ)

I From ϕ ↔ ψ, infer π(ϕ) = π(ψ)

I ∃x1 . . . xn∀y(y = x1 ∨ . . . ∨ y = xn)
Probabilistic First-Order Logic
Theorem (Halpern 1990)
This logic is sound and complete with respect to probability models of domain size at most n.
Probabilistic First-Order Logic
Another way of curtailing complexity:

I Still add probability operators to the first-order language:

π(ϕ)

but with atomic formulas restricted to linear inequalities, for a1, . . . , an, b ∈ R:

a1π(ϕ1) + . . . + anπ(ϕn) > b

I Abbreviations:

π(ϕ) − π(ψ) > b ≡ π(ϕ) + (−1)π(ψ) > b
π(ϕ) > π(ψ) ≡ π(ϕ) − π(ψ) > 0
π(ϕ) ≤ b ≡ ¬(π(ϕ) > b)
π(ϕ) ≥ b ≡ (−1)π(ϕ) ≤ −b
π(ϕ) = b ≡ (π(ϕ) ≥ b) ∧ (π(ϕ) ≤ b)
Probabilistic First-Order Logic
I Then the language is given as follows:

ϕ ::= P(x1, . . . , xn) | a1π(ϕ1) + . . . + anπ(ϕn) > b |
      a = b | x = y | ϕ ∧ ϕ | ¬ϕ | ∀xϕ | ∀xϕ

I The crucial truth clause:

〈M, E, µ, M〉 |= a1π(ϕ1) + . . . + anπ(ϕn) > b iff
[[a1]] ∗ µ([[ϕ1]]) + · · · + [[an]] ∗ µ([[ϕn]]) > b in R
Theorem (Halpern 2003)
Adding axioms for linear inequalities gives a sound and complete axiomatization for this language.
Probabilistic First-Order Logic
Measures on the Domain
I This language seems ill-suited for reasoning about what might be called a chance setup, e.g., that the probability of a randomly chosen student in Tübingen being German is high.

I E.g., we should not expect this formula to have high probability:

∀x(StudentTbg(x) → German(x))

I To deal with these kinds of statements, Bacchus (1988) proposed a slightly different language, with operators

πx(ϕ(x)) > b,

meaning, roughly, that the probability of a randomly chosen x satisfying ϕ is greater than b.
Probabilistic First-Order Logic
I Otherwise the language is just the same as before.

I Models, however, are simpler:

M = 〈D, I, ν, γ〉

with ν : D → [0, 1] a map such that ∑d∈D ν(d) = 1.

I The interpretation of a probability subformula πx(ϕ) is given as:

[[πx(ϕ)]]M = ν({d ∈ D : M[d/x] |= ϕ})

I The rest of the interpretation is as before.
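The domain-measure semantics can be sketched over a finite domain (our illustration; the individuals, weights, and predicate extensions below are invented stand-ins for the Tübingen example):

```python
# ν weights the individuals; [[π_x(ϕ)]] is the ν-mass of the satisfiers.
domain = ["anna", "ben", "chris", "dora"]
nu = {"anna": 0.4, "ben": 0.3, "chris": 0.2, "dora": 0.1}

student_tbg = {"anna", "ben", "chris"}
german = {"anna", "ben"}

def pi_x(phi):
    """[[π_x(ϕ)]] = ν({d ∈ D : M[d/x] |= ϕ})."""
    return sum(nu[d] for d in domain if phi(d))

p_student = pi_x(lambda d: d in student_tbg)
p_both = pi_x(lambda d: d in student_tbg and d in german)
print(p_both / p_student)  # chance a randomly chosen student is German
```

A conditional statement like π_x(German(x) | StudentTbg(x)) ≥ b then unfolds, as before, into a linear comparison of the two unconditional masses.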
Probabilistic First-Order Logic
Complexity
I This logic, too, is undecidable and unaxiomatizable (Abadi & Halpern 1989).

I However, once again, by restricting to models of bounded size, the class becomes axiomatizable.

I Alternatively, we can restrict the language to linear inequalities and include the truth clause:

M |= a1πx(ϕ1) + . . . + anπx(ϕn) > b iff
[[a1]] ∗ [[πx(ϕ1)]] + · · · + [[an]] ∗ [[πx(ϕn)]] > b in R
This system is again axiomatizable.
Probabilistic First-Order Logic
Theorem (Halpern 1990)
This class of models of size at most n is axiomatized by the principles and rules of first-order logic, those of real closed fields, and:

I ∀xϕ → (πx(ϕ) = 1)
I πx(ϕ) ≥ 0
I πx(ϕ) = πx(ϕ ∧ ψ) + πx(ϕ ∧ ¬ψ)
I πx(ϕ) = πy(ϕ[y/x])
I From ϕ ↔ ψ, infer πx(ϕ) = πx(ψ)
I ∃x1 . . . xn∀y(y = x1 ∨ . . . ∨ y = xn)
Probabilistic First-Order Logic
I Of course, these two languages (and classes of models) can also be combined, so that one can reason, e.g., about such statements:

π( πx(German(x) | StudentTbg(x)) ≥ 1/2 ) ≥ 3/4
I Combining the two axiomatizations gives completeness for the combined language, again for models of bounded size.
I See Halpern (1990) for details.
Probabilistic First-Order Logic
Summary and Preview
I Untyped lambda calculus with a coin-flip operation provides a representation language for defining a very rich class of probabilistic processes and models, even if it does not straightforwardly give rise to a logical system as we would typically think of it.

I There is a canonical way of extending a probability on the quantifier-free sentences of a first-order language to all sentences.

I An important theme is the difference between defining a probability P on a logical language and defining a measure µ on a class of logical models. Today we focused on the latter, including rich first-order languages for talking about these measures.

I These logics are highly complex. Tomorrow and Friday we will look at much simpler languages for logical reasoning about probability.