Logic and Probability - Stanford University (web.stanford.edu/~icard/esslli2014/lecture3.pdf, 2014)
Logic and Probability
Lecture 3: Beyond Boolean Logic

Wesley Holliday & Thomas Icard
UC Berkeley & Stanford

August 13, 2014, ESSLLI, Tübingen
Wesley Holliday & Thomas Icard: Logic and Probability, Lecture 3: Beyond Boolean Logic 1
Overview
I Last time: Nonmonotonic logic, Bayes nets, graphical models, . . .
I Today:
• Stochastic lambda calculus
• Probabilistic First-Order Logic
Part I: Stochastic Lambda Calculus
Stochastic Lambda Calculus
I Probabilistic programming languages give rise to more general formalisms for defining probability distributions, with graphical models as a basic special case.
I Stochastic Lambda Calculus: a perspicuous, simple formalism.
Stochastic Lambda Calculus
Review of basic Lambda Calculus
The λ-terms are given by a set of variables V and defined inductively:

I Each variable x ∈ V is a λ-term.
I If M and N are λ-terms, then so is M(N).
I If x ∈ V and M is a λ-term, then λx.M is a λ-term.

Main reduction rule, β-conversion:

(λx.M)(N) →β M[N/x]

Let →∗β be the reflexive, transitive closure of →β.
Stochastic Lambda Calculus

Church-Rosser for →∗β: if M →∗β N and M →∗β Q, then there is a term R such that N →∗β R and Q →∗β R.
Stochastic Lambda Calculus
Corollary. If a λ-term has a β-normal form, it is unique.
Stochastic Lambda Calculus
Boolean Logic in Lambda Calculus
> ≡ λx.λy.x    ⊥ ≡ λx.λy.y

and ≡ λa.λb.bab    if-then-else ≡ λa.λb.λc.abc

not ≡ λa.a⊥>
For M a boolean function of v1, . . . , vn, and T1, . . . , Tn ∈ {>,⊥}:

(λv1 . . . λvn.M)(T1) . . . (Tn) →∗β > or ⊥,

depending on whether the boolean formula is true or false on this input.
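As a quick sketch (ours, not the slides'), these Church encodings can be run directly in Python, with `to_bool` as an assumed decoding helper:

```python
TRUE = lambda x: lambda y: x   # > ≡ λx.λy.x
FALSE = lambda x: lambda y: y  # ⊥ ≡ λx.λy.y

# and ≡ λa.λb.bab ; if-then-else ≡ λa.λb.λc.abc ; not ≡ λa.a⊥>
AND = lambda a: lambda b: b(a)(b)
IF = lambda a: lambda b: lambda c: a(b)(c)
NOT = lambda a: a(FALSE)(TRUE)

def to_bool(t):
    """Decode a Church boolean back to a Python bool."""
    return t(True)(False)

print(to_bool(AND(TRUE)(FALSE)))       # False
print(to_bool(NOT(FALSE)))             # True
print(to_bool(IF(TRUE)(TRUE)(FALSE)))  # True
```

Since Church booleans are selector functions, `IF` is just application; no special conditional machinery is needed.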
Stochastic Lambda Calculus
Numbers in Lambda Calculus
0 ≡ λf.λx.x
1 ≡ λf.λx.f(x)
2 ≡ λf.λx.f(f(x))
...
n ≡ λf.λx.f^n(x)

succ ≡ λn.λf.λx.f(n(f)(x))
And so on . . .
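A runnable sketch of the numerals (our illustration; `church` and `to_int` are assumed helper names):

```python
ZERO = lambda f: lambda x: x                     # 0 ≡ λf.λx.x
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))  # succ ≡ λn.λf.λx.f(n(f)(x))

def church(n):
    """Build the numeral n ≡ λf.λx.f^n(x) by iterating succ."""
    c = ZERO
    for _ in range(n):
        c = SUCC(c)
    return c

def to_int(c):
    """Decode a numeral by counting applications of f."""
    return c(lambda k: k + 1)(0)

print(to_int(SUCC(SUCC(ZERO))))  # 2
print(to_int(church(5)))         # 5
```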
Stochastic Lambda Calculus
Fixed-Point Combinators

Θ ≡ (λx.λy.y(xxy))(λx.λy.y(xxy))

‘Turing’s combinator’ Θ has the feature that, for any term M,

ΘM →∗β M(ΘM).

This allows great flexibility, e.g., in defining recursive functions. But it also means trouble for logic, viz. Curry’s Paradox.
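Turing's Θ as written diverges under Python's eager evaluation, so this sketch (ours) uses the standard eta-expanded ("Z-style") variant, which still behaves as ΘM ⇒ M(ΘM) when applied:

```python
# Eta-expanded Turing-style combinator: the inner x(x)(y) is delayed
# behind a lambda so that eager evaluation does not loop forever.
THETA = (lambda x: lambda y: y(lambda *a: x(x)(y)(*a)))(
         lambda x: lambda y: y(lambda *a: x(x)(y)(*a)))

# Recursion with no explicit self-reference: the combinator feeds the
# function back to itself.
FACT = THETA(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))

print(FACT(5))  # 120
```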
Stochastic Lambda Calculus
Curry’s Paradox

Suppose we devise a proof system in λ-calculus with the following rules:

From A ⊃ B and A, infer B.    (modus ponens)

(A ⊃ (A ⊃ B)) ⊃ (A ⊃ B)    (contraction axiom)

From A and A =β B, infer B.

Define M ≡ λx.x ⊃ (x ⊃ ⊥), and let N ≡ ΘM. It is easy to show

N =β N ⊃ (N ⊃ ⊥).

But then we can evidently prove ⊥:

1. (N ⊃ (N ⊃ ⊥)) ⊃ (N ⊃ ⊥)    (contraction axiom)
2. N ⊃ (N ⊃ ⊥)    (from 1, since N =β N ⊃ (N ⊃ ⊥))
3. N    (from 2, since N =β N ⊃ (N ⊃ ⊥))
4. N ⊃ ⊥    (from 1 and 2, modus ponens)
5. ⊥    (from 3 and 4, modus ponens)
Stochastic Lambda Calculus
Adding Probability

Stochastic lambda calculus adds to lambda calculus a new operator:

M ⊕ N

with intended interpretation of probabilistic choice between M and N.

Write M ↦p N for: M reduces to normal form N with probability p.

In particular, then, M ⊕ N ↦1/2 M and M ⊕ N ↦1/2 N if M and N are in normal form, and more generally, summing p over all N with M ↦p N:

∑ p ≤ 1

Stochastic λ-terms are themselves random variables!
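A sampling reading of ⊕ can be sketched in Python (our illustration; thunks stand in for unevaluated terms so only the chosen branch is reduced):

```python
import random

def choose(m_thunk, n_thunk):
    """M ⊕ N: reduce to M or to N, each with probability 1/2."""
    return m_thunk() if random.random() < 0.5 else n_thunk()

# flip ≡ > ⊕ ⊥, decoded here as Python booleans for readability
def flip():
    return choose(lambda: True, lambda: False)

random.seed(0)
samples = [flip() for _ in range(10000)]
print(sum(samples) / len(samples))  # ≈ 0.5
```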
Stochastic Lambda Calculus
I Evidently, this allows us to define a term denoting a 50/50 ‘coin flip’:

flip ≡ > ⊕ ⊥

I What about other types of coin flips, i.e., Bernoulli variables?
Stochastic Lambda Calculus
Theorem (von Neumann)
Suppose p ∈ [0, 1] is computable, i.e., there is a λ-term N such that

N(k) →∗β > if the kth digit in the binary expansion of p is 1, and
N(k) →∗β ⊥ if the kth digit is 0.

Then there is a stochastic λ-term M such that: M ↦p > and M ↦1−p ⊥.

Proof.
Define E to be the following term:

E ≡ λe.λk. flip (N(k)(e(succ(k)))(⊥)) (N(k)(>)(e(succ(k))))

and let M ≡ ΘE, which returns > with probability p and ⊥ with probability 1 − p.
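The construction in the proof can be sketched as an ordinary loop: at index k, a fair coin either settles the outcome or moves on to the next digit. Here `digit(p, k)` is our assumed stand-in for the term N(k):

```python
import random

def digit(p, k):
    """kth digit (k = 1, 2, ...) of the binary expansion of p."""
    return int(p * 2**k) % 2

def bernoulli(p):
    """Return True with probability p, using only fair coin flips."""
    k = 1
    while True:
        if random.random() < 0.5:      # flip chose its first argument
            if digit(p, k) == 1:
                k += 1                 # e(succ(k)): recurse on next digit
            else:
                return False           # ⊥
        else:                          # flip chose its second argument
            if digit(p, k) == 1:
                return True            # >
            else:
                k += 1                 # e(succ(k)): recurse on next digit

random.seed(1)
est = sum(bernoulli(0.3) for _ in range(20000)) / 20000
print(est)  # ≈ 0.3
```

The outcome is decided at step k with probability 2^−k, so the chance of returning True is exactly the sum of 2^−k over the 1-digits of p, i.e. p itself.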
Stochastic Lambda Calculus
Theorem (von Neumann)
The same result holds for arbitrary domains, not just > and ⊥.
Stochastic Lambda Calculus
Defining Graphical Models
For exposition, assume we are working with binary variables, and let Tp be a term that returns > with probability p, and ⊥ with probability 1 − p.
Stochastic Lambda Calculus
Defining Graphical Models
X → Z ← Y

P(X) = 0.1    P(Y) = 0.2

P(Z | X, Y) = 0.9
P(Z | X, ¬Y) = 0.8
P(Z | ¬X, Y) = 0.6
P(Z | ¬X, ¬Y) = 0.2
Stochastic Lambda Calculus
Defining Graphical Models
X := T0.1
Y := T0.2
Z := if (X and Y ) then T0.9
elif (X and not Y ) then T0.8
elif (not X and Y ) then T0.6
elif (not X and not Y ) then T0.2
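The program above can be read directly as a sampler; a Python sketch (ours), with T_p as a biased coin:

```python
import random

def T(p):
    """T_p: True with probability p, False with probability 1 - p."""
    return random.random() < p

def sample():
    """One joint sample (X, Y, Z) from the network."""
    x = T(0.1)
    y = T(0.2)
    if x and y:
        z = T(0.9)
    elif x and not y:
        z = T(0.8)
    elif not x and y:
        z = T(0.6)
    else:
        z = T(0.2)
    return x, y, z

random.seed(2)
n = 50000
pz = sum(z for _, _, z in (sample() for _ in range(n))) / n
print(pz)  # ≈ 0.334 = 0.1*0.2*0.9 + 0.1*0.8*0.8 + 0.9*0.2*0.6 + 0.9*0.8*0.2
```

Querying the program by repeated sampling is exactly how probabilistic programming systems estimate marginals like P(Z).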
Stochastic Lambda Calculus
PCFGs
S := 〈NP,VP〉
VP := if T0.75 then 〈V,NP〉 else 〈V,NP,PP〉
NP := if T0.7 then 〈det,N〉 else 〈NP,PP〉
PP := 〈P,NP〉
...
N := if T0.02 then cat else . . .
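The PCFG rules become a recursive sampler; a Python sketch (ours), where the lexicon beyond "cat" is an invented placeholder:

```python
import random

def T(p):
    """T_p: True with probability p."""
    return random.random() < p

def NP():
    return ["det", N()] if T(0.7) else NP() + PP()

def VP():
    return ["V"] + NP() if T(0.75) else ["V"] + NP() + PP()

def PP():
    return ["P"] + NP()

def N():
    return "cat" if T(0.02) else "noun"   # placeholder lexicon

def S():
    return NP() + VP()

random.seed(3)
print(S())  # a sampled flat string of (pre)terminals
```

Because NP spawns an extra NP only with probability 0.3 (plus one via PP), the expected branching is below 1, so sampling terminates with probability one.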
Stochastic Lambda Calculus
Given (meta-)variable names for programs,
X1, . . . ,Xn ,
which we can think of as random variables, we can ask questions such as:
P(Xi = v)? P(Xi = v |Y1 = u1, . . . ,Ym = um)? . . .
for an extremely general class of random variables.
In principle one could import the same tools of probability-preserving deduction (of Suppes, Adams, etc., from Lecture 1) to this setting. But one must be careful, as witnessed by Curry’s Paradox.

(N.B. For more on actual programming languages based on these ideas, viz. Church, etc., see Noah Goodman’s class next week.)
Part II: Probabilistic First-Order Logic
Probabilistic First-Order Logic
Let L be a first-order logical language, given by:
I a set C of individual constants;
I a set V of individual variables;
I a set P of predicate symbols.
Terms and formulas of L are defined as usual:
ϕ ::= R(t1, . . . , tn) | ϕ ∧ ϕ | ¬ϕ | ∃xϕ | ∀xϕ
Probabilistic First-Order Logic
Define SL to be the set of sentences of L, i.e., formulas with no free variables, and S0L to be the set of quantifier-free sentences of L.
As before, a probability on L′ ⊆ SL is a function P : L′ → [0, 1], with

I P(ϕ) = 1, for any first-order tautology ϕ;
I P(ϕ ∨ ψ) = P(ϕ) + P(ψ), whenever ⊨ ¬(ϕ ∧ ψ).
Question: Given a probability P : S0L → [0, 1], is there a natural extension of P to all of SL, including quantified sentences?
(Cf. Markov Logic.)
Probabilistic First-Order Logic
Question: Given a probability P : S0L → [0, 1], is there a natural extension of P to all of SL, including quantified sentences?

If there are only finitely many constants c such that P(R(c)) > 0, then:

P(∃xR(x)) = P(∨c R(c))

What about the case where the number of such c is infinite?
Probabilistic First-Order Logic
Example
Consider a simple first-order arithmetical language L, with a constant n for each n ∈ N+. Let R(x) be a one-place predicate. Define a probability function P : S0L → [0, 1] on the quantifier-free sentences as follows:

I P(R(n)) = 1/2^(n+1), for all n ∈ N+;
I P(R(n1) ∧ . . . ∧ R(nk)) = ∏i≤k P(R(ni)).

Then, intuitively,

P(∃xR(x)) = ∑n≥2 1/2^n = 1/2.
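The displayed series is just the tail of the geometric series; a quick numeric check (ours):

```python
# Summing P(R(n)) = 1/2^(n+1) over n = 1, 2, ... is the tail sum
# sum over n >= 2 of 2^(-n), which equals 1/2.
partial = sum(2 ** -(n + 1) for n in range(1, 60))
print(partial)  # ≈ 0.5 (off by exactly 2^-60)
```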
Probabilistic First-Order Logic
Definition (Gaifman’s Condition)
A probability P∗ : SL → [0, 1] satisfies the Gaifman condition if for all formulas ϕ(x) with one free variable:

P∗(∃xϕ(x)) = sup { P∗(ϕ(c1) ∨ . . . ∨ ϕ(cn)) : c1, . . . , cn ∈ C },

or equivalently,

P∗(∀xϕ(x)) = inf { P∗(ϕ(c1) ∧ . . . ∧ ϕ(cn)) : c1, . . . , cn ∈ C }.
Theorem (Gaifman 1964)
Given P : S0L → [0, 1], there is exactly one extension P∗ of P to all of SL that satisfies the Gaifman condition.
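The sup in the Gaifman condition can be watched converging numerically. This sketch uses a simplifying assumption of our own (not the slides'): the events ϕ(ci) are mutually exclusive, so a finite disjunction's probability just adds.

```python
def p_atom(i):
    """P(ϕ(c_i)) = 1/2^(i+1), as in the running example."""
    return 2.0 ** -(i + 1)

def p_disjunction(k):
    """P(ϕ(c_1) ∨ ... ∨ ϕ(c_k)), assuming mutually exclusive atoms."""
    return sum(p_atom(i) for i in range(1, k + 1))

sup_approx = [p_disjunction(k) for k in (1, 2, 10, 50)]
print(sup_approx)  # increases toward 1/2, the Gaifman value of P*(∃xϕ(x))
```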
Probabilistic First-Order Logic
Probabilities on Models
I So far we have been focusing on probability functions P defined on a language L. But we can also start with a probability measure µ over models and induce a probability on formulas.
I Recall models of first-order languages:
M = 〈D, I , γ〉
• D is a set of individuals
• I is a function interpreting predicates, constants, etc.
• γ is an assignment function for variables.
Probabilistic First-Order Logic
I Consider a set of models M = ⋃i∈I {Mi}, and a σ-algebra E on M. We will require that each set

Mϕ := {M ∈ M : M |= ϕ},

for ϕ ∈ SL, is measurable, i.e., an element of E.

I We can define a measure µ : E → [0, 1], which induces an obvious probability Pµ : SL → [0, 1] on our first-order language L:

Pµ(ϕ) = µ(Mϕ).

It is then easy to show that Pµ is a probability in our previous sense.
Question: Does Pµ always satisfy the Gaifman condition?
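The induced Pµ is easy to illustrate for a finite class of models; the atoms and weights below are made up for the sketch (ours):

```python
# Each "model" fixes the truth of the atomic sentences; integer weights
# (summing to 10) play the role of the measure µ.
models = [
    ({"R(a)": True,  "R(b)": True},  1),
    ({"R(a)": True,  "R(b)": False}, 2),
    ({"R(a)": False, "R(b)": True},  3),
    ({"R(a)": False, "R(b)": False}, 4),
]

def P_mu(phi):
    """P_µ(ϕ) = µ(M_ϕ): total weight of the models satisfying ϕ."""
    return sum(w for m, w in models if phi(m)) / 10

print(P_mu(lambda m: m["R(a)"]))               # 0.3
print(P_mu(lambda m: m["R(a)"] or m["R(b)"]))  # 0.6
print(P_mu(lambda m: True))                    # 1.0
```

Finite additivity of P_µ is immediate here: disjoint sets of models have disjoint weight sums.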
Probabilistic First-Order Logic
Suppose that every element of every domain of every model is named by a constant, according to all interpretation functions. Then:

Fact
Given µ, the probability Pµ satisfies the Gaifman condition.

Proof.

Pµ(∃xϕ(x)) = µ({M ∈ M : M |= ∃xϕ(x)})
           = µ({M ∈ M : M |= ϕ(c), for some c ∈ C})
           = sup { µ({M ∈ M : M |= ϕ(c1) ∨ . . . ∨ ϕ(cn)}) : c1, . . . , cn ∈ C }
           = sup { Pµ(ϕ(c1) ∨ . . . ∨ ϕ(cn)) : c1, . . . , cn ∈ C }

Note that one may need to use countable additivity in this argument!
Probabilistic First-Order Logic
First-Order Probability Logic
I Add probability operators to the first-order language:
π(ϕ)
and corresponding atomic formulas, for a, b, c, . . . ∈ R, e.g.:

aπ(ϕ)² + bπ(ψ) = c
Probabilistic First-Order Logic
I We can also express, e.g., certain facts about conditional probability:
π(ϕ | ψ) ≥ 2/3 ≡ 3π(ϕ ∧ ψ) ≥ 2π(ψ)
...
I Models of this language are tuples:

〈M, E, µ, M〉

• M is a set of models, with E a σ-algebra over M;
• (M, E, µ) is a probability space;
• M ∈ M is a distinguished model in M.
I But note that in the background there are also real numbers!
Probabilistic First-Order Logic
Note on Terminology
Beware, we now have three different types of probabilities. We have mostly been using the following terminology:
µ(E ) — measure of event E , as part of a probability space
P(ϕ) — meta-language symbol for probability of a sentence ϕ
π(ϕ) — object language symbol for probability of a sentence ϕ
Tomorrow we will add a sentence-forming modal operator P>r to the object language, where P>r ϕ is not a term but a sentence.
Probabilistic First-Order Logic
Language and Interpretation

I Two sorts of variables: object variables x, y, z, . . . , and field variables x, y, z, . . . (set in a distinct font on the slides). Plus field constants 0 and 1, and plus and times.

I Field terms:

t ::= x | y | · · · | 0 | 1 | π(ϕ) | t ∗ t | t + t

I Then the language is given as follows:

ϕ ::= P(x1, . . . , xn) | x = y | t1 = t2 | ϕ ∧ ϕ | ¬ϕ | ∀xϕ | ∀xϕ

(the final two clauses quantify over object and field variables, respectively)

I Interpretation of field terms:

[[π(ϕ)]]〈M,E,µ,M〉 = µ({M′ ∈ M : M′ |= ϕ})
Probabilistic First-Order Logic
I This is clearly a very expressive language. For instance, all of the probability axioms come out as validities in the object language.

I If Γ ⊨ ϕ in this language, that means any probability distribution that satisfies all the probabilistic (and other) requirements in Γ also satisfies whatever requirement ϕ specifies.

I One might also like to study maximum entropy inference in this setting. See, e.g., Paskin (2002).
Probabilistic First-Order Logic
Complexity
I In general, the problem of determining validity in this language is undecidable. Indeed, the theory of models in this language is not even axiomatizable. (See Abadi & Halpern 1989.)

I However, if we assume the domain is finite, it is possible to give an axiomatization and prove decidability.

I One can also obtain a complete axiomatization by allowing non-standard reals (see Bacchus 1988).
Probabilistic First-Order Logic
Axiomatization
I The axioms and rules of classical first-order logic.

I The axioms for real closed fields, e.g., that multiplication and addition are commutative and associative, that every odd-degree polynomial has a root, etc. (Tarski)

I ϕ → (π(ϕ) = 1), provided every predicate symbol occurring in ϕ occurs only within subformulas of the form π(ψ).

I π(ϕ) ≥ 0

I π(ϕ) = π(ϕ ∧ ¬ψ) + π(ϕ ∧ ψ)

I From ϕ ↔ ψ, infer π(ϕ) = π(ψ)

I ∃x1 . . . xn∀y(y = x1 ∨ . . . ∨ y = xn)
Probabilistic First-Order Logic
Theorem (Halpern 1990)
This logic is sound and complete with respect to probability models of domain size at most n.
Probabilistic First-Order Logic
Another way of curtailing complexity:

I Still add probability operators to the first-order language:

π(ϕ)

but with atomic formulas restricted to linear inequalities, for a1, . . . , an, b ∈ R:

a1π(ϕ1) + . . . + anπ(ϕn) > b

I Abbreviations:

π(ϕ) − π(ψ) > b ≡ π(ϕ) + (−1)π(ψ) > b
π(ϕ) > π(ψ) ≡ π(ϕ) − π(ψ) > 0
π(ϕ) ≤ b ≡ ¬(π(ϕ) > b)
π(ϕ) ≥ b ≡ (−1)π(ϕ) ≤ −b
π(ϕ) = b ≡ (π(ϕ) ≥ b) ∧ (π(ϕ) ≤ b)
Probabilistic First-Order Logic
I Then the language is given as follows:

ϕ ::= P(x1, . . . , xn) | a1π(ϕ1) + . . . + anπ(ϕn) > b |
      a = b | x = y | ϕ ∧ ϕ | ¬ϕ | ∀xϕ | ∀xϕ

I The crucial truth clause:

〈M, E, µ, M〉 |= a1π(ϕ1) + . . . + anπ(ϕn) > b iff
[[a1]] ∗ µ([[ϕ1]]) + · · · + [[an]] ∗ µ([[ϕn]]) > b in R
Theorem (Halpern 2003)
Adding axioms for linear inequalities gives a sound and complete axiomatization for this language.
Probabilistic First-Order Logic
Measures on the Domain
I This language seems ill-suited for reasoning about what might be called a chance setup, e.g., that the probability of a randomly chosen student in Tübingen being German is high.

I E.g., we should not expect this formula to have high probability:

∀x(StudentTbg(x) → German(x))

I To deal with these kinds of statements, Bacchus (1988) proposed a slightly different language, with operators

πx(ϕ(x)) > b,

meaning, roughly, that the probability of a randomly chosen x satisfying ϕ is greater than b.
Probabilistic First-Order Logic
I Otherwise the language is just the same as before.

I Models, however, are simpler:

M = 〈D, I, ν, γ〉

with ν : D → [0, 1] a map such that ∑d∈D ν(d) = 1.

I The interpretation of a probability subformula πx(ϕ) is given as:

[[πx(ϕ)]]M = ν({d ∈ D : M[d/x] |= ϕ})

I The rest of the interpretation is as before.
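The domain-measure semantics can be sketched over a finite domain (our illustration; the individuals, weights, and predicate extensions below are invented stand-ins for the Tübingen example):

```python
# ν weights the individuals; [[π_x(ϕ)]] is the ν-mass of the satisfiers.
domain = ["anna", "ben", "chris", "dora"]
nu = {"anna": 0.4, "ben": 0.3, "chris": 0.2, "dora": 0.1}

student_tbg = {"anna", "ben", "chris"}
german = {"anna", "ben"}

def pi_x(phi):
    """[[π_x(ϕ)]] = ν({d ∈ D : M[d/x] |= ϕ})."""
    return sum(nu[d] for d in domain if phi(d))

p_student = pi_x(lambda d: d in student_tbg)
p_both = pi_x(lambda d: d in student_tbg and d in german)
print(p_both / p_student)  # chance a randomly chosen student is German
```

A conditional statement like π_x(German(x) | StudentTbg(x)) ≥ b then unfolds, as before, into a linear comparison of the two unconditional masses.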
Probabilistic First-Order Logic
Complexity
I This logic, too, is undecidable and unaxiomatizable (Abadi & Halpern 1989).

I However, once again, by restricting to models of bounded size, the class becomes axiomatizable.

I Alternatively, we can restrict the language to linear inequalities and include the truth clause:

M |= a1πx(ϕ1) + . . . + anπx(ϕn) > b iff
[[a1]] ∗ [[πx(ϕ1)]] + · · · + [[an]] ∗ [[πx(ϕn)]] > b in R
This system is again axiomatizable.
Probabilistic First-Order Logic
Theorem (Halpern 1990)
This class of models of size at most n is axiomatized by the principles and rules of first-order logic, those of real closed fields, and:

I ∀xϕ → (πx(ϕ) = 1)
I πx(ϕ) ≥ 0
I πx(ϕ) = πx(ϕ ∧ ψ) + πx(ϕ ∧ ¬ψ)
I πx(ϕ) = πy(ϕ[y/x])
I From ϕ ↔ ψ, infer πx(ϕ) = πx(ψ)
I ∃x1 . . . xn∀y(y = x1 ∨ . . . ∨ y = xn)
Probabilistic First-Order Logic
I Of course, these two languages (and classes of models) can also be combined, so that one can reason, e.g., about such statements:

π( πx(German(x) | StudentTbg(x)) ≥ 1/2 ) ≥ 3/4
I Combining the two axiomatizations gives completeness for the combined language, again for models of bounded size.
I See Halpern (1990) for details.
Probabilistic First-Order Logic
Summary and Preview
I Untyped lambda calculus with a coin-flip operation provides a representation language for defining a very rich class of probabilistic processes and models, even if it does not straightforwardly give rise to a logical system as we would typically think of it.

I There is a canonical way of extending a probability on the quantifier-free sentences of a first-order language to all sentences.

I An important theme is the difference between defining a probability P on a logical language and defining a measure µ on a class of logical models. Today we focused on the latter, including rich first-order languages for talking about these measures.

I These logics are highly complex. Tomorrow and Friday we will look at much simpler languages for logical reasoning about probability.