Lambda Calculus by Dustin Mulcahey


This is a friendly introduction to the lambda calculus by Dustin Mulcahey. LISP has its syntactic roots in a formal system called the lambda calculus. After a brief discussion of formal systems and logic in general, Dustin dives into the lambda calculus and makes enough constructions to convince you that it really is capable of expressing anything that is “computable”. Dustin then talks about the simply typed lambda calculus and the Curry-Howard-Lambek correspondence, which asserts that programs and mathematical proofs are “the same thing”.

Transcript of Lambda Calculus by Dustin Mulcahey

First, a crash course in mathematical logic...

For computer scientists, the most interesting part of this discussion is Hilbert’s Entscheidungsproblem.

Entscheidungsproblem: Given a mathematical statement, is there an algorithm that will compute a proof or a refutation of that statement?

At the time of its statement by Hilbert, there was no formalization of “algorithm”.

Fast-forward to the late 1920s...

Schoenfinkel: “Bound variables are bad (or, at least, unnecessary).”

Schoenfinkel defined the basic combinators that form “combinatory logic” (SKI). We’ll define these in terms of the lambda calculus, once we’ve defined that.

Haskell Curry also formulated the concept of “combinator” in his efforts to unambiguously define substitution, which had been rather loosely described up until his time (and continues to be loosely described to this day).

Schoenfinkel also seems to have originated the notion of “currying” (named after Haskell Curry). This is the idea that you can take a two-argument function

f(x, y)

and express it as a one-argument function that is valued in functions:

(f(x))(y)
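For instance, here is a minimal Python sketch of currying (the names area and area_curried are just my illustration, not from the talk):

def area(w, h):          # a two-argument function
    return w * h

def area_curried(w):     # curried: a one-argument function valued in functions
    return lambda h: w * h

assert area(3, 4) == area_curried(3)(4)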

Finally, on to Alonzo Church!

Goal: a new formal system for logic based upon the notion of function application. He wanted something “more natural” than Russell-Whitehead or ZF.

The formal system that he developed is called the lambda calculus. Here is the identity function expressed in the lambda calculus:

λx.x

Why use λ for function abstraction?

Whitehead and Russell used x̂ for class abstraction. If you move the hat off the x, you get ∧x. Apparently, λx was easier to print than ∧x.

At least, that’s how Church told it at one point. Later in life, he claimed that he needed a symbol and he just happened to choose λ.

In a formal system, we must give clear rules about what sequences of symbols can be produced and how they can be transformed. It’s very similar to designing a programming language.

To formulate the lambda calculus, we must first fix a set of letters that we will use for variables. Typically, we denote these by x, y (we rarely need more than two). Once this has been done, we inductively define valid lambda terms:

- If x is a variable, then x is a valid lambda term.

- If t is a valid lambda term and x is a variable, then (λx.t) is a valid lambda term. (Lambda Abstraction)

- If t, s are valid lambda terms, then (t s) is a valid lambda term. (Application)

That’s it! We can now construct all sorts of lambda terms:

x (variable)
(λx.x) (lambda abstraction)
y (variable)
(y y) (application)
((λx.x) (y y)) (application)
(λy.((λx.x) (y y))) (lambda abstraction)
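The three formation rules translate directly into a little datatype; here is a Python sketch (my own illustration; the names Var, Lam, App are arbitrary):

from dataclasses import dataclass

@dataclass
class Var:            # a variable: x
    name: str

@dataclass
class Lam:            # a lambda abstraction: (λx.t)
    param: str
    body: "Term"

@dataclass
class App:            # an application: (t s)
    fn: "Term"
    arg: "Term"

Term = Var | Lam | App

# the last example above, (λy.((λx.x) (y y))):
term = Lam("y", App(Lam("x", Var("x")), App(Var("y"), Var("y"))))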

While I have given the intuition behind the above constructs, they are mere scribblings on paper until we give rules for manipulating the terms. From a proof-theoretic perspective, meaning arises from the reduction rules of the language.

This is quite different from other notions of meaning, such as Tarski’s definition of truth or denotational semantics (in fact, we shall see that denotational semantics turns out to be an interesting problem for the lambda calculus).

There are three rules for manipulating lambda terms:

- α-equivalence: renaming variables

- β-reduction: how function application “works”

- η-conversion: two functions are “the same” if they do the same thing (extensionality)

α-equivalence lets us convert λx.x to λy.y. Makes sense, right? They are both the identity function. Generally, α-equivalence lets us rename any bound variables.

As programmers, we use α-equivalence to reason about lexical scoping:

var x = 0;
var f = function(x, y) {
  return x + y;
};
console.log(f(3, 4));

is equivalent to:

var x = 0;
var f = function(a, b) {
  return a + b;
};
console.log(f(3, 4));

As you can imagine, formally defining α-equivalence is a bit tricky. We want λx.x α-equivalent to λy.y, but we do not want λx.(λy.x) α-equivalent to λy.(λy.y).

(The first takes a value and produces the constant function at that value, while the second returns the identity function no matter what’s passed to it.)

β-reduction captures the notion of function application. However, to formally define it, we run into the substitution problem again!

Intuitively, we would like (f x) to denote the application of a function f to an input x. Of course, in this world, everything has the same “type”, so we are really applying one lambda term to another.

For a simple example of β-reduction, let’s apply the identity function to something.

((λx.x) (λy.(y y)))

ought to reduce to

(λy.(y y))

How about the other way around?

((λy.(y y)) (λx.x))
((λx.x) (λx.x))
(λx.x)

We define β-reduction as follows. Let

((λx.t) s)

be a valid lambda term, with t and s lambda terms and x a variable. The above reduces to

t[s/x]

where t[s/x] denotes the result of replacing every occurrence of x in t by s.
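Continuing the Python sketch from before, here is that substitution over the little datatype; it is deliberately naive, since it ignores the capture problem that comes up next:

def subst(t, x, s):
    # t[s/x]: replace occurrences of the variable x in t by s (naively!)
    match t:
        case Var(name):
            return s if name == x else t
        case Lam(param, body):
            # if the binder shadows x, stop; otherwise recurse
            # (careless: free variables of s may get captured by this binder)
            return t if param == x else Lam(param, subst(body, x, s))
        case App(fn, arg):
            return App(subst(fn, x, s), subst(arg, x, s))

def beta(term):
    # one β-step on a redex ((λx.t) s); anything else is returned unchanged
    match term:
        case App(Lam(param, body), arg):
            return subst(body, param, arg)
    return term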

Problem: what if our usage of variables is a bit too incestuous?

Example:

t = (λz.(x y))
s = z

Now apply β-reduction:

((λx.t) s)
((λx.(λz.(x y))) z)
(λz.(z y))

Whereas if we first did α-equivalence:

t = (λw.(x y))
s = z

and then apply β-reduction:

((λx.t) s)
((λx.(λw.(x y))) z)
(λw.(z y))

The function on the previous slide applies its parameter to the free variable y, whereas the function on this slide does nothing with its parameter!

So, obviously some care is needed when defining substitution.

We need to ensure that, in

((λx.t) s)

s does not contain a free variable that becomes bound when s is substituted for x in t.

The next and final reduction expresses the mathematical principle of extensionality.

Informally, we say that two functions are extensionally equal if they do the same thing. That is,

f(x) = x + 2
g(x) = x + 1 + 1

are two different functions as I have written them, but extensionally equal.

However (as an aside),

f(x) = (x² − 4)/(x − 2)
g(x) = x + 2

are neither equal nor extensionally equal, but algebraically reduce to the same thing.

η-conversion captures this notion by stating that, for lambda expressions f not containing the variable x,

(λx.(f x))

is equivalent to

f

That’s enough math! Let’s do some programming.

Well, I can do what any beginning (or intermediate, or advanced) programmer does:

((λx.(x x)) (λx.(x x)))

Let’s apply β-reduction:

((λx.(x x)) (λx.(x x)))

Yay! An infinite loop!

So we see that the true strength of the lambda calculus is the speed at which we can write down infinite computations.
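(In the Python sketch, the same term is a one-liner; running the commented line recurses until the interpreter gives up:)

omega = lambda x: x(x)
# omega(omega)   # β-reduces to itself forever; CPython raises RecursionError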

Or, even better:

((λx.((x x) x)) (λx.((x x) x)))
(((λx.((x x) x)) (λx.((x x) x))) (λx.((x x) x)))
((((λx.((x x) x)) (λx.((x x) x))) (λx.((x x) x))) (λx.((x x) x)))
...

This example shows that not all lambda terms normalize. That is, given a lambda term, you can’t always just whack it with β-reduction until it settles into something!

To make things that are more interesting than non-terminating programs, we need to define some basic things. I will now define the following:

- numbers

- booleans and conditionals

- recursion

The standard formulation of the natural numbers is called the system of Church numerals.

Intuition: The number n is n-fold composition.

(Speaking of non-termination...)

Less cyclic: The number n is a function that takes a function and returns the n-th-fold composite of that function.

(Hmm, still looks cyclic to me.)

That is,

n(f) = f ◦ f ◦ f ◦ ... ◦ f

which we can denote as f^◦n.

Formally,

0 ≡ f ↦ id ≡ λf.(λx.x)
1 ≡ f ↦ f ≡ λf.(λx.(f x))
2 ≡ f ↦ f ◦ f ≡ λf.(λx.(f (f x)))

and so on...
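Since Python has first-class functions, the numerals can be sketched directly (zero, one, two, and the decoder to_int are my names, for illustration):

zero = lambda f: lambda x: x
one  = lambda f: lambda x: f(x)
two  = lambda f: lambda x: f(f(x))

to_int = lambda n: n(lambda k: k + 1)(0)   # decode: count how many times f was applied
assert to_int(two) == 2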

There are two rules for constructing natural numbers:

0 ≡ λf.(λx.x)

and if n is a natural number, then

n + 1 ≡ succ(n) ≡ λf.λx.(f ((n f) x))

Is this definition consistent with what I’ve shown you?

1 ≡ succ(0)
λf.λx.(f ((0 f) x))
λf.λx.(f (((λf.λx.x) f) x))
λf.λx.(f ((λx.x) x))
λf.λx.(f x)

You’ll notice that I’ve suddenly started using the symbol ≡.

If a ≡ b, I’m declaring by the powers of notation that wherever you write a, you can also write b (and vice versa).

Also note that succ is itself a lambda term:

succ ≡ λn.λf.λx.(f ((n f) x))

Here, n is a variable rather than a particular numeral (on the slides, numerals are distinguished by boldface). The user of our succ function could put anything there! Of course, we only guarantee good behavior on an input that is equivalent to a natural number (as we have defined them).
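In the running Python sketch, succ is one line, and the decoder from before confirms it:

succ = lambda n: lambda f: lambda x: f(n(f)(x))
assert to_int(succ(two)) == 3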

Okay, we have natural numbers. How about addition?

Intuition: n + m takes a function and composes it n + m times.

Strategy: Let’s write a lambda term that applies f m times, “and then” applies it n times. In the world of functions, “and then” means composition! So addition corresponds to composition.

add ≡ (λn.λm.λf.λx.((n f) ((m f) x)))
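In the Python sketch:

add = lambda n: lambda m: lambda f: lambda x: n(f)(m(f)(x))   # apply f m times, and then n times
assert to_int(add(two)(two)) == 4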

Theorem: ((add 2) 2) is equivalent to 4.

Proof: (I’m going to use a mixture of definitional equality and reductions)

((add 2) 2)
(((λn.λm.λf.λx.((n f) ((m f) x))) 2) 2)
((λm.λf.λx.((2 f) ((m f) x))) 2)
(λf.λx.((2 f) ((2 f) x)))
(λf.λx.(((λf.λx.(f (f x))) f) (((λf.λx.(f (f x))) f) x)))
(λf.λx.((λx.(f (f x))) ((λx.(f (f x))) x)))
(λf.λx.((λx.(f (f x))) (f (f x))))
(λf.λx.(f (f (f (f x)))))
4

As you can see, doing arithmetic with Church numerals is both simple and fun.

What about multiplication?

Intuition: (n ∗ m) takes a function and returns the (n ∗ m)-th fold composite of the function with itself.

Strategy: Make the m-th composite of f, n times.

mult ≡ λn.λm.λf.λx.((n (m f)) x)

Theorem: ((mult 2) 2) is equivalent to 4.

Proof: This is left as an exercise for the reader.
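(Or, cheating a little, the exercise checks out mechanically in the running Python sketch:)

mult = lambda n: lambda m: lambda f: lambda x: n(m(f))(x)   # the m-th composite of f, n times
assert to_int(mult(two)(two)) == 4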

Exponentiation is also straightforward.

Strategy: To get m^n, apply n to m. Remember that m takes a function and returns the m-th fold composite. So now we take the n-th fold composite of the function that takes a function and returns the m-th fold composite. So now we have a function that takes a function and returns the (m^n)-th fold composite.

Clear, right? How about this:

(n m) f = (m ◦ m ◦ ... ◦ m) f

(Remember that composing the numerals themselves corresponds to multiplication, just as composing (n f) with (m f) corresponded to addition.)

In lambda form:

exp ≡ λm.λn.λf.λx.(((n m) f) x)
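In the Python sketch (three is just succ(two)):

exp = lambda m: lambda n: lambda f: lambda x: n(m)(f)(x)   # apply the numeral n to the numeral m
three = succ(two)
assert to_int(exp(two)(three)) == 8   # 2^3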

Subtraction is much trickier. The most understandable way to do it (that I know of) is to use pairing.

Idea: Instead of incrementing x to x + 1, let’s take the pair (n, m) to the pair (m, m + 1). If we start at (0, 0), we’ll get the following sequence:

(0, 0) ↦ (0, 1) ↦ (1, 2) ↦ (2, 3) ↦ ...

So, to get the predecessor of n, we just do the above process n times and then take the first coordinate of the result. How’s that for efficiency?

Okay, how do we make pairs?

Well, it will help to first define booleans and conditionals.

A few definitions:

true ≡ λx.λy.x

false ≡ λx.λy.y

cond ≡ λc.λt.λf.((c t) f)
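In the Python sketch:

true  = lambda x: lambda y: x
false = lambda x: lambda y: y
cond  = lambda c: lambda t: lambda f: c(t)(f)

assert cond(true)("then")("else") == "then"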

To make a pair of lambda terms, we will store them both in a cond. To get the first, we apply the pair to true. To get the second, we apply the pair to false.

pair ≡ λf.λs.λc.(((cond c) f) s)

What about my pair increment function?

paircrement ≡ λp.((pair (p false)) (succ (p false)))

So, the predecessor function looks like:

pred ≡ λn.(((n paircrement) ((pair 0) 0)) true)
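In the Python sketch (reusing cond, succ, zero, true, and false from above):

pair = lambda f: lambda s: lambda c: cond(c)(f)(s)
paircrement = lambda p: pair(p(false))(succ(p(false)))   # (n, m) becomes (m, m+1)
pred = lambda n: n(paircrement)(pair(zero)(zero))(true)  # iterate n times, take the first

assert to_int(pred(two)) == 1
assert to_int(pred(zero)) == 0   # under this scheme, pred of 0 stays 0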

Also, we can detect when something is zero:

isZero ≡ λn.((n (λx.false)) true)
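In the Python sketch:

isZero = lambda n: n(lambda _: false)(true)   # any application of f flips true to false
assert isZero(zero) is true and isZero(two) is false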

Phew! We now have conditionals and arithmetic.

... and with pairs, we could go ahead and define the rationals right now. But I’m not going to.

Instead, I want to plunge into recursion!

Okay, to do recursion, I need a function to call itself. Except in our formal system of lambda calculus, there is no notion of binding a term to a name that the term itself can refer to. All we have are ways of constructing lambda terms and ways of reducing them to other lambda terms. How do we do this?

Yes, this is where we start talking about the Y combinator.

There are a bunch of explanations of this thing, and what follows is one of them.

Let’s start with a recursive function:

fact ≡ λn.(((cond (isZero n)) 1) ((mult n) (fact (pred n))))

This would only make sense if we could make recursive definitional equalities. But, if you think about it, if we could, then we would just be writing forever...

Well, we can’t refer to a function by name (except in the very limited sense of ≡). But what if we could pass a function to itself?

fact ≡ λf.λn.(((cond (isZero n)) 1) ((mult n) (f (pred n))))

Well, it wouldn’t make much sense to reduce (fact fact), since we would then have to reduce (fact (pred n)), which doesn’t make sense.

But what if we had a magic function g such that g is equivalent to (fact g)?

Then, the following would happen (for example):

((fact g) 4)
((λf.λn.(((cond (isZero n)) 1) ((mult n) (f (pred n)))) g) 4)
((λn.(((cond (isZero n)) 1) ((mult n) (g (pred n))))) 4)
(((cond (isZero 4)) 1) ((mult 4) (g (pred 4))))
((mult 4) (g (pred 4)))
((mult 4) (g 3))
((mult 4) ((fact g) 3))

Such a magic g is the fixed point of fact.

A fixed point of a function f is a value x such that f(x) = x.

For example: if f(x) = x², then 0 and 1 are the fixed points of f.

In the lambda calculus, there is a lambda term that will compute the fixed point of any other lambda term. This is referred to as the Y combinator.

Note that there are several flavors of Y combinator.

Here’s one:

Y = λf.((λx.(f (x x))) (λx.(f (x x))))
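A caveat for the Python sketch: under Python’s strict evaluation this Y loops forever, so the sketch below uses the η-expanded variant (often called Z), and plain integers instead of Church numerals, purely for readability:

Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

fact_step = lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1)
factorial = Z(fact_step)   # the fixed point: factorial behaves like fact_step(factorial)
assert factorial(4) == 24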

Theorem: for any lambda term h, (Y h) is equivalent to (h (Y h)).

Proof:

(Y h)
((λf.((λx.(f (x x))) (λx.(f (x x))))) h)
((λx.(h (x x))) (λx.(h (x x))))
(h ((λx.(h (x x))) (λx.(h (x x)))))
(h (Y h))

So really, factorial is defined in two steps:

fact’ ≡ λf.λn.(((cond (isZero n)) 1) ((mult n) (f (pred n))))

fact ≡ (Y fact’)

Which, expanding every definition, is definitionally equivalent to this:

((λf.((λx.(f (x x))) (λx.(f (x x)))))
 (λf.λn.((((λc.λt.λf.((c t) f)) ((λn.((n (λx.(λx.λy.y))) (λx.λy.x))) n)) (λf.λx.(f x)))
         (((λn.λm.λf.λx.((n (m f)) x)) n)
          (f ((λn.(((n (λp.(((λf.λs.λc.((((λc.λt.λf.((c t) f)) c) f) s)) (p (λx.λy.y))) ((λn.λf.λx.(f ((n f) x))) (p (λx.λy.y))))))
                    (((λf.λs.λc.((((λc.λt.λf.((c t) f)) c) f) s)) (λf.λx.x)) (λf.λx.x)))
                   (λx.λy.x)))
              n))))))

Now that we’ve defined the lambda calculus and written a program in it, I want to discuss some properties of the system as a whole.

The Church-Turing Thesis

Any algorithm that performs a computation can be expressed in the λ-calculus, or by a Turing machine, or by a recursive function (in the sense of recursion theory).

Undecidability of Equivalence

There does not exist an algorithm that decides whether or not two arbitrary lambda terms are equivalent.

The Church-Rosser Theorem

In the λ-calculus, given terms t1 and t2 gotten from a common term t by a sequence of reductions, there exists a term s that t1 and t2 both reduce to.

t ----> t1
|       |
v       v
t2 ---> s

Equivalence of the λ-calculus and combinatory logic

Define combinators:

I = λx.x

K = λx.λy.x

S = λx.λy.λz.((x z) (y z))

Then these combinators suffice to construct any lambda term, up to equivalence.

For example,

Y = S (K (S I I)) (S (S (K S) K) (K (S I I)))
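In the Python sketch:

I = lambda x: x
K = lambda x: lambda y: x
S = lambda x: lambda y: lambda z: x(z)(y(z))

assert S(K)(K)(42) == 42   # S K K behaves like the identity I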

Correspondence between SK and propositional logic

Consider the axiom of propositional logic:

a =⇒ (b =⇒ a)

Now look at the K combinator again:

λa.λb.a

Now repeat this to yourself: “If I have a proof of a, then given a proof of b, I still have a proof of a.”

Now consider the axiom:

(a =⇒ (b =⇒ c)) =⇒ ((a =⇒ b) =⇒ (a =⇒ c))

Now look at the S combinator again:

λf.λg.λa.((f a) (g a))

Now, repeat this to yourself: “If I have a way f of turning proofs of a into proofs that b implies c, then given a proof g that a implies b, I can make a proof that a implies c.”

Really, the only sane way to think about this stuff is to appeal to category theory.

The proposition

a =⇒ (b =⇒ a)

corresponds to an object (a “function space”). Think of A as the set of proofs of the proposition a:

(A^B)^A

which, in nice categories, is isomorphic to

A^(A×B)

(All I’ve done here is uncurry.)

The latter function space contains the first projection (which looks an awful lot like K). The existence of this first projection shows that the type A^(A×B) is inhabited, and thus the original proposition a =⇒ (b =⇒ a) is valid.

The correspondence between lambda expressions, logical formulas, and objects in categories is called the Curry-Howard-Lambek correspondence.

Thanks!