or:

__How to Succeed at Recursion Without Really Recursing__

Tiger got to hunt, Bird got to fly; Lisper got to sit and wonder, (Y (Y Y))? Tiger got to sleep, Bird got to land; Lisper got to tell himself he understand. — Kurt Vonnegut, modified by Darius Bacon |

## Introduction

I recently wrote a blog post about the Y combinator. Since then, I've received so many useful comments that I thought it was appropriate to expand the post into a more complete article. This article will go into greater depth on the subject, but I hope it'll be more comprehensible as well. You don't need to have read the previous post to understand this one (in fact, it's probably better if you haven't.) The only background knowledge I require is a tiny knowledge of the Scheme programming language including recursion and first-class functions, which I will review. Comments are (again) welcome.

## Why Y?

Before I get into the details of what Y actually is, I'd like to address the question of why you, as a programmer, should bother to learn about it. To be honest, there aren't a lot of good nuts-and-bolts practical reasons for learning about Y. Even though it does have a few practical applications, for the most part it's mainly of interest to computer language theorists. Nevertheless, I do think it's worth your while to know something about Y for the following reasons:

It's one of the most beautiful ideas in all of programming. If you have any sense of programming aesthetics, you're sure to be delighted by Y.

It shows in a very stark way how amazingly powerful the simple ideas of functional programming are.

In 1959, the British scientist C. P. Snow gave a famous lecture
called The Two
Cultures where he bemoaned the fact that many intelligent and
well-educated people of the time had almost no knowledge of science. He
used knowledge of the Second Law of Thermodynamics as a kind of dividing
line between those who were scientifically literate and those who weren't.
I think we can similarly use knowledge of the Y combinator as a dividing
line between programmers who are "functionally literate" (*i.e.* have
a reasonably deep knowledge of functional programming) and those who
aren't. There are other topics that could serve just as well as Y (notably
monads), but Y will do nicely. So if you aspire to have the True
Lambda-Nature, read on.

By the way, Paul Graham (the Lisp hacker, Lisp book author, essayist, and now venture capitalist) apparently thinks so highly of Y that he named his startup incubator company Y Combinator. Paul got rich from his knowledge of ideas like these; maybe someone else will too. Maybe even you.

## A puzzle

### Factorials

We'll start our exploration of the Y combinator by defining some functions
to compute factorials. The factorial of a non-negative
integer `n`

is the product of all integers starting
from `1`

and going up to and including `n`

. Thus we
have:

factorial 1 = 1 factorial 2 = 2 * 1 = 2 factorial 3 = 3 * 2 * 1 = 6 factorial 4 = 4 * 3 * 2 * 1 = 24

and so on. (I'm using a function notation without parentheses here, so
`factorial 3`

is the same as what is usually written as
`factorial(3)`

. Humor me.) Factorials increase very
rapidly with increasing `n`

; the factorial of `20`

is
`2432902008176640000`

. The factorial of `0`

is defined
to be `1`

; this turns out to be the appropriate definition for the kinds of things factorials are actually used for (like solving problems in
combinatorics).

### Recursive definitions of the factorial function

It's easy to write a function in a programming language to compute
factorials using some kind of a looping control construct like
a `while`

or `for`

loop (*e.g.* in C or Java).
However, it's also easy to write a recursive function to compute
factorials, because factorials have a very natural recursive
definition:

factorial 0 = 1 factorial n = n * factorial (n - 1)

where the second line applies for all `n`

greater than zero. In
fact, in the computer language Haskell,
that's the way you actually define the factorial function. In Scheme, the language we'll be using here,
this function would be written like this:

(define (factorial n) (if (= n 0) 1 (* n (factorial (- n 1)))))

Scheme uses a parenthesized prefix notation for everything, so something
like ` (- n 1) `

represents what is usually written
` n - 1 `

in most programming languages. The reasons
for this are beyond the scope of this article, but getting used to this
notation isn't very hard.

In fact, the above definition of the factorial function in Scheme could also be written in a slightly more explicit way as follows:

(define factorial (lambda (n) (if (= n 0) 1 (* n (factorial (- n 1))))))

The keyword `lambda`

simply indicates that the thing we're
defining (*i.e.* whatever is enclosed by the open parenthesis to the
immediate left of the `lambda`

and its corresponding close
parenthesis) is a function. What comes immediately after the word
`lambda`

, in parentheses, are the *formal arguments* of the
function; here there is just one argument, which is `n`

. The
*body* of the function comes after the formal arguments, and here
consists of the expression ```
(if (= n 0) 1 (* n (factorial (- n
1))))
```

. This kind of function is an *anonymous function*. Here you
do give the anonymous function the name `factorial`

after you've
defined it, but you don't have to, and often it's handy not to if you're only
going to be using it once. In Scheme and some other languages, anonymous
functions are also called *lambda expressions*. Many programming
languages besides Scheme allow you to define anonymous functions, including
Python, Ruby, Javascript, Ocaml, and Haskell (but not C, C++, or Java,
unfortunately). We'll be using lambda expressions a lot below.

In the Scheme language, the definition of `factorial`

just given
is identical to the one before it; Scheme simply translates the first
definition into the second one before evaluating it. So all functions in
Scheme are really lambda expressions.

Note that the body of the function has a call to the `factorial`

function (which we're in the process of defining) inside it, which makes this
a recursive definition. I will call this kind of definition, where the name of
the function being defined is used in the body of the function, an
*explicitly recursive definition*. (You might wonder what an "implicitly
recursive" function would be. I'm not going to use that expression, but the
notion I have in mind is a recursive function which is generated through
non-recursive means — keep reading!)

For the sake of argument, we're going to assume that our version of Scheme
doesn't have the equivalent of `for`

or `while`

loops in
C or Java (although in fact, real Scheme implementations do have such
constructs, but under a different name), so that in order to define a function
like `factorial`

, we pretty much have to use recursion. Scheme is
often used as a teaching language partly for this reason: it forces students
to learn to think recursively.

### Functions as data and higher-order functions

Scheme is a cool language for many reasons, but one that is relevant to us
here is that it allows you to use functions as "first class" data objects
(this is often expressed by saying that Scheme supports *first-class
functions*). This means that in Scheme, we can pass a function to another
function as an argument, we can return a function as the result of evaluating
another function applied to its arguments, and we can create functions
on-the-fly as we need them (using the `lambda`

notation shown
above). This is the essence of functional programming, and it will feature
prominently in the ensuing discussion. Functions which take other functions as
arguments, and/or which return other functions as their results, are usually
referred to as *higher-order functions*.

### Eliminating explicit recursion

Now, here's the puzzle: what if you were asked to define the
`factorial`

function in Scheme, but were told that you could not
use recursive function calls in the definition (for instance, in the
`factorial`

function given above you cannot use the word
`factorial`

anywhere in the body of the function). However, you
*are* allowed to use first-class functions and higher-order functions any
way you see fit. With this knowledge, can you define the
`factorial`

function?

The answer to this question is yes, and it will lead us directly to the Y combinator.

## What the Y combinator is and what it does

The Y combinator is a higher-order function. It takes a single argument, which is a function that isn't recursive. It returns a version of the function which is recursive. We will walk through this process of generating recursive functions from non-recursive ones using Y in great detail below, but that's the basic idea.

More generally, Y gives us a way to get recursion in a programming language that supports first-class functions but that doesn't have recursion built in to it. So what Y shows us is that such a language already allows us to define recursive functions, even though the language definition itself says nothing about recursion. This is a Beautiful Thing: it shows us that functional programming alone can allow us to do things that we would never expect to be able to do (and it's not the only example of this).

### Lazy or strict evaluation?

We will be looking at two broad classes of computer languages: those that use *lazy evaluation* and those that use *strict evaluation*. Lazy
evaluation means that in order to evaluate an expression in the language, you
only evaluate as much of the expression as is needed to get the final result.
So (for instance) if there is a part of the expression that doesn't need to
get evaluated (because the result will not depend on it) it won't be
evaluated. In contrast, strict evaluation means that all parts of an
evaluation will be evaluated completely before the value of the expression as
a whole is determined (with some necessary exceptions, such as
`if`

expressions, which have to be lazy to work properly). In
practice, lazy evaluation is more general, but strict evaluation is more
predictable and often more efficient. Most programming languages use strict
evaluation. The programming language Haskell uses lazy evaluation, and this is
one of the most interesting things about that language. We will use both kinds
of evaluation in what follows.

### One Y combinator or many?

Even though we often refer to Y as "the" Y combinator, in actual fact there
are an infinite number of Y combinators. We will only be concerned with two of
these, one lazy and one strict. We need two Y combinators because the Y
combinator we define for lazy languages will not work for strict languages.
The lazy Y combinator is often referred to as the *normal-order Y
combinator* and the strict one is referred to as the *applicative-order Y
combinator*. Basically, *normal-order* is another way of saying "lazy"
and *applicative-order* is another way of saying "strict".

### Static or dynamic typing?

Another big dividing line in programming languages is between *static
typing* and *dynamic typing*. A statically-typed language is one where
the types of all expressions are determined at compile time, and any type
errors cause the compilation to fail. A dynamically-typed language doesn't do
any type checking until run time, and if a function is applied to arguments of
invalid types (*e.g.* by trying to add together an integer and a string),
then an error is reported. Among commonly-used programming languages, C, C++
and Java are statically typed, and Perl, Python and Ruby are dynamically
typed. Scheme (the language we'll be using for our examples) is also
dynamically typed. (There are also languages that straddle the border between
statically-typed and dynamically-typed, but I won't discuss this
further.)

One often hears static typing referred to as *strong typing* and
dynamic typing referred to as *weak typing*, but this is an abuse of
terminology. Strong typing simply means that every value in the language has
one and only one type, whereas weak typing means that some values can have
multiple types. So Scheme, which is dynamically typed, is also strongly typed,
while C, which is statically typed, is weakly typed (because you can cast a
pointer to one kind of object into a pointer to another type of object without
altering the pointer's value). I will only be concerned with strongly typed
languages here.

It turns out to be much simpler to define the Y combinator in dynamically typed languages, so that's what I'll do. It is possible to define a Y combinator in many statically typed languages, but (at least in the examples I've seen) such definitions usually require some non-obvious type hackery, because the Y combinator itself doesn't have a straightforward static type. That's beyond the scope of this article, so I won't mention it further.

### What a "combinator" is

A combinator is just a *lambda expression* with no *free
variables*. We saw above what lambda expressions are (they're just
anonymous functions), but what's a free variable? It's a variable (*i.e.
* a name or identifier in the language) which isn't a *bound
variable*. Happy now? No? OK, let me explain.

A bound variable is simply a variable which is contained inside the body of a lambda expression that has that variable name as one of its arguments.

Let's look at some examples of lambda expressions and free and bound variables:

`(lambda (x) x)`

`(lambda (x) y)`

`(lambda (x) (lambda (y) x))`

`(lambda (x) (lambda (y) (x y)))`

`(x (lambda (y) y))`

`((lambda (x) x) y)`

Are the variables in the body of these lambda expressions free variables or bound variables? We'll ignore the formal arguments of the lambda expressions, because only variables in the body of the lambda expression can be considered free or bound. As for the other variables, here are the answers:

- The
`x`

in the body of the lambda expression is a bound variable, because the formal argument of the lambda expression is also`x`

. This lambda expression has no other variables, therefore it has no free variables, therefore it's a combinator. - The
`y`

in the lambda body is a free variable. This lambda expression is therefore not a combinator. - Aside from the formal arguments of the lambda expression, there is only one variable, the final
`x`

, which is a bound variable (it's bound by the formal argument of the outer lambda expression). Therefore, this lambda expression as a whole has no free variables, so this is a combinator. - Aside from the formal arguments of the lambda expression, there are two variables, the final
`x`

and`y`

, both bound variables. This is a combinator. - The entire expression is not a lambda expression, so it's by definition not a combinator. Nevertheless, the
`x`

is a free variable and the final`y`

is a bound variable. - Again, the entire expression isn't a lambda expression (it's a function application), so this isn't a combinator either. The second
`x`

is a bound variable while the`y`

is a free variable.

When you're wondering if a recursive function like
`factorial`

:

(define factorial (lambda (n) (if (= n 0) 1 (* n (factorial (- n 1))))))

is a combinator, you don't consider the `define`

part, so what you're really asking is if

(lambda (n) (if (= n 0) 1 (* n (factorial (- n 1)))))

is a combinator. Since in this lambda expression, the name
`factorial`

represents a free variable (the name
`factorial`

is not a formal argument of the lambda expression), this is not a combinator. This will be important below. In fact, the names `=`

, `*`

, and `-`

are also free variables, so even without the name `factorial`

this would not be a combinator (to say nothing of the numbers!).

## Back to the puzzle

### Abstracting out the recursive function call

Recall the factorial function we had previously:

(define factorial (lambda (n) (if (= n 0) 1 (* n (factorial (- n 1))))))

What we want to do is to come up with a version of this that does the same
thing but doesn't have that pesky recursive call to `factorial`

in the body of the function.

Where do we start? It would be nice if you could save all of the function except for the offending recursive call, and put something else there. That might look like this:

(define sort-of-factorial (lambda (n) (if (= n 0) 1 (* n (<???> (- n 1))))))

This still leaves us with the problem of what to put in the place
marked `<???>`

. It's a tried-and-true principle of
functional programming that if you don't know exactly what you want to put
somewhere in a piece of code, just abstract it out and make it a parameter
of a function. The easiest way to do this is as follows:

(define almost-factorial (lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1)))))))

What we've done here is to rename the recursive call to
`factorial`

to `f`

, and to make `f`

an
argument to a function which we're calling `almost-factorial`

.
Notice that `almost-factorial`

is not at all the factorial
function. Instead, it's a higher-order function which takes a single argument
`f`

, which had better be a function (or else ```
(f (- n
1))
```

won't make sense), and *returns* another function (the
`(lambda (n) ...)`

part) which (hopefully) will be a factorial
function if we choose the right value for `f`

.

It's important to realize that this trick is not in any way specific to
the `factorial`

function. We can do exactly the same trick with
any recursive function. For instance, consider a recursive function to
compute fibonacci numbers. The recursive definition of fibonacci numbers
is as follows:

fibonacci 0 = 0 fibonacci 1 = 1 fibonacci n = fibonacci (n - 1) + fibonacci (n - 2)

(In fact, that's the definition of the fibonacci function in Haskell.) In Scheme, we can write the function this way:

(define fibonacci (lambda (n) (cond ((= n 0) 0) ((= n 1) 1) (else (+ (fibonacci (- n 1)) (fibonacci (- n 2)))))))

(where `cond`

is just a shorthand expression for
nested `if`

expressions). We can then remove the explicit
recursion just like we did for `factorial`

:

(define almost-fibonacci (lambda (f) (lambda (n) (cond ((= n 0) 0) ((= n 1) 1) (else (+ (f (- n 1)) (f (- n 2))))))))

As you can see, the transformation from a recursive function to a
non-recursive `almost-`

equivalent function is a purely
mechanical one: you rename the name of the recursive function inside the
body of the function to `f`

and you wrap a ```
(lambda (f)
...)
```

around the body.

If you've followed what I just did (never mind *why* I did it; we'll
see that later), then congratulations! As Yoda says, you've just taken the
first step into a larger world.

### Sneak preview

I probably shouldn't do this yet, but I'm going to give you a sneak
preview of where we're going. Once we define the Y combinator, we'll be
able to define the factorial function using `almost-factorial`

as follows:

(define factorial (Y almost-factorial))

where `Y`

is the Y combinator. Note that this definition of
`factorial`

doesn't have any explicit recursion in it. Similarly,
we can define the `fibonacci`

function using
`almost-fibonacci`

in the same way:

(define fibonacci (Y almost-fibonacci))

So the Y combinator will give us recursion wherever we need it as long as
we have the appropriate `almost-`

function available
(*i.e.* the non-recursive function derived from the recursive one by
abstracting out the recursive function calls).

Read on to see what's really going on here and why this will work.

### Recovering `factorial`

from `almost-factorial`

Let's assume, for the sake of argument, that we already had a working
factorial function lying around (recursive or not, we don't care). We'll
call that hypothetical factorial function `factorialA`

. Now
let's consider the following:

(define factorialB (almost-factorial factorialA))

Question: does `factorialB`

actually compute factorials?

To answer this, it's helpful to expand out the definition
of `almost-factorial`

:

(define factorialB ((lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1)))))) factorialA))

Now, by substituting `factorialA`

for `f`

inside the
body of the lambda expression we get:

(define factorialB (lambda (n) (if (= n 0) 1 (* n (factorialA (- n 1))))))

This looks a lot like the recursive factorial function, but it isn't:
`factorialA`

is not the same function as `factorialB`

.
So it's a non-recursive function that depends on a hypothetical
`factorialA`

function to work. Does it actually work? Well, it's
pretty obvious that it should work for `n = 0`

, since
`(factorialB 0)`

will just return `1`

(the factorial of
`0`

). If `n > 0`

, then the value of ```
(factorialB
n)
```

will be `(* n (factorialA (- n 1)))`

. Now, we assumed
that `factorialA`

would correctly compute factorials, so
`(factorialA (- n 1))`

is the factorial of `n - 1`

, and
therefore `(* n (factorialA (- n 1)))`

is the factorial of
`n`

(by the definition of factorial), thus proving that
`factorialB`

computes the factorial function correctly as long as
`factorialA`

does. So this works. The only problem is that we don't
actually have a `factorialA`

lying around.

Now, if you're really clever, you might be asking yourself whether we can just do this:

(define factorialA (almost-factorial factorialA))

The idea is this: let's assume that `factorialA`

is a valid
factorial function. Then if we pass it as an argument
to `almost-factorial`

, the resulting function will have to be a
valid factorial function, so why not just name that
function `factorialA`

?

It looks like you've created a perpetual-motion machine (or perhaps I
should say a perpetual-calculation machine), and there must
be *something* wrong with this definition... mustn't there?

In fact, this definition will work fine as long as the Scheme language you're using uses lazy evaluation! Standard Scheme uses strict evaluation, so it won't work (it'll go into an infinite loop). If you use DrScheme as your Scheme interpreter (which you should), then you can use the "lazy Scheme" language level, and the above code will actually work (huzzah!). We'll see why below, but for now I want to stick to standard (strict) Scheme and approach the problem in a slightly different way.

Let's define a couple of functions:

(define identity (lambda (x) x)) (define factorial0 (almost-factorial identity))

The `identity`

function is pretty simple: it takes in a single
argument and returns it unchanged (it's also a combinator, as I hope you can
tell). We're basically going to use it as a placeholder when we need to pass a
function as an argument and we don't know what function we should pass.

`factorial0`

is more interesting. It's a function that can
compute *some*, but not *all* factorials. Specifically, it can
compute the factorials up to and including the factorial of zero (which
means that it can only compute the factorial of zero, but you'll soon see
why I describe it this way). Let's verify that:

(factorial0 0) ==> ((almost-factorial identity) 0) ==> (((lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1)))))) identity) 0) ==> ((lambda (n) (if (= n 0) 1 (* n (identity (- n 1))))) 0) ==> (if (= 0 0) 1 (* 0 (identity (- 0 1)))) ==> (if #t 1 (* 0 (identity (- 0 1)))) ==> 1

OK, so it works. Unfortunately, it won't work for `n > 0`

.
For instance, if `n = 1`

then we'll have (skipping a few obvious
steps):

(factorial0 1) ==> (* 1 (identity (- 1 1))) ==> (* 1 (identity 0)) ==> (* 1 0) ==> 0

which is not the correct answer.

Now consider this spiffed-up version of `factorial0`

:

(define factorial1 (almost-factorial factorial0))

which is the same thing as:

(define factorial1 (almost-factorial (almost-factorial identity)))

This will correctly compute the factorials of `0`

and `1`

, but it will be incorrect for any `n > 1`

.
Let's verify this as well, again skipping some obvious steps:

(factorial1 0) ==> ((almost-factorial factorial0) 0) ==> 1 (via essentially the same derivation we showed above) (factorial1 1) ==> ((almost-factorial factorial0) 1) ==> (((lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1)))))) factorial0) 1) ==> ((lambda (n) (if (= n 0) 1 (* n (factorial0 (- n 1))))) 1) ==> (if (= 1 0) 1 (* 1 (factorial0 (- 1 1)))) ==> (if #f 1 (* 1 (factorial0 (- 1 1)))) ==> (* 1 (factorial0 (- 1 1))) ==> (* 1 (factorial0 0)) ==> (* 1 1) ==> 1

which is the correct answer. So `factorial1`

can compute
factorials for `n = 0`

and `n = 1`

. You can verify,
though, that it won't be correct for `n > 1`

.

We can keep going, and define functions which can compute factorials up to any particular limit:

(define factorial2 (almost-factorial factorial1)) (define factorial3 (almost-factorial factorial2)) (define factorial4 (almost-factorial factorial3)) (define factorial5 (almost-factorial factorial4)) etc.

`factorial2`

will compute correct factorials for inputs
between `0`

and `2`

, `factorial3`

will
compute correct factorials for inputs between `0`

and `3`

, and so on. You should be able to verify this for
yourself using the above derivations as models, though you probably won't
be able to do it in your head (at least, *I* can't do it in my
head).

One interesting way of looking at this is
that `almost-factorial`

takes in a crappy factorial function and
outputs a factorial function that is slightly less crappy, in that it will
handle exactly one extra value of the input correctly.

Note that you can again rewrite the definitions of the factorial functions like this:

(define factorial0 (almost-factorial identity)) (define factorial1 (almost-factorial (almost-factorial identity))) (define factorial2 (almost-factorial (almost-factorial (almost-factorial identity)))) (define factorial3 (almost-factorial (almost-factorial (almost-factorial (almost-factorial identity))))) (define factorial4 (almost-factorial (almost-factorial (almost-factorial (almost-factorial (almost-factorial identity)))))) (define factorial5 (almost-factorial (almost-factorial (almost-factorial (almost-factorial (almost-factorial (almost-factorial identity)))))))

and so on. Again, if you're very clever you might wonder if you could do this:

(define factorial-infinity (almost-factorial (almost-factorial (almost-factorial ...))))

where the `...`

means that you're repeating the chain of
`almost-factorials`

an infinite number of times. If you did wonder
this, go to the head of the class! Unfortunately, we can't write this out
directly, but we can define the equivalent of this. Note also that
`factorial-infinity`

is just the `factorial`

function we
want: it works on all integers greater than or equal to zero.

What we have shown is that if we could define an infinite chain of
`almost-factorials`

, that would give us the factorial function.
Another way of saying this is that the factorial function is the
*fixpoint* of `almost-factorial`

, which is what I will explain
next.

### Fixpoints of functions

The notion of a fixpoint should be familiar to anyone who has amused
themselves playing with a pocket calculator. You start with `0`

and
hit the `cos`

(cosine) key repeatedly. What you find is that the
answer rapidly converges to a number which is (approximately)
`0.73908513321516067`

; hitting the `cos`

key again
doesn't change anything because ```
cos(0.73908513321516067) =
0.73908513321516067
```

. We say that the number
`0.73908513321516067`

is a *fixpoint* of the cosine
function.

The cosine function takes a single input value (a real number) and produces
a single output value (also a real number). The fact that the input and output
of the function are the same type is what allows you to apply it repeatedly,
so that if `x`

is a real number, we can calculate what
`cos(x)`

is, and since that will also be a real number, we can
calculate what `cos(cos(x))`

is, and then what
`cos(cos(cos(x)))`

is, and so on. The fixpoint is the value
`x`

where `cos(x) = x`

.

Fixpoints don't have to be real numbers. In fact, they can be any type of
thing, as long as the function that generates them can take the same type of
thing as input as it produces as output. Most importantly for our discussion,
fixpoints can be functions. If you have a higher-order function like
`almost-factorial`

that takes in a function as its input and
produces a function as its output (with both input and output functions taking
a single integer argument as input and producing a single integer as output),
then it should be possible to compute its fixpoint (which will, naturally, be
a function which takes a single integer argument as input and produces a
single integer as output). That fixpoint function will be the function for
which

fixpoint-function = (almost-factorial fixpoint-function)

By repeatedly substituting the right-hand side of that equation into
the `fixpoint-function`

on the right, we get:

fixpoint-function = (almost-factorial (almost-factorial fixpoint-function)) = (almost-factorial (almost-factorial (almost-factorial fixpoint-function))) = ... = (almost-factorial (almost-factorial (almost-factorial (almost-factorial (almost-factorial ...)))))

As we saw above, this will be the factorial function we want. Thus, the
fixpoint of `almost-factorial`

will be
the `factorial`

function:

factorial = (almost-factorial factorial) = (almost-factorial (almost-factorial (almost-factorial (almost-factorial (almost-factorial ...)))))

That's all well and good, but just *knowing*
that `factorial`

is the fixpoint
of `almost-factorial`

doesn't tell us how to compute it.
Wouldn't it be nice if there was some magical higher-order function that
would take as its input a function like `almost-factorial`

, and
would output its fixpoint function, which in that case would
be `factorial`

? Wouldn't that be really freakin' sweet?

That function exists, and it's the Y combinator. Y is also known as
the *fixpoint combinator*: it takes in a function and returns its
fixpoint.

### Eliminating (most) explicit recursion (lazy version)

OK, it's time to derive Y.

Let's start by specifying what Y does:

(Y f) = fixpoint-of-f

What do we know about the fixpoint of `f`

? We know that

(f fixpoint-of-f) = fixpoint-of-f

by the definition of what a fixpoint of a function is. Therefore, we have:

(Y f) = fixpoint-of-f = (f fixpoint-of-f)

and we can substitute `(Y f)`

for `fixpoint-of-f`

to
get:

(Y f) = (f (Y f))

Voila! We've just defined Y. If we want it to be expressed as a Scheme function, we would have to write it like this:

(define (Y f) (f (Y f)))

or, using an explicit `lambda`

expression, as:

(define Y (lambda (f) (f (Y f))))

However, there are two caveats regarding this definition of Y:

It will only work in a lazy language (see below).

It is not a combinator, because the

`Y`

in the body of the definition is a free variable which is only bound once the definition is complete. In other words, we couldn't just take the body of this version of`Y`

and plop it in wherever we needed it, because it requires that the name`Y`

be defined somewhere.

Nevertheless, if you're using lazy Scheme, you can indeed define factorials like this:

(define Y (lambda (f) (f (Y f)))) (define almost-factorial (lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1))))))) (define factorial (Y almost-factorial))

and it will work correctly.

What have we accomplished? We originally wanted to be able to define the
factorial function without using any explicitly recursive functions at all.
We've *almost* done that. Our definition of `Y`

is still
explicitly recursive. However, we've taken a giant step, because this is
the *only* function in our language that needs to be explicitly
recursive in order to define recursive functions. With this version
of `Y`

we can go ahead and define other recursive functions (for
instance, defining `fibonacci`

as ```
(Y
almost-fibonacci)
```

).

### Eliminating (most) explicit recursion (strict version)

I said above that the definition of Y that we derived wouldn't work in a
strict language (like standard Scheme). In a strict language, we evaluate
all the arguments to a function call before applying the function to its
arguments, whether or not those arguments are needed. So if we have a
function `f`

and we try to evaluate `(Y f)`

using the
above definition, we get:

(Y f) = (f (Y f)) = (f (f (Y f))) = (f (f (f (Y f)))) etc.

and so on ad infinitum. The evaluation of `(Y f)`

will never
terminate, so we will never get a usable function out of it. This
definition of Y doesn't work for strict languages.

However, there is a clever hack that we can use to save the day and define
a version of Y that works in strict languages. The trick is to realize that
`(Y f)`

is going to become a function of one argument. Therefore,
this equality will hold:

(Y f) = (lambda (x) ((Y f) x))

Whatever one-argument function `(Y f)`

is, ```
(lambda (x) ((Y
f) x))
```

has to be the same function. All you're doing is taking in a
single input value `x`

and giving it to the function defined by
`(Y f)`

. In a similar way, this will be true:

cos = (lambda (x) (cos x))

It doesn't matter whether you use `cos`

or ```
(lambda (x)
(cos x))
```

as your cosine function; they will both do the same thing.

However, it turns out that `(lambda (x) ((Y f) x))`

has a big
advantage when defining Y in a strict language. By the reasoning given above,
we should be able to define Y as follows:

(define Y (lambda (f) (f (lambda (x) ((Y f) x)))))

Since we know that `(lambda (x) ((Y f) x))`

is the same function
as `(Y f)`

, this is a valid version of Y which will work just as
well as the previous version, even though it's a bit more complicated (and
perhaps a tiny bit slower in practice). We could use this version of Y to
define the `factorial`

function in lazy Scheme, and it would work
fine.

The cool thing about *this* version of Y is that it will also work in
a strict language (like standard Scheme)! The reason for this is that when you
give Y a particular `f`

to find the fixpoint of, it will return

(Y f) = (f (lambda (x) ((Y f) x)))

This time, there is no infinite loop, because the inner `(Y f)`

is kept inside a `lambda`

expression, where it sits until it's
needed (since the body of a lambda expression is never evaluated in Scheme
until the lambda expression is applied to its arguments). Basically, you're
using the lambda to delay the evaluation of `(Y f)`

. So if
`f`

was `almost-factorial`

, we would have this:

(define almost-factorial (lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1))))))) (define factorial (Y almost-factorial))

Expanding out the call to Y, we have:

(define factorial ((lambda (f) (f (lambda (x) ((Y f) x)))) almost-factorial)) ==> (define factorial (almost-factorial (lambda (x) ((Y almost-factorial) x)))) ==> (define factorial (lambda (n) (if (= n 0) 1 (* n ((lambda (x) ((Y almost-factorial) x)) (- n 1))))))

Here again, `(lambda (x) ((Y almost-factorial) x))`

is the same
function as `(Y almost-factorial)`

, which is the fixpoint of
`almost-factorial`

, which is just the factorial function. However,
the `(Y almost-factorial)`

in ```
(lambda (x) ((Y
almost-factorial) x))
```

won't be evaluated until the entire lambda
expression is applied to its argument, which won't happen until later (or not
at all, for the factorial of zero). Therefore this factorial function will
work in a strict language, and the version of Y used to define it will also
work in a strict language.

I realize that the preceding discussion and derivation is nontrivial, so don't be discouraged if you don't get it right away. Just sleep on it, play with it in your mind and with your trusty DrScheme interpreter, and you'll eventually get it.

At this point, we've accomplished everything we've set out to accomplish, except for one tiny little detail: we haven't yet derived the Y combinator itself.

## Deriving the Y combinator

### The lazy (normal-order) Y combinator

At this point, we want to define not just Y, but a Y *combinator*.
Note that the previous (lazy) definition of Y:

(define Y (lambda (f) (f (Y f))))

is a valid definition of Y but is not a Y combinator, since the definition of Y refers to Y itself. In other words, this definition is explicitly recursive. A combinator isn't allowed to be explicitly recursive; it has to be a lambda expression with no free variables (as I mentioned above), which means that it can't refer to its own name (if it even has a name) in its definition. If it did, the name would be a free variable in the definition, as we have in our definition of Y:

(lambda (f) (f (Y f)))

Note that Y in this definition is free; it isn't the bound variable of any lambda expression. So this is not a combinator.

Another way to think about this is that you should be able to replace the name of a combinator with its definition everywhere it's found and have everything still work. (Can you see why this wouldn't work with the explicitly recursive definition of Y? You would get into an infinite loop and you'd never be able to replace all the Ys with their definitions.) So whatever the Y combinator will be, it will not be explicitly recursive. From this non-recursive function we will be able to define whatever recursive functions we want.

I'm going to go back a bit to our original problem and derive a Y combinator from the bottom up. After I've done that I'll check to make sure that it is a fixpoint combinator, like the versions of Y we've already seen. In what follows I will borrow (steal) liberally from a very elegant derivation of the Y combinator sent to me by Eli Barzilay (thanks, Eli!), who is one of the DrScheme developers and an all-around Scheme uberstud.

Recall our original recursive `factorial`

function:

(define (factorial n) (if (= n 0) 1 (* n (factorial (- n 1)))))

Recall that we want to define a version of this without the explicit recursion. One way we could do this is to pass the factorial function itself as an extra argument when you call the function:

;; This won't work yet: (define (part-factorial self n) (if (= n 0) 1 (* n (self (- n 1)))))

Note that `part-factorial`

is not the same as
the `almost-factorial`

function described above. We would have
to call this `part-factorial`

function in a different way to get
it to compute factorials:

(part-factorial part-factorial 5) ==> 120

This is not explicitly recursive because we send along an extra copy of the
`part-factorial`

function as the `self`

argument.
However, it won't work unless the point of recursion calls the function the
exact same way:

(define (part-factorial self n) (if (= n 0) 1 (* n (self self (- n 1))))) ;; note the extra "self" here (part-factorial part-factorial 5) ==> 120

This works, but now we have moved away from our original way of calling the factorial function. We can move back to something closer to our original version by rewriting it like this:

(define (part-factorial self) (lambda (n) (if (= n 0) 1 (* n ((self self) (- n 1)))))) ((part-factorial part-factorial) 5) ==> 120 (define factorial (part-factorial part-factorial)) (factorial 5) ==> 120

Pause for a second here. Notice that we've *already* defined a version
of the factorial function without using explicit recursion anywhere! This is
the most crucial step. Everything else we do will be concerned with packaging
what we've already done so that we can easily re-use it with other
functions.

Now let's try to get back something like our `almost-factorial`

function by pulling out the `(self self)`

call using
a `let`

expression outside of a `lambda`

:

(define (part-factorial self) (let ((f (self self))) (lambda (n) (if (= n 0) 1 (* n (f (- n 1))))))) (define factorial (part-factorial part-factorial)) (factorial 5) ==> 120

This will work fine in a lazy language. In a strict language, the
`(self self)`

call in the `let`

statement will send us
into an infinite loop, because in order to calculate ```
(part-factorial
part-factorial)
```

(in the definition of `factorial`

) you will
first have to calculate `(part-factorial part-factorial)`

(in the
`let`

expression). (For fun: figure out why this wasn't a problem
with the previous definition.) I'll let this go for now, because I want to
define the lazy Y combinator, but in the next section I'll solve this problem
in the same way we solved it before (by wrapping a `lambda`

around
the `(self self)`

call). Note that in a lazy language, the
`(self self)`

call in the `let`

statement will never be
evaluated unless `f`

is actually needed (for instance, if ```
n =
0
```

then `f`

isn't needed to compute the answer, so
`(self self)`

won't be evaluated). Understanding how lazy languages
evaluate expressions is not trivial, so don't worry if you find this a little
confusing. I recommend you experiment with the code using the lazy Scheme
language level of DrScheme to get a better feel for what's going on.

It turns out that any `let`

expression can be converted into an
equivalent `lambda`

expression using this equation:

(let ((x <expr1>)) <expr2>) ==> ((lambda (x) <expr2>) <expr1>)

where `<expr1>`

and `<expr2>`

are
arbitrary Scheme expressions. (I'm only considering `let`

expressions with a single binding and `lambda`

expressions with
a single argument, but the principle can easily be generalized
to `lets`

with multiple bindings and `lambdas`

with
multiple arguments.) This leads us to:

(define (part-factorial self) ((lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1)))))) (self self))) (define factorial (part-factorial part-factorial)) (factorial 5) ==> 120

If you look closely, you'll see that we have our old friend the
`almost-factorial`

function embedded inside the
`part-factorial`

function. Let's pull it outside:

(define almost-factorial (lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1))))))) (define (part-factorial self) (almost-factorial (self self))) (define factorial (part-factorial part-factorial)) (factorial 5) ==> 120

I don't know about you, but I'm getting pretty fed up with this
whole `(part-factorial part-factorial)`

thing, and I'm not going
to take it anymore! Fortunately, I don't have to; I can first rewrite
the `part-factorial`

function like this:

(define part-factorial (lambda (self) (almost-factorial (self self))))

Then I can rewrite the `factorial`

function like this:

(define almost-factorial (lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1))))))) (define factorial (let ((part-factorial (lambda (self) (almost-factorial (self self))))) (part-factorial part-factorial))) (factorial 5) ==> 120

The `factorial`

function can be written a little more concisely
by changing the name of `part-factorial`

to `x`

(since we aren't using this name anywhere else now):

(define factorial (let ((x (lambda (self) (almost-factorial (self self))))) (x x)))

Now let's use the same `let ==> lambda`

trick we used above to get:

(define almost-factorial (lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1))))))) (define factorial ((lambda (x) (x x)) (lambda (self) (almost-factorial (self self))))) (factorial 5) ==> 120

And again, to make this definition a little more concise, we can
rename `self`

to `x`

to get:

(define almost-factorial (lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1))))))) (define factorial ((lambda (x) (x x)) (lambda (x) (almost-factorial (x x))))) (factorial 5) ==> 120

Note that the two `lambda`

expressions in the definition
of `factorial`

both are functions of `x`

, but the
two `x`

's don't conflict with each other. In fact, we could
have renamed `self`

to `y`

or almost any other name,
but it'll be convenient to use `x`

in what follows.

We're almost there! This works fine, but it's too specific to
the `factorial`

function. Let's change it to a
generic `make-recursive`

function that makes recursive functions
from non-recursive ones (sound familiar?):

(define almost-factorial (lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1))))))) (define (make-recursive f) ((lambda (x) (x x)) (lambda (x) (f (x x))))) (define factorial (make-recursive almost-factorial)) (factorial 5) ==> 120

The `make-recursive`

function is in fact the long-sought lazy Y
combinator, also known as the *normal-order Y combinator*, so let's
write it that way:

(define almost-factorial (lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1))))))) (define (Y f) ((lambda (x) (x x)) (lambda (x) (f (x x))))) (define factorial (Y almost-factorial))

I'm going to expand out the definition of Y a little bit:

(define Y (lambda (f) ((lambda (x) (x x)) (lambda (x) (f (x x))))))

Note that we can apply the inner `lambda`

expression to
its argument to get an equivalent version of Y:

(define Y (lambda (f) ((lambda (x) (f (x x))) (lambda (x) (f (x x))))))

What this means is that, for a given function `f`

(which is a
non-recursive function like `almost-factorial`

), the
corresponding recursive function can be obtained first by
computing `(lambda (x) (f (x x)))`

, and then applying
this `lambda`

expression to itself. This is the usual
definition of the normal-order Y combinator.

The only thing left to do is to check that this Y combinator is a fixpoint combinator (which it has to be in order to compute the right thing). To do this we have to demonstrate that this equation is correct:

(Y f) = (f (Y f))

From the definition of the normal-order Y combinator given above, we have:

(Y f) = ((lambda (x) (f (x x))) (lambda (x) (f (x x))))

Now apply the first lambda expression to its argument, which is the second lambda expression, to get this:

= (f ((lambda (x) (f (x x))) (lambda (x) (f (x x))))) = (f (Y f))

as desired. So, not only is the normal-order Y combinator also a fixpoint combinator, it's just about the most obvious fixpoint combinator there is, in that the proof that it's a fixpoint combinator is so trivial.

If you've made it through all of this derivation, you should pat yourself on the back and take a well-deserved break. When you come back, we'll finish off by deriving...

### The strict (applicative-order) Y combinator

Let's pick up the previous derivation just before the point where it failed for strict languages:

(define (part-factorial self) (lambda (n) (if (= n 0) 1 (* n ((self self) (- n 1)))))) ((part-factorial part-factorial) 5) ==> 120 (define factorial (part-factorial part-factorial)) (factorial 5) ==> 120

Up to this point, everything works in a strict language. Now if we pull
the `(self self)`

out into a `let`

expression as before,
we have:

(define (part-factorial self) (let ((f (self self))) (lambda (n) (if (= n 0) 1 (* n (f (- n 1))))))) (define factorial (part-factorial part-factorial)) (factorial 5) ==> 120

As I said above, this will not work in a strict language, because whenever
the `factorial`

function is called it will evaluate the function
call `(part-factorial part-factorial)`

, and when that function call
is evaluated it will first evaluate `(self self)`

as part of the
`let`

expression, which in this case will be ```
(part-factorial
part-factorial)
```

, leading to an infinite loop of ```
(part-factorial
part-factorial)
```

calls.

We saw above that the way around problems like this is to realize that what
we are trying to evaluate are functions of one argument. In this case,
`(self self)`

will be a function of one argument (it's going to
be the same as `(part-factorial part-factorial)`

, which is just
the `factorial`

function). We can wrap a lambda expression around
this function to get an equivalent function:

(define (part-factorial self) (let ((f (lambda (y) ((self self) y)))) (lambda (n) (if (= n 0) 1 (* n (f (- n 1)))))))

All we've done here is convert `(self self)`

, a function of one
argument, to `(lambda (y) ((self self) y))`

, an equivalent function
of one argument (we saw this trick earlier). I'm using `y`

instead
of `x`

as the variable binding of the new lambda expression so as
not to cause name conflicts later on in the derivation when `self`

gets renamed to `x`

, but I could have chosen another name as
well.

After we've done this, the `part-factorial`

function will now
work even in a strict language. That's because once ```
(part-factorial
part-factorial)
```

is evaluated, as part of evaluating the
`let`

expression the code `(lambda (x) ((self self) x))`

will be evaluated. Unlike before, this will *not* send us into an
infinite loop; the lambda expression won't be evaluated further until it's
applied to its argument. This lambda wrapper doesn't change the value of the
thing it wraps, but it does delay its evaluation, which is all we need to get
the definition of `part-factorial`

to work in a strict
language.

And that's the trick. After that, we carry through every other step of the derivation in exactly the same way. We end up with this definition of the strict Y combinator:

(define Y (lambda (f) ((lambda (x) (f (lambda (y) ((x x) y)))) (lambda (x) (f (lambda (y) ((x x) y)))))))

This can also be written in the equivalent form:

(define Y (lambda (f) ((lambda (x) (x x)) (lambda (x) (f (lambda (y) ((x x) y)))))))

Hopefully, you can see why this is equivalent. Either of these are the
strict Y combinator, or as it's called in the technical literature, the
*applicative-order Y combinator*. In a strict language (like standard
Scheme) you can use this to define the factorial function in the usual
way:

(define factorial (Y almost-factorial))

I recommend you try this out with DrScheme, and lo! marvel at the awesome power of the applicative-order Y combinator, that which hath created recursion where no recursion hath previously existed.

## Other matters

### Practical applications

This article has (I hope) convinced you that you don't need to have explicit recursion built in to a language in order for that language to allow you to define recursive functions, as long as the language supports first-class functions so that you can define a Y combinator. However, I don't want to leave you with the notion that recursion in real computer languages is implemented this way. In practice, it's far more efficient to just implement recursion directly in a computer language than to use the Y combinator. There are lots of other interesting issues that come up when considering how to implement recursion efficiently, but those issues are beyond the scope of this article. The point is that implementing recursion using the Y combinator is mainly of theoretical interest.

That said, in the paper Y in Practical
Programs, Bruce McAdams discusses a few ways in which Y can be used to
define variants of recursive functions that *e.g.* print traces of their
execution or automatically memoize their execution to give greater efficiency
(as well as a few more esoteric applications), so Y isn't *just* a
theoretical construct.

### Mutual Recursion

Experienced functional programmers and/or particularly astute readers
may have noticed that I didn't describe how to use the Y combinator to
implement *mutual recursion*, which is where you have two or more
functions which all call each other. The simplest example I can think of to
illustrate mutual recursion are the following pair of functions which determine
whether a non-negative integer is even or odd:

(define (even? n) (if (= n 0) #t (odd? (- n 1)))) (define (odd? n) (if (= n 0) #f (even? (- n 1))))

Before you start yelling at me, yes, I know that this isn't the most efficient way to compute evenness or oddness — it's just to illustrate what mutual recursion is. Any computer language that supports recursive function definitions has to support mutual recursion as well, but I haven't shown you how to use Y to define mutually-recursive functions. I'm going to cop out here because I think this article is long enough as it is, but rest assured that it is possible to define analogs of Y that can define mutually-recursive functions.

## Further reading

The Wikipedia article on the Y combinator is somewhat difficult reading, but it has some interesting material I didn't cover here.

The Little Schemer, 4th. ed., by Dan Friedman and Matthias Felleisen. Chapter 9 has a derivation of the Y combinator which is what got me interested in this subject.

The article Y in Practical Programs, by Bruce McAdams, which was referred to in the previous section.

## Acknowledgments

I would like to thank the following people:

Everyone who commented on my first blog post on the Y combinator, and also everyone who comments on this article.

Eli Barzilay, for a very interesting email discussion on this subject. The derivation of the normal-order Y combinator is taken directly from Eli (with permission).

My friend Darius Bacon for the poem. I'd also like to apologize to the estate of Kurt Vonnegut for abusing his work. The original poem appeared in Vonnegut's brilliant novel Cat's Cradle. If you haven't read it, you should do so as soon as possible.

All the DrScheme implementors for giving me a terrific tool with which to explore this subject.

The authors of the book The Little Schemer, Dan Friedman and Matthias Felleisen. This article is (in my mind at least) a massive expansion of chapter 9 of their book.