OK, I'm going to try to explain what the Y combinator is, why it's interesting, and how it works. I'll have to include a bit of background information to try to make this post self-contained.

The Y combinator has become a bit of a cool buzzphrase since Paul Graham named his startup incubator company Y Combinator. Why did he choose that name? I think the reason is that he wanted to attract the kind of programmers who were smart enough to have heard about the Y combinator, and who actually care about this kind of thing. So if you understand the Y combinator, maybe you can get some brownie points with Paul if you're interested in working with his company. If you meet him, say hi for me.

Before I explain what the Y combinator is or why (no pun intended) it's important, I want to explain the problem that the Y combinator solves. Let's say you have a very simple programming language which does not have recursion as a built-in feature. (If you don't know what recursion is, Wikipedia is your friend.) However, let's say that your programming language does have one fairly advanced feature: it supports *first-class functions*. This means that functions can be treated as data: they can be passed as arguments to other functions, they can be created on the fly, and they can be returned as the result of a function. As we'll see, some very simple languages do have this property.

So what is the problem that the Y combinator solves? Basically, it's this: **The Y combinator allows us to define recursive functions in computer languages that do not have built-in support for recursive functions, but that do support first-class functions.**

That's it. That's what the Y combinator allows us to do. It's important to note that the Y combinator is mainly important to theoretical computer scientists, especially programming language theorists who tend to work a lot with minimal programming languages. It's unlikely that you're going to be using it much in practical programming (or at least *I* haven't; maybe that just means that I'm not enlightened enough yet). So why should you care, other than as a way to impress Paul Graham when you present him with your cool internet startup idea? For this reason: it's one of the most beautiful and counterintuitive ideas in all of computer science. The Y combinator is one of those things that makes me feel like computer science is an art on the level of great painting or poetry. It's the same reaction I have when I think about Godel's incompleteness theorem. Basically, learning about the Y combinator will teach you to appreciate beauty in programming.

I mentioned above that the Y combinator is only useful in languages that support first-class functions but which do *not* support recursion. What kind of crazy-ass language would support first-class functions but not recursion? It turns out that (arguably) the simplest programming language there is, the untyped lambda calculus (hereafter called ULC for short), is such a language. (By the way, the word "calculus" here has nothing whatsoever to do with the mathematical field referred to as calculus, with derivatives, limits, integration and so on. It just means "a system of computation".) Here's a description of the ULC:

An expression in the untyped lambda calculus is one of three types of expression:

a variable, usually identified by a single letter, such as

`x`

,`y`

, or`z`

a lambda expression, which we'll denote by the form

`(lambda (<var>) <expr>)`

where`<var>`

represents some variable (called the "bound variable" of the lambda expression) and`<expr>`

represents some other ULC expression (called the "body" of the lambda expression)a function application, which we'll denote by the form

`(<expr1> <expr2>)`

where`<expr1>`

and`<expr2>`

represent some other ULC expressions

Of these three kinds of ULC expressions, only lambda expressions are considered to be "data" *i.e.* when a computation completes, the result should always be a lambda expression. This means that the lambda calculus is so incredibly minimal that there is no notion of numbers, characters, lists, strings, booleans or any other data types we are used to working with in our programming languages; the only kind of data are these lambda expressions, which are functions of a single argument (and which don't even have a name). Through a rule called "beta-reduction", some expressions in the lambda calculus can be converted into simpler expressions, and this is how you compute stuff in ULC. I don't want to get into a discussion of beta-reduction here; instead I'll refer you to the Wikipedia article on lambda calculus if you want to pursue this more deeply. The only relevance of ULC to this discussion is that it does not provide built-in support for recursive functions -- how could it, given that lambda expressions have no names? So if you want to prove that the lambda calculus is capable of representing any computable function (which it is) you would have to prove that it's able to represent arbitrary recursive functions (which are computable). Since we can't define recursive functions directly in ULC, we need the Y combinator to define recursive functions in this language.

Now that we know a little bit about lambda calculus, I can explain what the "combinator" part of the Y combinator means. A combinator is a lambda expression that has a particular property: the only variables allowed in the body of the lambda expression are bound variables. These variables may be the bound variable of the lambda expression itself, or else the lambda expression may contain other lambda expressions in its body, and inside those expressions the only variables that can be used are bound variables in either lambda expression. Basically, a combinator is a lambda expression which doesn't have any free variables -- all the variables used in the expression have to be bound variables. So `(lambda (x) x)`

is a combinator (known as the identity combinator or I for short) but `(lambda (x) y)`

is not (because `y`

in this expression is not a bound variable). Here are some other combinators:

The constant or K combinator:

`(lambda (x) (lambda (y) x))`

The S combinator:

`(lambda (f) (lambda (g) (lambda (x) ((f x) (g x)))))`

You should check that neither of these combinators contain any free variables. Of course, there are an infinite number of combinators. An interesting side note is that with only the S and K combinators, we can create a programming language that is as powerful as lambda calculus, because any expression in lambda calculus can be translated into an expression involving only S and K combinators. But that's a discussion for another time.

OK, now that we've spent some time in abstract-land, I want to make this discussion more concrete by centering it around an example problem. I'll be using the Scheme language for the rest of this discussion, because Scheme can easily model ULC (it was originally designed as a kind of extended lambda calculus, but has grown into a full-fledged practical programming language).

Consider the following function, which takes in a non-negative integer and returns the factorial of that integer:

(define (factorial n) (if (= n 0) 1 (* n (factorial (- n 1)))))

The factorial of a non-negative integer `n`

is the product of all integers from 1 to `n`

, with the special rule that the factorial of 0 is 1. The definition of factorial given above relies on the fact that `(factorial n) = (* n (factorial (- n 1)))`

for all positive integers (zero is a special case, also reflected in the definition of the function).

The interesting thing about this definition is that there is a call to the factorial function from inside the definition of the factorial function. This is what it means to have a recursive function -- somewhere inside the function, it will call itself. (To be more accurate, there are other kinds of recursion such as mutual recursion where a group of functions call other functions in the group, but we'll just consider this simple case.) And what we want to figure out is this: if the programming language didn't allow us to define recursive functions (for instance, if we couldn't use the word "factorial" inside the definition of the factorial function), could we still define this function?

The answer is yes, and I'll approach this answer in small steps. First of all, let's assume that there is one and only one function in our hypothetical computer language that is allowed to use recursion in its definition. That function will be called `Y`

, and here is its definition:

(define (Y f) (f (Y f)))

If we have this Y function (it's not a combinator yet), we will be able to define our factorial function even if we're not allowed to use recursion anywhere else in the language. And this is where things start to get beautiful, because we'll soon see that we don't actually need recursion to define Y either.

Now I'm going to rewrite our two functions in a more explicit form:

(define factorial (lambda (n) (if (= n 0) 1 (* n (factorial (- n 1)))))) (define Y (lambda (f) (f (Y f))))

These definitions are identical to the ones given previously in terms of how they work; a Scheme interpreter would simply convert the original forms to the new ones. The new forms are lambda expressions, but they have to have names since the name of the lambda expression is used to identify the function in the recursive calls.

Now here is where I'm going to pull a great big rabbit out of a hat.

Let's consider the following function:

(define almost-factorial (lambda (f) (lambda (n) (if (= n 0) 1 (* n (f (- n 1)))))))

This function is not the factorial function. It doesn't take a single integer argument; instead, it takes a *function* as an argument and returns another function. That other function (that starts with `(lambda (n) ...)`

) looks a lot like the factorial function, but instead of the word `factorial`

we have `f`

. A very interesting thing about `almost-factorial`

is that it is not recursive! What we've done is taken the name of the recursive function, changed it from `factorial`

to `f`

, and wrapped a `(lambda (f) ...)`

around the whole function body. We could use this exact same trick to convert any recursive function which only calls itself to a non-recursive function. However, the non-recursive version by itself won't do what the recursive function does; it will need a little help.

Now think about this: what would `f`

have to be in the `almost-factorial`

function to make whatever `almost-factorial`

returned be the factorial function? Let's imagine that we have already defined the factorial function previously; we'll call that (hypothetical) version of the factorial function `factorialA`

. If we passed that as an argument to the `almost-factorial`

function, out would come another function which would in fact also compute factorials! Check it out:

;; Assume we've defined factorialA somehow: (define factorialA ...) (define factorialB (almost-factorial factorialA))

Now, if you pass a non-negative integer as an argument to `factorialB`

, unless the argument is zero it will call `factorialA`

to get the factorial of the next smaller number, and multiply it by the argument to get the correct factorial as the result. If the argument *is* zero, then it will return 1 by the definition of `almost-factorial`

. So `factorialB`

will compute factorials, and it's just as valid a definition of the factorial function as `factorialA`

is. So you might wonder why you can't just write this:

(define factorialB (almost-factorial factorialB))

This is a nice idea, but Scheme won't allow you to define a value in terms of itself this way, because you have to know what `factorialB`

is in order to apply `almost-factorial`

to it. But the idea is right. What you really are trying to define here is a function `factorialB`

which has the property that when you pass it as the argument to `almost-factorial`

, you get the same function out. In other words, applying `almost-factorial`

to `factorialB`

will return `factorialB`

. This may seem like a weird idea, but if you think in terms of numbers instead of functions it's not so weird. Consider this equation:

x = cosine(x)

Here, we're looking for a real number `x`

that has the property that when you compute the cosine of `x`

, you get `x`

back. This number can easily be computed; if you have a calculator, just input zero and repeatedly hit the "cos" button until it stops changing. In fact, `x`

is approximately 0.73908513321516067. There is a name for computations like this: `x`

is called a *fixpoint* of the cosine function. Similarly, in the `factorialB`

example, we are looking for the fixpoint of `almost-factorial`

, which would give us the `factorialB`

function we want. The difference is that in the latter case, the fixpoint is a function, not a number.

Wouldn't it be nice if there was some magical function that we could pass `almost-factorial`

to, and which would return the fixpoint of `almost-factorial`

, which would in fact be `factorialB`

? I'm sure you, like me, have spent many sleepless nights wishing and hoping for such a magical function.

How would this function work? It would take a single argument, which is a function like `almost-factorial`

(which *itself* takes a single function argument), and returns its fixpoint. In other words, this magic function (which I'll call `magic-function`

) would do this:

(magic-function almost-factorial) ==> factorialB

However, we know from the equation above that this is equivalent to:

(magic-function almost-factorial) ==> (almost-factorial factorialB)

And since we already know what `factorialB`

is in terms of `magic-function`

and `almost-factorial`

, we can rewrite this as:

(magic-function almost-factorial) ==> (almost-factorial (magic-function almost-factorial))

Since `magic-function`

` should work for any function taking the same kind of argument as `

`almost-factorial`

, we can generalize this to:

```
``` (magic-function f)
==> (f (magic-function f))

This can be written as a Scheme definition:

(define (magic-function f)
(f (magic-function f)))

Or, making the lambda expression explicit, we get:

(define magic-function
(lambda (f)
(f (magic-function f))))

And guess what? This is *exactly* the definition of `Y`

given above. So `magic-function`

and `Y`

are the exact same function. Furthermore, if we have `almost-factorial`

we can get the factorial function as follows:

(define factorial
(Y almost-factorial))

If you prefer writing things out explicitly, our new definition for `factorial`

is as follows:

(define Y
(lambda (f)
(f (Y f))))
(define factorial
(Y (lambda (f)
(lambda (n)
(if (= n 0)
1
(* n (f (- n 1))))))))

This will in fact produce a valid factorial function, with no recursion in its definition whatsoever. All the "recursion action" is taken care of by the `Y`

function. What `Y`

does is compute fixpoints of single-argument functions, so it's often called the fixpoint function. However, it's not a combinator yet; that's the next step.

At this point I sadly have to make a detour and introduce a technical detail. The code as written above won't work in standard Scheme. If you use DrScheme as your Scheme interpreter (which I recommend) you can run it by using the "Lazy Scheme" language, which is not standard. The reason for this is that standard Scheme uses "strict evaluation" of function arguments, whereby all arguments to a function have to be evaluated fully before the function is applied to its arguments (even if some of the arguments aren't needed). Other languages like Haskell use "lazy evaluation" whereby function arguments are only evaluated if they have to be in order to continue the computation. The definition of `Y`

given above will work in a lazy language but not in a strict language. We'll show a definition of `Y`

that works in strict languages below.

If you've survived this far, rest assured that the hard stuff is behind you. Everything from here on will be quite mechanical. At least, I hope so.

We've reduced the need for explicit recursion in our language into a single recursive function called `Y`

. It would be nice if we could remove the need for explicit recursion altogether, by finding a non-recursive definition of `Y`

. That will be the Y combinator, also known as the fixed-point combinator. The Y combinator will be any function which is also a combinator and which satisfies this equation:

Y f = f (Y f)

for all functions `f`

for which the expression `(f (Y f))`

makes sense *i.e.* all functions that take a single argument which is also a function of a single argument, and return a function of a single argument (like `almost-factorial`

above). We'll call these kinds of functions "Y-compatible" functions. Note that any recursive function of one argument can trivially be converted into a Y-compatible function using the trick we used to convert the factorial function into `almost-factorial`

.

There are many fixed-point combinators. Here's one of the earliest ones ever discovered. Pulling another giant rabbit out of a hat, let's define the following function:

;; assume f is already defined
(define G
(lambda (x)
(f (x x)))))

What happens when we apply `G`

to itself? We get this:

(G G) = (f (G G))

Therefore, `(G G)`

is the fixpoint of `f`

. But this is exactly what we want `Y`

to compute! More formally, if we define:

(define Y
(lambda (f)
(G G)))

or, expanding out G:

(define Y
(lambda (f)
((lambda (x)
(f (x x)))
(lambda (x)
(f (x x))))))

then `(Y f) = (f (Y f))`

for Y-compatible functions. In other words, if we pass `f`

as an argument to `Y`

, we get back the fixpoint of `f`

. That's the property we wanted, and this `Y`

is the Y combinator of myth and legend. It enables us to define recursive functions in a language in which direct recursion is not allowed, as long as the language supports first-class functions.

As I mentioned above, this assumes that your Scheme implementation uses lazy evaluation, so it will work with Lazy Scheme in DrScheme. Technically, this version of the Y combinator is known as the "normal-order Y combinator", where "normal-order" is just a formal name for lazy evaluation (actually, normal-order evaluation is not *exactly* the same as lazy evaluation, but it's the same for our purposes). It is also possible to define a version of the Y combinator that works with strictly-evaluated languages like standard Scheme; that Y combinator is called the "applicative-order Y combinator", where "applicative-order" is just a formal name for strict (call-by-value) evaluation. Here's one way to define the applicative-order Y combinator:

(define Y
(lambda (f)
((lambda (x)
(f (lambda (y) ((x x) y))))
(lambda (x)
(f (lambda (y) ((x x) y)))))))

See if you can figure out how it works. You can type this into any standard Scheme interpreter, and define the factorial function as `(Y almost-factorial)`

, and it will work. Try it!

What does all this prove? It proves that **any language that supports first-class functions already supports recursion, even if recursion isn't built in to the language**. This is extremely counterintuitive (at least to me) and consequently extremely beautiful.

That's all for this post. I hope you've enjoyed our trip into the bowels of the Y combinator, and that you've gotten some idea of what this beautiful thing is and how it works.