How not to explain Haskell monads
One of the key concepts in Haskell that sets it apart from other programming languages is the concept of a "monad". People seem to find this difficult to learn (I did as well), and as a result there are loads of monad tutorials on the web, some of which are very good (I particularly like All About Monads by Jeff Newbern). It's even been said that writing a monad tutorial is a rite of passage for new Haskell programmers. However, one big problem with many monad tutorials is that they try to explain what monads are in reference to existing concepts that the reader already understands (I've even seen this in presentations by Simon Peyton-Jones, the main author of the GHC compiler and general Haskell grand poobah). This is a mistake, and I'm going to tell you why.
It's natural, when trying to explain what something is, to explain it by reference to things the other person already knows about. This works well when the new thing is similar in some ways to things the other person is familiar with. It breaks down utterly when the new thing is completely out of the experience of the person learning it. For instance, if you were trying to explain what fire is to a caveman who had never seen a fire, what would you say? "It's kind of like a cross between air and water, but hot..." Not very effective. Similarly, explaining what an atom is in terms of quantum mechanics is problematic, because we know that the electron doesn't _really_ orbit around the nucleus like a planet around a star, and the notion of a "delocalized electron cloud" doesn't really mean much. Feynman once said that nobody really understood quantum mechanics, and on an intuitive level that's true. But on a mathematical level, quantum mechanics is well-understood; we just don't have a good intuition for what the math really means.
How does this relate to monads? Time and again, in tutorials, blog posts and on the Haskell mailing lists, I've seen monads explained in one of two supposedly-intuitive ways: a monad is "kind of like an action" or "kind of like a container". How can something be both an action and a container? Aren't these separate concepts? Is a monad some kind of weird "active container"? No, but the point is that claiming that a monad is a kind of action or a kind of container is incorrect. So what is a monad, anyway?
Here's the answer: A monad is a purely abstract concept, with no fundamental relationship to anything you've probably ever heard of before. The notion of a monad comes from category theory, which is the most abstract branch of mathematics I know of. In fact, the whole point of category theory is to abstract out all of the structure of mathematics to expose the similarities and analogies between seemingly disparate areas (for instance, between algebra and topology), so as to condense mathematics into its fundamental concepts, and thus reduce redundancy. (I could go on about this for quite a while, but I'd rather get back to the point I'm trying to make.) Since I'm guessing that most programmers learning Haskell don't know much about category theory, monads are not going to mean anything to them. That doesn't mean that they need to learn all about category theory to use monads in Haskell (fortunately), but it does mean that they need to get comfortable thinking about things in a more abstract way than they are probably used to.
So monads in Haskell are not actions, and they're not containers. Why do people use these analogies? It's because a monad is such a general concept that specific kinds of monads can be thought of as actions or containers. The
IO monad can be considered to be an "action" (a kind of function wrapped in a data structure). The list monad and the
Maybe monad can be considered as containers. And there are probably other monads that can't be thought of usefully as either of these things, but as something else entirely. So why have one "monad" concept if it describes all these radically different things? It's because all these seemingly different things can have similar sets of operations that characterize how they work, even though the details of how those operations work are going to be radically different for each particular kind of monad. In other words, there are structural similarities between all monads, even if the details of what they do is quite different. That's what makes it useful to have the general concept of a monad in the language in the first place; you can provide support for one thing (a monad), including syntactic support, and it will work in all these different contexts.
Warning! I'm going to get into a fairly detailed discussion of Haskell language features below in order to make the above discussion more concrete. Wish me luck.
Here's an abbreviated definition of what a monad is in Haskell, leaving out some nonessential features:
class Monad m where (>>=) :: m a -> (a -> m b) -> m b return :: a -> m a
This says that in order for
m to be a monad, there must be two functions called
return that have the given type signatures (the stuff following the
:: is a type signature).
>>= is actually a binary operator usually called "bind", and
return is a regular function of one argument. This kind of "class" is not like the classes in C++ or Java; it's a type class which is a way of specifying what related types have in common. It's somewhat similar to the notion of an interface in Java, except that all type checking is done at compile time. Different types can become instances of this type class by saying how they would implement the two operations
return that characterize this particular type class. As long as the implementations have the correct types, it's allowable as an instance of the class.
Most type classes are simpler than the
Monad class; for instance, we have:
class Eq a where (==) :: a -> a -> Bool (/=) :: a -> a -> Bool
This says that any type
a is an instance of the type class
Eq (meaning "equality") as long as there are definitions of two functions (actually binary operators, but in Haskell any binary operator can also be treated as a function) called
/=. The first is the "equals" operator and the second is the "not-equals" operator. They each take two arguments of type
a and return a boolean result. The reason for the notation
a -> a -> Bool for a function of two arguments is beyond the scope of this discussion but is explained in any Haskell tutorial.
Looking back at the
Monad class above, we see one peculiarity: the thing that can be made into an instance of the
Monad class (called
m) appears in the type signature concatenated to another thing called
a, for instance:
return :: a -> m a
Where does the
a come from? And what does
m a mean?
a represents an arbitrary type (in Haskellese we say that
return is a polymorphic function); a more precise type signature would be this:
return :: forall a. a -> m a
return will work on an argument of any type
a, which means that the first argument can be of any type whatsoever. However, whatever that type
a is, the result of applying
return to a value of that type will be a value of type
m a. That means that
m is a type constructor, which is like a function on types.
m takes a type argument
a and produces a new type
m a (function application in Haskell doesn't use parentheses; you just put the function name in front of its arguments, with spaces separating each entity). So anything that can be a monad in Haskell first of all has to be a type constructor with a single type argument. Common monads in Haskell include the
IO monad, which does input and output, and the
Maybe monad, which is useful for computations that can fail. Both
Maybe are type constructors. There is no
IO type or
Maybe type, but there is e.g. an
IO Int type and a
Maybe String type.
Let's say I have a type constructor called
Foo which takes one type argument. That would mean that
Foo Int and
Foo String would be valid types. How would I tell Haskell that
Foo is a monad? I'd have to supply an instance declaration for
Foo which might look something like this:
instance Monad Foo where [definition of what return means for Foo] [definition of what >>= means for Foo]
Whatever the definition of
return for type
Foo a was, it would have to have the type
a -> Foo a, which means that it would take a value of type
a and convert it into a value of type
Foo a. So if type
Int, the result would have type
Whatever the definition of
>>= for type
Foo a was, it would have to have the type
Foo a -> (a -> Foo a) -> Foo a, which means that the first argument to
>>= has type
Foo a (a monadic value), the second argument has type
a -> Foo a (a function taking a value of type
a to a value of type
Foo a, and the result of the function application would have type
As far as Haskell is concerned, once you've done all that, you have your monad. To be a well-behaved monad (in category theory terms), there are additional rules that have to be obeyed. However, Haskell isn't powerful enough to enforce those rules, so it's up to the programmer to do it himself. That's a topic for another time, but the standard monads defined in Haskell are all well-behaved in this sense.
The point I'm trying to make is that any type constructor (taking one type argument) for which the return and bind functions are defined with the correct types is a monad in Haskell, regardless of whether it's "really" a kind of container, or a kind of action, or whatever. It's kind of like duck typing: if it looks like a monad, and acts like a monad, it is a monad. Anything that is compatible with the above definition is a monad, and trying to find an intuitive explanation that covers all monads is fruitless. I had a physics professor once who had trouble explaining what a tensor was. Finally he said "a tensor is something that transforms the way a tensor does!" In other words, the thing is defined by its behavior (represented by the equations it has to satisfy), not by any intuitive notions.
I realize that I haven't said anything whatsoever about what monads are good for; you can refer to existing Haskell tutorials for that, and maybe I'll talk about that here some other time.