<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss'><id>tag:blogger.com,1999:blog-11295132</id><updated>2009-11-22T01:33:52.109-08:00</updated><title type='text'>A Neighborhood of Infinity</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default?start-index=26&amp;max-results=25'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>236</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-11295132.post-6189536188959217455</id><published>2009-11-04T22:01:00.001-08:00</published><updated>2009-11-08T09:37:09.183-08:00</updated><title type='text'>Memoizing Polymorphic Functions with High School Algebra and Quantifiers</title><content type='html'>A little while back Conal Elliott asked about the memoization of &lt;a href="http://conal.net/blog/posts/memoizing-polymorphic-functions-part-one/"&gt;polymorphic types&lt;/a&gt;. 
I thought it'd be fun to describe how to memoize such functions in the same spirit as Ralf Hinze's use of &lt;a href="http://www.haskell.org/haskellwiki/Memoization"&gt;tries&lt;/a&gt; to memoize non-polymorphic functions. Along the way I'll try to give a brief introduction to quantified types in Haskell and show some applications of the Yoneda lemma at work.&lt;br /&gt;&lt;br /&gt;You can think of a generalized trie for a function type T as a type that's isomorphic to T but doesn't have an arrow '-&amp;gt;' anywhere in its definition. It's something that contains all the same information as a function, but as a data structure rather than as a function. Hinze showed how to construct these by using the &lt;a href="http://en.wikipedia.org/wiki/Tarski's_high_school_algebra_problem"&gt;high school algebra&lt;/a&gt; axioms on non-polymorphic types. Polymorphic types are types involving quantification. So to make suitable tries for these we need to add some rules for handling quantifiers to high school algebra.&lt;br /&gt;&lt;br /&gt;At first it seems unlikely that we could memoize polymorphic functions. When Hinze demonstrated how to construct generalized tries he showed how to make a tree structure that was tailored to the specific types at hand. With polymorphic functions we don't know what types we'll be dealing with, so we need a one-size-fits-all type. That sounds impossible, but it's not.&lt;br /&gt;&lt;br /&gt;The first thing we need to look at is universal quantification. Suppose F(a) is a type expression involving the symbol a. Then the type &amp;forall;a.F(a) can be thought of as being a bit like the product of F(a) for all possible values of a. So &amp;forall;a.F(a) is a bit like the imaginary infinite tuple&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;data Forall f a = (f Bool, f Int, f Char, f String, f [Bool], f (IO (Int -&amp;gt; Char)), ...)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;One reason you can think of it this way is that all of the projections exist. 
So for any type you choose, say B, there is a function (&amp;forall;a.F(a)) -&amp;gt; F(B). In Haskell the &amp;forall; is written as &lt;tt&gt;forall&lt;/tt&gt; and probably the best known example is the Haskell &lt;tt&gt;id&lt;/tt&gt; function of type &lt;tt&gt;forall a. a -&amp;gt; a&lt;/tt&gt;. For any concrete type B, &lt;tt&gt;id&lt;/tt&gt; gives us a function of type &lt;tt&gt;B -&amp;gt; B&lt;/tt&gt;. Note that we usually write the signature simply as &lt;tt&gt;a -&amp;gt; a&lt;/tt&gt;. Haskell implicitly prepends a &lt;tt&gt;forall&lt;/tt&gt; for every free variable in a type. We have to use the following line of code if we want to be able to use &lt;tt&gt;forall&lt;/tt&gt; explicitly (among other things):&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; {-# LANGUAGE RankNTypes, ExistentialQuantification, EmptyDataDecls #-}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I'll approach the tries through a series of propositions. So here's our first one:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Proposition 1&lt;/b&gt;&lt;br /&gt;&amp;forall;a. a = 0&lt;br /&gt;&lt;br /&gt;0 is the type with no elements. &amp;forall;a. a is a type that can give us an object of type B for any B. There is no way to do this. How could such a function manufacture an element of B for any B with nothing to work from? It would have to work even for types that haven't been defined yet. (By the way, do you notice a similarity with the &lt;a href="http://en.wikipedia.org/wiki/Axiom_of_choice"&gt;axiom of choice&lt;/a&gt;?) So &amp;forall;a. a is the type with no elements. Here's the usual way to write the type with no elements:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Void&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We also have:&lt;br /&gt;&lt;b&gt;Proposition 2&lt;/b&gt;&lt;br /&gt;&amp;forall;a. a&lt;sup&gt;a&lt;/sup&gt; = 1&lt;br /&gt;&lt;br /&gt;If we have a function of type &lt;tt&gt;forall a. 
a -&amp;gt; a&lt;/tt&gt; then for any element of type a you give it, it can give you back an element of type a. There's only one way to do this - it must give you back what you gave it. It can't transform that element in any way because there is no uniform transformation you could write that works for all values of a. So &amp;forall;a. a&lt;sup&gt;a&lt;/sup&gt; has one element, &lt;tt&gt;id&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;A slightly more interesting proposition is this:&lt;br /&gt;&lt;b&gt;Proposition 3&lt;/b&gt;&lt;br /&gt;&amp;forall;a. a&lt;sup&gt;a.a&lt;/sup&gt; = 2&lt;br /&gt;&lt;br /&gt;A function of type &lt;tt&gt;(a,a) -&amp;gt; a&lt;/tt&gt; gives you an a when you give it a pair of a's. As we don't know in advance what type a will be we can't write code that examines a in any way. So a function of this type must return one of the pair, and which one it returns can't depend on the value of the argument. So there are only two functions of this type, &lt;tt&gt;fst&lt;/tt&gt; and &lt;tt&gt;snd&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;We can rewrite the last proposition as &amp;forall;a. a&lt;sup&gt;a&lt;sup&gt;2&lt;/sup&gt;&lt;/sup&gt; = 2. That suggests that maybe &amp;forall;a. a&lt;sup&gt;a&lt;sup&gt;n&lt;/sup&gt;&lt;/sup&gt; = n for any type n. We can go one better. Here's another proposition:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Proposition 4&lt;/b&gt;&lt;br /&gt;For any functor F and type n, &amp;forall;a. F(a)&lt;sup&gt;a&lt;sup&gt;n&lt;/sup&gt;&lt;/sup&gt; = F(n)&lt;br /&gt;&lt;br /&gt;I've &lt;a href="http://blog.sigfpe.com/2006/11/yoneda-lemma.html"&gt;already talked&lt;/a&gt; about that result. Here's an implementation of the isomorphisms:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; yoneda :: (forall b . (a -&amp;gt; b) -&amp;gt; f b) -&amp;gt; f a&lt;br /&gt;&amp;gt; yoneda t = t id &lt;br /&gt;&lt;br /&gt;&amp;gt; yoneda' :: Functor f =&amp;gt; f a -&amp;gt; (forall b . 
(a -&amp;gt; b) -&amp;gt; f b)&lt;br /&gt;&amp;gt; yoneda' a f = fmap f a&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Throughout this article I'll use the convention that if f is an isomorphism, f' is its inverse.&lt;br /&gt;&lt;br /&gt;Now it's time to look at a kind of dual of the above propositions. Instead of universal quantification we'll consider existential quantification. The type &amp;exist;a.F(a) is a kind of infinite sum of all types of the form F(a). We can imagine it being a bit like the following definition:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;data Exist f a = ABool (f Bool) | AnInt (f Int) | AChar (f Char) | AString (f String) | AListOfBool (f [Bool]) ...&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The important point is that given any element of F(a), for any type a, we can turn it into an element of &amp;exist;a.F(a). You'd think that we could write this in Haskell as &lt;tt&gt;exists a. F(a)&lt;/tt&gt; but unfortunately Haskell does things differently. The idea behind the notation is this: we can put anything of type F(b) into it, for any b. So if X = &amp;exist;a.F(a) then we have a function F(a) -&amp;gt; X for any a. So we have a function of type &amp;forall;a. F(a) -&amp;gt; X. So although this type is existentially quantified, its constructor is universally quantified. We tell Haskell to make a type existentially quantified by telling it the constructor is universally quantified:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Exist f a = forall a. Exist (f a)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;You can think of &lt;tt&gt;Exist&lt;/tt&gt; as not being a single constructor, but an infinite family of constructors, like &lt;tt&gt;ABool&lt;/tt&gt;, &lt;tt&gt;AnInt&lt;/tt&gt;, etc. above.&lt;br /&gt;&lt;br /&gt;If you have an element of an ordinary non-polymorphic algebraic sum type then the only way you can do stuff to it is to apply case analysis. To do something with an existential type means you have to perform a kind of infinite case analysis. 
So to do something with an element of &amp;exist;a. F(a) you need to provide an infinite family of functions, one for each possible type. In other words, you need to apply a function of type &amp;forall;a. F(a) &amp;rarr; B to it.&lt;br /&gt;&lt;br /&gt;Time for another proposition:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Proposition 5&lt;/b&gt;&lt;br /&gt;&amp;exist;a. a = 1&lt;br /&gt;&lt;br /&gt;It seems weird at first that the sum of all types is 1. But once you put something into this type, you can no longer get any information about it back out again. If you try doing case analysis you have to provide a polymorphic function of type &amp;forall;a. a &amp;rarr; B, which has no way to inspect its argument, and that is as good as saying you can't do any case analysis. Proposition 5 is actually a special case of the following:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Proposition 6&lt;/b&gt;&lt;br /&gt;For any functor f, &amp;exist;a. (n&lt;sup&gt;a&lt;/sup&gt;, f(a)) = f(n)&lt;br /&gt;&lt;br /&gt;Briefly, the reason for this is that the only thing you can do with a matching pair of n&lt;sup&gt;a&lt;/sup&gt; and f(a) is apply the former to the latter using &lt;tt&gt;fmap&lt;/tt&gt;. This is a kind of dual to the Yoneda lemma and I say more about it &lt;a href="http://blog.sigfpe.com/2009/03/dinatural-transformations-and-coends.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;We already know from high school algebra that this is true:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Proposition 7&lt;/b&gt;&lt;br /&gt;x&lt;sup&gt;y+z&lt;/sup&gt;=x&lt;sup&gt;y&lt;/sup&gt;.x&lt;sup&gt;z&lt;/sup&gt;.&lt;br /&gt;&lt;br /&gt;We can write the isomorphisms explicitly:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; prop7 :: (Either a b -&amp;gt; c) -&amp;gt; (a -&amp;gt; c, b -&amp;gt; c)&lt;br /&gt;&amp;gt; prop7 f = (f . Left, f . 
Right)&lt;br /&gt;&lt;br /&gt;&amp;gt; prop7' :: (a -&amp;gt; c, b -&amp;gt; c) -&amp;gt; Either a b -&amp;gt; c&lt;br /&gt;&amp;gt; prop7' = uncurry either&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It should be no surprise that the following 'infinite' version is true as well:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Proposition 8&lt;/b&gt;&lt;br /&gt;x&lt;sup&gt;&amp;exist;a. f(a)&lt;/sup&gt; = &amp;forall;a. x&lt;sup&gt;f(a)&lt;/sup&gt;.&lt;br /&gt;&lt;br /&gt;We can write the isomorphism directly:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; prop8 :: (Exist f a -&amp;gt; x) -&amp;gt; forall a. f a -&amp;gt; x&lt;br /&gt;&amp;gt; prop8 g x = g (Exist x)&lt;br /&gt;&amp;gt; prop8' :: (forall a. f a -&amp;gt; x) -&amp;gt; Exist f a -&amp;gt; x&lt;br /&gt;&amp;gt; prop8' g (Exist x) = g x&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We're now equipped to start constructing generalized tries for polymorphic functions. So let's consider memoizing the type &lt;tt&gt;forall a. [a] -&amp;gt; f a&lt;/tt&gt;, for &lt;tt&gt;f&lt;/tt&gt; a functor. At first this looks hard. We have to memoize a function that can take as argument a list of any type. How can we build a trie if we don't know anything in advance about the type of a? The solution is straightforward. We follow Hinze in applying a bit of high school algebra along with some of the propositions from above. By definition, L(a) = &lt;tt&gt;[a]&lt;/tt&gt; is a solution to the equation L(a) = 1+a.L(a). So we want to simplify &amp;forall;a. f(a)&lt;sup&gt;L(a)&lt;/sup&gt;. We have&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;f(a)&lt;sup&gt;L(a)&lt;/sup&gt; = f(a)&lt;sup&gt;1+a.L(a)&lt;/sup&gt; = f(a).f(a)&lt;sup&gt;a.L(a)&lt;/sup&gt; = f(a).f(a)&lt;sup&gt;a+a&lt;sup&gt;2&lt;/sup&gt;.L(a)&lt;/sup&gt; = f(a).f(a)&lt;sup&gt;a&lt;/sup&gt;.f(a)&lt;sup&gt;a&lt;sup&gt;2&lt;/sup&gt;.L(a)&lt;/sup&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;I hope you can see a bit of a pattern forming. 
Let's define T(n) = f(a)&lt;sup&gt;a&lt;sup&gt;n&lt;/sup&gt;.L(a)&lt;/sup&gt;. Then&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;T(n) = f(a)&lt;sup&gt;a&lt;sup&gt;n&lt;/sup&gt;.(1+a.L(a))&lt;/sup&gt; = f(a)&lt;sup&gt;a&lt;sup&gt;n&lt;/sup&gt;&lt;/sup&gt;.T(n+1) = f(n).T(n+1)&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;That's it! We can translate this definition directly into Haskell.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data T f n = T (f n) (T f (Maybe n))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I'm using the fact that &lt;tt&gt;Maybe n&lt;/tt&gt; is standard Haskell for the type n+1. (But note that this equality is only valid when we think of the list type as data, not codata. So like with Hinze's original tries, values at infinite lists don't get memoized.)&lt;br /&gt;&lt;br /&gt;To build the isomorphism we need to trace through the steps in the derivation. At one point we used a&lt;sup&gt;n&lt;/sup&gt;+a&lt;sup&gt;1+n&lt;/sup&gt;.L(a) = a&lt;sup&gt;n&lt;/sup&gt;.L(a) which we can implement as the pair of isomorphisms:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; lemma :: Either (n -&amp;gt; a) (Maybe n -&amp;gt; a, [a]) -&amp;gt; (n -&amp;gt; a, [a])&lt;br /&gt;&amp;gt; lemma (Left f) = (f, [])&lt;br /&gt;&amp;gt; lemma (Right (f, xs)) = (\n -&amp;gt; f (Just n),f Nothing : xs)&lt;br /&gt;&lt;br /&gt;&amp;gt; lemma' :: (n -&amp;gt; a, [a]) -&amp;gt; Either (n -&amp;gt; a) (Maybe n -&amp;gt; a, [a])&lt;br /&gt;&amp;gt; lemma' (f, []) = Left f&lt;br /&gt;&amp;gt; lemma' (f, x:xs) = Right (maybe x f, xs)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can put the other steps together with this to give:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; memoize :: Functor f =&amp;gt; (forall a. (n -&amp;gt; a, [a]) -&amp;gt; f a) -&amp;gt; T f n&lt;br /&gt;&amp;gt; memoize f = let x = prop7 (f . 
lemma)&lt;br /&gt;&amp;gt;          in T (yoneda (fst x)) (memoize (snd x))&lt;br /&gt;&lt;br /&gt;&amp;gt; memoize' :: Functor f =&amp;gt; T f n -&amp;gt; forall a. (n -&amp;gt; a, [a]) -&amp;gt; f a&lt;br /&gt;&amp;gt; memoize' (T a b) = let y = (yoneda' a, memoize' b)&lt;br /&gt;&amp;gt;                 in prop7' y . lemma'&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Let's try a few examples. I'll use the identity functor for the first example.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data I a = I a deriving Show&lt;br /&gt;&amp;gt; instance Functor I where&lt;br /&gt;&amp;gt;    fmap f (I a) = I (f a)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Here's our first test function and some data to try it on:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test1 (f, xs) = I $ if length xs&amp;gt;0 then head xs else f ()&lt;br /&gt;&amp;gt; data1 = (const 1,[2,3,4])&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;In &lt;tt&gt;data1&lt;/tt&gt; we're using a function to represent a kind of 'head' before the main list. For the next example we're leaving the first element of the pair undefined so that &lt;tt&gt;data2&lt;/tt&gt; is effectively of list type:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test2 (f, xs) = reverse xs&lt;br /&gt;&amp;gt; data2 = (undefined,[1..10])&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can test them by building the memoized versions of these functions.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; memo1 = memoize test1 :: T I ()&lt;br /&gt;&amp;gt; memo2 = memoize test2 :: T [] Void&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;and then apply them&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex1 = memoize' memo1 data1&lt;br /&gt;&amp;gt; ex2 = memoize' memo2 data2&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It appears to work!&lt;br /&gt;&lt;br /&gt;So what's actually going on? We have&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;T(0) = f(0).T(1) = f(0).f(1).T(2) = ... 
= f(0).f(1).f(2).f(3)...&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Now consider a function &lt;tt&gt;g :: [a] -&amp;gt; f a&lt;/tt&gt; applied to a list. If the list isn't an infinite stream then it must have a certain length, say n. From these elements it builds something of type &lt;tt&gt;f a&lt;/tt&gt;. However this &lt;tt&gt;f a&lt;/tt&gt; is constructed, each of the elements of type &lt;tt&gt;a&lt;/tt&gt; in it must be constructed from one of the n elements in the list. So if we apply g to the list [0,1,2,...,n-1] it will construct an element of &lt;tt&gt;f a&lt;/tt&gt; where each &lt;tt&gt;a&lt;/tt&gt; in it contains a label saying which position in the list it came from. (Compare with Neel's comment &lt;a href="http://conal.net/blog/posts/memoizing-polymorphic-functions-part-two/"&gt;here&lt;/a&gt;). If we use integers we don't get a perfect trie because there are more elements of type &lt;tt&gt;f Integer&lt;/tt&gt; than there are ways to indicate source positions. What we need is that for each length of list, n, we have a type with precisely n elements. And that's what the type n gives us.&lt;br /&gt;&lt;br /&gt;We can memoize many different functions this way, though if the functor f is a function type you'll need to use some of Hinze's techniques to eliminate them. And you'll notice I haven't used all of the propositions above. I've tried to give some extra tools to allow people to memoize more types than just my two examples.&lt;br /&gt;&lt;br /&gt;One last thing: I wouldn't use the type above to memoize in a real world application. But the methods above could be used to derive approximate tries that are efficient. One obvious example of an approximation would be to use &lt;tt&gt;Int&lt;/tt&gt; instead of the finite types.&lt;br /&gt;&lt;HR&gt;&lt;br /&gt;Update: I forgot to provide one very important link. 
This post was inspired by Thorsten Altenkirch's post &lt;a href="http://sneezy.cs.nott.ac.uk/fplunch/weblog/?p=112"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-6189536188959217455?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/6189536188959217455/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=6189536188959217455' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6189536188959217455'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6189536188959217455'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html' title='Memoizing Polymorphic Functions with High School Algebra and Quantifiers'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-216092501077380325</id><published>2009-10-31T08:35:00.000-07:00</published><updated>2009-10-31T09:51:20.855-07:00</updated><title type='text'>Buffon's Needle, the Easy Way</title><content type='html'>Buffon's needle is a popular probability problem. Rule lines on the floor a distance d apart and toss a needle of length l&amp;lt;d onto it. What is the probability that the needle crosses a line? 
A solution is described at &lt;a href="http://en.wikipedia.org/wiki/Buffon's_needle"&gt;Wikipedia&lt;/a&gt; but it involves a double integral and some trigonometry. Nowhere does it mention that there is a less familiar but much simpler proof, though if you follow the links you'll find it. In addition, the usual solution involves &amp;pi; but gives little intuition as to why &amp;pi; appears. The simpler proof reveals that it appears naturally as the ratio of the circumference of a circle to its diameter. I've known this problem since I was a kid and yet I hadn't seen the simpler proof until a friend sold me his copy of &lt;em&gt;Introduction to Geometric Probability&lt;/em&gt; for $5 a few days ago.&lt;br /&gt;&lt;br /&gt;So instead of solving Buffon's needle problem we'll solve what appears to be a harder problem: when thrown, what is the expected number of times a rigid, planar, curved wire of length l (no restriction on l) crosses one of our ruled lines d apart? Here's an example of one of these 'noodles'. It crosses the ruled lines three times:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/SuxeAlA9LII/AAAAAAAAAZ0/CLB-HGj1Mqg/s1600-h/noodle1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 384px; height: 241px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/SuxeAlA9LII/AAAAAAAAAZ0/CLB-HGj1Mqg/s400/noodle1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5398793417339513986" /&gt;&lt;/a&gt;&lt;br /&gt;Expectation is linear in the sense that E(A+B) = E(A)+E(B). So if we imagine the wire divided up into N very short segments of length l/N the expectation for the whole wire must be the sum of the expectations for all of the little pieces. If the wire is well behaved, for N large enough the segments are close to identical straight line segments. 
Here's a zoomed up view of a piece of our noodle:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/SuxgwA9207I/AAAAAAAAAZ8/b2kMq3YzGjU/s1600-h/noodle2.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 167px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/SuxgwA9207I/AAAAAAAAAZ8/b2kMq3YzGjU/s400/noodle2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5398796431319815090" /&gt;&lt;/a&gt;&lt;br /&gt;For a small straight line segment the expectation must simply be a function of the length of the segment. The expectation for the whole wire is the expectation for one segment multiplied by the number of segments. In other words, the expectation is proportional to the length of the wire and we can write E(l)=kl for some constant k.&lt;br /&gt;&lt;br /&gt;Now we know it's proportional to the length, we need to find the constant of proportionality, k. We need to 'calibrate' by thinking of a noodle shape where we know in advance exactly how many times it will cross the ruled lines. The following picture gives the solution:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/SuxhivjcI0I/AAAAAAAAAaE/YpcBk5iPZJQ/s1600-h/noodle3.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 384px; height: 241px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/SuxhivjcI0I/AAAAAAAAAaE/YpcBk5iPZJQ/s400/noodle3.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5398797302818939714" /&gt;&lt;/a&gt;&lt;br /&gt;A circle of diameter d will almost always cross the lines in two places. The length of this wire is &amp;pi;d so E(&amp;pi;d)=2 and k=2/&amp;pi;d.&lt;br /&gt;&lt;br /&gt;The expected number of crossings for a wire of length l is 2l/&amp;pi;d. A needle of length l&amp;lt;d can intersect only zero or one times. 
So the expected value is in fact the probability of intersecting a line. The &lt;a href="http://en.wikipedia.org/wiki/Buffon%27s_noodle"&gt;solution&lt;/a&gt; is 2l/&amp;pi;d.&lt;br /&gt;&lt;br /&gt;No integrals needed.&lt;br /&gt;&lt;br /&gt;The expected number of crossings is an example of an invariant measure, something I've talked about &lt;a href="http://blog.sigfpe.com/2006/08/what-can-we-measure-part-i.html"&gt;before&lt;/a&gt;. There are only a certain number of functions of a noodle that are additive and invariant under rotations and just knowing these facts is almost enough to pin down the solution.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Puzzle&lt;/H3&gt;&lt;br /&gt;Now I can leave you with a puzzle to solve. In the UK, a &lt;a href="http://en.wikipedia.org/wiki/50p_coin"&gt;50p coin&lt;/a&gt; is a 7 sided curvilinear polygon of constant width. Being constant width means a vending machine can consistently measure its width no matter how the coin is oriented in its plane. Can you use a variation of the argument above to compute the circumference of a 50p coin as a function of its width?&lt;br /&gt;&lt;HR&gt;&lt;br /&gt;&lt;iframe src="http://rcm.amazon.com/e/cm?t=sigfpe-20&amp;o=1&amp;p=8&amp;l=as1&amp;asins=0521596548&amp;fc1=000000&amp;IS2=1&amp;lt1=_blank&amp;m=amazon&amp;lc1=0000FF&amp;bc1=000000&amp;bg1=FFFFFF&amp;f=ifr" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"&gt;&lt;/iframe&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-216092501077380325?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/216092501077380325/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=216092501077380325' title='11 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/11295132/posts/default/216092501077380325'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/216092501077380325'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/10/buffons-needle-easy-way.html' title='Buffon&apos;s Needle, the Easy Way'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_UdKHLrHa05M/SuxeAlA9LII/AAAAAAAAAZ0/CLB-HGj1Mqg/s72-c/noodle1.png' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-7800915298217494425</id><published>2009-10-13T10:40:00.000-07:00</published><updated>2009-10-17T16:41:54.019-07:00</updated><title type='text'>"What Category do Haskell Types and Functions Live In?"</title><content type='html'>The question in my title is one that is often raised by Haskell programmers and it's a &lt;a href="http://www.keim.cs.gunma-u.ac.jp/~hamana/Papers/cpo.pdf"&gt;difficult one to answer rigorously&lt;/a&gt; and satisfyingly. But you may notice that I've put the question in quotes. This is because I'm not asking the question myself. Instead I want to argue that often there's a better question to ask.&lt;br /&gt;&lt;br /&gt;Superficially Haskell looks a lot like category theory. We have types that look like objects and functions that look like arrows. Given two functions we can compose them just how arrows compose in a category. 
We also have things that look like products, coproducts, other kinds of limit including infinite ones, natural transformations, &lt;a href="http://comonad.com/reader/2008/kan-extension-iii/"&gt;Kan extensions&lt;/a&gt;, monads and quite a bit of &lt;a href="http://blog.sigfpe.com/2008/05/interchange-law.html"&gt;2-categorical structure&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So what goes wrong? (Besides the obvious problem that on a real computer, composing two working functions might result in a non-working function because you run out of memory.)&lt;br /&gt;&lt;br /&gt;Among other things, Haskell functions can fail to terminate because of things like infinite loops. Computer scientists often use the &lt;a href="http://en.wikibooks.org/wiki/Haskell/Denotational_semantics"&gt;notation&lt;/a&gt; ⊥ to represent a non-terminating computation. So when we talk of the Haskell integers, say, we don't just mean the values 0, 1, 2, ... but we also have to include ⊥. Unfortunately, when we do this we break a few things. For one thing we &lt;a href="http://www.mail-archive.com/haskell-cafe@haskell.org/msg35716.html"&gt;no longer have coproducts&lt;/a&gt;. But people find it useful to talk about algebraic datatypes as constructing types using products and coproducts and that would be a hard thing to give up.&lt;br /&gt;&lt;br /&gt;So we could restrict ourselves to considering only the category theory of computable functions. But that's not a &lt;a href="http://blog.sigfpe.com/2008/01/type-that-should-not-be.html"&gt;trivial thing&lt;/a&gt; to do either, and it doesn't reflect what real Haskell programs do.&lt;br /&gt;&lt;br /&gt;But even if we did manage to tweak this and that to get a bona fide category out of Haskell, all we'd get is a  custom tailored category that serves just one purpose. One theme running through much of my blog is that Haskell can be used to gain an understanding of a nice chunk of elementary category theory in general. 
Showing that Haskell simply gives us one example of a category really isn't that interesting. When I talked about the &lt;a href="http://blog.sigfpe.com/2006/11/yoneda-lemma.html"&gt;Yoneda Lemma&lt;/a&gt; I felt like I was talking about more than just one property of some obscure category that I can't actually define and that most category theorists have never even heard of.&lt;br /&gt;&lt;br /&gt;So what's going on? Why does it feel like Haskell is so naturally category theoretical while the details are so messy?&lt;br /&gt;&lt;br /&gt;Going back to my Yoneda lemma code, consider my definition of &lt;code&gt;check&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; check a f = fmap f a&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It's straightforward to translate this into standard category theoretical notation that applies to any category. Even though the code is implemented in a specific programming language there's nothing about it that prevents it being translated for use in any category. So it doesn't matter what category Haskell corresponds to. What matters is that this bit of code is written in a language suitable for any category. And the proof I give can be similarly translated.&lt;br /&gt;&lt;br /&gt;Consider this standard problem given to category theory students: prove that (A&amp;times;B)&amp;times;C is isomorphic to A&amp;times;(B&amp;times;C). In Haskell we could construct the isomorphism as:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; iso :: ((a,b),c) -&amp;gt; (a,(b,c))&lt;br /&gt;&amp;gt; iso ((x,y),z) = (x,(y,z))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;But now we hit a problem. We can straightforwardly translate this into mathematical notation and it will give a valid isomorphism in the category of sets, &lt;b&gt;Set&lt;/b&gt;. But &lt;code&gt;iso&lt;/code&gt; is written to accept arguments which are elements of some type. Not all objects in categories have elements, and arrows might not correspond to functions. 
And even if they did, if we were working with &lt;a href="http://en.wikipedia.org/wiki/Compactly_generated_Hausdorff_space"&gt;(certain types of)&lt;/a&gt; topological spaces we'd be giving a construction for the isomorphism, and our proof would show the underlying function had an inverse, but we'd be failing to show it's continuous. It looks like writing Haskell code like this only tells us about a particularly limited type of category.&lt;br /&gt;&lt;br /&gt;But not so. Type &lt;code&gt;cabal install pointfree&lt;/code&gt; to install pointfree and then run &lt;code&gt;pointfree 'iso ((x,y),z) = (x,(y,z))'&lt;/code&gt; and it responds with&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; iso = uncurry (uncurry ((. (,)) . (.) . (,)))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;pointfree rewrites a function in point-free style. There are no x's, y's or z's in the written version, only &lt;code&gt;uncurry&lt;/code&gt;, composition &lt;code&gt;(.)&lt;/code&gt;, and the product function &lt;code&gt;(,)&lt;/code&gt;. These exist in all &lt;a href="http://en.wikipedia.org/wiki/Cartesian_closed_category"&gt;Cartesian closed categories&lt;/a&gt; (CCC). So our original function definition, despite apparently referring to elements, can be mechanically turned into a definition valid for any CCC. We can now reinterpret the meaning of x, y and z in the original definition as not referring to elements at all, but as labels indicating how a bunch of fairly general categorically defined primitives are to be assembled together.&lt;br /&gt;&lt;br /&gt;(Incidentally, my first foray into pure functional programming was to write a &lt;a href="http://homepage.mac.com/sigfpe/Computing/sasl.html"&gt;SASL compiler&lt;/a&gt;. 
It was little more than a bunch of rewrite rules to convert SASL code into point-free compositions of S, K and I, among other combinators.)&lt;br /&gt;&lt;br /&gt;What we have here is an example of an &lt;a href="http://en.wikipedia.org/wiki/Internal_language"&gt;internal language&lt;/a&gt; at work. I'm not sure what a precise definition of "internal language" is, but it's something like this: take a formal system and find a way to translate it to talk about categories in such a way that true propositions in one are turned into true propositions in the other. The formal system now becomes an internal language for those categories.&lt;br /&gt;&lt;br /&gt;The best known example is &lt;a href="http://en.wikipedia.org/wiki/Topos"&gt;topos theory&lt;/a&gt;. A topos is a category that has a bunch of properties that make it a bit like &lt;b&gt;Set&lt;/b&gt;. We take a subset of the language of set theory that makes use of just these properties. Our propositions that look like set theory can now be mechanically translated into statements valid of all toposes. This means we can happily write lots of arguments referring to elements of objects in a topos and get correct results.&lt;br /&gt;&lt;br /&gt;In their book &lt;em&gt;Introduction to Higher-Order Categorical Logic&lt;/em&gt;, Lambek and Scott showed that "pure typed &amp;lambda;-calculus" is the internal language of CCCs. Even though expressions in the &amp;lambda;-calculus contain named variables these can always be eliminated and replaced by point-free forms. Theorems about typed &amp;lambda;-calculus are actually theorems about CCCs. When we write Haskell code with 'points' in it, we don't need to interpret these literally.&lt;br /&gt;&lt;br /&gt;So despite not knowing which category Haskell lives in, much of the code I've written in these web pages talks about a wide variety of categories because Haskell is essentially a programming language based on an internal language (or a bunch of them). 
Despite the fact that even a function like &lt;code&gt;iso&lt;/code&gt; might have quite &lt;a href="http://blog.sigfpe.com/2008/02/how-many-functions-are-there-from-to.html"&gt;complex semantics&lt;/a&gt; when run on a real computer, the uninterpreted programs themselves often represent completely rigorous, and quite general pieces of category theory.&lt;br /&gt;&lt;br /&gt;So the question to ask isn't "what category does Haskell live in?" but "what class of category corresponds to the internal language in which I wrote this bit of code?". I partly answer this question for do-notation (a little internal language of its own) in an &lt;a href="http://blog.sigfpe.com/2008/11/some-thoughts-on-reasoning-and-monads.html"&gt;earlier post&lt;/a&gt;. Haskell (and various subsets and extensions) is essentially a way to give semantics to internal languages for various classes of category. However complicated and messy those semantics might get on a real world computer, the language itself is a thing of beauty and more general than might appear at first.&lt;br /&gt;&lt;br /&gt;BTW This trick of reinterpreting what look like variables as something else was used by Roger Penrose in his &lt;a href="http://en.wikipedia.org/wiki/Abstract_index_notation"&gt;abstract index notation&lt;/a&gt;. Just as we can sanitise variables by reinterpreting them as specification for plumbing in some category, Penrose reinterpreted what were originally indices into arrays of numbers as plumbing in another category. Actually, this isn't just an analogy. 
With a little reformatting abstract index notation is very close to the way I've been using monads to work with vector spaces so that abstract index notation can be viewed as a special case of an internal language for categories with monads.&lt;br /&gt;&lt;HR&gt;&lt;br /&gt;&lt;iframe src="http://rcm.amazon.com/e/cm?t=sigfpe-20&amp;o=1&amp;p=8&amp;l=as1&amp;asins=0521356539&amp;fc1=000000&amp;IS2=1&amp;lt1=_blank&amp;m=amazon&amp;lc1=0000FF&amp;bc1=000000&amp;bg1=FFFFFF&amp;f=ifr" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;iframe src="http://rcm.amazon.com/e/cm?t=sigfpe-20&amp;o=1&amp;p=8&amp;l=as1&amp;asins=0521337070&amp;fc1=000000&amp;IS2=1&amp;lt1=_blank&amp;m=amazon&amp;lc1=0000FF&amp;bc1=000000&amp;bg1=FFFFFF&amp;f=ifr" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"&gt;&lt;/iframe&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-7800915298217494425?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/7800915298217494425/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=7800915298217494425' title='22 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7800915298217494425'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7800915298217494425'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/10/what-category-do-haskell-types-and.html' title='&quot;What Category do Haskell Types and Functions Live 
In?&quot;'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>22</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-3880470744869663438</id><published>2009-10-10T09:12:00.000-07:00</published><updated>2009-10-11T08:17:37.816-07:00</updated><title type='text'>Vectors, Invariance, and Math APIs</title><content type='html'>Many software libraries, especially those for physics or 3D graphics, are equipped with tools for working with vectors. I'd like to point out how in these libraries the functions for manipulating vectors sometimes have special and useful properties that make it worthwhile to separate them out into their own interface.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Two types of Function&lt;/H3&gt;&lt;br /&gt;Suppose an object of mass m is moving with velocity &lt;b&gt;v&lt;/b&gt; and we apply force &lt;b&gt;f&lt;/b&gt; to it for time t. What is the final velocity? This is given by g:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;g(m,t,&lt;b&gt;f&lt;/b&gt;,&lt;b&gt;v&lt;/b&gt;) = &lt;b&gt;v&lt;/b&gt;+(t/m)*&lt;b&gt;f&lt;/b&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Now suppose that R is a rotation operation, typically represented by a matrix. What happens if we apply it to both of the vector arguments of g? 
&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;g(m,t,R&lt;b&gt;f&lt;/b&gt;,R&lt;b&gt;v&lt;/b&gt;) = R&lt;b&gt;v&lt;/b&gt;+(t/m)*R&lt;b&gt;f&lt;/b&gt; = Rg(m,t,&lt;b&gt;f&lt;/b&gt;,&lt;b&gt;v&lt;/b&gt;)&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;In other words, rotating the vector arguments is the same as rotating the vector result.&lt;br /&gt;&lt;br /&gt;Another example: Consider the function that gives the force on an electric charge as a function of its velocity and the magnetic field:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;f(e,&lt;b&gt;v&lt;/b&gt;,&lt;b&gt;B&lt;/b&gt;) = e&lt;b&gt;v&lt;/b&gt;&amp;times;&lt;b&gt;B&lt;/b&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;It's essentially just the cross product. If you rotate both of the arguments to the cross product then the result is &lt;a href="http://en.wikipedia.org/wiki/Cross_product#Algebraic_properties"&gt;rotated&lt;/a&gt; too. The result is that&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;f(e,R&lt;b&gt;v&lt;/b&gt;,R&lt;b&gt;B&lt;/b&gt;) = Rf(e,&lt;b&gt;v&lt;/b&gt;,&lt;b&gt;B&lt;/b&gt;)&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;On the other hand, many 3D APIs come with a function to perform componentwise multiplication of vectors. Writing vectors &lt;b&gt;x&lt;/b&gt; as triples (x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;), and so on, we can write such a function as:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;f(&lt;b&gt;x&lt;/b&gt;,&lt;b&gt;y&lt;/b&gt;) = (x&lt;sub&gt;0&lt;/sub&gt;y&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;y&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;y&lt;sub&gt;2&lt;/sub&gt;)&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;You can show that this doesn't have a similar property. For example, with &lt;b&gt;x&lt;/b&gt;=(1,0,0) and &lt;b&gt;y&lt;/b&gt;=(0,1,0) we get f(&lt;b&gt;x&lt;/b&gt;,&lt;b&gt;y&lt;/b&gt;)=(0,0,0), but rotating both arguments by 45&amp;deg; about the z-axis gives f(R&lt;b&gt;x&lt;/b&gt;,R&lt;b&gt;y&lt;/b&gt;)=(-1/2,1/2,0), which is not R applied to the zero vector.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Rotational Invariance&lt;/H3&gt;&lt;br /&gt;To make things easy, let's restrict ourselves to functions of scalars and vectors. And when I say vector, I'm talking strictly about vectors representing magnitude and direction, but not positions. 
Examples of such vectors are velocities, accelerations, angular velocities, magnetic fields, and the difference between two positions. A function is said to be &lt;i&gt;rotationally invariant&lt;/i&gt; if applying a rotation R to all of its vector arguments results in the same thing as applying R to all of the vectors in its value. This allows you to have a function that returns multiple vectors, like a tuple or array.&lt;br /&gt;&lt;br /&gt;The first two functions I described above were rotationally invariant but the third wasn't. Notice how the first two examples also described physical processes. This is the important point: as far as we know, all of the laws of physics are &lt;a href="http://en.wikipedia.org/wiki/Rotational_invariance"&gt;rotationally invariant&lt;/a&gt;. If you write down an equation describing a physical process then replacing all of the vectors in it by their rotated counterparts must also result in a valid equation. So if you're describing a physical process with a computer program, and you end up with a function that isn't rotationally invariant, you've made a mistake somewhere.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Vector APIs&lt;/H3&gt;&lt;br /&gt;Vector APIs frequently come with all manner of functions. Some have the invariance property and some don't. If you write code that you'd like to be rotationally invariant, but it turns out that it isn't, you usually have to examine the code to find the bug. But if you separate the invariant functions into their own interface, and then write code using just that interface, the code is guaranteed to be invariant. If your programming language has reasonably strict types then you may even be able to arrange things so that the type signature of the function alone is enough to tell you that the function is invariant. 
In effect you are able to make the compiler prove that your function is invariant.&lt;br /&gt;&lt;br /&gt;(As an aside, this is an example of why a good type system does much more than you might at first have guessed. It doesn't just stop you making typos, it can do things like prove that your programs satisfy certain geometrical properties.)&lt;br /&gt;&lt;br /&gt;So what functions would you have in such an API? Among the essential rotationally invariant functions are:&lt;br /&gt;&lt;br /&gt;1. Multiplication of a vector by a scalar&lt;br /&gt;2. Addition of vectors&lt;br /&gt;3. Dot product&lt;br /&gt;4. Cross product&lt;br /&gt;&lt;br /&gt;In terms of these you can build functions such as&lt;br /&gt;&lt;br /&gt;1. Vector length&lt;br /&gt;2. Vector normalization&lt;br /&gt;3. Rotation of one vector around an axis specified by another vector&lt;br /&gt;4. Linear interpolation between vectors&lt;br /&gt;&lt;br /&gt;What kinds of functions would be excluded?&lt;br /&gt;&lt;br /&gt;1. Constructing a vector from three scalars, i.e. f(x,y,z) = (x,y,z).&lt;br /&gt;2. Constructing a vector from a single scalar, i.e. f(x) = (x,x,x).&lt;br /&gt;3. Extracting the ith component of a vector, i.e. f(i,(x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;)) = x&lt;sub&gt;i&lt;/sub&gt;.&lt;br /&gt;4. Pointwise multiplication of vectors.&lt;br /&gt;5. Computing the elementwise cosine of a vector.&lt;br /&gt;&lt;br /&gt;On seeing the first excluded example above you might ask "how am I supposed to construct vectors?" The point is that you don't program exclusively with an invariant API, you simply use it whenever you need to prove invariance.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Coordinate System Invariance&lt;/H3&gt;&lt;br /&gt;One purpose of writing to a particular interface is that it allows the API to hide implementation details from the user. Using a rotationally invariant API has a role to serve here. 
For example, many 3D renderers allow you to write &lt;a href="http://en.wikipedia.org/wiki/Shader"&gt;shaders&lt;/a&gt;. These are essentially functions that compute the colour of a piece of geometry that needs rendering. You write a shader and the renderer then calls your shader as needed when a fragment of geometry passes through its pipeline. Frequently these are used for lighting calculations but there are all kinds of other things that may be computed in shaders.&lt;br /&gt;&lt;br /&gt;In a 3D renderer different parts of the computation are often performed in different coordinate systems. For example it may be convenient to perform lighting calculations in a coordinate system oriented with the direction of the light. But the author of a renderer doesn't want to be committed to a particular choice. To keep that freedom, it is essential to be able to write shaders that are agnostic about which coordinate system is being used. If we work with rotationally invariant functions, our shaders are guaranteed to be agnostic in this way (assuming that the only kind of coordinate change that takes place is a rotation).&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Lots More Types&lt;/H3&gt;&lt;br /&gt;I've concentrated on just one type of invariance, rotational invariance. If we consider more types of invariance then more types of interface naturally emerge. It would take too long to cover all of the details here so I'm just going to briefly sketch the beginnings of the more general case. So just read this section as a list of pointers to further reading.&lt;br /&gt;&lt;br /&gt;For example, some functions are invariant under translations. These can be thought of as functions of points in space. If we allow more general linear transformations then we find that some common functions transform 'oppositely' to vectors. In particular, normals to surfaces transform in this way. 
In fact, Pixar's &lt;a href="http://en.wikipedia.org/wiki/RenderMan_Interface_Specification"&gt;RenderMan&lt;/a&gt; has three distinct types, vectors, points and normals, that capture these different invariances.&lt;br /&gt;&lt;br /&gt;If we go back to rotations again but now extend these by allowing reflections then we find an interesting new phenomenon. For example, consider the result of reflecting in the x-y-plane, followed by reflecting in the y-z-plane followed by reflecting in the x-z-plane. This simply multiplies vectors by -1. Dot product is invariant under this: (-&lt;b&gt;x&lt;/b&gt;)&amp;middot;(-&lt;b&gt;y&lt;/b&gt;)=&lt;b&gt;x&lt;/b&gt;&amp;middot;&lt;b&gt;y&lt;/b&gt;. But cross product isn't, because (-&lt;b&gt;x&lt;/b&gt;)&amp;times;(-&lt;b&gt;y&lt;/b&gt;)=&lt;b&gt;x&lt;/b&gt;&amp;times;&lt;b&gt;y&lt;/b&gt;, whereas invariance would require the result to be -(&lt;b&gt;x&lt;/b&gt;&amp;times;&lt;b&gt;y&lt;/b&gt;). Even though the cross product is apparently a vector, it doesn't get multiplied by -1. When we start considering invariance under reflection we find that some vectors behave differently. These are the &lt;a href="http://en.wikipedia.org/wiki/Pseudovector"&gt;pseudovectors&lt;/a&gt; and in effect they have their own separate type and interface. Interestingly, nature likes to keep pseudovectors and vectors separate except in &lt;a href="http://en.wikipedia.org/wiki/Parity_%28physics%29"&gt;parity violating&lt;/a&gt; phenomena. There are even &lt;a href="http://en.wikipedia.org/wiki/Pseudoscalar"&gt;pseudoscalars&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Incidentally, if you consider invariance under scaling you're led to the idea of encoding &lt;a href="http://en.wikipedia.org/wiki/Dimensional_analysis"&gt;dimensions&lt;/a&gt; in your &lt;a href="http://research.sun.com/projects/plrg/Fortress/overview.html"&gt;types&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Conclusion&lt;/H3&gt;&lt;br /&gt;If you're writing a vector API think about the invariance properties that your functions may have. 
If any are useful then it may be worth placing those in a separate interface. The more distinct types you have, the more properties you can make your compiler prove. Obviously this needs to be balanced against practicality, complexity for users and what you actually need. To some extent, many existing APIs make some of these distinctions with varying degrees of strictness. The main point I want to make clear is that these distinctions are based on &lt;i&gt;invariance&lt;/i&gt; properties, something that not all developers of such APIs are aware of.&lt;br /&gt;&lt;br /&gt;At some point I hope to return to this topic and enumerate all of the common vector-like types in a single framework. Unfortunately it's a big topic and I've only been able to scratch the surface here. In particular there are some subtle interplays between dimensions and types.&lt;br /&gt;&lt;br /&gt;On a deeper level, I think there must be some type theoretical framework in which these invariance properties are &lt;a href="http://homepages.inf.ed.ac.uk/wadler/topics/parametricity.html"&gt;free theorems&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Update: I believe some of this material is covered in Jim Blinn's &lt;a href="http://portal.acm.org/citation.cfm?id=1069625"&gt;Vectors and Geometry and Objects, Oh My!&lt;/a&gt;, but I don't have access to that. I suspect that there is one big difference in my presentation: I'm not so interested here in vectors (or normals or whatever) in themselves but as defining interfaces to functions with invariance properties. Like the way category theorists shift the focus from objects to arrows. It makes a difference because it immediately gives theorems that our code is guaranteed to satisfy. 
It's the invariance property of the cross product (say) that is useful here, not the fact that the components of a vector transform a certain way when we change coordinates (because I might not even want to refer to coordinates in my code).&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Example Code&lt;/H3&gt;&lt;br /&gt;&lt;br /&gt;To show that I'm talking about something very simple, but still powerful, here's some Haskell code:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Triple = T Float Float Float deriving Show&lt;br /&gt;&lt;br /&gt;&amp;gt; class Vector v where&lt;br /&gt;&amp;gt;   (.+) :: v -&amp;gt; v -&amp;gt; v&lt;br /&gt;&amp;gt;   (.*) :: Float -&amp;gt; v -&amp;gt; v&lt;br /&gt;&amp;gt;   dot :: v -&amp;gt; v -&amp;gt; Float&lt;br /&gt;&amp;gt;   cross :: v -&amp;gt; v -&amp;gt; v&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Vector Triple where&lt;br /&gt;&amp;gt;   T x y z .+ T u v w = T (x+u) (y+v) (z+w)&lt;br /&gt;&amp;gt;   a .* T x y z = T (a*x) (a*y) (a*z)&lt;br /&gt;&amp;gt;   dot (T x y z) (T u v w) = x*u+y*v+z*w&lt;br /&gt;&amp;gt;   cross (T x y z) (T u v w) = T&lt;br /&gt;&amp;gt;       (y*w-v*z)&lt;br /&gt;&amp;gt;       (z*u-x*w)&lt;br /&gt;&amp;gt;       (x*v-y*u)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;You can freely apply the four primitive functions to elements of type &lt;tt&gt;Triple&lt;/tt&gt; but if you have a function of, say, signature&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; f :: Vector v =&amp;gt; (v,v,Float) -&amp;gt; [(v,v)]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;you are guaranteed it is invariant.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-3880470744869663438?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/3880470744869663438/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=3880470744869663438' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/3880470744869663438'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/3880470744869663438'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/10/vectors-invariance-and-math-apis.html' title='Vectors, Invariance, and Math APIs'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-8796882367440566026</id><published>2009-09-29T20:04:00.001-07:00</published><updated>2009-09-29T20:06:39.715-07:00</updated><title type='text'>test, ignore</title><content type='html'>&lt;pre lang="eq.latex"&gt;&lt;br /&gt;\int_{0}^{1}\frac{x^{4}\left(1-x\right)^{4}}{1+x^{2}}dx&lt;br /&gt;=\frac{22}{7}-\pi&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;You know, I think I won't delete this. 
It's a celebration of my having embedded some TeX using the advice at &lt;a href="http://www.botcyb.org/2008/10/rendering-latex-in-blogger.html"&gt;Bot Cyborg&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-8796882367440566026?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/8796882367440566026/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=8796882367440566026' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/8796882367440566026'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/8796882367440566026'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/09/test-ignore_29.html' title='test, ignore'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-5532328337205821486</id><published>2009-09-26T13:18:00.001-07:00</published><updated>2009-09-27T18:28:17.370-07:00</updated><title type='text'>Finite Differences of Types</title><content type='html'>&lt;H3&gt;Finite Differences of Real-Valued Functions&lt;/H3&gt;&lt;br /&gt;Conor McBride's discovery that you can &lt;a href="http://strictlypositive.org/diff.pdf"&gt;differentiate container types&lt;/a&gt; to get useful constructions like &lt;a href="http://en.wikibooks.org/wiki/Haskell/Zippers"&gt;zippers&lt;/a&gt; has to be one of the most amazing things I've seen in computer 
science. But seeing the success of differentiation suggests the idea of taking a step back and looking at finite differences.&lt;br /&gt;&lt;br /&gt;Forget about types for the moment and consider functions on a field R. Given a function f:R&amp;rarr;R we can define &amp;Delta;f:R&amp;times;R&amp;rarr;R by&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&amp;Delta;f(x,y) = (f(x)-f(y))/(x-y)&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&amp;Delta; is the finite difference operator. But does it make any kind of sense for types? At first it seems not because we can't define subtraction and division of types. Can we massage this definition into a form that uses only addition and multiplication?&lt;br /&gt;&lt;br /&gt;First consider &amp;Delta;c where c is a constant function. Then &amp;Delta;c(x,y)=0.&lt;br /&gt;&lt;br /&gt;Now consider the identity function i(x)=x. Then &amp;Delta;i(x,y)=1.&lt;br /&gt;&lt;br /&gt;&amp;Delta; is linear in the sense that if f and g are functions, &amp;Delta;(f+g) = &amp;Delta;f+&amp;Delta;g.&lt;br /&gt;&lt;br /&gt;Now consider the product of two functions, f and g.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&amp;Delta;(fg)(x,y) = (f(x)g(x)-f(y)g(y))/(x-y)&lt;br /&gt;&lt;br /&gt;= (f(x)g(x)-f(x)g(y)+f(x)g(y)-f(y)g(y))/(x-y)&lt;br /&gt;&lt;br /&gt;= f(x)&amp;Delta;g(x,y)+g(y)&amp;Delta;f(x,y)&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;So now we have a Leibniz-like rule. We can compute finite differences of polynomials without using subtraction or division! What's more, we can use these formulae to difference algebraic functions defined implicitly by polynomials. For example consider f(x)=1/(1-x). 
We can rewrite this implicitly, using only addition and multiplication, as&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;f(x) = 1+x f(x)&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Differencing both sides we get&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&amp;Delta;f(x,y) = x &amp;Delta;f(x,y)+f(y)&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;That tells us that &amp;Delta;f(x,y) = f(x)f(y).&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Finite Differences of Types&lt;/H3&gt;&lt;br /&gt;We're now ready to apply our operator to types. Instead of functions on the reals we work with functors on the set of types. A good first example container is the functor F(X)=X&lt;sup&gt;N&lt;/sup&gt; for an integer N.  This is basically just an array of N elements of type X.  We could apply the Leibniz rule repeatedly, but we expect to get the same result as if we'd worked over the reals. So setting f(x)=x&lt;sup&gt;N&lt;/sup&gt; we get&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&amp;Delta;f(x,y) = (x&lt;sup&gt;N&lt;/sup&gt;-y&lt;sup&gt;N&lt;/sup&gt;)/(x-y) = x&lt;sup&gt;N-1&lt;/sup&gt;+x&lt;sup&gt;N-2&lt;/sup&gt;y+x&lt;sup&gt;N-3&lt;/sup&gt;y&lt;sup&gt;2&lt;/sup&gt;+...+y&lt;sup&gt;N-1&lt;/sup&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;So we know that on types, &amp;Delta;F(X,Y) = X&lt;sup&gt;N-1&lt;/sup&gt;+X&lt;sup&gt;N-2&lt;/sup&gt;Y+...+Y&lt;sup&gt;N-1&lt;/sup&gt;.&lt;br /&gt;&lt;br /&gt;There's a straightforward interpretation we can give this. Differentiating a type makes a hole in it. Finite differencing makes a hole in it, but everything to the left of the hole is of one type and everything on the right is another. 
For example, for F(X)=X&lt;sup&gt;3&lt;/sup&gt;, &amp;Delta;F(X,Y)=X&lt;sup&gt;2&lt;/sup&gt;+XY+Y&lt;sup&gt;2&lt;/sup&gt; can be drawn as:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sr-MLg52-hI/AAAAAAAAAYM/twUQVqZChek/s1600-h/dissection.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 52px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sr-MLg52-hI/AAAAAAAAAYM/twUQVqZChek/s400/dissection.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5386177808797465106" /&gt;&lt;/a&gt;&lt;br /&gt;If you've been reading the right papers then at this point it should all become familiar. Finite differencing is none other than dissection, as described by Conor in his &lt;a href="http://strictlypositive.org/CJ.pdf"&gt;Jokers and Clowns paper&lt;/a&gt;. I don't know if he was aware that he was talking about finite differences - the paper itself talks about this being a type of derivative. It's sort of implicit when he writes the isomorphism:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;tt&gt;right :: p j + (&amp;Delta;p c j , c) → (j , &amp;Delta;p c j ) + p c&lt;/tt&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;With a little rearrangement this becomes the definition of finite difference.&lt;br /&gt;&lt;br /&gt;Now that we've recognised dissection as finite difference we can reason informally about dissection using high school algebra. For example, we already know that lists, defined by L(X) = 1+X L(X) can be informally thought of as L(X)=1/(1-X). So using the example I gave above we see that &amp;Delta;L(X,Y)=1/((1-X)(1-Y)) = L(X)L(Y). So the dissection of a list is a pair of lists, one for the left elements, and one for the right elements. Just what we'd expect.&lt;br /&gt;&lt;br /&gt;Another example. Consider the trees defined by T(X)=X+T(X)&lt;sup&gt;2&lt;/sup&gt;. 
Informally we can interpret this as T(X)=(1-&amp;radic;(1-4X))/2, taking the root for which T(0)=0. A little algebraic manipulation, using (&amp;radic;x-&amp;radic;y)(&amp;radic;x+&amp;radic;y) = x-y, shows that&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&amp;Delta;T(X,Y) = 1/(1-(T(X)+T(Y)))&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;In other words, a dissection of a tree is a list of trees, each of which is a tree of X or a tree of Y. This corresponds to the fact that if you dissect a tree at some element, and then follow the path from the root to the hole left behind, then all of the left branches (in blue) are trees of type X and all of the right branches (in red) are trees of type Y.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sr-MB-2VZ-I/AAAAAAAAAYE/NSXxMj4mt8o/s1600-h/tree.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 289px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sr-MB-2VZ-I/AAAAAAAAAYE/NSXxMj4mt8o/s400/tree.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5386177645037053922" /&gt;&lt;/a&gt;&lt;br /&gt;If you're geometrically inclined then you can think of types with holes in them as being a kind of tangent to the space of types. Along those lines, dissections become secants. I think this geometric analogy can be taken a lot further and that in fact a non-trivial piece of differential geometry can be made to work with types. But that's for another day.&lt;br /&gt;&lt;br /&gt;Oh, I almost forgot. Derivatives are what you get when you compute finite differences for points really close to each other. So I hope you can see that &amp;Delta;f(x,x)=df/dx giving us holes in terms of dissections. 
Conor mentions this in his paper.&lt;br /&gt;&lt;br /&gt;We should also be able to use this approach to compute finite differences in other algebraic structures that don't have subtraction or division.&lt;br /&gt;&lt;br /&gt;I can leave you with some exercises:&lt;br /&gt;&lt;br /&gt;1. What does finite differencing mean when applied to both &lt;a href="http://en.wikipedia.org/wiki/Generating_function"&gt;ordinary and exponential generating functions&lt;/a&gt;?&lt;br /&gt;&lt;br /&gt;2. Can you derive the "chain rule" for finite differences? This can be useful when you compute dissections of types defined by sets of mutually recursive definitions.&lt;br /&gt;&lt;br /&gt;3. Why is &lt;code&gt;right&lt;/code&gt;, defined above, a massaged version of the definition of finite difference? (Hint: define d=(f(x)-f(y))/(x-y). In this equation, eliminate the division by a suitable multiplication and eliminate the subtraction by a suitable addition. And remember that &lt;code&gt;(,)&lt;/code&gt; is Haskell notation for the product of types.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-5532328337205821486?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/5532328337205821486/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=5532328337205821486' title='22 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/5532328337205821486'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/5532328337205821486'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/09/finite-differences-of-types.html' title='Finite Differences of 
Types'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_UdKHLrHa05M/Sr-MLg52-hI/AAAAAAAAAYM/twUQVqZChek/s72-c/dissection.png' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>22</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-5537671004678118594</id><published>2009-09-13T17:27:00.000-07:00</published><updated>2009-09-19T18:54:17.598-07:00</updated><title type='text'>More Parsing With Best First Search</title><content type='html'>&lt;pre&gt;&lt;br /&gt;&amp;gt; {-# LANGUAGE NoMonomorphismRestriction,GeneralizedNewtypeDeriving #-}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I have three goals in this post:&lt;br /&gt;&lt;br /&gt;1. Refactoring the technique in my &lt;a href="http://blog.sigfpe.com/2009/07/monad-for-combinatorial-search-with.html"&gt;previous post&lt;/a&gt; so that building the search tree is entirely separate from searching the tree.&lt;br /&gt;2. Making it work with real-valued weights, not just integers&lt;br /&gt;3. 
Applying it to an ambiguous parsing problem, making use of a type class to define an abstract grammar.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; import Control.Arrow&lt;br /&gt;&amp;gt; import Control.Monad&lt;br /&gt;&amp;gt; import Control.Monad.Instances&lt;br /&gt;&amp;gt; import Control.Monad.State&lt;br /&gt;&amp;gt; import Data.Either&lt;br /&gt;&amp;gt; import Data.Function&lt;br /&gt;&amp;gt; import Random&lt;br /&gt;&amp;gt; import qualified Data.List as L&lt;br /&gt;&amp;gt; import qualified Data.Map as M&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;H3&gt;Search Trees&lt;/H3&gt;&lt;br /&gt;The idea is that I want to search a tree of possibilities where each edge of the tree is marked with a weight. The goal will be to search for leaves that minimise the sum of the weights of the edges down to the leaf.&lt;br /&gt;&lt;br /&gt;Here's an example tree:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/SrWHSsrdtXI/AAAAAAAAAX8/8m5xfOQDnrc/s1600-h/search.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 278px; height: 207px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/SrWHSsrdtXI/AAAAAAAAAX8/8m5xfOQDnrc/s400/search.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5383357684892153202" /&gt;&lt;/a&gt;&lt;br /&gt;The minimum weight leaf is at C. If we're working with probabilities then we'll use minus the log of the probability of a branch as the weight. That way multiplication of probabilities becomes additions of weights, and the likeliest leaf has the minimum weight path.&lt;br /&gt;&lt;br /&gt;So here's the definition of a search tree. 
I've given both leaves and edges weights:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Search c a = Leaf   { lb::c, leaf::a}&lt;br /&gt;&amp;gt;                 | Choice { lb::c, choices::[Search c a] } deriving Show&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(Compare with &lt;a href="http://hackage.haskell.org/packages/archive/tree-monad/0.2.1/doc/html/Control-Monad-SearchTree.html"&gt;this&lt;/a&gt;.) &lt;tt&gt;lb&lt;/tt&gt; is short for 'lower bound'. It provides a lower bound for the total weight of any option in this subtree (assuming non-negative weights). The tree in the diagram would look like:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex1 = Choice 0 [&lt;br /&gt;&amp;gt;   Choice (-log 0.1) [&lt;br /&gt;&amp;gt;       Leaf (-log 0.5) 'A',&lt;br /&gt;&amp;gt;       Leaf (-log 0.5) 'B'],&lt;br /&gt;&amp;gt;   Choice (-log 0.2) [&lt;br /&gt;&amp;gt;       Leaf (-log 0.6) 'C',&lt;br /&gt;&amp;gt;       Leaf (-log 0.4) 'D']]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This tree is a container in a straightforward way and so we can make it an instance of &lt;tt&gt;Functor&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Functor (Search c) where&lt;br /&gt;&amp;gt;    fmap f (Leaf   c a ) = Leaf c   $ f a&lt;br /&gt;&amp;gt;    fmap f (Choice c as) = Choice c $ map (fmap f) as&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;But it's also a monad. 
&lt;tt&gt;&amp;gt;&amp;gt;=&lt;/tt&gt; maps all of the elements of a tree to trees in their own right, and then grafts those trees into the parent tree:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Num c =&amp;gt; Monad (Search c) where&lt;br /&gt;&amp;gt;    return = Leaf 0&lt;br /&gt;&amp;gt;    a &amp;gt;&amp;gt;= f = join $ fmap f a where&lt;br /&gt;&amp;gt;        join (Leaf   c a ) = Choice c [a]&lt;br /&gt;&amp;gt;        join (Choice c as) = Choice c $ map join as&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It's easy to make trees into a &lt;tt&gt;MonadPlus&lt;/tt&gt; by simply grafting trees into a new root. &lt;tt&gt;MonadPlus&lt;/tt&gt; is meant to be a monoid, but this operation, as written below, isn't precisely associative. But it's 'morally' associative in that two terms that are meant to be equal describe equivalent search trees.  So I'm not going to lose any sleep over it:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Num c =&amp;gt; MonadPlus (Search c) where&lt;br /&gt;&amp;gt;    mzero = Choice 0 []&lt;br /&gt;&amp;gt;    a `mplus` b = Choice 0 [a,b]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;For our searching we'll need a priority queue. 
I'll use a skew tree with code I lifted from somewhere I've forgotten:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Ord a =&amp;gt; Tree a = Null | Fork a (Tree a) (Tree a) deriving Show&lt;br /&gt;&lt;br /&gt;&amp;gt; isEmpty :: Ord a =&amp;gt; Tree a -&amp;gt; Bool&lt;br /&gt;&amp;gt; isEmpty Null = True&lt;br /&gt;&amp;gt; isEmpty (Fork x a b) = False&lt;br /&gt;&lt;br /&gt;&amp;gt; minElem :: Ord a =&amp;gt; Tree a -&amp;gt; a&lt;br /&gt;&amp;gt; minElem (Fork x a b) = x&lt;br /&gt;&lt;br /&gt;&amp;gt; deleteMin :: Ord a =&amp;gt; Tree a -&amp;gt; Tree a&lt;br /&gt;&amp;gt; deleteMin (Fork x a b) = merge a b&lt;br /&gt;&lt;br /&gt;&amp;gt; insert :: Ord a =&amp;gt; a -&amp;gt; Tree a -&amp;gt; Tree a&lt;br /&gt;&amp;gt; insert x a = merge (Fork x Null Null) a&lt;br /&gt;&lt;br /&gt;&amp;gt; merge :: Ord a =&amp;gt; Tree a -&amp;gt; Tree a -&amp;gt; Tree a&lt;br /&gt;&amp;gt; merge a Null = a&lt;br /&gt;&amp;gt; merge Null b = b&lt;br /&gt;&amp;gt; merge a b&lt;br /&gt;&amp;gt;  | minElem a &amp;lt;= minElem b = connect a b&lt;br /&gt;&amp;gt;  | otherwise = connect b a&lt;br /&gt;&lt;br /&gt;&amp;gt; connect (Fork x a b) c = Fork x b (merge a c)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;At each stage in the search we'll pick the unexplored branch with the lowest total weight so far. So when we compare trees we'll compare on their lower bounds. So we need an ordering on the trees as follows:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance (Num c) =&amp;gt; Eq (Search c a) where&lt;br /&gt;&amp;gt;    (==) = (==) `on` lb&lt;br /&gt;&lt;br /&gt;&amp;gt; instance (Num c,Ord c) =&amp;gt; Ord (Search c a) where&lt;br /&gt;&amp;gt;    compare = compare `on` lb&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The real cost of a choice isn't just the weight immediately visible in the tree but the cost of the journey you took to get there. 
We use the &lt;tt&gt;bumpUp&lt;/tt&gt; function to put that extra cost into the part of the tree we're currently looking at:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; bumpUp delta (Leaf   c a)  = Leaf   (delta+c) a&lt;br /&gt;&amp;gt; bumpUp delta (Choice c as) = Choice (delta+c) as&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The only tricky aspect to this code is that we need to be able to handle infinite trees. We can't have our code simply go off and return when it's found the next match because it might not be possible to do so in a finite time. Instead, the code needs to perform one operation at a time and report what it found at each point, even if that report is just stalling for time. We do this by returning a (possibly infinite) list containing elements that are either (1) the next item found or (2) a new update giving more information about the lower bound of the cost of any item that might be yet to come. This allows the caller to bail out of the search once the cost has passed a certain threshold.&lt;br /&gt;&lt;br /&gt;(Returning a useless-looking constructor to stall for time is a common design pattern in Haskell. It's an example of how programs that work with codata need to keep being &lt;a href="http://blog.sigfpe.com/2007/07/data-and-codata.html"&gt;productive&lt;/a&gt; and you get something similar with the &lt;tt&gt;Skip&lt;/tt&gt; in &lt;a href="http://www.galois.com/~dons/slides/08-07-stewart.pdf"&gt;Stream Fusion&lt;/a&gt;. 
The first time I wrote the code I failed to do this and kept wondering why my infinite searches would just hang, despite my great efforts to make it as lazy as possible.)&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; runSearch :: (Num c,Ord c) =&amp;gt; Tree (Search c a) -&amp;gt; [Either c a]&lt;br /&gt;&amp;gt; runSearch Null = []&lt;br /&gt;&amp;gt; runSearch queue = let&lt;br /&gt;&amp;gt;    m = minElem queue&lt;br /&gt;&amp;gt;    queue' = deleteMin queue&lt;br /&gt;&amp;gt;    in case m of&lt;br /&gt;&amp;gt;        Leaf   c a  -&amp;gt; Left c : Right a : runSearch queue'&lt;br /&gt;&amp;gt;        Choice c as -&amp;gt; Left c           : (runSearch $ foldl (flip insert) queue' $ map (bumpUp c) as)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I guess this is actually Dijkstra's algorithm, but on a tree rather than a general graph.&lt;br /&gt;&lt;br /&gt;A quick test of an infinite search: finding Pythagorean triples by brute force. We give each integer a cost one more than the previous one:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; integers m = Choice 1 [Leaf 0 m,integers (m+1)]&lt;br /&gt;&lt;br /&gt;&amp;gt; test = do&lt;br /&gt;&amp;gt;    a &amp;lt;- integers 1&lt;br /&gt;&amp;gt;    b &amp;lt;- integers 1&lt;br /&gt;&amp;gt;    c &amp;lt;- integers 1&lt;br /&gt;&amp;gt;    guard $ a*a+b*b==c*c&lt;br /&gt;&amp;gt;    return (a,b,c)&lt;br /&gt;&lt;br /&gt;&amp;gt; test1 = runSearch (insert test Null)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;If you run &lt;tt&gt;test1&lt;/tt&gt; you'll notice how the output is noisy because of all those &lt;tt&gt;Left w&lt;/tt&gt; terms. 
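&lt;br /&gt;&lt;br /&gt;As a small sketch of the "bail out" idea mentioned earlier, a caller could keep only the results found before the reported lower bound passes some limit. The name &lt;tt&gt;untilCost&lt;/tt&gt; is a hypothetical helper, not something used elsewhere in this post, and it assumes the &lt;tt&gt;Left&lt;/tt&gt; bounds never decrease (which should hold for non-negative weights):&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; -- Hypothetical helper: collect the items found while the lower bound stays within the limit.&lt;br /&gt;&amp;gt; untilCost :: Ord c =&amp;gt; c -&amp;gt; [Either c a] -&amp;gt; [a]&lt;br /&gt;&amp;gt; untilCost limit = rights . takeWhile within where&lt;br /&gt;&amp;gt;    within (Left c)  = c &amp;lt;= limit   -- stop at the first bound beyond the limit&lt;br /&gt;&amp;gt;    within (Right _) = True         -- results pass straight through&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Something like &lt;tt&gt;untilCost 15 test1&lt;/tt&gt; should then return a finite list of triples rather than running forever.&lt;br /&gt;&lt;br /&gt;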
If you're not worried about non-termination you could just throw out redundant output like so:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; reduce [] = []&lt;br /&gt;&amp;gt; reduce (Left  a : Left b : bs) = reduce (Left b : bs)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Might as well convert weights to probabilities while we're at it:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; reduce (Left  a : bs) = Left (exp (-a)) : reduce bs&lt;br /&gt;&amp;gt; reduce (Right a : bs) = Right a : reduce bs&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This version should be a lot less chatty:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test2 = reduce test1&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;H3&gt;Grammar&lt;/H3&gt;&lt;br /&gt;Now that searching works I can turn to an application - a more sophisticated example of what I briefly looked at &lt;a href="http://blog.sigfpe.com/2009/07/monad-for-combinatorial-search-with.html"&gt;previously&lt;/a&gt; - parsing with ambiguous grammars. 
So let me first build types to represent parsed sentences in a toy grammar:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Noun = Noun String deriving (Show,Eq,Ord)&lt;br /&gt;&amp;gt; data Verb = Verb String deriving (Show,Eq,Ord)&lt;br /&gt;&amp;gt; data Adj  = Adj  String deriving (Show,Eq,Ord)&lt;br /&gt;&amp;gt; data Prep = Prep String deriving (Show,Eq,Ord)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The following two are noun phrase and prepositional phrase:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data NP = NP [Adj] Noun deriving (Show,Eq,Ord)&lt;br /&gt;&amp;gt; data PP = PP Prep  Noun deriving (Show,Eq,Ord)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And entire sentences:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Sentence = Sentence [NP] Verb [NP] [PP] deriving (Show,Eq,Ord)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We want to be able to print parsed sentences so here's a quick 'unparse' type class to recover the underlying string:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; class UnParse a where&lt;br /&gt;&amp;gt;    unParse :: a -&amp;gt; String&lt;br /&gt;&lt;br /&gt;&amp;gt; instance UnParse Noun where&lt;br /&gt;&amp;gt;    unParse (Noun a) = a&lt;br /&gt;&lt;br /&gt;&amp;gt; instance UnParse Verb where&lt;br /&gt;&amp;gt;    unParse (Verb a) = a&lt;br /&gt;&lt;br /&gt;&amp;gt; instance UnParse Adj where&lt;br /&gt;&amp;gt;    unParse (Adj a) = a&lt;br /&gt;&lt;br /&gt;&amp;gt; instance UnParse Prep where&lt;br /&gt;&amp;gt;    unParse (Prep a) = a&lt;br /&gt;&lt;br /&gt;&amp;gt; instance UnParse NP where&lt;br /&gt;&amp;gt;    unParse (NP a b) = concatMap unParse a ++ unParse b&lt;br /&gt;&lt;br /&gt;&amp;gt; instance UnParse PP where&lt;br /&gt;&amp;gt;    unParse (PP a b) = unParse a ++ unParse b&lt;br /&gt;&lt;br /&gt;&amp;gt; instance UnParse Sentence where&lt;br /&gt;&amp;gt;    unParse (Sentence a b c d) = concatMap unParse a ++ unParse b ++ concatMap unParse c ++ concatMap 
unParse d&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now I'm going to approach the problem of parsing ambiguous sentences in two ways. One will be efficient, and one will be inefficient but represent the 'ground truth' against which we'll compare. (This reflects standard practice in graphics publications where authors compare their fancy new algorithm with an ultra-slow but reliable Monte Carlo ray-tracer.)&lt;br /&gt;&lt;br /&gt;I'm going to assume that sentences in my language are described by a "context free" probability distribution so that a noun phrase, say, has a fixed probability of being made up of each possible combination of constituents regardless of the context in which it appears.&lt;br /&gt;&lt;br /&gt;I need an English word for something that takes a grammar and does something with it but I'm at a loss to think of an example. I'll use 'transducer', even though I don't think that's right.&lt;br /&gt;&lt;br /&gt;So a transducer is built from either terminal nodes of one character, or it's one of a choice of transducers, each with a given probability:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; class Transducer t where&lt;br /&gt;&amp;gt;    char :: Char -&amp;gt; t Char&lt;br /&gt;&amp;gt;    choose :: [(Float,t a)] -&amp;gt; t a&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And here's our toy grammar. It's nothing like an actual natural language because real grammars take a long time to get right. 
Note I'm just giving the first couple of type signatures to show that the grammar uses only the &lt;tt&gt;Monad&lt;/tt&gt; and &lt;tt&gt;Transducer&lt;/tt&gt; interfaces:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; string :: (Monad t, Transducer t) =&amp;gt; [Char] -&amp;gt; t [Char]&lt;br /&gt;&amp;gt; string ""     = return ""&lt;br /&gt;&amp;gt; string (c:cs) = do {char c; string cs; return (c:cs)}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So, for example, a noun has a 50% chance of being the string &lt;tt&gt;ab&lt;/tt&gt; and a 50% chance of being the string &lt;tt&gt;ba&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; noun :: (Monad t, Transducer t) =&amp;gt; t Noun&lt;br /&gt;&amp;gt; noun = do&lt;br /&gt;&amp;gt;    a &amp;lt;- choose [(0.5,string "ab"),(0.5,string "ba")]&lt;br /&gt;&amp;gt;    return $ Noun a&lt;br /&gt;&lt;br /&gt;&amp;gt; verb :: (Monad t, Transducer t) =&amp;gt; t Verb&lt;br /&gt;&amp;gt; verb = do&lt;br /&gt;&amp;gt;    a &amp;lt;- choose [(0.5,string "aa"),(0.5,string "b")]&lt;br /&gt;&amp;gt;    return $ Verb a&lt;br /&gt;&lt;br /&gt;&amp;gt; adjective :: (Monad t, Transducer t) =&amp;gt; t Adj&lt;br /&gt;&amp;gt; adjective = do&lt;br /&gt;&amp;gt;    a &amp;lt;- choose [(0.5,string "ab"),(0.5,string "aa")]&lt;br /&gt;&amp;gt;    return $ Adj a&lt;br /&gt;&lt;br /&gt;&amp;gt; parsePrep = do&lt;br /&gt;&amp;gt;    a &amp;lt;- choose [(0.5,string "a"),(0.5,string "b")]&lt;br /&gt;&amp;gt;    return $ Prep a&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Some of our "parts of speech" allow sequences of terms. We need some kind of probabilistic model of how many such terms we can expect. 
I'm going to assume the probability falls off exponentially with the number of items:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; many :: (Monad t, Transducer t) =&amp;gt; Float -&amp;gt; t a -&amp;gt; t [a]&lt;br /&gt;&amp;gt; many p t = choose [&lt;br /&gt;&amp;gt;   (p,return []),&lt;br /&gt;&amp;gt;   (1-p,do&lt;br /&gt;&amp;gt;     a &amp;lt;- t&lt;br /&gt;&amp;gt;     as &amp;lt;- many p t&lt;br /&gt;&amp;gt;     return $ a:as)]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I also have a convenience function for sequences of length at least 1:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; many1 p t = do&lt;br /&gt;&amp;gt;     a &amp;lt;- t&lt;br /&gt;&amp;gt;     as &amp;lt;- many p t&lt;br /&gt;&amp;gt;     return (a:as)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And now the rest of the grammar:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; parseNP = do&lt;br /&gt;&amp;gt;    a &amp;lt;- many 0.5 adjective&lt;br /&gt;&amp;gt;    b &amp;lt;- noun&lt;br /&gt;&amp;gt;    return $ NP a b&lt;br /&gt;&lt;br /&gt;&amp;gt; parsePP = do&lt;br /&gt;&amp;gt;    a &amp;lt;- parsePrep&lt;br /&gt;&amp;gt;    b &amp;lt;- noun&lt;br /&gt;&amp;gt;    return $ PP a b&lt;br /&gt;&lt;br /&gt;&amp;gt; sentence = do&lt;br /&gt;&amp;gt;    a &amp;lt;- many 0.5 parseNP&lt;br /&gt;&amp;gt;    b &amp;lt;- verb&lt;br /&gt;&amp;gt;    c &amp;lt;- many 0.5 parseNP&lt;br /&gt;&amp;gt;    d &amp;lt;- many 0.5 parsePP&lt;br /&gt;&amp;gt;    return $ Sentence a b c d&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We're going to use this grammar with two instances of type Transducer. The first will use the rules of the grammar as production rules to generate random sentences. The second will parse strings using the grammar. So we get two uses from one 'transducer'. 
This is pretty powerful: we have described the grammar in an abstract way that doesn't assume any particular use for it.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; newtype Generator a = Generator { unGen :: State StdGen a } deriving Monad&lt;br /&gt;&amp;gt; newtype Parser a    = Parser { runParse :: (String -&amp;gt; Search Float (a,String)) }&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Let's implement the generation first:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Transducer Generator where&lt;br /&gt;&amp;gt;    char a = return a&lt;br /&gt;&amp;gt;    choose p = do&lt;br /&gt;&amp;gt;        r &amp;lt;- Generator (State random)&lt;br /&gt;&amp;gt;        case (L.find ((&amp;gt;=r) . fst) $ zip (scanl1 (+) (map fst p)) (map snd p)) of&lt;br /&gt;&amp;gt;            Just opt -&amp;gt; snd opt&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can test it by generating a bunch of random sentences:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; gen = mkStdGen 12343210&lt;br /&gt;&amp;gt; generate n partOfSpeech = (unGen $ sequence (replicate n partOfSpeech)) `evalState` gen&lt;br /&gt;&lt;br /&gt;&amp;gt; test3 = mapM_ print $ generate 10 sentence&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can now use generate-and-test to estimate what proportion of randomly generated sentences (or other parts of speech) match a given string:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; generateAndTest n partOfSpeech chars = do&lt;br /&gt;&amp;gt;    a &amp;lt;- generate n partOfSpeech&lt;br /&gt;&amp;gt;    guard $ unParse a == chars&lt;br /&gt;&amp;gt;    return a&lt;br /&gt;&lt;br /&gt;&amp;gt; collectResults n partOfSpeech chars = M.fromListWith (+) $ map (flip (,) 1) $&lt;br /&gt;&amp;gt;   generateAndTest n partOfSpeech chars&lt;br /&gt;&amp;gt; countResults n partOfSpeech chars = mapM_ print $ L.sortBy (flip compare `on` snd) $&lt;br /&gt;&amp;gt;   M.toList $ collectResults n partOfSpeech chars&lt;br /&gt;&lt;br /&gt;&amp;gt; test4 = 
countResults 100000 (noun :: Generator Noun) "abab"&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;On the other hand we can build a parser, based on &lt;a href="http://www.cs.nott.ac.uk/~gmh/pearl.pdf"&gt;Hutton's&lt;/a&gt;, just like in my &lt;a href="http://blog.sigfpe.com/2009/07/monad-for-combinatorial-search-with.html"&gt;previous post&lt;/a&gt; except using this new tree search monad:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Monad Parser where&lt;br /&gt;&amp;gt;   return a = Parser (\cs -&amp;gt; return (a,cs))&lt;br /&gt;&amp;gt;   p &amp;gt;&amp;gt;= f = Parser (\cs -&amp;gt; do&lt;br /&gt;&amp;gt;       (a,cs') &amp;lt;- runParse p cs&lt;br /&gt;&amp;gt;       runParse (f a) cs')&lt;br /&gt;&lt;br /&gt;&amp;gt; instance MonadPlus Parser where&lt;br /&gt;&amp;gt;   mzero = Parser (\cs -&amp;gt; mzero)&lt;br /&gt;&amp;gt;   p `mplus` q = Parser (\cs -&amp;gt; runParse p cs `mplus` runParse q cs)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Transducer Parser where&lt;br /&gt;&amp;gt;    char c = Parser $ char' where&lt;br /&gt;&amp;gt;       char' "" = mzero&lt;br /&gt;&amp;gt;       char' (a:as) = if a==c then return (a,as) else mzero&lt;br /&gt;&amp;gt;    choose p = foldl1 mplus $ map (\(p,x) -&amp;gt; prob p &amp;gt;&amp;gt; x) p where&lt;br /&gt;&amp;gt;       prob p = Parser (\cs -&amp;gt; Leaf (-log p) ((),cs))&lt;br /&gt;&lt;br /&gt;&amp;gt; goParse (Parser f) x = runSearch $ insert (f x) Null&lt;br /&gt;&lt;br /&gt;&amp;gt; end = Parser (\cs -&amp;gt; if cs=="" then return ((),"") else mzero)&lt;br /&gt;&lt;br /&gt;&amp;gt; withEnd g = do&lt;br /&gt;&amp;gt;   a &amp;lt;- g&lt;br /&gt;&amp;gt;   end&lt;br /&gt;&amp;gt;   return a&lt;br /&gt;&lt;br /&gt;&amp;gt; normalise results = let total = last (lefts results)&lt;br /&gt;&amp;gt;   in map (\x -&amp;gt; case x of&lt;br /&gt;&amp;gt;       Left a -&amp;gt; a / total&lt;br /&gt;&amp;gt;       Right b -&amp;gt; b&lt;br /&gt;&amp;gt;   ) results&lt;br /&gt;&lt;br /&gt;&amp;gt; 
findParse chars = mapM_ print $ reduce $ runSearch $&lt;br /&gt;&amp;gt;   insert (runParse (withEnd sentence) chars) Null&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;H3&gt;Results&lt;/H3&gt;&lt;br /&gt;And now we can try running both methods on the same string:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; main = do&lt;br /&gt;&amp;gt;    let string = "ababbbab"&lt;br /&gt;&amp;gt;    findParse string&lt;br /&gt;&amp;gt;    print "-------------------"&lt;br /&gt;&amp;gt;    countResults 1000000 (sentence :: Generator Sentence) string&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;You should see the parsings from &lt;tt&gt;countResults&lt;/tt&gt; in roughly the same proportion as the relative probabilities given by &lt;tt&gt;findParse&lt;/tt&gt;. Remember that the relative probability of a given parsing is the last &lt;tt&gt;Left p&lt;/tt&gt; term before that parsing. Try playing with &lt;tt&gt;string&lt;/tt&gt;, the number of Monte Carlo runs and the seed. Remember that there is going to be some variation in the randomised algorithm, especially with hard-to-parse strings, but raising the number of runs will eventually give reasonable numbers. Of course ultimately we don't care about the Monte Carlo method so it's allowed to be slow.&lt;br /&gt;&lt;br /&gt;Anyway, none of this is a new algorithm. You can find similar things in papers such as &lt;a href="http://www.isi.edu/natural-language/people/p5.pdf"&gt;Probabilistic tree transducers&lt;/a&gt; and &lt;a href="http://www.cis.upenn.edu/~lhuang3/wpe2/papers/knuth77gen_dijkstra.pdf"&gt;A Generalization of Dijkstra's Algorithm&lt;/a&gt;. But what is cool is how easily Haskell allows us to decouple the tree building part from the searching part. (And of course the tree is never fully built, it's built and destroyed lazily as needed.) All of the published algorithms have the parsing and searching hopelessly interleaved so it's hard to see what exactly is going on. 
Here the search algorithm doesn't need to know anything about grammars, or even that it is searching for parsings.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.aclweb.org/anthology/J/J99/J99-4004.pdf"&gt;Semiring Parsing&lt;/a&gt; is also easy to implement this way.&lt;br /&gt;&lt;br /&gt;BTW, if you think my "ab" language is a bit too contrived, check out the last picture &lt;a href="http://www.dur.ac.uk/penelope.wilson/SacredSigns/ch_four.html"&gt;here&lt;/a&gt; for an example of some natural language that is in a similar spirit :-)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-5537671004678118594?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/5537671004678118594/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=5537671004678118594' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/5537671004678118594'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/5537671004678118594'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/09/language-nomonomorphismrestrictiongener.html' title='More Parsing With Best First Search'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_UdKHLrHa05M/SrWHSsrdtXI/AAAAAAAAAX8/8m5xfOQDnrc/s72-c/search.png' height='72' width='72'/><thr:total 
xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-5500092155587195817</id><published>2009-07-04T17:35:00.000-07:00</published><updated>2009-07-07T07:16:50.410-07:00</updated><title type='text'>A Monad for Combinatorial Search with Heuristics</title><content type='html'>Haskell provides a great way to perform combinatorial searching with backtracking: the list monad. Do-notation provides a nice DSL that makes it easy to express the trying out of different possibilities. But the list monad only performs a simple-minded walk through all of the alternatives, giving little opportunity to direct that walk. In particular, it's not easy to provide heuristics to say things like "try this alternative first but if it starts going badly consider this alternative too". This post contains a monad that gives a simple scheme to allow programmers to direct searches in this way.&lt;br /&gt;&lt;br /&gt;First the Haskell administrivia...&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; import Data.Char&lt;br /&gt;&amp;gt; import Control.Monad&lt;br /&gt;&amp;gt; import Data.Monoid&lt;br /&gt;&amp;gt; import Data.List&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;When using the list monad, a list is interpreted as a list of candidates in a search. The &lt;tt&gt;join&lt;/tt&gt; function for this monad takes a list of lists of candidates and flattens it into a list of candidates. This is all the list monad really does: you write code that generates new candidates from old, and the &lt;tt&gt;&amp;gt;&amp;gt;=&lt;/tt&gt; function applies this code to all of the candidates it knows about and then flattens this back out to a single list of candidates. 
Importantly it does this in a lazy way so that you only need to look at candidates as they are generated.&lt;br /&gt;&lt;br /&gt;This new monad will keep slightly more information: each candidate will have a 'penalty' value attached to it saying how attractive a candidate it is. Candidates with score 0 will be tried first, and those with score n will be tried after those with lower scores. We can represent a collection of candidates and their scores simply as a list of lists. The first list in the list will have those with score 0, the second will have those with score 1 and so on. We'll call these lists penalty lists and the positions within those lists slots.&lt;br /&gt;&lt;br /&gt;Here's the definition of the penalty list type:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data PList a = P { unO :: [[a]] } deriving (Show,Eq)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It's a functor in a straightforward way:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Functor PList where&lt;br /&gt;&amp;gt;   fmap f (P xs) = P (fmap (fmap f) xs)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The rule we'll adopt is that if you're trying a combination of two candidates then the penalty associated with the combination is the sum of the penalties of the individual objects. To implement this we need an alternative version of the &lt;tt&gt;join&lt;/tt&gt; operation. If we have a penalty list of penalty lists and we have an element in the mth slot in the nth penalty sublist then we want it to end up in the (m+n)th slot in the final penalty list. 
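&lt;br /&gt;&lt;br /&gt;To make that slot rule concrete, here is a rough sketch on plain, finite lists of lists. The name &lt;tt&gt;slotSum&lt;/tt&gt; is hypothetical and isn't part of the monad in this post (which also has to cope with infinite lists); it just pairs up candidates so that a pair drawn from slots i and j lands in slot i+j:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; -- Hypothetical illustration only: finite inputs, penalties add.&lt;br /&gt;&amp;gt; slotSum :: [[a]] -&amp;gt; [[b]] -&amp;gt; [[(a,b)]]&lt;br /&gt;&amp;gt; slotSum xs ys =&lt;br /&gt;&amp;gt;   [ [ (x,y) | (i,xi) &amp;lt;- zip [0..] xs, (j,yj) &amp;lt;- zip [0..] ys&lt;br /&gt;&amp;gt;             , i+j == n, x &amp;lt;- xi, y &amp;lt;- yj ]&lt;br /&gt;&amp;gt;   | n &amp;lt;- [0 .. length xs + length ys - 2] ]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;For example, &lt;tt&gt;slotSum [["a"],["b"]] [["c"],["d"]]&lt;/tt&gt; puts ("a","c") in slot 0, ("a","d") and ("b","c") in slot 1, and ("b","d") in slot 2.&lt;br /&gt;&lt;br /&gt;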
Within a slot we can just order the elements just like in the original list monad.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; headm :: Monoid m =&amp;gt; [m] -&amp;gt; m&lt;br /&gt;&amp;gt; headm (a:as) = a&lt;br /&gt;&amp;gt; headm [] = mempty&lt;br /&gt;&lt;br /&gt;&amp;gt; tailm :: Monoid m =&amp;gt; [m] -&amp;gt; [m]&lt;br /&gt;&amp;gt; tailm (a:as) = as&lt;br /&gt;&amp;gt; tailm [] = []&lt;br /&gt;&lt;br /&gt;&amp;gt; zipm :: Monoid m =&amp;gt; [[m]] -&amp;gt; [m]&lt;br /&gt;&amp;gt; zipm ms | all null ms = []&lt;br /&gt;&amp;gt; zipm ms = let&lt;br /&gt;&amp;gt;   heads = map headm ms&lt;br /&gt;&amp;gt;   tails = map tailm ms&lt;br /&gt;&amp;gt;   h = mconcat heads&lt;br /&gt;&amp;gt;   t = zipm (filter (not . null) tails)&lt;br /&gt;&amp;gt;   in h : t&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Monad PList where&lt;br /&gt;&amp;gt;   return x = P [[x]]&lt;br /&gt;&amp;gt;   x &amp;gt;&amp;gt;= f = let P xs = (fmap (unO . f) x) in P (join xs) where&lt;br /&gt;&amp;gt;       join []     = []&lt;br /&gt;&amp;gt;       join (m:ms) = let&lt;br /&gt;&amp;gt;           part1 = zipm m&lt;br /&gt;&amp;gt;           part2 = join ms&lt;br /&gt;&amp;gt;           in headm part1 : zipm [tailm part1,part2]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Explaining how &lt;tt&gt;join&lt;/tt&gt; is implemented would take many words so I hope this picture of the computation of an example will do instead. I used the &lt;tt&gt;Monoid&lt;/tt&gt; class simply to avoid directly referring to one level of nesting of brackets. 
It is intended to be a proper implementation of a &lt;tt&gt;Monad&lt;/tt&gt; satisfying the three monad laws but I haven't proved this and it's possible that it occasionally leaves trailing empty lists around - which have no impact on search results.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SlDQReuY75I/AAAAAAAAAX0/r6LTYp0_8N0/s1600-h/errors.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 257px; height: 400px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SlDQReuY75I/AAAAAAAAAX0/r6LTYp0_8N0/s400/errors.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5355008955667509138" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance MonadPlus PList where&lt;br /&gt;&amp;gt;    mzero = P []&lt;br /&gt;&amp;gt;    mplus (P xs) (P ys) = P (zipm [xs,ys])&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can use this much like the list monad. First it will search for possibilities with zero penalty. When these are exhausted it'll backtrack to the last place where it can start finding possibilities with penalty 1. Then it'll try penalty 2 and so on. Importantly it manages to do this lazily so that we don't explore penalty n+1 until we've finished penalty n.&lt;br /&gt;&lt;br /&gt;So now we can start using it. We'll hunt for Pythagorean triples by simply searching through all of the triples of integers. But we'll try to find solutions where the sum of the integers is as small as possible. So as our list of candidate integers we use &lt;tt&gt;P [[1],[2],[3]...]&lt;/tt&gt;. In other words, the integer n has penalty n-1. 
Here's the code:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex1 = do&lt;br /&gt;&amp;gt;   x &amp;lt;- P $ map (\x -&amp;gt; [x]) [1..]&lt;br /&gt;&amp;gt;   y &amp;lt;- P $ map (\y -&amp;gt; [y]) [1..]&lt;br /&gt;&amp;gt;   z &amp;lt;- P $ map (\z -&amp;gt; [z]) [1..]&lt;br /&gt;&amp;gt;   guard $ x*x+y*y==z*z&lt;br /&gt;&amp;gt;   return $ (x,y,z)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Of course we wouldn't really search for Pythagorean triples this way. This is just an illustration of how to use the code. But note, crucially, that the equivalent code using the regular list monad would give us back no solutions. It'd start with x=1 and y=1 and then go off to infinity finding candidates for z. So as a side effect the penalty list allows us to tame some infinite searches.&lt;br /&gt;&lt;br /&gt;Anyway, that was a simple numerical example. But this monad can be used with much more complex kinds of search. In fact it almost serves as a drop-in replacement for the list monad. This is a really nice example of the way separation of concerns is easy in Haskell. The task of generating candidates for search can easily be separated from the task of selecting from those candidates, even though the operations are highly interleaved during execution.&lt;br /&gt;&lt;br /&gt;So here's a more complex example: writing a parser that can tolerate errors without running into combinatorial explosion. The idea is that we associate a penalty with each error. The penalty will make the parser run on the assumption of no errors until it can no longer parse, and then it'll backtrack on the assumption of one error until that assumption is no longer tenable and so on. 
We can liberally sprinkle 'erroneous' parsings throughout our code confident that these branches will only be taken in the event that an error-free parsing can't be found.&lt;br /&gt;&lt;br /&gt;Firstly, here's a penalty list that we can use to introduce a penalty of just 1.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; penalty :: PList ()&lt;br /&gt;&amp;gt; penalty = P [[],[()]]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;If we stick that in the code path then anything following acquires a penalty of 1.&lt;br /&gt;&lt;br /&gt;Now we can write a parser. We can implement Hutton's parser in his &lt;a href="http://www.cs.nott.ac.uk/~gmh/bib.html#pearl"&gt;monad parsers paper&lt;/a&gt; with very little modification. We simply replace the usual list with the penalty list and do away with the &lt;tt&gt;+++&lt;/tt&gt; operator to allow it to be a bit more liberal about backtracking. Here's the parser type:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; newtype Parser a = Parser (String -&amp;gt; PList (a,String))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We could have parameterised that with the underlying monad so that we could have parsers with a choice of search strategy.&lt;br /&gt;&lt;br /&gt;The rest is a lot like in Hutton's paper:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; parse (Parser f) x = f x&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Monad Parser where&lt;br /&gt;&amp;gt;    return a = Parser (\cs -&amp;gt; P [[(a,cs)]])&lt;br /&gt;&amp;gt;    p &amp;gt;&amp;gt;= f = Parser (\cs -&amp;gt; do&lt;br /&gt;&amp;gt;        (a,cs') &amp;lt;- parse p cs&lt;br /&gt;&amp;gt;        parse (f a) cs')&lt;br /&gt;&lt;br /&gt;&amp;gt; instance MonadPlus Parser where&lt;br /&gt;&amp;gt;    mzero = Parser (\cs -&amp;gt; mzero)&lt;br /&gt;&amp;gt;    p `mplus` q = Parser (\cs -&amp;gt; parse p cs `mplus` parse q cs)&lt;br /&gt;&lt;br /&gt;&amp;gt; item :: Parser Char&lt;br /&gt;&amp;gt; item = Parser (\cs -&amp;gt; case cs of&lt;br 
/&gt;&amp;gt;    "" -&amp;gt; mzero&lt;br /&gt;&amp;gt;    (c:cs) -&amp;gt; P [[(c,cs)]])&lt;br /&gt;&lt;br /&gt;&amp;gt; sat :: (Char -&amp;gt; Bool) -&amp;gt; Parser Char&lt;br /&gt;&amp;gt; sat p = do&lt;br /&gt;&amp;gt;    c &amp;lt;- item&lt;br /&gt;&amp;gt;    if p c then return c else mzero&lt;br /&gt;&lt;br /&gt;&amp;gt; char :: Char -&amp;gt; Parser Char&lt;br /&gt;&amp;gt; char c = sat (c ==)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now for a simple parsing problem. We'll parse simple arithmetical expressions a lot like in Hutton's paper. But I'm going to tolerate two kinds of error:&lt;br /&gt;1. The shift key doesn't always work so occasionally a shifted or unshifted version of a character may appear and&lt;br /&gt;2. parentheses are occasionally left out by the clumsy user.&lt;br /&gt;&lt;br /&gt;Now we can code up a simple grammar for this. First the mapping between shifted and unshifted characters (on a Mac US keyboard):&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; lowers = "1234567890-=/"&lt;br /&gt;&amp;gt; uppers = "!@#$%^&amp;amp;*()_+?"&lt;br /&gt;&amp;gt; lower x = lookup x (zip uppers lowers)&lt;br /&gt;&amp;gt; upper x = lookup x (zip lowers uppers)&lt;br /&gt;&lt;br /&gt;&amp;gt; upperChar x = case upper x of&lt;br /&gt;&amp;gt;     Nothing -&amp;gt; mzero&lt;br /&gt;&amp;gt;     Just y -&amp;gt; char y &amp;gt;&amp;gt; return x&lt;br /&gt;&lt;br /&gt;&amp;gt; lowerChar x = case lower x of&lt;br /&gt;&amp;gt;     Nothing -&amp;gt; mzero&lt;br /&gt;&amp;gt;     Just y -&amp;gt; char y &amp;gt;&amp;gt; return x&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;A version of &lt;tt&gt;penalty&lt;/tt&gt; wrapped for the parser monad:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; avoid :: Parser ()&lt;br /&gt;&amp;gt; avoid = Parser $ \cs -&amp;gt; do&lt;br /&gt;&amp;gt;    penalty&lt;br /&gt;&amp;gt;    return ((),cs)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Reading keys on the assumption that the shift key may have 
failed:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; keyChar x = char x `mplus` (avoid &amp;gt;&amp;gt; upperChar x) `mplus` (avoid &amp;gt;&amp;gt; lowerChar x)&lt;br /&gt;&lt;br /&gt;&amp;gt; digit = do&lt;br /&gt;&amp;gt;     x &amp;lt;- foldl mplus mzero (map keyChar "0123456789")&lt;br /&gt;&amp;gt;     return (fromIntegral (ord x-ord '0'))&lt;br /&gt;&lt;br /&gt;&amp;gt; number1 :: Integer -&amp;gt; Parser Integer&lt;br /&gt;&amp;gt; number1 m = return m `mplus` do&lt;br /&gt;&amp;gt;     n &amp;lt;- digit&lt;br /&gt;&amp;gt;     number1 (10*m+n)&lt;br /&gt;&lt;br /&gt;&amp;gt; number :: Parser Integer&lt;br /&gt;&amp;gt; number = do&lt;br /&gt;&amp;gt;     n &amp;lt;- digit&lt;br /&gt;&amp;gt;     number1 n&lt;br /&gt;&lt;br /&gt;&amp;gt; chainl :: Parser a -&amp;gt; Parser (a -&amp;gt; a -&amp;gt; a) -&amp;gt; a -&amp;gt; Parser a &lt;br /&gt;&amp;gt; chainl p op a = (p `chainl1` op) `mplus` return a &lt;br /&gt;&amp;gt; chainl1 :: Parser a -&amp;gt; Parser (a -&amp;gt; a -&amp;gt; a) -&amp;gt; Parser a &lt;br /&gt;&amp;gt; p `chainl1` op = do {a &amp;lt;- p; rest a} &lt;br /&gt;&amp;gt;     where &lt;br /&gt;&amp;gt;     rest a = (do&lt;br /&gt;&amp;gt;         f &amp;lt;- op&lt;br /&gt;&amp;gt;         b &amp;lt;- p&lt;br /&gt;&amp;gt;         rest (f a b)) `mplus` return a &lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Optional parentheses:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; shouldHave c = keyChar c `mplus` (avoid &amp;gt;&amp;gt; return c)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And the main part of the expression grammar:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; expr = term `chainl1` addop &lt;br /&gt;&amp;gt; term = monomial `chainl1` mulop &lt;br /&gt;&amp;gt; monomial = factor `chainl1` powop&lt;br /&gt;&amp;gt; factor = number `mplus` do {shouldHave '('; n &amp;lt;- expr; shouldHave ')'; return n} &lt;br /&gt;&amp;gt; powop = keyChar '^' &amp;gt;&amp;gt; return (^)&lt;br /&gt;&amp;gt; addop = do 
{keyChar '+'; return (+)} `mplus` do {keyChar '-'; return (-)} &lt;br /&gt;&amp;gt; mulop = do {keyChar '*'; return (*)} `mplus` do {keyChar '/'; return (div)} &lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Match the end of a string:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; end :: Parser ()&lt;br /&gt;&amp;gt; end = Parser $ \cs -&amp;gt;&lt;br /&gt;&amp;gt;    if null cs then P [[((),"")]] else mzero&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can test it out with:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; completeExpr = do&lt;br /&gt;&amp;gt;   n &amp;lt;- expr&lt;br /&gt;&amp;gt;   end&lt;br /&gt;&amp;gt;   return n&lt;br /&gt;&lt;br /&gt;&amp;gt; ex2 = parse completeExpr "2^(1+3"&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;When we run this we get no error-free parsing but we do get 3 readings with one error. One comes from reading the '(' as 9, one comes from inserting the missing ')' at the end and one comes from inserting ')' after '1'. Note that even for complex expressions we'll quickly find a 1- or 2-error parsing. For the regular list monad we might never get a parsing because there are an infinite number of ways of inserting parentheses.&lt;br /&gt;&lt;br /&gt;Anyway, that was just a toy parsing problem. But a more complex application comes to my mind. Some written languages are tricky to parse because their orthography doesn't fully capture the phonetics of the original language, because there are few or no indicators of sentence or even word breaks, and because they have numerous optional orthographic and grammatical rules and use a script whose individual characters are occasionally hard to reliably identify. In such a situation it's good to have a parser driven by heuristics about what is likely to be intended and the penalty list monad might serve well. 
&lt;a href="http://en.wikipedia.org/wiki/Egyptian_language"&gt;Here&lt;/a&gt;'s an example of such a language.&lt;br /&gt;&lt;br /&gt;Update: I forgot to add some connections to previous monads I've talked about:&lt;br /&gt;&lt;OL&gt;&lt;br /&gt;&lt;LI&gt;&lt;code&gt;PList&lt;/code&gt; is a variation of the convolution monad I described &lt;a href="http://blog.sigfpe.com/2007/01/monads-hidden-behind-every-zipper.html"&gt;here&lt;/a&gt;. It deals with the "wrong category" aspect so it is a true Haskell monad. Penalty lists form some kind of dual to the convolution comonad.&lt;br /&gt;&lt;LI&gt;It has much in common with &lt;a href="http://blog.sigfpe.com/2007/06/how-to-write-tolerably-efficient.html"&gt;this&lt;/a&gt; monad. That monad doesn't do anything smart about ordering searches but it does have the neat ability to 'fuse' different branches of a search so that different ways to arrive at the same place don't add to the combinatorial explosion. It's good for searches where you want to know what the minimum penalty is to get somewhere, but don't care what the best path actually is.&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;Also, in response to a comment on #haskell I've made the &lt;code&gt;join&lt;/code&gt; example more complex so it's easier to generalise from it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-5500092155587195817?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/5500092155587195817/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=5500092155587195817' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/5500092155587195817'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/11295132/posts/default/5500092155587195817'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/07/monad-for-combinatorial-search-with.html' title='A Monad for Combinatorial Search with Heuristics'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_UdKHLrHa05M/SlDQReuY75I/AAAAAAAAAX0/r6LTYp0_8N0/s72-c/errors.png' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-3505988620847364705</id><published>2009-06-20T13:01:00.000-07:00</published><updated>2009-06-27T15:09:26.533-07:00</updated><title type='text'>Automata and the A-D-E classification.</title><content type='html'>&lt;H3&gt;Introduction&lt;/H3&gt;&lt;br /&gt;The &lt;a href="http://en.wikipedia.org/wiki/ADE_classification"&gt;A-D-E &lt;/a&gt; classification is a strange ubiquitous pattern that appears in many branches of mathematics. Typically it appears when you try to classify certain types of mathematical construction. If the A-D-E classification applies then you end up with two infinite sequences of cryptically named objects (A&lt;sub&gt;1&lt;/sub&gt;,A&lt;sub&gt;2&lt;/sub&gt;,A&lt;sub&gt;3&lt;/sub&gt;,...) and (D&lt;sub&gt;1&lt;/sub&gt;,D&lt;sub&gt;2&lt;/sub&gt;,D&lt;sub&gt;3&lt;/sub&gt;,...) as well as three leftover objects called E&lt;sub&gt;6&lt;/sub&gt;, E&lt;sub&gt;7&lt;/sub&gt; and E&lt;sub&gt;8&lt;/sub&gt;. Unfortunately, most of these objects and their classifications are tricky to define using only elementary mathematics. 
However, there is one type of object that is classified in this way that can be given a relatively straightforward computational description involving a little linear algebra and assuming you know a tiny bit about automata.&lt;br /&gt;&lt;br /&gt;But first: why care about the A-D-E classification? Well I tried to say a little bit about how symmetries relate to nature a &lt;a href="http://blog.sigfpe.com/2007/11/whats-all-this-e8-stuff-about-then-part.html"&gt;while back&lt;/a&gt;. &lt;a href="http://en.wikipedia.org/wiki/Simple_Lie_group#Simply_laced_groups"&gt;Certain types&lt;/a&gt; of possible symmetry of particle physics can be classified the A-D-E way. The symmetry group corresponding to E8 is the now famous exceptional group &lt;a href="http://en.wikipedia.org/wiki/E8_(mathematics)"&gt;E8&lt;/a&gt;. I won't be able to get to an explanation of how groups are involved. But at least I'll be able to give a hint about the bigger picture that E8 is part of.&lt;br /&gt;&lt;H3&gt;Non-deterministic Finite State Automata&lt;/H3&gt;&lt;br /&gt;Here's a diagram representing a very simple &lt;a href="http://en.wikipedia.org/wiki/Nondeterministic_finite_state_machine"&gt;non-deterministic finite state automaton&lt;/a&gt; (NDA):&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/Sj6cXr8ACoI/AAAAAAAAAXE/QXGI9NG6KR0/s1600-h/nda1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 392px; height: 197px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/Sj6cXr8ACoI/AAAAAAAAAXE/QXGI9NG6KR0/s400/nda1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5349885338108037762" /&gt;&lt;/a&gt;&lt;br /&gt;It can be in one of two states. 
When in state A it can transition to state B and in state B it can only transition back to state B, but it can do so in two different ways.&lt;br /&gt;&lt;H3&gt;Vector Automata&lt;/H3&gt;&lt;br /&gt;Now I'll introduce a more general kind of automaton: a &lt;em&gt;vector automaton&lt;/em&gt; (VA). (I made that term up; it's not meant to correspond with anyone else's terminology.) Every vector automaton is built from an NDA. But each state corresponds to a finite dimensional vector space and each transition corresponds to a linear function mapping from the vector space of the source state to the vector space of the destination state. We could turn the above example into a VA by assigning a 1D vector space V&lt;sub&gt;A&lt;/sub&gt; to A, a 2D vector space V&lt;sub&gt;B&lt;/sub&gt; to B and defining linear functions:&lt;br /&gt;&lt;center&gt;&lt;br /&gt;f : (x) -&gt; (x,0)&lt;br /&gt;g : (x,y) -&gt; (-y,x)&lt;br /&gt;h : (x,y) -&gt; (y,-x)&lt;br /&gt;&lt;/center&gt;&lt;br /&gt;A VA is just like an NDA in that it transitions from state to state according to the given transitions. But additionally it keeps track of a vector in the vector space corresponding to the current state. Each time it makes a transition the linear function corresponding to that transition is applied to the vector. So in the example above, the VA might start in state A with a scalar value x (i.e. a 1D vector). When it makes its first transition its vector becomes the 2D vector (x,0) and after that each transition rotates the vector through 90 degrees clockwise or anticlockwise.&lt;br /&gt;&lt;br /&gt;There's a lot of freedom in defining a VA given its underlying NDA. For each node you can pick any vector space you like of any finite dimension, and for each transition you can pick any linear function you like mapping between the source and target vector spaces.&lt;br /&gt;&lt;br /&gt;Let's make this a little more formal. An NDA is a finite set of states combined with a finite set of transitions. 
Each transition has a source and destination, each of which is a state. That's it. You're allowed any finite number of transitions between states and a transition can have the same state as source and destination.&lt;br /&gt;&lt;br /&gt;A VA is an NDA combined with a finite dimensional vector space attached to each state and a linear function for each transition such that the function maps from the vector space of the source to the vector space of the target.&lt;br /&gt;&lt;br /&gt;Now consider this really simple NDA:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sj6guAo1jpI/AAAAAAAAAXM/r012tH1TMuk/s1600-h/nda2.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 296px; height: 98px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sj6guAo1jpI/AAAAAAAAAXM/r012tH1TMuk/s400/nda2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5349890119668436626" /&gt;&lt;/a&gt;&lt;br /&gt;When we build a VA from this NDA, for convenience I'll reuse the names A and B for the vector spaces attached to the states A and B. And I'll call the linear function from A to B, f. How many VAs can be made from this NDA? Clearly an infinite number. But a lot of them are very similar. &lt;br /&gt;&lt;br /&gt;Suppose we assume A and B are 2-dimensional and write their elements as pairs (x,y). Suppose f(1,0)=u and f(0,1)=v and that u is not a multiple of v and both are non-zero. Then we can use u and v as a basis for B. If we write f as f' in this new basis we get f'(1,0)=(1,0) and f'(0,1)=(0,1). 
So by relabelling the basis of B we have actually revealed that an infinite number of choices for f reduce to the same thing apart from a change of basis in B.&lt;br /&gt;&lt;br /&gt;Up to change of basis in A and B we find there are only three possibilities:&lt;br /&gt;(1) u and v are distinct, non-zero, and not multiples of each other.&lt;br /&gt;(2) u is non-zero but v is zero&lt;br /&gt;(3) both u and v are zero&lt;br /&gt;&lt;br /&gt;We started with an infinite number of possibilities for our particular choice of dimensions and ended up with just 3. We'll say that two VAs are equivalent if we can get one from the other by changing basis like this.&lt;br /&gt;&lt;br /&gt;On the other hand, consider this NDA:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/Sj6kMMJkhNI/AAAAAAAAAXU/bIbn87y3_MU/s1600-h/nda3.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 236px; height: 382px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/Sj6kMMJkhNI/AAAAAAAAAXU/bIbn87y3_MU/s400/nda3.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5349893936689480914" /&gt;&lt;/a&gt;&lt;br /&gt;Let's choose A, B and C to be 1-dimensional vector spaces and define f, g and h as:&lt;br /&gt;&lt;center&gt;&lt;br /&gt;f(x) = x&lt;br /&gt;g(x) = x&lt;br /&gt;h(x) = &amp;lambda;x&lt;br /&gt;&lt;/center&gt;&lt;br /&gt;where &amp;lambda; is any real number. Then h(g(f(x)))=&amp;lambda;x. We can choose any &amp;lambda; we like so we have an infinity of possibilities. No amount of basis change is going to change this fact. This is different from the case above because now we're comparing x and h(g(f(x))) which lie in the same vector space. 
So when our NDAs have loops in them, the set of inequivalent VAs, for most choices of dimension, is infinite.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;The sum of two VAs&lt;/H3&gt;&lt;br /&gt;If we have two VAs corresponding to the same NDA we can combine them together to make a single machine. The state is given by a state in the shared underlying NDA, but we now have a pair of vectors. Each time there is a transition we apply the pair of transforms to transform the two vectors simultaneously. But we can encode a pair of vectors as a single vector simply by concatenating together the vector components in some basis. So this machine is just another VA. The dimension of the vector space for each state is simply the sum of the dimensions of the vector spaces in the original VAs. So, given two VAs we can sum them to get a third.&lt;br /&gt;&lt;br /&gt;Given a VA for an NDA it may or may not be equivalent to the sum of two simpler VAs. If it isn't, it's said to be irreducible. If it is, then we can ask the same question about the simpler VAs. In this way, every VA is the sum of irreducible VAs.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Main Theorem&lt;/H3&gt;&lt;br /&gt;Now comes the main result I want to give:&lt;br /&gt;&lt;br /&gt;If the graph underlying the NDA is one of those in the following list then (and only then) there is a finite number of inequivalent irreducible VAs for the NDA. All other VAs for that NDA are simply finite sums of machines from this finite set. Here's the list:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/SkaNuePxB6I/AAAAAAAAAXk/T6feU--U6k8/s1600-h/dynkin.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 317px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/SkaNuePxB6I/AAAAAAAAAXk/T6feU--U6k8/s400/dynkin.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5352121036709103522" /&gt;&lt;/a&gt;&lt;br /&gt;That's it! 
Weird huh? (Note that diagram X&lt;sub&gt;n&lt;/sub&gt; has n nodes.)&lt;br /&gt;&lt;br /&gt;Strangely, those same &lt;a href="http://en.wikipedia.org/wiki/Coxeter-Dynkin_diagram"&gt;diagrams&lt;/a&gt; (and a few more) appear (in a quite different way) when classifying the possible symmetries of fundamental particles. The symmetries are given the same names as these diagrams. And in just the same way we get those 'sporadic' symmetries leading up to &lt;a href="http://blog.sigfpe.com/2007/11/whats-all-this-e8-stuff-about-then-part.html"&gt;E&lt;sub&gt;8&lt;/sub&gt;&lt;/a&gt;. Those same diagrams also arise in &lt;a href="http://en.wikipedia.org/wiki/Catastrophe_theory"&gt;Catastrophe Theory&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I'd like to sketch the proof, at least in one direction, but I seem to have run out of time. Another day perhaps. But note that none of these diagrams have loops for the reasons I gave above, and I've already shown that A&lt;sub&gt;2&lt;/sub&gt; gives only a finite number of choices for a certain choice of dimension.&lt;br /&gt;&lt;br /&gt;Meanwhile, I should give the proper mathematician names for the things above. The diagrams listed above are examples of &lt;a href="http://en.wikipedia.org/wiki/Coxeter-Dynkin_diagram"&gt;Dynkin diagrams&lt;/a&gt;. A non-deterministic automaton as described above is known as a &lt;a href="http://en.wikipedia.org/wiki/Quiver_(mathematics)"&gt;quiver&lt;/a&gt;. A vector automaton is normally known as a representation of a quiver. 
And the theorem above is half of Gabriel's theorem.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-3505988620847364705?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/3505988620847364705/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=3505988620847364705' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/3505988620847364705'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/3505988620847364705'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/06/automata-and-a-d-e-classification.html' title='Automata and the A-D-E classification.'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_UdKHLrHa05M/Sj6cXr8ACoI/AAAAAAAAAXE/QXGI9NG6KR0/s72-c/nda1.png' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-2073443236440619074</id><published>2009-06-07T10:03:00.000-07:00</published><updated>2009-06-07T15:08:18.194-07:00</updated><title type='text'>Hashing Molecules</title><content type='html'>Twenty or so years ago I worked for a pharmaceutical company that had a large database of compounds. That got me thinking about the problem of how to perform lookups based on molecular structures. 
If you can find a bunch of numbers that encapsulate the molecular structure then you can use them as database keys. But you need to ensure that the same molecule entered in two different ways gets mapped to the same numbers, and you'd like different molecules, such as &lt;a href="http://en.wikipedia.org/wiki/Stereoisomer"&gt;stereoisomers&lt;/a&gt;, or even &lt;a href="http://en.wikipedia.org/wiki/Enantiomer"&gt;enantiomers&lt;/a&gt; to get mapped to different values.&lt;br /&gt;&lt;br /&gt;That got me thinking and around then I came up with an idea for hashing molecules inspired by some of the mathematics I'd been doing not long before. I never got around to coding it up but twenty years later it dawned on me that I could easily do it using a similar technique to what I used for &lt;a href="http://blog.sigfpe.com/2008/10/untangling-with-continued-fractions.html"&gt;untangling tangles&lt;/a&gt; and translating &lt;a href="http://blog.sigfpe.com/2009/05/trace-diagrams-with-monads.html"&gt;trace diagrams&lt;/a&gt;. Anyway, as I've already given examples of how to translate diagrams to monadic expressions I'm going to skimp on the details in this post and just talk about things specific to molecular structures. 
For this post to make sense you have to understand those earlier posts.&lt;br /&gt;&lt;br /&gt;So first comes the usual Haskell preamble:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; {-# LANGUAGE MultiParamTypeClasses,FlexibleInstances,FunctionalDependencies,GeneralizedNewtypeDeriving #-}&lt;br /&gt;&lt;br /&gt;&amp;gt; module Main where&lt;br /&gt;&lt;br /&gt;&amp;gt; import Data.HashTable&lt;br /&gt;&amp;gt; import Data.List as L&lt;br /&gt;&amp;gt; import Data.Int&lt;br /&gt;&amp;gt; import Data.Array&lt;br /&gt;&amp;gt; import GHC.Arr&lt;br /&gt;&amp;gt; import qualified Data.Map as M&lt;br /&gt;&amp;gt; import Control.Monad&lt;br /&gt;&amp;gt; infixl 5 .+&lt;br /&gt;&amp;gt; infixl 6 .*&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I'm going to be using the &lt;a href="http://blog.sigfpe.com/2007/02/monads-for-vector-spaces-probability.html"&gt;vector space monad&lt;/a&gt; with a 4-dimensional vector space. This type labels the dimensions:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; data I = A | B | C | D deriving (Eq,Ord,Show,Ix,Enum)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;n-valent atoms will be represented by functions that consume n-tuples. We'll start with a simple hash:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; c' (a,b,c,d) = hashString ("C" ++ show (a,b,c,d))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;My prime motivation for using Haskell for this problem is that the code was super-easy to write and investigate. But it's inefficient. I'll talk about how to remedy this properly at the end. But for now I'm going to memoise many functions as arrays using a &lt;code&gt;Memoisable&lt;/code&gt; type class with a &lt;code&gt;memo&lt;/code&gt; method. So I'll be using &lt;code&gt;c&lt;/code&gt; instead of &lt;code&gt;c'&lt;/code&gt;:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; c = memo $ \x -&amp;gt; symmetrise a4 c' x .* return ()&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note the use of the &lt;code&gt;symmetrise&lt;/code&gt; function. 
The idea is that the 4 bonds coming out from (singly-bonded) carbon can be thought of as lying at the corners of a tetrahedron.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/Siv_DTt45II/AAAAAAAAAWs/lqLNr4wTQYI/s1600-h/c.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 224px; height: 199px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/Siv_DTt45II/AAAAAAAAAWs/lqLNr4wTQYI/s400/c.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5344645815102661762" /&gt;&lt;/a&gt;&lt;br /&gt;They have &lt;a href="http://en.wikipedia.org/wiki/Tetrahedral_symmetry"&gt;tetrahedral symmetry&lt;/a&gt;. So I'd like my hash to also have this symmetry so that, for example, &lt;code&gt;c (i,j,k,l) == c (j,i,l,k)&lt;/code&gt;. We can enforce this by summing over all 12 permutations of the arguments compatible with this symmetry, which together form the group known as A4. So &lt;code&gt;a4&lt;/code&gt; lists all of these permutations:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; a4 (i,j,k,l) = [&lt;br /&gt;&amp;gt;    (i,j,k,l),&lt;br /&gt;&amp;gt;    (j,i,l,k),&lt;br /&gt;&amp;gt;    (k,l,i,j),&lt;br /&gt;&amp;gt;    (l,k,j,i),&lt;br /&gt;&lt;br /&gt;&amp;gt;    (i,k,l,j),&lt;br /&gt;&amp;gt;    (i,l,j,k),&lt;br /&gt;&lt;br /&gt;&amp;gt;    (k,j,l,i),&lt;br /&gt;&amp;gt;    (l,j,i,k),&lt;br /&gt;&lt;br /&gt;&amp;gt;    (j,l,k,i),&lt;br /&gt;&amp;gt;    (l,i,k,j),&lt;br /&gt;&lt;br /&gt;&amp;gt;    (j,k,i,l),&lt;br /&gt;&amp;gt;    (k,i,j,l)&lt;br /&gt;&amp;gt;    ]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And &lt;code&gt;symmetrise&lt;/code&gt; performs the summation:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; symmetrise group f x = sum (map f (group x))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can define other atoms too. 
Hydrogen is easy:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; h = memo $ \a -&amp;gt; hashString ("H" ++ show a) .* return ()&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Oxygen has &lt;a href="http://en.wikipedia.org/wiki/Permutation_group"&gt;S2&lt;/a&gt; symmetry:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; s2 (i,j) = [ (i,j), (j,i) ]&lt;br /&gt;&amp;gt; o' (a,b) = hashString ("O" ++ show (a,b))&lt;br /&gt;&amp;gt; o = memo $ \x -&amp;gt; symmetrise s2 o' x .* return ()&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;You can think of a carbon atom and a hydrogen atom, say, as a pair of arrays c&lt;sub&gt;ijkl&lt;/sub&gt; and h&lt;sub&gt;i&lt;/sub&gt;, and bonding them together as summation over a shared index. So methane would be the sum over i,j,k,l = A..D of c&lt;sub&gt;ijkl&lt;/sub&gt;h&lt;sub&gt;i&lt;/sub&gt;h&lt;sub&gt;j&lt;/sub&gt;h&lt;sub&gt;k&lt;/sub&gt;h&lt;sub&gt;l&lt;/sub&gt;. To this end, define a bond as:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; bond :: V Int32 I&lt;br /&gt;&amp;gt; bond = return A .+ return B .+ return C .+ return D&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can make H&lt;sub&gt;2&lt;/sub&gt; like so:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; h2 = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- bond&lt;br /&gt;&amp;gt;    h ! i&lt;br /&gt;&amp;gt;    h ! i&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;code&gt;i &amp;lt;- bond&lt;/code&gt; makes &lt;code&gt;i&lt;/code&gt; a bond which we then attach to two hydrogen atoms. Evaluating &lt;code&gt;h2&lt;/code&gt; will give us the hash for hydrogen gas.&lt;br /&gt;&lt;br /&gt;Rather than dive straight into CH&lt;sub&gt;4&lt;/sub&gt; we can construct some useful building blocks. Hydrogen with a bond already attached:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; h_ :: V Int32 I&lt;br /&gt;&amp;gt; h_ = do&lt;br /&gt;&amp;gt;    m &amp;lt;- bond&lt;br /&gt;&amp;gt;    h ! 
m&lt;br /&gt;&amp;gt;    return m&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I'm using a trailing underscore _ to indicate a free bond.&lt;br /&gt;&lt;br /&gt;&lt;code&gt;ch2_&lt;/code&gt; accepts one bond and returns another. Once memoised it is, in effect, just a 4x4 matrix and can be used to rapidly build chains.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; ch2_ = memo $ \i -&amp;gt; simp $ do&lt;br /&gt;&amp;gt;    k &amp;lt;- bond&lt;br /&gt;&amp;gt;    l &amp;lt;- bond&lt;br /&gt;&amp;gt;    m &amp;lt;- bond&lt;br /&gt;&amp;gt;    h ! l&lt;br /&gt;&amp;gt;    h ! m&lt;br /&gt;&amp;gt;    c ! (i,k,l,m)&lt;br /&gt;&amp;gt;    return k&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So here's code to make an &lt;a href = "http://en.wikipedia.org/wiki/Alkyl"&gt;alkyl&lt;/a&gt; chain with a free bond at the end.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; alkyl_ 0 = h_&lt;br /&gt;&lt;br /&gt;&amp;gt; alkyl_ n = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- alkyl_ (n-1)&lt;br /&gt;&amp;gt;    ch2_ ! i&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can now make &lt;a href="http://en.wikipedia.org/wiki/Alkanes"&gt;alkanes&lt;/a&gt; by attaching a hydrogen atom at the end and make methane as a special case:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; alkane n = simp $ alkyl_ n &gt;&gt;= (h ! )&lt;br /&gt;&amp;gt; methane = alkane 1&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now a &lt;a href="http://en.wikipedia.org/wiki/Hydroxyl"&gt;hydroxyl group&lt;/a&gt;:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; oh = memo $ \i -&amp;gt; simp $ do&lt;br /&gt;&amp;gt;    j &amp;lt;- bond&lt;br /&gt;&amp;gt;    h ! j&lt;br /&gt;&amp;gt;    o ! (i,j)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And you can have a drink on me:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; ethanol = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- alkyl_ 2&lt;br /&gt;&amp;gt;    oh ! 
i&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Carbon double bonds turn out to be straightforward. We can simply use a pair of bonds:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; doubleBond :: V Int32 (I,I)&lt;br /&gt;&amp;gt; doubleBond = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- bond&lt;br /&gt;&amp;gt;    j &amp;lt;- bond&lt;br /&gt;&amp;gt;    return (i,j)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Carbon-carbon double bonds &lt;a href="http://en.wikipedia.org/wiki/Alkene#Bonding"&gt;tend not to twist&lt;/a&gt; and this is reflected in the hash. We'd have to apply &lt;code&gt;symmetrise&lt;/code&gt; if we wanted twistable bonds.&lt;br /&gt;&lt;br /&gt;We can make a pre-canned doubly bonded carbon atom pair:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SiwDA51O_fI/AAAAAAAAAW8/qe1h3D4ApW0/s1600-h/c_c.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 296px; height: 200px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SiwDA51O_fI/AAAAAAAAAW8/qe1h3D4ApW0/s400/c_c.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5344650171840921074" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; c_c = memo $ \(i,j,k,l) -&amp;gt; simp $ do&lt;br /&gt;&amp;gt;    (m,n) &amp;lt;- doubleBond&lt;br /&gt;&amp;gt;    c ! (i,j,m,n)&lt;br /&gt;&amp;gt;    c ! (m,n,k,l)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So we can build ethene like this&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; ethene = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- h_&lt;br /&gt;&amp;gt;    j &amp;lt;- h_&lt;br /&gt;&amp;gt;    k &amp;lt;- h_&lt;br /&gt;&amp;gt;    l &amp;lt;- h_&lt;br /&gt;&amp;gt;    c_c ! 
(i,j,k,l)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Here's a methyl group with a free bond&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; ch3_ = simp $ do&lt;br /&gt;&amp;gt;    l &amp;lt;- h_&lt;br /&gt;&amp;gt;    ch2_ ! l&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So we can build a bunch more compounds&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; propene = simp $ do&lt;br /&gt;&amp;gt;    j &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    i &amp;lt;- h_&lt;br /&gt;&amp;gt;    k &amp;lt;- h_&lt;br /&gt;&amp;gt;    l &amp;lt;- h_&lt;br /&gt;&amp;gt;    c_c ! (i,j,k,l)&lt;br /&gt;&lt;br /&gt;&amp;gt; cisbut2ene = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    j &amp;lt;- h_&lt;br /&gt;&amp;gt;    k &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    l &amp;lt;- h_&lt;br /&gt;&amp;gt;    c_c ! (i,j,k,l)&lt;br /&gt;&lt;br /&gt;&amp;gt; transbut2ene = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- h_&lt;br /&gt;&amp;gt;    j &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    k &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    l &amp;lt;- h_&lt;br /&gt;&amp;gt;    c_c ! (i,j,k,l)&lt;br /&gt;&lt;br /&gt;&amp;gt; cisbut2ene' = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- h_&lt;br /&gt;&amp;gt;    j &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    k &amp;lt;- h_&lt;br /&gt;&amp;gt;    l &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    c_c ! (i,j,k,l)&lt;br /&gt;&lt;br /&gt;&amp;gt; _2methylpropene = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    j &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    k &amp;lt;- h_&lt;br /&gt;&amp;gt;    l &amp;lt;- h_&lt;br /&gt;&amp;gt;    c_c ! (i,j,k,l)&lt;br /&gt;&lt;br /&gt;&amp;gt; _2methylpropene' = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- h_&lt;br /&gt;&amp;gt;    j &amp;lt;- h_&lt;br /&gt;&amp;gt;    k &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    l &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    c_c ! (i,j,k,l)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;An interesting problem is building a benzene ring. 
Here's a first attempt with six free bonds:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; ring1 (p,q,r,s,t,u) = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- bond&lt;br /&gt;&amp;gt;    j &amp;lt;- bond&lt;br /&gt;&amp;gt;    k &amp;lt;- bond&lt;br /&gt;&amp;gt;    c_c ! (j,q,i,p)&lt;br /&gt;&amp;gt;    c_c ! (i,u,k,t)&lt;br /&gt;&amp;gt;    c_c ! (k,s,j,r)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The problem with this is that &lt;a href="http://en.wikipedia.org/wiki/Benzene#Structure"&gt;benzene rings&lt;/a&gt; are special. The electrons are 'delocalised' so that the ring has rotational symmetry. We need to sum over the two consistent ways to assign single and double bonds around the ring. For the more general case of interlocking benzene rings we must still sum over all consistent assignments.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SiwBpGsJWYI/AAAAAAAAAW0/e0JQZ3v-iIM/s1600-h/benzene.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 210px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SiwBpGsJWYI/AAAAAAAAAW0/e0JQZ3v-iIM/s400/benzene.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5344648663463975298" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; ring = memo $ \(p,q,r,s,t,u) -&amp;gt; simp $ ring1 (p,q,r,s,t,u) .+ ring1 (q,r,s,t,u,p)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And some more compounds:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; phenyl = memo $ \p -&amp;gt; simp $ do&lt;br /&gt;&amp;gt;    q &amp;lt;- h_&lt;br /&gt;&amp;gt;    r &amp;lt;- h_&lt;br /&gt;&amp;gt;    s &amp;lt;- h_&lt;br /&gt;&amp;gt;    t &amp;lt;- h_&lt;br /&gt;&amp;gt;    u &amp;lt;- h_&lt;br /&gt;&amp;gt;    ring ! 
(p,q,r,s,t,u)&lt;br /&gt;&lt;br /&gt;&amp;gt; benzene :: V Int32 ()&lt;br /&gt;&amp;gt; benzene = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- bond&lt;br /&gt;&amp;gt;    phenyl ! i&lt;br /&gt;&amp;gt;    h ! i&lt;br /&gt;&lt;br /&gt;&amp;gt; phenol :: V Int32 ()&lt;br /&gt;&amp;gt; phenol = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- bond&lt;br /&gt;&amp;gt;    phenyl ! i&lt;br /&gt;&amp;gt;    oh ! i&lt;br /&gt;&lt;br /&gt;&amp;gt; toluene :: V Int32 ()&lt;br /&gt;&amp;gt; toluene = simp $ do&lt;br /&gt;&amp;gt;    i &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    phenyl ! i&lt;br /&gt;&lt;br /&gt;&amp;gt; toluene' :: V Int32 ()&lt;br /&gt;&amp;gt; toluene' = simp $ do&lt;br /&gt;&amp;gt;    p &amp;lt;- h_&lt;br /&gt;&amp;gt;    q &amp;lt;- h_&lt;br /&gt;&amp;gt;    r &amp;lt;- h_&lt;br /&gt;&amp;gt;    s &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    t &amp;lt;- h_&lt;br /&gt;&amp;gt;    u &amp;lt;- h_&lt;br /&gt;&amp;gt;    ring ! (p,q,r,s,t,u)&lt;br /&gt;&lt;br /&gt;&amp;gt; toluene'' :: V Int32 ()&lt;br /&gt;&amp;gt; toluene'' = simp $ do&lt;br /&gt;&amp;gt;    p &amp;lt;- h_&lt;br /&gt;&amp;gt;    q &amp;lt;- h_&lt;br /&gt;&amp;gt;    r &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    s &amp;lt;- h_&lt;br /&gt;&amp;gt;    t &amp;lt;- h_&lt;br /&gt;&amp;gt;    u &amp;lt;- h_&lt;br /&gt;&amp;gt;    ring ! (p,q,r,s,t,u)&lt;br /&gt;&lt;br /&gt;&amp;gt; _1_2_dimethylbenzene = simp $ do&lt;br /&gt;&amp;gt;    p &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    q &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    r &amp;lt;- h_&lt;br /&gt;&amp;gt;    s &amp;lt;- h_&lt;br /&gt;&amp;gt;    t &amp;lt;- h_&lt;br /&gt;&amp;gt;    u &amp;lt;- h_&lt;br /&gt;&amp;gt;    ring ! (p,q,r,s,t,u)&lt;br /&gt;&lt;br /&gt;&amp;gt; _1_3_dimethylbenzene = simp $ do&lt;br /&gt;&amp;gt;    p &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    q &amp;lt;- h_&lt;br /&gt;&amp;gt;    r &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    s &amp;lt;- h_&lt;br /&gt;&amp;gt;    t &amp;lt;- h_&lt;br /&gt;&amp;gt;    u &amp;lt;- h_&lt;br /&gt;&amp;gt;    ring ! 
(p,q,r,s,t,u)&lt;br /&gt;&lt;br /&gt;&amp;gt; _1_4_dimethylbenzene = simp $ do&lt;br /&gt;&amp;gt;    p &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    q &amp;lt;- h_&lt;br /&gt;&amp;gt;    r &amp;lt;- h_&lt;br /&gt;&amp;gt;    s &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    t &amp;lt;- h_&lt;br /&gt;&amp;gt;    u &amp;lt;- h_&lt;br /&gt;&amp;gt;    ring ! (p,q,r,s,t,u)&lt;br /&gt;&lt;br /&gt;&amp;gt; _1_5_dimethylbenzene = simp $ do&lt;br /&gt;&amp;gt;    p &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    q &amp;lt;- h_&lt;br /&gt;&amp;gt;    r &amp;lt;- h_&lt;br /&gt;&amp;gt;    s &amp;lt;- h_&lt;br /&gt;&amp;gt;    t &amp;lt;- ch3_&lt;br /&gt;&amp;gt;    u &amp;lt;- h_&lt;br /&gt;&amp;gt;    ring ! (p,q,r,s,t,u)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;As we might hope, the three different ways to define toluene give the same result. We also discover that 1,3- and 1,5-dimethylbenzene are the same compound (or at least &lt;em&gt;probably&lt;/em&gt; are, this isn't a &lt;a href="http://en.wikipedia.org/wiki/Perfect_hash_function"&gt;perfect hash&lt;/a&gt;).&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; main = do&lt;br /&gt;&amp;gt;    print $ "toluene = " ++ show toluene&lt;br /&gt;&amp;gt;    print $ "toluene' = " ++ show toluene'&lt;br /&gt;&amp;gt;    print $ "toluene'' = " ++ show toluene''&lt;br /&gt;&amp;gt;    print $ "_1_2_dimethylbenzene = " ++ show _1_2_dimethylbenzene&lt;br /&gt;&amp;gt;    print $ "_1_3_dimethylbenzene = " ++ show _1_3_dimethylbenzene&lt;br /&gt;&amp;gt;    print $ "_1_4_dimethylbenzene = " ++ show _1_4_dimethylbenzene&lt;br /&gt;&amp;gt;    print $ "_1_5_dimethylbenzene = " ++ show _1_5_dimethylbenzene&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now I need to say something about performance. The above code is naive and performs many unnecessary summations. For example, hashing a long chain should only take time linear in its length. But using the above code indiscriminately could give you exponential time. 
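&lt;br /&gt;&lt;br /&gt;To make the linear-versus-exponential point concrete, here is a minimal sketch, kept separate from the literate program above. It assumes that a memoised &lt;code&gt;ch2_&lt;/code&gt; can be treated as a plain 4x4 matrix and &lt;code&gt;h&lt;/code&gt; as a 4-vector. The names &lt;code&gt;hVec&lt;/code&gt;, &lt;code&gt;ch2Mat&lt;/code&gt;, &lt;code&gt;chainFast&lt;/code&gt; and &lt;code&gt;chainSlow&lt;/code&gt; are hypothetical, and the numbers are arbitrary placeholders rather than real hashes:&lt;br /&gt;&lt;pre&gt;

```haskell
import Data.Int (Int32)
import Data.List (transpose)

type Vec = [Int32]
type Mat = [[Int32]]

-- Hypothetical stand-ins for the memoised h and ch2_ tables.
-- The numbers are arbitrary placeholders, not hashes from the post.
hVec :: Vec
hVec = [3, 1, 4, 1]

ch2Mat :: Mat
ch2Mat =
  [ [2, 7, 1, 8]
  , [2, 8, 1, 8]
  , [2, 8, 4, 5]
  , [9, 0, 4, 5]
  ]

dot :: Vec -> Vec -> Int32
dot xs ys = sum (zipWith (*) xs ys)

-- Contract one bond index: row vector times matrix.
step :: Vec -> Mat -> Vec
step v m = map (dot v) (transpose m)

-- Linear time: one constant-size matrix-vector product per CH2 unit,
-- then a final contraction with the hydrogen at the far end.
chainFast :: Int -> Int32
chainFast n = dot (iterate (\v -> step v ch2Mat) hVec !! n) hVec

-- Exponential time: enumerate every assignment of labels to the
-- n+1 bonds and add up the products, 4^(n+1) terms in total.
chainSlow :: Int -> Int32
chainSlow n = sum (map term (sequence (replicate (n + 1) [0 .. 3])))
  where
    entry i j = ch2Mat !! i !! j
    term is = (hVec !! head is) * product (zipWith entry is (tail is)) * (hVec !! last is)
```

&lt;/pre&gt;&lt;br /&gt;Both functions compute the same sum, but &lt;code&gt;chainFast&lt;/code&gt; does a fixed amount of work per link while &lt;code&gt;chainSlow&lt;/code&gt; visits every labelling of the bonds. Which of the two behaviours the monadic code above actually exhibits depends on where &lt;code&gt;simp&lt;/code&gt; and &lt;code&gt;memo&lt;/code&gt; are applied.&lt;br /&gt;&lt;br /&gt;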
A good implementation might take a divide and conquer approach: slice the molecule in half through a bunch of bonds, compute partial hashes for each half and then sew the halves together in time exponential in the number of bonds you sliced through. For the types of molecules I've seen in real pharmaceutical databases (say) this is actually pretty cheap if you're smart about the slicing. The hashes in the above code could probably be computed many thousands of times faster. As it is, you'll probably need to compile the above with optimisation.&lt;br /&gt;&lt;br /&gt;I'm willing to bet that with small changes, and with suitable choice of matrices over the &lt;em&gt;reals&lt;/em&gt;, we can get invariants of molecules that predict physical properties. These calculations are reminiscent of various kinds of counting algorithm, so at the very least they probably compute things that are meaningful from a statistical mechanics perspective.&lt;br /&gt;&lt;br /&gt;Incidentally, this approach to stitching together atoms was inspired by an old paper by R. C. Penner on fatgraphs - nothing to do with chemistry. A few days ago he put a paper online about an application to &lt;a href="http://arxiv.org/abs/0902.1025"&gt;modelling proteins&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Update: Since writing this I've found a name for what I'm doing. I'm converting a chemical structure diagram into a &lt;a href="http://www.google.com/search?q=tensor+networks"&gt;tensor network&lt;/a&gt;. There seems to be lots of literature on how to evaluate these efficiently. 
In effect, everything I've been doing in this blog in terms of converting diagrams to code is an example of evaluating a tensor network.&lt;br /&gt;&lt;br /&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;Now comes the memoisation class:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; class Memoisable ix where&lt;br /&gt;&amp;gt;    memo :: (ix -&amp;gt; a) -&amp;gt; Array ix a&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Memoisable I where&lt;br /&gt;&amp;gt;    memo f = array (A,D) [(i,f i) | i &amp;lt;- [A .. D]]&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Memoisable (I,I) where&lt;br /&gt;&amp;gt;    memo f = array ((A,A),(D,D)) [(i,f i) |&lt;br /&gt;&amp;gt;        p &amp;lt;- [A .. D],&lt;br /&gt;&amp;gt;        q &amp;lt;- [A .. D],&lt;br /&gt;&amp;gt;        let i = (p,q) ]&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Memoisable (I,I,I,I) where&lt;br /&gt;&amp;gt;    memo f = array ((A,A,A,A),(D,D,D,D)) [(i,f i) |&lt;br /&gt;&amp;gt;        p &amp;lt;- [A .. D],&lt;br /&gt;&amp;gt;        q &amp;lt;- [A .. D],&lt;br /&gt;&amp;gt;        r &amp;lt;- [A .. D],&lt;br /&gt;&amp;gt;        s &amp;lt;- [A .. D],&lt;br /&gt;&amp;gt;        let i = (p,q,r,s) ]&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Memoisable (I,I,I,I,I,I) where&lt;br /&gt;&amp;gt;    memo f = array ((A,A,A,A,A,A),(D,D,D,D,D,D)) [(i,f i) |&lt;br /&gt;&amp;gt;        p &amp;lt;- [A .. D],&lt;br /&gt;&amp;gt;        q &amp;lt;- [A .. D],&lt;br /&gt;&amp;gt;        r &amp;lt;- [A .. D],&lt;br /&gt;&amp;gt;        s &amp;lt;- [A .. D],&lt;br /&gt;&amp;gt;        t &amp;lt;- [A .. D],&lt;br /&gt;&amp;gt;        u &amp;lt;- [A .. 
D],&lt;br /&gt;&amp;gt;        let i = (p,q,r,s,t,u) ]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Missing instances from &lt;code&gt;Data.Array&lt;/code&gt;:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; instance (Ix a1, Ix a2, Ix a3, Ix a4, Ix a5, Ix a6) =&amp;gt; Ix (a1,a2,a3,a4,a5,a6)  where&lt;br /&gt;&amp;gt;    range ((l1,l2,l3,l4,l5,l6),(u1,u2,u3,u4,u5,u6)) =&lt;br /&gt;&amp;gt;      [(i1,i2,i3,i4,i5,i6) | i1 &amp;lt;- range (l1,u1),&lt;br /&gt;&amp;gt;                             i2 &amp;lt;- range (l2,u2),&lt;br /&gt;&amp;gt;                             i3 &amp;lt;- range (l3,u3),&lt;br /&gt;&amp;gt;                             i4 &amp;lt;- range (l4,u4),&lt;br /&gt;&amp;gt;                             i5 &amp;lt;- range (l5,u5),&lt;br /&gt;&amp;gt;                             i6 &amp;lt;- range (l6,u6)]&lt;br /&gt;&lt;br /&gt;&amp;gt;    unsafeIndex ((l1,l2,l3,l4,l5,l6),(u1,u2,u3,u4,u5,u6)) (i1,i2,i3,i4,i5,i6) =&lt;br /&gt;&amp;gt;      unsafeIndex (l6,u6) i6 + unsafeRangeSize (l6,u6) * (&lt;br /&gt;&amp;gt;      unsafeIndex (l5,u5) i5 + unsafeRangeSize (l5,u5) * (&lt;br /&gt;&amp;gt;      unsafeIndex (l4,u4) i4 + unsafeRangeSize (l4,u4) * (&lt;br /&gt;&amp;gt;      unsafeIndex (l3,u3) i3 + unsafeRangeSize (l3,u3) * (&lt;br /&gt;&amp;gt;      unsafeIndex (l2,u2) i2 + unsafeRangeSize (l2,u2) * (&lt;br /&gt;&amp;gt;      unsafeIndex (l1,u1) i1)))))&lt;br /&gt;&lt;br /&gt;&amp;gt;    inRange ((l1,l2,l3,l4,l5,l6),(u1,u2,u3,u4,u5,u6)) (i1,i2,i3,i4,i5,i6) =&lt;br /&gt;&amp;gt;      inRange (l1,u1) i1 &amp;amp;&amp;amp; inRange (l2,u2) i2 &amp;amp;&amp;amp;&lt;br /&gt;&amp;gt;      inRange (l3,u3) i3 &amp;amp;&amp;amp; inRange (l4,u4) i4 &amp;amp;&amp;amp;&lt;br /&gt;&amp;gt;      inRange (l5,u5) i5 &amp;amp;&amp;amp; inRange (l6,u6) i6&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And the same vector space monad I've used many times before. 
Strictly speaking, it's more like a semiring module monad as &lt;code&gt;Int32&lt;/code&gt; isn't a field.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; swap (x,y) = (y,x)&lt;br /&gt;&lt;br /&gt;&amp;gt; class Num k =&amp;gt; VectorSpace k v | v -&amp;gt; k where&lt;br /&gt;&amp;gt;    zero :: v&lt;br /&gt;&amp;gt;    (.+) :: v -&amp;gt; v -&amp;gt; v&lt;br /&gt;&amp;gt;    (.*) :: k -&amp;gt; v -&amp;gt; v&lt;br /&gt;&amp;gt;    (.-) :: v -&amp;gt; v -&amp;gt; v&lt;br /&gt;&amp;gt;    v1 .- v2 = v1 .+ ((-1).*v2)&lt;br /&gt;&lt;br /&gt;&amp;gt; data V k a = V { unV :: [(k,a)] } deriving (Show)&lt;br /&gt;&lt;br /&gt;&amp;gt; reduce x = filter ((/=0) . fst) $ fmap swap $ M.toList $ M.fromListWith (+) $ fmap swap $ x&lt;br /&gt;&amp;gt; simp (V x) = V (reduce x)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance (Ord a,Num k) =&amp;gt; Eq (V k a) where&lt;br /&gt;&amp;gt;  V x==V y = reduce x==reduce y&lt;br /&gt;&lt;br /&gt;&amp;gt; instance (Ord a,Num k,Ord k) =&amp;gt; Ord (V k a) where&lt;br /&gt;&amp;gt;  compare (V x) (V y) = compare (reduce x) (reduce y)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Num k =&amp;gt; Functor (V k) where&lt;br /&gt;&amp;gt;    fmap f (V as) = V $ map (\(k,a) -&amp;gt; (k,f a)) as&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Num k =&amp;gt; Monad (V k) where&lt;br /&gt;&amp;gt;    return a = V [(1,a)]&lt;br /&gt;&amp;gt;    x &gt;&gt;= f = join (fmap f x)&lt;br /&gt;&amp;gt;        where join x = V $ concat $ fmap (uncurry scale) $ unV $ fmap unV x&lt;br /&gt;&amp;gt;              scale k1 as = map (\(k2,a) -&amp;gt; (k1*k2,a)) as&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Num r =&amp;gt; MonadPlus (V r) where&lt;br /&gt;&amp;gt;    mzero = V []&lt;br /&gt;&amp;gt;    mplus (V x) (V y) = V (x++y)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance (Num k,Ord a) =&amp;gt; VectorSpace k (V k a) where&lt;br /&gt;&amp;gt;    zero = V []&lt;br /&gt;&amp;gt;    V x .+ V y = V (x ++ y)&lt;br /&gt;&amp;gt;    (.*) k = (&gt;&gt;= (\a -&amp;gt; V [(k,a)]))&lt;br 
/&gt;&lt;br /&gt;&amp;gt; e = return :: Num k =&amp;gt; a -&amp;gt; V k a&lt;br /&gt;&amp;gt; coefficient b (V bs) = maybe 0 id (L.lookup b (map swap (reduce bs)))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-2073443236440619074?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/2073443236440619074/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=2073443236440619074' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/2073443236440619074'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/2073443236440619074'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/06/hashing-molecules.html' title='Hashing Molecules'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_UdKHLrHa05M/Siv_DTt45II/AAAAAAAAAWs/lqLNr4wTQYI/s72-c/c.png' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-7234571425486412335</id><published>2009-05-16T13:34:00.000-07:00</published><updated>2009-05-23T07:24:31.882-07:00</updated><title type='text'>Trace Diagrams with Monads</title><content type='html'>&lt;a href="http://blog.sigfpe.com/2008/08/untangling-with-continued-fractions.html"&gt;Knot diagrams&lt;/a&gt; aren't the only kind of diagram that can be translated nicely 
into Haskell monad notation. Other types of diagram include Penrose &lt;a href="http://en.wikipedia.org/wiki/Spin_network"&gt;Spin Networks&lt;/a&gt;, many kinds of &lt;a href="http://en.wikipedia.org/wiki/Feynman_diagram"&gt;Feynman Diagram&lt;/a&gt;, Penrose &lt;a href="http://en.wikipedia.org/wiki/Penrose%27s_graphical_notation"&gt;Tensor Notation&lt;/a&gt;, &lt;a href="http://www.nbi.dk/GroupTheory/"&gt;birdtracks&lt;/a&gt; and a type of closely related diagram I hadn't met before: &lt;a href="http://en.wikipedia.org/wiki/Trace_diagram"&gt;Trace Diagrams&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I encourage readers to check out the Wikipedia page and associated papers on trace diagrams as they give a better tutorial than I could write. My aim here is to show how those diagrams can be translated directly into working code just like with knots.&lt;br /&gt;&lt;br /&gt;As usual, this is literate Haskell so I need these lines:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; {-# LANGUAGE MultiParamTypeClasses,FunctionalDependencies,FlexibleInstances #-}&lt;br /&gt;&lt;br /&gt;&gt; module Main where&lt;br /&gt;&lt;br /&gt;&gt; import qualified Data.Map as M&lt;br /&gt;&gt; import Control.Monad&lt;br /&gt;&gt; infixl 5 .+&lt;br /&gt;&gt; infixl 6 .*&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;I'll reuse my vector space monad code from before and work in a 3D space with the axes labelled X, Y and Z.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; data Space = X | Y | Z deriving (Eq,Show,Ord)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;We draw vectors as little boxes with connections emerging from them:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sg8oG2BtxCI/AAAAAAAAAUU/DqvIkkZPRic/s1600-h/pica.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 45px; height: 61px;" 
src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sg8oG2BtxCI/AAAAAAAAAUU/DqvIkkZPRic/s400/pica.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336528181504361506" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Now recall from my knot posts that we represent a diagram with m legs at the top and n legs at the bottom as an expression that takes an m-tuple as input and returns an n-tuple "in the monad" as output.&lt;br /&gt;&lt;br /&gt;Vectors can be represented as elements of &lt;code&gt;V Float Space&lt;/code&gt;, for example:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; u,v,w :: V Float Space&lt;br /&gt;&gt; u = return X .- return Y&lt;br /&gt;&gt; v = return X .+ 2.* return Y&lt;br /&gt;&gt; w = return Y .- return Z&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I could have emphasised that there are zero inputs at the top by using type signature &lt;code&gt;() -&gt; V Float Space&lt;/code&gt; instead.&lt;br /&gt;&lt;br /&gt;Given two vectors we can form their dot product. The dot product itself is represented by a little u-shaped curve:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/Sg8pi3wQ6cI/AAAAAAAAAUc/SzAWswQZc5g/s1600-h/picb.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 104px; height: 70px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/Sg8pi3wQ6cI/AAAAAAAAAUc/SzAWswQZc5g/s400/picb.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336529762516003266" /&gt;&lt;/a&gt;&lt;br /&gt;So the dot product of v and w is drawn as:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/ShBQR7-7p1I/AAAAAAAAAWU/Q5P9YW0HQC4/s1600-h/picc.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 150px; height: 105px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/ShBQR7-7p1I/AAAAAAAAAWU/Q5P9YW0HQC4/s400/picc.png" border="0" 
alt=""id="BLOGGER_PHOTO_ID_5336853827523684178" /&gt;&lt;/a&gt;&lt;br /&gt;(The i and j are just so you can see what corresponds to what in the code below. They're not really part of the diagram.)&lt;br /&gt;&lt;br /&gt;We can implement the dot product as&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; cup :: (Space,Space) -&gt; V Float ()&lt;br /&gt;&gt; cup (i,j) = case (i,j) of&lt;br /&gt;&gt;    (X,X) -&gt; return ()&lt;br /&gt;&gt;    (Y,Y) -&gt; return ()&lt;br /&gt;&gt;    (Z,Z) -&gt; return ()&lt;br /&gt;&gt;    _ -&gt; 0 .* return ()&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;and compute an example using&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; vdotw = do&lt;br /&gt;&gt;     i &amp;lt;- v&lt;br /&gt;&gt;     j &amp;lt;- w&lt;br /&gt;&gt;     cup (i,j)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We hook up legs of the diagram using corresponding inputs and outputs in the code.&lt;br /&gt;&lt;br /&gt;Now consider this diagram:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sg8rokViFLI/AAAAAAAAAUs/s5KiRxQ4N08/s1600-h/picd.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 125px; height: 104px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sg8rokViFLI/AAAAAAAAAUs/s5KiRxQ4N08/s400/picd.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336532059406079154" /&gt;&lt;/a&gt;&lt;br /&gt;If we attach another vector to the free leg then we get the dot product. So this object is a thing that maps vectors to scalars, i.e. it's a dual vector. So dual vectors are represented by diagrams with a free leg at the top. 
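&lt;br /&gt;&lt;br /&gt;The claim that a diagram with one free leg at the top maps vectors to scalars can be checked with a small standalone sketch that does not use the vector space monad. Everything in it is an assumption made for illustration: &lt;code&gt;Axis&lt;/code&gt; mirrors &lt;code&gt;Space&lt;/code&gt;, a vector is a plain list of coefficient and label pairs, and &lt;code&gt;dualOf&lt;/code&gt; and &lt;code&gt;apply&lt;/code&gt; are hypothetical names:&lt;br /&gt;&lt;pre&gt;

```haskell
-- Basis labels for a 3D space, mirroring the Space type in the post.
data Axis = X | Y | Z deriving (Eq, Show)

-- A vector as a list of (coefficient, basis label) pairs.
type Vec = [(Float, Axis)]

-- A dual vector assigns a scalar to each basis label.
type Dual = Axis -> Float

-- Attach a vector to one leg of the dot product: feeding a basis
-- label into the remaining leg sums the coefficients on that label.
dualOf :: Vec -> Dual
dualOf v i = sum (map fst (filter ((== i) . snd) v))

-- Apply a dual vector to a vector by extending linearly.
apply :: Dual -> Vec -> Float
apply d w = sum (map (\(k, i) -> k * d i) w)

-- The same example vectors as in the post: v = X + 2Y, w = Y - Z.
v, w :: Vec
v = [(1, X), (2, Y)]
w = [(1, Y), (-1, Z)]
```

&lt;/pre&gt;&lt;br /&gt;With these definitions &lt;code&gt;apply (dualOf v) w&lt;/code&gt; evaluates to 2, the dot product of v and w, which is the number the diagram with v attached produces once w is plugged into its free leg.&lt;br /&gt;&lt;br /&gt;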
We can redraw this diagram:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/ShAgiacskiI/AAAAAAAAAVE/kOAtUHdCS10/s1600-h/pice.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 45px; height: 61px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/ShAgiacskiI/AAAAAAAAAVE/kOAtUHdCS10/s400/pice.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336801334021362210" /&gt;&lt;/a&gt;In other words, turning a vector v upside down turns it into a dual vector that takes w to the dot product of v and w. Here's some code for making the dual of v.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; dual :: V Float Space -&gt; Space -&gt; V Float ()&lt;br /&gt;&gt; dual v i = do&lt;br /&gt;&gt;     j &amp;lt;- v&lt;br /&gt;&gt;     cup (i,j)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can also consider cross products. These take two vectors as input and output one. So we're looking at a diagram with two legs at the top and one at the bottom. 
We'll use a bold dot to represent one of these:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/ShBGGT5huYI/AAAAAAAAAWM/qwKyiwHotLk/s1600-h/picf.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 206px; height: 181px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/ShBGGT5huYI/AAAAAAAAAWM/qwKyiwHotLk/s400/picf.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336842632668756354" /&gt;&lt;/a&gt;&lt;br /&gt;Here's the implementation:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; cross :: (Space,Space) -&gt; V Float Space&lt;br /&gt;&gt; cross (X,Y) = return Z&lt;br /&gt;&gt; cross (Y,Z) = return X&lt;br /&gt;&gt; cross (Z,X) = return Y&lt;br /&gt;&lt;br /&gt;&gt; cross (Y,X) = (-1) .* return Z&lt;br /&gt;&gt; cross (Z,Y) = (-1) .* return X&lt;br /&gt;&gt; cross (X,Z) = (-1) .* return Y&lt;br /&gt;&lt;br /&gt;&gt; cross _ = mzero&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can form a triple product u.(v&amp;times;w) like this:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/ShAjCOPX65I/AAAAAAAAAVM/2r7iKyOIUaY/s1600-h/picg.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 253px; height: 196px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/ShAjCOPX65I/AAAAAAAAAVM/2r7iKyOIUaY/s400/picg.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336804079523326866" /&gt;&lt;/a&gt;&lt;br /&gt;We can then abstract out the triple product bit that looks like this:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/ShAkWQCQpmI/AAAAAAAAAVU/477hs9QtpXA/s1600-h/pich.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 206px; height: 102px;" 
src="http://4.bp.blogspot.com/_UdKHLrHa05M/ShAkWQCQpmI/AAAAAAAAAVU/477hs9QtpXA/s400/pich.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336805523114206818" /&gt;&lt;/a&gt;&lt;br /&gt;Implementing it as:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; trident :: (Space,Space,Space) -&gt; V Float ()&lt;br /&gt;&gt; trident (i,j,k) = do&lt;br /&gt;&gt;    l &amp;lt;- cross (i,j)&lt;br /&gt;&gt;    cup (l,k)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Remember that if u, v and w give the rows of a 3x3 matrix, then u.(v&amp;times;w) is the determinant of that matrix.&lt;br /&gt;&lt;br /&gt;We can also define a dot product for dual vectors that we can draw like this:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/ShAlddOjZCI/AAAAAAAAAVc/52iZrXTcud4/s1600-h/pici.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 104px; height: 70px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/ShAlddOjZCI/AAAAAAAAAVc/52iZrXTcud4/s400/pici.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336806746426139682" /&gt;&lt;/a&gt;&lt;br /&gt;The code looks like this:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; cap :: () -&gt; V Float (Space,Space)&lt;br /&gt;&gt; cap () = return (X,X) .+ return (Y,Y) .+ return (Z,Z)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can now combine the two dot products in a diagram like this:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/ShBWthiml-I/AAAAAAAAAWk/asdvRbzwshk/s1600-h/picj.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 246px; height: 157px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/ShBWthiml-I/AAAAAAAAAWk/asdvRbzwshk/s400/picj.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336860898531645410" /&gt;&lt;/a&gt;&lt;br /&gt;We can write that as:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; cupcap i = do&lt;br /&gt;&gt;  
    (j,k) &amp;lt;- cap ()&lt;br /&gt;&gt;      cup (i,j)&lt;br /&gt;&gt;      return k&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We'd hope that we could pull this diagram taut and get the identity linear map. If you try applying &lt;code&gt;cupcap&lt;/code&gt; to X, Y and Z you'll see it has exactly the same effect as &lt;code&gt;return&lt;/code&gt;, which does indeed represent the identity.&lt;br /&gt;&lt;br /&gt;(If you allow me to digress, I'll point out that there's something really deep going on with this almost trivial looking identity. It represents the identity map in the sense that it copies the input i to the output k. Imagine we were dealing with the &lt;a href="http://blog.sigfpe.com/2007/04/trivial-monad.html"&gt;trivial monad&lt;/a&gt;, ie. the one that just wraps values. Then no matter how &lt;code&gt;cup&lt;/code&gt; and &lt;code&gt;cap&lt;/code&gt; were implemented it would be impossible for k to be a copy of i. If you follow the flow of information through that code then i disappears into &lt;code&gt;cup&lt;/code&gt; and k is read from &lt;code&gt;cap&lt;/code&gt; without it seeing i. If we read from top to bottom we can think of cap as emitting a pair of objects and of cup as absorbing two. There is no way that any information about i can be communicated to k. But in the vector space monad, k &lt;em&gt;can&lt;/em&gt; depend on i. As I've mentioned a few times over the years, the universe is described by quantum mechanics which can be described using the &lt;a href="http://blog.sigfpe.com/2007/02/monads-for-vector-spaces-probability.html"&gt;vector space monad&lt;/a&gt;. Amazingly the above piece of code, or at least something like it, can be physically realised in terms of particles. It describes a process that is fundamentally quantum, and not classical. In fact, Coecke shows that this is a precursor to quantum teleportation in section 3c of &lt;a href="http://arxiv.org/pdf/quant-ph/0510032v1"&gt;this paper&lt;/a&gt;. 
You could also think in terms of information about i being sent back in time through the cap. That's the idea behind this paper on &lt;a href="http://arxiv.org/pdf/0902.4898"&gt;Effective Quantum Time Travel&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;Now we can make a fork by bending down the tines of the cross product:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/ShAqHfpD0dI/AAAAAAAAAVs/Ggt5XyhcOz8/s1600-h/pickk.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 206px; height: 102px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/ShAqHfpD0dI/AAAAAAAAAVs/Ggt5XyhcOz8/s400/pickk.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336811866675204562" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; fork () = do&lt;br /&gt;&gt;    (i,j) &amp;lt;- cap ()&lt;br /&gt;&gt;    (k,l) &amp;lt;- cap ()&lt;br /&gt;&gt;    m &amp;lt;- cross (j,k)&lt;br /&gt;&gt;    return (i,l,m)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;We can write matrices as boxes with a leg for input and a leg for output. 
Here's an example matrix:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/ShArlywlSMI/AAAAAAAAAV0/IsvSKCKGw9M/s1600-h/picl.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 53px; height: 79px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/ShArlywlSMI/AAAAAAAAAV0/IsvSKCKGw9M/s400/picl.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336813486714734786" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; a :: Space -&gt; V Float Space&lt;br /&gt;&gt; a X = 2 .* return X&lt;br /&gt;&gt; a Y = return Z&lt;br /&gt;&gt; a Z = (-1) .* return Y&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It rotates by 90 degrees around the X axis and scales the X axis by a factor of two.&lt;br /&gt;&lt;br /&gt;With the help of our two dot products we can turn a matrix upside-down:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/ShAtUAI2MaI/AAAAAAAAAV8/JNRV8VMC7zQ/s1600-h/picm.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 154px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/ShAtUAI2MaI/AAAAAAAAAV8/JNRV8VMC7zQ/s400/picm.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336815380091777442" /&gt;&lt;/a&gt;&lt;br /&gt;The corresponding code is:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; b :: Space -&gt; V Float Space&lt;br /&gt;&gt; b l = do&lt;br /&gt;&gt;     (i,j) &amp;lt;- cap ()&lt;br /&gt;&gt;     k &amp;lt;- a j&lt;br /&gt;&gt;     cup (k,l)&lt;br /&gt;&gt;     return i&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Turning a matrix upside down gives its transpose. You'll find that matrix B rotates in the opposite direction to A but still scales by a factor of two.&lt;br /&gt;&lt;br /&gt;Surprisingly, 3! 
times the determinant of a 3x3 matrix A can be represented by this diagram:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/ShBRffcw89I/AAAAAAAAAWc/reNZfgEVFx4/s1600-h/picn.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 257px; height: 227px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/ShBRffcw89I/AAAAAAAAAWc/reNZfgEVFx4/s400/picn.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336855159893980114" /&gt;&lt;/a&gt;&lt;br /&gt;So we can write a determinant routine as follows:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; det a = do&lt;br /&gt;&gt;     (i,j,k) &amp;lt;- fork ()&lt;br /&gt;&gt;     i' &amp;lt;- a i&lt;br /&gt;&gt;     j' &amp;lt;- a j&lt;br /&gt;&gt;     k' &amp;lt;- a k&lt;br /&gt;&gt;     (1/6.0) .* trident (i',j',k')&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(Again I've labelled the diagram so you can easily see what corresponds where in the code.)&lt;br /&gt;&lt;br /&gt;I could keep going, but at this point I'll just defer to Elisha Peterson's &lt;a href="http://arxiv.org/pdf/0712.2058"&gt;paper&lt;/a&gt;. I hope that I've given you enough clues to be able to translate his diagrams into Haskell code, in effect giving semantics for his syntax. As an exercise, try writing code to compute the adjugate of a 3x3 matrix.&lt;br /&gt;&lt;br /&gt;And a reminder: none of the above is intended as production-worthy code for working with 3-vectors. 
It is intended purely as a way to give a practical realisation of trace diagrams that allows people to experimentally investigate their properties and make testable conjectures.&lt;br /&gt;&lt;br /&gt;And now comes the library code needed to make the above code work:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt; swap (x,y) = (y,x)&lt;br /&gt;&lt;br /&gt;&gt; class Num k =&gt; VectorSpace k v | v -&gt; k where&lt;br /&gt;&gt;    zero :: v&lt;br /&gt;&gt;    (.+) :: v -&gt; v -&gt; v&lt;br /&gt;&gt;    (.*) :: k -&gt; v -&gt; v&lt;br /&gt;&gt;    (.-) :: v -&gt; v -&gt; v&lt;br /&gt;&gt;    v1 .- v2 = v1 .+ ((-1).*v2)&lt;br /&gt;&lt;br /&gt;&gt; data V k a = V { unV :: [(k,a)] }&lt;br /&gt;&gt; instance (Num k,Ord a,Show a) =&gt; Show (V k a) where&lt;br /&gt;&gt;   show (V x) = show (reduce x)&lt;br /&gt;&lt;br /&gt;&gt; reduce x = filter ((/=0) . fst) $ fmap swap $ M.toList $ M.fromListWith (+) $ fmap swap $ x&lt;br /&gt;&lt;br /&gt;&gt; instance (Ord a,Num k) =&gt; Eq (V k a) where&lt;br /&gt;&gt;  V x==V y = reduce x==reduce y&lt;br /&gt;&lt;br /&gt;&gt; instance (Ord a,Num k,Ord k) =&gt; Ord (V k a) where&lt;br /&gt;&gt;  compare (V x) (V y) = compare (reduce x) (reduce y)&lt;br /&gt;&lt;br /&gt;&gt; instance Num k =&gt; Functor (V k) where&lt;br /&gt;&gt;    fmap f (V as) = V $ map (\(k,a) -&gt; (k,f a)) as&lt;br /&gt;&lt;br /&gt;&gt; instance Num k =&gt; Monad (V k) where&lt;br /&gt;&gt;    return a = V [(1,a)]&lt;br /&gt;&gt;    x &gt;&gt;= f = join (fmap f x)&lt;br /&gt;&gt;        where join x = V $ concat $ fmap (uncurry scale) $ unV $ fmap unV x&lt;br /&gt;&gt;              scale k1 as = map (\(k2,a) -&gt; (k1*k2,a)) as&lt;br /&gt;&lt;br /&gt;&gt; instance Num r =&gt; MonadPlus (V r) where&lt;br /&gt;&gt;    mzero = V []&lt;br /&gt;&gt;    mplus (V x) (V y) = V (x++y)&lt;br /&gt;&lt;br /&gt;&gt; instance (Num k,Ord a) =&gt; VectorSpace k (V k a) where&lt;br /&gt;&gt;    zero = V []&lt;br /&gt;&gt;    V x .+ V y = V (x ++ y)&lt;br /&gt;&gt;    (.*) k = 
(&gt;&gt;= (\a -&gt; V [(k,a)]))&lt;br /&gt;&lt;br /&gt;&gt; e = return :: Num k =&gt; a -&gt; V k a&lt;br /&gt;&gt; coefficient b (V bs) = maybe 0 id (lookup b (map swap (reduce bs)))&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-7234571425486412335?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/7234571425486412335/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=7234571425486412335' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7234571425486412335'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7234571425486412335'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/05/trace-diagrams-with-monads.html' title='Trace Diagrams with Monads'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_UdKHLrHa05M/Sg8oG2BtxCI/AAAAAAAAAUU/DqvIkkZPRic/s72-c/pica.png' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-3930070047200672636</id><published>2009-05-03T08:30:00.000-07:00</published><updated>2009-11-14T11:48:22.829-08:00</updated><title type='text'>The Three Projections of Doctor Futamura</title><content type='html'>&lt;H3&gt;Introduction&lt;/H3&gt;&lt;br /&gt;The &lt;a 
href="http://en.wikipedia.org/wiki/Partial_evaluation#Futamura_projections"&gt;Three Projections of Futamura&lt;/a&gt; are a sequence of applications of a programming technique called 'partial evaluation' or 'specialisation', each one more mind-bending than the previous one. But it shouldn't be programmers who have all the fun. So I'm going to try to explain the three projections in a way that non-programmers can maybe understand too. But whether you're a programmer or not, this kind of self-referential reasoning can hurt your brain. At least it hurts mine. But it's a good pain, right?&lt;br /&gt;&lt;br /&gt;So rather than talk about computer programs, I'll talk about machines of the mechanical variety. A bit like computer programs, these machines will have some kind of slot for inputting stuff, and some kind of slot where output will come out. But unlike computer programs, I'll be able to draw pictures of them to show what I'm talking about. I'll also assume these machines have access to an infinite supply of raw materials for manufacturing purposes and I'll also assume that these machines can replicate stuff - because in a computer we can freely make copies of data, until we run out of memory at least.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Minting coins&lt;/H3&gt;&lt;br /&gt;A really simple example of a machine is one that has a slot for inputting blanks, and outputs newly minted coins:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/Sf3BzV77HfI/AAAAAAAAATc/flT5FsYW8_M/s1600-h/dollar_minting.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 340px; height: 177px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/Sf3BzV77HfI/AAAAAAAAATc/flT5FsYW8_M/s400/dollar_minting.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331630621682507250" /&gt;&lt;/a&gt;&lt;br /&gt;That's a dedicated $1 manufacturing machine. 
We could imagine that internally it stamps the appropriate design onto the blank and spits out the result.&lt;br /&gt;&lt;br /&gt;It'd be more interesting if we could make a machine with another input slot that allowed us to input the description of the coin. By providing different inputs we could mint a variety of different coins with one machine. I'm going to adopt the convention that when we want to input a description we input a picture of the result we want. I'll draw pictures as rectangles with the subject inside them. Here's a general purpose machine manufacturing pound coins:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/Sf3EiSy7iNI/AAAAAAAAATk/GX6t98LFgPY/s1600-h/general_minting.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 364px; height: 261px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/Sf3EiSy7iNI/AAAAAAAAATk/GX6t98LFgPY/s400/general_minting.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331633627316586706" /&gt;&lt;/a&gt;&lt;br /&gt;The same machine could make dollars, zlotys or yen. You could imagine this machine works by taking the description and then milling the coin &lt;a href="http://en.wikipedia.org/wiki/CNC"&gt;CNC style&lt;/a&gt;. We call such a machine an &lt;a href="http://en.wikipedia.org/wiki/Interpreter_(computing)"&gt;interpreter&lt;/a&gt;. It interprets the instructions and produces its result.&lt;br /&gt;&lt;br /&gt;The interpreter has a great advantage over the dedicated dollar mint. You can make any kind of coin. But it's going to run a lot slower. The dedicated minter can just stamp a coin in one go. The interpreter can't do this because every input might be different. It has to custom mill each coin individually. Is there a way to get the benefits of both types of machine? 
We could do this: take the coin description and instead of milling the coin directly we mill negative reliefs for both sides of the coin. We then build a new dedicated minting machine that uses these negatives to stamp out the coin. In other words we could make a machine that takes as input a coin description and outputs a dedicated machine to make that type of coin. This kind of machine-making machine is called a &lt;a href="http://en.wikipedia.org/wiki/Compiler"&gt;compiler&lt;/a&gt;. It takes a set of instructions, but instead of executing them one by one, it makes a dedicated machine to perform them. Here's one in action:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3IsNROlyI/AAAAAAAAATs/Jn1cn9LrF1w/s1600-h/compiling.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 327px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3IsNROlyI/AAAAAAAAATs/Jn1cn9LrF1w/s400/compiling.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331638195678254882" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;So here are the two important concepts so far:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Interpreters&lt;/em&gt;: these take descriptions or instructions and use them to make the thing described.&lt;br /&gt;&lt;em&gt;Compilers&lt;/em&gt;: these take descriptions or instructions and use them to make a machine dedicated to making the thing described. The process of making such a machine from a set of instructions is known as compiling.&lt;br /&gt;&lt;br /&gt;The Projections of Doctor Futamura help make clear the relationship between these kinds of things.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Specialisation&lt;/H3&gt;&lt;br /&gt;We need one more important concept: the specialiser. Suppose we have a machine that has two input slots, A and B. 
But now suppose that when we use the machine we find that we vary the kind of thing we put into slot B, but always end up putting the same thing into slot A. If we know that slot A will always get the same input then we could streamline the machine using our knowledge of the properties of A. This is similar to the minting situation - if we know we're always going to make $1 coins then we can dedicate our machine to that purpose. In fact, if we know that we're always going to input the same thing into slot A we don't even need slot A any more. We could just stick an A inside the machine and whenever the user inputs something to slot B, the machine would then replicate the A and then use it just as if it had been input.&lt;br /&gt;&lt;br /&gt;In summary, given a machine with two slots A and B, and given some input suitable for slot A, we could redesign it as a machine with just a B slot that automatically, internally self-feeds the chosen item to A. But we can often do better than this. We don't need to self-feed stuff to slot A. We might be able to redesign the way the machine works based on the assumption that we always get the same stuff going into slot A. For example, in the minting example a dedicated $1 minter was more specialised than just a general purpose minter that interpreted the instructions for making a $1 coin. This process of customising a machine for a particular input to slot A is called specialisation or &lt;a href="http://en.wikipedia.org/wiki/Partial_evaluation"&gt;partial evaluation&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Now imagine we have a machine for automatically specialising designs for machines. It might have two slots: one for inputting a description for a two slot machine with slots A and B, and one for inputting stuff suitable for slot A. It would then print out a description for a customised machine with just a slot B. We could call it a specialisation machine. 
Here is one at work:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3QNaRFfOI/AAAAAAAAAT0/ghjfjtM42q8/s1600-h/specialisation.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 332px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3QNaRFfOI/AAAAAAAAAT0/ghjfjtM42q8/s400/specialisation.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331646462684396770" /&gt;&lt;/a&gt;&lt;br /&gt;It's converting a description of a two input machine into a description of a one input machine.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;The First Projection&lt;/H3&gt;&lt;br /&gt;The process of specialisation is similar to what I was talking about with dedicated minting machines. Rather than just have a similarity we can completely formalise this. Note that the interpreter above takes two inputs. So the design for an interpreter could be fed into the first input of a specialiser. Now we feed a description of the coin we want into slot B. The specialiser whirrs away and eventually outputs a description of a machine that is an interpreter that is dedicated to making that one particular coin. The result will describe a machine with only one input suitable for blanks. In other words, we can use a specialiser as a compiler. This is the first of Doctor Futamura's Projections. 
Here's a picture of the process at work:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/Sf3UEyLB5zI/AAAAAAAAAT8/rEQ6HMe0xDo/s1600-h/projection1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 332px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/Sf3UEyLB5zI/AAAAAAAAAT8/rEQ6HMe0xDo/s400/projection1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331650712529135410" /&gt;&lt;/a&gt;&lt;br /&gt;What this shows is that you don't need to make compilers. You can make specialisers instead. This is actually a very practical thing to do in the computing world. For example there are &lt;a href="http://www.rapidmind.net/"&gt;commercial products&lt;/a&gt; (I'm not connected with that product in any way) that can specialise code to run on a specific architecture like &lt;a href="http://en.wikipedia.org/wiki/CUDA"&gt;CUDA&lt;/a&gt;. It's entirely practical to convert an interpreter to a compiler with such a tool. By writing a specialiser, the purveyors of such tools allow third parties to develop their own compilers and so this is more useful than just writing a dedicated compiler.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;The Second Projection&lt;/h3&gt;&lt;br /&gt;Time to kick it up a notch. The first input to the specialiser is a description of a two input machine. But the specialiser is itself a two input machine. Are you thinking what I'm thinking yet? We could stuff a description of a specialiser into the specialiser's own first input! In the first projection we provided an interpreter as input to the specialiser. If we know we're always going to want to use the same interpreter then we could streamline the specialiser to work specifically with this input. 
So we can specialise the specialiser to work with our interpreter like this:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3e7h4AtjI/AAAAAAAAAUE/inyy38iprQs/s1600-h/projection2a.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 352px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3e7h4AtjI/AAAAAAAAAUE/inyy38iprQs/s400/projection2a.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331662648163481138" /&gt;&lt;/a&gt;&lt;br /&gt;But what is that machine whose description it has output? An interpreter takes as input a description of how to operate on some stuff, like turning blanks into coins. In effect, the output machine has the interpreter built into it. So it takes descriptions and outputs a machine for performing those instructions. In other words it's a compiler. If the specialiser is any good then the compiler will be good too. It won't just hide an interpreter in a box and feed it your description, it will make dedicated parts to ensure your compiler produces a fast dedicated machine. And that is Doctor Futamura's Second Projection.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;The Third Projection&lt;/H3&gt;&lt;br /&gt;But we can go further. The specialiser can accept a description of a specialiser as its first input. That means we can specialise it specifically for this input. And to do that, we use a specialiser. In other words we can feed a description of a specialiser into &lt;em&gt;both&lt;/em&gt; inputs of the specialiser! 
Here we go:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3iAXhRJxI/AAAAAAAAAUM/uv4uRQXeWAk/s1600-h/projection3.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 358px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3iAXhRJxI/AAAAAAAAAUM/uv4uRQXeWAk/s400/projection3.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331666029817964306" /&gt;&lt;/a&gt;&lt;br /&gt;But what is the X machine that it outputs? In the second projection we pass in an interpreter as the second argument and get back a compiler. So the third projection gives us a dedicated machine for this task. The X machine accepts the description of an interpreter as input and outputs the description of a compiler. So the X machine is a dedicated interpreter-to-compiler converter. And that is the Third Projection of Doctor Futamura.&lt;br /&gt;&lt;br /&gt;If we have a specialiser we never need to make a compiler again. We need only design interpreters that we can automatically convert to compilers. In general it's easier to write interpreters than compilers and so in principle this makes life easier for programmers. It also allows us to compartmentalise the building of compilers. We can separate the interpreter bit from the bit that fashions specific parts for a task. The specialiser does the latter so our would-be compiler writer can concentrate on the former. 
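&lt;br /&gt;&lt;br /&gt;For programmers, the three projections can be sketched in a few lines of Haskell. This is only a toy under some loud assumptions: ordinary functions stand in for both machines and their descriptions, &lt;code&gt;mix&lt;/code&gt; is just the name I'm giving the specialiser here, and it is the naive self-feeding kind that sticks an A inside the machine, so it does none of the streamlining a good specialiser would. The coin types and &lt;code&gt;interpret&lt;/code&gt; are made up for illustration. 
&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;-- Assumption: a naive specialiser that only remembers its first input.&lt;br /&gt;mix :: (a -&gt; b -&gt; c) -&gt; a -&gt; (b -&gt; c)&lt;br /&gt;mix machine a = \b -&gt; machine a b&lt;br /&gt;&lt;br /&gt;-- Hypothetical toy types for the minting example.&lt;br /&gt;type Description = String&lt;br /&gt;type Blank       = String&lt;br /&gt;type Coin        = String&lt;br /&gt;&lt;br /&gt;-- Hypothetical toy interpreter: a description plus a blank gives a coin.&lt;br /&gt;interpret :: Description -&gt; Blank -&gt; Coin&lt;br /&gt;interpret design blank = design ++ " stamped on " ++ blank&lt;br /&gt;&lt;br /&gt;-- First projection: specialise the interpreter to one description.&lt;br /&gt;dollarMint :: Blank -&gt; Coin&lt;br /&gt;dollarMint = mix interpret "$1"&lt;br /&gt;&lt;br /&gt;-- Second projection: specialise the specialiser to the interpreter.&lt;br /&gt;compile :: Description -&gt; (Blank -&gt; Coin)&lt;br /&gt;compile = mix mix interpret&lt;br /&gt;&lt;br /&gt;-- Third projection: specialise the specialiser to itself.&lt;br /&gt;toCompiler :: (a -&gt; b -&gt; c) -&gt; (a -&gt; b -&gt; c)&lt;br /&gt;toCompiler = mix mix mix&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The types line up with the pictures: &lt;code&gt;dollarMint&lt;/code&gt; only has a slot for blanks, &lt;code&gt;compile&lt;/code&gt; turns a description into a dedicated minter, and &lt;code&gt;toCompiler interpret&lt;/code&gt; has the same type as &lt;code&gt;compile&lt;/code&gt;. With a specialiser this naive nothing gets any faster, which is exactly why the quality of the specialiser matters.&lt;br /&gt;&lt;br /&gt;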
But who would have guessed that passing a specialiser to itself &lt;em&gt;twice&lt;/em&gt; would give us something so useful?&lt;br /&gt;&lt;H3&gt;Summary&lt;/H3&gt;&lt;br /&gt;So here are the projections:&lt;br /&gt;&lt;OL&gt;&lt;br /&gt;&lt;LI&gt;Compiling specific programs to dedicated machines.&lt;br /&gt;&lt;LI&gt;Making a dedicated machine for compilation from an interpreter.&lt;br /&gt;&lt;LI&gt;Making a machine dedicated to the task of converting interpreters into compilers.&lt;br /&gt;&lt;/OL&gt;&lt;br /&gt;There are lots of variations we can play with. I've just talked about descriptions of things without saying much about what those descriptions look like. In practice there are lots of different 'languages' we can use to express our descriptions. So variations on these projections can generate descriptions in different languages, possibly converting between them. We might also have lots of different specialisers that are themselves optimised for specific types of specialisation. The Futamura projections give interesting ways to combine these. And there are also variations for generating dedicated machines for other tasks related to compiling - like parsing the descriptions we might feed in as input.&lt;br /&gt;&lt;br /&gt;If you want to read more on this subject there's a &lt;a href="http://www.itu.dk/people/sestoft/pebook/"&gt;whole book&lt;/a&gt; online with example code. Specialisers are not easy things to design.&lt;br /&gt;&lt;br /&gt;I think that specialisation is a killer feature that I'd like to see more of. Present day compilers (and here I'm talking about computers, not machines in general) are hard-coded black boxes for the task of compilation. They're not very good at allowing you to get in there and tweak the way compilation occurs - for example if you want to generate code according to a strategy you know. Specialisation is a nice alternative to simply bolting an API onto a compiler. 
It would make it easy for anyone to write optimising and optimised compilers for their own languages and combine such compilers with interpreters for interactive instead of offline compilation.&lt;br /&gt;&lt;br /&gt;I learnt about this stuff, as well as lots of other stuff in my blog, from the excellent &lt;a href="http://books.google.com/books?id=Xut5JAAACAAJ"&gt;Vicious Circles&lt;/a&gt;. The theory is closely related to the theory of writing quines that I used for my &lt;a href="http://blog.sigfpe.com/2008/02/third-order-quine-in-three-languages.html"&gt;three language quine&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;And if you keep your ears to the ground you can hear rumours of a fabled &lt;a href="http://portal.acm.org/citation.cfm?id=1480954"&gt;fourth projection&lt;/a&gt;...&lt;br /&gt;&lt;HR&gt;&lt;br /&gt;&lt;iframe src="http://rcm.amazon.com/e/cm?t=sigfpe-20&amp;o=1&amp;p=8&amp;l=as1&amp;asins=1575860082&amp;fc1=000000&amp;IS2=1&amp;lt1=_blank&amp;m=amazon&amp;lc1=0000FF&amp;bc1=000000&amp;bg1=FFFFFF&amp;f=ifr" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"&gt;&lt;/iframe&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-3930070047200672636?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/3930070047200672636/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=3930070047200672636' title='21 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/3930070047200672636'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/3930070047200672636'/><link rel='alternate' type='text/html' 
href='http://blog.sigfpe.com/2009/05/three-projections-of-doctor-futamura.html' title='The Three Projections of Doctor Futamura'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_UdKHLrHa05M/Sf3BzV77HfI/AAAAAAAAATc/flT5FsYW8_M/s72-c/dollar_minting.png' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>21</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-5316387978274908480</id><published>2009-04-04T09:38:00.000-07:00</published><updated>2009-04-04T16:06:08.789-07:00</updated><title type='text'>Faster than a speeding photon</title><content type='html'>&lt;h3&gt;How to outrun a photon&lt;/H3&gt;&lt;br /&gt;I thought it would be fun to try to give a readable account of the Unruh effect. It's a surprising phenomenon, and there isn't universal agreement over what exactly the theory predicts, let alone whether the effect has ever been observed. It has important implications for physics and philosophy and may even give a way to test some aspects of quantum gravity in the lab.&lt;br /&gt;&lt;br /&gt;One way to start the story is consideration of this problem: if a photon is speeding towards you, can you outrun it? Let's simplify things a bit so that we're considering motion in one dimension.&lt;br /&gt;&lt;br /&gt;If we're confined to one dimension, we can't dodge the photon, we can only hope to remain ahead of it. As the only things that can travel at the speed of light, c, are massless things like photons, it seems that there is no hope for a massive thing like a person in a spaceship to avoid it. The photon will always be faster than you, and so it'll catch you.&lt;br /&gt;&lt;br /&gt;But in theory you can outrun a photon! 
Do you see the flaw in the above reasoning that made it seem impossible?&lt;br /&gt;&lt;br /&gt;The best way to make things clear is to draw a diagram. We'll plot some graphs of position vs. time for some photons and spaceships. We'll have time going up the vertical axis and position along the horizontal axis. Here's an example:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/SdeNS5RaPYI/AAAAAAAAASs/3XsgY3097Zk/s1600-h/constant_velocity.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/SdeNS5RaPYI/AAAAAAAAASs/3XsgY3097Zk/s400/constant_velocity.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5320876840512535938" /&gt;&lt;/a&gt;&lt;br /&gt;I've chosen units so that one second on the vertical axis is drawn the same size as one light-second on the horizontal axis. The net result is that photons always travel at 45 degree angles to the axes. Massive objects, that travel slower than light, are confined to travel on courses that have angles of smaller than 45 degrees with respect to the vertical axis. The path of the photon is the diagonal black line and the path of a spaceship is in red. It starts to the right of the photon but as we move up the time axis the photon eventually catches up with it.&lt;br /&gt;&lt;br /&gt;If the spaceship travels faster then it will follow an angle closer to 45 degrees. 
Here are a pair of paths corresponding to faster spaceships:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/SdeOCbjzPXI/AAAAAAAAAS0/vAv2BpV1ZU4/s1600-h/constant_velocities.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/SdeOCbjzPXI/AAAAAAAAAS0/vAv2BpV1ZU4/s400/constant_velocities.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5320877657170328946" /&gt;&lt;/a&gt;&lt;br /&gt;The faster the ship is, the further it gets before the photon catches up. But we're just putting off the inevitable. It seems that whatever we do, the photon will always catch up.&lt;br /&gt;&lt;br /&gt;But there's a hidden assumption in the above. By drawing straight lines for the spaceship I was assuming it was travelling at a constant velocity. But there's no reason for that to be true. Here's a different path the spaceship could follow:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/SdeOrfKoy1I/AAAAAAAAAS8/9HrWx5ddQzM/s1600-h/accelerating.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/SdeOrfKoy1I/AAAAAAAAAS8/9HrWx5ddQzM/s400/accelerating.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5320878362513165138" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;At no point does the red path of the spaceship meet the black path of the photon. And yet at no point does the red path reach 45 degrees to the vertical axis. In other words, the spaceship never travels at the speed of light, and yet the photon never catches up with it. Spaceships can outrun photons!&lt;br /&gt;&lt;br /&gt;So what kind of path is that? It's actually a hyperbola and it corresponds to a spaceship accelerating at a constant rate. 
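&lt;br /&gt;&lt;br /&gt;The hyperbola is easy to write down explicitly. Here's a minimal sketch rather than anything definitive. It assumes units in which c = 1, and it assumes the ship starts at rest at position 1/a at time 0, where a is the acceleration felt on board, so that the photon path x = t is the asymptote of the ship's path. The function names are my own. 
&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;-- Assumed setup: units with c = 1, ship at rest at x = 1/a when t = 0.&lt;br /&gt;shipPosition :: Double -&gt; Double -&gt; Double&lt;br /&gt;shipPosition a t = sqrt (1/(a*a) + t*t)&lt;br /&gt;&lt;br /&gt;-- The ship's speed as measured by the observer at rest.&lt;br /&gt;shipSpeed :: Double -&gt; Double -&gt; Double&lt;br /&gt;shipSpeed a t = t / shipPosition a t&lt;br /&gt;&lt;br /&gt;-- The photon on the black diagonal line.&lt;br /&gt;photonPosition :: Double -&gt; Double&lt;br /&gt;photonPosition t = t&lt;br /&gt;&lt;br /&gt;-- The ship's lead over the photon, x - t, rewritten using&lt;br /&gt;-- x^2 - t^2 = 1/a^2 so that it stays accurate for large t.&lt;br /&gt;shipLead :: Double -&gt; Double -&gt; Double&lt;br /&gt;shipLead a t = (1/(a*a)) / (shipPosition a t + t)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The ship's path satisfies x^2 - t^2 = 1/a^2, which is the equation of a hyperbola. For any time t from 0 onwards the speed stays below 1, the speed of light, and the lead over the photon stays positive even though it shrinks towards zero.&lt;br /&gt;&lt;br /&gt;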
You might wonder how it can be constant acceleration when the speed of the spaceship never exceeds that of light. From an external observer's point of view, after a while it does look like the ship is travelling at a more or less constant velocity close to the speed of light. But from the point of view of someone on the spaceship it feels exactly like constant acceleration. So that is the path that would be taken by a spaceship with its thrusters firing at a constant rate.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;An event horizon!&lt;/h3&gt;&lt;br /&gt;I chose that path so that the spaceship stays just in front of the photon. A photon that starts slightly to the right of it will eventually catch up with the ship. But photons starting on the black path, or anywhere to its left, will never reach it. This means that absolutely nothing starting to the left of the black photon path can ever catch the ship. That should sound familiar. It's exactly like a black hole. From the point of view of someone on the ship, the diagonal black line is exactly like the &lt;a href="http://en.wikipedia.org/wiki/Event_horizon"&gt;event horizon&lt;/a&gt; of a black hole. Nothing to the left of it can ever be seen by observers on the ship.&lt;br /&gt;&lt;br /&gt;What does it look like if an observer in the ship watches something that crosses the event horizon? Here's another diagram:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SdeQwA36KZI/AAAAAAAAATE/ndoXPnKNDEE/s1600-h/falling_over_horizon.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SdeQwA36KZI/AAAAAAAAATE/ndoXPnKNDEE/s400/falling_over_horizon.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5320880639304149394" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Again, the black diagonal line is the path of a photon, which we know is a bit like an event horizon. 
The blue line is the path of an object at rest. In effect, it's falling over our &lt;a href="http://en.wikipedia.org/wiki/Event_horizon#Event_horizon_of_an_accelerated_particle"&gt;apparent event horizon&lt;/a&gt;. Of course the blue object doesn't see any unusual phenomenon on approaching the event horizon because there's nothing really there - it's only something seen by observers in the ship. The blue object emits a series of photons (shown in green) at equal intervals. As long as these photons are emitted before the event horizon they eventually catch up with the red spaceship. But notice how they arrive at more and more widely spaced intervals. A photon released exactly at the event horizon never reaches the ship. So the viewers on the ship see the spacing between the photons get longer and longer. They'll never see the blue object cross the event horizon - they'll just see it getting closer and closer until eventually it appears to freeze. Again, this is just like a black hole event horizon.&lt;br /&gt;&lt;br /&gt;The universe looks pretty weird from the &lt;a href="http://en.wikipedia.org/wiki/Rindler_coordinates"&gt;point of view&lt;/a&gt; of a constantly accelerating observer. Half of it is simply missing behind an event horizon. But that's just the start of the weirdness. When we throw quantum mechanics into the mix something much weirder happens.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Matter from a vacuum&lt;/h3&gt;&lt;br /&gt;It's well known that physicists expect black holes to emit particles as &lt;a href="http://en.wikipedia.org/wiki/Hawking_radiation"&gt;Hawking radiation&lt;/a&gt;. But our accelerating observer sees something like a black hole, so we might expect them to see something like Hawking radiation. We also know that an observer at rest sees no event horizon, which means that we might predict that accelerating observers see particles that observers at rest don't. 
Can we take such a prediction seriously?&lt;br /&gt;&lt;br /&gt;Let's look a bit more closely at this. According to a popular view of quantum mechanics, the vacuum is teeming with vacuum fluctuations - ephemeral particle-antiparticle pairs that briefly come into existence and then annihilate each other. In the diagram below I've drawn some of these events:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SdelVVHlgZI/AAAAAAAAATM/SGzr_D7441w/s1600-h/pair_production.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SdelVVHlgZI/AAAAAAAAATM/SGzr_D7441w/s400/pair_production.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5320903270626328978" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;As we follow up the time axis, pairs of (complementary coloured) particles come into existence and then annihilate each other. These events are so fleeting that they have no effect on our particle detectors and we see a vacuum. But note that I've drawn one of these events straddling our apparent event horizon. From the point of view of an accelerating observer this looks like a pair of particles coming into existence, but because of the argument I sketched above, they seem to freeze near the event horizon. In other words, to an accelerating observer these fleeting events are no longer fleeting, they look like real particles coming into existence and sticking around forever. Accelerating observers appear to see particles in a vacuum!&lt;br /&gt;&lt;br /&gt;What I've described above is absolutely &lt;em&gt;not&lt;/em&gt; a rigorous argument. But amazingly, when you use the machinery of quantum field theory you end up making exactly the same prediction: accelerating observers see particles. This is known as the &lt;a href="http://en.wikipedia.org/wiki/Unruh_effect"&gt;Unruh effect&lt;/a&gt;. 
When you do this properly you can compute a bit more detail. It turns out that the energies of the particles are random with exactly the same distribution as &lt;a href="http://en.wikipedia.org/wiki/Black_body"&gt;black body radiation&lt;/a&gt;. In other words, the vacuum looks like it has a glow corresponding to a particular temperature that is proportional to the acceleration. But it's not a bright glow. You need to accelerate at about 10&lt;sup&gt;20&lt;/sup&gt; m/s&lt;sup&gt;2&lt;/sup&gt; before the temperature appears to be 1 K. Building a thermometer that can survive such accelerations is no mean feat. So it looks like the Unruh effect is a curiosity that might never be observed in the lab.&lt;br /&gt;&lt;br /&gt;But it has been suggested that the Unruh effect has already been observed. There aren't many things that can survive that kind of acceleration, but an electron can, and an electron can behave like a thermometer. Electrons in circular particle accelerators routinely undergo the kinds of accelerations we're talking about. They do so because they are driven by a magnetic field. Now an electron has spin, so you can think of it as a bit like a little electric current running round in a loop. That means an electron is like a little dipole electromagnet. Magnets in magnetic fields tend to want to line up along the field - that's how a compass works. So electrons that spend long enough in a particle accelerator, e.g. those in a storage ring, should eventually line up with the field. Lining up like this is known as polarisation, and in this particular case it's known as the &lt;a href="http://en.wikipedia.org/wiki/Sokolov-Ternov_effect"&gt;Sokolov-Ternov&lt;/a&gt; effect. But when we look at electrons in a storage ring it turns out they're not quite completely lined up, they're slightly depolarised. This is easily explained by Unruh radiation - they're constantly accelerating and so they feel themselves to be in a hot environment. 
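&lt;br /&gt;&lt;br /&gt;How hot? As a rough check on the figure quoted above, here is the formula usually given for the Unruh temperature, with &lt;tt&gt;a&lt;/tt&gt; standing for the proper acceleration. The constants are rounded, so treat the result as an order-of-magnitude sketch rather than a precise value:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;T = \frac{\hbar a}{2\pi c k_B} \quad\Longrightarrow\quad a = \frac{2\pi c k_B T}{\hbar}&lt;br /&gt;&lt;br /&gt;a(T = 1\,\mathrm{K}) \approx \frac{2\pi \times (3.0\times 10^{8}) \times (1.38\times 10^{-23})}{1.05\times 10^{-34}}\ \mathrm{m/s^2} \approx 2.5\times 10^{20}\ \mathrm{m/s^2}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;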
The continual interaction with this hot environment causes the electron spins to be a bit randomised, so they don't all line up nicely.&lt;br /&gt;&lt;br /&gt;Unfortunately this isn't definitive evidence for Unruh radiation because when we carry out the full calculation of the Sokolov-Ternov effect it turns out that it predicts partial depolarisation anyway. Now it looks like we don't have evidence for the Unruh effect. But it's not that simple. The Unruh effect isn't a new effect made up by a physicist. It's a prediction based on a new way of looking at fairly conventional physics. The Sokolov-Ternov effect is also predicted from standard physics, just in a different frame of reference. So maybe the partial depolarisation predicted by this effect is in fact the very same thing as the Unruh effect, just looked at from a different point of view.&lt;br /&gt;&lt;br /&gt;What does this mean philosophically? We're used to the idea that looking at things from different angles changes how they look. Einstein extended this notion to spacetime so space and time seem different to moving observers. The Unruh effect goes one step further. Whether or not an individual particle exists depends on your point of view. Do particles not have any kind of existence independently of how we look at them? And how does it look to an observer at rest watching an accelerating observer fly by with a thermometer? Do they see a thermometer apparently responding to nothing? Or do they also see the particles once they've interacted with a thermometer? It's all so weird that some physicists take the view that the notion of the particle is outdated and we should only be talking about quantum mechanical wavefunctions.&lt;br /&gt;&lt;br /&gt;There's another reason why the Unruh effect is important. The Unruh effect doesn't involve General Relativity, which is all about curved spacetime. But it does use the same mathematical machinery so it gives a way to test out that mathematics. 
So even if we don't have laboratory black holes to play with, we may still be able to investigate the mathematical framework that predicts phenomena like the Hawking effect. But also note that at least one paper claims it's all a mathematical error and there is no Unruh effect in reality.&lt;br /&gt;&lt;br /&gt;In recent years there has been a dramatic increase in the number of papers on the Unruh effect. I expect this trend is going to keep going for a while.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;References&lt;/H3&gt;&lt;br /&gt;&lt;OL&gt;&lt;br /&gt;&lt;LI&gt;I learnt about the Unruh effect from Wald's book &lt;a href="http://books.google.com/books?id=Iud7eyDxT1AC"&gt;Quantum field theory in curved spacetime and black hole thermodynamics&lt;/a&gt;.&lt;br /&gt;&lt;LI&gt;The diagram showing particle creation/annihilation events straddling the event horizon came from Susskind and Lindesay's &lt;a href="http://books.google.com/books?id=cxJCBRUNmVYC"&gt;An Introduction to Black Holes, Information and the String Theory Revolution&lt;/a&gt;.&lt;br /&gt;&lt;LI&gt;I tried to catch up with recent developments by reading some of the paper &lt;a href="http://arxiv.org/abs/0710.5373"&gt;The Unruh effect and its applications&lt;/a&gt;.&lt;br /&gt;&lt;/OL&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-5316387978274908480?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/5316387978274908480/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=5316387978274908480' title='26 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/5316387978274908480'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/11295132/posts/default/5316387978274908480'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/04/faster-than-speeding-photon.html' title='Faster than a speeding photon'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_UdKHLrHa05M/SdeNS5RaPYI/AAAAAAAAASs/3XsgY3097Zk/s72-c/constant_velocity.png' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>26</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-6987168894740966011</id><published>2009-03-07T13:13:00.000-08:00</published><updated>2009-03-07T15:41:49.182-08:00</updated><title type='text'>Dinatural Transformations and Coends</title><content type='html'>Abstract nonsense warning: my goal here is simply to understand what the definition of a &lt;a href="http://en.wikipedia.org/wiki/End_%28category_theory%29"&gt;coend&lt;/a&gt; means in the context of Haskell.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; {-# LANGUAGE Rank2Types,ExistentialQuantification #-}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Pick a type. Let's call it &lt;tt&gt;X&lt;/tt&gt;. Now consider the type&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;forall a . (a -&gt; X,a) -&gt; X&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;What can we say about elements of this type? There's an obvious example of such a function, &lt;tt&gt;eval&lt;/tt&gt; that applies the first element of the pair to the second.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; eval (f,x) = f x&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It's also easy to construct others. What property do they all share? Well there's a nice way to answer this question. 
We can use Janis Voigtl&amp;auml;nder's &lt;a href="http://homepages.inf.ed.ac.uk/wadler/topics/parametricity.html"&gt;Free Theorem&lt;/a&gt; &lt;a href="http://linux.tcs.inf.tu-dresden.de/~voigt/ft/"&gt;Generator&lt;/a&gt;. After a bit of simplification it tells us that if &lt;tt&gt;f&lt;/tt&gt; is in&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;forall a . (a -&gt; X,a) -&gt; X&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;then for any compatible &lt;tt&gt;g&lt;/tt&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;f (x,g y) = f (x . g,y)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can apply any function we like to the second argument and we get exactly the same result by pre-composing with the first argument.&lt;br /&gt;&lt;br /&gt;This is a pretty strong property - it puts a big constraint on what &lt;tt&gt;f&lt;/tt&gt; can do. It holds because any function of type &lt;tt&gt;forall a . (a -&amp;gt; X,a) -&amp;gt; X&lt;/tt&gt; maps to a type &lt;tt&gt;X&lt;/tt&gt; that makes no reference to the quantified &lt;tt&gt;a&lt;/tt&gt;. It can't let any information about the type &lt;tt&gt;a&lt;/tt&gt; escape. And this means that &lt;tt&gt;f&lt;/tt&gt; has to somehow eliminate an element of &lt;tt&gt;a&lt;/tt&gt; and an element of &lt;tt&gt;a-&amp;gt;X&lt;/tt&gt;. There's only one non-trivial way to do that: provide the former to the latter as a function argument. So &lt;tt&gt;f&lt;/tt&gt; must factor as &lt;tt&gt;h . eval&lt;/tt&gt; for some function &lt;tt&gt;h&lt;/tt&gt;. The free theorem comes from the fact that &lt;tt&gt;eval (x,g y) = eval (x.g,y)&lt;/tt&gt;. Or, &lt;tt&gt;f&lt;/tt&gt; could be the constant function, but that still factors in the same way.&lt;br /&gt;&lt;br /&gt;Let's try something slightly more complex. Consider the type&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;forall a . (a -&gt; X,[a]) -&gt; X&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Again we can ask the free theorem generator to give us a property. 
We're told that if &lt;tt&gt;f&lt;/tt&gt; is of this type, then for any compatible &lt;tt&gt;g&lt;/tt&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;f(x,map g y) = f(x . g,y)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The crucial point this time is that in order to eliminate the &lt;tt&gt;[a]&lt;/tt&gt; we have to use &lt;tt&gt;fmap&lt;/tt&gt; to apply the function of type &lt;tt&gt;a -&amp;gt; X&lt;/tt&gt;. So any function of this type must factor through &lt;tt&gt;fmap&lt;/tt&gt;, and that's why the free theorem follows.&lt;br /&gt;&lt;br /&gt;Now it's time to put this in a more general framework. We can write both of the above examples as elements of type &lt;tt&gt;forall a . S a a -&amp;gt; X&lt;/tt&gt; where &lt;tt&gt;S&lt;/tt&gt; is some type constructor. In both cases we also have that &lt;tt&gt;S a&lt;/tt&gt; is a functor. But it's also a cofunctor in its first argument. Intuitively this says that &lt;tt&gt;S a b&lt;/tt&gt; contains or produces &lt;tt&gt;b&lt;/tt&gt;'s but consumes &lt;tt&gt;a&lt;/tt&gt;'s. Here's a class to express this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; class Difunctor h where&lt;br /&gt;&amp;gt;   lmap :: (b -&amp;gt; a) -&amp;gt; h a c -&amp;gt; h b c&lt;br /&gt;&amp;gt;   rmap :: (c -&amp;gt; d) -&amp;gt; h a c -&amp;gt; h a d&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(I made up the word Difunctor.)&lt;br /&gt;&lt;br /&gt;For (co)functoriality we insist on the laws&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;lmap (f . g) = lmap g . lmap f&lt;br /&gt;rmap (f . g) = rmap f . rmap g&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;We can make both of the examples above instances:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Ex1 x a b = Ex1 (a -&amp;gt; x) b&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Difunctor (Ex1 x) where&lt;br /&gt;&amp;gt;   lmap f (Ex1 g x) = Ex1 (g . 
f) x&lt;br /&gt;&amp;gt;   rmap f (Ex1 g x) = Ex1 g (f x)&lt;br /&gt;&lt;br /&gt;&amp;gt; data Ex2 x a b = Ex2 (a -&amp;gt; x) [b]&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Difunctor (Ex2 x) where&lt;br /&gt;&amp;gt;   lmap f (Ex2 g x) = Ex2 (g . f) x&lt;br /&gt;&amp;gt;   rmap f (Ex2 g x) = Ex2 g (map f x)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Our free theorems were essentially about this type:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; type DiNatural s x = forall a . (s a a -&amp;gt; x)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;and in both cases the free theorem could be written as&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;f . lmap g == f . rmap g&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This is a highly non-trivial result. Look at the preconditions for this theorem: they say something about each of the arguments to &lt;tt&gt;s&lt;/tt&gt; individually. And yet we deduce that the two arguments are in fact intimately connected. We can apply &lt;tt&gt;g&lt;/tt&gt; using either &lt;tt&gt;lmap&lt;/tt&gt; or &lt;tt&gt;rmap&lt;/tt&gt; and get the same result. This property is known as dinaturality. More precisely, if for some &lt;tt&gt;X&lt;/tt&gt;, &lt;tt&gt;f :: s a a -&amp;gt; X&lt;/tt&gt;, where &lt;tt&gt;s&lt;/tt&gt; is a difunctor, then &lt;tt&gt;f&lt;/tt&gt; is dinatural if for all compatible &lt;tt&gt;g&lt;/tt&gt;,&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;f . lmap g == f . rmap g&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;We have this theorem: if &lt;tt&gt;s&lt;/tt&gt; is an instance of the &lt;tt&gt;Difunctor&lt;/tt&gt; class (obeying its laws) then for any &lt;tt&gt;X&lt;/tt&gt;, elements of &lt;tt&gt;DiNatural s X&lt;/tt&gt; are dinatural.&lt;br /&gt;&lt;br /&gt;(Compare with the definition of naturality: &lt;tt&gt;f . fmap g = fmap g . 
f&lt;/tt&gt;.)&lt;br /&gt;&lt;br /&gt;I'm tempted to call this "Dinaturality for Free" but there's already a &lt;a href="ftp://ftp.disi.unige.it/pub/person/RosoliniG/papers/dinff.ps.gz"&gt;paper&lt;/a&gt; by that name and I don't know what it's about. And note that I only claim it's a theorem. I don't actually know how to prove this uniformly for all difunctors, but I'd stake a beer on it. At least as long as we don't do any weird Haskell coding (so our free theorems are always valid) and as long as we restrict ourselves to functions that always terminate.&lt;br /&gt;&lt;br /&gt;In the examples I gave above the reason why this holds is related to the fact that any function of the given type can be factored as something following an evaluation type function. More generally, if &lt;tt&gt;s&lt;/tt&gt; is a difunctor and there is a single &lt;tt&gt;Y&lt;/tt&gt;, and function &lt;tt&gt;i :: s a a -&amp;gt; Y&lt;/tt&gt;, such that every function &lt;tt&gt;f :: s a a -&amp;gt; X&lt;/tt&gt;, for any &lt;tt&gt;X&lt;/tt&gt;, can be factored as &lt;tt&gt;f = h . i&lt;/tt&gt;, then &lt;tt&gt;Y&lt;/tt&gt; is said to be the coend of &lt;tt&gt;s&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;There are different approaches to computing a coend. Above I've used inspection. The coend of &lt;tt&gt;(a -&amp;gt; X,a)&lt;/tt&gt; is &lt;tt&gt;X&lt;/tt&gt;. But there's also a kind of cheating approach where we can use an existential type to get the type uniformly for all difunctors:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Coend s = forall a . Coend (s a a)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;For an explanation of why that works, see Edward Kmett's post on &lt;a href="http://comonad.com/reader/2008/kan-extension-iii/"&gt;Kan Extensions&lt;/a&gt;. For this example, this means we expect the types &lt;tt&gt;t&lt;/tt&gt; and &lt;tt&gt;Coend (Ex1 t)&lt;/tt&gt; to be isomorphic. 
Here is the isomorphism and its inverse:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; iso :: Coend (Ex1 t) -&amp;gt; t&lt;br /&gt;&amp;gt; iso (Coend (Ex1 f x)) = f x&lt;br /&gt;&lt;br /&gt;&amp;gt; iso' :: t -&amp;gt; Coend (Ex1 t)&lt;br /&gt;&amp;gt; iso' x = Coend (Ex1 (const x) ())&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I'll leave the proof that &lt;tt&gt;iso&lt;/tt&gt; and &lt;tt&gt;iso'&lt;/tt&gt; are mutual inverses to you.&lt;br /&gt;&lt;br /&gt;I hope that gives some idea of what a coend is. Informally it captures the method by which a dinatural transformation of type &lt;tt&gt;forall a . s a a -&amp;gt; X&lt;/tt&gt; is able to eliminate the quantified &lt;tt&gt;a&lt;/tt&gt;. If you look through the Haskell libraries you'll find many dinaturals (or at least things that can be made dinatural through the use of &lt;tt&gt;uncurry&lt;/tt&gt;).&lt;br /&gt;&lt;br /&gt;This code and description was inspired by the discussion at &lt;a href="http://ncatlab.org/nlab/show/dinatural+transformation"&gt;nLab&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-6987168894740966011?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/6987168894740966011/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=6987168894740966011' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6987168894740966011'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6987168894740966011'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/03/dinatural-transformations-and-coends.html' title='Dinatural Transformations and 
Coends'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-5798722270156006834</id><published>2009-02-14T18:09:00.000-08:00</published><updated>2009-02-15T16:14:09.201-08:00</updated><title type='text'>Beyond Monads</title><content type='html'>The state monad gives an elegant way to thread state information through Haskell code. Unfortunately it has an annoying limitation: the state must have the same type throughout the monadic expression. In this post I want to look at how to fix this. Unfortunately, fixing &lt;tt&gt;State&lt;/tt&gt; means it's no longer a monad, but we'll discover a new abstraction that replaces monads. And then we can look at what else this abstraction is good for. The cool bit is that we have to write virtually no new code, and we'll even coax the compiler into doing the hard work of figuring out what the new abstraction should be.&lt;br /&gt;&lt;br /&gt;This is all based on an idea that has been invented by a bunch of people independently, although in slightly different forms. I'm being chiefly guided by the paper &lt;a href="http://lambda-the-ultimate.org/node/3210"&gt;Parameterized Notions of Computation&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The problem with the state monad is that it is defined by&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;newtype State s a = State { runState :: s -&gt; (a, s) }&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The state going into and out of one of these values is the same, &lt;tt&gt;s&lt;/tt&gt;. We can't vary the type of the state as we pass through our code. 
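&lt;br /&gt;&lt;br /&gt;To see the limitation concretely, here is a small standalone sketch (it is not part of this post's literate code). It rebuilds the ordinary &lt;tt&gt;State&lt;/tt&gt; from the definition above; the names &lt;tt&gt;returnS&lt;/tt&gt;, &lt;tt&gt;bindS&lt;/tt&gt;, &lt;tt&gt;getS&lt;/tt&gt; and &lt;tt&gt;putS&lt;/tt&gt; are made up for the illustration so that nothing clashes with the Prelude.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;-- The usual State: one state type s on the way in and on the way out.&lt;br /&gt;newtype State s a = State { runState :: s -&amp;gt; (a, s) }&lt;br /&gt;&lt;br /&gt;-- Hypothetical helper names, standing in for return, (&amp;gt;&amp;gt;=), get and put.&lt;br /&gt;returnS :: a -&amp;gt; State s a&lt;br /&gt;returnS a = State $ \s -&amp;gt; (a, s)&lt;br /&gt;&lt;br /&gt;bindS :: State s a -&amp;gt; (a -&amp;gt; State s b) -&amp;gt; State s b&lt;br /&gt;bindS m k = State $ \s -&amp;gt; let (a, s') = runState m s in runState (k a) s'&lt;br /&gt;&lt;br /&gt;getS :: State s s&lt;br /&gt;getS = State $ \s -&amp;gt; (s, s)&lt;br /&gt;&lt;br /&gt;putS :: s -&amp;gt; State s ()&lt;br /&gt;putS s = State $ \_ -&amp;gt; ((), s)&lt;br /&gt;&lt;br /&gt;-- Fine: the state is an Int before and after.&lt;br /&gt;ok :: State Int Int&lt;br /&gt;ok = getS `bindS` \z -&amp;gt; putS (z + 1) `bindS` \_ -&amp;gt; returnS z&lt;br /&gt;&lt;br /&gt;-- Rejected by the type checker if uncommented: the state would have to be&lt;br /&gt;-- an Int for getS and a String for putS, but bindS forces a single type s.&lt;br /&gt;-- bad = getS `bindS` \z -&amp;gt; putS (show (z + 1 :: Int)) `bindS` \_ -&amp;gt; returnS z&lt;br /&gt;&lt;br /&gt;main :: IO ()&lt;br /&gt;main = print (runState ok 10)   -- prints (10,11)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The only thing stopping &lt;tt&gt;bad&lt;/tt&gt; from compiling is the type of &lt;tt&gt;bindS&lt;/tt&gt;; the code underneath would be happy to run it.&lt;br /&gt;&lt;br /&gt;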
But that's really easy to fix, just define:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; import Prelude hiding (return,(&amp;gt;&amp;gt;=),(&amp;gt;&amp;gt;),(.),id,drop)&lt;br /&gt;&amp;gt; import Control.Category&lt;br /&gt;&lt;br /&gt;&amp;gt; newtype State s1 s2 a = State { runState :: s1 -&amp;gt; (a, s2) }&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I can now just copy and paste the definitions (with name changes to avoid clashes) out of the ghc prelude source code&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; return' a = State $ \s -&amp;gt; (a, s)&lt;br /&gt;&amp;gt; m &amp;gt;&amp;gt;&amp;gt;= k  = State $ \s -&amp;gt; let&lt;br /&gt;&amp;gt;   (a, s') = runState m s&lt;br /&gt;&amp;gt;   in runState (k a) s'&lt;br /&gt;&lt;br /&gt;&amp;gt; get   = State $ \s -&amp;gt; (s, s)&lt;br /&gt;&amp;gt; put s = State $ \_ -&amp;gt; ((), s)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We don't have to change a thing! The old code exactly matches the new type. We can now write code using the new &lt;tt&gt;State&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test1 = return' 1 &amp;gt;&amp;gt;&amp;gt;= \x -&amp;gt;&lt;br /&gt;&amp;gt;           return' 2 &amp;gt;&amp;gt;&amp;gt;= \y -&amp;gt;&lt;br /&gt;&amp;gt;           get &amp;gt;&amp;gt;&amp;gt;= \z -&amp;gt;&lt;br /&gt;&amp;gt;           put (x+y*z) &amp;gt;&amp;gt;&amp;gt;= \_ -&amp;gt;&lt;br /&gt;&amp;gt;           return' z           &lt;br /&gt;&amp;gt; go1 = runState test1 10&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;But we're now also able to write code like:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test2 = return' 1 &amp;gt;&amp;gt;&amp;gt;= \x -&amp;gt;&lt;br /&gt;&amp;gt;           return' 2 &amp;gt;&amp;gt;&amp;gt;= \y -&amp;gt;&lt;br /&gt;&amp;gt;           get &amp;gt;&amp;gt;&amp;gt;= \z -&amp;gt;&lt;br /&gt;&amp;gt;           put (show (x+y*z)) &amp;gt;&amp;gt;&amp;gt;= \_ -&amp;gt;&lt;br /&gt;&amp;gt;           return' z           
&lt;br /&gt;&amp;gt; go2 = runState test2 10&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The state starts off as an &lt;tt&gt;Integer&lt;/tt&gt; but ends up as a &lt;tt&gt;String&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;Problem solved! Except that this definition of &lt;tt&gt;State&lt;/tt&gt; doesn't give us a monad and so we lose the benefits of having an interface shared by many monads. Is there a new more appropriate abstraction we can use? Rather than scratch our heads over it, we can just ask ghci to tell us what's going on.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;*Main&gt; :t return'&lt;br /&gt;return' :: a -&gt; State s1 s1 a&lt;br /&gt;*Main&gt; :t (&gt;&gt;&gt;=)&lt;br /&gt;(&gt;&gt;&gt;=) :: State s1 s11 t -&gt; (t -&gt; State s11 s2 a) -&gt; State s1 s2 a&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This immediately suggests a new abstraction:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; class ParameterisedMonad m where&lt;br /&gt;&amp;gt;   return :: a -&amp;gt; m s s a&lt;br /&gt;&amp;gt;   (&amp;gt;&amp;gt;=) :: m s1 s2 t -&amp;gt; (t -&amp;gt; m s2 s3 a) -&amp;gt; m s1 s3 a&lt;br /&gt;&lt;br /&gt;&amp;gt; x &amp;gt;&amp;gt; f = x &amp;gt;&amp;gt;= \_ -&amp;gt; f&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It's a lot like the usual &lt;tt&gt;Monad&lt;/tt&gt; class except that we're now parameterising uses of this class with a pair of types. Our new &lt;tt&gt;&amp;gt;&amp;gt;=&lt;/tt&gt; operator also has a compatibility condition on it. We can think of an element of &lt;tt&gt;m s1 s2&lt;/tt&gt; as having a 'tail' and 'head' living in &lt;tt&gt;s1&lt;/tt&gt; and &lt;tt&gt;s2&lt;/tt&gt; respectively. 
In order to use &lt;tt&gt;&amp;gt;&amp;gt;=&lt;/tt&gt; we require the head of the first argument to match the tail given by the second argument.&lt;br /&gt;&lt;br /&gt;Anyway, we have:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance ParameterisedMonad State where&lt;br /&gt;&amp;gt;   return = return'&lt;br /&gt;&amp;gt;   (&amp;gt;&amp;gt;=) = (&amp;gt;&amp;gt;&amp;gt;=)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We didn't really design this class, we just used what ghci told us. Will it turn out to be a useful abstraction?&lt;br /&gt;&lt;br /&gt;First a category theoretical aside: in &lt;a href="http://blog.sigfpe.com/2008/11/from-monoids-to-monads.html"&gt;this post&lt;/a&gt; I talked about how monads were really a kind of abstract monoid. Well &lt;tt&gt;ParameterisedMonad&lt;/tt&gt; is a kind of abstract category. If we were to implement &lt;tt&gt;join&lt;/tt&gt; for this class it would play a role analogous to composition of arrows in a category. In a monoid you can multiply any old elements together to get a new element. In a category, you can't multiply two arrows together unless the tail of the second matches the head of the first.&lt;br /&gt;&lt;br /&gt;Now we can generalise the writer monad to a &lt;tt&gt;ParameterisedMonad&lt;/tt&gt;. But there's a twist: every monoid gives rise to a writer. This time we'll find that every category gives rise to a &lt;tt&gt;ParameterisedMonad&lt;/tt&gt;. Here's the definition. Again, it was lifted straight out of the source for the usual &lt;tt&gt;Writer&lt;/tt&gt; monad. 
The main change is replacing &lt;tt&gt;mempty&lt;/tt&gt; and &lt;tt&gt;mappend&lt;/tt&gt; with &lt;tt&gt;id&lt;/tt&gt; and &lt;tt&gt;flip (.)&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Writer cat s1 s2 a = Writer { runWriter :: (a,cat s1 s2) }&lt;br /&gt;&amp;gt; instance (Category cat) =&amp;gt; ParameterisedMonad (Writer cat) where&lt;br /&gt;&amp;gt;   return a = Writer (a,id)&lt;br /&gt;&amp;gt;   m &amp;gt;&amp;gt;= k  = Writer $ let&lt;br /&gt;&amp;gt;      (a, w)  = runWriter m&lt;br /&gt;&amp;gt;      (b, w') = runWriter (k a)&lt;br /&gt;&amp;gt;      in (b, w' . w)&lt;br /&gt;&amp;gt; tell w = Writer ((),w)&lt;br /&gt;&amp;gt; execWriter m = snd (runWriter m)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It's just like the usual &lt;tt&gt;Writer&lt;/tt&gt; monad except that the type of the 'written' data may change. I'll borrow an example (modified a bit) from the &lt;a href="http://lambda-the-ultimate.org/node/3210"&gt;paper&lt;/a&gt;. Define some type safe stack machine operations that are guaranteed not to blow your stack:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; push n x = (n,x)&lt;br /&gt;&amp;gt; drop (_,x) = x&lt;br /&gt;&amp;gt; dup (n,x) = (n,(n,x))&lt;br /&gt;&amp;gt; add (m,(n,x)) = (m+n,x)&lt;br /&gt;&amp;gt; swap (m,(n,x)) = (n,(m,x))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can now 'write' the composition of a bunch of these operations as a 'side effect':&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test3 = tell (push 1) &amp;gt;&amp;gt;&lt;br /&gt;&amp;gt;         tell (push 2) &amp;gt;&amp;gt;&lt;br /&gt;&amp;gt;         tell dup &amp;gt;&amp;gt;&lt;br /&gt;&amp;gt;         tell add &amp;gt;&amp;gt;&lt;br /&gt;&amp;gt;         tell swap &amp;gt;&amp;gt;&lt;br /&gt;&amp;gt;         tell drop&lt;br /&gt;&amp;gt; go3 = execWriter test3 ()&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I guess there's one last thing I have to find. 
The &lt;a href="http://blog.sigfpe.com/2008/12/mother-of-all-monads.html"&gt;mother of all parameterised monads&lt;/a&gt;. Again, we lift code from the ghc libraries, this time from Control.Monad.Cont. I just tweak the definition ever so slightly. Normally when you hand a continuation to an element of the &lt;tt&gt;Cont&lt;/tt&gt; type it gives you back an element of the continuation's range. We allow the return of any type. This time the implementations of &lt;tt&gt;return&lt;/tt&gt; and &lt;tt&gt;(&amp;gt;&amp;gt;=)&lt;/tt&gt; remain completely unchanged:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; newtype Cont r1 r2 a = Cont { runCont :: (a -&amp;gt; r2) -&amp;gt; r1 }&lt;br /&gt;&amp;gt; instance ParameterisedMonad Cont where&lt;br /&gt;&amp;gt;   return a = Cont ($ a)&lt;br /&gt;&amp;gt;   m &amp;gt;&amp;gt;= k  = Cont $ \c -&amp;gt; runCont m $ \a -&amp;gt; runCont (k a) c&lt;br /&gt;&lt;br /&gt;&amp;gt; i x = Cont (\fred -&amp;gt; x &amp;gt;&amp;gt;= fred)&lt;br /&gt;&amp;gt; run m = runCont m return&lt;br /&gt;&lt;br /&gt;&amp;gt; test4 = run $ i (tell (push 1)) &amp;gt;&amp;gt;&lt;br /&gt;&amp;gt;               i (tell (push 2)) &amp;gt;&amp;gt;&lt;br /&gt;&amp;gt;               i (tell dup) &amp;gt;&amp;gt;&lt;br /&gt;&amp;gt;               i (tell add) &amp;gt;&amp;gt;&lt;br /&gt;&amp;gt;               i (tell swap) &amp;gt;&amp;gt;&lt;br /&gt;&amp;gt;               i (tell drop)&lt;br /&gt;&lt;br /&gt;&amp;gt; go4 = execWriter test4 ()&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So what's going on here? The implementations of these instances require almost trivial changes to the original monads, or in two cases no changes at all apart from the type signature. I have my opinion: Haskell programmers have been using the wrong type class all along. In each case the type signature for &lt;tt&gt;return&lt;/tt&gt; and &lt;tt&gt;&amp;gt;&amp;gt;=&lt;/tt&gt; was too strict and so the functionality was being unnecessarily shackled. 
By writing the code without a signature, ghci tells us what the correct signature should have been all along. I think it might just possibly be time to consider making &lt;tt&gt;ParameterisedMonad&lt;/tt&gt; as important as &lt;tt&gt;Monad&lt;/tt&gt; to Haskell programming. At the very least, do-notation needs to be adapted to support &lt;tt&gt;ParameterisedMonad&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;Update: You *can* use do-notation with ParameterisedMonad if you use the NoImplicitPrelude flag.&lt;br /&gt;&lt;br /&gt;Update2: Some credits and links:&lt;br /&gt;&lt;br /&gt;&lt;OL&gt;&lt;br /&gt;&lt;LI&gt;&lt;a href="http://blog.unsafeperformio.com/?p=3"&gt;The Polystate Monad&lt;/a&gt; is one of the independent discoveries I mentioned above.&lt;br /&gt;&lt;LI&gt;A more general approach to &lt;a href="http://comonad.com/reader/2007/parameterized-monads-in-haskell/"&gt;Parameterized Monads in Haskell&lt;/a&gt;.&lt;br /&gt;&lt;LI&gt;&lt;a href="http://computationalthoughts.blogspot.com/2009/02/comment-on-parameterized-monads.html"&gt;A comment on Parameterized Monads&lt;/a&gt; that shows explicitly how to make this work with NoImplicitPrelude.&lt;br /&gt;&lt;li&gt;Oleg's &lt;a href="http://okmij.org/ftp/Computation/monads.html#param-monad"&gt;Variable (type)state `monad'&lt;/a&gt;.&lt;br /&gt;&lt;LI&gt;Wadler discovered this design pattern back in 1993 in &lt;a href="http://www.brics.dk/~hosc/local/LaSC-7-1-pp39-56.pdf"&gt;Monads and composable continuations&lt;/a&gt;.&lt;br /&gt;&lt;/OL&gt;&lt;br /&gt;&lt;br /&gt;I didn't contribute anything, this article is just advocacy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-5798722270156006834?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/5798722270156006834/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=5798722270156006834' title='22 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/5798722270156006834'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/5798722270156006834'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/02/beyond-monads.html' title='Beyond Monads'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>22</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-7669937180420541947</id><published>2009-01-31T14:19:00.000-08:00</published><updated>2009-01-31T19:14:49.087-08:00</updated><title type='text'>Beyond Regular Expressions: More Incremental String Matching</title><content type='html'>In my last post I showed how to incrementally match long strings against regular expressions. I now want to apply similar methods to matching languages that can't be described by regular expressions. (Note that 'language' is just jargon for a set of strings that meet some criterion.) In particular, regular expressions can't be used to test a string for balanced parentheses. This is because we need some kind of mechanism to count how many open parentheses are still pending and a finite state machine can't represent arbitrary integers.&lt;br /&gt;&lt;br /&gt;So let's start with a slightly more abstract description of what was going on last time so we can see the bigger picture. We were storing strings in balanced trees with a kind of 'measurement' or 'digest' of the string stored in the nodes of the tree. 
Each character mapped to an element of a monoid via a function called &lt;tt&gt;measure&lt;/tt&gt; and you can think of the measurement function as acting on entire strings if you &lt;tt&gt;mappend&lt;/tt&gt; together all of the measurements for each of the characters. So what we have is a function &lt;tt&gt;f :: String -&amp;gt; M&lt;/tt&gt; taking strings to some type M (in the last post M was a type of array) with the properties&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;f (a ++ b) == f a `mappend` f b&lt;br /&gt;f [] == mempty&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;By noticing that String is itself a monoid we can write this as&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;f (a `mappend` b) == f a `mappend` f b&lt;br /&gt;f mempty == mempty&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Anything satisfying these laws is called a monoid homomorphism, or just homomorphism for short.&lt;br /&gt;&lt;br /&gt;So the technique I used worked like this: I found a homomorphism from &lt;tt&gt;String&lt;/tt&gt; to some type with the useful property that for any string s, &lt;tt&gt;f s&lt;/tt&gt; still contains all the information required to figure out if we're dealing with a member of our language. If &lt;tt&gt;f&lt;/tt&gt; turns a string into something more efficient to work with then we can make our string matching more efficient.&lt;br /&gt;&lt;br /&gt;Now I want to make the notion of "contains all the information required" more precise by considering an example. Consider strings that consist only of the characters &lt;tt&gt;(&lt;/tt&gt; and &lt;tt&gt;)&lt;/tt&gt;. Our language will be the set of strings whose parentheses balance. In other words the total number of &lt;tt&gt;(&lt;/tt&gt; must match the total number of &lt;tt&gt;)&lt;/tt&gt;, and as we scan from left to right we must never see more &lt;tt&gt;)&lt;/tt&gt; than &lt;tt&gt;(&lt;/tt&gt;. For example, &lt;tt&gt;()()()&lt;/tt&gt; and &lt;tt&gt;((()))()&lt;/tt&gt; are in our language, but &lt;tt&gt;)()()(&lt;/tt&gt; isn't. 
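&lt;br /&gt;&lt;br /&gt;To make that definition concrete, here is a minimal standalone sketch of the obvious left-to-right test. It isn't part of the literate program in this post, it assumes the input contains only parentheses, and the name &lt;tt&gt;balanced&lt;/tt&gt; is just one made up for illustration:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;-- Standalone sketch: a single left-to-right scan.&lt;br /&gt;-- 'balanced' is a hypothetical name, not code from this post.&lt;br /&gt;balanced :: String -&amp;gt; Bool&lt;br /&gt;balanced = go 0&lt;br /&gt;  where&lt;br /&gt;    go :: Int -&amp;gt; String -&amp;gt; Bool&lt;br /&gt;    go pending []       = pending == 0          -- every ( was closed&lt;br /&gt;    go pending ('(':cs) = go (pending + 1) cs&lt;br /&gt;    go pending (')':cs)&lt;br /&gt;      | pending == 0    = False                 -- a ) with nothing to close&lt;br /&gt;      | otherwise       = go (pending - 1) cs&lt;br /&gt;    go _       _        = False                 -- reject any other character&lt;br /&gt;&lt;br /&gt;main :: IO ()&lt;br /&gt;main = mapM_ (print . balanced) ["()()()", "((()))()", ")()()("]&lt;br /&gt;-- prints True, True, False&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;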
This language is called the Dyck language.&lt;br /&gt;&lt;br /&gt;Suppose we're testing whether or not some string is in the Dyck language. If we see &lt;tt&gt;()&lt;/tt&gt; as a substring then deleting it from the string makes no difference to whether or not the string is in the Dyck language. In fact, if we see &lt;tt&gt;(())&lt;/tt&gt;, &lt;tt&gt;((()))&lt;/tt&gt;, &lt;tt&gt;(((())))&lt;/tt&gt; and so on they can all be deleted. On the other hand, you can't delete &lt;tt&gt;)(&lt;/tt&gt; without knowing about the rest of the string. Deleting it from &lt;tt&gt;()()&lt;/tt&gt; makes no difference to its membership in the Dyck language, but deleting it from &lt;tt&gt;)(()&lt;/tt&gt; certainly does.&lt;br /&gt;&lt;br /&gt;So given a language L, we can say that two strings, x and y, are interchangeable with respect to L if any time we see x as a substring of another string we can replace it with y, and vice versa, without making any difference to whether the string is in the language. Interchangeable strings are a kind of waste of memory. If we're testing for membership of L there's no need to distinguish between them. So we'd like our measurement homomorphism to map all interchangeable strings to the same value. But we don't want to map any more strings to the same value because then we lose information that tells us if a string is an element of L. A homomorphism that strikes this balance perfectly is called the 'canonical homomorphism' and the image of the set of all strings under this homomorphism is called the &lt;a href="http://en.wikipedia.org/wiki/Syntactic_monoid"&gt;syntactic monoid&lt;/a&gt;. By 'image', I simply mean all the possible values that could arise from applying the homomorphism to all possible strings.&lt;br /&gt;&lt;br /&gt;So let's go back to the Dyck language. Any time we see &lt;tt&gt;()&lt;/tt&gt; we can delete it. 
But if we delete every occurrence of &lt;tt&gt;()&lt;/tt&gt; from a string then all we have left is a bunch of &lt;tt&gt;)&lt;/tt&gt; followed by a bunch of &lt;tt&gt;(&lt;/tt&gt;. Let's say it's p of the former, and q of the latter. Every string of parentheses can be distilled down to a pair of integers &amp;ge;0, (p,q). But does this go far enough? Could we distill any further? Well for any choice of (p,q) it's a good exercise to see that for any other choice of (p',q') there's always a string in the Dyck language where if you have )&lt;sup&gt;p&lt;/sup&gt;(&lt;sup&gt;q&lt;/sup&gt; as a substring, replacing it with )&lt;sup&gt;p'&lt;/sup&gt;(&lt;sup&gt;q'&lt;/sup&gt; gives you something not in the language. So you can't distill any further. Which means we have our syntactic monoid and canonical homomorphism. In this case the monoid is called the &lt;a href="http://en.wikipedia.org/wiki/Bicyclic_monoid"&gt;bicyclic monoid&lt;/a&gt; and we can implement it as follows:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; {-# LANGUAGE TypeSynonymInstances,FlexibleInstances,MultiParamTypeClasses #-}&lt;br /&gt;&amp;gt; import Data.Foldable&lt;br /&gt;&amp;gt; import Data.Monoid&lt;br /&gt;&amp;gt; import Data.FingerTree hiding (fromList)&lt;br /&gt;&amp;gt; import qualified Data.List as L&lt;br /&gt;&lt;br /&gt;&amp;gt; data Bicyclic = B Int Int deriving (Eq,Show)&lt;br /&gt;&lt;br /&gt;&amp;gt; hom '(' = B 0 1&lt;br /&gt;&amp;gt; hom ')' = B 1 0&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Monoid Bicyclic where&lt;br /&gt;&amp;gt;   mempty = B 0 0&lt;br /&gt;&amp;gt;   B a b `mappend` B c d = B (a-b+max b c) (d-c+max b c)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Where did that code for &lt;tt&gt;mappend&lt;/tt&gt; come from? Consider )&lt;sup&gt;a&lt;/sup&gt;(&lt;sup&gt;b&lt;/sup&gt;)&lt;sup&gt;c&lt;/sup&gt;(&lt;sup&gt;d&lt;/sup&gt;. 
We can delete &lt;tt&gt;()&lt;/tt&gt; from the middle many times over.&lt;br /&gt;&lt;br /&gt;Now we can more or less reproduce the code of last week and get a Dyck language tester. Once we've distilled a string down to (p,q) we only need to test whether or not p=q=0 to see whether or not our parentheses are balanced:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; matches' s = x==B 0 0 where&lt;br /&gt;&amp;gt;   x = mconcat (map hom s)&lt;br /&gt;&lt;br /&gt;&amp;gt; data Elem a = Elem { getElem :: a } deriving Show&lt;br /&gt;&amp;gt; data Size = Size { getSize :: Int } deriving (Eq,Ord,Show)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Monoid Size where&lt;br /&gt;&amp;gt;    mempty = Size 0&lt;br /&gt;&amp;gt;    Size m `mappend` Size n = Size (m+n)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Measured (Size,Bicyclic) (Elem Char) where&lt;br /&gt;&amp;gt;    measure (Elem a) = (Size 1,hom a)&lt;br /&gt;&lt;br /&gt;&amp;gt; type FingerString = FingerTree (Size,Bicyclic) (Elem Char)&lt;br /&gt;&lt;br /&gt;&amp;gt; insert :: Int -&amp;gt; Char -&amp;gt; FingerString -&amp;gt; FingerString&lt;br /&gt;&amp;gt; insert i c z = l &amp;gt;&amp;lt; (Elem c &amp;lt;| r) where (l,r) = split (\(Size n,_) -&amp;gt; n&amp;gt;i) z&lt;br /&gt;&lt;br /&gt;&amp;gt; string = empty :: FingerString&lt;br /&gt;&lt;br /&gt;&amp;gt; matchesDyck string = snd (measure string)==B 0 0&lt;br /&gt;&lt;br /&gt;&amp;gt; loop string = do&lt;br /&gt;&amp;gt;   print $ map getElem (toList string)&lt;br /&gt;&amp;gt;   print $ "matches? " ++ show (matchesDyck string)&lt;br /&gt;&amp;gt;   print "(Position,Character)"&lt;br /&gt;&amp;gt;   r &amp;lt;- getLine&lt;br /&gt;&amp;gt;   let (i,c) = read r&lt;br /&gt;&amp;gt;   loop $ insert i c string&lt;br /&gt;&lt;br /&gt;&amp;gt; main = do&lt;br /&gt;&amp;gt;   loop string&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;There's a completely different way to test membership of the Dyck language. 
Replace each &lt;tt&gt;(&lt;/tt&gt; with 1 and &lt;tt&gt;)&lt;/tt&gt; with -1. Now scan from left to right keeping track of (1) the sum of all the numbers so far and (2) the minimum value taken by this sum. If the final sum and the final minimal sum are zero, then we have matching parentheses. But we need to do this on substrings without scanning from the beginning in one go. That's an example of a parallel prefix sum problem and it's what I talked about &lt;a href="http://sigfpe.blogspot.com/2008/11/approach-to-algorithm-parallelisation.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So here's an extended exercise: adapt the parallel prefix sum approach to implement incremental Dyck language testing with fingertrees. You should end up with a canonical homomorphism that's similar to the one above. It'll probably be slightly different but ultimately equivalent.&lt;br /&gt;&lt;br /&gt;And here's an even more extended exercise: protein sequences are sequences from a 20 letter alphabet. Each letter can be assigned a hydrophobicity value from &lt;a href="http://www.vivo.colostate.edu/molkit/hydropathy/scales.html"&gt;certain tables&lt;/a&gt;. (Pick whichever table you want.) The hydrophobicity of a string is the sum of the hydrophobicities of its letters. Given a string, we can give it a score corresponding to the largest hydrophobicity of any contiguous substring in it. Use fingertrees and a suitable monoid to track this score as the string is incrementally edited. 
Note how widely separated substrings can suddenly combine together as stuff between them is adjusted.&lt;br /&gt;&lt;br /&gt;If you're interested in Dyck languages with multiple types of parenthesis that need to match you need something &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.1014"&gt;much more fiendish&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-7669937180420541947?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/7669937180420541947/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=7669937180420541947' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7669937180420541947'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7669937180420541947'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/01/beyond-regular-expressions-more.html' title='Beyond Regular Expressions: More Incremental String Matching'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-4382746632400173667</id><published>2009-01-24T13:01:00.000-08:00</published><updated>2009-01-24T16:47:00.328-08:00</updated><title type='text'>Fast incremental regular expression matching with monoids</title><content type='html'>&lt;H3&gt;The Problem&lt;/H3&gt;&lt;br /&gt; Consider this problem: Fix a regular expression R. 
Suppose you have a string of length N. There's not much you can do about it, you'll likely have to scan all N characters to test to see if the string matches R. But once you've performed the test, how fast can you test the string again if a small edit is made to it? It seems that in general you'd have to rescan the entire string, or at least rescan from where the edit was made. But it turns out that you can do regular expression matching incrementally so that for many changes you might make to the string, you only require O(log N) time to recompute whether the string matches. This is true even if characters at opposite ends of the string interact to make a successful match. What's more, it's remarkably straightforward to implement if we make use of fingertrees and monoids.&lt;br /&gt;&lt;br /&gt;I'm going to assume a bit of background for which resources can be found on the web: understanding some &lt;a href="http://sigfpe.blogspot.com/2009/01/haskell-monoids-and-their-uses.html"&gt;basics about monoids&lt;/a&gt;, understanding &lt;a href="http://apfelmus.nfshost.com/monoid-fingertree.html"&gt;apfelmus's inspiring introduction to fingertrees&lt;/a&gt; or the &lt;a href="http://www.soi.city.ac.uk/~ross/papers/FingerTree.html"&gt;original paper&lt;/a&gt;, and you'll need to be completely comfortable with the idea of compiling regular expressions to finite state machines.&lt;br /&gt;&lt;br /&gt;As usual, this is literate Haskell. 
But you need to have the &lt;a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/fingertree"&gt;fingertree&lt;/a&gt; package installed and we need a bunch of imports.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; {-# LANGUAGE TypeSynonymInstances,FlexibleInstances,MultiParamTypeClasses #-}&lt;br /&gt;&amp;gt; import qualified Data.Array as B&lt;br /&gt;&amp;gt; import Data.Array.Unboxed as U&lt;br /&gt;&amp;gt; import Data.Foldable&lt;br /&gt;&amp;gt; import Data.Monoid&lt;br /&gt;&amp;gt; import Data.FingerTree hiding (fromList)&lt;br /&gt;&amp;gt; import qualified Data.List as L&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;H3&gt;A Finite State Machine&lt;/H3&gt;&lt;br /&gt;So let's start with an example regular expression: &lt;tt&gt;.*(.*007.*).*&lt;/tt&gt;. We're looking for "007" enclosed between parentheses, but the parentheses could be millions of characters apart.&lt;br /&gt;&lt;br /&gt;A standard technique for finding regular expressions is to compile them to a finite state automaton. It takes quite a bit of code to do that, but it is completely standard. So rather than do that, here's a finite state machine I constructed by hand for this regular expression:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/SXuJZcQX39I/AAAAAAAAAR0/D2xW8JY66VY/s1600-h/fsa.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 388px; height: 365px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/SXuJZcQX39I/AAAAAAAAAR0/D2xW8JY66VY/s400/fsa.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5294976857078751186" /&gt;&lt;/a&gt;&lt;br /&gt;I've used the convention that an unlabelled edge corresponds to any input that isn't matched by another labelled edge. 
We can express the transitions as a function &lt;tt&gt;fsm&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; fsm 0 '(' = 1&lt;br /&gt;&amp;gt; fsm 0 _ = 0&lt;br /&gt;&lt;br /&gt;&amp;gt; fsm 1 '0' = 2&lt;br /&gt;&amp;gt; fsm 1 _ = 1&lt;br /&gt;&lt;br /&gt;&amp;gt; fsm 2 '0' = 3&lt;br /&gt;&amp;gt; fsm 2 _ = 1&lt;br /&gt;&lt;br /&gt;&amp;gt; fsm 3 '7' = 4&lt;br /&gt;&amp;gt; fsm 3 '0' = 3&lt;br /&gt;&amp;gt; fsm 3 _ = 1&lt;br /&gt;&lt;br /&gt;&amp;gt; fsm 4 ')' = 5&lt;br /&gt;&amp;gt; fsm 4 _ = 4&lt;br /&gt;&lt;br /&gt;&amp;gt; fsm 5 _ = 5&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The initial state is 0 and a match corresponds to state 5.&lt;br /&gt;&lt;br /&gt;We can test a string in the standard way using&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; matches s = Prelude.foldl fsm 0 s==5&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Try &lt;tt&gt;matches "(00 7)"&lt;/tt&gt; and &lt;tt&gt;matches "He(007xxxxxxxxxxxx)llo"&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;We can think of the inputs as being functions acting on the automaton. Each input character is a function that maps the automaton from one state to another. We could use the Haskell composition function, &lt;tt&gt;(.)&lt;/tt&gt;, to compose these functions. But &lt;tt&gt;(.)&lt;/tt&gt; doesn't really do anything, &lt;tt&gt;f . 
g&lt;/tt&gt; is just a closure that says "when the time comes, apply &lt;tt&gt;g&lt;/tt&gt; and then &lt;tt&gt;f&lt;/tt&gt;".&lt;br /&gt;&lt;br /&gt;On the other hand we could tabulate our transition functions as follows:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; tabulate f = array (0,5) [(x,f x) | x &amp;lt;- range (0,5)]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We have one such tabulated function for each letter in our alphabet:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; letters = array (' ','z') [(i,tabulate (flip fsm i)) | i &amp;lt;- range (' ','z')]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Given two tabulated functions we can easily form the table of the composition function. In fact, our tabulated functions form a monoid with &lt;tt&gt;mappend&lt;/tt&gt; for composition. I used unboxed arrays for performance:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; type Table = UArray Int Int&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Monoid Table where&lt;br /&gt;&amp;gt;    mempty = tabulate id&lt;br /&gt;&amp;gt;    f `mappend` g = tabulate (\state -&amp;gt; (U.!) g ((U.!) f state))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note that we've cheated a bit. An object of type &lt;tt&gt;Table&lt;/tt&gt; could be any array of &lt;tt&gt;Ints&lt;/tt&gt; indexed by &lt;tt&gt;Int&lt;/tt&gt;. But if we promise to only build arrays indexed by elements of &lt;tt&gt;[0..5]&lt;/tt&gt; and containing elements of the same range then our claim to monoidhood is valid.&lt;br /&gt;&lt;br /&gt;Given any string, we can compute whether it matches our regular expression by looking up the corresponding &lt;tt&gt;Table&lt;/tt&gt; in our &lt;tt&gt;letters&lt;/tt&gt; array, composing them, and then checking if the tabulated function maps the initial state 0 to the final state 5:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; matches' s = table!0==5 where&lt;br /&gt;&amp;gt;   table = mconcat (map ((B.!) 
letters) s)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This is slower and more complex than our original implementation of &lt;tt&gt;matches&lt;/tt&gt;. But what we've now done is 'tease out' a monoid structure from the problem. If we store a string as a sequence of characters represented by a fingertree, we can store in each subtree the element of &lt;tt&gt;Table&lt;/tt&gt; corresponding to the substring it represents. Every time the tree is rebalanced we need to recompute the corresponding &lt;tt&gt;Table&lt;/tt&gt;s. But that's fine, it typically involves only O(log N) operations, and we don't need to write any code, the fingertree will do it for us automatically. Once we've done this, we end up with a representation of strings with the property that we always know what the corresponding &lt;tt&gt;Table&lt;/tt&gt; is. We can freely split and join such trees knowing that the &lt;tt&gt;Table&lt;/tt&gt; will always be up to date.&lt;br /&gt;&lt;br /&gt;The only slight complication is that I want to be able to randomly access the nth character of the tree. apfelmus &lt;a href="http://apfelmus.nfshost.com/monoid-fingertree.html"&gt;explains&lt;/a&gt; that in his post. 
The change I need to make is that I'm going to use both the &lt;tt&gt;Size&lt;/tt&gt; monoid and the &lt;tt&gt;Table&lt;/tt&gt; monoid, so I need the &lt;a href="http://sigfpe.blogspot.com/2009/01/haskell-monoids-and-their-uses.html"&gt;product monoid&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Elem a = Elem { getElem :: a } deriving Show&lt;br /&gt;&amp;gt; data Size = Size { getSize :: Int } deriving (Eq,Ord,Show)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Monoid Size where&lt;br /&gt;&amp;gt;    mempty = Size 0&lt;br /&gt;&amp;gt;    Size m `mappend` Size n = Size (m+n)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We need to implement &lt;tt&gt;measure&lt;/tt&gt; as in the &lt;a href="http://www.soi.city.ac.uk/~ross/papers/FingerTree.html"&gt;fingertree paper&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Measured (Size,Table) (Elem Char) where&lt;br /&gt;&amp;gt;    measure (Elem a) = (Size 1,(B.!) letters a)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;H3&gt;A Fingertree&lt;/H3&gt;&lt;br /&gt;And now we can define our strings as:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; type FingerString = FingerTree (Size,Table) (Elem Char)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The insertion routine is more or less what's in the paper:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; insert :: Int -&amp;gt; Char -&amp;gt; FingerString -&amp;gt; FingerString&lt;br /&gt;&amp;gt; insert i c z = l &amp;gt;&amp;lt; (Elem c &amp;lt;| r) where (l,r) = split (\(Size n,_) -&amp;gt; n&amp;gt;i) z&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note how I project out the size from the product monoid in order to insert at the correct position.&lt;br /&gt;&lt;br /&gt;Here's an example string. 
Adjust the length to suit your memory and CPU horsepower:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; fromList = L.foldl' (|&amp;gt;) empty&lt;br /&gt;&amp;gt; string = fromList (map Elem $ take 1000000 $ cycle "the quick brown fox jumped over the lazy dog")&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(I use a strict form of &lt;tt&gt;fromList&lt;/tt&gt; to ensure the tree actually gets built.)&lt;br /&gt;&lt;br /&gt;The actual match function simply projects out the second component of the monoid and again tests to see if it maps the initial state to the final state:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; matches007 string = ((U.!) (snd (measure string)) 0)==5&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;H3&gt;An Interactive Loop&lt;/H3&gt;&lt;br /&gt;I recommend compiling with optimisation, something like &lt;tt&gt;ghc --make -O5 -o regexp regexp.lhs&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; loop string = do&lt;br /&gt;&amp;gt;   print $ "matches? " ++ show (matches007 string)&lt;br /&gt;&amp;gt;   print "(Position,Character)"&lt;br /&gt;&amp;gt;   r &amp;lt;- getLine&lt;br /&gt;&amp;gt;   let (i,c) = read r&lt;br /&gt;&amp;gt;   loop $ insert i c string&lt;br /&gt;&lt;br /&gt;&amp;gt; main = do&lt;br /&gt;&amp;gt;   loop string&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now you can run this interactively. Input values like &lt;tt&gt;(100,'f')&lt;/tt&gt; to insert an 'f' at position 100. It can take a good few seconds to compute the initial tree, but after that the matching process is instantaneous. 
(Actually, the second match might take a few seconds because, despite the &lt;tt&gt;foldl'&lt;/tt&gt;, the tree hasn't been fully built.)&lt;br /&gt;&lt;br /&gt;A suitable sample input is:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;(100,'(')&lt;br /&gt;(900000,')')&lt;br /&gt;(20105,'0')&lt;br /&gt;(20106,'0')&lt;br /&gt;(20107,'7')&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Discussion&lt;/H3&gt;&lt;br /&gt;Note there is quite an overhead for this example. I'm storing an entire &lt;tt&gt;Table&lt;/tt&gt; for each character. But you can easily store chunks of string (like in a &lt;a href="http://www.sgi.com/tech/stl/ropeimpl.html"&gt;rope&lt;/a&gt;). This means that some chunks will be rescanned when a string is edited - but rescanning a 1K chunk, say, is a lot less expensive than scanning a gigabyte file in its entirety. Working in blocks will probably speed up the initial scan too, since a much smaller tree needs to be built.&lt;br /&gt;&lt;br /&gt;When Hinze and Paterson originally wrote the fingertree paper they were motivated by parallel prefix sum methods. Just about any parallel prefix algorithm can be converted to an incremental algorithm using fingertrees. This article is based on the idea of doing this with the parallel lexing scheme described by Hillis and Steele in their classic Connection Machine &lt;a href="http://portal.acm.org/citation.cfm?id=7903"&gt;paper&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So why would you want to match against a fixed regular expression like this? Well this method extends to a full blown incremental lexer. This will lex quickly even if placing a character in a string changes the type of lexemes billions of characters away. See the Hillis and Steele paper for details.&lt;br /&gt;&lt;br /&gt;Note there's nothing especially Haskelly about this code except that Haskell made it easy to prototype. 
You can do this in C++, say, using mutable red-black trees.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-4382746632400173667?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/4382746632400173667/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=4382746632400173667' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/4382746632400173667'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/4382746632400173667'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/01/fast-incremental-regular-expression.html' title='Fast incremental regular expression matching with monoids'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_UdKHLrHa05M/SXuJZcQX39I/AAAAAAAAAR0/D2xW8JY66VY/s72-c/fsa.png' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-7531218329312814569</id><published>2009-01-17T13:47:00.000-08:00</published><updated>2009-11-14T11:46:17.632-08:00</updated><title type='text'>Haskell Monoids and their Uses</title><content type='html'>Haskell is a great language for constructing code modularly from small but orthogonal building blocks. One of these small blocks is the monoid. Although monoids come from mathematics (algebra in particular) they are found everywhere in computing. 
You probably use one or two monoids implicitly with every line of code you write, whatever the language, but you might not know it yet. By making them explicit we find interesting new ways of constructing those lines of code. In particular, ways that are often easier to both read and write. So the following is an intro to monoids in Haskell. I'm assuming familiarity with type classes, because Haskell monoids form a type class. I also assume some familiarity with monads, though nothing too complex.&lt;br /&gt;&lt;br /&gt;This post is literate Haskell so you can play with the examples directly.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Defining Monoids&lt;/H3&gt;&lt;br /&gt;In Haskell, a monoid is a type with a rule for how two elements of that type can be combined to make another element of the same type. To be a monoid there also needs to be an element that you can think of as representing 'nothing' in the sense that when it's combined with other elements it leaves the other element unchanged.&lt;br /&gt;&lt;br /&gt;A great example is lists. Given two lists, say &lt;tt&gt;[1,2]&lt;/tt&gt; and &lt;tt&gt;[3,4]&lt;/tt&gt;, you can join them together using &lt;tt&gt;++&lt;/tt&gt; to get &lt;tt&gt;[1,2,3,4]&lt;/tt&gt;. There's also the empty list &lt;tt&gt;[]&lt;/tt&gt;. Using &lt;tt&gt;++&lt;/tt&gt; to combine &lt;tt&gt;[]&lt;/tt&gt; with any list gives you back the same list, for example &lt;tt&gt;[]++[1,2,3,4]==[1,2,3,4]&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;Another example is the type of integers, &lt;tt&gt;Integer&lt;/tt&gt;. Given two elements, say 3 and 4, we can combine them with + to get 7. 
We also have the element 0 which when added to any other integer leaves it unchanged.&lt;br /&gt;&lt;br /&gt;So here is a possible definition for the monoid type class:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;class Monoid m where&lt;br /&gt;    mappend :: m -&gt; m -&gt; m&lt;br /&gt;    mempty :: m&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The function mappend is the function we use to combine pairs of elements, and mempty is the 'nothing' element. We can make lists an instance like this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;instance Monoid [a] where&lt;br /&gt;    mappend = (++)&lt;br /&gt;    mempty = []&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Because we want mempty to do nothing when combined with other elements we also require monoids to obey these two rules&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;a `mappend` mempty = a&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;and&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;mempty `mappend` a = a.&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Notice how there are two ways to combine a and b using mappend. We can write &lt;tt&gt;a `mappend` b&lt;/tt&gt; or &lt;tt&gt;b `mappend` a&lt;/tt&gt;. There is no requirement on a monoid that these be equal to each other. (But see below.) But there is another property that monoids are required to have. Suppose we start with the list &lt;tt&gt;[3,4]&lt;/tt&gt;. And now suppose we want to concatenate it with &lt;tt&gt;[1,2]&lt;/tt&gt; on the left and &lt;tt&gt;[5,6]&lt;/tt&gt; on the right. We could do the left concatenation first to get &lt;tt&gt;[1,2]++[3,4]&lt;/tt&gt; and then form &lt;tt&gt;([1,2]++[3,4])++[5,6]&lt;/tt&gt;. But we could do the right one first and get &lt;tt&gt;[1,2]++([3,4]++[5,6])&lt;/tt&gt;. Because we're concatenating at opposite ends the two operations don't interfere and it doesn't matter which we do first. 
This gives rise to the third and last requirement we have of monoids:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;(a `mappend` b) `mappend` c == a `mappend` (b `mappend` c)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;and you can summarise it with the slogan 'combining on the left doesn't interfere with combining on the right'. Notice how the integers, combined with +, also have this property. It's such a useful property it has a name: associativity.&lt;br /&gt;&lt;br /&gt;That's a complete specification of what a monoid is. Haskell doesn't enforce the three laws I've given, but anyone reading code using a monoid will expect these laws to hold.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Some Uses of Monoids&lt;/H3&gt;&lt;br /&gt;But given that we already have individual functions like &lt;tt&gt;++&lt;/tt&gt; and &lt;tt&gt;+&lt;/tt&gt;, why would we ever want to use mappend instead?&lt;br /&gt;&lt;br /&gt;One reason is that with a monoid we get another function called mconcat for free. mconcat takes a list of values in a monoid and combines them all together. For example &lt;tt&gt;mconcat [a,b,c]&lt;/tt&gt; is equal to &lt;tt&gt;a `mappend` (b `mappend` c)&lt;/tt&gt;. Any time you have a monoid you have this quick and easy way to combine a whole list together. But note that there is some ambiguity in the idea behind &lt;tt&gt;mconcat&lt;/tt&gt;. To compute &lt;tt&gt;mconcat [a,b,...,c,d]&lt;/tt&gt; which order should we work in? Should we work from left to right and compute &lt;tt&gt;a `mappend` b&lt;/tt&gt; first? Or should we start with &lt;tt&gt;c `mappend` d&lt;/tt&gt;? That's one place where the associativity law comes in: it makes no difference.&lt;br /&gt;&lt;br /&gt;Another place where you might want to use a monoid is in code that is agnostic about how you want to combine elements. 
Just as mconcat works with any monoid, you might want to write your own code that works with any monoid.&lt;br /&gt;&lt;br /&gt;Explicitly using the Monoid type class for a function also tells the reader of your code what your intentions are. If a function has signature &lt;tt&gt;[a] -&amp;gt; b&lt;/tt&gt; you know it takes a list and constructs an object of type b from it. But it has considerable freedom in what it can do with your list. But if you see a function of type &lt;tt&gt;(Monoid a) =&amp;gt; a -&amp;gt; b&lt;/tt&gt;, even if it is only used with lists, we know what kind of things the function will do with the list. For example, we know that the function might add things to your list, but it's never going to pull any elements out of your list.&lt;br /&gt;&lt;br /&gt;The same type can give rise to a monoid in different ways. For example, I've already mentioned that the integers form a monoid. So we could define:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;instance Monoid Integer where&lt;br /&gt;    mappend = (+)&lt;br /&gt;    mempty = 0&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;But there's a good reason not to do that: there's another natural way to make integers into a monoid:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;instance Monoid Integer where&lt;br /&gt;    mappend = (*)&lt;br /&gt;    mempty = 1&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can't have both of these definitions at the same time. So the Data.Monoid library doesn't make Integer into a Monoid directly. Instead, it wraps them with Sum and Product. It also does so more generally so that you can make any Num type into a monoid in two different ways. We have both&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Num a =&gt; Monoid (Sum a)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;and&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Num a =&gt; Monoid (Product a)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;To use these we wrap our values in the appropriate wrapper and we can then use the monoid functions. 
For example &lt;tt&gt;mconcat [Sum 2,Sum 3,Sum 4]&lt;/tt&gt; is &lt;tt&gt;Sum 9&lt;/tt&gt;, but &lt;tt&gt;mconcat [Product 2,Product 3,Product 4]&lt;/tt&gt; is &lt;tt&gt;Product 24&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;Using &lt;tt&gt;Sum&lt;/tt&gt; and &lt;tt&gt;Product&lt;/tt&gt; looks like a complicated way to do ordinary addition and multiplication. Why do things that way?&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;The Writer Monad&lt;/H3&gt;&lt;br /&gt;You can think of monoids as being accumulators. Given a running total, n, we can add in a new value a to get a new running total n' = n `mappend` a. Accumulating totals is a very common design pattern in real code so it's useful to abstract this idea. This is exactly what the Writer monad allows. We can write monadic code that accumulates values as a "side effect". The function to perform the accumulation is (somewhat confusingly) called &lt;tt&gt;tell&lt;/tt&gt;. Here's an example where we're logging a trace of what we're doing.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; import Data.Monoid&lt;br /&gt;&amp;gt; import Data.Foldable&lt;br /&gt;&amp;gt; import Control.Monad.Writer&lt;br /&gt;&amp;gt; import Control.Monad.State&lt;br /&gt;&lt;br /&gt;&amp;gt; fact1 :: Integer -&amp;gt; Writer String Integer&lt;br /&gt;&amp;gt; fact1 0 = return 1&lt;br /&gt;&amp;gt; fact1 n = do&lt;br /&gt;&amp;gt;   let n' = n-1&lt;br /&gt;&amp;gt;   tell $ "We've taken one away from " ++ show n ++ "\n"&lt;br /&gt;&amp;gt;   m &amp;lt;- fact1 n'&lt;br /&gt;&amp;gt;   tell $ "We've called f " ++ show m ++ "\n"&lt;br /&gt;&amp;gt;   let r = n*m&lt;br /&gt;&amp;gt;   tell $ "We've multiplied " ++ show n ++ " and " ++ show m ++ "\n"&lt;br /&gt;&amp;gt;   return r&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This is an implementation of the factorial function that tells us what it did. Each time we call &lt;tt&gt;tell&lt;/tt&gt; we combine its argument with the running log of all of the strings that we've 'told' so far. 
We use &lt;tt&gt;runWriter&lt;/tt&gt; to extract the results back out. If we run&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex1 = runWriter (fact1 10)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;we get back both 10! and a list of what it took to compute this.&lt;br /&gt;&lt;br /&gt;But Writer allows us to accumulate more than just strings. We can use it with any monoid. For example, we can use it to count how many multiplications and subtractions were required to compute a given factorial. To do this we simply tell a value of the appropriate type. In this case we want to add values, and the monoid for addition is Sum. So instead we could implement:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; fact2 :: Integer -&amp;gt; Writer (Sum Integer) Integer&lt;br /&gt;&amp;gt; fact2 0 = return 1&lt;br /&gt;&amp;gt; fact2 n = do&lt;br /&gt;&amp;gt;   let n' = n-1&lt;br /&gt;&amp;gt;   tell $ Sum 1&lt;br /&gt;&amp;gt;   m &amp;lt;- fact2 n'&lt;br /&gt;&amp;gt;   let r = n*m&lt;br /&gt;&amp;gt;   tell $ Sum 1&lt;br /&gt;&amp;gt;   return r&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;    &lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex2 = runWriter (fact2 10)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;There's another way we could have written this, using the state monad:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; fact3 :: Integer -&amp;gt; State Integer Integer&lt;br /&gt;&amp;gt; fact3 0 = return 1&lt;br /&gt;&amp;gt; fact3 n = do&lt;br /&gt;&amp;gt;   let n' = n-1&lt;br /&gt;&amp;gt;   modify (+1)&lt;br /&gt;&amp;gt;   m &amp;lt;- fact3 n'&lt;br /&gt;&amp;gt;   let r = n*m&lt;br /&gt;&amp;gt;   modify (+1)&lt;br /&gt;&amp;gt;   return r&lt;br /&gt;&lt;br /&gt;&amp;gt; ex3 = runState (fact3 10) 0&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It works just as well, but there is a big advantage to using the &lt;tt&gt;Writer&lt;/tt&gt; version. It has type signature &lt;tt&gt;f :: Integer -&amp;gt; Writer (Sum Integer) Integer&lt;/tt&gt;. 
We can immediately read from this that our function has a side effect that involves accumulating a number in a purely additive way. It's never going to, for example, multiply the accumulated value. The type information tells us a lot about what is going on inside the function without us having to read a single line of the implementation. The version written with &lt;tt&gt;State&lt;/tt&gt; is free to do whatever it likes with the accumulated value and so it's harder to discern its purpose.&lt;br /&gt;&lt;br /&gt;Data.Monoid also provides an Any monoid. This is the Bool type with the disjunction operator, better known as ||. The idea behind the name is that if you combine together any collection of elements of type &lt;tt&gt;Any&lt;/tt&gt; then the result is &lt;tt&gt;Any True&lt;/tt&gt; precisely when at least any one of the original elements is &lt;tt&gt;Any True&lt;/tt&gt;. If we think of these values as accumulators then they provide a kind of one way switch. We start accumulating with mempty, ie. &lt;tt&gt;Any False&lt;/tt&gt;, and we can think of this as being the switch being off. Any time we accumulate Any True into our running 'total' the switch is turned on. This switch can never be switched off again by accumulating any more values. 
This models a pattern we often see in code: a flag that we want to switch on, as a side effect, if a certain condition is met at any point.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; fact4 :: Integer -&amp;gt; Writer Any Integer&lt;br /&gt;&amp;gt; fact4 0 = return 1&lt;br /&gt;&amp;gt; fact4 n = do&lt;br /&gt;&amp;gt;   let n' = n-1&lt;br /&gt;&amp;gt;   m &amp;lt;- fact4 n'&lt;br /&gt;&amp;gt;   let r = n*m&lt;br /&gt;&amp;gt;   tell (Any (r==120))&lt;br /&gt;&amp;gt;   return r&lt;br /&gt;&lt;br /&gt;&amp;gt; ex4 = runWriter (fact4 10)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;At the end of our calculation we get n!, but we are also told if at any stage in the calculation two numbers were multiplied to give 120. We can almost read the tell line as if it were English: "tell my caller if any value of r is ever 120". Not only do we get the plumbing for this flag with a minimal amount of code, but if we look at the type for this version of f it tells us exactly what's going on. We can read off immediately that this function, as a "side effect", computes a flag that can be turned on, but never turned off. That's a lot of useful information from just a type signature. In many other programming languages we might expect to see a boolean in the type signature, but we'd be forced to read the code to get any idea of how it will be used.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Commutative Monoids, Non-Commutative Monoids and Dual Monoids&lt;/H3&gt;&lt;br /&gt;&lt;br /&gt;Two elements of a monoid, x and y, are said to commute if &lt;tt&gt;x `mappend` y == y `mappend` x&lt;/tt&gt;. The monoid itself is said to be commutative if all of its elements commute with each other. A good example of a commutative monoid is the type of integers. For any pair of integers, &lt;tt&gt;a+b==b+a&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;If a monoid isn't commutative, it's said to be non-commutative. 
If it's non-commutative it means that for some x and y, &lt;tt&gt;x `mappend` y&lt;/tt&gt; isn't the same as &lt;tt&gt;y `mappend` x&lt;/tt&gt;, so &lt;tt&gt;mappend&lt;/tt&gt; and &lt;tt&gt;flip mappend&lt;/tt&gt; are not the same function. For example &lt;tt&gt;[1,2] ++ [3,4]&lt;/tt&gt; is different from &lt;tt&gt;[3,4] ++ [1,2]&lt;/tt&gt;. This has the interesting consequence that we can make another monoid in which the combination function is &lt;tt&gt;flip mappend&lt;/tt&gt;. We can still use the same &lt;tt&gt;mempty&lt;/tt&gt; element, so the first two monoid laws hold. Additionally, it's a nice exercise to prove that the third monoid law still holds. This flipped monoid is called the dual monoid and Data.Monoid provides the &lt;tt&gt;Dual&lt;/tt&gt; type constructor to build the dual of a monoid. We can use this to reverse the order in which the writer monad accumulates values. For example the following code collects the execution trace in reverse order:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; fact5 :: Integer -&amp;gt; Writer (Dual String) Integer&lt;br /&gt;&amp;gt; fact5 0 = return 1&lt;br /&gt;&amp;gt; fact5 n = do&lt;br /&gt;&amp;gt;   let n' = n-1&lt;br /&gt;&amp;gt;   tell $ Dual $ "We've taken one away from " ++ show n ++ "\n"&lt;br /&gt;&amp;gt;   m &amp;lt;- fact5 n'&lt;br /&gt;&amp;gt;   tell $ Dual $ "We've called f " ++ show m ++ "\n"&lt;br /&gt;&amp;gt;   let r = n*m&lt;br /&gt;&amp;gt;   tell $ Dual $ "We've multiplied " ++ show n ++ " and " ++ show m ++ "\n"&lt;br /&gt;&amp;gt;   return r&lt;br /&gt;&lt;br /&gt;&amp;gt; ex5 = runWriter (fact5 10)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;H3&gt;The Product Monoid&lt;/H3&gt;&lt;br /&gt;&lt;br /&gt;Suppose we want to accumulate two side effects at the same time. For example, maybe we want to both count instructions and leave a readable trace of our computation. We could use monad transformers to combine two writer monads. 
But there is a slightly easier way - we can combine two monoids into one 'product' monoid. It's defined like this:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;instance (Monoid a,Monoid b) =&gt; Monoid (a,b) where&lt;br /&gt;    mempty = (mempty,mempty)&lt;br /&gt;    mappend (u,v) (w,x) = (u `mappend` w,v `mappend` x)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Each time we use mappend on the product we actually perform a pair of mappends on each of the elements of the pair. With these small helper functions:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; tellFst a = tell $ (a,mempty)&lt;br /&gt;&amp;gt; tellSnd b = tell $ (mempty,b)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;we can now use two monoids simultaneously:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; fact6 :: Integer -&amp;gt; Writer (String,Sum Integer) Integer&lt;br /&gt;&amp;gt; fact6 0 = return 1&lt;br /&gt;&amp;gt; fact6 n = do&lt;br /&gt;&amp;gt;   let n' = n-1&lt;br /&gt;&amp;gt;   tellSnd (Sum 1)&lt;br /&gt;&amp;gt;   tellFst $ "We've taken one away from " ++ show n ++ "\n"&lt;br /&gt;&amp;gt;   m &amp;lt;- fact6 n'&lt;br /&gt;&amp;gt;   let r = n*m&lt;br /&gt;&amp;gt;   tellSnd (Sum 1)&lt;br /&gt;&amp;gt;   tellFst $ "We've multiplied " ++ show n ++ " and " ++ show m ++ "\n"&lt;br /&gt;&amp;gt;   return r&lt;br /&gt;&lt;br /&gt;&amp;gt; ex6 = runWriter (fact6 5)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;If we had simply implemented our code using one specific monoid, like lists, our code would be very limited in its application. But by using the general &lt;tt&gt;Monoid&lt;/tt&gt; type class we ensure that users of our code can use not just any individual monoid, but even multiple monoids. This can make for more efficient code because it means we can perform multiple accumulations while traversing a data structure once. 
And yet we still ensure readability because our code is written using the interface to a single monoid, making our algorithms simpler to read.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Foldable Data&lt;/H3&gt;&lt;br /&gt;&lt;br /&gt;One last application to mention is the Data.Foldable library. This provides a generic approach to walking through a data structure, accumulating values as we go. The &lt;tt&gt;foldMap&lt;/tt&gt; function applies a function to each element of our structure and then accumulates the return values of each of these applications. An implementation of &lt;tt&gt;foldMap&lt;/tt&gt; for a tree structure might be:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Tree a = Empty | Leaf a | Node (Tree a) a (Tree a)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Foldable Tree where&lt;br /&gt;&amp;gt;   foldMap f Empty = mempty&lt;br /&gt;&amp;gt;   foldMap f (Leaf x) = f x&lt;br /&gt;&amp;gt;   foldMap f (Node l k r) = foldMap f l `mappend` f k `mappend` foldMap f r&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;    &lt;br /&gt;We can now use any of the monoids discussed above to compute properties of our trees. For example, we can use the function &lt;tt&gt;(== 1)&lt;/tt&gt; to test whether each element is equal to 1 and then use the Any monoid to find out if any element of the tree is equal to 1. Here are a pair of examples: one to compute whether or not an element is equal to 1, and another to test if every element is greater than 5:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; tree = Node (Leaf 1) 7 (Leaf 2)&lt;br /&gt;&lt;br /&gt;&amp;gt; ex7 = foldMap (Any . (== 1)) tree&lt;br /&gt;&amp;gt; ex8 = foldMap (All . 
(&amp;gt; 5)) tree&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note, of course, that these expressions can be used, unmodified, with any foldable type, not just trees.&lt;br /&gt;&lt;br /&gt;I hope you agree that this expresses our intentions in a way that is easy to read.&lt;br /&gt;&lt;br /&gt;That suggests another exercise: write something similar to find the minimum or maximum element in a tree. You may need to construct a new monoid along the lines of &lt;tt&gt;Any&lt;/tt&gt; and &lt;tt&gt;All&lt;/tt&gt;. Try finding both in one traversal of the tree using the product monoid.&lt;br /&gt;&lt;br /&gt;The foldable example also illustrates another point. The implementor of &lt;tt&gt;foldMap&lt;/tt&gt; for the tree doesn't need to worry about whether the left tree should be combined with the central element before the right tree. Associativity means it can be implemented either way and give the same results.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Recap&lt;/H3&gt;&lt;br /&gt;Monoids provide a general approach to combining and accumulating values. They allow us to write code that is agnostic about the method we will use to combine values, and that makes our code more reusable. By using named monoids we can write type signatures that express our intentions to people reading our code: for example by using Any instead of Bool we make it clear just how our boolean value is to be used. And we can combine the monoid-based building blocks provided by Haskell libraries to build useful and readable algorithms with a minimum of effort.&lt;br /&gt;&lt;br /&gt;Some final notes: mathematicians often refer to mappend as a 'binary operator' and often it's called 'multiplication'. Just like in ordinary algebra, it's often also written with abuttal or using the star operator, ie. ab and a*b might both represent &lt;tt&gt;a `mappend` b&lt;/tt&gt;. You can read more about monoids at &lt;a href="http://en.wikipedia.org/wiki/Monoid"&gt;Wikipedia&lt;/a&gt;. 
And I wish I had time to talk about monoid morphisms, and why the list monoid is free (and what consequences that might have for how you can write your code), and how compositing gives you &lt;a href="http://en.wikipedia.org/wiki/Alpha_compositing"&gt;monoids&lt;/a&gt; and a whole lot more.&lt;br /&gt;&lt;HR&gt;&lt;br /&gt;&lt;iframe src="http://rcm.amazon.com/e/cm?t=sigfpe-20&amp;o=1&amp;p=8&amp;l=as1&amp;asins=0262660717&amp;fc1=000000&amp;IS2=1&amp;lt1=_blank&amp;m=amazon&amp;lc1=0000FF&amp;bc1=000000&amp;bg1=FFFFFF&amp;f=ifr" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"&gt;&lt;/iframe&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-7531218329312814569?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/7531218329312814569/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=7531218329312814569' title='31 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7531218329312814569'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7531218329312814569'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/01/haskell-monoids-and-their-uses.html' title='Haskell Monoids and their Uses'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total 
xmlns:thr='http://purl.org/syndication/thread/1.0'>31</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-4935385668423688429</id><published>2009-01-02T14:24:00.000-08:00</published><updated>2009-01-02T18:00:09.206-08:00</updated><title type='text'>Rewriting Monadic Expressions with Template Haskell</title><content type='html'>The goal today is to implement an impossible Haskell function. But as this is a literate Haskell post we need to get some boilerplate out of the way:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; module Test where&lt;br /&gt;&lt;br /&gt;&amp;gt; import Language.Haskell.TH&lt;br /&gt;&amp;gt; import Control.Monad&lt;br /&gt;&amp;gt; import Control.Monad.Reader&lt;br /&gt;&amp;gt; import Control.Monad.Cont&lt;br /&gt;&amp;gt; import IO&lt;br /&gt;&lt;br /&gt;&amp;gt; infixl 1 #&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So back to the impossible function: all monads &lt;tt&gt;m&lt;/tt&gt; come equipped with a function of type &lt;tt&gt;a -&amp;gt; m a&lt;/tt&gt;. But it's well known that you can't "extract elements back out of the monad" because there is no function of type &lt;tt&gt;m a -&amp;gt; a&lt;/tt&gt;. So my goal today will be to write such a function.  Clearly the first line of code ought to be:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; extract :: Monad m =&amp;gt; m a -&amp;gt; a&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The rest of the post will fill in the details.&lt;br /&gt;&lt;br /&gt;The type declaration tells us what role it plays syntactically, ie. it tells us how we can write code with &lt;tt&gt;extract&lt;/tt&gt; that type checks. But what should the semantics be?&lt;br /&gt;&lt;br /&gt;For the IO monad an answer is easy to guess. &lt;tt&gt;1 + extract (readLn :: IO Int)&lt;/tt&gt; should execute &lt;tt&gt;readLn&lt;/tt&gt; by reading an integer from stdin, strip off the &lt;tt&gt;IO&lt;/tt&gt; part of the return value and then add 1 to the result. 
In fact, Haskell already has a function that does exactly that, &lt;tt&gt;unsafePerformIO&lt;/tt&gt;. The goal here is to implement &lt;tt&gt;extract&lt;/tt&gt; in a way that works with any monad.&lt;br /&gt;&lt;br /&gt;What might we expect the value of &lt;tt&gt;1 + extract [1,2,3]&lt;/tt&gt; to be? The value of &lt;tt&gt;extract [1,2,3]&lt;/tt&gt; surely must be 1, 2 or 3. But which one? And what happens if the list is empty? There really isn't any way of answering this while remaining *purely* functional. But if we were running code on a suitable machine we could fork three threads returning one of 1, 2 and 3 in each thread, and then collecting the results together in a list. In other words, we'd expect the final result to be &lt;tt&gt;[2,3,4]&lt;/tt&gt;. This would make &lt;tt&gt;extract&lt;/tt&gt; a lot like McCarthy's &lt;a href="http://www.randomhacks.net/articles/2005/10/11/amb-operator"&gt;ambiguous operator&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So it's clear that we can't interpret &lt;tt&gt;extract&lt;/tt&gt; as a pure function. But we could try implementing it on a new abstract machine. But there's another approach we could take. A couple of times I've talked about a function of type &lt;tt&gt;~~a -&amp;gt; a&lt;/tt&gt; where &lt;tt&gt;~a&lt;/tt&gt; is the type &lt;tt&gt;a -&amp;gt; Void&lt;/tt&gt; and &lt;tt&gt;Void&lt;/tt&gt; is the type with no elements. This corresponds to double negation elimination in classical logic. The Curry-Howard isomorphism tells us no such function can be implemented in a pure functional language, but we can translate expressions containing references to such a function into expressions that are completely pure. This is the so-called CPS translation. Anyway, I had this idea that we could do something similar with monads so that we could translate expressions containing &lt;tt&gt;extract&lt;/tt&gt; into ordinary Haskell code. 
Turns out there's already a paper on how to do this: &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.8213"&gt;Representing Monads&lt;/a&gt; by Andrzej Filinski.&lt;br /&gt;&lt;br /&gt;To translate all of Haskell this way would be a messy business. But just for fun I thought I'd implement a simpler translation for a small subset of Haskell. It's simply this:&lt;br /&gt;&lt;br /&gt;For a choice of monad m denote the translation of both types and values by T. So x::a becomes T(x)::T(a). The translation is simply:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;T(a) = m a on types&lt;br /&gt;T(f x) = T(f) `ap` T(x)&lt;br /&gt;T(extract x) = join T(x)&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;The important thing is that &lt;tt&gt;extract&lt;/tt&gt; of type &lt;tt&gt;m a -&amp;gt; a&lt;/tt&gt; is replaced by &lt;tt&gt;join&lt;/tt&gt; of type &lt;tt&gt;m (m a) -&amp;gt; m a&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;A great way to translate Haskell code is with Template Haskell. So here's some code:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; (#) x y = liftM2 AppE x y&lt;br /&gt;&lt;br /&gt;&amp;gt; rewrite :: Exp -&amp;gt; ExpQ&lt;br /&gt;&lt;br /&gt;&amp;gt; rewrite (AppE f x) = do&lt;br /&gt;&amp;gt;     e &amp;lt;- [| extract |]&lt;br /&gt;&amp;gt;     if f==e&lt;br /&gt;&amp;gt;         then [| join |] # rewrite x&lt;br /&gt;&amp;gt;         else [| ap |] # rewrite f # rewrite x&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Most of the rest is support for some forms of syntactic sugar. 
First the infix operators:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; rewrite (InfixE (Just x) f Nothing) =&lt;br /&gt;&amp;gt;     [| fmap |] # return f # rewrite x&lt;br /&gt;&lt;br /&gt;&amp;gt; rewrite (InfixE (Just x) f (Just y)) =&lt;br /&gt;&amp;gt;     [| liftM2 |] # return f # rewrite x # rewrite y&lt;br /&gt;&lt;br /&gt;&amp;gt; rewrite (InfixE Nothing f (Just y)) =&lt;br /&gt;&amp;gt;     [| fmap |] # ([| flip |] # return f) # rewrite y&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And list operations. For example &lt;tt&gt;[a,b,c]&lt;/tt&gt; is sugar for &lt;tt&gt;a : b : c : []&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; rewrite (ListE []) = [| return [] |]&lt;br /&gt;&lt;br /&gt;&amp;gt; rewrite (ListE (x:xs)) =&lt;br /&gt;&amp;gt;     [| liftM2 |] # [| (:) |] # rewrite x # rewrite (ListE xs)&lt;br /&gt;&lt;br /&gt;&amp;gt; rewrite x =&lt;br /&gt;&amp;gt;     [| return |] # return x&lt;br /&gt;&lt;br /&gt;&amp;gt; test :: ExpQ -&amp;gt; ExpQ&lt;br /&gt;&amp;gt; test = (&amp;gt;&amp;gt;= rewrite)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;tt&gt;extract&lt;/tt&gt; itself is just a placeholder that is supposed to be translated away:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; extract = undefined&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;If the above is placed in a file called Test.lhs then you can try compiling the following code.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;{-# LANGUAGE TemplateHaskell #-}&lt;br /&gt;&lt;br /&gt;import Test&lt;br /&gt;import Control.Monad&lt;br /&gt;import Control.Monad.Reader&lt;br /&gt;import Control.Monad.Cont&lt;br /&gt;&lt;br /&gt;ex1 = $(test [|&lt;br /&gt;        extract [1,2] + extract [10,20]&lt;br /&gt;    |])&lt;br /&gt;ex2 = runReader $(test [|&lt;br /&gt;        extract ask + 7&lt;br /&gt;    |]) 10&lt;br /&gt;ex3 = $(test [|&lt;br /&gt;        1+extract (readLn :: IO Int)&lt;br /&gt;    |] )&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The big 
omission is the lack of translation for lambda abstractions. I think that to get this right might require translating all of the code using &lt;tt&gt;extract&lt;/tt&gt; from the ground up, not just isolated expressions like those above. And like with CPS, you lose referential transparency and the order in which expressions are evaluated makes a difference.&lt;br /&gt;&lt;br /&gt;Anyway, this is a partial answer to the question posed &lt;a href="http://www.haskell.org/pipermail/haskell-cafe/2006-December/020295.html"&gt;here&lt;/a&gt; on "automonadization".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-4935385668423688429?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/4935385668423688429/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=4935385668423688429' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/4935385668423688429'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/4935385668423688429'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2009/01/rewriting-monadic-expressions-with.html' title='Rewriting Monadic Expressions with Template Haskell'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-4013221249502217550</id><published>2008-12-24T13:19:00.001-08:00</published><updated>2008-12-24T17:01:20.986-08:00</updated><title type='text'>The 
Mother of all Monads</title><content type='html'>Suppose someone stole all the monads but one, which monad would you want it to be? If you're a Haskell programmer you wouldn't be too bothered, you could just roll your own monads using nothing more than functions.&lt;br /&gt;&lt;br /&gt;But suppose someone stole do-notation leaving you with a version that only supported one type of monad. Which one would you choose? Rolling your own Haskell syntax is hard so you really want to choose wisely. Is there a universal monad that encompasses the functionality of all other monads?&lt;br /&gt;&lt;br /&gt;I often find I learn more computer science by trying to decode random isolated sentences than from reading entire papers. About a year ago I must have skimmed this &lt;a href="http://sneezy.cs.nott.ac.uk/fplunch/weblog/?m=200712"&gt;post&lt;/a&gt; because the line "the continuation monad is in some sense the mother of all monads" became stuck in my head. So maybe &lt;tt&gt;Cont&lt;/tt&gt; is the monad we should choose. This post is my investigation of why exactly it's the best choice. Along the way I'll also try to give some insight into how you can make practical use of the continuation monad. I'm deliberately going to avoid discussing the underlying mechanism that makes continuations work.&lt;br /&gt;&lt;br /&gt;So let's start with this simple piece of code&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; import Control.Monad.Cont&lt;br /&gt;&lt;br /&gt;&amp;gt; ex1 = do&lt;br /&gt;&amp;gt;   a &amp;lt;- return 1&lt;br /&gt;&amp;gt;   b &amp;lt;- return 10&lt;br /&gt;&amp;gt;   return $ a+b&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I haven't specified the monad but in almost every case we'd expect the result to have something to do with the number 11. For the list monad we get &lt;tt&gt;[11]&lt;/tt&gt;, for the &lt;tt&gt;Maybe&lt;/tt&gt; monad we get &lt;tt&gt;Just 11&lt;/tt&gt; and so on. 
For the &lt;tt&gt;Cont&lt;/tt&gt; monad we get something that takes a function, and applies it to 11. Here's an example of its use:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test1 = runCont ex1 show&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;tt&gt;ex1&lt;/tt&gt; is just a function that takes as argument &lt;tt&gt;show&lt;/tt&gt; and applies it to 11 to give the string &lt;tt&gt;"11"&lt;/tt&gt;. &lt;tt&gt;Cont&lt;/tt&gt; and &lt;tt&gt;runCont&lt;/tt&gt; are just wrapping and unwrapping functions that we can mostly ignore.&lt;br /&gt;&lt;br /&gt;We could have done that without continuations. So what exactly does the &lt;tt&gt;Cont&lt;/tt&gt; monad give us here? Well let's make a 'hole' in the code above:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SVKnlvBBkiI/AAAAAAAAARQ/ovyCuO3I3u0/s1600-h/hole1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 270px; height: 196px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SVKnlvBBkiI/AAAAAAAAARQ/ovyCuO3I3u0/s400/hole1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5283469579576775202" /&gt;&lt;/a&gt;&lt;br /&gt;Whatever integer we place in the hole, the value of &lt;tt&gt;test1&lt;/tt&gt; will be the result of adding one and applying &lt;tt&gt;show&lt;/tt&gt;. So we can think of that picture as being a function whose argument we shove in the hole. Now Haskell is a functional programming language so we expect that we can somehow reify that function and get our hands on it. That's exactly what the continuation monad &lt;tt&gt;Cont&lt;/tt&gt; does. Let's call the function we're talking about by the name &lt;tt&gt;fred&lt;/tt&gt;. How can we get our hands on it? 
It's with this piece of code:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;ex1 = do&lt;br /&gt;  a &amp;lt;- return 1&lt;br /&gt;  b &amp;lt;- Cont (\fred -&amp;gt; ...)&lt;br /&gt;  return $ a+b&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The &lt;tt&gt;...&lt;/tt&gt; is a context in which &lt;tt&gt;fred&lt;/tt&gt; represents "the entire surrounding computation". Such a computation is known as a "continuation". It's a bit hard to get your head around but the &lt;tt&gt;Cont&lt;/tt&gt; monad allows you to write subexpressions that are able to "capture" the entirety of the code around them, as far as the function provided to &lt;tt&gt;runCont&lt;/tt&gt;. To show that this is the case let's apply &lt;tt&gt;fred&lt;/tt&gt; to the number 10:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex2 = do&lt;br /&gt;&amp;gt;   a &amp;lt;- return 1&lt;br /&gt;&amp;gt;   b &amp;lt;- Cont (\fred -&amp;gt; fred 10)&lt;br /&gt;&amp;gt;   return $ a+b&lt;br /&gt;&lt;br /&gt;&amp;gt; test2 = runCont ex2 show&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The entire computation is applied to 10 and we get &lt;tt&gt;"11"&lt;/tt&gt;. Now you know what &lt;tt&gt;return&lt;/tt&gt; does in this monad. But that's a convoluted way of doing things. What other advantages do we get? Well the expression for &lt;tt&gt;b&lt;/tt&gt; can do whatever it wants with &lt;tt&gt;fred&lt;/tt&gt; as long as it returns the same type, i.e. a string. So we can write this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex3 = do&lt;br /&gt;&amp;gt;   a &amp;lt;- return 1&lt;br /&gt;&amp;gt;   b &amp;lt;- Cont (\fred -&amp;gt; "escape")&lt;br /&gt;&amp;gt;   return $ a+b&lt;br /&gt;&lt;br /&gt;&amp;gt; test3 = runCont ex3 show&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;tt&gt;fred&lt;/tt&gt; is completely ignored. The entire computation is thrown away and instead of applying &lt;tt&gt;show&lt;/tt&gt; to a number, we simply return &lt;tt&gt;"escape"&lt;/tt&gt;. 
In other words, we have a mechanism for throwing values out of a computation. So continuations provide, among other things, an exception handling mechanism. But that's curious, because that's exactly what the &lt;tt&gt;Maybe&lt;/tt&gt; monad provides. It looks like we might be able to simulate &lt;tt&gt;Maybe&lt;/tt&gt; this way. But rather than do that, let's do something even more radical.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex4 = do&lt;br /&gt;&amp;gt;   a &amp;lt;- return 1&lt;br /&gt;&amp;gt;   b &amp;lt;- Cont (\fred -&amp;gt; fred 10 ++ fred 20)&lt;br /&gt;&amp;gt;   return $ a+b&lt;br /&gt;&lt;br /&gt;&amp;gt; test4 = runCont ex4 show&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We've used &lt;tt&gt;fred&lt;/tt&gt; twice. We've made the code around our "hole" run twice, each time executing with a different starting value. Continuations allow mere subexpressions to take complete control of the expressions within which they lie. That should remind you of something. It's just like the list monad. The above code is a lot like&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test5 = do&lt;br /&gt;&amp;gt;   a &amp;lt;- return 1&lt;br /&gt;&amp;gt;   b &amp;lt;- [10,20]&lt;br /&gt;&amp;gt;   return $ a+b&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So can we emulate the list monad? Well instead of converting our integer to a string at the end we want to convert it to a list. 
So this will work:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex6 = do&lt;br /&gt;&amp;gt;   a &amp;lt;- return 1&lt;br /&gt;&amp;gt;   b &amp;lt;- Cont (\fred -&amp;gt; fred 10 ++ fred 20)&lt;br /&gt;&amp;gt;   return $ a+b&lt;br /&gt;&lt;br /&gt;&amp;gt; test6 = runCont ex6 (\x -&amp;gt; [x])&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can avoid those &lt;tt&gt;++&lt;/tt&gt; operators by using &lt;tt&gt;concat&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex7 = do&lt;br /&gt;&amp;gt;   a &amp;lt;- return 1&lt;br /&gt;&amp;gt;   b &amp;lt;- Cont (\fred -&amp;gt; concat [fred 10,fred 20])&lt;br /&gt;&amp;gt;   return $ a+b&lt;br /&gt;&lt;br /&gt;&amp;gt; test7 = runCont ex7 (\x -&amp;gt; [x])&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;But now you may notice we can remove almost every dependence on the list type to get:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex8 = do&lt;br /&gt;&amp;gt;   a &amp;lt;- return 1&lt;br /&gt;&amp;gt;   b &amp;lt;- Cont (\fred -&amp;gt; [10,20] &amp;gt;&amp;gt;= fred)&lt;br /&gt;&amp;gt;   return $ a+b&lt;br /&gt;&lt;br /&gt;&amp;gt; test8 = runCont ex8 return&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note, we're using monad related functions, but when we do so we're not using do-notation. We can now do one last thing to tidy this up:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; i x = Cont (\fred -&amp;gt; x &amp;gt;&amp;gt;= fred)&lt;br /&gt;&amp;gt; run m = runCont m return&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And now we have something close to do-notation for the list monad at our disposal again:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test9 = run $ do&lt;br /&gt;&amp;gt;   a &amp;lt;- i [1,2]&lt;br /&gt;&amp;gt;   b &amp;lt;- i [10,20]&lt;br /&gt;&amp;gt;   return $ a+b&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I hope you can see how this works. 
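&lt;br /&gt;&lt;br /&gt;An aside for anyone trying this with a recent mtl: as far as I know, &lt;tt&gt;Cont&lt;/tt&gt; is no longer a data constructor there and the lowercase function &lt;tt&gt;cont&lt;/tt&gt; plays that role instead. Here's a sketch of the same idea under that assumption. The primed names &lt;tt&gt;i'&lt;/tt&gt;, &lt;tt&gt;run'&lt;/tt&gt; and &lt;tt&gt;test9'&lt;/tt&gt; and the type signatures are additions for illustration, not part of the literate code in this post:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;import Control.Monad.Cont (Cont, cont, runCont)&lt;br /&gt;&lt;br /&gt;-- Embed a monadic value into Cont by handing the continuation to &amp;gt;&amp;gt;=.&lt;br /&gt;i' :: Monad m =&amp;gt; m a -&amp;gt; Cont (m r) a&lt;br /&gt;i' x = cont (\fred -&amp;gt; x &amp;gt;&amp;gt;= fred)&lt;br /&gt;&lt;br /&gt;-- Run by using the underlying monad's return as the final continuation.&lt;br /&gt;run' :: Monad m =&amp;gt; Cont (m r) r -&amp;gt; m r&lt;br /&gt;run' m = runCont m return&lt;br /&gt;&lt;br /&gt;-- In the list monad this evaluates to [11,21,12,22].&lt;br /&gt;test9' :: [Int]&lt;br /&gt;test9' = run' $ do&lt;br /&gt;  a &amp;lt;- i' [1,2]&lt;br /&gt;  b &amp;lt;- i' [10,20]&lt;br /&gt;  return $ a+b&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The order of the results matches what the ordinary list monad gives for the same do-block: the outer choice of &lt;tt&gt;a&lt;/tt&gt; varies slowest.&lt;br /&gt;&lt;br /&gt;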
&lt;tt&gt;i x&lt;/tt&gt; says that the continuation should be applied to &lt;tt&gt;x&lt;/tt&gt;, not as an ordinary function, but with &lt;tt&gt;&amp;gt;&amp;gt;=&lt;/tt&gt;. But that's just business as usual for monads. So the above should work for any monad.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test10 = run $ do&lt;br /&gt;&amp;gt;   i $ print "What is your name?"&lt;br /&gt;&amp;gt;   name &amp;lt;- i getLine&lt;br /&gt;&amp;gt;   i $ print $ "Merry Xmas " ++ name&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The Grinch has been foiled and we see that the continuation monad really is the mother of all monads.&lt;br /&gt;&lt;br /&gt;There are some interesting consequences of this beyond Haskell. Many languages with support for continuations should be extensible to support monads. In particular, if there is an elegant notation for continuations, there should be one for monads too. This is why I didn't want to talk about the underlying mechanism of the &lt;tt&gt;Cont&lt;/tt&gt; monad. Different languages can implement continuations in different ways. An extreme example is (non-portable) C where you can reify continuations by literally flushing out all registers to memory and grabbing the stack. In fact, I've used this to implement something like the list monad for searching in C. (Just for fun, not for real work.) Scheme has &lt;tt&gt;call-with-current-continuation&lt;/tt&gt; which can be used similarly. And even Python's &lt;tt&gt;yield&lt;/tt&gt; does something a little like reifying a continuation and might be usable this way. (Is that what's going on &lt;a href="http://www.valuedlessons.com/2008/01/monads-in-python-with-nice-syntax.html"&gt;here&lt;/a&gt;? I haven't read that yet.)&lt;br /&gt;&lt;br /&gt;This post was also inspired by &lt;a href="http://www.diku.dk/~andrzej/papers/RM-abstract.html"&gt;this paper&lt;/a&gt; by Filinski. I haven't followed the details yet (it's tricky) but the gist is similar. 
I was actually looking at Filinski's paper because of something I'll mention in my next post.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-4013221249502217550?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/4013221249502217550/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=4013221249502217550' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/4013221249502217550'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/4013221249502217550'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2008/12/mother-of-all-monads.html' title='The Mother of all Monads'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_UdKHLrHa05M/SVKnlvBBkiI/AAAAAAAAARQ/ovyCuO3I3u0/s72-c/hole1.png' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>15</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-7373595230260409306</id><published>2008-11-29T16:44:00.000-08:00</published><updated>2008-11-30T08:55:06.120-08:00</updated><title type='text'>An Approach to Algorithm Parallelisation</title><content type='html'>The other day I came across the paper &lt;a href="http://portal.acm.org/citation.cfm?id=773473.178255"&gt;Parallelizing Complex Scans and Reductions&lt;/a&gt; lying on a colleague's desk. 
The first part of the paper discussed how to make a certain family of algorithms run faster on parallel machines and the second half of the paper went on to show how, with some work, the method could be stretched to a wider class of algorithm. What the authors seemed to miss was that the extra work really wasn't necessary and the methods of the first half apply, with no change, to the second half. But don't take this as a criticism! I learnt a whole new way to approach algorithm design, and the trick to making the second half easy uses methods that have become popular in more recent years. Doing a web search I found lots of papers describing something similar to what I did.&lt;br /&gt;&lt;br /&gt;This is also a nice example of how the notion of abstraction in computing and the notion of abstraction in mathematics are exactly the same thing. But I'm getting ahead of myself.&lt;br /&gt;&lt;br /&gt;So here is a direct translation from the paper of some procedural code to find the largest sum that can be made from a subsequence of a sequence. This will be our first implementation of the problem examined in the second half of the paper:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; f1 [] (sofar,max) = (sofar,max)&lt;br /&gt;&amp;gt; f1 (b:bs) (sofar,max) =&lt;br /&gt;&amp;gt;     let sofar' = if sofar+b&amp;lt;0&lt;br /&gt;&amp;gt;             then 0&lt;br /&gt;&amp;gt;             else sofar+b&lt;br /&gt;&amp;gt;         max' = if max&amp;lt;sofar'&lt;br /&gt;&amp;gt;             then sofar'&lt;br /&gt;&amp;gt;             else max&lt;br /&gt;&amp;gt;     in f1 bs (sofar',max')&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;tt&gt;sofar&lt;/tt&gt; is a running sum that is reset each time it dips below zero, and &lt;tt&gt;max&lt;/tt&gt; keeps track of the best sum so far. We initialise &lt;tt&gt;sofar&lt;/tt&gt; and &lt;tt&gt;max&lt;/tt&gt; with &lt;tt&gt;0&lt;/tt&gt; and &lt;tt&gt;-infinity&lt;/tt&gt;. 
Here's an example of how to use it:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; b :: [Double]&lt;br /&gt;&amp;gt; b = [1..5] ++ [5,4..(-10)] ++ [(-2)..6]&lt;br /&gt;&lt;br /&gt;&amp;gt; infinity :: Double&lt;br /&gt;&amp;gt; infinity = 1/0&lt;br /&gt;&amp;gt; test1 b = snd $ f1 b (0,-infinity)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Notice how we prime the algorithm with a starting vector. The &lt;tt&gt;0&lt;/tt&gt; corresponds to the fact that at the start we've summed over 0 elements and the &lt;tt&gt;-infinity&lt;/tt&gt; corresponds to the fact that we want the first sum computed to be the highest so far at that point.&lt;br /&gt;&lt;br /&gt;Test the code with &lt;tt&gt;test1 b&lt;/tt&gt;. We'll use a similar pattern all the way through this code.&lt;br /&gt;&lt;br /&gt;You may see the problem with making this parallelisable: we are maintaining running sums so that the final values of &lt;tt&gt;sofar&lt;/tt&gt; and &lt;tt&gt;max&lt;/tt&gt; all depend on what was computed earlier. It's not obvious that we can break this up into pieces. &lt;tt&gt;sofar&lt;/tt&gt; computes sums of subsequences between resets but chopping the array &lt;tt&gt;b&lt;/tt&gt; into pieces might split such subsequences between processors. 
How can we handle this cleanly?&lt;br /&gt;&lt;br /&gt;The first step is to write version two of the above function using &lt;tt&gt;max&lt;/tt&gt; instead of conditionals:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; f2 [] (sofar,max) = (sofar,max)&lt;br /&gt;&amp;gt; f2 (b:bs) (sofar,max) =&lt;br /&gt;&amp;gt;     let sofar' = Prelude.max (sofar+b) 0&lt;br /&gt;&amp;gt;         max'   = Prelude.max max sofar'&lt;br /&gt;&amp;gt;     in f2 bs (sofar',max')&lt;br /&gt;&lt;br /&gt;&amp;gt; test2 b = snd $ f2 b (0,-infinity)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;But that doesn't appear to make things any easier, we've just buried the conditionals inside &lt;tt&gt;max&lt;/tt&gt;, it doesn't make the serial dependency go away.&lt;br /&gt;&lt;br /&gt;So let's solve another problem instead. In &lt;tt&gt;f2&lt;/tt&gt; I'll replace &lt;tt&gt;max&lt;/tt&gt; with addition and addition with multiplication. &lt;tt&gt;0&lt;/tt&gt; is the identity for addition so we should replace it with the identity for multiplication, &lt;tt&gt;1&lt;/tt&gt;. Similarly, &lt;tt&gt;-infinity&lt;/tt&gt; is the identity for &lt;tt&gt;max&lt;/tt&gt; so we should replace that with &lt;tt&gt;0&lt;/tt&gt;. We get:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; f3 [] (sofar,max) = (sofar,max)&lt;br /&gt;&amp;gt; f3 (b:bs) (sofar,max) =&lt;br /&gt;&amp;gt;     let sofar' = sofar*b+1&lt;br /&gt;&amp;gt;         max'   = max+sofar'&lt;br /&gt;&amp;gt;     in f3 bs (sofar',max')&lt;br /&gt;&lt;br /&gt;&amp;gt; test3 b = snd $ f3 b (1,0)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;That's all very well but (1) it looks no more parallelisable and (2) it's solving the wrong problem. Let's ignore problem (2) for now.&lt;br /&gt;&lt;br /&gt;The thing that makes &lt;tt&gt;f3&lt;/tt&gt; easier to work with is that it's almost a linear function acting on the vector &lt;tt&gt;(sofar,max)&lt;/tt&gt;. Linear functions have one very nice property. 
If f and g are linear then we can compute f(g(x)) by acting with g first, and then applying f. But we can also compose f and g without reference to x giving us another linear function. We only have to know how f and g act on basis elements and we can immediately compute how &lt;tt&gt;f . g&lt;/tt&gt; acts on basis elements. This is usually expressed by writing f and g as matrices. So let's tweak &lt;tt&gt;f3&lt;/tt&gt; so it's linear in its last argument:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; f4 [] (sofar,max,i) = (sofar,max,i)&lt;br /&gt;&amp;gt; f4 (b:bs) (sofar,max,i) =&lt;br /&gt;&amp;gt;     let sofar' = (sofar * b) + i&lt;br /&gt;&amp;gt;         max'   = max + sofar'&lt;br /&gt;&amp;gt;         i'     = i&lt;br /&gt;&amp;gt;     in f4 bs (sofar',max',i')&lt;br /&gt;&lt;br /&gt;&amp;gt; test4 b = let (_,max,_) = f4 b (1,0,1) in max&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So now I need to write some code to work with linear functions. I'll do it in a very direct style. Here are some tuples representing basis vectors. 
(I could have written some fancy vector/matrix code but I don't want to distract from the problem in hand.)&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; x,y,z :: Num a =&amp;gt; (a,a,a)&lt;br /&gt;&amp;gt; x = (1,0,0)&lt;br /&gt;&amp;gt; y = (0,1,0)&lt;br /&gt;&amp;gt; z = (0,0,1)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And here's some code that computes how a function acts on a basis, in effect finding the matrix for our function with respect to the basis &lt;tt&gt;x,y,z&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; matrix f = (f x,f y,f z)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Some simple operations on vectors:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; (a,b,c) .+ (d,e,f) = (a + d,b + e,c + f)&lt;br /&gt;&amp;gt; a .* (b,c,d) = (a * b,a * c,a * d)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And now a little function that, given how &lt;tt&gt;f&lt;/tt&gt; acts on basis elements, can apply &lt;tt&gt;f&lt;/tt&gt; to any vector: &lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; apply (mx,my,mz) (sofar,max,i) = (sofar .* mx) .+ (max .* my) .+ (i .* mz)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now we can redo the calculation with &lt;tt&gt;f4&lt;/tt&gt; by first making the matrix for &lt;tt&gt;f4&lt;/tt&gt;, and then applying that to our starting vector.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test5 b = let m = matrix (f4 b)&lt;br /&gt;&amp;gt;               (_,max,_) = apply m (1,0,1)&lt;br /&gt;&amp;gt;           in max&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note how by the time we've computed &lt;tt&gt;m&lt;/tt&gt; we've done almost all of the work even though the code hasn't yet touched &lt;tt&gt;(1,0,1)&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;But now comes the first bit of magic. We can split our list &lt;tt&gt;b&lt;/tt&gt; into pieces. 
Compute the corresponding matrix for each piece on a separate processor, and then apply the resulting matrices to our starting vector.&lt;br /&gt;&lt;br /&gt;Let's chop our list of reals into pieces of size &lt;tt&gt;n&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; chop n [] = []&lt;br /&gt;&amp;gt; chop n l = let (a,b) = splitAt n l in a : chop n b&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We'll use pieces of size 10:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test6 b = max where&lt;br /&gt;&amp;gt;    (_,max,_) = foldr ($) (1,0,1) (reverse b_functions) where&lt;br /&gt;&amp;gt;    b_pieces = chop 10 b&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The following &lt;tt&gt;map&lt;/tt&gt;s should be replaced with a parallel version. It's easy to do this.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt;    b_matrices = map (matrix . f4) b_pieces&lt;br /&gt;&amp;gt;    b_functions = map apply b_matrices&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Great, we've successfully parallelised the code, but it's the wrong algorithm. How can we use this to solve the correct problem? Remember how we replaced &lt;tt&gt;max&lt;/tt&gt; with addition and addition with multiplication. We just have to undo that. That's all! Everything required to prove that the above parallelisation is valid applies over any &lt;a href="http://en.wikipedia.org/wiki/Semiring"&gt;semiring&lt;/a&gt;. At no point did we divide or subtract, and we only used elementary properties of numbers like a*(b+c) = a*b+a*c. That property holds for &lt;tt&gt;max&lt;/tt&gt; and addition. In fact a+max(b,c) = max(a+b,a+c). We don't even have to modify the code. 
We can just define the max-plus semiring as follows:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; newtype MaxPlus = M Double deriving (Eq,Show)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Num MaxPlus where&lt;br /&gt;&amp;gt;     fromInteger 0 = M (-infinity)&lt;br /&gt;&amp;gt;     fromInteger 1 = M 0&lt;br /&gt;&amp;gt;     fromInteger _ = error "no conversion from general integer"&lt;br /&gt;&amp;gt;     M a + M b = M (max a b)&lt;br /&gt;&amp;gt;     M a * M b = M (a+b)&lt;br /&gt;&amp;gt;     abs _ = error "no abs"&lt;br /&gt;&amp;gt;     signum _ = error "no signum"&lt;br /&gt;&amp;gt;     negate _ = error "no negate"&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And now all we need is&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test7 b = test6 (fmap M b)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(I wonder if ghc is smart enough to completely eliminate that &lt;tt&gt;fmap M&lt;/tt&gt;, after all, on a newtype, &lt;tt&gt;M&lt;/tt&gt; should do zero work.)&lt;br /&gt;&lt;br /&gt;And that's a completely parallelised version of the original algorithm.&lt;br /&gt;&lt;br /&gt;There is a ton of optimisation that can be performed here. In particular, &lt;tt&gt;matrix&lt;/tt&gt; applies a function to a fixed basis. For the particular function we're using here there's a big win from constant folding. The same constant folding applies in any semiring.&lt;br /&gt;&lt;br /&gt;And back to the point I made at the beginning. By abstracting from the reals to a general semiring we are able to make the same code perform multiple duties: it can work on functions linear over many semirings, not just the reals. Mathematicians don't work with abstractions just to make life hell for students - they do so because working with abstract entities allows the same words to be reused in a valid way in many different contexts. 
This benefits both mathematicians and computer scientists.&lt;br /&gt;&lt;br /&gt;Here's a &lt;a href="http://www.google.com/search?q=parallel+prefix"&gt;link&lt;/a&gt; you can use to find out more on this technique.&lt;br /&gt;&lt;br /&gt;Note that in real world usage you wouldn't use lists. &lt;tt&gt;chop&lt;/tt&gt; would take longer than the rest of the algorithm.&lt;br /&gt;&lt;br /&gt;PS A curious aside. I spent ages trying to get ghc to compile this code and getting my homebrew HTML markup code to work reliably on it. But eventually I located the problem. I've just returned from Hawai`i where I wrote most of this code. Just for fun I'd put my keyboard into Hawai`ian mode and forgot about it. When I did that, the single quote key started generating the unicode symbol for the Hawai`ian glottal stop. It looks just like a regular single quote in my current terminal font so it was hard to see anything wrong with the code. But, quite reasonably, ghc and many other tools barf if you try to use one where a regular quote is expected.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-7373595230260409306?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/7373595230260409306/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=7373595230260409306' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7373595230260409306'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7373595230260409306'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2008/11/approach-to-algorithm-parallelisation.html' title='An Approach to Algorithm 
Parallelisation'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-4054854230903521571</id><published>2008-11-15T12:52:00.000-08:00</published><updated>2008-11-15T16:09:46.397-08:00</updated><title type='text'>Some thoughts on reasoning and monads</title><content type='html'>Just a short note about some half-formed thoughts on the subject of monads and reasoning.&lt;br /&gt;&lt;br /&gt;Haskell types and total functions, at least for a suitable subset of Haskell, form a category, with types and functions as the objects and arrows. Haskell code is usually full of expressions whose types don't correspond to arrows (e.g. &lt;tt&gt;1::Int&lt;/tt&gt;). But in category theory we can only really talk of objects and arrows, not elements of objects. Many categories are made up of objects that have no reasonable notion of an element. By writing code in pointfree style we can eliminate reference to individual elements and talk only about arrows.&lt;br /&gt;&lt;br /&gt;For example, suppose &lt;tt&gt;f :: A -&amp;gt; B&lt;/tt&gt; and &lt;tt&gt;g :: B -&amp;gt; C&lt;/tt&gt;. Then we can define &lt;tt&gt;h :: A -&amp;gt; C&lt;/tt&gt; in a couple of different ways. Pointfree:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;h = g . f&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;or pointful:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;h x = let y = f x&lt;br /&gt;          z = g y&lt;br /&gt;      in z&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;For examples as simple as this it's not hard to write code that will translate between one style and the other. But the former definition has an advantage: it works in any category. We'll consider a very simple example. 
Let R be the category of real numbers, where there is an arrow from x to y if x&amp;le;y. We can think of f and g as arrows in this category. If we know A&amp;le;B and B&amp;le;C then the first definition above tells us how to construct an arrow from A to C and hence it proves that A&amp;le;C. The second definition makes no sense because it relies on the notion of x as an element of A and uses the functions f and g to construct elements of B and C. These words hold no meaning in the category R where the arrows aren't functions and the objects aren't containers of elements.&lt;br /&gt;&lt;br /&gt;Except that's not quite true.&lt;br /&gt;&lt;br /&gt;Because there is a scheme for translating the second pointful definition to pointfree form, the second definition does in fact provide a proof that A&amp;le;C. We just have to bear in mind that the proof needs to be translated into pointfree form first. In fact, we can happily spend our day using pointful style to generate proofs about R, as long as at the end of the day we translate our proofs to pointfree notation. Indeed, Haskell programmers know that it's often much easier to write programs in pointful style so it seems reasonable to guess that there are many proofs that are easier to write in pointful style even though they can't be interpreted literally. Philosophically this is a bit weird. As long as we restrict ourselves to chains of reasoning that can be translated, we can use intuitions about elements of objects to make valid deductions about domains where these notions make no sense.&lt;br /&gt;&lt;br /&gt;Part I of &lt;a href="http://books.google.com.vc/books?id=6PY_emBeGjUC"&gt;Lambek and Scott&lt;/a&gt; is about the correct pointful language to use when talking about cartesian closed categories (CCCs). They use a form of typed lambda calculus. Every arrow in a cartesian closed category can be written as a pointful lambda expression. 
Even though there isn't a meaningful way to assign meaning to the 'points' individually, every lambda abstraction gives an arrow in a cartesian closed category that can be built using the standard parts that come with a CCC, and vice versa. The pointful definition of &lt;tt&gt;h&lt;/tt&gt; above is an example. Here's a (very) slightly less trivial example. Given &lt;tt&gt;f :: A -&amp;gt; B&lt;/tt&gt; and &lt;tt&gt;g :: C -&amp;gt; D&lt;/tt&gt; we can define &lt;tt&gt;h :: (A,C) -&amp;gt; Either B D&lt;/tt&gt; as&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;h x = let y = f (fst x)&lt;br /&gt;          z = Left y&lt;br /&gt;      in z&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Again, that code can be translated into pointfree form using only the standard parts available in a cartesian closed category. Actually, we only need a category with products and coproducts for this example. So consider a lattice. The above code can be translated into a proof that A&amp;cap;C&amp;le;B&amp;cup;D. (I'm using cap and cup for meet and join.) Again, it makes no sense to talk of the elements of a general lattice as containers of elements, and yet after translation to pointfree notation we have a valid proof. If you can restrict yourself to a suitable subset of set theory then your proofs involving elements of objects can carry over to many other categories.&lt;br /&gt;&lt;br /&gt;Part II of Lambek and Scott is about taking this idea to extremes. It's about categories known as toposes. A topos is a category that is even more like the category of sets than a CCC. It's still general enough that there are many kinds of toposes, but you can use a sizable portion of first order logic and ZF to make deductions about them. Again, the literal notion of membership of the objects of a topos might make no sense, but the proofs have a translation to pointfree notation. In fact, it's possible to write entire papers in what looks like conventional set theory language, and have them be valid for other toposes. 
&lt;a href="http://home.imf.au.dk/kock/"&gt;Anders Kock&lt;/a&gt;, for example, writes such papers. Chris Isham has been arguing that topos theory is the correct framework for physics. If you interpret your propositions as being in the category Set then you get classical physics. But those same propositions can be interpreted in other categories, such as one for quantum mechanics, giving a way to use and extend classical language to reason about quantum systems. This set theory-like language is known as the "internal language" of a topos.&lt;br /&gt;&lt;br /&gt;Anyway, I'm interested in the notion that Haskell do-notation provides another kind of pointful language that can be used to reason about situations where points don't seem at first to make sense. Consider the lattice of subsets of a topological space, ordered by inclusion. Let Cl(U) be the closure of U. Cl satisfies these properties:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;X&amp;sub;Y implies Cl(X)&amp;sub;Cl(Y)&lt;br /&gt;X&amp;sub;Cl(X)&lt;br /&gt;and&lt;br /&gt;Cl(Cl(X))&amp;sub;Cl(X)&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Look familiar? If we make this lattice into a category, with arrows being inclusions, then the first property states that Cl is a functor and the next two say that Cl is a monad. In fact, monads are a kind of generalised closure. So now suppose we're given A&amp;sub;Cl(B) and B&amp;sub;Cl(C) and wish to prove that A&amp;sub;Cl(C). We can rephrase this by saying that if &lt;tt&gt;f :: A -&amp;gt; Cl B&lt;/tt&gt; and &lt;tt&gt;g :: B -&amp;gt; Cl C&lt;/tt&gt; we need an arrow &lt;tt&gt;h :: A -&amp;gt; Cl C&lt;/tt&gt;. 
We can write one like this:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;h x = do&lt;br /&gt;    y &amp;lt;- f x&lt;br /&gt;    z &amp;lt;- g y&lt;br /&gt;    return z&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now it's tempting to interpret the inclusions as functions with &lt;tt&gt;y &amp;lt;- f x&lt;/tt&gt; saying that &lt;tt&gt;y&lt;/tt&gt; is the image of &lt;tt&gt;x&lt;/tt&gt; under the inclusion. (I don’t know about you, but when I write &lt;tt&gt;x &amp;lt;- getChar&lt;/tt&gt; I think to myself “&lt;tt&gt;x&lt;/tt&gt; is the return value from calling &lt;tt&gt;getChar&lt;/tt&gt;”.) But that doesn't really work because the type of &lt;tt&gt;f x&lt;/tt&gt; is &lt;tt&gt;Cl B&lt;/tt&gt; but &lt;tt&gt;y&lt;/tt&gt; is of type &lt;tt&gt;B&lt;/tt&gt;. On the other hand, we can radically reinterpret the above as something like this: when arguing about chains of inclusions of subsets of a topological space, as long as at the end of the chain of inclusions you always end up in the closure of some subset, you're allowed to cheat and nudge a generic point in the closure of a subset back into the original subset. This is exactly parallel to the way do-notation seems to allow us to extract elements out of monadic objects as long as at the end of the do-block we always return an element of monadic type. I'm sure that with a bit of work we could produce a rigorous metatheorem from this. I also expect we can produce something similar for comonads.&lt;br /&gt;&lt;br /&gt;Anyway, the moral is that when working with categories with monads there may be some interesting and unusual ways to reason. The example of the lattice of subsets is fairly trivial but I'm sure there are other interesting examples. I also expect there's a nice connection with modal logic. I now think of Haskell do-notation as the "internal language" of a category with monads.&lt;br /&gt;&lt;br /&gt;Update: I left out the crucial sentences I meant to write. 
It's easy to see do-notation as a kind of Haskell specific trick for making things like IO heavy code look like traditional procedural code. But comparison with the theory in Lambek and Scott, and Topos Theory in general, makes it clear that do-notation is a member of a family of related languages.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-4054854230903521571?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/4054854230903521571/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=4054854230903521571' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/4054854230903521571'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/4054854230903521571'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2008/11/some-thoughts-on-reasoning-and-monads.html' title='Some thoughts on reasoning and monads'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-1982948805908213964</id><published>2008-11-08T10:13:00.000-08:00</published><updated>2008-11-09T07:43:23.598-08:00</updated><title type='text'>From Monoids to Monads</title><content type='html'>&lt;H3&gt;Generalising Monoids&lt;/H3&gt;&lt;br /&gt;&lt;br /&gt;The word 'monad' is derived from the word 'monoid'. The explanation usually given is that there is an analogy between monoids and monads. 
On the surface, this seems a bit unlikely. The &lt;tt&gt;join&lt;/tt&gt; operation in a monad is supposed to correspond to the binary operator in the monoid, but &lt;tt&gt;join&lt;/tt&gt; is a completely different kind of thing, certainly not a binary operator in any usual sense.&lt;br /&gt;&lt;br /&gt;I'm going to make this analogy precise so that it's clear that both monoids and monads are examples of the same construction. In fact, I'm going to write some Haskell code to define monoids and monads in almost exactly the same way. I was surprised to find I could do this because instances of Haskell's &lt;tt&gt;Monoid&lt;/tt&gt; and &lt;tt&gt;Monad&lt;/tt&gt; aren't even the same kind of thing (where I'm using 'kind' in its &lt;a href="http://www.haskell.org/onlinereport/decls.html"&gt;technical&lt;/a&gt; sense). But it can be done.&lt;br /&gt;&lt;br /&gt;So let's start thinking about monoids. They are traditionally sets equipped with a special element and a binary operator so that the special element acts as an identity for the binary operator, and where the binary operator is associative. We expect type signatures something like &lt;tt&gt;one :: m&lt;/tt&gt; and &lt;tt&gt;mult :: m -&amp;gt; m -&amp;gt; m&lt;/tt&gt; so that, for example, &lt;tt&gt;mult (mult a b) c == mult a (mult b c)&lt;/tt&gt;. That's fine as it stands, but it doesn't generalise easily. In particular it'd be nice to generalise this definition to other categories. To do that we need to rephrase the definitions so that they are completely &lt;a href="http://www.haskell.org/haskellwiki/Pointfree"&gt;point-free&lt;/a&gt; and are written purely as the compositions of arrows.&lt;br /&gt;&lt;br /&gt;Let's start by thinking about the rule that says multiplying by the identity on the left should leave a value unchanged. Ie. we want &lt;tt&gt;mult one x == x&lt;/tt&gt; for all &lt;tt&gt;x&lt;/tt&gt;. 
We already have a problem, it refers to a couple of 'points', both &lt;tt&gt;one&lt;/tt&gt;, the identity, and &lt;tt&gt;x&lt;/tt&gt; the unknown. We can deal with the first one easily, we just replace &lt;tt&gt;one&lt;/tt&gt; with an arrow &lt;tt&gt;() -&amp;gt; m&lt;/tt&gt;. But we also need some plumbing to provide two arguments to the &lt;tt&gt;mult&lt;/tt&gt; function. Rather than belabour the point, I'll just give the full code:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; {-# OPTIONS_GHC -fglasgow-exts #-}&lt;br /&gt;&lt;br /&gt;&amp;gt; import Control.Monad&lt;br /&gt;&amp;gt; import Test.QuickCheck&lt;br /&gt;&lt;br /&gt;&amp;gt; class Monoid m where&lt;br /&gt;&amp;gt;     one :: () -&amp;gt; m&lt;br /&gt;&amp;gt;     mult :: (m,m) -&amp;gt; m&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The law for multiplication on the left is then given by &lt;tt&gt;law1_left == law1_middle&lt;/tt&gt; and so on:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt;     law1_left,law1_middle,law1_right :: m -&amp;gt; m&lt;br /&gt;&amp;gt;     law1_left   = mult . (one &amp;lt;#&amp;gt; id) . lambda&lt;br /&gt;&amp;gt;     law1_middle = id&lt;br /&gt;&amp;gt;     law1_right  = mult . (id &amp;lt;#&amp;gt; one) . rho&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The associativity law is then given by &lt;tt&gt;law2_left x = law2_right x&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt;     law2_left,law2_right :: ((m,m),m) -&amp;gt; m&lt;br /&gt;&amp;gt;     law2_left = mult . (mult &amp;lt;#&amp;gt; id)&lt;br /&gt;&amp;gt;     law2_right = mult . (id &amp;lt;#&amp;gt; mult) . alpha&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The left and right hand sides of the laws are now point-free. 
But in order to do this I've had to write some auxiliary functions:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; lambda :: a -&amp;gt; ((),a)&lt;br /&gt;&amp;gt; lambda x = ((),x)&lt;br /&gt;&lt;br /&gt;&amp;gt; rho :: a -&amp;gt; (a,())&lt;br /&gt;&amp;gt; rho x = (x,())&lt;br /&gt;&lt;br /&gt;&amp;gt; alpha :: ((a,b),c) -&amp;gt; (a,(b,c))&lt;br /&gt;&amp;gt; alpha ((x,y),z) = (x,(y,z))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I've also used the fact that &lt;tt&gt;(,)&lt;/tt&gt; is a bifunctor, ie. it's a functor in each of its arguments so &lt;tt&gt;(,)&lt;/tt&gt; doesn't just give a way to generate a new type &lt;tt&gt;(a,b)&lt;/tt&gt; from types &lt;tt&gt;a&lt;/tt&gt; and &lt;tt&gt;b&lt;/tt&gt;. It also combines pairs of arrows to make new arrows. I'll call the part of &lt;tt&gt;(,)&lt;/tt&gt; that acts on arrows by the name &lt;tt&gt;&amp;lt;#&amp;gt;&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; (&amp;lt;#&amp;gt;) :: (a -&amp;gt; c) -&amp;gt; (b -&amp;gt; d) -&amp;gt; (a, b) -&amp;gt; (c, d)&lt;br /&gt;&amp;gt; (f &amp;lt;#&amp;gt; g) (x,y) = (f x,g y)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Intuitively, &lt;tt&gt;f &amp;lt;#&amp;gt; g&lt;/tt&gt; maps &lt;tt&gt;f&lt;/tt&gt; on the left and &lt;tt&gt;g&lt;/tt&gt; on the right of &lt;tt&gt;(a,b)&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;Try unpacking those definitions to see that we get the usual monoid laws. 
For example&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;law2_left ((x,y),z) == mult $ (mult &amp;lt;#&amp;gt; id) ((x,y),z) == mult (mult (x,y),z)&lt;br /&gt;law2_right ((x,y),z) == mult $ (id &amp;lt;#&amp;gt; mult) $ alpha ((x,y),z) == mult (x,mult (y,z))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;So we get&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;mult (mult (x,y),z) ==  mult (x,mult (y,z))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;the usual associativity law.&lt;br /&gt;&lt;br /&gt;Now that this definition is point-free it seems we could just carry it over to any category. In fact, we've implicitly done this because we've carried over the definition of a monoid from Set to Hask, the category of Haskell types and total functions. We're so used to treating Hask as a proxy for Set we can forget they are actually different categories. But this definition of a monoid works in both. But what about that &lt;tt&gt;lambda&lt;/tt&gt;, &lt;tt&gt;rho&lt;/tt&gt; and &lt;tt&gt;alpha&lt;/tt&gt;? Well they're easy to define in any Cartesian closed category (CCC). But we don't need all of the features of a CCC to define a monoid, we just need &lt;tt&gt;lambda&lt;/tt&gt;, &lt;tt&gt;rho&lt;/tt&gt; and &lt;tt&gt;alpha&lt;/tt&gt; and some kind of 'product' on the set of objects that also acts like a bifunctor. In fact, there's a bunch of 'obvious' laws that these functions satisfy in a CCC. Any category with these functions satisfying these same laws is called a monoidal category. The above definitions allow us to transfer the definition of a monoid to any such category. 
For the full definition, see the &lt;a href="http://en.wikipedia.org/wiki/Monoidal_category"&gt;wikipedia entry&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Anyway, let's check to see if the type &lt;tt&gt;Int&lt;/tt&gt; might be a monoid:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Monoid Int where&lt;br /&gt;&amp;gt;     one _ = 1&lt;br /&gt;&amp;gt;     mult (a,b) = a*b&lt;br /&gt;&lt;br /&gt;&amp;gt; check1 = quickCheck $ \n -&amp;gt; law1_left n == law1_middle (n :: Int)&lt;br /&gt;&amp;gt; check2 = quickCheck $ \n -&amp;gt; law1_left n == law1_right (n :: Int)&lt;br /&gt;&amp;gt; check3 = quickCheck $ \n -&amp;gt; law2_left n == (law2_right n :: Int)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Of course that's no proof, but it gives us some confidence. (Eg. what about large numbers close to 2&lt;sup&gt;31&lt;/sup&gt;...?)&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Another Category in the World of Haskell&lt;/H3&gt;&lt;br /&gt;&lt;br /&gt;Most people around here will be familiar with how Haskell types and functions form a category Hask in the obvious way. But it's less well known that there is another category lurking in Haskell. Consider the set of all Haskell functors. These are endofunctors, ie. functors Hask&amp;rarr;Hask. Between any two functors is the set of natural transformations between those functors. (If &lt;tt&gt;f&lt;/tt&gt; and &lt;tt&gt;g&lt;/tt&gt; are functors, then the polymorphic functions &lt;tt&gt;f a -&amp;gt; g a&lt;/tt&gt; are the &lt;a href="http://sigfpe.blogspot.com/2008/05/you-could-have-defined-natural.html"&gt;natural transformations&lt;/a&gt;.) In addition, you can compose natural transformations and the composition is associative. In other words, Haskell functors form a category. (See the appendix for more abstract nonsense relating to this.)&lt;br /&gt;&lt;br /&gt;We'll call the category of endofunctors on Hask by the name E. 
Functors can be composed like so:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; type (f :&amp;lt;*&amp;gt; g) x = f (g x)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It'd be cool if this was a product in the usual categorical sense, but it isn't. There isn't a natural way to map to both &lt;tt&gt;f a&lt;/tt&gt; and &lt;tt&gt;g a&lt;/tt&gt; from &lt;tt&gt;f (g a)&lt;/tt&gt; with the universal property of &lt;a href="http://en.wikipedia.org/wiki/Categorical_product"&gt;products&lt;/a&gt;. Instead it's a weaker type of product which is still a bifunctor. Here's how it acts on arrows (remembering that in E, arrows are natural transformations):&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; (&amp;lt;*&amp;gt;) f g = f. fmap g&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(We could also have used &lt;tt&gt;fmap g . f&lt;/tt&gt;.) If you think of &lt;tt&gt;f &amp;lt;#&amp;gt; g&lt;/tt&gt; as making &lt;tt&gt;f&lt;/tt&gt; act on the left and &lt;tt&gt;g&lt;/tt&gt; act on the right, then you can think of &lt;tt&gt;f &amp;lt;*&amp;gt; g&lt;/tt&gt; as making &lt;tt&gt;f&lt;/tt&gt; and &lt;tt&gt;g&lt;/tt&gt; act on the outer and inner containers of a container of containers.&lt;br /&gt;&lt;br /&gt;Here's the identity element for this product, the identity functor. 
It plays a role similar to &lt;tt&gt;()&lt;/tt&gt; in Hask:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Id x = Id x deriving Show&lt;br /&gt;&amp;gt; instance Functor Id where&lt;br /&gt;&amp;gt;     fmap f (Id x) = Id (f x)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can now define some familiar looking natural transformations:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; lambda' :: Functor f =&amp;gt; f a -&amp;gt; (Id :&amp;lt;*&amp;gt; f) a&lt;br /&gt;&amp;gt; lambda' x = Id x&lt;br /&gt;&lt;br /&gt;&amp;gt; rho' :: Functor f =&amp;gt; f a -&amp;gt; (f :&amp;lt;*&amp;gt; Id) a&lt;br /&gt;&amp;gt; rho' x = fmap Id x&lt;br /&gt;&lt;br /&gt;&amp;gt; alpha' :: f (g (h a)) -&amp;gt; f (g (h a))&lt;br /&gt;&amp;gt; alpha' = id&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;With these we have turned E into a monoidal category.&lt;br /&gt;&lt;br /&gt;(OK, this may be confusing. We're 'multiplying' functors and we have associativity and a left- and right-identity. So functors form a monoid (modulo isomorphism). But that's a distraction, these are not the monoids that you're looking for. See the appendix for more on this.)&lt;br /&gt;&lt;br /&gt;So now we can define monoids in this category using almost the same code as above:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; class Functor m =&amp;gt; Monoid' m where&lt;br /&gt;&amp;gt;     one' :: Id a -&amp;gt; m a&lt;br /&gt;&amp;gt;     mult' :: (m :&amp;lt;*&amp;gt; m) a -&amp;gt; m a&lt;br /&gt;&lt;br /&gt;&amp;gt;     law1_left',law1_middle',law1_right' :: m a -&amp;gt; m a&lt;br /&gt;&amp;gt;     law1_left'   = mult' . (one' &amp;lt;*&amp;gt; id) . lambda'&lt;br /&gt;&amp;gt;     law1_middle' = id&lt;br /&gt;&amp;gt;     law1_right'  = mult' . (id &amp;lt;*&amp;gt; one') . rho'&lt;br /&gt;&lt;br /&gt;&amp;gt;     law2_left',law2_right' :: ((m :&amp;lt;*&amp;gt; m) :&amp;lt;*&amp;gt; m) a -&amp;gt; m a&lt;br /&gt;&amp;gt;     law2_left'  = mult' . 
(mult' &amp;lt;*&amp;gt; id)&lt;br /&gt;&amp;gt;     law2_right' = mult' . (id &amp;lt;*&amp;gt; mult') . alpha'&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And here's the punch line. That's precisely a monad, laws 'n' all.&lt;br /&gt;&lt;br /&gt;If you want, you can unpack the definitions above and see that they correspond to the usual notion of a monad. We can write code to do the translation:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Monad m =&amp;gt; TranslateMonad m a = TM { unTM :: m a } deriving (Eq,Show)&lt;br /&gt;&lt;br /&gt;&amp;gt; translate :: Monad m =&amp;gt; m a -&amp;gt; TranslateMonad m a&lt;br /&gt;&amp;gt; translate x = TM x&lt;br /&gt;&lt;br /&gt;&amp;gt; instance (Monad m,Functor m) =&amp;gt; Functor (TranslateMonad m) where&lt;br /&gt;&amp;gt;     fmap f (TM x) = TM (fmap f x)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance (Functor m,Monad m) =&amp;gt; Monoid' (TranslateMonad m) where&lt;br /&gt;&amp;gt;     one' (Id x) = TM $ return x&lt;br /&gt;&amp;gt;     mult' (TM x) = TM $ fmap unTM x &amp;gt;&amp;gt;= id&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;In other words, any instance of &lt;tt&gt;Monad&lt;/tt&gt; gives an instance of &lt;tt&gt;Monoid'&lt;/tt&gt;. I'll let you write the reverse mapping. We can even check (not prove!) 
the laws are satisfied by a particular monad by using &lt;tt&gt;QuickCheck&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Arbitrary a =&amp;gt; Arbitrary (Id a) where&lt;br /&gt;&amp;gt;     arbitrary = liftM Id arbitrary&lt;br /&gt;&lt;br /&gt;&amp;gt; instance (Monad m,Eq (m a),Arbitrary (m a)) =&amp;gt; Arbitrary (TranslateMonad m a) where&lt;br /&gt;&amp;gt;     arbitrary = liftM TM arbitrary&lt;br /&gt;&lt;br /&gt;&amp;gt; check4 = quickCheck $ \n -&amp;gt; law1_left' n == law1_middle' (n :: TranslateMonad [] Int)&lt;br /&gt;&amp;gt; check5 = quickCheck $ \n -&amp;gt; law1_left' n == law1_right' (n :: TranslateMonad [] Int)&lt;br /&gt;&amp;gt; check6 = quickCheck $ \n -&amp;gt; law2_left' n == (law2_right' n :: TranslateMonad [] Int)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I don't know about you, but I find this mind-blowing. We took a definition of a simple concept, the monoid. We then abstractificated its definition so that it applied to any monoidal category, and in doing so we completely changed the meaning of &lt;tt&gt;one&lt;/tt&gt; and &lt;tt&gt;mult&lt;/tt&gt;. And yet the resulting object, including its laws, is precisely what you need to solve a whole host of problems in algebra and Haskell programming. This is an amazing example of the &lt;a href="http://www.ccrnp.ncifcrf.gov/~toms/Hamming.unreasonable.html"&gt;unreasonable effectiveness of mathematics&lt;/a&gt;. The concept might be a little tricky to grasp: monads are like ordinary monoids, but with outer and inner replacing left and right. But the payoff from this is that intuitions about monoids carry over to monads. I hope to say more about this in future episodes.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Appendix&lt;/H3&gt;&lt;br /&gt;&lt;br /&gt;Consider the category, H, with one object, Hask, and whose arrows are the endofunctors on Hask. We've shown how the arrows on this category aren't just a set, they themselves form a category. This makes H a 2-category. 
A category with one object is a monoid. But this is a 2-category, so we have a 2-monoid. In fact, there are some extra details required to show we have a 2-category. For example, we need to show that composition of arrows (which now form a category, not a set) is a functor (not just a function). That follows from the &lt;a href="http://sigfpe.blogspot.com/2008/05/interchange-law.html"&gt;Interchange Law&lt;/a&gt; which I've already talked about. But note that H is a monoid in a completely conventional way, its arrows form a set with a binary operator on them. This is not the structure that corresponds to monads, although it plays a part in constructing them. Also, don't confuse this monoid with the one that appears in a &lt;tt&gt;MonadPlus&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Confession&lt;/H3&gt;&lt;br /&gt;&lt;br /&gt;I'm having trouble giving &lt;tt&gt;&amp;lt;*&amp;gt;&lt;/tt&gt; the correct type signature. I think it should be something like &lt;tt&gt;(forall x.a x -&amp;gt; c x) -&amp;gt; (forall x.b x -&amp;gt; d x) -&amp;gt; (forall x.a (b x) -&amp;gt; c (d x))&lt;/tt&gt;. GHC doesn't like it. Can anyone come up with the correct thing? 
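&lt;br /&gt;&lt;br /&gt;Here's one candidate, offered as a sketch rather than a definitive answer. It assumes GHC's &lt;tt&gt;RankNTypes&lt;/tt&gt; extension, and the name &lt;tt&gt;hcomp&lt;/tt&gt; is made up for illustration. The signature I guessed above seems to be missing a &lt;tt&gt;Functor a&lt;/tt&gt; constraint, which the definition needs because it applies &lt;tt&gt;fmap&lt;/tt&gt; in the outer source functor. I haven't checked that the laws above typecheck against it unchanged; a type synonym like &lt;tt&gt;:&amp;lt;*&amp;gt;&lt;/tt&gt; may have to become a newtype for that.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;{-# LANGUAGE RankNTypes #-}&lt;br /&gt;&lt;br /&gt;import Data.Maybe (listToMaybe, maybeToList)&lt;br /&gt;&lt;br /&gt;-- Hypothetical name. The body is the same as the operator defined in the post:&lt;br /&gt;-- g acts on the inner container, then f acts on the outer one.&lt;br /&gt;hcomp :: Functor a&lt;br /&gt;      =&amp;gt; (forall x. a x -&amp;gt; c x)&lt;br /&gt;      -&amp;gt; (forall x. b x -&amp;gt; d x)&lt;br /&gt;      -&amp;gt; a (b y) -&amp;gt; c (d y)&lt;br /&gt;hcomp f g = f . fmap g&lt;br /&gt;&lt;br /&gt;-- Outer functor Maybe becomes a list, inner list becomes a Maybe.&lt;br /&gt;example :: [Maybe Int]&lt;br /&gt;example = hcomp maybeToList listToMaybe (Just [1,2,3])&lt;br /&gt;-- example == [Just 1]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Both arguments have to be polymorphic in &lt;tt&gt;x&lt;/tt&gt;, which is just the statement that they are natural transformations.&lt;br /&gt;&lt;br /&gt;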
The code still works.&lt;br /&gt;&lt;br /&gt;&lt;H3&gt;Links&lt;/H3&gt;&lt;br /&gt;Another &lt;a href="http://scienceblogs.com/goodmath/2008/03/meta_out_the_wazoo_monads_and.php"&gt;account&lt;/a&gt; of the same subject, modulo some details.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-1982948805908213964?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/1982948805908213964/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=1982948805908213964' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/1982948805908213964'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/1982948805908213964'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2008/11/from-monoids-to-monads.html' title='From Monoids to Monads'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-5354353393632437147</id><published>2008-10-25T18:45:00.000-07:00</published><updated>2008-10-27T09:35:39.031-07:00</updated><title type='text'>Operads and their Monads</title><content type='html'>Hardly a day goes by at the &lt;a href="http://golem.ph.utexas.edu/category/"&gt;n-Category Cafe&lt;/a&gt; without &lt;a href="http://homepages.ulb.ac.be/~fschlenk/Maths/What/operad.pdf"&gt;operads&lt;/a&gt; being mentioned. 
So it's time to write some code illustrating them.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; {-# LANGUAGE FlexibleInstances #-}&lt;br /&gt;&lt;br /&gt;&amp;gt; import Data.Monoid&lt;br /&gt;&amp;gt; import Control.Monad.Writer&lt;br /&gt;&amp;gt; import Control.Monad.State&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Let's define a simple tree type:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Tree a = Leaf a | Tree [Tree a] deriving (Eq,Show)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Sometimes we want to apply a function to every element of the tree. That's provided by the &lt;tt&gt;fmap&lt;/tt&gt; member of the &lt;tt&gt;Functor&lt;/tt&gt; type class.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Functor Tree where&lt;br /&gt;&amp;gt;     fmap f (Leaf x) = Leaf (f x)&lt;br /&gt;&amp;gt;     fmap f (Tree ts) = Tree $ map (fmap f) ts&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;But just as we can't use map to apply monadic functions to a list (we'd write &lt;tt&gt;mapM print [1,2,3]&lt;/tt&gt;), we can't use fmap to apply them to our tree. What we need is a monadic version of &lt;tt&gt;fmap&lt;/tt&gt;. 
Here's a suitable type class:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; class FunctorM c where&lt;br /&gt;&amp;gt;     fmapM :: Monad m =&amp;gt; (a -&amp;gt; m b) -&amp;gt; c a -&amp;gt; m (c b)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(I could have used Data.Traversable but that entails Data.Foldable and that's too much work.)&lt;br /&gt;&lt;br /&gt;And now we can implement a monadic version of &lt;tt&gt;Tree&lt;/tt&gt;'s &lt;tt&gt;Functor&lt;/tt&gt; instance:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance FunctorM Tree where&lt;br /&gt;&amp;gt;     fmapM f (Leaf x) = do&lt;br /&gt;&amp;gt;         y &amp;lt;- f x&lt;br /&gt;&amp;gt;         return (Leaf y)&lt;br /&gt;&amp;gt;     fmapM f (Tree ts) = do&lt;br /&gt;&amp;gt;         ts' &amp;lt;- mapM (fmapM f) ts&lt;br /&gt;&amp;gt;         return (Tree ts')&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can use &lt;tt&gt;fmapM&lt;/tt&gt; to extract the list of elements of a container type:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; serialise a = runWriter $ fmapM putElement a&lt;br /&gt;&amp;gt;               where putElement x = tell [x]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Not only does &lt;tt&gt;serialise&lt;/tt&gt; suck out the elements of a container, it also spits out an empty husk in which all of the elements have been replaced by &lt;tt&gt;()&lt;/tt&gt;. We can think of the latter as the 'shape' of the original structure with the original elements removed. 
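&lt;br /&gt;&lt;br /&gt;To make the split concrete, here's a minimal standalone sketch. It isn't the code from this post: it assumes &lt;tt&gt;Data.Traversable&lt;/tt&gt; (via GHC's &lt;tt&gt;DeriveTraversable&lt;/tt&gt;) in place of &lt;tt&gt;FunctorM&lt;/tt&gt;, uses the pair applicative in place of the writer monad, and the name &lt;tt&gt;split&lt;/tt&gt; is made up for illustration.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;{-# LANGUAGE DeriveTraversable #-}&lt;br /&gt;&lt;br /&gt;import Data.Tuple (swap)&lt;br /&gt;&lt;br /&gt;data Tree a = Leaf a | Tree [Tree a]&lt;br /&gt;    deriving (Eq, Show, Functor, Foldable, Traversable)&lt;br /&gt;&lt;br /&gt;-- Hypothetical helper: log each element in a list and leave () in its place.&lt;br /&gt;-- The applicative for pairs accumulates the logged lists left to right.&lt;br /&gt;split :: Traversable t =&amp;gt; t a -&amp;gt; (t (), [a])&lt;br /&gt;split = swap . traverse (\x -&amp;gt; ([x], ()))&lt;br /&gt;&lt;br /&gt;example :: (Tree (), [Int])&lt;br /&gt;example = split (Tree [Tree [Leaf 1, Leaf 2], Leaf 3])&lt;br /&gt;-- example == (Tree [Tree [Leaf (),Leaf ()],Leaf ()],[1,2,3])&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The first component is the empty husk and the second is the list of elements in left-to-right order.&lt;br /&gt;&lt;br /&gt;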
We can formalise this as&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; type Shape t = t ()&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So we have:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; serialise :: FunctorM t =&amp;gt; t a -&amp;gt; (Shape t,[a])&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Keeping the shape around allows us to invert &lt;tt&gt;serialise&lt;/tt&gt; to give:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; deserialise :: FunctorM t =&amp;gt; Shape t -&amp;gt; [a] -&amp;gt; (t a, [a])&lt;br /&gt;&amp;gt; deserialise t = runState (fmapM getElement t) where&lt;br /&gt;&amp;gt;   getElement () = do&lt;br /&gt;&amp;gt;       x:xs &amp;lt;- get&lt;br /&gt;&amp;gt;       put xs&lt;br /&gt;&amp;gt;       return x&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(That's a bit like using the supply monad. This function also returns the leftovers.)&lt;br /&gt;&lt;br /&gt;We can also write (apologies for the slightly cryptic use of the writer monad):&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; size :: FunctorM t =&amp;gt; t a -&amp;gt; Int&lt;br /&gt;&amp;gt; size a = getSum $ execWriter $ fmapM incCounter a&lt;br /&gt;&amp;gt;          where incCounter _ = tell (Sum 1)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Let's try an example. 
Here's an empty tree:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; tree1 = Tree [Tree [Leaf (),Leaf ()],Leaf ()]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can pack a bunch of integers into it:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex1 = fst $ deserialise tree1 [1,2,3]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And get them back out again:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex2 = serialise ex1&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;tt&gt;serialise&lt;/tt&gt; separates the shape from the data, something you can read lots more about at &lt;a href="http://www-staff.it.uts.edu.au/~cbj/Publications/shapes.html"&gt;Barry Jay&lt;/a&gt;'s web site.&lt;br /&gt;&lt;br /&gt;Remember that &lt;a href="http://sigfpe.blogspot.com/2006/11/variable-substitution-gives.html"&gt;trees are also monads&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Monad Tree where&lt;br /&gt;&amp;gt;     return x = Leaf x&lt;br /&gt;&amp;gt;     t &amp;gt;&amp;gt;= f = join (fmap f t) where&lt;br /&gt;&amp;gt;         join (Leaf t) = t&lt;br /&gt;&amp;gt;         join (Tree ts) = Tree (fmap join ts)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The &lt;tt&gt;join&lt;/tt&gt; operation for a tree grafts the elements of a tree of trees back into the tree.&lt;br /&gt;&lt;br /&gt;Right, that's enough about trees and shapes for now.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Operad_theory"&gt;Operads&lt;/a&gt; are a bit like the plumbing involved in installing a sprinkler system. 
Suppose you have a piece, A that splits a single pipe into two:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;       ______&lt;br /&gt;      /&lt;br /&gt;     /   ____&lt;br /&gt;____/   /&lt;br /&gt;     A /&lt;br /&gt;____   \&lt;br /&gt;    \   \____&lt;br /&gt;     \&lt;br /&gt;      \______&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;And you have two more pipes B and C that look like this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;        ______&lt;br /&gt;       /&lt;br /&gt;      /   ____&lt;br /&gt;     /   /&lt;br /&gt;____/    \____&lt;br /&gt;          &lt;br /&gt;    B or C&lt;br /&gt;____      ____&lt;br /&gt;    \    /&lt;br /&gt;     \   \____&lt;br /&gt;      \&lt;br /&gt;       \______&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;then you can 'compose' them to make a larger system like this:&lt;br /&gt;&lt;br /&gt;              &lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;                       ______&lt;br /&gt;                      /&lt;br /&gt;                     /   ____&lt;br /&gt;                    /   /&lt;br /&gt;          _________/    \____&lt;br /&gt;         /            B  &lt;br /&gt;        /   _______      ____&lt;br /&gt;       /   /       \    /&lt;br /&gt;      /   /         \   \____&lt;br /&gt;     /   /           \&lt;br /&gt;____/   /             \______&lt;br /&gt;     A /&lt;br /&gt;____   \               ______&lt;br /&gt;    \   \             /&lt;br /&gt;     \   \           /   ____&lt;br /&gt;      \   \         /   /&lt;br /&gt;       \   \_______/    \____&lt;br /&gt;        \             C  &lt;br /&gt;         \_________      ____&lt;br /&gt;                   \    /&lt;br /&gt;                    \   \____&lt;br /&gt;                     \&lt;br /&gt;                      \______&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;(Vim rectangular mode FTW!)&lt;br /&gt;&lt;br /&gt;The important thing to note here is that as A had two outputs (or inputs, depending on your point of view) you can 
attach two more pieces, like B and C, directly to it.&lt;br /&gt;&lt;br /&gt;Call the number of outlets the 'degree' of the system. If A has degree n then we can attach n more systems, A&lt;sub&gt;1&lt;/sub&gt;...A&lt;sub&gt;n&lt;/sub&gt; to it and the resulting system will have degree degree(A&lt;sub&gt;1&lt;/sub&gt;)+...+degree(A&lt;sub&gt;n&lt;/sub&gt;). We can write the result as A(A&lt;sub&gt;1&lt;/sub&gt;,...,A&lt;sub&gt;n&lt;/sub&gt;).&lt;br /&gt;&lt;br /&gt;We also have the 'identity' pipe that looks like this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;_____________&lt;br /&gt;&lt;br /&gt;  identity&lt;br /&gt;_____________&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Formally, an operad is a collection of objects, each of which has a 'degree' that's an integer n (n&amp;ge;0 or n&amp;ge;1, depending on your application), an identity element of degree 1, and a composition law:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; class Operad a where&lt;br /&gt;&amp;gt;     degree :: a -&amp;gt; Int&lt;br /&gt;&amp;gt;     identity :: a&lt;br /&gt;&amp;gt;     o :: a -&amp;gt; [a] -&amp;gt; a&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;tt&gt;o&lt;/tt&gt; is the composition operation. If f has degree n then we can apply it to a list of n more objects. So we only expect to evaluate &lt;tt&gt;f `o` fs&lt;/tt&gt; successfully if &lt;tt&gt;degree f == length fs&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;There are many identities we'd expect to hold. For example &lt;tt&gt;f `o` [identity,...,identity] == f&lt;/tt&gt;, because adding plain sections of pipe has no effect. We also expect some associativity conditions coming from the fact that it doesn't matter what order we build a pipe assembly in, it'll still function the same way.&lt;br /&gt;&lt;br /&gt;We can follow this pipe metaphor quite closely to define what I think of as the prototype Operad. A &lt;tt&gt;Fn a&lt;/tt&gt; is a function that takes n inputs of type a and returns one of type a. 
As we can't easily introspect and find out how many arguments such a function expects, we also store the degree of the function with the function:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Fn a = F { deg::Int, fn::[a] -&amp;gt; a }&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Show a =&amp;gt; Show (Fn a) where&lt;br /&gt;&amp;gt;     show (F n _) = "&amp;lt;degree " ++ show n ++ " function&amp;gt;"&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;tt&gt;unconcat&lt;/tt&gt; is a kind of inverse to &lt;tt&gt;concat&lt;/tt&gt;. You give a list of integers and it chops up a list into pieces with lengths corresponding to your integers. We use this to unpack the arguments to &lt;tt&gt;f `o` gs&lt;/tt&gt; into pieces suitable for the elements of &lt;tt&gt;gs&lt;/tt&gt; to consume.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; unconcat [] [] = []&lt;br /&gt;&amp;gt; unconcat (n:ns) xs = take n xs : unconcat ns (drop n xs)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Operad (Fn a) where&lt;br /&gt;&amp;gt;     degree = deg&lt;br /&gt;&amp;gt;     f `o` gs = let n = sum (map degree gs)&lt;br /&gt;&amp;gt;                in F n (fn f . zipWith fn gs . unconcat (map degree gs))&lt;br /&gt;&amp;gt;     identity = F 1 head&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now compute an example, f(f&lt;sub&gt;1&lt;/sub&gt;,f&lt;sub&gt;2&lt;/sub&gt;) applied to [1,2,3]. It should give 1+1+2*3=8.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex3 = fn (f `o`  [f1,f2]) [1,2,3] where&lt;br /&gt;&amp;gt;   f = F 2 (\[x,y] -&amp;gt; x+y)&lt;br /&gt;&amp;gt;   f1 = F 1 (\[x] -&amp;gt; x+1)&lt;br /&gt;&amp;gt;   f2 = F 2 (\[x,y] -&amp;gt; x*y)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(That's a lot like lambda calculus without names. Operads are a bit like n-ary combinators.)&lt;br /&gt;&lt;br /&gt;Now I'm going to introduce a different looking operad. 
Think of &lt;tt&gt;V&lt;/tt&gt; as representing schemes for dicing the real line. Here are some examples:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;|0.25|0.25|   0.5   |&lt;br /&gt;&lt;br /&gt;|0.1|0.1|    0.8    |&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;If A divides up the real line into n pieces then you could divide up each of the n pieces using their own schemes. This means that dicing schemes compose. So if we define A, B and C as:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;A = |0.5|0.5|&lt;br /&gt;&lt;br /&gt;B = |0.75|0.25|&lt;br /&gt;C = |0.1|0.1|0.8|&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Then A(B,C) is:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;|0.375|0.125|0.05|0.05|0.4|&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;We could implement &lt;tt&gt;V&lt;/tt&gt; as a list of real numbers, but it's more fun to generalise to any monoid and not worry about divisions summing to 1:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data V m = V { unV :: [m] } deriving (Eq,Show)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This becomes an operad by allowing the monoid value in a 'parent' scheme to multiply the values in a 'child'.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Monoid m =&amp;gt; Operad (V m) where&lt;br /&gt;&amp;gt;     degree (V ps) = length ps&lt;br /&gt;&amp;gt;     (V as) `o` bs = V $ op as (map unV bs) where&lt;br /&gt;&amp;gt;         op [] [] = []&lt;br /&gt;&amp;gt;         op (a:as) (b:bs) = map (mappend a) b ++ op as bs&lt;br /&gt;&amp;gt;     identity = V [mempty]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;For example, if d&lt;sub&gt;1&lt;/sub&gt; cuts the real line in half, and d&lt;sub&gt;2&lt;/sub&gt; cuts it into thirds, then d&lt;sub&gt;1&lt;/sub&gt;(d&lt;sub&gt;1&lt;/sub&gt;,d&lt;sub&gt;2&lt;/sub&gt;) will cut it into five pieces of lengths 1/4,1/4,1/6,1/6,1/6:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex4 = d1 `o` [d1,d2] where&lt;br /&gt;&amp;gt;   d1 = V [Product 
(1/2),Product (1/2)]&lt;br /&gt;&amp;gt;   d2 = V [Product (1/3),Product (1/3),Product (1/3)]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;If the elements in a &lt;tt&gt;V&lt;/tt&gt; are non-negative and sum to 1 we can think of them as probability distributions. The composition A(A&lt;sub&gt;1&lt;/sub&gt;,...,A&lt;sub&gt;n&lt;/sub&gt;) is the distribution of all possible outcomes you can get by selecting a value i in the range {1..n} using distribution A and then selecting a second value conditionally from distribution A&lt;sub&gt;i&lt;/sub&gt;. This connects with the recent &lt;a href="http://golem.ph.utexas.edu/category/2008/10/entropy_diversity_and_cardinal.html"&gt;n-category post&lt;/a&gt; on entropy.&lt;br /&gt;&lt;br /&gt;In fact we can compute the entropy of a distribution as follows:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; h (V ps) = - (sum $ map (\(Product x) -&amp;gt; xlogx x) ps) where&lt;br /&gt;&amp;gt;   xlogx 0 = 0&lt;br /&gt;&amp;gt;   xlogx x = x*log x/log 2&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can now look at the 'aside' in that post. From an element of &lt;tt&gt;V&lt;/tt&gt; we can produce a function that computes a corresponding linear combination (at least for &lt;tt&gt;Num&lt;/tt&gt; types):&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; linear (V ps) xs = sum $ zipWith (*) (map (\(Product x) -&amp;gt; x) ps) xs&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can now compute the entropy of a distribution in two different ways:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; (ex5,ex6) = (h (d1 `o` [d1,d2]),h d1 + linear d1 (map h [d1,d2])) where&lt;br /&gt;&amp;gt;   d1 = V [Product 0.5,Product 0.5]&lt;br /&gt;&amp;gt;   d2 = V [Product 0.25,Product 0.75]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now according to &lt;a href="http://arxiv.org/pdf/math/0404016v1"&gt;this paper on operads&lt;/a&gt; we can build a monad from an operad. 
Here's the construction:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data MonadWrapper op a = M { shape::op, value::[a] } deriving (Eq,Show)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(The field names aren't from the paper but they do give away what's actually going on...)&lt;br /&gt;&lt;br /&gt;The idea is that an element of this construction consists of an element of the operad of degree n, and an n element list. It's a functor in an obvious way:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Functor (MonadWrapper o) where&lt;br /&gt;&amp;gt;     fmap f (M o xs) = M o (map f xs)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It's also a &lt;tt&gt;FunctorM&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance FunctorM (MonadWrapper o) where&lt;br /&gt;&amp;gt;   fmapM f (M s c) = do&lt;br /&gt;&amp;gt;       c' &amp;lt;- mapM f c&lt;br /&gt;&amp;gt;       return $ M s c'&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can make the construction a monad as follows:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Operad o =&amp;gt; Monad (MonadWrapper o) where&lt;br /&gt;&amp;gt;     return x = M identity [x]&lt;br /&gt;&amp;gt;     p &amp;gt;&amp;gt;= f = join (fmap f p) where&lt;br /&gt;&amp;gt;         join (M p xs) = M (p `o` map shape xs) (concatMap value xs)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now for something to be a monad there are various laws that need to be satisfied. These follow from the rules (which I haven't explicitly stated) for an operad. When I first looked at that paper I was confused - it seemed that the operad part and the list part didn't interact with each other. And then I suddenly realised what was happening. But hang on for a moment...&lt;br /&gt;&lt;br /&gt;Tree shapes make nice operads. 
The composition rule just grafts child trees into the leaves of the parent:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Operad (Tree ()) where&lt;br /&gt;&amp;gt;     degree t = length (snd (serialise t))&lt;br /&gt;&amp;gt;     identity = Leaf ()&lt;br /&gt;&amp;gt;     t `o` ts = let (r,[]) = deserialise t ts in r &amp;gt;&amp;gt;= id&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can write that more generically so it works with more than trees:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data OperadWrapper m = O { unO::Shape m }&lt;br /&gt;&lt;br /&gt;&amp;gt; instance (FunctorM m,Monad m) =&amp;gt; Operad (OperadWrapper m) where&lt;br /&gt;&amp;gt;     degree (O t) = size t&lt;br /&gt;&amp;gt;     identity = O (return ())&lt;br /&gt;&amp;gt;     (O t) `o` ts = let (r,[]) = deserialise t (map unO ts) in O (r &amp;gt;&amp;gt;= id)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So let's use the construction above to make a monad. But what actually is this monad? Each element is a pair with (1) a tree shape of degree n and (2) an n-element list. In other words, it's just a serialised tree. We can define these isomorphisms to make that clearer:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; iso1 :: FunctorM t =&amp;gt; t x -&amp;gt; MonadWrapper (t ()) x&lt;br /&gt;&amp;gt; iso1 t = uncurry M (serialise t)&lt;br /&gt;&lt;br /&gt;&amp;gt; iso2 :: FunctorM t =&amp;gt; MonadWrapper (t ()) x -&amp;gt; t x&lt;br /&gt;&amp;gt; iso2 (M shape contents) = let (tree,[]) = deserialise shape contents in tree&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So, for example:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ex7 = iso2 (iso1 tree) where&lt;br /&gt;&amp;gt;   tree = Tree [Tree [Leaf "Birch",Leaf "Oak"],Leaf "Cypress",Leaf "Binary"]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;That construction won't work for all monads, just those monads that come from operads. 
I'll leave you to characterise those.&lt;br /&gt;&lt;br /&gt;And now we have it: a way to think about operads from a computational perspective. They're the shapes of certain monadic serialisable containers. Operadic composition is just the same grafting operation used in the &lt;tt&gt;join&lt;/tt&gt; operation, using values &lt;tt&gt;()&lt;/tt&gt; as the graft points.&lt;br /&gt;&lt;br /&gt;I have a few moments spare so let's actually do something with an operad. First we need the notion of a free operad. This is basically just a set of 'pipe' parts that we can stick together with no equations holding apart from those inherent in the definition of an operad. This is different from the &lt;tt&gt;V&lt;/tt&gt; operad where many different ways of applying the &lt;tt&gt;o&lt;/tt&gt; operator can give the same result. We can use any set of parts, as long as we can associate an integer with each part:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; class Graded a where&lt;br /&gt;&amp;gt;   grade :: a -&amp;gt; Int&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The free operad structure is just a tree:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data FreeOperad a = I | B a [FreeOperad a] deriving Show&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;tt&gt;I&lt;/tt&gt; will be the identity, but it will also serve as a 'terminator' like the way there's always a &lt;tt&gt;[]&lt;/tt&gt; at the end of a list.&lt;br /&gt;&lt;br /&gt;An easy way to make a single part an element of an operad:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; b n = let d = grade n in B n (replicate d I)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Here's the instance:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Graded a =&amp;gt; Operad (FreeOperad a) where&lt;br /&gt;&amp;gt;   degree I = 1&lt;br /&gt;&amp;gt;   degree (B _ xs) = sum (map degree xs)&lt;br /&gt;&amp;gt;   identity = I&lt;br /&gt;&amp;gt;   I `o` [x] = x&lt;br /&gt;&amp;gt;   B a 
bs `o` xs = let arities = map degree bs&lt;br /&gt;&amp;gt;                   in B a $ zipWith o bs (unconcat arities xs)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now I'm going to use this to make an operad and then a monad:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Graded [a] where&lt;br /&gt;&amp;gt;   grade = length&lt;br /&gt;&lt;br /&gt;&amp;gt; type DecisionTree = MonadWrapper (FreeOperad [Float])&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;What we get is a lot like the &lt;a href="http://www.randomhacks.net/articles/2007/02/21/randomly-sampled-distributions"&gt;probability monad&lt;/a&gt; except it doesn't give the final probabilities. Instead, it gives the actual tree of possibilities. (I must also point out &lt;a href="http://hpaste.org/1818"&gt;this hpaste&lt;/a&gt; by wli.)&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; test = do&lt;br /&gt;&amp;gt;   a &amp;lt;- M (b [Product 0.5,Product 0.5]) [1,2]&lt;br /&gt;&amp;gt;   b &amp;lt;- M (b [Product (1/3.0),Product (1/3.0),Product (1/3.0)]) [1,2,3]&lt;br /&gt;&amp;gt;   return $ a+b&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now we can 'flatten' this tree so that the leaves have the final probabilities:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; flatten :: Monoid m =&amp;gt; FreeOperad [m] -&amp;gt; V m&lt;br /&gt;&amp;gt; flatten I = V [mempty]&lt;br /&gt;&amp;gt; flatten (B ms fs) = V $ concat $ zipWith (map . mappend) ms (map (unV . flatten) fs)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This is a morphism of operads. (You can probably guess the definition of such a thing.) 
It induces a morphism of monads:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; liftOp :: (Operad a,Operad b) =&amp;gt; (a -&amp;gt; b) -&amp;gt; MonadWrapper a x -&amp;gt; MonadWrapper b x&lt;br /&gt;&amp;gt; liftOp f (M shape values) = M (f shape) values&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;tt&gt;liftOp flatten test&lt;/tt&gt; will give you the final probabilities.&lt;br /&gt;&lt;br /&gt;There may just be a possible application of this stuff. The point of separating shape from data is performance. You can store all of your data in flat arrays and do most of your work there. It means you can write fast tight loops and only rebuild the original datatype if needed at the end. If you're lucky you can precompute the shape of the result, allowing you to preallocate a suitable chunk of memory for your final answer to go into. What the operad does is allow you to extend this idea to monadic computations, for suitable monads. If the 'shape' of the computation is independent of the details of the computation, you can use an operad to compute that shape, and then compute the contents of the corresponding array separately. If you look at the instance for &lt;tt&gt;MonadWrapper&lt;/tt&gt; you'll see that the part of the computation that deals with the data is simply a &lt;tt&gt;concatMap&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;BTW In some papers the definition restricts the degree to &amp;ge;1. But that's less convenient for computer science applications. 
If it really bothers you then you can limit yourself to thinking about containers that contain at least one element.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-5354353393632437147?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/5354353393632437147/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=5354353393632437147' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/5354353393632437147'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/5354353393632437147'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2008/10/operads-and-their-monads.html' title='Operads and their Monads'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-4843004274672094532</id><published>2008-10-18T15:14:00.000-07:00</published><updated>2008-10-18T17:04:43.625-07:00</updated><title type='text'>What's the use of a transfinite ordinal?</title><content type='html'>This post is going to talk about one player games (which I'll call 1PGs, or just games). They seem pretty stupid at first, but it turns out that they have a lot of rich and applicable theory. The idea is that at any position in a 1PG there is a set of new positions that you can move to. You lose when you have no legal moves and the goal is to avoid losing for as long as you can. 
For convenience, let's identify your position in a 1PG with the set of positions you could move to. A position in a 1PG is just another 1PG. So we can give the following recursive formal definition of a 1PG:&lt;br /&gt;&lt;br /&gt;A one player game is either the losing position with no legal moves, denoted by &amp;empty;, or a set of one player games.&lt;br /&gt;&lt;br /&gt;A sequence of legal moves in a 1PG, say x, is then a sequence of the form x &amp;ni;x&lt;sub&gt;0&lt;/sub&gt; &amp;ni;x&lt;sub&gt;1&lt;/sub&gt; &amp;ni;x&lt;sub&gt;2&lt;/sub&gt; &amp;ni;....&lt;br /&gt;&lt;br /&gt;Did you notice that I sneaked in a really big restriction on 1PGs in that definition? My definition means that 1PGs are the same thing as sets. And sets are &lt;a href="http://en.wikipedia.org/wiki/Axiom_of_foundation"&gt;well-founded&lt;/a&gt;, meaning that any sequence of the form x &amp;ni;x&lt;sub&gt;0&lt;/sub&gt; &amp;ni;x&lt;sub&gt;1&lt;/sub&gt; &amp;ni;x&lt;sub&gt;2&lt;/sub&gt; &amp;ni;... eventually terminates. So my definition implicitly contains the restriction that you always eventually lose. That's fine for my purposes.&lt;br /&gt;&lt;br /&gt;Now suppose that x &amp;ni;y &amp;ni;z is a legal sequence of plays in some 1PG. You might as well assume that z is in x. The reason is that if z were a legal move from x, you wouldn't take it, because in order to delay the inevitable you're going to prefer to go via y. So we'll assume that any game x is "transitively closed", ie. if x&amp;ni;y and y&amp;ni;z then x&amp;ni;z.&lt;br /&gt;&lt;br /&gt;Now I want to restrict things even further to the most boring kind of 1PG of all, games where the successor moves are all totally ordered. Intuitively what I mean is that you can think of every move in the game as simply moving you along a 'line'. Given any two positions in the game, x and y, either x&amp;isin;y, or y&amp;isin;x, or x=y. In these games there are no branches, the only thing you can do in these games is delay the inevitable. A bit like life really. 
We'll also use the notation a&amp;lt;b to mean a&amp;isin;b.&lt;br /&gt;&lt;br /&gt;I'll call these games ordinal 1PGs. You may have noticed that this is precisely the same as the ordinary definition of an ordinal.&lt;br /&gt;&lt;br /&gt;Let's call the game &amp;empty; by the name 0. 0 is sudden death. Recursively define the game n={0,1,2,...,n-1}. In the game of n you have up to n steps to play before you die (though you could hurry things along if you wanted to).&lt;br /&gt;&lt;br /&gt;But what about this game: &amp;omega;={0,1,2,...}. There's nothing mysterious about it. In your first move you choose an integer n, and then you get to delay the end for up to n moves. This is our first transfinite ordinal, but it just describes a 1PG that anyone can play and which is guaranteed to terminate after a finite number of moves. We can't say in advance how many moves it will take, and we can't even put an upper bound on the game length, but we do know it's finite.&lt;br /&gt;&lt;br /&gt;Given two games, a and b, we can define the sum a+b. This is simply the game where you play through b first, and then play through a. Because of the transitivity condition above you could simply abandon b at any stage and then jump into a, but that's not a good thing to do if you're trying to stay alive for as long as possible. The first example of such a 1PG is the game &amp;omega;+1. You get to make one move, then you choose an n, and then you get to play n.&lt;br /&gt;&lt;br /&gt;What about 1+&amp;omega;? Well you get to choose n, play n moves, and then play an extra move. But that's exactly like playing &amp;omega; and making the move n+1. So 1+&amp;omega; and &amp;omega; are the same game. Clearly addition of 1PGs isn't commutative.&lt;br /&gt;&lt;br /&gt;We can multiply two games as well. Given games a and b, ab is defined like this: we have game a and game b in front of us at the same time, a on the left and b on the right. 
At any turn we have a choice: make a move in the game on the left, or make a move in the game on the right, resetting the left game back to a again. It's like playing through b, but playing through a copy of a at each turn. Note how for integers a and b the game ab is just ab, where the former is game multiplication, and the latter is ordinary arithmetical multiplication.&lt;br /&gt;&lt;br /&gt;We can also define exponentiation. We can define a&lt;sup&gt;n&lt;/sup&gt;, for finite integer n, in the obvious way as a&amp;middot;a&amp;middot;...&amp;middot;a n times. Before getting onto more general exponentiation I need to classify ordinal games. There are three kinds. There's&lt;br /&gt;&lt;br /&gt;(1) the zero game 0=&amp;empty;.&lt;br /&gt;(2) the games where there is one 'largest' next move in the game, the one you should take. These are games of the form a+1 and the next position in the game, if you're playing to survive, will be a. For example, the game 7 is 6+1.&lt;br /&gt;(3) the games where there is no single best choice of next move. These are games where there is no largest next move, but instead an ever increasing sequence of possible next moves and you have to pick one. The first example is &amp;omega; where you're forced to pick an n. 
These are called limit ordinals.&lt;br /&gt;&lt;br /&gt;We can use this to recursively define a&lt;sup&gt;b&lt;/sup&gt; using the classification of b.&lt;br /&gt;&lt;br /&gt;(1) If b=0 then by definition a&lt;sup&gt;b&lt;/sup&gt;=1 and that's that.&lt;br /&gt;(2) if b=c+1, then a&lt;sup&gt;b&lt;/sup&gt;=a&lt;sup&gt;c&lt;/sup&gt;&amp;middot;a.&lt;br /&gt;(3) if b is a limit ordinal then a move in a&lt;sup&gt;b&lt;/sup&gt; works like this: you pick a move in b, say c, and now you get to play a&lt;sup&gt;c&lt;/sup&gt; (or, by transitive closure, any smaller game).&lt;br /&gt;&lt;br /&gt;For example, you play &amp;omega;&lt;sup&gt;&amp;omega;&lt;/sup&gt; like this: in your first move you pick an n, and then you play &amp;omega;&lt;sup&gt;n&lt;/sup&gt;. Note how the state of play never becomes infinite, you're always playing with finite rules on a finite 'board'. This is an important point. Even though &amp;omega;&lt;sup&gt;&amp;omega;3+2&lt;/sup&gt;2+&amp;omega;+4 seems like it must be a very big kind of thing, it describes a finite process that always terminates.&lt;br /&gt;&lt;br /&gt;So what use are these things?&lt;br /&gt;&lt;br /&gt;Suppose you have an algorithm that runs one step at a time, with each step taking a finite time, and where the internal state of the program at step t is s&lt;sub&gt;t&lt;/sub&gt;. Suppose that you can find a map f from program states to the ordinals such that f(s&lt;sub&gt;t+1&lt;/sub&gt;)&amp;lt;f(s&lt;sub&gt;t&lt;/sub&gt;), ie. the program always decreases the ordinal associated to a state. Then the program must eventually terminate.&lt;br /&gt;&lt;br /&gt;For example, suppose the program iterates 10 times. We can simply assign f(s&lt;sub&gt;t&lt;/sub&gt;)=10-t. Suppose instead your algorithm computes some number n at the first step, though you can't figure out what that number is (or maybe it accepts n as an input from a user), then we can use f(s&lt;sub&gt;0&lt;/sub&gt;)=&amp;omega; and so on. We don't know in advance what n is, but that doesn't matter. 
Whatever n is entered, the program will terminate.&lt;br /&gt;&lt;br /&gt;This technique is particularly useful for proving termination of rewrite rules. For example, with the rewrite rules generated by the &lt;a href="http://sigfpe.blogspot.com/2007/07/ill-have-buchburger-with-fries.html"&gt;Buchberger algorithm&lt;/a&gt; we can map the first step to &amp;omega;&lt;sup&gt;n&lt;/sup&gt;, for some n. And if we don't know what n is, we can just start with &amp;omega;&lt;sup&gt;&amp;omega;&lt;/sup&gt;. If you take a peek at the &lt;a href="http://www.cs.tau.ac.il/%7Enachumd/pub.html"&gt;papers of Nachum Dershowitz&lt;/a&gt; you'll see how he applies this technique to many other types of rewrite rule.&lt;br /&gt;&lt;br /&gt;Transfinite ordinals are not as esoteric as they may first appear to be. Using them to prove termination goes back to Turing and &lt;a href="http://en.wikipedia.org/wiki/Robert_Floyd"&gt;Floyd&lt;/a&gt; but I learnt about the method from reading Dershowitz. (I think Floyd suggested using the method to prove termination of the standard rewrite rules for computing derivatives of expressions. It's tricky because differentiating a large product, say, can result in many new terms, so there's a race between the derivative 'knocking down' powers, and products causing growth.)&lt;br /&gt;&lt;br /&gt;Probably the best example of using ordinals to prove the termination of a process is in the &lt;a href="http://math.andrej.com/2008/02/02/the-hydra-game/"&gt;Hydra game&lt;/a&gt;. What's amazing about this game is that you need quite powerful axioms of arithmetic to prove termination.&lt;br /&gt;&lt;br /&gt;One more thing. All of this theory generalises to &lt;a href="http://en.wikipedia.org/wiki/Surreal_number"&gt;two player games&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Update: I think &lt;a href="http://blog.plover.com/"&gt;mjd&lt;/a&gt; may have been about to write on the same thing. 
On the other hand, though he mentions program termination, he says of it "But none of that is what I was planning to discuss". So maybe what I've written can be seen as complementary.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-4843004274672094532?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/4843004274672094532/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=11295132&amp;postID=4843004274672094532' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/4843004274672094532'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/4843004274672094532'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2008/10/whats-use-of-transfinite-ordinal.html' title='What&apos;s the use of a transfinite ordinal?'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09401818062305273147'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>3</thr:total></entry></feed>