tag:blogger.com,1999:blog-112951322009-07-07T07:16:50.395-07:00A Neighborhood of Infinitysigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.comBlogger229125tag:blogger.com,1999:blog-11295132.post-55000921555871958172009-07-04T17:35:00.000-07:002009-07-07T07:16:50.410-07:00A Monad for Combinatorial Search with HeuristicsHaskell provides a great way to perform combinatorial searching with backtracking: the list monad. Do-notation provides a nice DSL that makes it easy to express the trying out of different possibilities. But the list monad only performs a simple-minded walk through all of the alternatives giving little opportunity to direct that walk. In particular, it's not easy to provide heuristics to say things like "try this alternative first but if it starts going badly consider this alternative too". This post contains a monad that gives a simple scheme to allow programmers to direct searches in this way.<br /><br />First the Haskell administrativia...<br /><br /><pre><br />> import Data.Char<br />> import Control.Monad<br />> import Data.Monoid<br />> import Data.List<br /><br /></pre><br />When using the list monad, a list is interpreted as a list of candidates in a search. The <tt>join</tt> function for this monad takes a list of lists of candidates and flattens it into a list of candidates. This is all the list monad really does: you write code that generates new candidates from old, and the <tt>>>=</tt> function applies this code to all of the candidates it knows about and then flattens this back out to a single list of candidates. Importantly it does this in a lazy way so that you only need look at candidates as they are generated.<br /><br />This new monad will keep slightly more information: each candidate will have a 'penalty' value attached to it saying how attractive a candidate it is. Candidates with score 0 will be tried first, and those with score n will be tried after those with lower scores. We can represent a collection of candidates and their scores simply as a list of lists. The first list in the list will have those with score 0, the second will have those with score 1 and so on. We'll call these lists penalty lists and the positions within those lists slots.<br /><br />Here's the definiton of the penalty list type:<br /><br /><pre><br />> data PList a = P { unO :: [[a]] } deriving (Show,Eq)<br /><br /></pre><br />It's a functor in a straightforward way:<br /><br /><pre><br />> instance Functor PList where<br />> fmap f (P xs) = P (fmap (fmap f) xs)<br /><br /></pre><br />The rule we'll adopt is that if you're trying a combination of two candidates then the penalty associated with the combination is the sum of the penalties of the individual objects. To implement this we need an alternative version of the <tt>join</tt> operation. If we have a penalty list of penalty lists and we have an element in the mth slot in the nth penalty sublist then we want it to end up in the (m+n)th slot in the final penalty list. Within a slot we can just order the elements just like in the original list monad.<br /><br /><pre><br />> headm :: Monoid m => [m] -> m<br />> headm (a:as) = a<br />> headm [] = mempty<br /><br />> tailm :: Monoid m => [m] -> [m]<br />> tailm (a:as) = as<br />> tailm [] = []<br /><br />> zipm :: Monoid m => [[m]] -> [m]<br />> zipm ms | all null ms = []<br />> zipm ms = let<br />> heads = map headm ms<br />> tails = map tailm ms<br />> h = mconcat heads<br />> t = zipm (filter (not . null) tails)<br />> in h : t<br /><br />> instance Monad PList where<br />> return x = P [[x]]<br />> x >>= f = let P xs = (fmap (unO . f) x) in P (join xs) where<br />> join [] = []<br />> join (m:ms) = let<br />> part1 = zipm m<br />> part2 = join ms<br />> in headm part1 : zipm [tailm part1,part2]<br /><br /></pre><br />Explaining how <tt>join</tt> is implemented would take many words so I hope this picture of the computation of an example will do instead. I used the <tt>Monoid</tt> class simply to avoid directly referring to one level of nesting of brackets. It is intended to be a proper implementation of a <tt>Monad</tt> satisfying the three monad laws but I haven't proved this and it's possible that it occasionally leaves trailing empty lists around - which have no impact on search results.<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SlDQReuY75I/AAAAAAAAAX0/r6LTYp0_8N0/s1600-h/errors.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 257px; height: 400px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SlDQReuY75I/AAAAAAAAAX0/r6LTYp0_8N0/s400/errors.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5355008955667509138" /></a><br /><br /><pre><br />> instance MonadPlus PList where<br />> mzero = P []<br />> mplus (P xs) (P ys) = P (zipm [xs,ys])<br /><br /></pre><br />We can use this much like the list monad. First it will search for possibilities with zero penalty. When these are exhausted it'll backtrack to the last place where it can start finding possibilities with penalty 1. Then it'll try penalty 2 and so on. Importantly it manages to do this lazily so that we don't explore penalty n+1 until we've finished penalty n.<br /><br />So now we can start using it. We'll hunt for Pythagorean triples by simply hunting through all of the triples of integers. But we'll try to find solutions where the sum of the integers is as small as possible. So as list of candidate integers we use <tt>P [[1],[2],[3]...]</tt>. In other words, the integer n has penalty n-1. Here's the code:<br /><br /><pre><br />> ex1 = do<br />> x <- P $ map (\x -> [x]) [1..]<br />> y <- P $ map (\y -> [y]) [1..]<br />> z <- P $ map (\z -> [z]) [1..]<br />> guard $ x*x+y*y==z*z<br />> return $ (x,y,z)<br /><br /></pre><br />Of course we wouldn't really search for Pythagorean triples this way. This is just an illustration of how to use the code. But note, crucially, that the equivalent code using the regular list monad would give us back no solutions. It'd start with x=1 and y=1 and then go off to infinity finding candidates for z. So as a side effect the penalty list allows us to tame some infinite searches.<br /><br />Anyway, that was a simple numerical example. But this monad can be used with much more complex kinds of search. In fact it almost serves as a drop-in replacement for the list monad. This is a really nice example of the way separation of concerns is easy in Haskell. The task of generating candidates for search can easily be separated from the task of selecting from those candidates, even though the operations are highly interleaved during execution.<br /><br />So here's a more complex example: writing a parser that can tolerate errors without running into combinatorial explosion. The idea is that we associate a penalty with each error. The penalty will make the parser run on the assumption of no errors until it can no longer parse, and then it'll backtrack on the assumption of one error until that assumption is no longer tenable and so on. We can liberally sprinkle 'erroneous' parsings throughout our code confident that these branches will only be taken in the event that an error-free parsing can't be found.<br /><br />Firstly, here's a penalty list that we can use to introduce a penalty of just 1.<br /><br /><pre><br />> penalty :: PList ()<br />> penalty = P [[],[()]]<br /><br /></pre><br />If we stick that in the code path then anything following acquires a penalty of 1.<br /><br />Now we can write a parser. We can implement Hutton's parser in his <a href="http://www.cs.nott.ac.uk/~gmh/bib.html#pearl">monad parsers paper</a> with very little modification. We simply replace the usual list with the penalty list and do away with the <tt>+++</tt> operator to allow it to be a bit more liberal about backtracking. Here's the parser type:<br /><br /><pre><br />> newtype Parser a = Parser (String -> PList (a,String))<br /><br /></pre><br />We could have parameterised that with the underlying monad so that we could have parsers with a choice of search strategy.<br /><br />The rest is a lot like in Hutton's paper:<br /><br /><pre><br />> parse (Parser f) x = f x<br /><br />> instance Monad Parser where<br />> return a = Parser (\cs -> P [[(a,cs)]])<br />> p >>= f = Parser (\cs -> do<br />> (a,cs') <- parse p cs<br />> parse (f a) cs')<br /><br />> instance MonadPlus Parser where<br />> mzero = Parser (\cs -> mzero)<br />> p `mplus` q = Parser (\cs -> parse p cs `mplus` parse q cs)<br /><br />> item :: Parser Char<br />> item = Parser (\cs -> case cs of<br />> "" -> mzero<br />> (c:cs) -> P [[(c,cs)]])<br /><br />> sat :: (Char -> Bool) -> Parser Char<br />> sat p = do<br />> c <- item<br />> if p c then return c else mzero<br /><br />> char :: Char -> Parser Char<br />> char c = sat (c ==)<br /><br /></pre><br />Now for a simple parsing problem. We'll parse simple arithmetical expressions a lot like in Hutton's paper. But I'm going to tolerate two kinds of error:<br />1. The shift key doesn't always work so occasionally a shifted or unshifted version of a character may appear and<br />2. parentheses are occasionally left out by the clumsy user.<br /><br />Now we can code up a simple grammar for this. First the mapping between shifted and unshifted characters (on a Mac US keyboard):<br /><br /><pre><br />> lowers = "1234567890-=/"<br />> uppers = "!@#$%^&*()_+?"<br />> lower x = lookup x (zip uppers lowers)<br />> upper x = lookup x (zip lowers uppers)<br /><br />> upperChar x = case upper x of<br />> Nothing -> mzero<br />> Just y -> char y >> return x<br /><br />> lowerChar x = case lower x of<br />> Nothing -> mzero<br />> Just y -> char y >> return x<br /><br /></pre><br />A version of <tt>penalty</tt> wrapped for the parser monad:<br /><br /><pre><br />> avoid :: Parser ()<br />> avoid = Parser $ \cs -> do<br />> penalty<br />> return ((),cs)<br /><br /></pre><br />Reading keys on the assumption that the shift key may have failed:<br /><br /><pre><br />> keyChar x = char x `mplus` (avoid >> upperChar x) `mplus` (avoid >> lowerChar x)<br /><br />> digit = do<br />> x <- foldl mplus mzero (map keyChar "0123456789")<br />> return (fromIntegral (ord x-ord '0'))<br /><br />> number1 :: Integer -> Parser Integer<br />> number1 m = return m `mplus` do<br />> n <- digit<br />> number1 (10*m+n)<br /><br />> number :: Parser Integer<br />> number = do<br />> n <- digit<br />> number1 n<br /><br />> chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a <br />> chainl p op a = (p `chainl1` op) `mplus` return a <br />> chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a <br />> p `chainl1` op = do {a <- p; rest a} <br />> where <br />> rest a = (do<br />> f <- op<br />> b <- p<br />> rest (f a b)) `mplus` return a <br /><br /></pre><br />Optional parentheses:<br /><br /><pre><br />> shouldHave c = keyChar c `mplus` (avoid >> return c)<br /><br /></pre><br />And the main part of the expression grammar:<br /><br /><pre><br />> expr = term `chainl1` addop <br />> term = monomial `chainl1` mulop <br />> monomial = factor `chainl1` powop<br />> factor = number `mplus` do {shouldHave '('; n <- expr; shouldHave ')'; return n} <br />> powop = keyChar '^' >> return (^)<br />> addop = do {keyChar '+'; return (+)} `mplus` do {keyChar '-'; return (-)} <br />> mulop = do {keyChar '*'; return (*)} `mplus` do {keyChar '/'; return (div)} <br /><br /></pre><br />Match the end of a string:<br /><br /><pre><br />> end :: Parser ()<br />> end = Parser $ \cs -><br />> if null cs then P [[((),"")]] else mzero<br /><br /></pre><br />We can test it out with:<br /><br /><pre><br />> completeExpr = do<br />> n <- expr<br />> end<br />> return n<br /><br />> ex2 = parse completeExpr "2^(1+3"<br /><br /></pre><br />When we run this we get no error-free parsing but we do get 3 readings with one error. One comes from reading the '(' as 9, one comes from inserting the missing ')' at the end and one comes from inserting ')' after '1'. Note that even for complex expressions we'll quickly find a 1- or 2-error parsing. For the regular list monad we might never get a parsing because there are an infinite number of ways of inserting parentheses.<br /><br />Anyway, that was just a toy parsing problem. But a more complex application comes to my mind. Some written languages are tricky to parse because their orthography doesn't fully capture the phonetics of the original language, because there are few or no indicators of sentence or even word breaks, and because they have numerous optional orthographic and grammatical rules and use a script whose individual characters are occasionally hard to reliably identify. In such a situation it's good to have a parser driven by heuristics about what is likely to be intended and the penalty list monad might serve well. <a href="http://en.wikipedia.org/wiki/Egyptian_language">Here</a>'s an example of such a language.<br /><br />Update: I forgot to add some connections to previous monads I've talked about:<br /><OL><br /><LI><code>PList</code> is a variation of the convolution monad I described <a href="http://blog.sigfpe.com/2007/01/monads-hidden-behind-every-zipper.html">here</a>. It deals with the "wrong category" aspect so it is a true Haskell monad. Penalty lists form some kind of dual to the convolution comonad.<br /><LI>It has much in common with <a href="http://blog.sigfpe.com/2007/06/how-to-write-tolerably-efficient.html">this</a> monad. That monad doesn't do anything smart about ordering searches but it does have the neat ability to 'fuse' different branches of a search so that different ways to arrive at the same place don't add to the combinatorial explosion. It's good for searches where you want to know what the minimum penalty is to get somewhere, but don't care what the best path actually is.<br /></ol><br />Also, in response to a comment on #haskell I've made the <code>join</code> example more complex so it's easier to generalise from it.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-5500092155587195817?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com4tag:blogger.com,1999:blog-11295132.post-35059886208473647052009-06-20T13:01:00.000-07:002009-06-27T15:09:26.533-07:00Automata and the A-D-E classification.<H3>Introduction</H3><br />The <a href="http://en.wikipedia.org/wiki/ADE_classification">A-D-E </a> classification is a strange ubiquitous pattern that appears in many branches of mathematics. Typically it appears when you try to classify certain types of mathematical construction. If the A-D-E classification applies then you end up with two infinite sequences of cryptically named objects (A<sub>1</sub>,A<sub>2</sub>,A<sub>3</sub>,...) and (D<sub>1</sub>,D<sub>2</sub>,D<sub>3</sub>,...) as well as three leftover objects called E<sub>6</sub>, E<sub>7</sub> and E<sub>8</sub>. Unfortunately, most of these objects and their classifications are tricky to define using only elementary mathematics. However, there is one type of object that is classified in this way that can be given a relatively straightforward computational description involving a little linear algebra and assuming you know a tiny bit about automata.<br /><br />But first: why care about the A-D-E classification? Well I tried to say a little bit about how symmetries relate to nature a <a href="http://blog.sigfpe.com/2007/11/whats-all-this-e8-stuff-about-then-part.html">while back</a>. <a href="http://en.wikipedia.org/wiki/Simple_Lie_group#Simply_laced_groups">Certain types</a> of possible symmetry of particle physics can be classified the A-D-E way. The symmetry group corresponding to E8 is the now famous exceptional group <a href="http://en.wikipedia.org/wiki/E8_(mathematics)">E8</a>. I won't be able to get to an explanation of how groups are involved. But at least I'll be able to give a hint about the bigger picture that E8 is part of.<br /><H3>Non-deterministic Finite State Automata</H3><br />Here's a diagram representing a very simple <a href="http://en.wikipedia.org/wiki/Nondeterministic_finite_state_machine">non-deterministic finite state automaton</a> (NDA):<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/Sj6cXr8ACoI/AAAAAAAAAXE/QXGI9NG6KR0/s1600-h/nda1.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 392px; height: 197px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/Sj6cXr8ACoI/AAAAAAAAAXE/QXGI9NG6KR0/s400/nda1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5349885338108037762" /></a><br />It can be in one of two states. When in state A it can transition to state B and in state B it can only transition back to state B, but it can do so in two different ways.<br /><H3>Vector Automata</H3><br />Now I'll introduce a more general kind of automaton: a <em>vector automaton</em> (VA). (I made that term up, it's not meant to correspond with anyone else's terminology.) Every vector automaton is built from an NDA. But each state corresponds to a finite dimensional vector space and each transition corresponds to a linear function mapping from the vector space of the source state to the vector space of the destination state. We could turn the above example into a VA by assigning a 1D vector space V<sub>A</sub> to A, a 2D vector space to V<sub>B</sub> and defining linear functions:<br /><center><br />f : (x) -> (x,0)<br />g : (x,y) -> (-y,x)<br />h : (x,y) -> (y,-x)<br /></center><br />A VA is just like an NDA in that it transitions from state to state according to the given transitions. But additionally it keeps track of a vector in the vector space corresponding to the current state. Each time it makes a transition the linear function corresponding to that function is applied to the vector. So in the example above, the NDA might start in state A with a scalar value x (ie. a 1D vector). When it makes its first transition its vector becomes the 2D vector (x,0) and after that each transition rotates the vector through 90 degrees clockwise or anticlockwise.<br /><br />There's a lot of freedom in defining a VA given its underlying NDA. For each node you can pick any vector space you like of any finite dimension, and for each transition you can pick any linear function you like mapping between the source and target vector spaces.<br /><br />Let's make this a little more formal. An NDA is a finite set of states combined with a finite set of transitions. Each transition has a source and destination, each of which is a state. That's it. You're allowed any finite number of transitions between states and a transition can have the same state as source and destination.<br /><br />A VA is an NDA combined with a finite dimensional vector space attached to each state and a linear function for each transition such that the function maps from the vector space of the of the source to the vector space of the target.<br /><br />Now consider this really simple NDA:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sj6guAo1jpI/AAAAAAAAAXM/r012tH1TMuk/s1600-h/nda2.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 296px; height: 98px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sj6guAo1jpI/AAAAAAAAAXM/r012tH1TMuk/s400/nda2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5349890119668436626" /></a><br />When we build a VA from this NDA, for convenience I'll call the vector spaces corresponding to A and B, A and B. And I'll call the linear function from A to B, f. How many VAs can be made from this VA? Clearly an infinite number. But a lot of them are very similar. <br /><br />Suppose we assume A and B are 2-dimensional and write their elements as pairs (x,y). Suppose f(1,0)=u and f(0,1)=v and that u is not a multiple of v and both are non-zero. Then we can use u and v as a basis for B. If we write f as f' in this new basis we get f'(1,0)=(1,0) and f'(0,1)=(0,1). So by relabelling the basis of B we have actually revealed that an infinite number of choices for f reduce to the same thing apart from a change of basis in B.<br /><br />Up to change of basis in A and B we find there are only three possibilities:<br />(1) u and v are distinct, non-zero, and not multiples of each other.<br />(2) u is non-zero but v is zero<br />(3) both u and v are zero<br /><br />We started with an infinite number of possibilities for our particular choice of dimensions and ended up with just 3. We'll say that two VAs are equivalent if we can get one from the other by changing basis like this.<br /><br />On the other hand, consider this NDA:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/Sj6kMMJkhNI/AAAAAAAAAXU/bIbn87y3_MU/s1600-h/nda3.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 236px; height: 382px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/Sj6kMMJkhNI/AAAAAAAAAXU/bIbn87y3_MU/s400/nda3.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5349893936689480914" /></a><br />Let's choose A, B and C to be 1-dimensional vector spaces and define f, g and h as:<br /><center><br />f(x) = x<br />g(x) = x<br />h(x) = λx<br /></center><br />where λ is any real number. Then h(g(f(x)))=λx. We can choose any λ we like so we have an infinity of possibilities. No amount of basis change is going to change this fact. This is different from the case above because now we're comparing x and h(g(f(x))) which lie in the same vector space. So when our NDAs have loops in them, the space of possible VAs, for most choices of dimension, is infinite.<br /><br /><H3>The sum of two VAs</H3><br />If we have two VAs corresponding to the same NDA we can combine them together to make a single machine. The state is given by a state in the shared underlying NDA, but we now have a pair of vectors. Each time there is a transition we apply the pair of transforms to transform the two vector simultaneously. But we can encode a pair of vectors as a single vector simply by concatenating together the vector components in some basis. So this machine is just another VA. The dimension of the vector space for each state is simply the sum of the dimensions of the vector spaces in the original VAs. So, given two VAs we can sum them to get a third.<br /><br />Given a VA for an NDA it may or may not be equivalent to the sum of two simpler VAs. If it isn't, it's said to be irreducible. If it is, then we can ask the same question about the simpler VAs. In this way, every VA is the sum of irreducible VAs.<br /><br /><H3>Main Theorem</H3><br />Now comes the main result I want to give:<br /><br />If the graph underlying the NDA is one of the following list then (and only then) there is a finite number of inequivalent irreducible VAs for the NDA. All other VAs for that NDA are simply finite sums of machines from this finite set. Here's the list:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/SkaNuePxB6I/AAAAAAAAAXk/T6feU--U6k8/s1600-h/dynkin.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 317px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/SkaNuePxB6I/AAAAAAAAAXk/T6feU--U6k8/s400/dynkin.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5352121036709103522" /></a><br />That's it! Weird huh? (Note that diagram X<sub>n</sub> has n nodes.)<br /><br />Strangely, those same <a href="http://en.wikipedia.org/wiki/Coxeter-Dynkin_diagram">diagrams</a> (and a few more) appear (in a quite different way) when classifying the possible symmetries of fundamental particles. The symmetries are given the same names as these diagrams. And in just the same way we get those 'sporadic' symmetries leading up to <a href="http://blog.sigfpe.com/2007/11/whats-all-this-e8-stuff-about-then-part.html">E<sub>8</sub></a>. Those same diagrams also arise in <a href="http://en.wikipedia.org/wiki/Catastrophe_theory">Catastrophe Theory</a>.<br /><br />I'd like to sketch the proof, at least in one direction, but I seem to have run out of time. Another day perhaps. But note that none of these diagrams have loops for the reasons I gave above, and I've already shown that A<sub>2</sub> gives only a finite number of choices for a certain choice of dimension.<br /><br />Meanwhile, I should give the proper mathematician names for the things above. The diagrams listed above are examples of <a href="http://en.wikipedia.org/wiki/Coxeter-Dynkin_diagram">Dynkin diagrams</a>. A non-deterministic automaton as described above is known as a <a href="http://en.wikipedia.org/wiki/Quiver_(mathematics)">quiver</a>. A vector automaton is normally known as a representation of a quiver. And the theorem above is half of Gabriel's theorem.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-3505988620847364705?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com11tag:blogger.com,1999:blog-11295132.post-20734432364406190742009-06-07T10:03:00.000-07:002009-06-07T15:08:18.194-07:00Hashing MoleculesTwenty or so years ago I worked for a pharmaceutical company that had a large database of compounds. That got me thinking about the problem of how to perform lookups based on molecular structures. If you can find a bunch of numbers that encapsulate the molecular structure then you can use them as database keys. But you need to ensure that the same molecule entered in two different ways gets mapped to the same numbers, and you'd like different molecules, such as <a href="http://en.wikipedia.org/wiki/Stereoisomer">stereoisomers</a>, or even <a href="http://en.wikipedia.org/wiki/Enantiomer">enantiomers</a> to get mapped to different values.<br /><br />That got me thinking and around then I came up with an idea for hashing molecules inspired by some of the mathematics I'd been doing not long before. I never got around to coding it up but twenty years later it dawned on me that I could easily do it using a similar technique to what I used for <a href="http://blog.sigfpe.com/2008/10/untangling-with-continued-fractions.html">untangling tangles</a> and translating <a href="http://blog.sigfpe.com/2009/05/trace-diagrams-with-monads.html">trace diagrams</a>. Anyway, as I've already given examples of how to translate diagrams to monadic expressions I'm going to skimp on the details in this post and just talk about things specific to molecular structures. For this post to make sense you have to understand those earlier posts.<br /><br />So first comes the usual Haskell preamble:<br /><pre><br /><br />> {-# LANGUAGE MultiParamTypeClasses,FlexibleInstances,FunctionalDependencies,GeneralizedNewtypeDeriving #-}<br /><br />> module Main where<br /><br />> import Data.HashTable<br />> import Data.List as L<br />> import Data.Int<br />> import Data.Array<br />> import GHC.Arr<br />> import qualified Data.Map as M<br />> import Control.Monad<br />> infixl 5 .+<br />> infixl 6 .*<br /><br /></pre><br />I'm going to be using the <a href="http://blog.sigfpe.com/2007/02/monads-for-vector-spaces-probability.html">vector space monad</a> with a 4-dimensional vector space. This type labels the dimensions:<br /><pre><br /><br />> data I = A | B | C | D deriving (Eq,Ord,Show,Ix,Enum)<br /><br /></pre><br />n-valent atoms will be represented by functions that consume n-tuples. We'll start with a simple hash:<br /><pre><br /><br />> c' (a,b,c,d) = hashString ("C" ++ show (a,b,c,d))<br /><br /></pre><br />My prime motivation for using Haskell for this problem is that the code was super-easy to write and investigate. But it's inefficient. I'll talk about how to remedy this properly at the end. But for now I'm going to memoise many functions as arrays using a <code>Memoisable</code> type class with a <code>memo</code> method. So I'll be using <code>c</code> instead of <code>c'</code>:<br /><br />> c = memo $ \x -> symmetrise a4 c' x .* return ()<br /><br />Note the use of the <code>symmetrise</code> function. The idea is that the 4 bonds coming out from (singly-bonded) carbon can be thought of as lying at the corners of a tetrahedron.<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/Siv_DTt45II/AAAAAAAAAWs/lqLNr4wTQYI/s1600-h/c.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 224px; height: 199px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/Siv_DTt45II/AAAAAAAAAWs/lqLNr4wTQYI/s400/c.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5344645815102661762" /></a><br />They have <a href="http://en.wikipedia.org/wiki/Tetrahedral_symmetry">tetrahedral symmetry</a>. So I'd like my hash to also have this symmetry so that, for example, <code>c (i,j,k,l) == c (j,i,l,k)</code>. We can enforce this by summing over all 24 permutations of the arguments compatible with this symmetry, also known as A4. So <code>a4</code> lists all of these permutations:<br /><pre><br /><br />> a4 (i,j,k,l) = [<br />> (i,j,k,l),<br />> (j,i,l,k),<br />> (k,l,i,j),<br />> (l,k,j,i),<br /><br />> (i,k,l,j),<br />> (i,l,j,k),<br /><br />> (k,j,l,i),<br />> (l,j,i,k),<br /><br />> (j,l,k,i),<br />> (l,i,k,j),<br /><br />> (j,k,i,l),<br />> (k,i,j,l)<br />> ]<br /><br /></pre><br />And <code>symmetrise</code> performs the summation:<br /><pre><br /><br />> symmetrise group f x = sum (map f (group x))<br /><br /></pre><br />We can define other molecules too. Hydrogen is easy:<br /><pre><br /><br />> h = memo $ \a -> hashString ("H" ++ show a) .* return ()<br /><br /></pre><br />Oxygen has <a href="http://en.wikipedia.org/wiki/Permutation_group">S2</a> symmetry:<br /><pre><br /><br />> s2 (i,j) = [ (i,j), (j,i) ]<br />> o' (a,b) = hashString ("O" ++ show (a,b))<br />> o = memo $ \x -> symmetrise s2 o' x .* return ()<br /><br /></pre><br />You can think of a carbon atom and a hydrogen atom, say, as a pair of arrays c<sub>ijkl</sub> and h<sub>i</sub>, and bonding them together as summation over a shared index. So methane would be the sum over i,j,k,l = A..D of c<sub>ijkl</sub>h<sub>i</sub>h<sub>j</sub>h<sub>k</sub>h<sub>l</sub>. To this end, define a bond as:<br /><pre><br /><br />> bond :: V Int32 I<br />> bond = return A .+ return B .+ return C .+ return D<br /><br /></pre><br />We can make H<sub>2</sub> like so:<br /><pre><br /><br />> h2 = simp $ do<br />> i <- bond<br />> h ! i<br />> h ! i<br /><br /></pre><br /><code>i <- bond</code> makes <code>i</code> a bond which we then attach to two hydrogen atoms. Evaluating <code>h2</code> will give us the hash for hydrogen gas.<br /><br />Rather than dive straight into CH<sub>4</sub> we can construct some useful building blocks. Hydrogen with a bond already attached:<br /><pre><br /><br />> h_ :: V Int32 I<br />> h_ = do<br />> m <- bond<br />> h ! m<br />> return m<br /><br /></pre><br />I'm using a trailing underscore _ to indicate a free bond.<br /><br /><code>ch2_</code> accepts one bond and returns another. Once memoised it is, in effect, just a 4x4 matrix and can be used to rapidly build chains.<br /><pre><br /><br />> ch2_ = memo $ \i -> simp $ do<br />> k <- bond<br />> l <- bond<br />> m <- bond<br />> h ! l<br />> h ! m<br />> c ! (i,k,l,m)<br />> return k<br /><br /></pre><br />So here's code to make an <a href = "http://en.wikipedia.org/wiki/Alkyl">alkyl</a> chain with a free bind at the end.<br /><pre><br /><br />> alkyl_ 0 = h_<br /><br />> alkyl_ n = simp $ do<br />> i <- alkyl_ (n-1)<br />> ch2_ ! i<br /><br /></pre><br />We can now make <a href="http://en.wikipedia.org/wiki/Alkanes">alkanes</a> by attaching a hydrogen atom at the end and make methane as a special case:<br /><pre><br /><br />> alkane n = simp $ alkyl_ n >>= (h ! )<br />> methane = alkane 1<br /><br /></pre><br />Now a <a href="http://en.wikipedia.org/wiki/Hydroxyl">hydroxyl group</a>:<br /><pre><br /><br />> oh = memo $ \i -> simp $ do<br />> j <- bond<br />> h ! j<br />> o ! (i,j)<br /><br /></pre><br />And you can have a drink on me:<br /><pre><br /><br />> ethanol = simp $ do<br />> i <- alkyl_ 2<br />> oh ! i<br /><br /></pre><br />Carbon double bonds turn out to be straightforward. We can simply use a pair of bonds:<br /><pre><br /><br />> doubleBond :: V Int32 (I,I)<br />> doubleBond = simp $ do<br />> i <- bond<br />> j <- bond<br />> return (i,j)<br /><br /></pre><br />Carbon-carbon double bonds <a href="http://en.wikipedia.org/wiki/Alkene#Bonding">tend not to twist</a> and this is reflected in the hash. We'd have to apply <code>symmetrise</code> if we wanted twistable bonds.<br /><br />We can make a pre-canned doubly bonded carbon atom pair:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SiwDA51O_fI/AAAAAAAAAW8/qe1h3D4ApW0/s1600-h/c_c.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 296px; height: 200px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SiwDA51O_fI/AAAAAAAAAW8/qe1h3D4ApW0/s400/c_c.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5344650171840921074" /></a><br /><pre><br /><br />> c_c = memo $ \(i,j,k,l) -> simp $ do<br />> (m,n) <- doubleBond<br />> c ! (i,j,m,n)<br />> c ! (m,n,k,l)<br /><br /></pre><br />So we can build ethene like this<br /><pre><br /><br />> ethene = simp $ do<br />> i <- h_<br />> j <- h_<br />> k <- h_<br />> l <- h_<br />> c_c ! (i,j,k,l)<br /><br /></pre><br />Here's a methyl group with a free bond<br /><pre><br /><br />> ch3_ = simp $ do<br />> l <- h_<br />> ch2_ ! l<br /><br /></pre><br />So we can build a bunch more compounds<br /><pre><br /><br />> propene = simp $ do<br />> j <- ch3_<br />> i <- h_<br />> k <- h_<br />> l <- h_<br />> c_c ! (i,j,k,l)<br /><br />> cisbut2ene = simp $ do<br />> i <- ch3_<br />> j <- h_<br />> k <- ch3_<br />> l <- h_<br />> c_c ! (i,j,k,l)<br /><br />> transbut2ene = simp $ do<br />> i <- h_<br />> j <- ch3_<br />> k <- ch3_<br />> l <- h_<br />> c_c ! (i,j,k,l)<br /><br />> cisbut2ene' = simp $ do<br />> i <- h_<br />> j <- ch3_<br />> k <- h_<br />> l <- ch3_<br />> c_c ! (i,j,k,l)<br /><br />> _2methylpropene = simp $ do<br />> i <- ch3_<br />> j <- ch3_<br />> k <- h_<br />> l <- h_<br />> c_c ! (i,j,k,l)<br /><br />> _2methylpropene' = simp $ do<br />> i <- h_<br />> j <- h_<br />> k <- ch3_<br />> l <- ch3_<br />> c_c ! (i,j,k,l)<br /><br /></pre><br />An interesting problem is building a benzene ring. Here's a first attempt with six free bonds:<br /><pre><br /><br />> ring1 (p,q,r,s,t,u) = simp $ do<br />> i <- bond<br />> j <- bond<br />> k <- bond<br />> c_c ! (j,q,i,p)<br />> c_c ! (i,u,k,t)<br />> c_c ! (k,s,j,r)<br /><br /></pre><br />The problem with that is that <a href="http://en.wikipedia.org/wiki/Benzene#Structure">benzene rings</a> are special. The electrons are 'delocalised' so that the ring has rotational symmetry. We need to sum over the two consistent ways to assign single and double bonds around the ring. For the more general case of interlocking benzene rings we must still sum over all consistent assignments.<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SiwBpGsJWYI/AAAAAAAAAW0/e0JQZ3v-iIM/s1600-h/benzene.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 210px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SiwBpGsJWYI/AAAAAAAAAW0/e0JQZ3v-iIM/s400/benzene.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5344648663463975298" /></a><br /><br /><pre><br /><br />> ring = memo $ \(p,q,r,s,t,u) -> simp $ ring1 (p,q,r,s,t,u) .+ ring1 (q,r,s,t,u,p)<br /><br /></pre><br />And some more compunds:<br /><pre><br /><br />> phenyl = memo $ \p -> simp $ do<br />> q <- h_<br />> r <- h_<br />> s <- h_<br />> t <- h_<br />> u <- h_<br />> ring ! (p,q,r,s,t,u)<br /><br />> benzene :: V Int32 ()<br />> benzene = simp $ do<br />> i <- bond<br />> phenyl ! i<br />> h ! i<br /><br />> phenol :: V Int32 ()<br />> phenol = simp $ do<br />> i <- bond<br />> phenyl ! i<br />> oh ! i<br /><br />> toluene :: V Int32 ()<br />> toluene = simp $ do<br />> i <- ch3_<br />> phenyl ! i<br /><br />> toluene' :: V Int32 ()<br />> toluene' = simp $ do<br />> p <- h_<br />> q <- h_<br />> r <- h_<br />> s <- ch3_<br />> t <- h_<br />> u <- h_<br />> ring ! (p,q,r,s,t,u)<br /><br />> toluene'' :: V Int32 ()<br />> toluene'' = simp $ do<br />> p <- h_<br />> q <- h_<br />> r <- ch3_<br />> s <- h_<br />> t <- h_<br />> u <- h_<br />> ring ! (p,q,r,s,t,u)<br /><br />> _1_2_dimethylbenzene = simp $ do<br />> p <- ch3_<br />> q <- ch3_<br />> r <- h_<br />> s <- h_<br />> t <- h_<br />> u <- h_<br />> ring ! (p,q,r,s,t,u)<br /><br />> _1_3_dimethylbenzene = simp $ do<br />> p <- ch3_<br />> q <- h_<br />> r <- ch3_<br />> s <- h_<br />> t <- h_<br />> u <- h_<br />> ring ! (p,q,r,s,t,u)<br /><br />> _1_4_dimethylbenzene = simp $ do<br />> p <- ch3_<br />> q <- h_<br />> r <- h_<br />> s <- ch3_<br />> t <- h_<br />> u <- h_<br />> ring ! (p,q,r,s,t,u)<br /><br />> _1_5_dimethylbenzene = simp $ do<br />> p <- ch3_<br />> q <- h_<br />> r <- h_<br />> s <- h_<br />> t <- ch3_<br />> u <- h_<br />> ring ! (p,q,r,s,t,u)<br /><br /></pre><br />As we might hope, the three different ways to define toluene give the same result. We also discover that 1,3- and 1,5-dimethylbenzene are the same compound (or at least <em>probably</em> are, this isn't a <a href="http://en.wikipedia.org/wiki/Perfect_hash_function">perfect hash</a>).<br /><pre><br /><br />> main = do<br />> print $ "toluene = " ++ show toluene<br />> print $ "toluene' = " ++ show toluene'<br />> print $ "toluene'' = " ++ show toluene''<br />> print $ "_1_2_dimethylbenzene = " ++ show _1_2_dimethylbenzene<br />> print $ "_1_3_dimethylbenzene = " ++ show _1_3_dimethylbenzene<br />> print $ "_1_4_dimethylbenzene = " ++ show _1_4_dimethylbenzene<br />> print $ "_1_5_dimethylbenzene = " ++ show _1_5_dimethylbenzene<br /><br /></pre><br />Now I need to say something about performance. The above code is naive and performs many unnecessary summations. For example, hashing a long chain should only take time linear in its length. But using the above code indiscriminately could give you exponential time. A good implementation might take a divide and conquer approach: slice the molecule in half through a bunch of bonds, compute partial hashes for each half and then sew the halves together in time exponential in the number of bonds you sliced through. For the types of molecules I've seen in real pharmaceutical databases (say) this is actually pretty cheap if you're smart about the slicing. The hashes in the above code could probably be computed many thousands of times faster. As it is, you'll probably need to compile the above with optimisation.<br /><br />I'm willing to bet that with small changes, and with suitable choice of matrices over the <em>reals</em>, we can get invariants of molecules that predict physical properties. These calulations are reminiscent of algorithms for various types of counting algorithm so at the very least they probably compute things that are meaningful from a statistical mechanics perspective.<br /><br />Incidentally, this approach to stitching together atoms was inspired by an old paper by R. C. Penner on fatgraphs - nothing to do with chemistry. A few days ago he put a paper online about an application to <a href="http://arxiv.org/abs/0902.1025">modelling proteins</a>.<br /><br />Update: Since writing this I've found a name for what I'm doing. I'm converting a chemical structure diagram into a <a href="http://www.google.com/search?q=tensor+networks">tensor network</a>. There seems to be lots of literature on how to evaluate these efficiently. In effect, everything I've been doing in this blog in terms of converting diagrams to code is an example of evaluating a tensor network.<br /><br /><hr><br /><br />Now comes the memoisation class:<br /><pre><br /><br />> class Memoisable ix where<br />> memo :: (ix -> a) -> Array ix a<br /><br />> instance Memoisable I where<br />> memo f = array (A,D) [(i,f i) | i <- [A .. D]]<br /><br />> instance Memoisable (I,I) where<br />> memo f = array ((A,A),(D,D)) [(i,f i) |<br />> p <- [A .. D],<br />> q <- [A .. D],<br />> let i = (p,q) ]<br /><br />> instance Memoisable (I,I,I,I) where<br />> memo f = array ((A,A,A,A),(D,D,D,D)) [(i,f i) |<br />> p <- [A .. D],<br />> q <- [A .. D],<br />> r <- [A .. D],<br />> s <- [A .. D],<br />> let i = (p,q,r,s) ]<br /><br />> instance Memoisable (I,I,I,I,I,I) where<br />> memo f = array ((A,A,A,A,A,A),(D,D,D,D,D,D)) [(i,f i) |<br />> p <- [A .. D],<br />> q <- [A .. D],<br />> r <- [A .. D],<br />> s <- [A .. D],<br />> t <- [A .. D],<br />> u <- [A .. D],<br />> let i = (p,q,r,s,t,u) ]<br /><br /></pre><br />Missing instances from <code>Data.Array</code>:<br /><pre><br /><br />> instance (Ix a1, Ix a2, Ix a3, Ix a4, Ix a5, Ix a6) => Ix (a1,a2,a3,a4,a5,a6) where<br />> range ((l1,l2,l3,l4,l5,l6),(u1,u2,u3,u4,u5,u6)) =<br />> [(i1,i2,i3,i4,i5,i6) | i1 <- range (l1,u1),<br />> i2 <- range (l2,u2),<br />> i3 <- range (l3,u3),<br />> i4 <- range (l4,u4),<br />> i5 <- range (l5,u5),<br />> i6 <- range (l6,u6)]<br /><br />> unsafeIndex ((l1,l2,l3,l4,l5,l6),(u1,u2,u3,u4,u5,u6)) (i1,i2,i3,i4,i5,i6) =<br />> unsafeIndex (l6,u6) i6 + unsafeRangeSize (l6,u6) * (<br />> unsafeIndex (l5,u5) i5 + unsafeRangeSize (l5,u5) * (<br />> unsafeIndex (l4,u4) i4 + unsafeRangeSize (l4,u4) * (<br />> unsafeIndex (l3,u3) i3 + unsafeRangeSize (l3,u3) * (<br />> unsafeIndex (l2,u2) i2 + unsafeRangeSize (l2,u2) * (<br />> unsafeIndex (l1,u1) i1)))))<br /><br />> inRange ((l1,l2,l3,l4,l5,l6),(u1,u2,u3,u4,u5,u6)) (i1,i2,i3,i4,i5,i6) =<br />> inRange (l1,u1) i1 && inRange (l2,u2) i2 &&<br />> inRange (l3,u3) i3 && inRange (l4,u4) i4 &&<br />> inRange (l5,u5) i5 && inRange (l6,u6) i6<br /><br /></pre><br />And the same vector space monad I've used many times before. Strictly speaking, it's more like a semiring module monad as <code>Int32</code> isn't a field.<br /><pre><br /><br />> swap (x,y) = (y,x)<br /><br />> class Num k => VectorSpace k v | v -> k where<br />> zero :: v<br />> (.+) :: v -> v -> v<br />> (.*) :: k -> v -> v<br />> (.-) :: v -> v -> v<br />> v1 .- v2 = v1 .+ ((-1).*v2)<br /><br />> data V k a = V { unV :: [(k,a)] } deriving (Show)<br /><br />> reduce x = filter ((/=0) . fst) $ fmap swap $ M.toList $ M.fromListWith (+) $ fmap swap $ x<br />> simp (V x) = V (reduce x)<br /><br />> instance (Ord a,Num k) => Eq (V k a) where<br />> V x==V y = reduce x==reduce y<br /><br />> instance (Ord a,Num k,Ord k) => Ord (V k a) where<br />> compare (V x) (V y) = compare (reduce x) (reduce y)<br /><br />> instance Num k => Functor (V k) where<br />> fmap f (V as) = V $ map (\(k,a) -> (k,f a)) as<br /><br />> instance Num k => Monad (V k) where<br />> return a = V [(1,a)]<br />> x >>= f = join (fmap f x)<br />> where join x = V $ concat $ fmap (uncurry scale) $ unV $ fmap unV x<br />> scale k1 as = map (\(k2,a) -> (k1*k2,a)) as<br /><br />> instance Num r => MonadPlus (V r) where<br />> mzero = V []<br />> mplus (V x) (V y) = V (x++y)<br /><br />> instance (Num k,Ord a) => VectorSpace k (V k a) where<br />> zero = V []<br />> V x .+ V y = V (x ++ y)<br />> (.*) k = (>>= (\a -> V [(k,a)]))<br /><br />> e = return :: Num k => a -> V k a<br />> coefficient b (V bs) = maybe 0 id (L.lookup b (map swap (reduce bs)))<br /><br /></pre><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-2073443236440619074?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com3tag:blogger.com,1999:blog-11295132.post-72345714254864123352009-05-16T13:34:00.000-07:002009-05-23T07:24:31.882-07:00Trace Diagrams with Monads<a href="http://blog.sigfpe.com/2008/08/untangling-with-continued-fractions.html">Knot diagrams</a> aren't the only kind of diagram that can be translated nicely into Haskell monad notation. Other types of diagram include Penrose <a href="http://en.wikipedia.org/wiki/Spin_network">Spin Networks</a>, many kinds of <a href="http://en.wikipedia.org/wiki/Feynman_diagram">Feynman Diagram</a>, Penrose <a href="http://en.wikipedia.org/wiki/Penrose%27s_graphical_notation">Tensor Notation</a>, <a href="http://www.nbi.dk/GroupTheory/">birdtracks</a> and a type of closely related diagram I hadn't met before: <a href="http://en.wikipedia.org/wiki/Trace_diagram">Trace Diagrams</a>.<br /><br />I encourage readers to check out the Wikipedia page and associated papers on trace diagrams as they give a better tutorial than I could write. My aim here is to show how those diagrams can be translated directly into working code just like with knots.<br /><br />As usual, this is literate Haskell so I need these lines:<br /><br /><pre><br />> {-# LANGUAGE MultiParamTypeClasses,FunctionalDependencies,FlexibleInstances #-}<br /><br />> module Main where<br /><br />> import qualified Data.Map as M<br />> import Control.Monad<br />> infixl 5 .+<br />> infixl 6 .*<br /></pre><br /><br />I'll reuse my vector space monad code from before and work in a 3D space with the axes labelled X, Y and Z.<br /><pre><br />> data Space = X | Y | Z deriving (Eq,Show,Ord)<br /></pre><br /><br />We draw vectors as little boxes with connections emerging from them:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sg8oG2BtxCI/AAAAAAAAAUU/DqvIkkZPRic/s1600-h/pica.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 45px; height: 61px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sg8oG2BtxCI/AAAAAAAAAUU/DqvIkkZPRic/s400/pica.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336528181504361506" /></a><br /><br />Now recall from my knot posts that we represent a diagram with m legs at the top and n legs at the bottom as an expression that takes an m-tuple as input and returns an n-tuple "in the monad" as output.<br /><br />Vectors can be represented as elements of <code>V Float Space</code>, for example:<br /><pre><br />> u,v,w :: V Float Space<br />> u = return X .- return Y<br />> v = return X .+ 2.* return Y<br />> w = return Y .- return Z<br /></pre><br />I could have emphasised that there are zero inputs at the top by using type signature <code>() -> V Float Space</code> instead.<br /><br />Given two vectors we can form their dot product. The dot product itself is represented by a little u-shaped curve:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/Sg8pi3wQ6cI/AAAAAAAAAUc/SzAWswQZc5g/s1600-h/picb.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 104px; height: 70px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/Sg8pi3wQ6cI/AAAAAAAAAUc/SzAWswQZc5g/s400/picb.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336529762516003266" /></a><br />So the dot product of v and w is drawn as:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/ShBQR7-7p1I/AAAAAAAAAWU/Q5P9YW0HQC4/s1600-h/picc.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 150px; height: 105px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/ShBQR7-7p1I/AAAAAAAAAWU/Q5P9YW0HQC4/s400/picc.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336853827523684178" /></a><br />(The i and j are just so you can see what corresponds to what in the code below. They're not really part of the diagram.)<br /><br />We can implement the dot product as<br /><pre><br />> cup :: (Space,Space) -> V Float ()<br />> cup (i,j) = case (i,j) of<br />> (X,X) -> return ()<br />> (Y,Y) -> return ()<br />> (Z,Z) -> return ()<br />> otherwise -> 0 .* return ()<br /></pre><br />and compute an example using<br /><pre><br />> vdotw = do<br />> i <- v<br />> j <- w<br />> cup (i,j)<br /></pre><br />We hook up legs of the diagram using corresponding inputs and outputs in the code.<br /><br />Now consider this diagram:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sg8rokViFLI/AAAAAAAAAUs/s5KiRxQ4N08/s1600-h/picd.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 125px; height: 104px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sg8rokViFLI/AAAAAAAAAUs/s5KiRxQ4N08/s400/picd.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336532059406079154" /></a><br />If we attach another vector to the free leg then we get the dot product. So this object is a thing that maps vectors to scalars. Ie. it's a dual vector. So dual vectors are represented by diagrams with a free leg at the top. We can redraw this diagram:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/ShAgiacskiI/AAAAAAAAAVE/kOAtUHdCS10/s1600-h/pice.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 45px; height: 61px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/ShAgiacskiI/AAAAAAAAAVE/kOAtUHdCS10/s400/pice.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336801334021362210" /></a>In other words, turning a vector v upside down turns it into a dual vector that takes w to the dot product of v and w. Here's some code for making the dual of v.<br /><pre><br />> dual :: V Float Space -> Space -> V Float ()<br />> dual v i = do<br />> j <- v<br />> cup (i,j)<br /></pre><br />We can also consider cross products. These take two vectors as input and output one. So we're looking at a diagram with two legs at the top and one at the bottom. We'll use a bold dot to represent one of these:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/ShBGGT5huYI/AAAAAAAAAWM/qwKyiwHotLk/s1600-h/picf.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 206px; height: 181px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/ShBGGT5huYI/AAAAAAAAAWM/qwKyiwHotLk/s400/picf.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336842632668756354" /></a><br />Here's the implementation:<br /><pre><br />> cross :: (Space,Space) -> V Float Space<br />> cross (X,Y) = return Z<br />> cross (Y,Z) = return X<br />> cross (Z,X) = return Y<br /><br />> cross (Y,X) = (-1) .* return Z<br />> cross (Z,Y) = (-1) .* return X<br />> cross (X,Z) = (-1) .* return Y<br /><br />> cross _ = mzero<br /></pre><br />We can form a triple product u.(v×w) like this:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/ShAjCOPX65I/AAAAAAAAAVM/2r7iKyOIUaY/s1600-h/picg.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 253px; height: 196px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/ShAjCOPX65I/AAAAAAAAAVM/2r7iKyOIUaY/s400/picg.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336804079523326866" /></a><br />We can then abstract out the triple product bit that looks like this:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/ShAkWQCQpmI/AAAAAAAAAVU/477hs9QtpXA/s1600-h/pich.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 206px; height: 102px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/ShAkWQCQpmI/AAAAAAAAAVU/477hs9QtpXA/s400/pich.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336805523114206818" /></a><br />Implementing it as:<br /><pre><br />> trident :: (Space,Space,Space) -> V Float ()<br />> trident (i,j,k) = do<br />> l <- cross (i,j)<br />> cup (l,k)<br /></pre><br />Remember that if u, v and w give the rows of a 3x3 matrix, then u.(v×w) is the determinant of that matrix.<br /><br />We can also define a dot product for dual vectors that we can draw like this:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/ShAlddOjZCI/AAAAAAAAAVc/52iZrXTcud4/s1600-h/pici.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 104px; height: 70px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/ShAlddOjZCI/AAAAAAAAAVc/52iZrXTcud4/s400/pici.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336806746426139682" /></a><br />The code looks like this:<br /><pre><br />> cap :: () -> V Float (Space,Space)<br />> cap () = return (X,X) .+ return (Y,Y) .+ return (Z,Z)<br /></pre><br />We can now combine the two dot products in a diagram like this:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/ShBWthiml-I/AAAAAAAAAWk/asdvRbzwshk/s1600-h/picj.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 246px; height: 157px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/ShBWthiml-I/AAAAAAAAAWk/asdvRbzwshk/s400/picj.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336860898531645410" /></a><br />We can write that as:<br /><pre><br />> cupcap i = do<br />> (j,k) <- cap ()<br />> cup (i,j)<br />> return k<br /></pre><br />We'd hope that we could pull this diagram taut and get the identity linear map. If you try applying <code>cupcap</code> to X, Y and Z you'll see it has exactly the same effect as <code>return</code>, which does indeed represent the identity.<br /><br />(If you allow me to digress, I'll point out that there's something really deep going on with this almost trivial looking identity. It represents the identity map in the sense that it copies the input i to the output k. Imagine we were dealing with the <a href="http://blog.sigfpe.com/2007/04/trivial-monad.html">trivial monad</a>, ie. the one that just wraps values. Then no matter how <code>cup</code> and <code>cap</code> were implemented it would be impossible for k to be a copy of i. If you follow the flow of information through that code then i disappears into <code>cup</code> and k is read from <code>cap</code> without it seeing i. If we read from top to bottom we can think of cap as emitting a pair of objects and of cup as absorbing two. There is no way that any information about i can be communicated to k. But in the vector space monad, k <em>can</em> depend on i. As I've mentioned a few times over the years, the universe is described by quantum mechanics which can be described using the <a href="http://blog.sigfpe.com/2007/02/monads-for-vector-spaces-probability.html">vector space monad</a>. Amazingly the above piece of code, or at least something like it, can be physically realised in terms of particles. It describes a process that is fundamentally quantum, and not classical. In fact, Coecke shows that this is a precursor to quantum teleportation in section 3c of <a href="http://arxiv.org/pdf/quant-ph/0510032v1">this paper</a>. You could also think in terms of information about i being sent back in time through the cap. That's the idea behind this paper on <a href="http://arxiv.org/pdf/0902.4898">Effective Quantum Time Travel</a>.)<br /><br />Now we can make a fork by bending down the tines of the cross product:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/ShAqHfpD0dI/AAAAAAAAAVs/Ggt5XyhcOz8/s1600-h/pickk.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 206px; height: 102px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/ShAqHfpD0dI/AAAAAAAAAVs/Ggt5XyhcOz8/s400/pickk.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336811866675204562" /></a><br /><pre><br />> fork () = do<br />> (i,j) <- cap ()<br />> (k,l) <- cap ()<br />> m <- cross (j,k)<br />> return (i,l,m)<br /></pre><br /><br />We can write matrices as boxes with a leg for input and a leg for output. Here's an example matrix:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/ShArlywlSMI/AAAAAAAAAV0/IsvSKCKGw9M/s1600-h/picl.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 53px; height: 79px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/ShArlywlSMI/AAAAAAAAAV0/IsvSKCKGw9M/s400/picl.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336813486714734786" /></a><br /><pre><br />> a :: Space -> V Float Space<br />> a X = 2 .* return X<br />> a Y = return Z<br />> a Z = (-1) .* return Y<br /></pre><br />It rotates by 90 degrees around the X axis and scales the X axis by a factor of two.<br /><br />With the help of our two dot products we can turn a matrix upside-down:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/ShAtUAI2MaI/AAAAAAAAAV8/JNRV8VMC7zQ/s1600-h/picm.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 154px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/ShAtUAI2MaI/AAAAAAAAAV8/JNRV8VMC7zQ/s400/picm.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336815380091777442" /></a><br />The corresponding code is:<br /><pre><br />> b :: Space -> V Float Space<br />> b l = do<br />> (i,j) <- cap ()<br />> k <- a j<br />> cup (k,l)<br />> return i<br /></pre><br />Turning a matrix upside down gives its transpose. You'll find that matrix B rotates in the opposite direction to A but still scales by a factor of two.<br /><br />Surprisingly, 3! times the determinant of a 3x3 matrix A can be represented by this diagram:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/ShBRffcw89I/AAAAAAAAAWc/reNZfgEVFx4/s1600-h/picn.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 257px; height: 227px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/ShBRffcw89I/AAAAAAAAAWc/reNZfgEVFx4/s400/picn.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5336855159893980114" /></a><br />So we can write a determinant routine as follows:<br /><pre><br />> det a = do<br />> (i,j,k) <- fork ()<br />> i' <- a i<br />> j' <- a j<br />> k' <- a k<br />> (1/6.0) .* trident (i',j',k')<br /></pre><br />(Again I've labelled the diagram so you can easily see what corresponds where in the code.)<br /><br />I could keep going, but at this point I'll just defer to Elisha Peterson's <a href="http://arxiv.org/pdf/0712.2058">paper</a>. I hope that I've given you enough clues to be able to translate his diagrams into Haskell code, in effect giving semantics for his syntax. As an exercise, try writing code to compute the adjugate of a 3x3 matrix.<br /><br />And a reminder: none of the above is intended as production-worthy code for working with 3-vectors. It is intended purely as a way to give a practical realisation of trace diagrams allow people to experimentally investigate their properties and make testable conjectures.<br /><br />And now comes the library code needed to make the above code work:<br /><br /><pre><br />> swap (x,y) = (y,x)<br /><br />> class Num k => VectorSpace k v | v -> k where<br />> zero :: v<br />> (.+) :: v -> v -> v<br />> (.*) :: k -> v -> v<br />> (.-) :: v -> v -> v<br />> v1 .- v2 = v1 .+ ((-1).*v2)<br /><br />> data V k a = V { unV :: [(k,a)] }<br />> instance (Num k,Ord a,Show a) => Show (V k a) where<br />> show (V x) = show (reduce x)<br /><br />> reduce x = filter ((/=0) . fst) $ fmap swap $ M.toList $ M.fromListWith (+) $ fmap swap $ x<br /><br />> instance (Ord a,Num k) => Eq (V k a) where<br />> V x==V y = reduce x==reduce y<br /><br />> instance (Ord a,Num k,Ord k) => Ord (V k a) where<br />> compare (V x) (V y) = compare (reduce x) (reduce y)<br /><br />> instance Num k => Functor (V k) where<br />> fmap f (V as) = V $ map (\(k,a) -> (k,f a)) as<br /><br />> instance Num k => Monad (V k) where<br />> return a = V [(1,a)]<br />> x >>= f = join (fmap f x)<br />> where join x = V $ concat $ fmap (uncurry scale) $ unV $ fmap unV x<br />> scale k1 as = map (\(k2,a) -> (k1*k2,a)) as<br /><br />> instance Num r => MonadPlus (V r) where<br />> mzero = V []<br />> mplus (V x) (V y) = V (x++y)<br /><br />> instance (Num k,Ord a) => VectorSpace k (V k a) where<br />> zero = V []<br />> V x .+ V y = V (x ++ y)<br />> (.*) k = (>>= (\a -> V [(k,a)]))<br /><br />> e = return :: Num k => a -> V k a<br />> coefficient b (V bs) = maybe 0 id (lookup b (map swap (reduce bs)))<br /></pre><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-7234571425486412335?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com4tag:blogger.com,1999:blog-11295132.post-39300700472006726362009-05-03T08:30:00.000-07:002009-05-03T16:05:47.089-07:00The Three Projections of Doctor Futamura<H3>Introduction</H3><br />The <a href="http://en.wikipedia.org/wiki/Partial_evaluation#Futamura_projections">Three Projections of Futamura</a> are a sequence of applications of a programming technique called 'partial evaluation' or 'specialisation', each one more mind-bending than the previous one. But it shouldn't be programmers who have all the fun. So I'm going to try to explain the three projections in a way that non-programmers can maybe understand too. But whether you're a programmer or not, this kind of self-referential reasoning can hurt your brain. At least it hurts mine. But it's a good pain, right?<br /><br />So rather than talk about computer programs, I'll talk about machines of the mechanical variety. A bit like computer programs, these machines will have some kind of slot for inputting stuff, and some kind of slot where output will come out. But unlike computer programs, I'll be able to draw pictures of them to show what I'm talking about. I'll also assume these machines have access to an infinite supply of raw materials for manufacturing purposes and I'll also assume that these machines can replicate stuff - because in a computer we can freely make copies of data, until we run out of memory at least.<br /><br /><H3>Minting coins</H3><br />A really simple example of a machine is one that has a slot for inputting blanks, and outputs newly minted coins:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/Sf3BzV77HfI/AAAAAAAAATc/flT5FsYW8_M/s1600-h/dollar_minting.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 340px; height: 177px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/Sf3BzV77HfI/AAAAAAAAATc/flT5FsYW8_M/s400/dollar_minting.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331630621682507250" /></a><br />That's a dedicated $1 manufacturing machine. We could imagine that internally it stamps the appropriate design onto the blank and spits out the result.<br /><br />It'd be more interesting if we could make a machine with another input slot that allowed us to input the description of the coin. By providing different inputs we could mint a variety of different coins with one machine. I'm going to adopt the convention that when we want to input a description we input a picture of the result we want. I'll draw pictures as rectangles with the subject inside them. Here's a general purpose machine manufacturing pound coins:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/Sf3EiSy7iNI/AAAAAAAAATk/GX6t98LFgPY/s1600-h/general_minting.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 364px; height: 261px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/Sf3EiSy7iNI/AAAAAAAAATk/GX6t98LFgPY/s400/general_minting.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331633627316586706" /></a><br />The same machine could make dollars, zlotys or yen. You could imagine this machine works by taking the description and then milling the coin <a href="http://en.wikipedia.org/wiki/CNC">CNC style</a>. We call such a machine an <a href="http://en.wikipedia.org/wiki/Interpreter_(computing)">interpreter</a>. It interprets the instructions and produces its result.<br /><br />The interpreter has a great advantage over the dedicated dollar mint. You make make any kind of coin. But it's going to run a lot slower. The dedicated minter can just stamp a coin in one go. The interpreter can't do this because every input might be different. It has to custom mill each coin individually. Is there a way to get the benefits of both types of machine? We could do this: take the coin description and instead of milling the coin directly we mill negative reliefs for both sides of the coin. We then build a new dedicated minting machine that uses these negatives to stamp out the coin. In other words we could make a machine that takes as input a coin description and outputs a dedicated machine to make that type of coin. This kind of machine making machine is called a <a href="http://en.wikipedia.org/wiki/Compiler">compiler</a>. It takes a set of instructions, but instead of executing them one by one, it makes a dedicated machine to perform them. Here's one in action:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3IsNROlyI/AAAAAAAAATs/Jn1cn9LrF1w/s1600-h/compiling.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 327px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3IsNROlyI/AAAAAAAAATs/Jn1cn9LrF1w/s400/compiling.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331638195678254882" /></a><br /><br />So here are the two important concepts so far:<br /><br /><em>Interpreters</em>: these take descriptions or instructions and use them to make the thing described.<br /><em>Compilers</em>: these take descriptions or instructions and use them to make a machine dedicated to making the thing described. The process of making such a machine from a set of instructions is known as compiling.<br /><br />The Projections of Doctor Futamura help make clear the relationship between these kinds of things.<br /><br /><H3>Specialisation</H3><br />We need one more important concept: the specialiser. Suppose we have a machine that has two inputs slots, A and B. But now suppose that when we use the machine we find that we vary the kind of thing we put into slot B, but always end up putting the same thing into slot A. If we know that slot A will always get the same input then we could streamline the machine using our knowledge of the properties of A. This is similar to the minting situation - if we know we're always going to make $1 coins then we can dedicate our machine to that purpose. In fact, if we know that we're always going to input the same thing into slot A we don't even need slot A any more. We could just stick an A inside the machine and whenever the user inputs something to slot B, the machine would then replicate the A and then use it just as if it had been input.<br /><br />In summary, given a machine with two slots A and B, and given some input suitable for slot A, we could redesign it as a machine with just a B slot that automatically, internally self-feeds the chosen item to A. But we can often do better than this. We don't need to self-feed stuff to slot A. We might be able to redesign the way the machine works based on the assumption that we always get the same stuff going into slot A. For example, in the minting example a dedicate $1 minter was more specialised than just a general purpose minter that interpreted the instructions for making a $1 coin. This process of customising a machine for a particular input to slot A is called specialisation or <a href="http://en.wikipedia.org/wiki/Partial_evaluation">partial evaluation</a>.<br /><br />Now imagine we have a machine for automatically specialising designs for machines. It might have two slots: one for inputting a description for a two slot machine with slots A and B, and one for inputting stuff suitable for slot A. It would then print out a description for a customised machine with just a slot B. We could call it a specialisation machine. Here is one at work:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3QNaRFfOI/AAAAAAAAAT0/ghjfjtM42q8/s1600-h/specialisation.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 332px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3QNaRFfOI/AAAAAAAAAT0/ghjfjtM42q8/s400/specialisation.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331646462684396770" /></a><br />It's converting a description of a two input machine into a description of a one input machine.<br /><br /><H3>The First Projection</H3><br />The process of specialisation is similar to what I was talking about with dedicated minting machines. Rather than just have a similarity we can completely formalise this. Note that the interpreter above takes two inputs. So the design for an interpreter could be fed into the first input of a specialiser. Now we feed a description the coin we want into slot B. The specialiser whirrs away and eventually outputs a description of a machine that is an interpreter that is dedicated to making that one particular coin. The result will describe a machine with only one input suitable for blanks. In other words, we can use a specialiser as a compiler. This is the first of Doctor Futamura's Projections. Here's a picture of the process at work:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/Sf3UEyLB5zI/AAAAAAAAAT8/rEQ6HMe0xDo/s1600-h/projection1.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 332px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/Sf3UEyLB5zI/AAAAAAAAAT8/rEQ6HMe0xDo/s400/projection1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331650712529135410" /></a><br />What this shows is that you don't need to make compilers. You can make specialisers instead. This is actually a very practical thing to do in the computing world. For example there are <a href="http://www.rapidmind.net/">commercial products</a> (I'm not connected with that product in any way) that can specialise code to run on a specific architecture like <a href="http://en.wikipedia.org/wiki/CUDA">CUDA</a>. It's entirely practical to convert an interpreter to a compiler with such a tool. By writing a specialiser, the purveyors of such tools allow third parties to develop their own compilers and so this is more useful than just writing a dedicated compiler.<br /><br /><H3>The Second Projection</h3><br />Time to kick it up a notch. The first input to the specialiser is a description of a two input machine. But the specialiser is itself a two input machine. Are you thinking what I'm thinking yet? We could stuff a description of a specialiser into the specialiser's own first input! In the first projection we provided an interpreter as input to the specialiser. If we know we're always going to want to use the same interpreter then we could streamline the specialiser to work specifically with this input. So we can specialise the specialiser to work with our interpreter like this:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3e7h4AtjI/AAAAAAAAAUE/inyy38iprQs/s1600-h/projection2a.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 352px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3e7h4AtjI/AAAAAAAAAUE/inyy38iprQs/s400/projection2a.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331662648163481138" /></a><br />But what is that machine whose description it has output? An interpreter takes as input a description of how to operate on some stuff, like turning blanks into coins. In effect, the output machine has the interpreter built into it. So it takes descriptions and outputs a machine for performing those instructions. In other words it's a compiler. If the specialiser is any good then the compiler will be good too. It won't just hide an interpreter in a box and feed it your description, it will make dedicated parts to ensure your compiler produces a fast dedicated machine. And that is Doctor Futamura's Second Projection.<br /><br /><H3>The Third Projection</H3><br />But we can go further. The specialiser can accept a description of a specialiser as its first input. That means we can specialise it specifically for this input. And to do that, we use a specialiser. In other words we can feed a descrption of a specialiser into <em>both</em> inputs of the specialiser! Here we go:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3iAXhRJxI/AAAAAAAAAUM/uv4uRQXeWAk/s1600-h/projection3.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 358px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/Sf3iAXhRJxI/AAAAAAAAAUM/uv4uRQXeWAk/s400/projection3.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5331666029817964306" /></a><br />But what is the X machine that it outputs? In the second projection we pass in an interpreter as the second argument and get back a compiler. So the third projection gives us a dedicated machine for this task. The X machine accepts the description of an interpreter as input and outputs the description of a compiler. So the X machine is a dedicated interpreter-to-compiler converter. And that is the Third Projection of Doctor Futamura.<br /><br />If we have a specialiser we never need to make a compiler again. We need only design interpreters that we can automatically convert to compilers. In general it's easier to write interpreters than compilers and so in principle this makes life easier for programmers. It also allows us to compartmentalise the building of compilers. We can separate the interpreter bit from the bit that fashions specific parts for a task. The specialiser does the latter so our would-be compiler writer can concentrate on the former. But who would have guessed that passing a specialiser to itself <em>twice</em> would give us something so useful?<br /><H3>Summary</H3><br />So here are the projections:<br /><OL><br /><LI>Compiling specific programs to dedicated machines.<br /><LI>Making a dedicated machine for compilation from an interpreter.<br /><LI>Making a machine dedicated to the task of converting interpreters into compilers.<br /></OL><br />There are lots of variations we can play with. I've just talked about descriptions of things without saying much about what those descriptions look like. In practice there are lots of different 'languages' we can use to express our descriptions. So variations on these projections can generate descriptions in different languages, possibly converting between them. We might also have lots of different specialisers that are themselves optimised for specific types of specialisation. The Futamura projections give interesting ways to combine these. And there are also variations for generating dedicating machines for other tasks related to compiling - like parsing the descriptions we might feed in as input.<br /><br />If you want to read more on this subject there's a <a href="http://www.itu.dk/people/sestoft/pebook/">whole book</a> online with example code. They're not easy things to design.<br /><br />I think that specialisation is a killer feature that I'd like to see more of. Present day compilers (and here I'm talking about computers, not machines in general) are hard-coded black boxes for the task of compilation. They're not very good at allowing you to get in there and tweak the way compilation occurs - for example if you want to generate code according to a strategy you know. Specialisation is a nice alternative to simply bolting an API onto a compiler. It would make it easy for anyone to write optimising and optimised compilers for their own languages and combine such compilers with interpreters for interactive instead of offline compilation.<br /><br />I learnt about this stuff, as well as lots of other stuff in my blog, from the excellent <a href="http://books.google.com/books?id=Xut5JAAACAAJ">Vicious Circles</a>. The theory is closely related to the theory of writing quines that I used for my <a href="http://blog.sigfpe.com/2008/02/third-order-quine-in-three-languages.html">three language quine</a>.<br /><br />And if you keep your ears to the ground you can hear rumours of a fabled <a href="http://portal.acm.org/citation.cfm?id=1480954">fourth projection</a>...<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-3930070047200672636?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com21tag:blogger.com,1999:blog-11295132.post-53163879782749084802009-04-04T09:38:00.000-07:002009-04-04T16:06:08.789-07:00Faster than a speeding photon<h3>How to outrun a photon</H3><br />I thought it would be fun to try to give a readable account of Unruh effect. It's a surprising phenomenon, and there isn't universal agreement over what exactly the theory predicts, let alone whether the effect has ever been observed. It has important implications for physics and philosophy and may even give a way to test some aspects of quantum gravity in the lab.<br /><br />One way to start the story is consideration of this problem: if a photon is speeding towards you, can you outrun it? Let's simplify things a bit so that we're considering motion in one dimension.<br /><br />If we're confined to one dimension, we can't dodge the photon, we can only hope to remain ahead of it. As the only things that can travel at the speed of light, c, are massless things like photons, it seems that there is no hope for a massive thing like a person in a spaceship to avoid it. The photon will always be faster than you, and so it'll catch you.<br /><br />But in theory you can outrun a photon! Do you see the flaw in the above reasoning that made it seem impossible?<br /><br />The best way to make things clear is to draw a diagram. We'll plot some graphs of position vs. time for some photons and spaceships. We'll have time going up the vertical axis and position along the horizontal axis. Here's an example:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/SdeNS5RaPYI/AAAAAAAAASs/3XsgY3097Zk/s1600-h/constant_velocity.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/SdeNS5RaPYI/AAAAAAAAASs/3XsgY3097Zk/s400/constant_velocity.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5320876840512535938" /></a><br />I've chosen units so that one second on the vertical axis is drawn the same size as one light-second on the horizontal axis. The net result is that photons always travel at 45 degree angles to the axes. Massive objects, that travel slower than light, are confined to travel on courses that have angles of smaller than 45 degrees with respect to the vertical axis. The path of the photon is the diagonal black line and the path of a spaceship is in red. It starts to the right of the photon but as we move up the time axis the photon eventually catches up with it.<br /><br />If the spaceship travels faster then it will follow an angle closer to 45 degrees. Here are a pair of paths corresponding to faster spaceships:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/SdeOCbjzPXI/AAAAAAAAAS0/vAv2BpV1ZU4/s1600-h/constant_velocities.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/SdeOCbjzPXI/AAAAAAAAAS0/vAv2BpV1ZU4/s400/constant_velocities.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5320877657170328946" /></a><br />The faster the ship is, the further it gets before the photon catches up. But we're just putting off the inevitable. It seems that whatever we do, the photon will always catch up.<br /><br />But there's a hidden assumption in the above. By drawing straight lines for the spaceship I was assuming it was travelling at a constant velocity. But there's no reason for that to be true. Here's a different path the spaceship could follow:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/SdeOrfKoy1I/AAAAAAAAAS8/9HrWx5ddQzM/s1600-h/accelerating.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/SdeOrfKoy1I/AAAAAAAAAS8/9HrWx5ddQzM/s400/accelerating.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5320878362513165138" /></a><br /><br />At no point does the red path of the spaceship meet the black path of the photon. And yet at no point does the red path reach 45 degrees to the vertical axis. In other words, the spaceship never travels at the speed of light, and yet the photon never catches up with it. Spaceships can outrun photons!<br /><br />So what kind of path is that? It's actually a hyperbola and it corresponds to a spaceship accelerating at a constant rate. You might wonder how it can be constant acceleration when the speed of the spaceship never exceeds that of light. From an external observer's point of view, after a while it does look like the ship is travelling at a more or less constant velocity close to the speed of light. But from the point of view of someone on the spaceship it feels exactly like constant acceleration. So that is the path that would be taken by a spaceship with its thrusters firing at a constant rate.<br /><br /><h3>An event horizon!</h3><br />I chose that path so that the spaceship stays just in front of the photon. A photon that starts slightly to the right will eventually catch up with the ship. But photons starting further to the left of the ship will never reach it. This means that absolutely nothing starting to the left of the black photon path can ever catch the ship. That should sound familiar. It's exactly like a black hole. From the point of view of someone on the ship, the diagonal black line is exactly like the <a href="http://en.wikipedia.org/wiki/Event_horizon">event horizon</a> of a black hole. Nothing to the left of it can ever be seen by observers on the ship.<br /><br />What does it look like if an observer in the ship watches something that crosses the event horizon? Here's another diagram:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SdeQwA36KZI/AAAAAAAAATE/ndoXPnKNDEE/s1600-h/falling_over_horizon.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SdeQwA36KZI/AAAAAAAAATE/ndoXPnKNDEE/s400/falling_over_horizon.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5320880639304149394" /></a><br /><br />Again, the black diagonal line is the path of a photon, which we know is a bit like an event horizon. The blue line is the path of an object at rest. In effect, it's falling over our <a href="http://en.wikipedia.org/wiki/Event_horizon#Event_horizon_of_an_accelerated_particle">apparent event horizon</a>. Of course the blue object doesn't see any unusual phenomenon on approaching the event horizon because there's nothing really there - it's only something seen by observers in the ship. The blue object emits a series of photons (shown in green) at equal intervals. As long as these photons are emitted before the event horizon they eventually catch up with the red spaceship. But notice how they arrive at more and more widely spaced intervals. A photon released exactly at the event horizon never reaches the ship. So the viewers on the ship see the spacing between the photons get longer and longer. They'll never see the blue object cross the event horizon they'll just see it getting closer and closer until eventually it appears to freeze. Again, this is just like a black hole event horizon.<br /><br />The universe looks pretty weird from the <a href="http://en.wikipedia.org/wiki/Rindler_coordinates">point of view</a> of a constantly accelerating observer. Half of it is simply missing behind an event horizon. But that's just the start of the weirdness. When we throw Quantum Mechanics into the mix something much weirder happens.<br /><br /><h3>Matter from a vacuum</h3><br />It's well known that physicists expect black holes to emit particles as <a href="http://en.wikipedia.org/wiki/Hawking_radiation">Hawking radiation</a>. But our accelerating observer sees something like a black hole, so we might expect them to see something like Hawking radiation. We also know that an observer at rest sees no event horizon. Which means that we might predict that accelerating observers see particles that observers at rest don't. Can we take such a prediction seriously?<br /><br />Let's look a bit more closely at this. According to a popular view of quantum mechanics, the vacuum is teeming with vacuum fluctuations - ephemeral particle-antiparticle pairs that briefly come into existence and then annihilate each other. In the diagram below I've drawn some of these events:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SdelVVHlgZI/AAAAAAAAATM/SGzr_D7441w/s1600-h/pair_production.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SdelVVHlgZI/AAAAAAAAATM/SGzr_D7441w/s400/pair_production.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5320903270626328978" /></a><br /><br />As we follow up the time axis, pairs of (complementary coloured) particles come into existence and then annihilate each other. These events are so fleeting that they have no effect on our particle detectors and we see a vacuum. But note that I've drawn one of these events straddling our apparent event horizon. From the point of view of an accelerating observer this looks like a pair of particles coming into existence, but because of the argument I sketched above, they seem to freeze near the event horizon. In other words, to an accelerating observer these fleeting events are no longer fleeting, they look like real particles coming into existence and sticking around forever. Accelerating observers appear to see particles in a vacuum!<br /><br />What I've described above is absolutely <em>not</em> a rigourous argument. But amazingly, when you use the machinery of quantum field theory you end up making exactly the same prediction: accelerating observers see particles. This is known as the <a href="http://en.wikipedia.org/wiki/Unruh_effect">Unruh effect</a>. When you do this properly you can compute a bit more detail. It turns out that the energies of the particles are random with exactly the same distribution as <a href="http://en.wikipedia.org/wiki/Black_body">black body radiation</a>. In other words, the vacuum looks like it has a glow corresponding to a particular temperature that is proportional to the acceleration. But it's not a bright glow. You need to accelerate at about 10<sup>20</sup> m/s<sup>2</sup> before the temperature appears to be 1K. Building a thermometer that can survive such accelerations is no mean feat. So it looks like the Unruh effect is a curiosity that might never be observed in the lab.<br /><br />But it has been suggested that the Unruh effect has already been observed. There aren't many things that can survive that kind of acceleration, but an electron can, and an electron can behave like a thermometer. Electrons in circular particle accelerators routinely undergo the kinds of accelerations we're talking about. They do so because they are driven by a magnetic field. Now an electron has spin, so you can think of it as a bit like a little electric current running round in a loop. That means an electron is like a little dipole electromagnet. Magnets in magnetic fields tend to want to line up along the field - that's how a compass works. So electrons that spend long enough in a particle accelerator, eg. those in a storage ring, should eventually line up with the field. Lining up like this is known as polarisation, and in this particular case it's known as he <a href="http://en.wikipedia.org/wiki/Sokolov-Ternov_effect">Sokolov Ternov</a> effect. But when we look at electrons in a storage ring it turns out they're not quite completely lined up, they're slightly depolarised. This is easily explained by Unruh radiation - they're constantly accelerating and so they feel themselves to be in a hot environment. The continual interaction with this hot environment causes the electron spins to be a bit randomised, so they don't all line up nicely.<br /><br />Unfortunately this isn't definitive evidence for Unruh radiation because when we carry out the full calculation of the Sokolov-Ternov effect it turns out that it predicts partial depolarisation anyway. Now it looks like we don't have evidence for the Unruh effect. But it's not that simple. The Unruh effect isn't a new effect made up by a physicist. It's a prediction based on a new way of looking at fairly conventional physics. The Sokolov-Ternov effect is also predicted from standard physics, just in a different frame of reference. So maybe the partial depolarisation predicted by this effect is in fact the very same thing as the Unruh effect, just looked at from a different point of view.<br /><br />What does this mean philosophically? We're used to the idea that looking at things from different angles changes how they look. Einstein extended this notion to spacetime so space and time seem different to moving observers. The Unruh effect goes one step further. Whether or not an individual particle exists depends on your point of view. Do particles not have any kind of existence independently of how we look at them? And how does it look to an observer at rest watching an accelerating observer fly by with a thermometer. Do they see a thermometer apparently responding to nothing? Or do they also see the particles once they've interacted with a thermometer? It's all so weird that some physicists take the view that the notion of the particle is outdated and we should only be talking about quantum mechanical wavefunctions.<br /><br />There's another reason why the Unruh effect is important. The Unruh effect doesn't involve General Relativity, which is all about curved spacetime. But it does use the same mathematical machinery so it gives a way to test out that mathematics. So even if we don't have laboratory black holes to play with, we may still be able to investigate the mathematical framework that predicts phenomena like the Hawking effect. But also note that at least one paper claims it's all a mathematical error and there is no Unruh effect in reality.<br /><br />In recent years there has been a dramatic increase in the number of papers on the Unruh effect. I expect this trend is going to keep going for a while<br /><br /><H3>References</H3><br /><OL><br /><LI>I learnt about the Unruh effect from Wald's book <a href="http://books.google.com/books?id=Iud7eyDxT1AC">Quantum field theory in curved spacetime and black hole thermodynamics</a>.<br /><LI>The diagram showing particle creation/annihilation events straddling the event horizon came from Susskind and Lindesay's <a href="http://books.google.com/books?id=cxJCBRUNmVYC">An Introduction to Black Holes, Information and the String Theory Revolution</a>.<br /><LI>I tried to catch up with recent developments by reading some of the paper <a href="http://arxiv.org/abs/0710.5373">The Unruh effect and its applications</a>.<br /></OL><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-5316387978274908480?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com26tag:blogger.com,1999:blog-11295132.post-69871688947409660112009-03-07T13:13:00.000-08:002009-03-07T15:41:49.182-08:00Dinatural Transformations and CoendsAbstract nonsense warning: my goal here is simply to understand what the definition of a <a href="http://en.wikipedia.org/wiki/End_%28category_theory%29">coend</a> means in the context of Haskell.<br /><br /><pre><br />> {-# LANGUAGE Rank2Types,ExistentialQuantification #-}<br /><br /></pre><br />Pick a type. Let's call it <tt>X</tt>. Now consider the type<br /><pre><br />forall a . (a -> X,a) -> X<br /></pre><br />What can we say about elements of this type? There's an obvious example of such a function, <tt>eval</tt> that applies the first element of the pair to the second.<br /><br /><pre><br />> eval (f,x) = f x<br /><br /></pre><br />It's also easy to construct others. What property do they all share? Well there's a nice way to answer this question. We can use Janis Voigtländer's <a href="http://homepages.inf.ed.ac.uk/wadler/topics/parametricity.html">Free Theorem</a> <a href="http://linux.tcs.inf.tu-dresden.de/~voigt/ft/">Generator</a>. After a bit of simplification it tells us that if <tt>f</tt> is in<br /><pre><br />forall a . (a -> X,a) -> X<br /></pre><br />then for any compatible <tt>g</tt><br /><pre><br />f (x,g y) = f (x . g,y)<br /></pre><br />We can apply any function we like to the second argument and we get exactly the same result by pre-composing with the first argument.<br /><br />This is a pretty strong property - it puts a big constraint on what <tt>f</tt> can do. It holds because any function of type <tt>forall a . (a -> X,a) -> X</tt> maps to a type <tt>X</tt> that makes no reference to the quantified <tt>a</tt>. It can't let any information about the type <tt>a</tt> escape. And this means that <tt>f</tt> has to somehow eliminate an element of <tt>a</tt> and and element of <tt>a->X</tt>. There's only one non-trivial way to do that: provide the former to the latter as a function argument. So <tt>f</tt> must factor as <tt>h . eval</tt> for some function <tt>h</tt>. The free theorem comes from the fact that <tt>eval (x,g y) = eval (x.g,y)</tt>. Or, <tt>f</tt> could be the constant function, but that still factors in the same way.<br /><br />Let's try something slightly more complex. Consider the type<br /><pre><br />forall a . (a -> X,[a]) -> X<br /></pre><br />Again we can ask the free theorem generator to give us a property. We're told that if <tt>f</tt> is of this type, then for any compatible <tt>g</tt><br /><pre><br />f(x,map g y) = f(x . g,y)<br /></pre><br />The crucial point this time is that in order to eliminate the <tt>[a]</tt> we have to use <tt>fmap</tt> to apply the function of type <tt>a -> X</tt>. So any function of this type must factor through <tt>fmap</tt>, and that's why the free theorem follows.<br /><br />Now it's time to put this in a more general framework. We can write both of the above examples as elements of type <tt>forall a . S a a -> X</tt> where <tt>S</tt> is some type constructor. In both cases we also have that <tt>S a</tt> is a functor. But it's also a cofunctor in its first argument. Intuitively this says that <tt>S a b</tt> contains or produces <tt>b</tt>'s but consumes <tt>a</tt>'s. Here's a class to express this:<br /><br /><pre><br />> class Difunctor h where<br />> lmap :: (b -> a) -> h a c -> h b c<br />> rmap :: (c -> d) -> h a c -> h a d<br /><br /></pre><br />(I made up the word Difunctor.)<br /><br />For (co)functoriality we insist on the laws<br /><pre><br />lmap (f . g) = lmap g . lmap f<br />rmap (f . g) = rmap f . rmap g<br /></pre><br /><br />We can make both of the examples above instances:<br /><br /><pre><br />> data Ex1 x a b = Ex1 (a -> x) b<br /><br />> instance Difunctor (Ex1 x) where<br />> lmap f (Ex1 g x) = Ex1 (g . f) x<br />> rmap f (Ex1 g x) = Ex1 g (f x)<br /><br />> data Ex2 x a b = Ex2 (a -> x) [b]<br /><br />> instance Difunctor (Ex2 x) where<br />> lmap f (Ex2 g x) = Ex2 (g . f) x<br />> rmap f (Ex2 g x) = Ex2 g (map f x)<br /><br /></pre><br />Our free theorems were essentially about this type:<br /><br /><pre><br />> type DiNatural s x = forall a . (s a a -> x)<br /><br /></pre><br />and in both cases the free theorem could be written as<br /><pre><br />f . lmap g == f . rmap g<br /></pre><br />This is a highly non-trivial result. Look at the preconditions for this theorem: they say something about each of the arguments to <tt>s</tt> individually. And yet we deduce that the two arguments are in fact intimately connected. We can apply <tt>g</tt> using either <tt>lmap</tt> or <tt>rmap</tt> and get the same result. This property is known as dinaturality. More precisely, if for some <tt>X</tt>, <tt>f :: s a a -> X</tt>, where <tt>s</tt> is a difunctor, then <tt>f</tt> is dinatural if for all compatible <tt>g</tt>,<br /><pre><br />f . lmap g == f . rmap g<br /></pre><br /><br />We have this theorem: if <tt>s</tt> is an instance of the <tt>Difunctor</tt> class (obeying its laws) then for any <tt>X</tt>, elements of <tt>DiNatural s X</tt> are dinatural.<br /><br />(Compare with the definition of naturality: <tt>f . fmap g = fmap g . f</tt>.)<br /><br />I'm tempted to call this "Dinaturality for Free" but there's already a <a href="ftp://ftp.disi.unige.it/pub/person/RosoliniG/papers/dinff.ps.gz">paper</a> by that name and I don't know what it's about. And note that I only claim it's a theorem. I don't actually know how to prove this uniformly for all difunctors, but I'd stake a beer on it. At least as long as we don't do any weird haskell coding (so our free theorems are always valid) and as long as we restrict ourselves to functions that always terminate.<br /><br />In the examples I gave above the reason why this holds is related to the fact that any function of the given type can be factored as something following an evaluation type function. More generally, if <tt>s</tt> is a dinatural then if there is a single <tt>Y</tt>, and function <tt>i :: s a a -> Y</tt>, such that every function <tt>f :: s a a -> X</tt>, for any <tt>X</tt>, can be factored as <tt>f = h . i</tt>, then <tt>Y</tt> is said to be the coend of <tt>s</tt>.<br /><br />There are different approaches to computing a coend. Above I've used inspection. The coend of <tt>(a -> X,a)</tt> is <tt>X</tt>. But there's also a kind of cheating approach where we can use an existential type to get the type uniformly for all dinaturals:<br /><br /><pre><br />> data Coend s = forall a . Coend (s a a)<br /><br /></pre><br />For an explanation of why that works, see Edward Kmett's post on <a href="http://comonad.com/reader/2008/kan-extension-iii/">Kan Extensions</a>. For this example, this means we expect the types <tt>t</tt> and <tt>Coend (Ex1 t)</tt> to be isomorphic. Here is the isomorphism and its inverse:<br /><br /><pre><br />> iso :: Coend (Ex1 t) -> t<br />> iso (Coend (Ex1 f x)) = f x<br /><br />> iso' :: t -> Coend (Ex1 t)<br />> iso' x = Coend (Ex1 (const x) ())<br /><br /></pre><br />I'll leave the proof that <tt>iso</tt> and <tt>iso'</tt> are mutual inverses to you.<br /><br />I hope that gives some idea of what a coend is. Informally it captures the method by which a dinatural transformation of type <tt>forall a . s a a -> X</tt> is able to eliminate the quantified <tt>a</tt>. If you look through the Haskell libraries you'll find many dinaturals (or at least things that can be made dinatural through the use of <tt>uncurry</tt>).<br /><br />This code and description was inspired by the discussion at <a href="http://ncatlab.org/nlab/show/dinatural+transformation">nLab</a>.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-6987168894740966011?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com7tag:blogger.com,1999:blog-11295132.post-57987222701560068342009-02-14T18:09:00.000-08:002009-02-15T16:14:09.201-08:00Beyond MonadsThe state monad gives an elegant way to thread state information through Haskell code. Unfortunately it has an annoying limitation: the state must have the same type throughout the monadic expression. In this post I want to look at how to fix this. Unfortunately, fixing <tt>State</tt> means it's no longer a monad, but we'll discover a new abstraction that replaces monads. And then we can look at what else this abstraction is good for. The cool bit is that we have to write virtually no new code, and we'll even coax the compiler into doing the hard work of figuring out what the new abstraction should be.<br /><br />This is all based on an idea that has been invented by a bunch of people independently, although in slightly different forms. I'm being chiefly guided by the paper <a href="http://lambda-the-ultimate.org/node/3210">Parameterized Notions of Computation</a>.<br /><br />The problem with the state monad is that it is defined by<br /><pre><br />newtype State s a = State { runState :: s -> (a, s) }<br /></pre><br />The state going into and out of one of these values is the same, <tt>s</tt>. We can't vary the type of the state as we pass through our code. But that's really easy to fix, just define:<br /><br /><pre><br />> import Prelude hiding (return,(>>=),(>>),(.),id,drop)<br />> import Control.Category<br /><br />> newtype State s1 s2 a = State { runState :: s1 -> (a, s2) }<br /><br /></pre><br />I can now just copy and paste the definitions (with name changes to avoid clashes) out of the ghc prelude source code<br /><br /><pre><br />> return' a = State $ \s -> (a, s)<br />> m >>>= k = State $ \s -> let<br />> (a, s') = runState m s<br />> in runState (k a) s'<br /><br />> get = State $ \s -> (s, s)<br />> put s = State $ \_ -> ((), s)<br /><br /></pre><br />We don't have to change a thing! The old code exactly matches the new type. We can now write code using the new <tt>State</tt>:<br /><br /><pre><br />> test1 = return' 1 >>>= \x -><br />> return' 2 >>>= \y -><br />> get >>>= \z -><br />> put (x+y*z) >>>= \_ -><br />> return' z <br />> go1 = runState test1 10<br /><br /></pre><br />But we're now also able to write code like:<br /><br /><pre><br />> test2 = return' 1 >>>= \x -><br />> return' 2 >>>= \y -><br />> get >>>= \z -><br />> put (show (x+y*z)) >>>= \_ -><br />> return' z <br />> go2 = runState test2 10<br /><br /></pre><br />The state starts of as an <tt>Integer</tt> but ends up as a <tt>String</tt>.<br /><br />Problem solved! Except that this definition of <tt>State</tt> doesn't give us a monad and so we lose the benefits of having an interface shared by many monads. Is there a new more appropriate abstraction we can use? Rather than scratch our heads over it, we can just ask ghci to tell us what's going on.<br /><pre><br />*Main> :t return'<br />return' :: a -> State s1 s1 a<br />*Main> :t (>>>=)<br />(>>>=) :: State s1 s11 t -> (t -> State s11 s2 a) -> State s1 s2 a<br /></pre><br />This immediately suggests a new abstraction:<br /><br /><pre><br />> class ParameterisedMonad m where<br />> return :: a -> m s s a<br />> (>>=) :: m s1 s2 t -> (t -> m s2 s3 a) -> m s1 s3 a<br /><br />> x >> f = x >>= \_ -> f<br /><br /></pre><br />It's a lot like the usual <tt>Monad</tt> class except that we're now parameterising uses of this class with a pair of types. Our new <tt>>>=</tt> operator also has a compatibility condition on it. We can think of an element of <tt>m s1 s2</tt> as having a 'tail' and 'head' living in <tt>s1</tt> and <tt>s2</tt> respectively. In order to use <tt>>>=</tt> we require the head of the first argument to match the tail given by the second argument.<br /><br />Anyway, we have:<br /><br /><pre><br />> instance ParameterisedMonad State where<br />> return = return'<br />> (>>=) = (>>>=)<br /><br /></pre><br />We didn't really design this class, we just used what ghci told us. Will it turn out to be a useful abstraction?<br /><br />First a category theoretical aside: in <a href="http://blog.sigfpe.com/2008/11/from-monoids-to-monads.html">this post</a> I talked about how monads were really a kind of abstract monoid. Well <tt>ParameterisedMonad</tt> is a kind of abstract category. If we were to implement <tt>join</tt> for this class it would play a role analogous to composition of arrows in a category. In a monoid you can multiply any old elements together to get a new element. In a category, you can't multiply two arrows together unless the tail of the second matches the head of the first.<br /><br />Now we can generalise the writer monad to a <tt>ParameterisedMonad</tt>. But there's a twist: every monoid gives rise to a writer. This time we'll find that every category gives rise to a <tt>ParameterisedMonad</tt>. Here's the definition. Again, it was lifted straight out of the source for the usual <tt>Writer</tt> monad. The main change is replacing <tt>mempty</tt> and <tt>mappend</tt> with <tt>id</tt> and <tt>flip (.)</tt>.<br /><br /><pre><br />> data Writer cat s1 s2 a = Writer { runWriter :: (a,cat s1 s2) }<br />> instance (Category cat) => ParameterisedMonad (Writer cat) where<br />> return a = Writer (a,id)<br />> m >>= k = Writer $ let<br />> (a, w) = runWriter m<br />> (b, w') = runWriter (k a)<br />> in (b, w' . w)<br />> tell w = Writer ((),w)<br />> execWriter m = snd (runWriter m)<br /><br /></pre><br />It's just like the usual <tt>Writer</tt> monad except that the type of the 'written' data may change. I'll borrow an example (modified a bit) from the <a href="http://lambda-the-ultimate.org/node/3210">paper</a>. Define some type safe stack machine operations that are guaranteed not to blow your stack:<br /><br /><pre><br />> push n x = (n,x)<br />> drop (_,x) = x<br />> dup (n,x) = (n,(n,x))<br />> add (m,(n,x)) = (m+n,x)<br />> swap (m,(n,x)) = (n,(m,x))<br /><br /></pre><br />We can now 'write' the composition of a bunch of these operations as a 'side effect':<br /><br /><pre><br />> test3 = tell (push 1) >><br />> tell (push 2) >><br />> tell dup >><br />> tell add >><br />> tell swap >><br />> tell drop<br />> go3 = execWriter test3 ()<br /><br /></pre><br />I guess there's one last thing I have to find. The <a href="http://blog.sigfpe.com/2008/12/mother-of-all-monads.html">mother of all parameterised monads</a>. Again, we lift code from the ghc libraries, this time from Control.Monad.Cont. I just tweak the definition ever so slightly. Normally when you hand a continuation to an element of the <tt>Cont</tt> type it gives you back an element of the continuation's range. We allow the return of any type. This time the implementations of <tt>return</tt> and <tt>(>>=)</tt> remain completely unchanged:<br /><br /><pre><br />> newtype Cont r1 r2 a = Cont { runCont :: (a -> r2) -> r1 }<br />> instance ParameterisedMonad Cont where<br />> return a = Cont ($ a)<br />> m >>= k = Cont $ \c -> runCont m $ \a -> runCont (k a) c<br /><br />> i x = Cont (\fred -> x >>= fred)<br />> run m = runCont m return<br /><br />> test4 = run $ i (tell (push 1)) >><br />> i (tell (push 2)) >><br />> i (tell dup) >><br />> i (tell add) >><br />> i (tell swap) >><br />> i (tell drop)<br /><br />> go4 = execWriter test4 ()<br /><br /></pre><br />So what's going on here? The implementations of these instances require almost trivial changes to the original monads, or in two cases no changes at all apart from the type signature. I have my opinion: Haskell programmers have been using the wrong type class all along. In each case the type signature for <tt>return</tt> and <tt>>>=</tt> was too strict and so the functionality was being unnecessarily shackled. By writing the code without a signature, ghci tells us what the correct signature should have been all along. I think it might just possibly be time to consider making <tt>ParameterisedMonad</tt> as important as <tt>Monad</tt> to Haskell programming. At the very least, do-notation needs to be adapted to support <tt>ParameterisedMonad</tt>.<br /><br />Update: You *can* use do-notation with ParameterisedMonad if you use the NoImplicitPrelude flag.<br /><br />Update2: Some credits and links:<br /><br /><OL><br /><LI><a href="http://blog.unsafeperformio.com/?p=3">The Polystate Monad</a> is one of the independent discoveries I mentioned above.<br /><LI>A more general approach to <a href="http://comonad.com/reader/2007/parameterized-monads-in-haskell/">Parameterized Monads in Haskell</a>.<br /><LI><a href="http://computationalthoughts.blogspot.com/2009/02/comment-on-parameterized-monads.html">A comment on Parameterized Monads</a> that shows explicitly how to make this work with NoImplicitPrelude.<br /><li>Oleg's <a href="http://okmij.org/ftp/Computation/monads.html#param-monad">Variable (type)state `monad'</a>.<br /><LI>Wadler discovered this design pattern back in 1993 in <a href="http://www.brics.dk/~hosc/local/LaSC-7-1-pp39-56.pdf">Monads and composable continuations</a>.<br /></OL><br /><br />I didn't contribute anything, this article is just advocacy.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-5798722270156006834?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com22tag:blogger.com,1999:blog-11295132.post-76699371804205419472009-01-31T14:19:00.000-08:002009-01-31T19:14:49.087-08:00Beyond Regular Expressions: More Incremental String MatchingIn my last post I showed how to incrementally match long strings against regular expressions. I now want to apply similar methods to matching languages that can't be described by regular expressions. (Note that 'language' is just jargon for a set of strings that meet some criterion.) In particular, regular expressions can't be used to test a string for balanced parentheses. This is because we need some kind of mechanism to count how many open parentheses are still pending and a finite state machine can't represent arbitrary integers.<br /><br />So let's start with a slightly more abstract description of what was going on last time so we can see the bigger picture. We were storing strings in balanced trees with a kind of 'measurement' or 'digest' of the string stored in the nodes of the tree. Each character mapped to an element of a monoid via a function called <tt>measure</tt> and you can think of the measurement function as acting on entire strings if you <tt>mappend</tt> together all of the measurements for each of the characters. So what we have is a function <tt>f :: String -> M</tt> taking strings to some type M (in the last post M was a type of array) with the properties<br /><pre><br />f (a ++ b) == f a `mappend` f b<br />f [] == mempty<br /></pre><br />By noticing that String is itself a monoid we can write this as<br /><pre><br />f (a `mappend` b) == f a `mappend` f b<br />f mempty == mempty<br /></pre><br />Anything satisfying these laws is called a monoid homomorphism, or just homomorphism for short.<br /><br />So the technique I used worked like this: I found a homomorphism from <tt>String</tt> to some type with the useful property that for any string s, <tt>f s</tt> still contains all the information required to figure out if we're dealing with a member of our language. If <tt>f</tt> turns a string into something more efficient to work with then we can make our string matching more efficient.<br /><br />Now I want to make the notion of "contains all the information required" more precise by considering an example. Consider strings that consist only of the characters <tt>(</tt> and <tt>)</tt>. Our language will be the set of strings whose parentheses balance. In other words the total number of <tt>(</tt> must match the total number of <tt>)</tt>, and as we scan from left to right we must never see more <tt>)</tt> than <tt>(</tt>. For example, <tt>()()()</tt> and <tt>((()))()</tt> are in our language, but <tt>)()()(</tt> isn't. This language is called the Dyck language.<br /><br />Suppose we're testing whether or not some string is in the Dyck language. If we see <tt>()</tt> as a substring then if we delete it from the string, it makes no difference to whether or not the string is in the Dyck language. In fact, if we see <tt>(())</tt>, <tt>((()))</tt>, <tt>(((())))</tt> and so on they can all be deleted. On the other hand, you can't delete <tt>)(</tt> without knowing about the rest of the string. Deleting it from <tt>()()</tt> makes no difference to its membership in the Dyck language, but deleting it from <tt>)(()</tt> certainly does.<br /><br />So given a language L, we can say that two strings, x and y, are interchangeable with respect to L if any time we see x as a substring of another string we can replace it with y, and vice versa, without making any difference to whether the string is in the language. Interchangeable strings are a kind of waste of memory. If we're testing for membership of L there's no need to distinguish between them. So we'd like our measurement homomorphism to map all interchangeable strings to the same values. But we don't want to map any more strings to the same value because then we lose information that tells us if a string is an element of L. A homomorphism that strikes this balance perfectly is called the 'canonical homomorphism' and the image of the set of all strings under this homomorphisms is called the <a href="http://en.wikipedia.org/wiki/Syntactic_monoid">syntactic monoid</a>. By 'image', I simply mean all the possible values that could arise from applying the homomorphism to all possible strings.<br /><br />So lets go back to the Dyck language. Any time we see <tt>()</tt> we can delete it. But if we delete every occurence of <tt>()</tt> from a string then all we have left is a bunch of <tt>)</tt> followed by a bunch of <tt>(</tt>. Let's say it's p of the former, and q of the latter. Every string of parentheses can be distilled down to a pair of integers ≥0, (p,q). But does this go far enough? Could we distill any further? Well for any choice of (p,q) it's a good exercise to see that for any other choice of (p',q') there's always a string in the Dyck language where if you have )<sup>p</sup>(<sup>q</sup> as a substring, replacing it with (p',q') gives you something not in the language. So you can't distill any further. Which means we have our syntactic monoid and canonical homomorphism. In this case the monoid is called the <a href="http://en.wikipedia.org/wiki/Bicyclic_monoid">bicyclic monoid</a> and we can implement it as follows:<br /><br /><pre><br />> {-# LANGUAGE TypeSynonymInstances,FlexibleInstances,MultiParamTypeClasses #-}<br />> import Data.Foldable<br />> import Data.Monoid<br />> import Data.FingerTree hiding (fromList)<br />> import qualified Data.List as L<br /><br />> data Bicyclic = B Int Int deriving (Eq,Show)<br /><br />> hom '(' = B 0 1<br />> hom ')' = B 1 0<br /><br />> instance Monoid Bicyclic where<br />> mempty = B 0 0<br />> B a b `mappend` B c d = B (a-b+max b c) (d-c+max b c)<br /><br /></pre><br />Where did that code for <tt>mappend</tt> come from? Consider )<sup>a</sup>(<sup>b</sup>)<sup>c</sup>(<sup>d</sup>. We can delete <tt>()</tt> from the middle many times over.<br /><br />Now we can more or less reproduce the code of last week and get a Dyck language tester. Once we've distilled a string down to (p,q) we only need to test whether or not p=q=0 to see whether or not our parentheses are balanced:<br /><br /><pre><br />> matches' s = x==B 0 0 where<br />> x = mconcat (map hom s)<br /><br />> data Elem a = Elem { getElem :: a } deriving Show<br />> data Size = Size { getSize :: Int } deriving (Eq,Ord,Show)<br /><br />> instance Monoid Size where<br />> mempty = Size 0<br />> Size m `mappend` Size n = Size (m+n)<br /><br />> instance Measured (Size,Bicyclic) (Elem Char) where<br />> measure (Elem a) = (Size 1,hom a)<br /><br />> type FingerString = FingerTree (Size,Bicyclic) (Elem Char)<br /><br />> insert :: Int -> Char -> FingerString -> FingerString<br />> insert i c z = l >< (Elem c <| r) where (l,r) = split (\(Size n,_) -> n>i) z<br /><br />> string = empty :: FingerString<br /><br />> matchesDyck string = snd (measure string)==B 0 0<br /><br />> loop string = do<br />> print $ map getElem (toList string)<br />> print $ "matches? " ++ show (matchesDyck string)<br />> print "(Position,Character)"<br />> r <- getLine<br />> let (i,c) = read r<br />> loop $ insert i c string<br /><br />> main = do<br />> loop string<br /><br /></pre><br />There's a completely different way to test membership of the Dyck language. Replace each <tt>(</tt> with 1 and <tt>)</tt> with -1. Now scan from left to right keeping track of (1) the sum of all the numbers so far and (2) the minimum value taken by this sum. If the final sum and the final minimal sum are zero, then we have matching parentheses. But we need to do this on substrings without scanning from the beginning in one go. That's an example of a parallel prefix sum problem and it's what I talked about <a href="http://sigfpe.blogspot.com/2008/11/approach-to-algorithm-parallelisation.html">here</a>.<br /><br />So here's an extended exercise: adapt the parallel prefix sum approach to implement incremental Dyck language testing with fingertrees. You should end up with a canonical homomorphism that's similar to the one above. It'll probably be slightly different but ultimately equivalent.<br /><br />And here's an even more extended exercise: protein sequences are sequences from a 20 letter alphabet. Each letter can be assigned a hydrophobicity value from <a href="http://www.vivo.colostate.edu/molkit/hydropathy/scales.html">certain tables</a>. (Pick whichever table you want.) The hydrophobicity of a string is the sum of the hydrophobicities of its letters. Given a string, we can give it a score corresponding to the largest hydrophobicity of any contiguous substring in it. Use fingertrees and a suitable monoid to track this score as the string is incrementally edited. Note how widely separated substrings can suddenly combine together as stuff between them is adjusted.<br /><br />If you're interested in Dyck languages with multiple types of parenthesis that need to match you need something <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.1014">much more fiendish</a>.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-7669937180420541947?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com4tag:blogger.com,1999:blog-11295132.post-43827466324001736672009-01-24T13:01:00.000-08:002009-01-24T16:47:00.328-08:00Fast incremental regular expression matching with monoids<H3>The Problem</H3><br /> Consider this problem: Fix a regular expression R. Suppose you have a string of length N. There's not much you can do about it, you'll likely have to scan all N characters to test so see if the string matches R. But once you've performed the test, how fast can you test the string again if a small edit is made to it? It seems that in general you'd have to rescan the entire string, or at least rescan from where the edit was made. But it turns out that you can do regular expression matching incrementally so that for many changes you might make to the string, you only require O(log N) time to recompute whether the string matches. This is true even if characters at opposite ends of the string interact to make a succcessful match. What's more, it's remarkably straightforward to implement if we make use of fingertrees and monoids.<br /><br />I'm going to assume a bit of background for which resources can be found on the web: understanding some <a href="http://sigfpe.blogspot.com/2009/01/haskell-monoids-and-their-uses.html">basics about monoids</a>, understanding <a href="http://apfelmus.nfshost.com/monoid-fingertree.html">apfelmus's inspiring introduction to fingertrees</a> or the <a href="http://www.soi.city.ac.uk/~ross/papers/FingerTree.html">original paper</a>, and you'll need to be completely comfortable with the idea of compiling regular expressions to finite state machines.<br /><br />As usual, this is literate Haskell. But you need to have the <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/fingertree">fingertree</a> package installed and we need a bunch of imports.<br /><br /><pre><br />> {-# LANGUAGE TypeSynonymInstances,FlexibleInstances,MultiParamTypeClasses #-}<br />> import qualified Data.Array as B<br />> import Data.Array.Unboxed as U<br />> import Data.Foldable<br />> import Data.Monoid<br />> import Data.FingerTree hiding (fromList)<br />> import qualified Data.List as L<br /><br /></pre><br /><H3>A Finite State Machine</H3><br />So let's start with an example regular expression: <tt>.*(.*007.*).*</tt>. We're looking for "007" enclosed between parentheses, but the parentheses could be millions of characters apart.<br /><br />A standard technique for finding regular expressions is to compile them to a finite state automaton. It takes quite a bit of code to do that, but it is completely standard. So rather than do that, here's a finite state machine I constructed by hand for this regular expression:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/SXuJZcQX39I/AAAAAAAAAR0/D2xW8JY66VY/s1600-h/fsa.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 388px; height: 365px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/SXuJZcQX39I/AAAAAAAAAR0/D2xW8JY66VY/s400/fsa.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5294976857078751186" /></a><br />I've used the convention that an unlabelled edge corresponds to any input that isn't matched by another labelled edge. We can express the transitions as a function <tt>fsm</tt>.<br /><br /><pre><br />> fsm 0 '(' = 1<br />> fsm 0 _ = 0<br /><br />> fsm 1 '0' = 2<br />> fsm 1 _ = 1<br /><br />> fsm 2 '0' = 3<br />> fsm 2 _ = 1<br /><br />> fsm 3 '7' = 4<br />> fsm 3 '0' = 3<br />> fsm 3 _ = 1<br /><br />> fsm 4 ')' = 5<br />> fsm 4 _ = 4<br /><br />> fsm 5 _ = 5<br /><br /></pre><br />The initial state is 0 and a match corresponds to state 5.<br /><br />We can test a string in the standard way using<br /><br /><pre><br />> matches s = Prelude.foldl fsm 0 s==5<br /><br /></pre><br />Try <tt>matches "(00 7)"</tt> and <tt>matches "He(007xxxxxxxxxxxx)llo"</tt>.<br /><br />We can thing of the inputs as being functions acting on the automaton. Each input character is a function that maps the automaton from one state to another. We could use the Haskell composition function, <tt>(.)</tt>, to compose these functions. But <tt>(.)</tt> doesn't really do anything, <tt>f . g</tt> is just a closure that says "when the time comes, apply <tt>g</tt> and then <tt>f</tt>.<br /><br />On the other hand we could tabulate our transition functions as follows:<br /><br /><pre><br />> tabulate f = array (0,5) [(x,f x) | x <- range (0,5)]<br /><br /></pre><br />We have one such tabulated function for each letter in our alphabet:<br /><br /><pre><br />> letters = array (' ','z') [(i,tabulate (flip fsm i)) | i <- range (' ','z')]<br /><br /></pre><br />Given two tabulated functions we can easily form the table of the composition function. In fact, our tabulated functions form a monoid with <tt>mappend</tt> for composition. I used unboxed arrays for performance:<br /><br /><pre><br />> type Table = UArray Int Int<br /><br />> instance Monoid Table where<br />> mempty = tabulate id<br />> f `mappend` g = tabulate (\state -> (U.!) g ((U.!) f state))<br /><br /></pre><br />Note that we've cheated a bit. An object of type <tt>Table</tt> could be any array of <tt>Ints</tt> indexed by <tt>Int</tt>. But if we promise to only build arrays indexed by elements of <tt>[0..5]</tt> and containing elements of the same range then our claim to monoidhood is valid.<br /><br />Given any string, we can compute whether it matches our regular expression by looking up the corresponding <tt>Table</tt> in our <tt>letters</tt> array, composing them, and then checking if the tabulated function maps the initial state 0 to the final state 5:<br /><br /><pre><br />> matches' s = table!0==5 where<br />> table = mconcat (map ((B.!) letters) s)<br /><br /></pre><br />This is slower and more complex than our original implementation of <tt>matches</tt>. But what we've now done is 'tease out' a monoid structure from the problem. If we store a string as a sequence of characters represented by a fingertree, we can store in each subtree the element of <tt>Table</tt> corresponding to the substring it represents. Every time the tree is rebalanced we need to recompute the corresponding <tt>Table</tt>s. But that's fine, it typically involves only O(log N) operations, and we don't need to write any code, the fingertree will do it for us automatically. Once we've done this, we end up with a representation of strings with the property that we always know what the corresponding <tt>Table</tt> is. We can freely split and join such trees knowing that the <tt>Table</tt> will always be up to date.<br /><br />The only slight complication is that I want to be able to randomly access the nth character of the tree. apfelmus <a href="http://apfelmus.nfshost.com/monoid-fingertree.html">explains</a> that in his post. The change I need to make is that I'm going to use both the <tt>Size</tt> monoid and the <tt>Table</tt> monoid, so I need the <a href="http://sigfpe.blogspot.com/2009/01/haskell-monoids-and-their-uses.html">product monoid</a>.<br /><br /><pre><br />> data Elem a = Elem { getElem :: a } deriving Show<br />> data Size = Size { getSize :: Int } deriving (Eq,Ord,Show)<br /><br />> instance Monoid Size where<br />> mempty = Size 0<br />> Size m `mappend` Size n = Size (m+n)<br /><br /></pre><br />We need to implement <tt>measure</tt> as in the <a href="http://www.soi.city.ac.uk/~ross/papers/FingerTree.html">fingertree paper</a>:<br /><br /><pre><br />> instance Measured (Size,Table) (Elem Char) where<br />> measure (Elem a) = (Size 1,(B.!) letters a)<br /><br /></pre><br /><H3>A Fingertree</H3><br />And now we can define our strings as:<br /><br /><pre><br />> type FingerString = FingerTree (Size,Table) (Elem Char)<br /><br /></pre><br />The insertion routine is more or less what's in the paper:<br /><br /><pre><br />> insert :: Int -> Char -> FingerString -> FingerString<br />> insert i c z = l >< (Elem c <| r) where (l,r) = split (\(Size n,_) -> n>i) z<br /><br /></pre><br />Note how I project out the size from the product monoid in order to insert at the correct position.<br /><br />Here's an example string. Adjust the length to suit your memory and CPU horsepower:<br /><br /><pre><br />> fromList = L.foldl' (|>) empty<br />> string = fromList (map Elem $ take 1000000 $ cycle "the quick brown fox jumped over the lazy dog")<br /><br /></pre><br />(I use a strict form of <tt>fromList</tt> to ensure the tree actually gets built.)<br /><br />The actual match function simply projects out the second component of the monoid and again tests to see if it maps the initial state to the final state:<br /><br /><pre><br />> matches007 string = ((U.!) (snd (measure string)) 0)==5<br /><br /></pre><br /><H3>An Interactive Loop</H3><br />I recommend compiling with optimisation, something like <tt>ghc --make -O5 -o regexp regexp.lhs</tt>:<br /><br /><pre><br />> loop string = do<br />> print $ "matches? " ++ show (matches007 string)<br />> print "(Position,Character)"<br />> r <- getLine<br />> let (i,c) = read r<br />> loop $ insert i c string<br /><br />> main = do<br />> loop string<br /><br /></pre><br />Now you can run this interactively. Input values like <tt>(100,'f')</tt> to insert an 'f' at position 100. It can take a good few seconds to compute the initial tree, but after that the matching process is instantaneous. (Actually, the second match might take a few seconds, that's because despite the <tt>foldl'</tt> the tree hasn't been fully built.)<br /><br />A suitable sample input is:<br /><pre><br />(100,'(')<br />(900000,')')<br />(20105,'0')<br />(20106,'0')<br />(20107,'7')<br /></pre><br /><br /><H3>Discussion</H3><br />Note there is quite an overhead for this example. I'm storing an entire <tt>Table</tt> for each character. But you can easily store chunks of string (like in a <a href="http://www.sgi.com/tech/stl/ropeimpl.html">rope</a>). This means that some chunks will be rescanned when a string is edited - but rescanning a 1K chunk, say, is a lot less expensive than scanning a gigabyte file in its entirety. Working in blocks will probably speed up the initial scan too, a much smaller tree needs to be built.<br /><br />When Hinze and Patterson originally wrote the fingertree paper they were motivated by parallel prefix sum methods. Just about any parallel prefix algorithm can be converted to an incremental algorithm using fingertrees. This article is based on the idea of doing this with the parallel lexing scheme described by Hillis and Steele in their classic Connection Machine <a href="http://portal.acm.org/citation.cfm?id=7903">paper</a>.<br /><br />So why would you want to match against a fixed regular expression like this? Well this method extends to a full blown incremental lexer. This will lex quickly even if placing a character in a string changes the type of lexemes billions of characters away. See the Hillis and Steele paper for details.<br /><br />Note there's nothing especially Haskelly about this code except that Haskell made it easy to prototype. You can do this in C++, say, using mutable red-black trees.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-4382746632400173667?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com6tag:blogger.com,1999:blog-11295132.post-75312183293128145692009-01-17T13:47:00.000-08:002009-05-05T11:33:10.321-07:00Haskell Monoids and their UsesHaskell is a great language for constructing code modularly from small but orthogonal building blocks. One of these small blocks is the monoid. Although monoids come from mathematics (algebra in particular) they are found everywhere in computing. You probably use one or two monoids implicitly with every line of code you write, whatever the language, but you might not know it yet. By making them explicit we find interesting new ways of constructing those lines of code. In particular, ways that are often easier to both read and write. So the following is an intro to monoids in Haskell. I'm assuming familiarity with type classes, because Haskell monoids form a type class. I also assume some familiarity with monads, though nothing too complex.<br /><br />This post is literate Haskell so you can play with the examples directly.<br /><br /><H3>Defining Monoids</H3><br />In Haskell, a monoid is a type with a rule for how two elements of that type can be combined to make another element of the same type. To be a monoid there also needs to be an element that you can think of as representing 'nothing' in the sense that when it's combined with other elements it leaves the other element unchanged.<br /><br />A great example is lists. Given two lists, say <tt>[1,2]</tt> and <tt>[3,4]</tt>, you can join them together using <tt>++</tt> to get <tt>[1,2,3,4]</tt>. There's also the empty list <tt>[]</tt>. Using <tt>++</tt> to combine <tt>[]</tt> with any list gives you back the same list, for example <tt>[]++[1,2,3,4]==[1,2,3,4]</tt>.<br /><br />Another example is the type of integers, <tt>Integer</tt>. Given two elements, say 3 and 4, we can combine them with + to get 7. We also have the element 0 which when added to any other integer leaves it unchanged.<br /><br />So here is a possible definition for the monoid type class:<br /><pre><br />class Monoid m where<br /> mappend :: m -> m -> m<br /> mempty :: m<br /></pre><br />The function mappend is the function we use to combine pairs of elements, and mempty is the 'nothing' element. We can make lists an instance like this:<br /><br /><pre><br />instance Monoid [a] where<br /> mappend = (++)<br /> mempty = []<br /></pre><br /><br />Because we want mempty to do nothing when combined with other elements we also require monoids to obey these two rules<br /><pre><br />a `mappend` mempty = a<br /></pre><br />and<br /><pre><br />mempty `mappend` a = a.<br /></pre><br /><br />Notice how there are two ways to combine a and b using mappend. We can write <tt>a `mappend` b</tt> or <tt>b `mappend` a</tt>. There is no requirement on a monoid that these be equal to each other. (But see below.) But there is another property that monoids are required to have. Suppose we start with the list <tt>[3,4]</tt>. And now suppose we want to concatenate it with <tt>[1,2]</tt> on the left and <tt>[5,6]</tt> on the right. We could do the left concatenation first to get <tt>[1,2]++[3,4]</tt> and then form <tt>([1,2]++[3,4])++[5,6]</tt>. But we could do the right one first and get <tt>[1,2]++([3,4]++[5,6])</tt>. Because we're concatenating at opposite ends the two operations don't interfere and it doesn't matter which we do first. This gives rise to the third and last requirement we have of monoids:<br /><pre><br />(a `mappend` b) `mappend` c == a `mappend` (b `mappend` c)<br /></pre><br />and you can summarise it with the slogan 'combining on the left doesn't interfere with combining on the right'. Notice how the integers, combined with +, also have this property. It's such a useful property it has a name: associativity.<br /><br />That's a complete specification of what a monoid is. Haskell doesn't enfore the three laws I've given, but anyone reading code using a monoid will expect these laws to hold.<br /><br /><H3>Some Uses of Monoids</H3><br />But given that we already have individual functions like <tt>++</tt> and <tt>+</tt>, why would we ever want to use mappend instead?<br /><br />One reason is that with a monoid we get another function called mconcat for free. mconcat takes a list of values in a monoid and combines them all together. For example <tt>mconcat [a,b,c]</tt> is equal to <tt>a `mappend` (b `mappend` c)</tt>. Any time you have a monoid you have this quick and easy way to combine a whole list together. But note that there is some ambiguity in the idea behind <tt>mconcat</tt>. To compute <tt>mconcat [a,b,...,c,d]</tt> which order should we work in? Should we work from left to right and compute <tt>a `mappend` b</tt> first? Or should we start with <tt>c `mappend` d</tt>. That's one place where the associativity law comes in: it makes no difference.<br /><br />Another place where you might want to use a monoid is in code that is agnostic about how you want to combine elements. Just as mconcat works with any monoid, you might want to write your own code that works with any monoid.<br /><br />Explicitly using the Monoid type class for a function also tells the reader of your code what your intentions are. If a function has signature <tt>[a] -> b</tt> you know it takes a list and constructs an object of type b from it. But it has considerable freedom in what it can do with your list. But if you see a function of type <tt>(Monoid a) => a -> b</tt>, even if it is only used with lists, we know what kind of things the function will do with the list. For example, we know that the function might add things to your list, but it's never going to pull any elements out of your list.<br /><br />The same type can give rise to a monoid in different ways. For example, I've already mentions that the integers form a monoid. So we could define:<br /><pre><br />instance Monoid Integer where<br /> mappend = (+)<br /> mempty = 0<br /></pre><br />But there's a good reason not to do that: there's another natural way to make integers into a monoid:<br /><pre><br /><br />instance Monoid Integer where<br /> mappend = (*)<br /> mempty = 1<br /></pre><br />We can't have both of these definitions at the same time. So the Data.Monoid library doesn't make Integer into a Monoid directly. Instead, it wraps them with Sum and Product. It also does so more generally so that you can make any Num type into a monoid in two different ways. We have both<br /><pre><br />Num a => Monoid (Sum a)<br /></pre><br />and<br /><pre><br />Num a => Monoid (Product a)<br /></pre><br />To use these we wrap our values in the appropriate wrapper and we can then use the monoid functions. For example <tt>mconcat [Sum 2,Sum 3,Sum 4]</tt> is <tt>Sum 9</tt>, but <tt>mconcat [Product 2,Product 3,Product 4]</tt> is <tt>[Product 24]</tt>.<br /><br />Using <tt>Sum</tt> and <tt>Product</tt> looks like a complicated way to do ordinary addition and multiplication. Why do things that way?<br /><br /><H3>The Writer Monad</H3><br />You can think of monoids as being accumulators. Given a running total, n, we can add in a new value a to get a new running total n' = n `mappend` a. Accumulating totals is a very common design pattern in real code so it's useful to abstract this idea. This is exactly what the Writer monad allows. We can write monadic code that accumulates values as a "side effect". The function to perform the accumulation is (somewhat confusingly) called <tt>tell</tt>. Here's an example where we're logging a trace of what we're doing.<br /><br /><pre><br />> import Data.Monoid<br />> import Data.Foldable<br />> import Control.Monad.Writer<br />> import Control.Monad.State<br /><br />> fact1 :: Integer -> Writer String Integer<br />> fact1 0 = return 1<br />> fact1 n = do<br />> let n' = n-1<br />> tell $ "We've taken one away from " ++ show n ++ "\n"<br />> m <- fact1 n'<br />> tell $ "We've called f " ++ show m ++ "\n"<br />> let r = n*m<br />> tell $ "We've multiplied " ++ show n ++ " and " ++ show m ++ "\n"<br />> return r<br /><br /></pre><br />This is an implementation of the factorial function that tells us what it did. Each time we call <tt>tell</tt> we combine its argument with the running log of all of the strings that we've 'told' so far. We use <tt>runWriter</tt> to extract the results back out. If we run<br /><br /><pre><br />> ex1 = runWriter (fact1 10)<br /><br /></pre><br />we get back both 10! and a list of what it took to compute this.<br /><br />But Writer allows us to accumulate more than just strings. We can use it with any monoid. For example, we can use it to count how many multiplications and subtractions were required to compute a given factorial. To do this we simply tell a value of the appropriate type. In this case we want to add values, and the monoid for addition is Sum. So instead we could implement:<br /><br /><pre><br />> fact2 :: Integer -> Writer (Sum Integer) Integer<br />> fact2 0 = return 1<br />> fact2 n = do<br />> let n' = n-1<br />> tell $ Sum 1<br />> m <- fact2 n'<br />> let r = n*m<br />> tell $ Sum 1<br />> return r<br /></pre><br /> <br /><pre><br />> ex2 = runWriter (fact2 10)<br /><br /></pre><br />There's another way we could have written this, using the state monad:<br /><br /><pre><br />> fact3 :: Integer -> State Integer Integer<br />> fact3 0 = return 1<br />> fact3 n = do<br />> let n' = n-1<br />> modify (+1)<br />> m <- fact3 n'<br />> let r = n*m<br />> modify (+1)<br />> return r<br /><br />> ex3 = runState (fact3 10) 0<br /><br /></pre><br />It works just as well, but there is a big advantage to using the <tt>Writer</tt> version. It has type signature <tt>f :: Integer -> Writer (Sum Integer) Integer</tt>. We can immediately read from this that our function has a side effect that involves accumulating a number in a purely additive way. It's never going to, for example, multiply the accumulated value. The type information tells us a lot about what is going on inside the function without us having to read a single line of the implementation. The version written with <tt>State</tt> is free to do whatever it likes with the accumulated value and so it's harder to discern its purpose.<br /><br />Data.Monoid also provides an Any monoid. This is the Bool type with the disjunction operator, better known as ||. The idea behind the name is that if you combine together any collection of elements of type <tt>Any</tt> then the result is <tt>Any True</tt> precisely when at least any one of the original elements is <tt>Any True</tt>. If we think of these values as accumulators then they provide a kind of one way switch. We start accumulating with mempty, ie. <tt>Any False</tt>, and we can think of this as being the switch being off. Any time we accumulate Any True into our running 'total' the switch is turned on. This switch can never be switched off again by accumulating any more values. This models a pattern we often see in code: a flag that we want to switch on, as a side effect, if a certain condition is met at any point.<br /><br /><pre><br />> fact4 :: Integer -> Writer Any Integer<br />> fact4 0 = return 1<br />> fact4 n = do<br />> let n' = n-1<br />> m <- fact4 n'<br />> let r = n*m<br />> tell (Any (r==120))<br />> return r<br /><br />> ex4 = runWriter (fact4 10)<br /><br /></pre><br />At the end of our calculation we get n!, but we are also told if at any stage in the calculation two numbers were multiplied to give 120. We can almost read the tell line as if it were English: "tell my caller if any value of r is ever 120". Not only do we get the plumbing for this flag with a minimal amount of code. If we look at the type for this version of f it tells us exactly what's going on. We can read off immediately that this function, as a "side effect", computes a flag that can be turned on, but never turned off. That's a lot of useful information from just a type signature. In many other programming languages we might expect to see a boolean in the type signature, but we'd be forced to read the code to get any idea of how it will be used.<br /><br /><H3>Commutative Monoids, Non-Commutative Monoids and Dual Monoids</H3><br /><br />Two elements of a monoid, x and y, are said to commute if <tt>x `mappend` y == y `mappend` x</tt>. The monoid itself is said to be commutative if all of its elements commute with each other. A good example of a commutative monoid is the type of integers. For any pair of integers, <tt>a+b==b+a</tt>.<br /><br />If a monoid isn't commutative, it's said to be non-commutative. If it's non-comuutative it means that for some x and y, <tt>x `mappend` y</tt> isn't the same as <tt>y `mappend` x</tt>, so <tt>mappend</tt> and <tt>flip mappend</tt> are not the same function. For example <tt>[1,2] ++ [3,4]</tt> is different from <tt>[3,4] ++ [1,2]</tt>. This has the interesting consequence that we can make another monoid in which the combination function is <tt>flip mappend</tt>. We can still use the same <tt>mempty</tt> element, so the first two monoid laws hold. Additionally, it's a nice exercise to prove that the third monoid law still holds. This flipped monoid is called the dual monoid and Data.Monoid provides the <tt>Dual</tt> type constructor to build the dual of a monoid. We can use this to reverse the order in which the writer monad accumulates values. For example the following code collects the execution trace in reverse order:<br /><br /><pre><br />> fact5 :: Integer -> Writer (Dual String) Integer<br />> fact5 0 = return 1<br />> fact5 n = do<br />> let n' = n-1<br />> tell $ Dual $ "We've taken one away from " ++ show n ++ "\n"<br />> m <- fact5 n'<br />> tell $ Dual $ "We've called f " ++ show m ++ "\n"<br />> let r = n*m<br />> tell $ Dual $ "We've multiplied " ++ show n ++ " and " ++ show m ++ "\n"<br />> return r<br /><br />> ex5 = runWriter (fact5 10)<br /><br /></pre><br /><H3>The Product Monoid</H3><br /><br />Suppose we want to accumulate two side effects at the same time. For example, maybe we want to both count instructions and leave a readable trace of our computation. We could use monad transformers to combine two writer monads. But there is a slightly easier way - we can combine two monoids into one 'product' monoid. It's defined like this:<br /><pre><br />instance (Monoid a,Monoid b) => Monoid (a,b) where<br /> mempty = (mempty,mempty)<br /> mappend (u,v) (w,x) = (u `mappend` w,v `mappend` x)<br /></pre><br /><br />Each time we use mappend on the product we actually perform a pair of mappends on each of the elements of the pair. With these small helper functions:<br /><br /><pre><br />> tellFst a = tell $ (a,mempty)<br />> tellSnd b = tell $ (mempty,b)<br /><br /></pre><br />we can now use two monoids simultaneously:<br /><br /><pre><br />> fact6 :: Integer -> Writer (String,Sum Integer) Integer<br />> fact6 0 = return 1<br />> fact6 n = do<br />> let n' = n-1<br />> tellSnd (Sum 1)<br />> tellFst $ "We've taken one away from " ++ show n ++ "\n"<br />> m <- fact6 n'<br />> let r = n*m<br />> tellSnd (Sum 1)<br />> tellFst $ "We've multiplied " ++ show n ++ " and " ++ show m ++ "\n"<br />> return r<br /><br />> ex6 = runWriter (fact6 5)<br /><br /></pre><br />If we had simply implemented our code using one specific monoid, like lists, our code would be very limited in its application. But by using the general <tt>Monoid</tt> type class we ensure that users of our code can use not just any individual monoid, but even multiple monoids. This can make for more efficient code because it means we can perform multiple accumulations while traversing a data structure once. And yet we still ensure readability because our code is written using the interface to a single monoid making our algorithms simpler to read.<br /><br /><H3>Foldable Data</H3><br /><br />One last application to mention is the Data.Foldable library. This provides a generic approach to walking through a datastructure, accumulating values as we go. The <tt>foldMap</tt> function applies a function to each element of our structure and then accumulates the return values of each of these applications. An implementation of <tt>foldMap</tt> for a tree structure might be:<br /><br /><pre><br />> data Tree a = Empty | Leaf a | Node (Tree a) a (Tree a)<br /><br />> instance Foldable Tree where<br />> foldMap f Empty = mempty<br />> foldMap f (Leaf x) = f x<br />> foldMap f (Node l k r) = foldMap f l `mappend` f k `mappend` foldMap f r<br /></pre><br /> <br />We can now use any of the monoids discussed above to compute properties of our trees. For example, we can use the function <tt>(== 1)</tt> to test whether each element is equal to 1 and then use the Any monoid to find out if any element of the tree is equal to 1. Here are a pair of examples: one to compute whether or not an element is equal to 1, and another to test if every element is greater than 5:<br /><br /><pre><br />> tree = Node (Leaf 1) 7 (Leaf 2)<br /><br />> ex7 = foldMap (Any . (== 1)) tree<br />> ex8 = foldMap (All . (> 5)) tree<br /><br /></pre><br />Note, of course, that these expressions can be used, unmodified, with any foldable type, not just trees.<br /><br />I hope you agree that this expresses our intentions in a way that is easy to read.<br /><br />That suggests another exercise: write something similar to find the minimum or maximum element in a tree. You may need to construct a new monoid along the lines of <tt>Any</tt> and <tt>All</tt>. Try finding both in one traversal of the tree using the product monoid.<br /><br />The foldable example also illustrates another point. The implementor of <tt>foldMap</tt> for the tree doesn't need to worry about whether the left tree should be combined with the central element before the right tree. Associativity means it can be implemented either way and give the same results.<br /><br /><H3>Recap</H3><br />Monoids provide a general approach to combining and accumulating values. They allow us to write code that is agnostic about the method we will use to combine values, and that makes our code more reusable. By using named monoids we can write type signatures that express our intentions to people reading our code: for example by using Any instead of Bool we make it clear just how our boolean value is to be used. And we can combine the monoid-based building blocks provided by Haskell libraries to build useful and readable algorithms with a minimum of effort.<br /><br />Some final notes: mathematicians often refer to mappend as a 'binary operator' and often it's called 'multiplication'. Just like in ordinary algebra, it's often also written with abuttal or using the star operator, ie. ab and a*b might both represent <tt>a `mappend` b</tt>. You can read more about monoids at <a href="http://en.wikipedia.org/wiki/Monoid">Wikipedia</a>. And I wish I had time to talk about monoid morphisms, and why the list monoid is free (and what consequences that might have for how you can your write code), and how compositing gives you <a href="http://en.wikipedia.org/wiki/Alpha_compositing">monoids</a> and a whole lot more.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-7531218329312814569?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com30tag:blogger.com,1999:blog-11295132.post-49353856684236884292009-01-02T14:24:00.000-08:002009-01-02T18:00:09.206-08:00Rewriting Monadic Expressions with Template HaskellThe goal today is to implement an impossible Haskell function. But as this is a literate Haskell post we need to get some boilerplate out of the way:<br /><br /><pre><br />> module Test where<br /><br />> import Language.Haskell.TH<br />> import Control.Monad<br />> import Control.Monad.Reader<br />> import Control.Monad.Cont<br />> import IO<br /><br />> infixl 1 #<br /><br /></pre><br />So back to the impossible function: all monads <tt>m</tt> come equipped with a function of type <tt>a -> m a</tt>. But it's well known that you can't "extract elements back out of the monad" because there is no function of type <tt>m a -> a</tt>. So my goal today will be to write such a function. Clearly the first line of code ought to be:<br /><br /><pre><br />> extract :: Monad m => m a -> a<br /><br /></pre><br />The rest of the post will fill in the details.<br /><br />The type declaration tells us what role it plays syntactically, ie. it tells us how we can write code with <tt>extract</tt> that type checks. But what should the semantics be?<br /><br />For the IO monad an answer is easy to guess. <tt>1 + extract (readLn :: IO Int)</tt> should execute <tt>readLn</tt> by reading n integer from stdin, strip of the <tt>IO</tt> part of the the return value and then add 1 to the result. In fact, Haskell already has a function that does exactly that, <tt>unsafePerformIO</tt>. The goal here is to implement <tt>extract</tt> in a way that works with any monad.<br /><br />What might we expect the value of <tt>1 + extract [1,2,3]</tt> to be? The value of <tt>extract [1,2,3]</tt> surely must be 1, 2 or 3. But which one? And what happens if the list is empty? There really isn't any way of answering this while remaining *purely* functional. But if we were running code on a suitable machine we could fork three threads returning one of 1, 2 and 3 in each thread, and then collecing th results together in a list. In other words, we'd expect the final result to be <tt>[2,3,4]</tt>. This would make <tt>extract</tt> a lot like McCarthy's <a href="http://www.randomhacks.net/articles/2005/10/11/amb-operator">ambiguous operator</a>.<br /><br />So it's clear that we can't interpret <tt>extract</tt> as a pure function. But we could try implementing it on a new abstract machine. But there's another approach we could take. A couple of times I've talked about a function of type <tt>~~a -> a</tt> where <tt>~a</tt> is the type <tt>a -> Void</tt> and <tt>Void</tt> is the type with no elements. This corresponds to double negation elimination in classical logic. The Curry-Howard isomorphism tells us no such function can be implemented in a pure functional language, but we can translate expressions containing references to such a function into expressions that are completely pure. This is the so-called CPS translation. Anyway, I had this idea that we could do something limilar with monads so that we could translate expressions containing <tt>extract</tt> into ordinary Haskell code. Turns out there's already a paper on how to do this: <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.8213">Representing Monads</a> by Andrzej Filinski.<br /><br />To translate all of Haskell this way would be a messy business. But just for fun I thought I'd implement a simpler translation for a small subset of Haskell. It's simply this:<br /><br />For a choice of monad m denote the translation of both types and values by T. So x::a becomes T(x)::T(a). The translation is simply:<br /><br /><blockquote><br />T(a) = m a on types<br />T(f x) = T(f) `ap` T(x)<br />T(extract x) = join T(x)<br /></blockquote><br /><br />The important thing is that <tt>extract</tt> of type <tt>m a -> a</tt> is replaced by <tt>join</tt> of type <tt>m (m a) -> m a</tt>.<br /><br />A great way to translate Haskell code is with Template Haskell. So here's some code:<br /><br /><pre><br />> (#) x y = liftM2 AppE x y<br /><br />> rewrite :: Exp -> ExpQ<br /><br />> rewrite (AppE f x) = do<br />> e <- [| extract |]<br />> if f==e<br />> then [| join |] # rewrite x<br />> else [| ap |] # rewrite f # rewrite x<br /><br /></pre><br />Most of the rest is support for some forms of syntactic sugar. First the infix operators:<br /><br /><pre><br />> rewrite (InfixE (Just x) f Nothing) =<br />> [| fmap |] # return f # rewrite x<br /><br />> rewrite (InfixE (Just x) f (Just y)) =<br />> [| liftM2 |] # return f # rewrite x # rewrite y<br /><br />> rewrite (InfixE Nothing f (Just y)) =<br />> [| fmap |] # ([| flip |] # return f) # rewrite y<br /><br /></pre><br />And list operations. For example <tt>[a,b,c]</tt> is sugar for <tt>a : b : c : []</tt>.<br /><br /><pre><br />> rewrite (ListE []) = [| return [] |]<br /><br />> rewrite (ListE (x:xs)) =<br />> [| liftM2 |] # [| (:) |] # rewrite x # rewrite (ListE xs)<br /><br />> rewrite x =<br />> [| return |] # return x<br /><br />> test :: ExpQ -> ExpQ<br />> test = (>>= rewrite)<br /><br /></pre><br /><tt>extract</tt> itself is just a placeholder that is supposd to be translated away:<br /><br /><pre><br />> extract = undefined<br /><br /></pre><br />If the above is placed in a file called Test.lhs then you can try compiling the following code.<br /><br /><pre><br />{-# LANGUAGE TemplateHaskell #-}<br /><br />import Test<br />import Control.Monad<br />import Control.Monad.Reader<br />import Control.Monad.Cont<br /><br />ex1 = $(test [|<br /> extract [1,2] + extract [10,20]<br /> |])<br />ex2 = runReader $(test [|<br /> extract ask + 7<br /> |]) 10<br />ex3 = $(test [|<br /> 1+extract (readLn :: IO Int)<br /> |] )<br /></pre><br /><br />The big omission is the lack of translation for lambda abstractions. I think that to get this right might requires translating all of the code using -|extract|- from the ground up, not just isolated expressions like those above. And like with CPS, you lose referential transparency and the order in which expressions are evaluated makes a difference.<br /><br />Anyway, this is a partial answer to the question posed <a href="http://www.haskell.org/pipermail/haskell-cafe/2006-December/020295.html">here</a> on "automonadization".<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-4935385668423688429?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com5tag:blogger.com,1999:blog-11295132.post-40132212495022175502008-12-24T13:19:00.001-08:002008-12-24T17:01:20.986-08:00The Mother of all MonadsSuppose someone stole all the monads but one, which monad would you want it to be? If you're a Haskell programmer you wouldn't be too bothered, you could just roll your own monads using nothing more than functions.<br /><br />But suppose someone stole do-notation leaving you with a version that only supported one type of monad. Which one would you choose? Rolling your own Haskell syntax is hard so you really want to choose wisely. Is there a universal monad that encompasses the functionality of all other monads?<br /><br />I often find I learn more computer science by trying to decode random isolated sentences than from reading entire papers. About a year ago I must have skimmed this <a href="http://sneezy.cs.nott.ac.uk/fplunch/weblog/?m=200712">post</a> because the line "the continuation monad is in some sense the mother of all monads" became stuck in my head. So maybe <tt>Cont</tt> is the monad we should choose. This post is my investigation of why exactly it's the best choice. Along the way I'll also try to give some insight into how you can make practical use the continuation monad. I'm deliberately going to avoid discussing the underlying mechanism that makes continuations work.<br /><br />So let's start with this simple piece of code<br /><br /><pre><br />> import Control.Monad.Cont<br /><br />> ex1 = do<br />> a <- return 1<br />> b <- return 10<br />> return $ a+b<br /><br /></pre><br />I haven't specified the monad but in almost every case we'd expect the result to have something to do with the number 11. For the list monad we get <tt>[11]</tt>, for the <tt>Maybe</tt> monad we get <tt>Just 11</tt> and so on. For the <tt>Cont</tt> monad we get something that takes a function, and applies it to 11. Here's an example of its use:<br /><br /><pre><br />> test1 = runCont ex1 show<br /><br /></pre><br /><tt>ex1</tt> is just a function that takes as argument <tt>show</tt> and applies it to 11 to give the string <tt>"11"</tt>. <tt>Cont</tt> and <tt>runCont</tt> are just wrapping and unwrapping functions that we can mostly ignore.<br /><br />We could have done that without continuations. So what exactly does the <tt>Cont</tt> monad give us here? Well let's make a 'hole' in the code above:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SVKnlvBBkiI/AAAAAAAAARQ/ovyCuO3I3u0/s1600-h/hole1.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 270px; height: 196px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SVKnlvBBkiI/AAAAAAAAARQ/ovyCuO3I3u0/s400/hole1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5283469579576775202" /></a><br />Whatever integer we place in the hole, the value of <tt>test1</tt> will be the result of adding one and applying <tt>show</tt>. So we can think of that picture as being a function whose argument we shove in the hole. Now Haskell is a functional programming language so we expect that we can somehow reify that function and get our hands on it. That's exactly what the continuation monad <tt>Cont</tt> does. Let's call the function we're talking about by the name <tt>fred</tt>. How can we get our hands on it? It's with this piece code:<br /><br /><pre><br />ex1 = do<br /> a <- return 1<br /> b <- Cont (\fred -> ...)<br /> return $ a+b<br /></pre><br /><br />The <tt>...</tt> is a context in which <tt>fred</tt> represents "the entire surrounding computation". Such a computaton is known as a "continuation". It's a bit hard to get your head around but the <tt>Cont</tt> monad allows you to write subexpressions that are able to "capture" the entirety of the code around them, as far as the function provided to <tt>runCont</tt>. To show that this is the case let's apply <tt>fred</tt> to the number 10:<br /><br /><pre><br />> ex2 = do<br />> a <- return 1<br />> b <- Cont (\fred -> fred 10)<br />> return $ a+b<br /><br />> test2 = runCont ex2 show<br /><br /></pre><br />The entire computation is applied to 10 and we get <tt>"11"</tt>. Now you know what <tt>return</tt> does in this monad. But that's a convoluted way of doing things. What other advantages do we get? Well the expression for <tt>b</tt> can do whatever it wants with <tt>fred</tt> as long as it returns the same type, ie. a string. So we can write this:<br /><br /><pre><br />> ex3 = do<br />> a <- return 1<br />> b <- Cont (\fred -> "escape")<br />> return $ a+b<br /><br />> test3 = runCont ex3 show<br /><br /></pre><br /><tt>fred</tt> is completely ignored. The entire computation is thrown away and instead of applying <tt>show</tt> to a number, we simply return <tt>"escape"</tt>. In other words, we have a mechanism for throwing values out of a computation. So continuations provide, among other things, an exception handling mechanism. But that's curious, because that's exactly what the <tt>Maybe</tt> monad provides. It looks like we might be able to simulate <tt>Maybe</tt> this way. But rather than do that, let's do something even more radical.<br /><br /><pre><br />> ex4 = do<br />> a <- return 1<br />> b <- Cont (\fred -> fred 10 ++ fred 20)<br />> return $ a+b<br /><br />> test4 = runCont ex4 show<br /><br /></pre><br />We've used <tt>fred</tt> twice. We've made the code around our "hole" run twice, each time executing with a different starting value. Continuations allow mere subexpressions to take complete control of the expressions within which they lie. That should remind you of something. It's just like the list monad. The above code is a lot like<br /><br /><pre><br />> test5 = do<br />> a <- return 1<br />> b <- [10,20]<br />> return $ a+b<br /><br /></pre><br />So can we emulate the list monad? Well instead of converting our integer to a string at the end we want to convert it to a list. So this will work:<br /><br /><pre><br />> ex6 = do<br />> a <- return 1<br />> b <- Cont (\fred -> fred 10 ++ fred 20)<br />> return $ a+b<br /><br />> test6 = runCont ex6 (\x -> [x])<br /><br /></pre><br />We can avoid those <tt>++</tt> operators by using <tt>concat</tt>:<br /><br /><pre><br />> ex7 = do<br />> a <- return 1<br />> b <- Cont (\fred -> concat [fred 10,fred 20])<br />> return $ a+b<br /><br />> test7 = runCont ex7 (\x -> [x])<br /><br /></pre><br />But now you may notice we can remove almost every depepndence on the list type to get:<br /><br /><pre><br />> ex8 = do<br />> a <- return 1<br />> b <- Cont (\fred -> [10,20] >>= fred)<br />> return $ a+b<br /><br />> test8 = runCont ex8 return<br /><br /></pre><br />Note, we're using monad related functions, but when we do so we're not using do-notation. We can now do one last thing to tidy this up:<br /><br /><pre><br />> i x = Cont (\fred -> x >>= fred)<br />> run m = runCont m return<br /><br /></pre><br />And now we have something close to do-notation for the list monad at our disposal again:<br /><br /><pre><br />> test9 = run $ do<br />> a <- i [1,2]<br />> b <- i [10,20]<br />> return $ a+b<br /><br /></pre><br />I hope you can see how this works. <tt>i x</tt> says that the continuation should be applied to <tt>x</tt>, not as an ordinary function, but with <tt>>>=</tt>. But that's just business as usual for monads. So the above should work for any monad.<br /><br /><pre><br />> test10 = run $ do<br />> i $ print "What is your name?"<br />> name <- i getLine<br />> i $ print $ "Merry Xmas " ++ name<br /><br /></pre><br />The Grinch has been foiled and we see that the continuation monad really is the mother of all monads.<br /><br />There are some interesting consequences of this beyond Haskell. Many languages with support for continuations should be extensible to support monads. In particular, if there is an elegant notation for continuations, there should be one for monads too. This is why I didn't want to talk about the underlying mechanism of the <tt>Cont</tt> monad. Different languages can implement continuations in different ways. An extreme example is (non-portable) C where you can reify continuations by literally flushing out all registers to memory and grabbing the stack. In fact, I've used this to implement something like the list monad for searching in C. (Just for fun, not for real work.) Scheme has <tt>call-with-current-continuation</tt> which can be used similarly. And even Python's <tt>yield</tt> does something a little like reifying a continuation and might be usable this way. (Is that's what's going on <a href="http://www.valuedlessons.com/2008/01/monads-in-python-with-nice-syntax.html">here</a>? I haven't read that yet.).<br /><br />This post was also inspired by <a href="http://www.diku.dk/~andrzej/papers/RM-abstract.html">this paper</a> by Filinski. I haven't followed the details yet (it's tricky) but the gist is similar. I was actually looking at Filinski's paper because of something I'll mention in my next post.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-4013221249502217550?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com15tag:blogger.com,1999:blog-11295132.post-73735952302604093062008-11-29T16:44:00.000-08:002008-11-30T08:55:06.120-08:00An Approach to Algorithm ParallelisationThe other day I came across the paper <a href="http://portal.acm.org/citation.cfm?id=773473.178255">Parallelizing Complex Scans and Reductions</a> lying on a colleague's desk. The first part of the paper discussed how to make a certain family of algorithms run faster on parallel machines and the second half of the paper went on to show how, with some work, the method could be stretched to a wider class of algorithm. What the authors seemed to miss was that the extra work really wasn't necessary and the methods of the first half apply, with no change, to the second half. But don't take this as a criticism! I learnt a whole new way to approach algorithm design, and the trick to making the second half easy uses methods that have become popular in more recent years. Doing a web search I found lots of papers describing something similar to what I did.<br /><br />This is also a nice example of how the notion of abstraction in computing and the notion of abstraction in mathematics are exactly the same thing. But I'm getting ahead of myself.<br /><br />So here a direct translation from he paper of some procedural code to find the largest sum that can be made from a subsequence of a sequence. This will be our first implementation of the problem examined in the second half of the paper:<br /><br /><pre><br />> f1 [] (sofar,max) = (sofar,max)<br />> f1 (b:bs) (sofar,max) =<br />> let sofar' = if sofar+b<0<br />> then 0<br />> else sofar+b<br />> max' = if max<sofar'<br />> then sofar'<br />> else max<br />> in f1 bs (sofar',max')<br /><br /></pre><br /><tt>sofar</tt> is a running sum that is reset each time it dips below zero, and <tt>max</tt> keeps track of the best sum so far. We initialise <tt>sofar</tt> and <tt>sum</tt> with <tt>0</tt> and <tt>-infinity</tt>. Here's an example of how to ue it:<br /><br /><pre><br />> b :: [Double]<br />> b = [1..5] ++ [5,4..(-10)] ++ [(-2)..6]<br /><br />> infinity :: Double<br />> infinity = 1/0<br />> test1 b = snd $ f1 b (0,-infinity)<br /><br /></pre><br />Notice how we prime the algorithm with a starting vector. The <tt>0</tt> corresponds to the fact that at the start we've summed over 0 elements and the <tt>-infinity</tt> corresponds to the fact that we want the first sum computed to be the highest so far at that point.<br /><br />Test the code with <tt>test1 b</tt>. We'll use a similar pattern all the way through this code.<br /><br />You may see the problem with making this parallelisable: we are maintaining running sums so that the final values of <tt>sofar</tt> and <tt>max</tt> all depend on what was computed earlier. It's not obvious that we can break this up into pieces. <tt>sofar</tt> computes sums of subsequences between resets but chopping the array <tt>b</tt> into pieces might split such subsequences between processors. How can we handle this cleanly?<br /><br />The first step is to write version two of the above function using <tt>max</tt> instead of conditionals:<br /><br /><pre><br />> f2 [] (sofar,max) = (sofar,max)<br />> f2 (b:bs) (sofar,max) =<br />> let sofar' = Prelude.max (sofar+b) 0<br />> max' = Prelude.max max sofar'<br />> in f2 bs (sofar',max')<br /><br />> test2 b = snd $ f2 b (0,-infinity)<br /><br /></pre><br />But that doesn't appear to make things any easier, we've just buried the conditionals inside <tt>max</tt>, it doesn't make the serial dependency go away.<br /><br />So let's solve another problem instead. In <tt>f2</tt> I'll replace <tt>max</tt> with addition and addition with multiplication. <tt>0</tt> is the identity for addition so we should replace it with the identity for multiplication, <tt>1</tt>. Similarly, <tt>-infinity</tt> is the identity for <tt>max</tt> so we should replace that with <tt>0</tt>. We get:<br /><br /><pre><br />> f3 [] (sofar,max) = (sofar,max)<br />> f3 (b:bs) (sofar,max) =<br />> let sofar' = sofar*b+1<br />> max' = max+sofar'<br />> in f3 bs (sofar',max')<br /><br />> test3 b = snd $ f3 b (1,0)<br /><br /></pre><br />That's all very well but (1) it looks no more parallelisable and (2) it's solving the wrong problem. Let's ignore problem (2) for now.<br /><br />The thing that makes <tt>f3</tt> easier to work with is that it's almost a linear function acting on the vector <tt>(sofar,max)</tt>. Linear functions have one very nice property. If f and g are linear then we can compute f(g(x)) by acting with g first, and then applying f. But we can also compose f and g without reference to x giving us another linear function. We only have to know how f and g act on basis elements and we can immediately compute how <tt>f . g</tt> acts on basis elements. This is usually expressed by writing f and g as matrices. So let's tweak <tt>f3</tt> so it's linear in its last argument:<br /><br /><pre><br />> f4 [] (sofar,max,i) = (sofar,max,i)<br />> f4 (b:bs) (sofar,max,i) =<br />> let sofar' = (sofar * b) + i<br />> max' = max + sofar'<br />> i' = i<br />> in f4 bs (sofar',max',i')<br /><br />> test4 b = let (_,max,_) = f4 b (1,0,1) in max<br /><br /></pre><br />So now I need to write some code to work with linear functions. I'll do it in a very direct style. Here are some tuples representing basis vectors. (I could have written some fancy vector/matrix code but I don't want to distract from the problem in hand.)<br /><br /><pre><br />> x,y,z :: Num a => (a,a,a)<br />> x = (1,0,0)<br />> y = (0,1,0)<br />> z = (0,0,1)<br /><br /></pre><br />And here's some code that computes how a function acts on a basis, in effect finding the matrix for our function with respect to the basis <tt>x,y,z</tt>:<br /><br /><pre><br />> matrix f = (f x,f y,f z)<br /><br /></pre><br />Some simple operations on vectors:<br /><br /><pre><br />> (a,b,c) .+ (d,e,f) = (a + d,b + e,c + f)<br />> a .* (b,c,d) = (a * b,a * c,a * d)<br /><br /></pre><br />And now a little function that, given how <tt>f</tt> acts on basis elements, can apply <tt>f</tt> to any vector: <br /><br /><pre><br />> apply (mx,my,mz) (sofar,max,i) = (sofar .* mx) .+ (max .* my) .+ (i .* mz)<br /><br /></pre><br />Now we can redo the calculation with <tt>f4</tt> by first making the matrix for <tt>f4</tt>, and then applying that to our starting vector.<br /><br /><pre><br />> test5 b = let m = matrix (f4 b)<br />> (_,max,_) = apply m (1,0,1)<br />> in max<br /><br /></pre><br />Note how by time we've computed <tt>m</tt> we've done almost all of the work even though the code hasn't yet touched <tt>(1,0,1)</tt>.<br /><br />But now comes the first bit of magic. We can split our list <tt>b</tt> into pieces. Compute the corresponding matrix for each piece on a separate processor, and then apply the resulting matrices to our starting vector.<br /><br />Let's chop our list of reals into pieces of size <tt>n</tt>:<br /><br /><pre><br />> chop n [] = []<br />> chop n l = let (a,b) = splitAt n l in a : chop n b<br /><br /></pre><br />We'll use pieces of size 10:<br /><br /><pre><br />> test6 b = max where<br />> (_,max,_) = foldr ($) (1,0,1) (reverse b_functions) where<br />> b_pieces = chop 10 b<br /><br /></pre><br />The following <tt>map</tt>s should be replaced with a parallel version. It's easy to do this.<br /><br /><pre><br />> b_matrices = map (matrix . f4) b_pieces<br />> b_functions = map apply b_matrices<br /><br /></pre><br />Great, we've successfully parallelised the code, but it's the wrong algorithm. How can we use this to solve the correct problem? Remember how we replaced <tt>max</tt> with addition and addition with multiplication. We just have to undo that. That's all! Everything required to prove that the above parallelisation is valid applies over any <a href="http://en.wikipedia.org/wiki/Semiring">semiring</a>. At no point did we divide or subtract, and we only used elementary properties of numbers like a*(b+c) = a*b+a*c. That property holds for <tt>max</tt> and addition. In fact a+max(b,c) = max(a+b,a+c). We don't even have to modify the code. We can just define the max-plus semiring as follows:<br /><br /><pre><br />> newtype MaxPlus = M Double deriving (Eq,Show)<br /><br />> instance Num MaxPlus where<br />> fromInteger 0 = M (-infinity)<br />> fromInteger 1 = M 0<br />> fromInteger _ = error "no conversion from general integer"<br />> M a + M b = M (max a b)<br />> M a * M b = M (a+b)<br />> abs _ = error "no abs"<br />> signum _ = error "no signum"<br />> negate _ = error "no negate"<br /><br /></pre><br />And now all we need is<br /><br /><pre><br />> test7 b = test6 (fmap M b)<br /><br /></pre><br />(I wonder if ghc is smart enough to completely eliminate that <tt>fmap M</tt>, after all, on a newtype, <tt>M</tt> should do zero work.)<br /><br />And that's a completely parallelised version of the original algorithm.<br /><br />There is a ton of optimisation that can be performed here. In particular, <tt>matrix</tt> applies a function to a fixed basis. For the particular function we're using here there's a big win from constant folding. The same constant folding applies in any semiring.<br /><br />And back to the point I made at the beginning. By abstracting from the reals to a general semiring we are able to make the same code perform multiple duties: it can work on functions linear over many semirings, not just the reals. Mathematicians don't work with abstractions just to make life hell for students - they do so because working with abstract entities allows the same words to be reused in a valid way in many different contexts. This benefits both mathematicians and computer scientists.<br /><br />Here's a <a href="http://www.google.com/search?q=parallel+prefix">link</a> you can use to find out more on this technique.<br /><br />Note that in real world usage you wouldn't use lists. -|chop|- would take longer than the rest of the algorithm.<br /><br />PS A curious aside. I spent ages trying to get ghc to compile this code and getting my homebrew HTML markup code to work reliably on it. But eventually I located the problem. I've just returned from Hawai`i where I wrote most of this code. Just for fun I'd put my keyboard into Hawai`ian mode and forgot about it. When I did that, the single quote key started generating the unicode symbol for the Hawai`ian glottal stop. It looks just like a regular single quote in my current terminal font so it was hard to see anything wrong with the code. But, quite reasonably, ghc and many other tools barf if you try to use one where a regular quote is expected.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-7373595230260409306?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com6tag:blogger.com,1999:blog-11295132.post-40548542309035215712008-11-15T12:52:00.000-08:002008-11-15T16:09:46.397-08:00Some thoughts on reasoning and monadsJust a short note about some half-formed thoughts on the subject of monads and reasoning.<br /><br />Haskell types and total functions, at least for a suitable subset of Haskell, form a category, with types and functions as the objects and arrows. Haskell code is usually full of expressions whose types don't correspond to arrows (e.g.. <tt>1::Int</tt>). But in category theory we can only really talk of objects and arrows, not elements of objects. Many categories are made up of objects that have no reasonable notion of an element. By writing code in pointfree style we can eliminate reference to individual elements and talk only about arrows.<br /><br />For example, suppose <tt>f :: A -> B</tt> and <tt>g :: B -> C</tt>. Then we can define <tt>h :: A -> C</tt> in a couple of different ways. Pointfree:<br /><pre><br />h = g . f<br /></pre><br />or pointful:<br /><pre><br />h x = let y = f x<br /> z = g y<br /> in z<br /></pre><br />For examples as simple as this it's not hard to write code that will translate between one style and the other. But the former definition has an advantage, it works in any category. We'll consider a very simple example. Let R be the category of real numbers, where there is an arrow from x to y if x≤y. We can think of f and g as arrows in this category. If we know A≤B and B≤C then the first definition above tells us how to construct an arrow from A to C and hence it proves that A≤C. The second definition makes no sense because it relies on the notion of x as an element of A and uses the functions f and g to construct elements of B and C. These words hold no meaning in the category R where the arrows aren't functions and the objects aren't containers of elements.<br /><br />Except that's not quite true.<br /><br />Because there is a scheme for translating the second pointful definition to pointfree form, the second definition does in fact provide a proof that A≤C. We just have to bear in mind that the proof needs to be translated into pointfree form first. In fact, we can happily spend our day using pointful style to generate proofs about R, as long as at the end of the day we translate our proofs to pointfree notation. In fact, Haskell programmers know that it's often much easier to write programs in pointful style so it seems reasonable to guess that there are many proofs that are easier to write in pointful style even though they can't be interpreted literally. Philosophically this is a bit weird. As long as we restrict ourselves to chains of reasoning that can be translated, we can use intuitions about elements of objects to make valid deductions about domains where these notions make no sense.<br /><br />Part I of <a href="http://books.google.com.vc/books?id=6PY_emBeGjUC">Lambek and Scott</a> is about the correct pointful language to use when talking about cartesian closed categories (CCCs). They use a form of typed lambda calculus. Every arrow in a cartesian category can be written as a pointful lambda expression. Even though there isn't a meaningful way to assign meaning to the 'points' individually, every lambda abstraction gives an arrow in a cartesian closed category that can be built using the standard parts that come with a CCC, and vice versa. The pointful definition of <tt>h</tt> above is an example. Here's a (very) slightly less trivial example. Given <tt>f :: A -> C</tt> and <tt>g :: B -> D</tt> we can define <tt>h :: (A,C) -> Either B D</tt> as<br /><pre><br />f x = let y = fst x<br /> z = Left y<br /> in z<br /></pre><br />Again, that code can be translated into pointfree form using only the standard parts available in a cartesian closed category. Actually, we only need a category with products and coproducts for this example. So consider a lattice. The above code can be translated into a proof that A∩C≤B∪D. (I'm using cap and cup for join and meet.) Again, it makes no sense to talk of elements of a general lattice as containers of elements, and yet after translation to pointfree notation we have a valid proof. If you can restrict yourself to a suitable subset of set theory then your proofs involving elements of objects can carry over to many other categories.<br /><br />Part II of Lambek and Scott is about taking this idea to extremes. It's about categories known as toposes. A topos is a category that is even more like the category of sets than a CCC. It's still general enough that there are many kinds of toposes, but you can use a sizable portion of first order logic and ZF to make deductions about them. Again, the literal notion of membership of the objects of a topos might make no sense, but the proofs have a translation to pointfree notation. In fact, it's possible to write entire papers in what looks like conventional set theory language, and have them be valid for other toposes. <a href="http://home.imf.au.dk/kock/">Anders Kock</a>, for example, writes such papers. Chris Isham has been arguing that topos theory is the correct framework for physics. If you interpret your propositions as being in the category Set then you get classical physics. But those same propositions can be interpreted in other categories, such as one for quantum mechanics, giving a way to use and extend classical language to reason about quantum systems. This set theory-like language is known as the "internal language" of a topos.<br /><br />Anyway, I'm interested in the notion that Haskell do-notation provides another kind of pointful language that can be used to reason about situations where points don't seem at first to make sense. Consider the lattice of subsets of a topological space, ordered by inclusion. Let Cl(U) be the closure of U. Cl satisfies these properties:<br /><blockquote><br />X⊂Y implies Cl(X)⊂Cl(Y)<br />X⊂Cl(X)<br />and<br />Cl(CL(X))⊂Cl(X)<br /></blockquote><br />Look familiar? If we make this lattice into a category, with arrows being inclusions, then the first property states that Cl is a functor and the next two say that Cl is a monad. In fact, monads are a kind of generalised closure. So now suppose we're given A⊂Cl(B) and B⊂Cl(C) and wish to prove that A⊂Cl(C). We can rephrase this by saying that if <tt>f :: A -> Cl B</tt> and <tt>g :: B -> Cl C</tt> we need an arrow <tt>h :: A -> Cl C</tt>. We can write one like this:<br /><pre><br />h x = do<br /> y <- f x<br /> z <- g y<br /> return z<br /></pre><br />Now it's tempting to interpret the inclusions as functions with <tt>y <- f x</tt> saying that <tt>y</tt> is the image of <tt>x</tt> under the inclusion. (I don’t know about you, but when I write <tt>x <- getChar</tt> I think to myself “<tt>x</tt> is the return value from calling <tt>getChar</tt>”.) But that doesn't really work because the type of <tt>f x</tt> is <tt>Cl B</tt> but <tt>y</tt> is of type <tt>B</tt>. On the other hand, we can radically reinterpret the above as something like this: when arguing about chains of inclusions of subsets of a topological space, as long as at the end of the chain of inclusions you always end up in the closure of some subset, you're allowed to cheat and nudge a generic point in the closure of a subset back into the original subset. This is exactly parallel to the way do-notation seems to allow us to extract elements out of monadic objects as long as at the end of the do-block we always return an element of monadic type. I'm sure that with a bit of work we could produce a rigorous metatheorem from this. I also expect we can also produce something similar for comonads.<br /><br />Anyway, the moral is that when working with categories with monads there are may be some interesting and unusual ways to reason. The example of the lattice of subsets is fairly trivial but I'm sure there are other interesting examples. I also expect there's a nice connection with modal logic. I now think of Haskell do-notation as the "internal language" of a category with monads.<br /><br />Update: I left out the crucial sentences I meant to write. It's easy to see do-notation as a kind of Haskell specific trick for making things like IO heavy code look like traditional procedural code. But comparison with the theory in Lambek and Scott, and Topos Theory in general, makes it clear that do-notation is a member of a family of related languages.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-4054854230903521571?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com9tag:blogger.com,1999:blog-11295132.post-19829488059082139642008-11-08T10:13:00.000-08:002008-11-09T07:43:23.598-08:00From Monoids to Monads<H3>Generalising Monoids</H3><br /><br />The word 'monad' is derived from the word 'monoid'. The explanation usually given is that there is an analogy between monoids and monads. On the surface, this seems a bit unlikely. The <tt>join</tt> operation in a monad is supposed to correspond to the binary operator in the monoid, but <tt>join</tt> is a completely different kind of thing, certainly not a binary operator in any usual sense.<br /><br />I'm going to make this analogy precise so that it's clear that both monoids and monads are examples of the same construction. In fact, I'm going to write some Haskell code to define monoids and monads in almost exactly the same way. I was surprised to find I could do this because instances of Haskell's <tt>Monoid</tt> and <tt>Monad</tt> aren't even the same kind of thing (where I'm using 'kind' in its <a href="http://www.haskell.org/onlinereport/decls.html">technical</a> sense). But it can be done.<br /><br />So let's start thinking about monoids. They are traditionally sets equipped with a special element and a binary operator so that the special element acts as an identity for the binary operator, and where the binary operator is associative. We expect type signatures something like <tt>one :: m</tt> and <tt>mult :: m -> m -> m</tt> so that, for example, <tt>m (m a b) c == m a (m b c)</tt>. That's fine as it stands, but it doesn't generalise easily. In particular it'd be nice to generalise this definition to other categories. To do that we need to rephrase the definitions so that they are completely <a href="http://www.haskell.org/haskellwiki/Pointfree">point-free</a> and are written purely as the compositions of arrows.<br /><br />Let's start by thinking about the rule that says multiplying by the identity on the left should leave a value unchanged. Ie. we want <tt>mult one x == x</tt> for all <tt>x</tt>. We already have a problem, it refers to a couple of 'points', both <tt>one</tt>, the identity, and <tt>x</tt> the unknown. We can deal with the first one easily, we just replace <tt>one</tt> with an arrow <tt>() -> m</tt>. But we also need some plumbing to provide two arguments to the <tt>mult</tt> function. Rather than belabour the point, I'll just give the full code:<br /><br /><pre><br />> {-# OPTIONS_GHC -fglasgow-exts #-}<br /><br />> import Control.Monad<br />> import Test.QuickCheck<br /><br />> class Monoid m where<br />> one :: () -> m<br />> mult :: (m,m) -> m<br /><br /></pre><br />The law for multiplication on the left is then given by <tt>law1_left == law1_middle</tt> and so on:<br /><br /><pre><br />> law1_left,law1_middle,law1_right :: m -> m<br />> law1_left = mult . (one <#> id) . lambda<br />> law1_middle = id<br />> law1_right = mult . (id <#> one) . rho<br /><br /></pre><br />The associativity law is then given by <tt>law2_left x = law2_right x</tt>.<br /><br /><pre><br />> law2_left,law2_right :: ((m,m),m) -> m<br />> law2_left = mult . (mult <#> id)<br />> law2_right = mult . (id <#> mult) . alpha<br /><br /></pre><br />The left and right hand sides of the laws are now point-free. But in order to do this I've had to write some auxiliary functions:<br /><br /><pre><br />> lambda :: a -> ((),a)<br />> lambda x = ((),x)<br /><br />> rho :: a -> (a,())<br />> rho x = (x,())<br /><br />> alpha :: ((a,b),c) -> (a,(b,c))<br />> alpha ((x,y),z) = (x,(y,z))<br /><br /></pre><br />I've also used the fact that <tt>(,)</tt> is a bifunctor, ie. it's a functor in each of its arguments so <tt>(,)</tt> doesn't just give a way to generate a new type <tt>(a,b)</tt> from types <tt>a</tt> and <tt>b</tt>. It also combines pairs of arrows to make new arrows. I'll call the part of <tt>(,)</tt> that acts on arrows by the name <tt><#></tt>:<br /><br /><pre><br />> (<#>) :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)<br />> (f <#> g) (x,y) = (f x,g y)<br /><br /></pre><br />Intuitively, <tt>f <#> g</tt> maps <tt>f</tt> on the left and <tt>g</tt> on the right of <tt>(a,b)</tt><br /><br />Try unpacking those definitions to see that we get the usual monoid laws. For example<br /><br /><pre><br />law2_left ((x,y),z) == mult $ (mult <#> id) ((x,y),z) == mult (mult (x,y),z)<br />law2_right ((x,y),z) == mult $ (id <#> mult) $ alpha ((x,y),z) == mult (x,mult (y,z))<br /></pre><br /><br />So we get<br /><br /><pre><br />mult (mult (x,y),z) == mult (x,mult (y,z))<br /></pre><br /><br />the usual associativity law.<br /><br />Now that this definition is point-free it seems we could just carry it over to any category. In fact, we've implicitly done this because we've carried over the definition of a monoid from Set to Hask, the category of Haskell types and total functions. We're so used to treating Hask as a proxy for Set we can forget they are actually different categories. But this definition of a monoid works in both. But what about that <tt>lambda</tt>, <tt>rho</tt> and <tt>alpha</tt>? Well they're easy to define in any Cartesian closed category (CCC). But we don't need all of the features of a CCC to define a monoid, we just need <tt>lambda</tt>, <tt>rho</tt> and <tt>alpha</tt> and some kind of 'product' on the set of objects that also acts like a bifunctor. In fact, there's a bunch of 'obvious' laws that these functions satisfy in a CCC. Any category with these functions satisfying these same laws is called a monoidal category. The above definitions allow us to transfer the definition of a monoid to any such category. For the full definition, see the <a href="http://en.wikipedia.org/wiki/Monoidal_category">wikipedia entry</a>.<br /><br />Anyway, let's check to see if the type <tt>Int</tt> might be a monoid:<br /><br /><pre><br />> instance Monoid Int where<br />> one _ = 1<br />> mult (a,b) = a*b<br /><br />> check1 = quickCheck $ \n -> law1_left n == law1_middle (n :: Int)<br />> check2 = quickCheck $ \n -> law1_left n == law1_right (n :: Int)<br />> check3 = quickCheck $ \n -> law2_left n == (law2_right n :: Int)<br /><br /></pre><br />Of course that's no proof, but it gives us some confidence. (Eg. what about large numbers close to 2<sup>31</sup>...?)<br /><br /><H3>Another Category in the World of Haskell</H3><br /><br />Most people around here will be familiar with how Haskell types and functions form a category Hask in the obvious way. But it's less well known that there is another category lurking in Haskell. Consider the set of all Haskell functors. These are endofunctors, ie. functors Hask→Hask. Between any two functors is the set of natural transformations between those functors. (If <tt>f</tt> and <tt>g</tt> are functors, then the polymorphic functions <tt>f a -> g a</tt> are the <a href="http://sigfpe.blogspot.com/2008/05/you-could-have-defined-natural.html">natural transformations</a>.) In addition, you can compose natural transformations and the composition is associative. In other words, Haskell functors form a category. (See the appendix for more abstract nonsense relating to this.)<br /><br />We'll call the category of endofunctors on Hask by the name E. Functors can be composed like so:<br /><br /><pre><br />> type (f :<*> g) x = f (g x)<br /><br /></pre><br />It'd be cool if this was a product in the usual categorical sense, but it isn't. There isn't a natural way to map to both <tt>f a</tt> and <tt>g a</tt> from <tt>f (g a)</tt> with the universal property of <a href="http://en.wikipedia.org/wiki/Categorical_product">products</a>. Instead it's a weaker type of product which is still a bifunctor. Here's how it acts on arrows (remembering that in E, arrows are natural transformations):<br /><br /><pre><br />> (<*>) f g = f. fmap g<br /><br /></pre><br />(We could also have used <tt>fmap g . f</tt>.) If you think of <tt>f <#> g</tt> as making <tt>f</tt> act on the left and <tt>g</tt> act on the right, then you can think of <tt>f <*> g</tt> as making <tt>f</tt> and <tt>g</tt> act on the outer and inner containers of a container of containers.<br /><br />Here's the identity element for this product, the identity functor. It plays a role similar to <tt>()</tt> in Hask:<br /><br /><pre><br />> data Id x = Id x deriving Show<br />> instance Functor Id where<br />> fmap f (Id x) = Id (f x)<br /><br /></pre><br />We can now define some familiar looking natural transformations:<br /><br /><pre><br />> lambda' :: Functor f => f a -> (Id :<*> f) a<br />> lambda' x = Id x<br /><br />> rho' :: Functor f => f a -> (f :<*> Id) a<br />> rho' x = fmap Id x<br /><br />> alpha' :: f (g (h a)) -> f (g (h a))<br />> alpha' = id<br /><br /></pre><br />With these we have turned E into a monoidal category.<br /><br />(OK, this may be confusing. We're 'multiplying' functors and we have associativity and a left- and right-identity. So functors form a monoid (modulo isomorphism). But that's a distraction, these are not the monoids that you're looking for. See the appendix for more on this.)<br /><br />So now we can define monoids in this category using almost the same code as above:<br /><br /><pre><br />> class Functor m => Monoid' m where<br />> one' :: Id a -> m a<br />> mult' :: (m :<*> m) a -> m a<br /><br />> law1_left',law1_middle',law1_right' :: m a -> m a<br />> law1_left' = mult' . (one' <*> id) . lambda'<br />> law1_middle' = id<br />> law1_right' = mult' . (id <*> one') . rho'<br /><br />> law2_left',law2_right' :: ((m :<*> m) :<*> m) a -> m a<br />> law2_left' = mult' . (mult' <*> id)<br />> law2_right' = mult' . (id <*> mult') . alpha'<br /><br /></pre><br />And here's the punch line. That's precisely a monad, laws 'n' all.<br /><br />If you want, you can unpack the definitions above and see that they correspond to the usual notion of a monad. We can write code to do the translation:<br /><br /><pre><br />> data Monad m => TranslateMonad m a = TM { unTM :: m a } deriving (Eq,Show)<br /><br />> translate :: Monad m => m a -> TranslateMonad m a<br />> translate x = TM x<br /><br />> instance (Monad m,Functor m) => Functor (TranslateMonad m) where<br />> fmap f (TM x) = TM (fmap f x)<br /><br />> instance (Functor m,Monad m) => Monoid' (TranslateMonad m) where<br />> one' (Id x) = TM $ return x<br />> mult' (TM x) = TM $ fmap unTM x >>= id<br /><br /></pre><br />In other words, any instance of <tt>Monad</tt> gives an instance of <tt>Monoid'</tt>. I'll let you write the reverse mapping. We can even check (not prove!) the laws are satisfied by a particular monad by using <tt>QuickCheck</tt>:<br /><br /><pre><br />> instance Arbitrary a => Arbitrary (Id a) where<br />> arbitrary = liftM Id arbitrary<br /><br />> instance (Monad m,Eq (m a),Arbitrary (m a)) => Arbitrary (TranslateMonad m a) where<br />> arbitrary = liftM TM arbitrary<br /><br />> check4 = quickCheck $ \n -> law1_left' n == law1_middle' (n :: TranslateMonad [] Int)<br />> check5 = quickCheck $ \n -> law1_left' n == law1_right' (n :: TranslateMonad [] Int)<br />> check6 = quickCheck $ \n -> law2_left' n == (law2_right' n :: TranslateMonad [] Int)<br /><br /></pre><br />I don't know about you, but I find this mind-blowing. We took a definition of a simple concept, the monoid. We then abstractificated its definition so that it applied to any monoidal category, and in doing so we completely changed the meaning of <tt>one</tt> and <tt>mult</tt>. And yet the resulting object, including its laws, are precisely what you need to solve a whole host of problems in algebra and Haskell programming. This is an amazing example of the <a href="http://www.ccrnp.ncifcrf.gov/~toms/Hamming.unreasonable.html">unreasonable effectiveness of mathematics</a>. The concept might be a little tricky to grasp: monads are like ordinary monoids, but with outer and inner replacing left and right. But the payoff from this is that intuitions about monoids carry over to monads. I hope to say more about this in future episodes.<br /><br /><H3>Appendix</H3><br /><br />Consider the category, H, with one object, Hask, and whose arrows are the endofunctors on Hask. We've shown how the arrows on this category aren't just a set, they themselves form a category. This makes H a 2-category. A category with one object is a monoid. But this is a 2-category, so we have a 2-monoid. In fact, there are some extra details required to show we have a 2-category. For example, we need to show that composition of arrows (which now form a category, not a set) is a functor (not just a function). That follows from the <a href="http://sigfpe.blogspot.com/2008/05/interchange-law.html">Interchange Law</a> which I've already talked about. But note that H is a monoid in a completely conventional way, its arrows form a set with a binary operator on them. This is not the structure that corresponds to monads, although it plays a part in constructing them. Also, don't confuse this monoid with the one that appears in a <tt>MonadPlus</tt>.<br /><br /><H3>Confession</H3><br /><br />I'm having trouble giving <tt><*></tt> the correct type signature. I think it should be something like <tt>(forall x.a x -> c x) -> (forall x.b x -> d x) -> (forall x.a (b x) -> c (d x))</tt>. GHC doesn't like it. Can anyone come up with the correct thing? The code still works.<br /><br /><H3>Links</H3><br />Another <a href="http://scienceblogs.com/goodmath/2008/03/meta_out_the_wazoo_monads_and.php">account</a> of the same subject, modulo some details.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-1982948805908213964?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com13tag:blogger.com,1999:blog-11295132.post-53543533936324371472008-10-25T18:45:00.000-07:002008-10-27T09:35:39.031-07:00Operads and their MonadsHardly a day goes by at the <a href="http://golem.ph.utexas.edu/category/">n-Category Cafe</a> without <a href="http://homepages.ulb.ac.be/~fschlenk/Maths/What/operad.pdf">operads</a> being mentioned. So it's time to write some code illustrating them.<br /><br /><pre><br />> {-# LANGUAGE FlexibleInstances #-}<br /><br />> import Data.Monoid<br />> import Control.Monad.Writer<br />> import Control.Monad.State<br /><br /></pre><br />Let's define a simple tree type:<br /><br /><pre><br />> data Tree a = Leaf a | Tree [Tree a] deriving (Eq,Show)<br /><br /></pre><br />Sometimes we want to apply a function to every element of the tree. That's provided by the <tt>fmap</tt> member of the <tt>Functor</tt> type class.<br /><br /><pre><br />> instance Functor Tree where<br />> fmap f (Leaf x) = Leaf (f x)<br />> fmap f (Tree ts) = Tree $ map (fmap f) ts<br /><br /></pre><br />But just as we can't use map to apply monadic functions to a list (we'd write <tt>mapM print [1,2,3]</tt>), we can't use fmap to apply them to our tree. What we need is a monadic version of <tt>fmap</tt>. Here's a suitable type class:<br /><br /><pre><br />> class FunctorM c where<br />> fmapM :: Monad m => (a -> m b) -> c a -> m (c b)<br /><br /></pre><br />(I could have used Data.Traversable but that entails Data.Foldable and that's too much work.)<br /><br />And now we can implement a monadic version of <tt>Tree</tt>'s <tt>Functor</tt> instance:<br /><br /><pre><br />> instance FunctorM Tree where<br />> fmapM f (Leaf x) = do<br />> y <- f x<br />> return (Leaf y)<br />> fmapM f (Tree ts) = do<br />> ts' <- mapM (fmapM f) ts<br />> return (Tree ts')<br /><br /></pre><br />We can use <tt>fmapM</tt> to extract the list of elements of a container type:<br /><br /><pre><br />> serialise a = runWriter $ fmapM putElement a<br />> where putElement x = tell [x]<br /><br /></pre><br />Not only does <tt>serialise</tt> suck out the elements of a container, it also spits out an empty husk in which all of the elements have been replaced by <tt>()</tt>. We can think of the latter as the 'shape' of the original structure with the original elements removed. We can formalise this as<br /><br /><pre><br />> type Shape t = t ()<br /><br /></pre><br />So we have:<br /><br /><pre><br />> serialise :: FunctorM t => t a -> (Shape t,[a])<br /><br /></pre><br />Keeping the shape around allows is to invert <tt>serialise</tt> to give:<br /><br /><pre><br />> deserialise :: FunctorM t => Shape t -> [a] -> (t a, [a])<br />> deserialise t = runState (fmapM getElement t) where<br />> getElement () = do<br />> x:xs <- get<br />> put xs<br />> return x<br /><br /></pre><br />(That's a bit like using the supply monad. This function also returns the leftovers.)<br /><br />We can also write (apologies for the slightly cryptic use of the writer monad):<br /><br /><pre><br />> size :: FunctorM t => t a -> Int<br />> size a = getSum $ execWriter $ fmapM incCounter a<br />> where incCounter _ = tell (Sum 1)<br /><br /></pre><br />Let's try an example. Here's an empty tree:<br /><br /><pre><br />> tree1 = Tree [Tree [Leaf (),Leaf ()],Leaf ()]<br /><br /></pre><br />We can pack a bunch of integers into it:<br /><br /><pre><br />> ex1 = fst $ deserialise tree1 [1,2,3]<br /><br /></pre><br />And get them back out again:<br /><br /><pre><br />> ex2 = serialise ex1<br /><br /></pre><br /><tt>serialise</tt> separates the shape from the data, something you can read lots more about at <a href="http://www-staff.it.uts.edu.au/~cbj/Publications/shapes.html">Barry Jay</a>'s web site.<br /><br />Remember that <a href="http://sigfpe.blogspot.com/2006/11/variable-substitution-gives.html">trees are also monads</a>:<br /><br /><pre><br />> instance Monad Tree where<br />> return x = Leaf x<br />> t >>= f = join (fmap f t) where<br />> join (Leaf t) = t<br />> join (Tree ts) = Tree (fmap join ts)<br /><br /></pre><br />The <tt>join</tt> operation for a tree grafts the elements of a tree of trees back into the tree.<br /><br />Right, that's enough about trees and shapes for now.<br /><br /><a href="http://en.wikipedia.org/wiki/Operad_theory">Operads</a> are a bit like the plumbing involved in installing a sprinkler system. Suppose you have a piece, A that splits a single pipe into two:<br /><br /><pre><br /> ______<br /> /<br /> / ____<br />____/ /<br /> A /<br />____ \<br /> \ \____<br /> \<br /> \______<br /></pre><br /><br />And you have two more pipes B and C that look like this:<br /><br /><pre><br /> ______<br /> /<br /> / ____<br /> / /<br />____/ \____<br /> <br /> B or C<br />____ ____<br /> \ /<br /> \ \____<br /> \<br /> \______<br /><br /></pre><br /><br />then you can 'compose' them to make a larger system like this:<br /><br /> <br /><br /><pre><br /> ______<br /> /<br /> / ____<br /> / /<br /> _________/ \____<br /> / B <br /> / _______ ____<br /> / / \ /<br /> / / \ \____<br /> / / \<br />____/ / \______<br /> A /<br />____ \ ______<br /> \ \ /<br /> \ \ / ____<br /> \ \ / /<br /> \ \_______/ \____<br /> \ C <br /> \_________ ____<br /> \ /<br /> \ \____<br /> \<br /> \______<br /></pre><br /><br />(Vim rectangular mode FTW!)<br /><br />The important thing to note here is that as A had two outputs (or inputs, depending on your point of view) you can attach two more pieces, like B and C, directly to it.<br /><br />Call the number of outlets the 'degree' of the system. If A has degree n then we can attach n more systems, A<sub>1</sub>...A<sub>n</sub> to it and the resulting system will have degree degree(A<sub>1</sub>)+...+degree(A<sub>n</sub>). We can write the result as A(A<sub>1</sub>,...,A<sub>n</sub>).<br /><br />We also have the 'identity' pipe that looks like this:<br /><br /><pre><br />_____________<br /><br /> identity<br />_____________<br /></pre><br /><br />Formally, an operad is a collection of objects, each of which has a 'degree' that's an integer n, n≥0, depending on your application), an identity element of degree 1, and a composition law:<br /><br /><pre><br />> class Operad a where<br />> degree :: a -> Int<br />> identity :: a<br />> o :: a -> [a] -> a<br /><br /></pre><br /><tt>o</tt> is the composition operation. If f has degree n then we can apply it to a list of n more objects. So we only expect to evaluate <tt>f `o` fs</tt> successfully if <tt>degree f == length fs</tt>.<br /><br />There are many identities we'd expect to hold. For example <tt>f `o` [identities,...,identity] == f</tt>, because adding plain sections of pipe has no effect. We also expect some associativity conditions coming from the fact that it doesn't matter what order we build a pipe assembly in, it'll still function the same way.<br /><br />We can follow this pipe metaphor quite closely to define what I think of as the prototype Operad. A <tt>Fn a</tt> is a function that takes n inputs of type a and returns one of type a. As we can't easily introspect and find out how many arguments such a function expects, we also store the degree of the function with the function:<br /><br /><pre><br />> data Fn a = F { deg::Int, fn::[a] -> a }<br /><br />> instance Show a => Show (Fn a) where<br />> show (F n _) = "<degree " ++ show n ++ " function>"<br /><br /></pre><br /><tt>unconcat</tt> is a kind of inverse to <tt>concat</tt>. You give a list of integers and it chops up a list into pieces with lengths corresponding to your integers. We use this to unpack the arguments to <tt>f `o` gs</tt> into pieces suitable for the elements of <tt>gs</tt> to consume.<br /><br /><pre><br />> unconcat [] [] = []<br />> unconcat (n:ns) xs = take n xs : unconcat ns (drop n xs)<br /><br />> instance Operad (Fn a) where<br />> degree = deg<br />> f `o` gs = let n = sum (map degree gs)<br />> in F n (fn f . zipWith fn gs . unconcat (map degree gs))<br />> identity = F 1 head<br /><br /></pre><br />Now compute an example, f(f<sub>1</sub>,f<sub>2</sub>,f<sub>3</sub>) applied to [1,2,3]. It should give 1+1+2*3=8.<br /><br /><pre><br />> ex3 = fn (f `o` [f1,f2]) [1,2,3] where<br />> f = F 2 (\[x,y] -> x+y)<br />> f1 = F 1 (\[x] -> x+1)<br />> f2 = F 2 (\[x,y] -> x*y)<br /><br /></pre><br />(That's a lot like lambda calculus without names. Operads are a bit like n-ary combinators.)<br /><br />Now I'm going to introduce a different looking operad. Think of <tt>V</tt> as representing schemes for dicing the real line. Here are some examples:<br /><br /><pre><br />|0.25|0.25| 0.5 |<br /><br />|0.1|0.1| 0.8 |<br /></pre><br /><br />If A divides up the real line into n pieces then you could divide up each of the n pieces using their own schemes. This means that dicing schemes compose. So if we define A, B and C as:<br /><br /><pre><br />A = |0.5|0.5|<br /><br />B = |0.75|0.25|<br />C = |0.1|0.1|0.8|<br /></pre><br /><br />Then A(B,C) is:<br /><br /><pre><br />|0.375|0.125|0.05|0.05|0.4|<br /></pre><br /><br />We could implement <tt>V</tt> as a list of real numbers, but it's more fun to generalise to any monoid and not worry about divisions summing to 1:<br /><br /><pre><br />> data V m = V { unV :: [m] } deriving (Eq,Show)<br /><br /></pre><br />This becomes an operad by allowing the monoid value in a 'parent' scheme multiply the values in a 'child'.<br /><br /><pre><br />> instance Monoid m => Operad (V m) where<br />> degree (V ps) = length ps<br />> (V as) `o` bs = V $ op as (map unV bs) where<br />> op [] [] = []<br />> op (a:as) (b:bs) = map (mappend a) b ++ op as bs<br />> identity = V [mempty]<br /><br /></pre><br />For example, if d<sub>1</sub> cuts the real line in half, and d<sub>2</sub> cuts it into thirds, then d<sub>1</sub>(d<sub>1</sub>,d<sub>2</sub>) will cut it into five pieces of lengths 1/4,1/4,1/6,1/6,1/6:<br /><br /><pre><br />> ex4 = d1 `o` [d1,d2] where<br />> d1 = V [Product (1/2),Product (1/2)]<br />> d2 = V [Product (1/3),Product (1/3),Product (1/3)]<br /><br /></pre><br />If the elements in a <tt>V</tt> are non-negative and sum to 1 we can think of them as probability distributions. The composition A(A<sub>1</sub>,...,A<sub>n</sub>) is the distribution of all possible outcomes you can get by selecting a value i in the range {1..n} using distribution A and then selecting a second<br />value conditionally from distribution A<sub>i</sub>. We connect with the recent <a href="http://golem.ph.utexas.edu/category/2008/10/entropy_diversity_and_cardinal.html">n-category post</a> on entropy.<br /><br />In fact we can compute the entropy of a distrbution as follows:<br /><br /><pre><br />> h (V ps) = - (sum $ map (\(Product x) -> xlogx x) ps) where<br />> xlogx 0 = 0<br />> xlogx x = x*log x/log 2<br /><br /></pre><br />We can now look at the 'aside' in that post. From an element of <tt>V</tt> we can produce a function that computes a corresponding linear combination (at least for <tt>Num</tt> types):<br /><br /><pre><br />> linear (V ps) xs = sum $ zipWith (*) (map (\(Product x) -> x) ps) xs<br /><br /></pre><br />We can now compute the entropy of a distribution in two different ways:<br /><br /><pre><br />> (ex5,ex6) = (h (d1 `o` [d1,d2]),h d1 + linear d1 (map h [d1,d2])) where<br />> d1 = V [Product 0.5,Product 0.5]<br />> d2 = V [Product 0.25,Product 0.75]<br /><br /></pre><br />Now according to <a href="http://arxiv.org/pdf/math/0404016v1">this paper on operads</a> we can build a monad from an operad. Here's the construction:<br /><br /><pre><br />> data MonadWrapper op a = M { shape::op, value::[a] } deriving (Eq,Show)<br /><br /></pre><br />(The field names aren't from the paper but they do give away what's actually going on...)<br /><br />The idea is that an element of this construction consists of an element of the operad of degree n, and an n element list. It's a functor in an obvious way:<br /><br /><pre><br />> instance Functor (MonadWrapper o) where<br />> fmap f (M o xs) = M o (map f xs)<br /><br /></pre><br />It's also a <tt>FunctorM</tt>:<br /><br /><pre><br />> instance FunctorM (MonadWrapper o) where<br />> fmapM f (M s c) = do<br />> c' <- mapM f c<br />> return $ M s c'<br /><br /></pre><br />We can make the construction a monad as follows:<br /><br /><pre><br />> instance Operad o => Monad (MonadWrapper o) where<br />> return x = M identity [x]<br />> p >>= f = join (fmap f p) where<br />> join (M p xs) = M (p `o` map shape xs) (concatMap value xs)<br /><br /></pre><br />Now for something to be a monad there are various laws that needs to be satisfied. These follow from the rules (which I haven't explicitly stated) for an operad. When I first looked at that paper I was confused - it seemed that the operad part and the list part didn't interact with each other. And then I suddenly realised what was happening. But hang on for a moment...<br /><br />Tree shapes make nice operads. The composition rule just grafts child trees into the leaves of the parent:<br /><br /><pre><br />> instance Operad (Tree ()) where<br />> degree t = length (snd (serialise t))<br />> identity = Leaf ()<br />> t `o` ts = let (r,[]) = deserialise t ts in r >>= id<br /><br /></pre><br />We can write that more generically so it works with more than trees:<br /><br />> data OperadWrapper m = O { unO::Shape m }<br /> <br /><pre><br />> instance (FunctorM m,Monad m) => Operad (OperadWrapper m) where<br />> degree (O t) = size t<br />> identity = O (return ())<br />> (O t) `o` ts = let (r,[]) = deserialise t (map unO ts) in O (r >>= id)<br /><br /></pre><br />So let's use the construction above to make a monad. But what actually is this monad? Each element is a pair with (1) a tree shape of degree n and (2) an n-element list. In other words, it's just a serialised tree. We can define these isomorphisms to make that clearer:<br /><br /><pre><br />> iso1 :: FunctorM t => t x -> MonadWrapper (t ()) x<br />> iso1 t = uncurry M (serialise t)<br /><br />> iso2 :: FunctorM t => MonadWrapper (t ()) x -> t x<br />> iso2 (M shape contents) = let (tree,[]) = deserialise shape contents in tree<br /><br /></pre><br />So, for example:<br /><br /><pre><br />> ex7 = iso2 (iso1 tree) where<br />> tree = Tree [Tree [Leaf "Birch",Leaf "Oak"],Leaf "Cypress",Leaf "Binary"]<br /><br /></pre><br />That construction won't work for all monads, just those monads that come from operads. I'll leave you to characterise those.<br /><br />And now we have it: a way to think about operads from a computational perspective. They're the shapes of certain monadic serialisable containers. Operadic composition is the just the same grafting operation used in the <tt>join</tt> operation, using values <tt>()</tt> as the graft points.<br /><br />I have a few moments spare so let's actually do something with an operad. First we need the notion of a free operad. This is basically just a set of 'pipe' parts that we can stick together with no equations holding apart from those inherent in the definition of an operad. This is different from the <tt>V</tt> operad where many different ways of apply the <tt>o</tt> operator can result in the same result. We can use any set of parts, as long as we can associate an integer with each part:<br /><br /><pre><br />> class Graded a where<br />> grade :: a -> Int<br /><br /></pre><br />The free operad structure is just a tree:<br /><br /><pre><br />> data FreeOperad a = I | B a [FreeOperad a] deriving Show<br /><br /></pre><br /><tt>I</tt> will be the identity, but it will also serve as a 'terminator' like the way there's always a <tt>[]</tt> at the end of a list.<br /><br />An easy way to make a single part an element of an operad:<br /><br /><pre><br />> b n = let d = grade n in B n (replicate d I)<br /><br /></pre><br />Here's the instance:<br /><br /><pre><br />> instance Graded a => Operad (FreeOperad a) where<br />> degree I = 1<br />> degree (B _ xs) = sum (map degree xs)<br />> identity = I<br />> I `o` [x] = x<br />> B a bs `o` xs = let arities = map degree bs<br />> in B a $ zipWith o bs (unconcat arities xs)<br /><br /></pre><br />Now I'm going to use this to make an operad and then a monad:<br /><br /><pre><br />> instance Graded [a] where<br />> grade = length<br /><br />> type DecisionTree = MonadWrapper (FreeOperad [Float])<br /><br /></pre><br />What we get is a lot like the <a href="http://www.randomhacks.net/articles/2007/02/21/randomly-sampled-distributions">probability monad</a> except it doesn't give the final probabilities. Instead, it gives the actual tree of possibilities. (I must also point out <a href="http://hpaste.org/1818">this hpaste</a> by wli.)<br /><br /><pre><br />> test = do<br />> a <- M (b [Product 0.5,Product 0.5]) [1,2]<br />> b <- M (b [Product (1/3.0),Product (1/3.0),Product (1/3.0)]) [1,2,3]<br />> return $ a+b<br /><br /></pre><br />Now we can 'flattten' this tree so that the leaves have the final probabilities:<br /><br /><pre><br />> flatten :: Monoid m => FreeOperad [m] -> V m<br />> flatten I = V [mempty]<br />> flatten (B ms fs) = V $ concat $ zipWith (map . mappend) ms (map (unV . flatten) fs)<br /><br /></pre><br />This is a morphism of operads. (You can probably guess the definition of such a thing.) It induces a morphism of monads:<br /><br /><pre><br />> liftOp :: (Operad a,Operad b) => (a -> b) -> MonadWrapper a x -> MonadWrapper b x<br />> liftOp f (M shape values) = M (f shape) values<br /><br /></pre><br /><tt>liftOp flatten test</tt> will give you the final probabilities.<br /><br />There may just be a possible application of this stuff. The point of separating shape from data is performance. You can store all of your data in flat arrays and do most of your work there. It means you can write fast tight loops and only rebuild the original datatype if needed at the end. If you're lucky you can precompute the shape of the result, allowing you to preallocate a suitable chunk of memory for your final answer to go into. What the operad does is allow you to extend this idea to monadic computations, for suitable monads. If the 'shape' of the computation is independent of the details of the computation, you can use an operad to compute that shape, and then compute the contents of the corresponding array separately. If you look at the instance for <tt>MonadWrapper</tt> you'll see that the part of the computation that deals with the data is simply a <tt>concatMap</tt>.<br /><br />BTW In some papers the definition restricts the degree to ≥1. But that's less convenient for computer science applications. If it really bothers you then you can limit yourself to thinking about containers that contain at least one element.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-5354353393632437147?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com10tag:blogger.com,1999:blog-11295132.post-48430042746720945322008-10-18T15:14:00.000-07:002008-10-18T17:04:43.625-07:00What's the use of a transfinite ordinal?This post is going to talk about one player games (which I'll call 1PGs, or just games). They seem pretty stupid at first, but it turns out that they have a lot of rich and applicable theory. The idea is that at any position in a 1PG there is a set of new positions that you can move to. You lose when you have no legal moves and the goal is to avoid losing for as long as you can. For convenience, let's identify your position in a 1PG with the set of positions you could move to. A position in a 1PG is just another 1PG. So we can give the following recursive formal definition of a 1PG:<br /><br />A one player game is either the losing position with no legal moves, denoted by ∅, or a set of one player games.<br /><br />A sequence of legal moves in a 1PG, say x, is then a sequence of the form x ∋x<sub>0</sub> ∋x<sub>1</sub> ∋x<sub>2</sub> ∋....<br /><br />Did you notice that I sneaked in a really big restriction on 1PGs in that definition? My definition means that 1PGs are the same thing as sets. And sets are <a href="http://en.wikipedia.org/wiki/Axiom_of_foundation">well-founded</a> meaning that any sequence of the form x ∋x_0 ∋x<sub>1</sub> ∋x<sub>2</sub> ∋... eventually terminates. So my definition implicitly contains the restriction that you always eventually lose. That's fine for my purposes.<br /><br />Now suppose that x ∋y ∋z is a legal sequence of plays in some 1PG. You might as well assume that z is in x. The reason is that if z were a legal move from x, you wouldn't take it, because in order to delay the inevitable you're going to prefer to go via y. So we'll assume that any game x is "transitively closed", ie. if x∋y and y∋z then x∋z.<br /><br />Now I want to restrict things even further to the most boring kind of 1PG of all, games where the successor moves are all totally ordered. Intuitively what I mean is that you can think of every move in the game as simply moving you along a 'line'. Given any two positions in the game, x and y, either x∋y, or y∈x, or x=y. In these games there are no branches, the only thing you can do in these games is delay the inevitable. A bit like life really. We'll also use the notation a<b to mean a∈b.<br /><br />I'll call these games ordinal 1PGs. You may have noticed that this is precisely the same as the ordinary definition of an ordinal.<br /><br />Let's call the game ∅ by the name 0. 0 is sudden death. Recursively define the game n={0,1,2,3,n-1}. In the game of n you have up to n steps to play before you die (though you could hurry things along if you wanted to).<br /><br />But what about this game: ω={0,1,2,...}. There's nothing mysterious about it. In your first move you choose an integer, and then you get to delay the end for up to n moves. This is our first transfinite ordinal, but it just describes a 1PG that anyone can play and which is guaranteed to terminate after a finite number of moves. We can't say in advance how many moves it will take, and we can't even put an upper bound on the game length, but we do know it's finite.<br /><br />Given two games, a and b, we can define the sum a+b. This is simply the game where you play through b first, and then play through a. Because of the transitivity condition above you could simply abandon b at any stage and then jump into a, but that's not a good thing to do if you're trying to stay alive for as long as possible. The first example of such a 1PG is the game ω+1. You get to make one move, then you choose an n, and then you get to play n.<br /><br />What about 1+ω? Well you get to choose n, play n moves, and then play an extra move. But that's exactly like playing ω and making the move n+1. So 1+ω and ω are the same game. Clearly addition of 1PGs isn't commutative.<br /><br />We can multiply two games as well. Given games a and b, ab is defined like this: we have game a and game b in front of us at the same time, a on the left and b on the right. At any turn we have a choice: make a move in the game on the left, or make a move in the game on the right, resetting the left game back to a again. It's like playing through b, but playing through a copy of a at each turn. Note how for integers a and b the game ab is just ab, where the former is game multiplication, and the latter is ordinary arithmetical multiplication.<br /><br />We can also define exponentiation. We can define a<sup>n</sup>, for finite integer n, in the obvious way as ω·ω·...·ω n times. Before getting onto more general exponentiation I need to classify ordinal games. There are three kinds. There's<br /><br />(1) the zero game 0=∅.<br />(2) the games where there is one 'largest' next move in the game, the one you should take. These are games of the form a+1 and the next position in the game, if you're playing to survive, will be a. For example, the game 7 is 6+1.<br />(3) the games where there is no single best choice of next move. These are games where there is no largest next move, but instead an ever increasing sequence of possible next moves and you have to pick one. The first example is ω where you're forced to pick an n. These are called limit ordinals.<br /><br />We can use this to recursively define a<sup>b</sup> using the classification of b.<br /><br />(1) If b=0 then by definition a<sup>b</sup>=1 and that's that.<br />(2) if b=c+1, then a<sup>b</sup>=a<sup>c</sup>·a.<br />(3) if b is a limit ordinal then a move in a<sup>b</sup> works like this: you pick a move in b, say c, and now you get to play a<sup>c</sup> (or, by transitive closure, any smaller game).<br /><br />For example, you play ω<sup>ω</sup> like this: in your first move you pick an n, and then you play ω<sup>n</sup>. Note how the state of play never becomes infinite, you're always playing with finite rules on a finite 'board'. This is an important point. Even though ω<sup>ω3+2</sup>2+ω+4 seems like it must be a very big kind of thing, it describes a finite process that always terminates.<br /><br />So what use are these things?<br /><br />Suppose you have an algorithm that runs one step at a time, with each step taking a finite time, and where the internal state of the program at step t is s<sub>t</sub>. Suppose that you can find a map f from program states to the ordinals such that f(s<sub>t+1</sub>)<f(s<sub>t</sub>), ie. the program always decreases the ordinal associated to a state. Then the program must eventually terminate.<br /><br />For example, suppose the program iterates 10 times. We can simply assign f(s<sub>t</sub>)=10-t. Suppose instead your algorithm computes some number n at the first step, though you can't figure out what that number is (or maybe it accepts n as an input from a user), then we can use f(0)=ω and so on. We don't know in advance what n is, but that doesn't matter. Whatever n is entered, the program will terminate.<br /><br />This technique is particularly useful for proving termination of rewrite rules. For example, with the rewrite rules generated by the <a href="http://sigfpe.blogspot.com/2007/07/ill-have-buchburger-with-fries.html">Buchberger algorithm</a> we can map the first step to ω<sup>n</sup>, for some n. And if we don't know what n is, we can just start with ω<sup>ω</sup>. If you take a peek at the <a href="http://www.cs.tau.ac.il/%7Enachumd/pub.html">papers of Nachum Dershowitz</a> you'll see how he applies this technique to many other types of rewrite rule.<br /><br />Transfinite ordinals are not as esoteric as they may first appear to be. Using them to prove termination goes back to Turing and <a href="http://en.wikipedia.org/wiki/Robert_Floyd">Floyd</a> but I learnt about the method from reading Dershowitz. (I think Floyd suggested using the method to prove termination of the standard rewrite rules for computing derivatives of expressions. It's tricky because differentiating a large product, say, can result in many new terms, so there's a race between the derivative 'knocking down' powers, and products causing growth.)<br /><br />Probably the best example of using ordinals to prove the termination of a process is in the <a href="http://math.andrej.com/2008/02/02/the-hydra-game/">Hydra game</a>. What's amazing about this game is that you need quite powerful axioms of arithmetic to prove termination.<br /><br />One more thing. All of this theory generalises to <a href="http://en.wikipedia.org/wiki/Surreal_number">two player games</a>.<br /><br />Update: I think <a href="http://blog.plover.com/">mjd</a> may have been about to write on the same thing. On the other hand, though he mentions program termination, he says of it "But none of that is what I was planning to discuss". So maybe what I've written can be seen as complementary.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-4843004274672094532?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com3tag:blogger.com,1999:blog-11295132.post-71401082122703551112008-10-12T09:15:00.000-07:002008-10-12T10:12:01.060-07:00Untangling with Continued Fractions: part 5<pre><br />> {-# LANGUAGE MultiParamTypeClasses,FunctionalDependencies,FlexibleInstances,GeneralizedNewtypeDeriving #-}<br /><br />> module Main where<br /><br />> import qualified Data.Map as M<br />> import Control.Monad<br />> import Prelude<br />> import Ratio hiding (numerator,denominator)<br /><br />> infixl 5 .+<br />> infixl 6 .*<br /><br /></pre><br />So far I've described a way to use monad notation to represent knots, <a href="http://en.wikipedia.org/wiki/Link_%28knot_theory%29">links</a> and tangles, and I've provided a way to implement a monad so that the notation gives rise to the bracket polynomial when applied to a knot. What happens when we do the same with a tangle? <br /><br />A tangle corresponds to something with this type:<br /><br /><pre><br />> type Tangle = (Bool,Bool) -> KnotMonad (Bool,Bool)<br /><br /></pre><br />Here are two examples of tangles, ie. expressions with this type:<br /><br /><pre><br />> infinity (i,j) = return (i,j)<br />> zero (i,j) = do<br />> cup (i,j)<br />> cap ()<br /><br /></pre><br />These correspond to these tangles:<br /><br />And the names of those expressions are the corresponding <a href="http://sigfpe.blogspot.com/2008/08/untangling-with-continued-fractions_23.html">extended rational</a> values.<br /><br />Now here's a surprising but <a href="http://www.math.uic.edu/~kauffman/RTang.pdf">nontrivial fact</a> that holds when T is a rational tangle: if we set A<sup>2</sup>=i (where i<sup>2</sup>=-1) then the value of the monadic expression corresponding to the tangle is of the form a<[∞]>+b<[0]> where the [r] are the rational tangles above, and <T> is the bracket polynomial (evaluated at A<sup>2</sup>=i). The rational associated to the original tangle is then given by -ia/b. I believe this discovery is, like many other things I've talked about in this blog, due to <a href="http://en.wikipedia.org/wiki/John_Horton_Conway">Conway</a>.<br /><br />We now have the makings of a complete algorithm. Using the polynomial monad defined in the previous installment we evaluate our tangle. We then compute the coefficients a and b, which are polynomials in A. Now compute a and b at A<sup>2</sup>=-i, and then compute -ia/b and use the operations z→z+1, z→z-1 and z→-1/z to reduce this rational to zero, interpreting these operations as twist, antitwist and rotation respectively (as I described <a href="http://sigfpe.blogspot.com/2008/08/untangling-with-continued-fractions_23.html">here</a>).<br /><br />There's a shortcut I'm going to take but I don't have a proof it's guaranteed to work so it's risky. Instead of computing polynomials in A, doing polynomial division, and then evaluating at A<sup>2</sup>=-i, I'm going to do everything from the first step with the assumption that A<sup>2</sup>=-1. (The catch is that if a and b are polynomials in A, then a and b may have a common divisor that cancels when computing a/b. If we substitute A<sup>2</sup>=-1 from the beginning we get division of zero by zero. I don't know for sure if this can't happen. It's not a big deal if it does though, I just need to write a polynomial division routine.)<br /><br />Now if A<sup>2</sup>=-1 then we could choose A=(1/2)<sup>1/2</sup>+(1/2)<sup>1/2</sup>i. That suggests using real arithmetic but we need to extract results that are rational numbers. So instead introduce the field K=Q((1/2)<sup>1/2</sup>), ie. the field of numbers of the form u+v(1/2)<sup>1/2</sup>. We can implement this as:<br /><br /><pre><br />> data K = K { re2::Rational, im2::Rational } deriving (Eq,Show)<br /><br />> instance Num K where<br />> K a b + K a' b' = K (a+a') (b+b')<br />> K a b * K a' b' = K (a*a'+2*b*b') (a*b'+a'*b)<br />> negate (K a b) = K (negate a) (negate b)<br />> abs _ = error ""<br />> signum _ = error ""<br />> fromInteger i = K (fromInteger i) 0<br /><br />> instance Fractional K where<br />> recip (K a b) = let r = recip (a*a-2*b) in K (r*a) (-r*b)<br />> fromRational x = K x 0<br /><br /></pre><br />We actually want to work with the field L=K(i), ie. complex numbers built from elements of K. We can implement this as:<br /><br /><pre><br />> type L = Complex K<br /><br /></pre><br />(Notice how it's not at all hard to build up algebraic structures like this in Haskell without a symbolic algebra package. For more powerful code in this vein, see <a href="http://www.polyomino.f2s.com/david/haskell/hs/QuadraticField.hs">David Amos's code</a>).<br /><br />Now we have our field, our monad is that of vector spaces over L:<br /><br /><pre><br />> type KnotMonad = V L<br /><br /></pre><br />Let's define some useful constants. The square root of 1/2, A, B=1/A and i:<br /><br /><pre><br />> hr2 = K 0 (1%2)<br />> a = hr2 :+ hr2<br />> b = hr2 :+ (-hr2)<br />> i = 0 :+ 1<br /><br /></pre><br />Here's the knot monad code I've presented before. (This version uses -|()|- in a way that may appear a bit useless. I'm just making explicit when we're working with a 1-dimensional vector space generated by -|()|-, which is much the same thing as the base field.)<br /><br /><pre><br />> cup :: (Bool,Bool) -> KnotMonad ()<br />> cup (u,v) = case (u,v) of<br />> (False,True) -> (-i * b) .* return ()<br />> (True,False) -> (i * a) .* return ()<br />> otherwise -> vzero<br /><br />> cap :: () -> KnotMonad (Bool,Bool)<br />> cap () = (-i * b) .* return (False,True) .+ (i * a) .* return (True,False)<br /><br />> over :: Tangle<br />> over (u,v) = a .* do<br />> () <- cup (u,v)<br />> cap ()<br />> .+<br />> b .* return (u,v)<br /><br />> under :: Tangle<br />> under (u,v) = b .* do<br />> () <- cup (u,v)<br />> cap ()<br />> .+<br />> a .* return (u,v)<br /><br /></pre><br />Here's where we work out the coefficients a and b. We treat a tangle as the function that it is and look at what happens when we apply it to two different values in <tt>(Bool,Bool)</tt>. Applying to <tt>value'</tt> to <tt>infinity</tt> and <tt>zero</tt> shows that they do in fact compute a and b correctly:<br /><br /><pre><br />> value' p =<br />> let alpha = coefficient (False,False) $ p (False,False)<br />> beta = coefficient (True,False) $ p (False,True)<br />> in (alpha,beta)<br /><br /></pre><br />From a and b we extract -ia/b as a rational:<br /><br /><pre><br />> value :: Tangle -> Rational<br />> value p =<br />> let (a,b) = value' p<br />> in re2 $ realPart $ -i*a/b<br /><br /></pre><br />And now we reduce to 0. This isn't the best algorithm for reducing rationals to zero. But it's what I wrote in my first attempt and I'm sticking with it for consistency with part one:<br /><br /><pre><br />> steps :: Rational -> [String]<br />> steps 0 = []<br />> steps q | q<= -1 = "twist":steps (q+1)<br />> | -1<q && q<1 = "rotate":steps (-1/q)<br />> | q>=1 = "antitwist":steps (q-1)<br /><br />> untangle t = steps (value t)<br /><br /></pre><br />Here's the example from the <a href="http://sigfpe.blogspot.com/2008/08/untangling-with-continued-fractions.html">first installment</a>:<br /><br /><pre><br />> example :: Tangle<br />> example (a,b) = do<br />> (c,d) <- over (a,b)<br />> (e,f) <- cap ()<br />> (g,h) <- over (c,e)<br />> (i,j) <- over (f,d)<br />> (m,n) <- cap ()<br />> (k,l) <- cap ()<br />> (q,r) <- over (h,k)<br />> (s,y) <- over (l,i)<br />> (o,p) <- over (n,g)<br />> (t,u) <- under (p,q)<br />> (v,w) <- under (r,s)<br />> (x,z) <- over (y,j)<br />> cup (o,t)<br />> cup (u,v)<br />> cup (w,x)<br />> return (m,z)<br /><br /></pre><br />I'll leave you to type <tt>untangle example</tt> in ghci and refer back to my original <a href="http://sigfpe.blogspot.com/2008/08/untangling-with-continued-fractions_16.html">video</a>.<br /><br />To find out more, and to read about the details I've left out, try the many papers by Kauffman, including <a href="http://www.math.uic.edu/~kauffman/RTang.pdf">this one</a>.<br /><HR><br />That was hard! I didn't anticipate how much stuff I'd have to introduce to just sketch out how this works. It's one of those cases where the comments need to be about 30 or 40 times larger than the code and done properly, there's enough material to fill a good few chapters of a book. So I apologise if I skimped on some details along the way, and left out some needed explanations. But I hope that I've at least motivated you to take a peek at knot theory and appreciate how nicely tangles and monads play together.<br /><br />And remember this is only code to illustrate some mathematics - it's inefficient and not how I'd really untangle big rational tangles! (For example, just using the monad associativity laws you can often <a href="http://sigfpe.blogspot.com/2007/06/how-to-write-tolerably-efficient.html">rearrange the monad expressions</a> for knots into equivalent forms that execute much quicker.)<br /><HR><br />The rest is similar to code I've used in previous installments:<br /><br /><pre><br />> swap (x,y) = (y,x)<br /><br />> class Num k => VectorSpace k v | v -> k where<br />> vzero :: v<br />> (.+) :: v -> v -> v<br />> (.*) :: k -> v -> v<br />> (.-) :: v -> v -> v<br />> v1 .- v2 = v1 .+ ((-1).*v2)<br /><br />> data V k a = V { unV :: [(k,a)] } deriving (Show)<br /><br />> reduce x = filter ((/=0) . fst) $ fmap swap $ M.toList $ M.fromListWith (+) $ fmap swap $ x<br /><br />> instance Num k => Functor (V k) where<br />> fmap f (V as) = V $ map (\(k,a) -> (k,f a)) as<br /><br />> instance Num k => Monad (V k) where<br />> return a = V [(1,a)]<br />> x >>= f = join (fmap f x)<br />> where join x = V $ concat $ fmap (uncurry scale) $ unV $ fmap unV x<br />> scale k1 as = map (\(k2,a) -> (k1*k2,a)) as<br /><br />> instance Num r => MonadPlus (V r) where<br />> mzero = V []<br />> mplus (V x) (V y) = V (x++y)<br /><br />> instance (Num k,Ord a) => VectorSpace k (V k a) where<br />> vzero = V []<br />> V x .+ V y = V (x ++ y)<br />> (.*) k = (>>= (\a -> V [(k,a)]))<br /><br />> coefficient b (V bs) = maybe 0 id (lookup b (map swap (reduce bs)))<br /><br />> data Complex a = (:+) { realPart :: a, imagPart :: a } deriving (Eq,Show)<br /><br />> instance Num a => Num (Complex a) where<br />> (a :+ b) + (a' :+ b') = (a+a') :+ (b+b')<br />> (a :+ b) * (a' :+ b') = (a*a'-b*b') :+ (a*b'+a'*b)<br />> negate (a :+ b) = (-a) :+ (-b)<br />> fromInteger n = fromInteger n :+ 0<br />> abs (a :+ b) = undefined<br />> signum (a :+ b) = undefined<br /><br />> instance Fractional a => Fractional (Complex a) where<br />> recip (a :+ b) = let r = recip (a*a+b*b) in ((a*r) :+ (-b*r))<br />> fromRational q = fromRational q :+ 0<br /></pre><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-7140108212270355111?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com0tag:blogger.com,1999:blog-11295132.post-40062985667048332422008-09-26T08:36:00.000-07:002008-09-26T08:44:06.040-07:00On writing Python one-liners.<pre><br /># I'm off to the Oregon Shakespeare Festival for the weekend so no<br /># time to finish my Untangling series. But I do have time to post this...<br /><br /># The other day someone asked me if it was possible to implement a<br /># certain task in one line of Python. My response was "yes, but you<br /># wouldn't want to." You can think of what follows as the demonstration<br /># that you really don't want to.<br /><br /># The problem is, there are lots of Python constructions you can't<br /># do on one line: assigning variables, defining functions, using<br /># conditionals, recursion and so on. So it appears there's a limit<br /># to what you can do. Amazingly, all of these problems can be solved<br /># with the use of lambdas.<br /><br /># One place you often want one line lambdas is in the use of the map<br /># function. So let's stick with that as an example and warm up with<br /># an easy example. Lets say we want to add 1 to all of the numbers<br /># from 0 to 19. We can write:<br /><br />def inc(n):<br /> return n+1<br /><br />print "1.",map(inc,range(0,20))<br /><br /># Lambdas enable you to write that as a one-liner:<br /><br />print "1.",map(lambda x:x+1,range(0,20))<br /><br /># Now what happens if we want to apply a function that uses a local<br /># (constant) variable. For example:<br /><br />def f1(n):<br /> m = 2*n+1<br /> return m+m*m<br /><br />print "2.",map(f1,range(0,20))<br /><br /># We can eliminate the assigment by means of a function. m changes from<br /># being a local to a function argument:<br /><br />def f2(n):<br /> def g(m):<br /> return m+m*m<br /> return g(2*n+1)<br /><br />print "2.",map(f2,range(0,20))<br /><br /># But that seems to make things worse because now we have a function<br /># definition instead of a variable assigment. But we can eliminate<br /># that with a lambda:<br /><br />def f3(n):<br /> return (lambda m:m+m*m)(2*n+1)<br /><br />print "2.",map(f3,range(0,20))<br /><br /># And now we can write the whole thing as one big lambda:<br /><br />print "2.",map(lambda n:(lambda m:m+m*m)(2*n+1),range(0,20))<br /><br /># But what happens if we want to define a bunch of local functions<br /># inside our lambda? We can use the same technique. We (1) use<br /># lambdas to allow us to fake local variable definitions and<br /># (2) use lambdas to define the functions. So we can replace this:<br /><br />def f(n):<br /> def g(n):<br /> return 2*n+1<br /><br /> def h(n):<br /> return 3*n-2<br /><br /> return g(h(g(h(n))))<br /><br />print "3.",map(f,range(0,20))<br /><br /># with this:<br /><br />print "3.",map(lambda n:(lambda g,h,n:g(h(g(h(n)))))(lambda n:2*n+1,lambda n:3*n-2,n),range(0,20))<br /><br /># We're passing in lambdas as arguments to a lambda so that they<br /># get bound to g and h.<br /><br /># But what happens if we want a recursive definition like:<br /><br />def fact(n):<br /> if n<=1:<br /> return 1<br /> else:<br /> return n*fact(n-1)<br /><br />print "4.",map(fact,range(0,20))<br /><br /># Because the function is bound to the name fact, we can refer to<br /># fact by name inside the defintion of fact. How can we do this in<br /># a one-line lambda? We can start by eliminating the self-reference<br /># in a fairly standard way. First rewrite fact so it doesn't call<br /># itself, but calls a function passed in as an argument:<br /><br />def urfact(f,n):<br /> if n<=1:<br /> return 1<br /> else:<br /> return n*f(f,n-1)<br /><br /># We can then make a factorial function by calling urfact with itself,<br /># allowing it to carry out the recursion:<br /><br />def fact1(n):<br /> return urfact(urfact,n)<br /><br />print "4.",map(fact1,range(0,20))<br /><br /># We can use urfact directly without fact1:<br /><br />print "4.",map(lambda n:urfact(urfact,n),range(0,20))<br /><br /># Now all we need is a one-liner definition of urfact. It's tempting to try:<br /><br />def urfact1(f,n):<br /> return {True:1,False:n*f(f,n-1)}[n<=1]<br /><br /># But that results in non-termination because it evaluates *both* branches<br /># of the conditional. In a lazy language like Haskell, the branch not taken<br /># wouldn't be evaluated. But Python can be made lazy by using...you<br /># guessed it...lambda. The expression lambda:x is like a lazy version<br /># of the expression x. It doesn't get evaluated until you apply it as a<br /># function of no arguments, at which time it takes on the value x. So we<br /># can rewrite urfact as:<br /><br />def urfact2(f,n):<br /> return {True:lambda:1,False:lambda:n*f(f,n-1)}[n<=1]()<br /><br /># And we get<br /><br />print "4.",map(lambda n:urfact2(urfact2,n),range(0,20))<br /><br /># Note how we use urfact2 twice, so we need the trick above for local<br /># definitions to get it down to one reference:<br /><br />print "4.",map(lambda n:(lambda f:f(f,n))(urfact2),range(0,20))<br /><br /># And now we can replace urfact2 with a lambda<br /><br />print "4.",map(lambda n:(lambda f:f(f,n))(lambda f,n:{True:lambda:1,False:lambda:n*f(f,n-1)}[n<=1]()),range(0,20))<br /><br /># And that's enough for now. I expect that if we keep going we'll find <br /># way to rewrite any Python program as a one liner. You could probably<br /># even write a program to automatically 'compile' a usable subset of<br /># Python into one liners.<br /><br /># But you probably don't want to do that.<br /></pre><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-4006298566704833242?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com22tag:blogger.com,1999:blog-11295132.post-85867493709765829882008-09-20T16:07:00.000-07:002008-09-20T16:47:59.392-07:00Untangling with Continued Fractions: Part 4<pre><br />> {-# LANGUAGE MultiParamTypeClasses,FunctionalDependencies,FlexibleInstances,GeneralizedNewtypeDeriving #-}<br /><br />> module Main where<br /><br />> import Test.QuickCheck<br />> import qualified Data.Map as M<br />> import Control.Monad<br />> import Ratio<br /><br />> infixl 5 .+<br />> infixl 6 .*<br /><br /></pre><br />As I've mentioned <a href="http://sigfpe.blogspot.com/2008/08/untangling-with-continued-fractions_31.html">previously</a>, we need to choose values for our pieces of knot or tangle so that when they're combined together we get something that tells us about the knot or tangle, not the diagram. For example, suppose our diagram contains this piece:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SNWC4xN9-9I/AAAAAAAAAM8/eZI4SAfP54Y/s1600-h/cupcap.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SNWC4xN9-9I/AAAAAAAAAM8/eZI4SAfP54Y/s400/cupcap.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5248244852566391762" /></a><br />That piece is isotopic to:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/SNWC42ZOFtI/AAAAAAAAANE/EnypLH3XZNg/s1600-h/straight.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/SNWC42ZOFtI/AAAAAAAAANE/EnypLH3XZNg/s400/straight.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5248244853955761874" /></a><br />We want any kind of evaluation of our diagram to be independent of which one of these we've chosen. In other words, among other things we need this function:<br /><br /><pre><br />> cupcap i = do<br />> (j,k) <- cap<br />> cup (i,j)<br />> return k<br /><br /></pre><br />to equal this function:<br /><br /><pre><br />> straight i = do<br />> return i<br /><br /></pre><br />I've already said how in the vector space monad, these correspond to summations, so really we want these two expressions to be equal (using the summation convention):<br /><blockquote><br />M<sup>ij</sup>N<sub>jk</sub>=δ<sup>i</sup><sub>k</sub><br /></blockquote><br />(δ<sup>i</sup><sub>k</sub> is 1 if i=k and zero otherwise, ie. the identity matrix.)<br />The left hand side is just the rule for multiplying matrices. So we require M to be the inverse of N.<br /><br />Now we can start on some real knot theory. There is a collection of 'moves' we can perform on a diagram called <a href="http://en.wikipedia.org/wiki/Reidemeister_move">Reidemeister moves</a>. Here are diagrams illustrating them:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SNWDVvAWO-I/AAAAAAAAANM/xudKl6R9dXQ/s1600-h/type1.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SNWDVvAWO-I/AAAAAAAAANM/xudKl6R9dXQ/s400/type1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5248245350188596194" /></a><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SNWDVx0hIcI/AAAAAAAAANU/Ormc9NQOWGc/s1600-h/type2.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SNWDVx0hIcI/AAAAAAAAANU/Ormc9NQOWGc/s400/type2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5248245350944285122" /></a><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/SNWDV7lXOAI/AAAAAAAAANc/QTp3_f7pBXI/s1600-h/type3.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/SNWDV7lXOAI/AAAAAAAAANc/QTp3_f7pBXI/s400/type3.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5248245353565075458" /></a><br /><br />There are three types of move that we call type I, type II and type III.<br /><br />These diagrams are intended to show changes we can make to a part of a bigger knot or tangle. It's not hard to see that these are isotopies, they're just simple operations we can perform on a knot or tangle by moving the string a little bit. If you perform one of these moves on a knot or tangle then you should end up with merely a different diagram representing the same knot or tangle. Each of these corresponds to an equality of monadic expression, or an equality of summations. But the really important part is that these are all you need to consider. Every isotopy can be constructed out of these moves. So if we can satisfy the corresponding equalities of expressions then we automatically have a knot invariant that tells us about the underlying knot or tangle, not just the diagram.<br /><br />Unfortunately, there's a catch. Satisfying the equality for a type I move is too onerous. So we'll just concentrate on type II and III. But the good news is that it doesn't matter, there are workarounds.<br /><br />Let a be an unknown complex number. We're going to build expressions from a but they're all going to be polynomials in a and its reciprocal, like a+a^2-3/a. These are known as Laurent polynomials and I pointed out <a href="http://sigfpe.blogspot.com/2008/08/hopf-algebra-group-monad.html">recently</a> that I can use the <tt>Group</tt> type class to represent them.<br /><br /><pre><br />> i = 0 :+ 1<br />> newtype Z = Z Int deriving (Eq,Ord,Show,Num)<br />> type Poly = V (Complex Rational) Z<br /><br /></pre><br />In other words, elements of <tt>Poly</tt> are weighted linear combinations of integers. We can think of each integer n as representing a<sup>n</sup> and the weights being coefficients of these powers. Amazingly the group multiplication multiplies these things out correctly.<br /><br /><tt>a</tt> and <tt>ia</tt> represent a and a<sup>-1</sup>.<br /><br /><pre><br />> a = return (Z 1) :: Poly<br />> ia = return (Z (-1)) :: Poly<br /><br /></pre><br />We can now satisfy the type II and III Reidemeister moves with these definitions:<br /><br /><pre><br />> cup :: (Bool,Bool) -> V Poly ()<br />> cup (u,v) = case (u,v) of<br />> (False,True) -> (-i .* ia) .* return ()<br />> (True,False) -> (i .* a) .* return ()<br />> otherwise -> zero<br /><br />> cap :: V Poly (Bool,Bool)<br />> cap = (-i .* ia) .* return (False,True) .+ (i .* a) .* return (True,False)<br /><br />> over :: (Bool,Bool) -> V Poly (Bool,Bool)<br />> over (u,v) = a .* do<br />> () <- cup (u,v)<br />> cap<br />> .+<br />> ia .* return (u,v)<br /><br />> under :: (Bool,Bool) -> V Poly (Bool,Bool)<br />> under (u,v) = ia .* do<br />> () <- cup (u,v)<br />> cap<br />> .+<br />> a .* return (u,v)<br /><br /></pre><br />(I'm making the <tt>V</tt> monad do double duty here. I'm using it to represent the Laurent polynomials, but I'm also using it to represent vectors over the Laurent polynomials.)<br /><br />For example, here's what a type III move looks like if we try to arrange things nicely in terms of our standard 'components':<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/SNWDWBUtE9I/AAAAAAAAANk/EpWSfTpXmmc/s1600-h/type3a.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/SNWDWBUtE9I/AAAAAAAAANk/EpWSfTpXmmc/s400/type3a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5248245355105817554" /></a><br />That gives rise to the two functions<br /><br /><pre><br />> left (i,j,k) = do<br />> (l,m) <- under (i,j)<br />> (n,o) <- over (m,k)<br />> (p,q) <- over (l,n)<br />> return (p,q,o)<br /><br /></pre><br />and<br /><br /><pre><br />> right (i,j,k) = do<br />> (l,m) <- over (j,k)<br />> (n,o) <- over (i,l)<br />> (p,q) <- under (o,m)<br />> return (n,p,q)<br /><br /></pre><br />whose equality can be tested with<br /><br /><pre><br />> test1 = quickCheck $ \(x,y,z) -> left (x,y,z)==right (x,y,z)<br /><br /></pre><br />I'll leave checking the type II rule as an exercise.<br /><br />With these defintions we can take any knot diagram and compute a value for it. Note that because a not has no loose ends we have no inputs and no outputs so the result won't be a function but a value, a polynomial in a and its reciprocal. This is known as the <a href="http://en.wikipedia.org/wiki/Bracket_polynomial">Bracket Polynomial</a>.<br /><br />As mentioned above, the bracket polynomial fails to be a knot invariant. Two diagrams related by type I moves will have different bracket polynomials. With a little bit of sleight of hand we can apply a fudge factor to l with those extra loops and we end up with the <a href="http://en.wikipedia.org/wiki/Jones_polynomial">Jones polynomial</a>. As that's not my main goal here, I'll let you read how to use the <a href="http://en.wikipedia.org/wiki/Writhe">writhe</a> get from the bracket polynomial to the Jones polynomial at Wikipedia. But do I want to make another digression.<br /><br />It's tricky to define knot invariants. The problems is that knots are topologically invariant in the sense that you can deform a knot in lots of ways and it's still the same knot. This rules out most attempts at using the geometry of a knot to define an invariant. Just about any geometrical feature you find in a particular configuration will be ruined by deforming the knot. But here's an idea: suppose you could compute some geometrical property from a particular curve representing a knot and then find the average of that value for all possible curves representing the same knot. Then we'd get a knot invariant as it wouldn't depend on a particular choice of representative curve. But there's a catch. We need to average over an infinite space, the space of all curves representing a knot. On the other hand, there is a perfectly good tool for averaging over infinite sets: integration. But there's another catch: the space of all curves representing a knot is very large, infinite dimensional in fact, and we don't have very good <a href="http://en.wikipedia.org/wiki/Functional_integration">theory</a> for how to do this. At least mathematicians don't. But physicists integrate over spaces of paths all the time, ever since Feynman came up with the <a href="http://en.wikipedia.org/wiki/Feynman_path_integral">path integral</a>. Amazingly, <a href="http://en.wikipedia.org/wiki/Edward_Witten">Ed Witten</a> showed that tere was already a physical model that fit the bill: <a href="http://en.wikipedia.org/wiki/Chern-Simons_theory">Chern-Simons theory</a>. In fact, Witten showed that the Jones polynomial, which as I've already mentioned came originally out of statistical mechanics, could be derived from this theory.<br /><br />Anyway, enough of that digression. In the next installment I'll show how the above can be modified for use with tangles and give my final algorithm.<br /><br /><br />In the last installment (I think) I'll show how we can derive the rational number for a tangle using the bracket polynomial.<br /><HR><br />The following is the 'library' behind the code above. Most of it is simply built up from what I've said in recent weeks.<br /><br /><pre><br />> swap (x,y) = (y,x)<br /><br />> class Num k => VectorSpace k v | v -> k where<br />> zero :: v<br />> (.+) :: v -> v -> v<br />> (.*) :: k -> v -> v<br />> (.-) :: v -> v -> v<br />> v1 .- v2 = v1 .+ ((-1).*v2)<br /><br />> data V k a = V { unV :: [(k,a)] } deriving (Show)<br /><br />> reduce x = filter ((/=0) . fst) $ fmap swap $ M.toList $ M.fromListWith (+) $ fmap swap $ x<br /><br />> instance (Ord a,Num k) => Eq (V k a) where<br />> V x==V y = reduce x==reduce y<br /><br />> instance (Ord a,Num k,Ord k) => Ord (V k a) where<br />> compare (V x) (V y) = compare (reduce x) (reduce y)<br /><br />> instance Num k => Functor (V k) where<br />> fmap f (V as) = V $ map (\(k,a) -> (k,f a)) as<br /><br />> instance Num k => Monad (V k) where<br />> return a = V [(1,a)]<br />> x >>= f = join (fmap f x)<br />> where join x = V $ concat $ fmap (uncurry scale) $ unV $ fmap unV x<br />> scale k1 as = map (\(k2,a) -> (k1*k2,a)) as<br /><br />> instance Num r => MonadPlus (V r) where<br />> mzero = V []<br />> mplus (V x) (V y) = V (x++y)<br /><br />> instance (Num k,Ord a) => VectorSpace k (V k a) where<br />> zero = V []<br />> V x .+ V y = V (x ++ y)<br />> (.*) k = (>>= (\a -> V [(k,a)]))<br /><br />> e = return :: Num k => a -> V k a<br />> coefficient b (V bs) = maybe 0 id (lookup b (map swap (reduce bs)))<br /><br />> diag a = (a,a)<br />> both f g (a,b) = (f a,g b)<br /><br />> class Group m a where<br />> unit :: () -> m a<br />> counit :: a -> m ()<br />> mult :: (a,a) -> m a<br />> comult :: a -> m (a,a)<br />> anti :: a -> m a<br /><br />> instance Monad m => Group m Z where<br />> unit _ = return (Z 0)<br />> counit _ = return ()<br />> mult = return . uncurry (+)<br />> comult = return . diag<br />> anti = return . negate<br /><br />> newtype Identity a = I a deriving (Eq,Ord,Show)<br /><br />> instance Monad Identity where<br />> return x = I x<br />> I x >>= f = f x<br /><br />> data K = K Rational Rational deriving (Eq,Show)<br /><br />> instance Num K where<br />> K a b + K a' b' = K (a+a') (b+b')<br />> K a b * K a' b' = K (a*a'+2*b*b') (a*b'+a'*b)<br />> negate (K a b) = K (negate a) (negate b)<br />> abs _ = error ""<br />> signum _ = error ""<br />> fromInteger i = K (fromInteger i) 0<br /><br />> instance Num k => Num (V k Z) where<br />> a + b = a .+ b<br />> a * b = do<br />> u <- a<br />> v <- b<br />> mult (u,v)<br />> negate a = (-1) .* a<br />> fromInteger n = fromInteger n .* return 0<br />> abs a = error ""<br />> signum a = error ""<br /><br />> data Complex a = (:+) { realPart :: a, imagPart :: a } deriving (Eq,Show)<br /><br />> instance Num a => Num (Complex a) where<br />> (a :+ b) + (a' :+ b') = (a+a') :+ (b+b')<br />> (a :+ b) * (a' :+ b') = (a*a'-b*b') :+ (a*b'+a'*b)<br />> negate (a :+ b) = (-a) :+ (-b)<br />> fromInteger n = fromInteger n :+ 0<br />> abs (a :+ b) = undefined<br />> signum (a :+ b) = undefined<br /><br />> instance Fractional a => Fractional (Complex a) where<br />> recip (a :+ b) = let r = recip (a*a+b*b) in ((a*r) :+ (-b*r))<br />> fromRational q = fromRational q :+ 0<br /><br />> instance Ord (Complex Float) where<br />> compare (a :+ b) (c :+ d) = compare (a,b) (c,d)<br /></pre><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-8586749370976582988?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com2tag:blogger.com,1999:blog-11295132.post-6096561416479068972008-09-16T09:39:00.000-07:002008-09-16T21:53:35.381-07:00Two Papers and a PresentationI've been slow to write the last couple of parts on my untangling series as I've been pretty busy lately. Among other things I've finally got around to submitting a couple of graphics related papers (that I actually wrote ages ago) for publication. Not sure they'll get accepted but there's no harm trying. I'll probably write blog posts about these in the near future now that I have permission from my employer to talk about them.<br /><br />Note: Neither of these are anything I actually use at work. Well, I've used the adjoint one a tiny bit. The latter is in the spirit of computational geometry papers: it demonstrates an algorithm that runs (asymptotically) in the fastest possible time, but in practice you'll probably want to use an approximation that looks just as good. I was amazed I was able to find a practical application for a result that came out of <a href="http://en.wikipedia.org/wiki/Toric_variety">toric varieties</a>, but it's going to be tough on the reviewers as there seem to be so few people in the graphics world who know about <a href="http://en.wikipedia.org/wiki/Transfer_function">multidimensional DSP techniques</a>.<br /><br /><a href="http://www.sigfpe.com/adjoint.pdf">Two Tricks for the<br />Price of One: Linear Filters and their Adjoints</a><br /><br /><a href="http://www.sigfpe.com/hexagon.pdf">Fast and Exact<br />Convolution with Polygonal Filters</a><br /><br />Of course, straight after submitting I noticed a bunch of pieces missing from some expressions. In places designed to help disambiguate things for the reviewers as well. Ho hum.<br /><br />I also recently presented a very short presentation at the "Stupid Rat Tricks" part of the Renderman User Group: <a href="http://www.sigfpe.com/SRTX2008.pdf">quine.sl: a self-shading shader</a>. (For more on RSL see <a href="http://en.wikipedia.org/wiki/RenderMan_Shading_Language">Wikipedia</a>.) The source is available <a href="http://www.sigfpe.com/quine.sl">here</a>. I don't think the company lawyers fully appreciated that this shader now allows their handiwork to be used like a texture map.<br /><br />My presentation was completely blown away by the guy who implemented a large part of the Renderman interface <a href="http://www.renderman.org/RMR/Examples/srt2008/index.html#stephenson">completely in bourne shell</a>. I can't think of a stupider thing than that! :-)<br /><br />Back to untangling in the next post.<br /><br />Update: I added a <a href="http://www.sigfpe.com/hexagon.nb">Mathematica 5 notebook</a> illustrating the transfer functions in the polygonal filter paper.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-609656141647906897?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com11tag:blogger.com,1999:blog-11295132.post-62189338633297856782008-08-31T15:40:00.001-07:002008-09-01T21:55:48.947-07:00Untangling with Continued Fractions: Part 3It's time to return to the <a href="http://sigfpe.blogspot.com/2008/08/hopf-algebra-group-monad.html">vector space monad</a>. But first, after some Haskell preamble, I need to talk about the Einstein summation convention.<br /><br /><pre><br />> {-# LANGUAGE MultiParamTypeClasses,FlexibleInstances,FunctionalDependencies #-}<br /><br />> module Main where<br /><br />> import qualified Data.Map as M<br />> import Control.Monad<br />> import Test.QuickCheck<br /><br />> infixl 4 .+<br />> infixl 4 .-<br />> infixl 5 .*<br /><br /></pre><br />We're going to be dealing with multidimensional arrays of values that I'll call tensors. We can refer to the individual elements of these arrays by providing tuples of indices drawn from some set of indices. For example we might refer to the elements of a 10 by 10 array by using pairs of indices, each drawn from the set {0,1,2,...,9}. We'll write some indices as subscripts and some as superscripts. Here are some examples: v<sup>i</sup>, w<sub>j</sub>,T<sup>ij</sup>,W<sup>ij</sup><sub>kl</sub>. v<sup>i</sup> means the ith value in array v, T<sup>ij</sup> means the (i,j)-th element of T and so on.<br /><br />We can form sums over such indices like this<br /><blockquote><br />Σ<sub>jk</sub>T<sup>ij</sup>W<sub>jk</sub>v<sup>k</sup><br /></blockquote><br />where the sum is over the corresponding index sets for j and k.<br />The Einstein summation convention is simply this: if an expression has an index repeated once as a subscript and once as a superscript then we implicitly assume a sum over the index without writing the summation symbol. So the above expression becomes simply<br /><blockquote><br />T<sup>ij</sup>W<sub>jk</sub>v<sup>k</sup><br /></blockquote><br />The vector space monad allows us to express something similar using do-notation. Let's use the type <tt>Bool</tt> for our index set, so we're selecting elements of our arrays using the values {False,True} rather than integers. Let e<sub>i</sub> be the (2-dimensional) vector with the property that its j-component is 1 for j=i and 0 for j≠i. In do-notation this is represented by <tt>return i</tt>. We use superscripts to indicate the components of a vector so the components of e<sub>i</sub> are e<sup>j</sup><sub>i</sub>. We can write the vector v=3e<sub>False</sub>-5e<sub>True</sub>a, with components (3,-5) as<br /><br /><pre><br />> v :: V Float Bool<br />> v = 3 .*return False .- 5 .*return True<br /><br /></pre><br />Dual to vectors we have covectors. You can think of these as linear functions that consume vectors and return reals. We can define a basis of covectors f<sup>i</sup> with the property that f<sup>i</sup>(e<sub>j</sub>) is 1 for i=j and 0 otherwise. In our Haskell code it'll be more convenient to think of covectors as mapping vectors to the one-dimensional vector space whose basis element is labelled by <tt>()</tt>. When we make a vector using an expression like <tt>return True</tt>, that <tt>True</tt> is a constructor for the type <tt>Bool</tt>. We expect to use the dual operation for covectors, and that dual is case analysis. So, for example, we can write w = f<sup>False</sup>+2*f<sup>True</sup>, wih components (1,2), as<br /><br /><pre><br />> w :: Bool -> V Float ()<br />> w i = case i of<br />> False -> return ()<br />> True -> 2 .* return ()<br /><br /></pre><br />Applying w to v gives the expression v<sup>i</sup>w<sub>i</sub> using the summation convention. Using do-notation it looks like<br /><br /><pre><br />> vw :: V Float ()<br />> vw = reduce $ do<br />> i <- v<br />> w i<br /><br /></pre><br />The point to note is that just like with the summation convention, we don't need to explicitly sum over i. i serves as a kind of iterator object and a vector v automatically supplies i with values in the appropriate range. w then consumes i. So we can translate the summation convention to do-notation by thinking of upper indices corresponding to places that iterators can emerge from, and thinking of lower indices as places where iterators can be consumed.<br /><br />Another example is matrix representing a transformation of a vector to a vector. It must consume one iterator and emit another one. In the Einstein summation convention such a transformation might be written r<sup>i</sup>=M<sup>i</sup><sub>j</sub>v<sup>j</sup>. We could write a simple 2D 2x scale and 90 degree rotation matrix as<br /><br /><pre><br />> m :: Bool -> V Float Bool<br />> m j = case j of<br />> False -> 2 .*return True<br />> True -> -2 .*return False<br /><br /></pre><br />We can then write the transformation as<br /><br /><pre><br />> r = do<br />> j <- v<br />> i <- m j<br />> return i<br /><br /></pre><br />Another simple example is an inner product. We can represent the ordinary dot product as a matrix d<sub>ij</sub> and write the square length of v as v . v = d<sub>ij</sub>v<sup>i</sup>v<sup>j</sup>. We can implement this as<br /><br /><pre><br />> v2 = do<br />> i <- v<br />> j <- v<br />> d (i,j)<br /><br /></pre><br />where we define<br /><br /><pre><br />> d (i,j) = case (i,j) of<br />> (False,False) -> return ()<br />> (True,True) -> return ()<br />> otherwise -> mzero<br /><br /></pre><br />Similarly, tuples of superscripts correspond to iterators that are tuples of indices.<br /><br />With that in place, we're now in a position to consider what monadic notation for knots means in the vector-space-over <tt>Bool</tt> monad. Essentially what we're doing is representing 'components' of a tangle or knot, the cups, caps and crossings, as tensors. When we <a href="http://sigfpe.blogspot.com/2008/08/untangling-with-continued-fractions_16.html">glue these components with labels</a> we're forming sums. In fact, each labelled edge connecting two components corresponds to an index and so we can think of it as possibly taking two different values, <tt>False</tt> or <tt>True</tt>. So the monadic expression corresponds to a sum over all possible ways to assign values from <tt>Bool</tt> to the indices with a corresponding weight for each term. This should be familiar to some people, it's just like what takes place in statistical thermodynamics. We consider a bunch of particles and form a sum over all of the possible states that those particles could have, with a corresponding weight for each configuration of states. A good example is the <a href="http://en.wikipedia.org/wiki/Ising_model">Ising model</a> in which electrons sit at the vertices of a regular lattice. Two neighbouring electrons in a ferromagnet tend to want to align the same way, so confgurations in which neighbouring electrons have the same spin will get assigned a higher weight.<br /><br />In our tangle problem, the trick is to find a magic set of weights with the property that two different summations representing isotopic knots or tangles give the same result. This turns out to be too hard, but we relax it a bit we will be led to one of the most fruitful new approaches in knot theory in recent years - the <a href="http://en.wikipedia.org/wiki/Jones_polynomial">Jones Polynomial</a>. Vaughan Jones was investigating certain types of model from statistical mechanics and came to realise that some of the <a href="http://en.wikipedia.org/wiki/Yang–Baxter_equation">equations</a> that were arising were the same ones needed to get suitable weights for knot theory. From this realisation emerged his eponymous polynomial. But, that's all I'll be saying about this polynomial in this post.<br /><br />Anyway, to make things clearer here's an example:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/SLseCqZv-8I/AAAAAAAAAM0/BCBZvqZunXE/s1600-h/example1.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/SLseCqZv-8I/AAAAAAAAAM0/BCBZvqZunXE/s400/example1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5240815622466567106" /></a><br /><br />We interpret <tt>cap</tt>, <tt>cup</tt>, <tt>over</tt> and <tt>under</tt> as the tensors M<sup>ij</sup>, N<sub>ij</sub>, S<sup>ij</sup><sub>kl</sub> and T<sup>ij</sup><sub>kl</sub>. In monadic notation we get<br /><br /><pre><br />> example (i,j) = do<br />> (k,l) <- over (i,j)<br />> (m,n) <- cap<br />> cup (l,m)<br />> return (k,n)<br /><br /></pre><br />With the summation convention in mind this evaluates to:<br /><blockquote><br />S<sup>ij</sup><sub>kl</sub>N<sup>lm</sup>M<sub>m,n</sub><br /></blockquote><br />and without it we see that all we're doing is computing<br /><blockquote><br />Σ<sub>l,m</sub>S<sup>ij</sup><sub>kl</sub>N<sup>lm</sup>M<sub>m,n</sub><br /></blockquote><br /><br />We can try an elementary choice for our tensors above. We can define M and N to be given by identity matrices:<br /><br /><pre><br />> cap :: V Float (Bool,Bool)<br />> cap = return (False,False) .+ return (True,True)<br /><br />> cup :: (Bool,Bool) -> V Float ()<br />> cup (i,j) = case (i,j) of<br />> (False,False) -> return ()<br />> (True,True) -> return ()<br />> otherwise -> mzero<br /><br /></pre><br />We can then choose S and T to simply swap their arguments:<br /><br /><pre><br />> over,under :: (Bool,Bool) -> V Float (Bool,Bool)<br />> over (i,j) = return (j,i)<br />> under (i,j) = return (j,i)<br /><br /></pre><br />With these choices, the weighting is such that a non-zero weight is assigned to a knot or tangle if and only if all of the labels along a single strand are assigned the same value, in other words we're now summing over ways of assigning labels to entire strands. For rational tangles we can only end up with three possible values corresponding to the three ways the two outputs can be connected to the two inputs. (Straight through, the <a href="http://sigfpe.blogspot.com/2008/08/untangling-with-continued-fractions_23.html">tangle I called</a> ∞; straight through with a swap, the tangle called -1; and connecting together both the inputs, and both the outputs, the tangle called 0.)<br /><br /><pre><br />> tangleInfinity,tangleMinus1,tangleZero :: (Bool,Bool) -> V Float (Bool,Bool)<br />> tangleInfinity (i,j) = return (i,j)<br />> tangleMinus1 (i,j) = return (j,i)<br />> tangleZero (i,j) = do<br />> cup (i,j)<br />> cap<br /><br />> test1 = quickCheck $ \x -> (example x == tangleInfinity x)<br />> test2 = quickCheck $ \x -> (example x == tangleMinus1 x)<br />> test3 = quickCheck $ \x -> (example x == tangleZero x)<br /><br /></pre><br />Only <tt>test2</tt> succeeds so we know <tt>example</tt> is a tangle that forms its outputs by swapping the inputs.<br /><br />So even with this simple assignment we're able to extract some basic connectivity information from our monadic representation. With smarter choices we'll get more information, but that's enough for now. We now have the prerequisites and in the next installment we can start doing some proper knot theory.<br /><HR><br /><pre><br />> data V r b = V [(r,b)] deriving (Eq,Show)<br />> unV (V bs) = bs<br /><br />> instance Num r => Functor (V r) where<br />> fmap f (V bs) = V $ map (\(r,b) -> (r,f b)) bs<br /><br />> scale x (bs) = map (\(r,b) -> (x*r,b)) bs<br /><br />> instance Num r => Monad (V r) where<br />> return b = V [(1,b)]<br />> x >>= f = join (fmap f x)<br />> where join x = g $ fmap unV x<br />> g (V bs) = V $ concat $ map (uncurry scale) bs<br /><br />> instance Num r => MonadPlus (V r) where<br />> mzero = V []<br />> mplus (V x) (V y) = V (x++y)<br /><br />> class Num r => VectorSpace r v | v -> r where<br />> zero :: v<br />> (.+) :: v -> v -> v<br />> (.*) :: r -> v -> v<br />> (.-) :: v -> v -> v<br />> v1 .- v2 = v1 .+ ((-1).*v2)<br /><br />> swap (x,y) = (y,x)<br />> reduce x = V $ filter ((/=0) . fst) $ fmap swap $ M.toList $ M.fromListWith (+) $ fmap swap $ unV $ x<br /><br />> instance (Num r,Ord b) => VectorSpace r (V r b) where<br />> zero = V []<br />> V v1 .+ V v2 = reduce (V (v1 ++ v2))<br />> r .* V v = reduce (V $ scale r v)<br /></pre><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-6218933863329785678?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com1tag:blogger.com,1999:blog-11295132.post-32198956906941070402008-08-23T15:00:00.000-07:002008-08-23T19:15:55.743-07:00Untangling with Continued Fractions: Part 2Recall that I <a href="http://sigfpe.blogspot.com/2008/08/untangling-with-continued-fractions.html">defined a rational tangle</a> to be what you get by starting with a pair of untangled strings whose ends are at infinity and then sliding the strings around as much as you like keeping the ends at infinity. If you don't like the infinity bit, just think of strings inside a sphere with the ends constrained to lie on (but free to slide around) the surface of the sphere. We also insist that the ends end up at the four diagonal directions.<br /><br />Two rational tangles are considered to be <em>isotopic</em> if you can slide the strings about to turn one into the other, this time keeping the ends of the strings fixed. (Such a motion is called an <em>isotopy</em>.)<br /><br />As I've previously mentioned, you only need twists, antitwists and rotations to untangle a rational tangle. Here are the twist operations again:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/SLCJla2tPLI/AAAAAAAAALE/nTrUp43Nhto/s1600-h/twists_and_antitwist.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/SLCJla2tPLI/AAAAAAAAALE/nTrUp43Nhto/s400/twists_and_antitwist.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237837642588830898" /></a><br /><br />I'm using a box to represent some unknown tangle. The antitwist really is an inverse for the twist because one followed by the other gives you what you started with once you pull the strings straight:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SLCJltsrIcI/AAAAAAAAALM/QXwA5jv0__o/s1600-h/cancel.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SLCJltsrIcI/AAAAAAAAALM/QXwA5jv0__o/s400/cancel.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237837647647023554" /></a><br /><br />We can also draw a rotation like this:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/SLCKhY4maVI/AAAAAAAAALU/CS1kUW0N_oM/s1600-h/rotation.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/SLCKhY4maVI/AAAAAAAAALU/CS1kUW0N_oM/s400/rotation.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237838672852052306" /></a><br />Let's call the twist T (so the antitwist is T<sup>-1</sup>) and the rotation S and write products of operations from right to left in the usual way. As both S and T are invertible, they generate a <a href="">group</a>. It's pretty clear that S<sup>4</sup>=1, where 1 is the identity operation. So we don't need a clockwise rotation, we can just use S<sup>3</sup>.<br /><br />But here's a surprising fact: S<sup>2</sup>=1. If we perform a 180 degree rotation on a rational tangle the result might look different at first, but it is isotopic to what we started with. I'll leave you to convince yourself of this, but you can prove it by induction on the number of twists or antitwists you have in your diagram. Try it for simple rational tangles first, maybe real ones made with real string.<br /><br />Here's an even more surprising fact, (TS)<sup>3</sup>=1. I can demonstrate this pictorially. Start with he result of applying T:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/SLCOk0TYjBI/AAAAAAAAALc/29zrFJjcWOA/s1600-h/T.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/SLCOk0TYjBI/AAAAAAAAALc/29zrFJjcWOA/s400/T.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237843129798265874" /></a><br /><br />The result of ST:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/SLCOkzps1WI/AAAAAAAAALk/_ajRWkb2Cg4/s1600-h/ST.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/SLCOkzps1WI/AAAAAAAAALk/_ajRWkb2Cg4/s400/ST.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237843129623434594" /></a><br /><br />The result of TST:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/SLCOlJe4VeI/AAAAAAAAALs/_O0GQhhkA7g/s1600-h/TST.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/SLCOlJe4VeI/AAAAAAAAALs/_O0GQhhkA7g/s400/TST.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237843135483631074" /></a><br /><br />The result of STST:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/SLCOlZIFmTI/AAAAAAAAAL0/-nypXicaAfQ/s1600-h/STST.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/SLCOlZIFmTI/AAAAAAAAAL0/-nypXicaAfQ/s400/STST.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237843139682998578" /></a><br /><br />After TSTST:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/SLCOlm7pBcI/AAAAAAAAAL8/kiYclPD4luw/s1600-h/TSTST.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/SLCOlm7pBcI/AAAAAAAAAL8/kiYclPD4luw/s400/TSTST.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237843143388890562" /></a><br />But by sliding one string behind the tangle, allowed in an isotopy, we find that that is isotopic to this:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SLCOus07lHI/AAAAAAAAAME/H34xep7byu0/s1600-h/TSTST2.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SLCOus07lHI/AAAAAAAAAME/H34xep7byu0/s400/TSTST2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237843299590182002" /></a><br /><br />It's not hard to see that another S brings us back to where we started. (To do that we need to rotate the block labelled A, but that is allowed in an isotopy as we're not moving the ends of the strings.) This means we can write an antitwist in terms of twists and rotations. But it's convenient to keep using antitwists.<br /><br />So we have S<sup>2</sup>=1 and (ST)<sup>3</sup>=1. It can also be proved that any other relationships between S and T can be derived from these using the operations in a group. But this is a well studied <a href="http://en.wikipedia.org/wiki/Modular_group">group</a>. We can approach it like this. Define two functions s and t by<br /><blockquote><br />s(x) = -1/x<br /></blockquote><br /><blockquote><br />t(x) = x+1.<br /></blockquote><br />It's easy to show that s<sup>2</sup>=identity=(st)<sup>3</sup>. In fact, we have an isomorphism between the group of operations we can perform on rational tangles and the functions we can build with s and t. It goes further.<br /><br />Take the rational numbers and throw in ∞ to give us the <em>extended rationals</em>. Let s and t act in the usual way on most of the rationals but throw in the extra rules s(∞)=0, s(0)=∞ and t(∞)=∞ to handle infinity. We have defined an action of s and t on the extended rationals. We can exactly parallel this with the rational tangles. Call this tangle 0:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SLC6TATt-AI/AAAAAAAAAMs/62RhtTPMa7A/s1600-h/zero.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SLC6TATt-AI/AAAAAAAAAMs/62RhtTPMa7A/s400/zero.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237891202294872066" /></a><br /><br />We can reach every rational tangle by applying the operations S and T to 0. But if we replace S with s and T with t then the sequence of s's and t's will give us an extended rational number that labels the tangle. Because S and T obey the same equations as s and t, isotopic knots will end up getting the same extended rational. So we can use extended rationals to label the rational tangles. Here are some examples:<br /><br />Firstly just an antiwist, T<sup>-1</sup>. That corresponds to t<sup>-1</sup>(0)=-1.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/SLCnxjKy62I/AAAAAAAAAMM/rfS0vCpGUtE/s1600-h/-1.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/SLCnxjKy62I/AAAAAAAAAMM/rfS0vCpGUtE/s400/-1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237870836327836514" /></a><br /><br />Two antitwists gives -2:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/SLCnxhlMeBI/AAAAAAAAAMU/dr5u_Fk3Y4M/s1600-h/-2.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/SLCnxhlMeBI/AAAAAAAAAMU/dr5u_Fk3Y4M/s400/-2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237870835901691922" /></a><br /><br />Composing a 90 degree rotation gives<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/SLCnx--BaKI/AAAAAAAAAMk/DAT73AITBbU/s1600-h/half.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/SLCnx--BaKI/AAAAAAAAAMk/DAT73AITBbU/s400/half.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237870843790452898" /></a><br /><br />And applying another antitwist subtracts one again giving:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/SLCnx_Zd7_I/AAAAAAAAAMc/qride1q9jtM/s1600-h/-half.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/SLCnx_Zd7_I/AAAAAAAAAMc/qride1q9jtM/s400/-half.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5237870843905568754" /></a><br />The name <em>rational</em> tangle is justified.<br /><br />So, given any any rational tangle we simply need to find the corresponding extended rational and then figure out how to write it as a composition of s's and t's applied to 0. This is more or less the usual technique for finding the continued fraction representation of a rational. And once we know the sequence, we can then apply it in reverse to untangle the tangle.<br /><br />But there's one catch, how can we go from the monadic representation of a tangle that I described previously to the corresponding extended rational? We'll start on that problem in my next post, but amazingly the vector space monad can be used to do almost all of the work. It's surprising: it's only a short bit of code that works out the untangling operations, but the explication is getting rather long now...<br /><br />An amazing application of this work is in the study of the untangling of DNA. DNA is a double helix and the two parts of the helix must be separated in order to transcribe it. As you might imagine, the DNA can become incredibly tangled by this process. To deal with this, the <a href="http://en.wikipedia.org/wiki/Topoisomerase">topoisomerases</a> snip, manipulate and repair the DNA so as to eliminate these tangles. It was realised that it was possible to work out exactly what operations the topoisomerases were performing by analysing the before and after DNA strings in terms of rational tangles.<br /><br />The perosn who first figured out this relationship between tangles and rational numbers was the ubiquitous <a href="http://en.wikipedia.org/wiki/John_Horton_Conway">JH Conway</a>.<br /><br /><h3>References</H3><br />For the proofs missing above, these references should fill in the details:<br /><a href="http://www.math.uic.edu/~kauffman/RTang.pdf">Rational Tangles</a>, Kauffman and Goldman<br /><a href="http://books.google.com/books?hl=en&id=av05vRwIKIwC">Knots and Physics</a>, Kauffman<br />And if I've lost you above, the following is a nice easy introduction:<br /><a href="http://www.geometer.org/mathcircles/tangle.pdf">Conway's Rational Tangles</a>, Davis<br /><a href="http://www.math.uiowa.edu/~idarcy/ART/p22atopoREV2.pdf">Modeling protein-DNA complexes with tangles</a><br /><br />One last thing. I've noticed some, but not all, papers define the twist oppositely from me.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-3219895690694107040?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com8tag:blogger.com,1999:blog-11295132.post-89007901528783691082008-08-16T20:38:00.000-07:002008-08-17T17:04:52.631-07:00Untangling with Continued Fractions: Part 1Continuing from <a href="http://sigfpe.blogspot.com/2008/08/untangling-with-continued-fractions.html">before</a> It's time now do discuss the problem of forming a representation of a knot or tangle in machine readable form.<br /><br />I've defined rational tangles but I should set that in a wider context. A knot is essentially a single closed loop embedded in 3D space and a link is a non-intersecting union of a bunch of these. Any tangle, knot or link can be projected down to 2D space so that you get a finite collection of 'over' or 'under' crossings connected by arcs. You might have to jiggle things around a bit to ensure that you don't get any degeneracies like two separate parts of a knot being projected to the same segment of curve in 2D, but this is always possible. Here's an example of such a 2D projection, a knot diagram, for the photograph I posted last week:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/SKedxQeYg_I/AAAAAAAAAKU/l9Fjdmk5mtY/s1600-h/knot1.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/SKedxQeYg_I/AAAAAAAAAKU/l9Fjdmk5mtY/s400/knot1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5235326561403044850" /></a><br />Roughly speaking, two knots or links or tangles are equivalent if you can slide the strings about to get from one to the other without passing one string through another. In the case of rational tangles we have the extra constraint that the free ends mustn't move and must always remain 'at infinity' so you can't pass loops over the end. Given two diagrams for rational tangles, the task is to tell when they represent something equivalent. I'll start on this in a future posting, but for now I just want to consider the problem of getting diagrams like that above into machine readable from. (Incidentally, the rigorous mathematical definition of knot equivalence, <em>ambient isotopy</em>, relies on the notion of finding a continuous bijection between the space <em>around</em> the two knots, not the knot itself.)<br /><br /> We don't need anything particularly clever to form our representation as there is a fairly commonplace way to represent connectons between components using monads. If we have some kind of block with inputs a and b and outputs c and d we can represent this as a line of do-notation looking like<br /><pre><br /><br />(c,d) <- block (a,b)<br /><br /></pre><br />Examples of such notation are in Matthew Naylor's <a href="http://www.haskell.org/sitewiki/images/0/03/TMR-Issue7.pdf">article on Lava</a>. We could also use <a href="http://www.haskell.org/arrows/">arrows</a> but in this case there is no need.<br /><br />So now we need to break up a diagram of a rational tangle into components and hook them up, working down the diagram from top to bottom. There are four essential components that can be discerned and I'm calling them cups, caps, overs and unders. Here are the corresponding diagrams for each one:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/SKeefhaoj-I/AAAAAAAAAKs/iVFNgdq9Yo4/s1600-h/components.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/SKeefhaoj-I/AAAAAAAAAKs/iVFNgdq9Yo4/s400/components.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5235327356224704482" /></a><br />Cups have two inputs but no output and caps have two outputs and no inputs.<br /><br />Now all we have to do is label the inputs and outputs of each arc, write the corresponding lines as above, and collect them together in a do-block. Here I've redrawn the above diagram marking the cups and caps with red circles and the overs and unders with green circles.<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/SKedyT2OdFI/AAAAAAAAAKc/4Sh-QC0WCLI/s1600-h/knot3.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/SKedyT2OdFI/AAAAAAAAAKc/4Sh-QC0WCLI/s400/knot3.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5235326579488224338" /></a><br />I've also labelled the inputs and outputs. We can now write the following block of code.<br /><pre><br /><br />> example (a,b) = do<br />> (c,d) <- over (a,b)<br />> (e,f) <- cap<br />> (g,h) <- over (c,e)<br />> (i,j) <- over (f,d)<br />> (m,n) <- cap<br />> (k,l) <- cap<br />> (q,r) <- over (h,k)<br />> (s,y) <- over (l,i)<br />> (o,p) <- over (n,g)<br />> (t,u) <- under (p,q)<br />> (v,w) <- under (r,s)<br />> (x,z) <- over (y,j)<br />> cup (o,t)<br />> cup (u,v)<br />> cup (w,x)<br />> return (m,z)<br /><br /></pre><br />Note how this block has a pair of inputs (a,b) and a pair of outputs (m,z) corresponding to the strings at the top and bottom. Clearly knots and links should have no inputs or outputs but rational tangles should have a pair of inputs and a pair of outputs. The precise order shouldn't matter as long as you don't try to use a name that isn't yet in scope.<br /><br />So what monad should we use? It's easy to imagine some kind of state monad that allows us to generate fresh labels for each of the connections and collects up a <a href="http://web.engr.oregonstate.edu/~erwig/fgl/haskell/">graph</a>-like representation of our tangle. But the surprise is this: it turns out we don't need to do anything complicated like this. With suitable definitions of <tt>cup</tt>, <tt>cap</tt>, <tt>over</tt> and <tt>under</tt>, not only does the <a href="http://sigfpe.blogspot.com/2008/08/hopf-algebra-group-monad.html">vector space monad</a> give us the representation we want, it also does most of our computation for us. But first I need to explain the underlying mathematics in an upcoming installment.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-8900790152878369108?l=blog.sigfpe.com'/></div>sigfpehttp://www.blogger.com/profile/08096190433222340957noreply@blogger.com12