tag:blogger.com,1999:blog-18932188586082199532008-07-17T09:59:22.573+10:00This is mePhilip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comBlogger25125tag:blogger.com,1999:blog-1893218858608219953.post-20427066193249914882008-06-10T21:37:00.006+10:002008-06-10T23:26:06.062+10:00"Dynamic" module generation with compile-time macros<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_6zQp6LtHMTo/SE6ApnDOyKI/AAAAAAAAAD8/YyLZ_gO9N3k/s1600-h/kelphorse.PNG"><img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://bp0.blogger.com/_6zQp6LtHMTo/SE6ApnDOyKI/AAAAAAAAAD8/YyLZ_gO9N3k/s320/kelphorse.PNG" border="0" alt=""id="BLOGGER_PHOTO_ID_5210243271259768994" /></a><br />Over on the Erlang Questions mailing list, Jacob Perkins asked:<br /><br />"I have a few files that are basically lists of words. What I want to do is generate a module whose functions will return each list of words. So if I have two files named "adjectives" and "nouns", then the generated module (let's call it 'grammar') should have two functions, adjectives() and nouns(), that return their respective list of words.<br /><br />How can I do this?"<br /><br /><br />If I had some method of running Erlang code at compile-time - and I do - then I could write a program to slurp in those adjective and noun files and generate the appropriate functions.<br /><br /><br />adjectives.txt:<br /><pre><br />this<br />that<br />these</pre><br /><br />nouns.txt:<br /><pre><br />apple<br />banana<br />carrot</pre><br /><br />grammar.erl:<br /><pre><br />-module(grammar).<br />-export([nouns/0, adjectives/0]).<br />-compile({parse_transform, emp2}).<br /><br />-macro({grammar_macro, generate, [nouns]}).<br />-macro({grammar_macro, generate, [adjectives]}).</pre><br /><br />grammar_macro.erl:<br /><pre><br />-module(grammar_macro).<br />-export([generate/1]).<br /><br />generate(NameFun) -><br /> {ok,Fd} = file:open(atom_to_list(NameFun) ++ ".txt", [read]),<br /> generate(Fd, NameFun, []).<br /><br />generate(Fd, NameFun, Words) -><br /> case io:get_line(Fd, "") of<br /> eof -><br /> ok=file:close(Fd),<br /> io_lib:format("~w() -> ~w.~n", [NameFun, lists:reverse(Words)]);<br /> Word -> generate(Fd, NameFun, [Word | Words])<br /> end.</pre><br /><br />And in the Erlang shell:<br /><pre><br />1> c(grammar_macro), c(grammar).<br />{ok,grammar}<br />2> grammar:nouns().<br />["apple\n","banana\n","carrot\n"]<br />3> grammar:adjectives().<br />["this\n","that\n","these\n"]</pre>Philip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comtag:blogger.com,1999:blog-1893218858608219953.post-43234505795432100242007-11-30T09:35:00.001+10:002007-12-03T08:29:03.777+10:00PARE - PARallel Execution in Erlang<a href="http://bp0.blogger.com/_6zQp6LtHMTo/R09NS6PAIJI/AAAAAAAAAD0/TyYMaEbbr4k/s1600-h/Teacup.png"><img id="BLOGGER_PHOTO_ID_5138410687118188690" style="margin: 0px 0px 10px 10px; float: right;" alt="" src="http://bp0.blogger.com/_6zQp6LtHMTo/R09NS6PAIJI/AAAAAAAAAD0/TyYMaEbbr4k/s320/Teacup.png" border="0" /></a>Update 3/12/2007: Attempt to restore all the missing vertical bars from the code...<br /><br /><br />Don <a href="http://cgi.cse.unsw.edu.au/%7Edons/blog/2007/11/29#smoking">echoed</a> a recent sentiment of mine: "Importantly, and unlike say, Erlang or Concurrent Haskell, we don't have to do manual thread creation, synchronisation or communication -- the compiler does all that for us!"<br /><br />Consider this arbitrary (and rather pointless) piece of Erlang code:<br /><br /><pre><code><br /> A = a(),<br /> b(),<br /> c().</code></pre><br /><br />If I wanted to run the above expressions in parallel (and let us assume that in this case I knew what I was doing, however improbable that might seem), I would normally have to risk RSI with something like this:<br /><br /><pre><code><br /> PidThis = self(),<br /> PidA = spawn_link(fun() -> PidThis ! {self(), a()} end),<br /> spawn_link(fun() -> b() end),<br /> PidC = spawn_link(fun() -> PidThis ! {self(), c()} end),<br /> A = receive {PidA, ResultA} -> ResultA end,<br /> receive {PidC, ResultC} -> ResultC end.</code></pre><br /><br />Ouch.<br /><br />It would be much nicer if we could just tell the compiler that we want the next 'N' expressions to be executed in parallel and have the compiler handle all the rest of the boilerplate. Like this:<br /><br /><pre><code><br /> parallel_execution_3,<br /> A = a(),<br /> b(),<br /> c().</code></pre><br /><br />Rather fortuitously for this blog entry, that is exactly what the PARE parse transformation module does:<br /><br /><pre><code><br />% PARE: PARallel Execution<br />%<br />% Caveats:<br />% Atoms starting with 'parallel_execution_' are consumed by this parse_transform,<br />% and variables starting with 'parallel_execution_' will be created.<br />% A process dictionary counter (key: 'parallel_execution_var_id') will be used<br />% while compiling.<br />% Change the definition of 'PREFIX' and 'VAR_ID' if these are unsuitable for your<br />% codebase.<br />%<br />% Use:<br />% A sequence of expressions beginning with an atom of the form<br />% 'parallel_execution_N' (where N is an integer) will be parallelised by this<br />% parse transformation. The next 'N' expressions (at the same level as the<br />% triggering atom) will be converted into a series of spawn/receive expressions,<br />% and the triggering atom will be removed from the code.<br />%<br />% Return-value order will be preserved: the last expression in a list of<br />% expressions will always be the value returned by that sequence of expressions.<br />%<br />% Future:<br />% Use different triggering atom prefixes for spawn vs spawn_link?<br /><br />-module(pare).<br />-author("Philip Robinson").<br />-vsn('1.0').<br />-export([parse_transform/2]).<br /><br />-define(PREFIX, "parallel_execution_").<br />-define(VAR_ID, parallel_execution_var_id).<br /><br />parse_transform(ASTIn, _Options) -><br /> put(?VAR_ID, 0), ASTOut = ast(ASTIn, []), erase(?VAR_ID), ASTOut.<br /><br />% PARALLELISE_AST/2<br />ast([{function,Line,NameFun,Arity,ClausesIn} | ASTIn], ASTOut) -><br /> ast(ASTIn, [{function,Line,NameFun,Arity,clause(ClausesIn, [])} | ASTOut]);<br />ast([Node | ASTIn], ASTOut) -> ast(ASTIn, [Node | ASTOut]);<br />ast([], ASTOut) -> lists:reverse(ASTOut).<br /><br />% PARALLELISE_CLAUSES/2<br />clause([{clause,Line,Pattern,Guards,ExprsIn} | ClausesIn], ClausesOut) -><br /> clause(ClausesIn, [{clause,Line,Pattern,Guards,expr(ExprsIn, [])}</code><code> | </code><code>ClausesOut]);<br />clause([], ClausesOut) -> lists:reverse(ClausesOut).<br /><br />% PARALLELISE_EXPRS/2 - Searching for a trigger atom.<br />expr([{atom,_,Atom}=Expr</code><code> | </code><code>ExprsIn], ExprsOut) -><br /> AtomStr = atom_to_list(Atom),<br /> case lists:prefix(?PREFIX, AtomStr) of<br /> false -> expr(ExprsIn, [Expr</code><code> | </code><code>ExprsOut]);<br /> true -><br /> N = list_to_integer(element(2, lists:split(length(?PREFIX), AtomStr))),<br /> PidThis = new_var(),<br /> Line = element(2, hd(ExprsIn)),<br /> {RParallelExprs, Rest} = expr(PidThis, ExprsIn, N, [], []),<br /> ExprPidThis = [{match,Line,{var,Line,PidThis},<br /> {call,Line,{atom,Line,self},[]}}],<br /> expr(Rest, RParallelExprs ++ ExprPidThis ++ ExprsOut)<br /> end;<br />expr([{block,Line,Body}</code><code> | </code><code>ExprsIn], ExprsOut) -><br /> expr(ExprsIn,[{block,Line,expr(Body, [])}</code><code> | </code><code>ExprsOut]);<br />expr([{'case',Line,Condition,Clauses}</code><code> | </code><code>ExprsIn], ExprsOut) -><br /> expr(ExprsIn,[{'case',Line,Condition,clause(Clauses,[])}</code><code> | </code><code>ExprsOut]);<br />expr([{'if',Line,Clauses}</code><code> | </code><code>ExprsIn], ExprsOut) -><br /> expr(ExprsIn,[{'if',Line,clause(Clauses,[])}</code><code> | </code><code>ExprsOut]);<br />expr([{'try',Line,Body,CaseClauses,CatchClauses,After}</code><code> | </code><code>ExprsIn], ExprsOut) -><br /> expr(ExprsIn,[{'try',Line,expr(Body,[]), clause(CaseClauses,[]),<br /> clause(CatchClauses,[]), expr(After,[])}</code><code> | </code><code>ExprsOut]);<br />expr([Expr</code><code> | </code><code>ExprsIn], ExprsOut) -> expr(ExprsIn, [Expr</code><code> | </code><code>ExprsOut]);<br />expr([], ExprsOut) -> lists:reverse(ExprsOut).<br /><br />% PARALLELISE_EXPRS/5 - Trigger atom has been found, parallelise the following 'N' expressions.<br />% Build up a list of expressions to spawn and receive.<br />expr(_PidThis, ExprsIn, 0, SpawnExprs, ReceiveExprs) -><br /> {ReceiveExprs ++ SpawnExprs, ExprsIn};<br />% Match expression:<br />% Spawn RHS, match receive value to original LHS.<br />expr(PidThis, [{match,Line,LHS,RHS}</code><code> | </code><code>ExprsIn], N, SpawnExprs, ReceiveExprs) -><br /> VarPid = {var,Line,new_var()},<br /> VarReceive = {var,Line,new_var()},<br /> VarReason = {var,Line,new_var()},<br /> Spawn = {match,Line,VarPid,{call,Line,{atom,Line,spawn_link},<br /> [{'fun',Line,{clauses,[{clause,Line,[],[],[{op,Line,'!',{var,Line,PidThis},<br /> {tuple,Line,[{call,Line,{atom,Line,self},[]},RHS]}}]}]}}]}},<br /> Receive = {match,Line,LHS,{'receive',Line,[<br /> {clause,Line,[{tuple,Line,[VarPid,VarReceive]}], [], [VarReceive]},<br /> {clause,Line,[{tuple,Line,[{atom,Line,'EXIT'},VarReason]}], [],<br /> [{call,Line,{atom,Line,exit},[VarReason]}]}]}},<br /> expr(PidThis, ExprsIn, N - 1, [Spawn</code><code> | </code><code>SpawnExprs], [Receive | ReceiveExprs]);<br />% Last expression in parallel block and not a match expression:<br />% Spawn expression, capture return value as last return from expression sequence.<br />expr(PidThis, [Expr</code><code> | </code><code>ExprsIn], 1, SpawnExprs, ReceiveExprs) -><br /> Line = element(2, Expr),<br /> VarPid = {var,Line,new_var()},<br /> VarReceive = {var,Line,new_var()},<br /> VarReason = {var,Line,new_var()},<br /> Spawn = {match,Line,VarPid,{call,Line,{atom,Line,spawn_link},<br /> [{'fun',Line,{clauses,[{clause,Line,[],[],[{op,Line,'!',{var,Line,PidThis},<br /> {tuple,Line,[{call,Line,{atom,Line,self},[]},Expr]}}]}]}}]}},<br /> Receive = {'receive',Line,[<br /> {clause,Line,[{tuple,Line,[VarPid,VarReceive]}], [], [VarReceive]},<br /> {clause,Line,[{tuple,Line,[{atom,Line,'EXIT'},VarReason]}], [],<br /> [{call,Line,{atom,Line,exit},[VarReason]}]}]},<br /> expr(PidThis, ExprsIn, 0, [Spawn</code><code> | </code><code>SpawnExprs], [Receive</code><code> | </code><code>ReceiveExprs]);<br />% Non-match expression:<br />% Spawn expression, do not wait for a return message.<br />expr(PidThis, [Expr | ExprsIn], N, SpawnExprs, ReceiveExprs) -><br /> Line = element(2, Expr),<br /> Spawn = {call,Line,{atom,Line,spawn},<br /> [{'fun',Line,{clauses,[{clause,Line,[],[],[Expr]}]}}]},<br /> expr(PidThis, ExprsIn, N - 1, [Spawn</code><code> | </code><code>SpawnExprs], ReceiveExprs).<br /><br />% NEW_VAR/0 - Return the next internal PARE variable.<br />new_var() -> list_to_atom(?PREFIX ++ integer_to_list(put(?VAR_ID, get(?VAR_ID) + 1))).</code></pre><br /><br />Here is an Erlang version of Don's 'fib' module, using PARE:<br /><br /><pre><code><br />-module(fib).<br />-export([main/0]).<br />-compile({parse_transform, pare}).<br /><br />fib(0) -> 0;<br />fib(1) -> 1;<br />fib(N) -><br /> parallel_execution_2,<br /> A = fib(N-1),<br /> B = fib(N-2),<br /> A + B.<br /><br />main() -><br /> [io:format("n=~B => ~B~n", [X, fib(X)]) || X <- lists:seq(0, 35)],<br /> ok.</code></pre><br /><br />Postscript:<br /><br />A handy thing to have when developing parse-transformations is a 'showast' module:<br /><br /><pre><code><br />-module(showast).<br />-export([parse_transform/2]).<br /><br />parse_transform(AST, _Options) -><br /> io:format("AST:~n~p~n", [AST]),<br /> AST.</code></pre><br /><br />Include it twice in your testing code to get snapshots of the test module's AST before and after your parse_transform has had a go at it:<br /><br /><pre><code><br />-module(test).<br />-export([test/0]).<br />-compile([{parse_transform, showast},<br /> {parse_transform, pare}, {parse_transform, showast}]).<br /><br />test() -> ok.</code></pre><br /><br />Alternatively, you could just compile the test module with the 'P' compiler option <code>c(test, ['P']).</code> to produce a code listing (in the file "test.P") after the parse-transform has been applied.<br /><br /><br />Post-Postscript:<br /><br />Setting up a macro or two can save a lot of typing with PARE:<br /><br /><pre><code><br />-define(P2, parallel_execution_2).<br /><br />test() -><br /> ?P2,<br /> io:format("1~n"),<br /> io:format("2~n").</code></pre><br /><br />Or you could change PARE to look for a different triggering atom prefix. Be my guest!Philip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comtag:blogger.com,1999:blog-1893218858608219953.post-25231042540960159392007-07-08T03:19:00.000+10:002007-07-28T21:15:23.009+10:00Erlang and the Very Large Binary<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_6zQp6LtHMTo/Ro_Z4qOJxvI/AAAAAAAAADs/0AtfHuz6RUs/s1600-h/HalfOnion06.png"><img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp0.blogger.com/_6zQp6LtHMTo/Ro_Z4qOJxvI/AAAAAAAAADs/0AtfHuz6RUs/s320/HalfOnion06.png" alt="" id="BLOGGER_PHOTO_ID_5084522071754131186" border="0" /></a><br /><span style="font-weight: bold;">Update 28/07/2007:</span> The issue with pattern-matching on very large binaries has been resolved in <a href="http://erlang.org/download/otp_src_R11B-5.readme">R11B5</a>, so the workaround below is no longer needed. Nothing to see here folks, move along... many thanks to the Erlang OTP team for clearing this up.<br /><br />The following post has been kept purely for posterity.<br /><br />(Also, see Per's comment where a better workaround than mine is suggested.)<br /><br /><div style="text-align: center;">----------</div><br /><br />I have these <a href="http://chlorophil.blogspot.com/2007/06/erlang-binary-map.html">binary files I created from the Netflix data</a>. Some of them are quite large, so for peace of mind I had to do a quick check to see whether Erlang could handle binaries of that size.<br /><br />It turns out that Erlang can indeed handle some <a href="http://erlang.org/doc/efficiency_guide/advanced.html#7.2">reasonably large binary sizes</a>. Sort of. There was certainly no problem with loading a 300MB binary into RAM. Accessing the elements of this binary, however, proved to be somewhat of a problem.<br /><br />I had written a simple helper function to manage retrieving elements from memory- or file-based binaries[1]:<br /><br /><pre><code>bin_get(BytesOffset, BytesToRead, Fd={file_descriptor,prim_file,_}) -><br /> {ok,Element} = file:pread(Fd, BytesOffset, BytesToRead),<br /> Element;<br />bin_get(BytesOffset, BytesToRead, Bin) -><br /> <<_:bytesoffset/binary, Element:BytesToRead/binary>> = Bin,<br /> Element.</code></pre><br /><br />For relatively small <code>BytesOffset</code> values everything worked as expected. But as soon as I tried an offset of 134,217,728 bytes or higher I received a <code>badmatch</code> error from <code>bin_get/3</code>... but only for memory-based binaries. Opening a file descriptor to the same binary and retrieving the same offset value worked just fine, if a bit slower.<br /><br /><blockquote><span style="font-weight: bold;">There appears to be a maximum element size limit of 2^27 - 1 for binary pattern matching.</span>[2]</blockquote><br /><br />There was a simple workaround for this limit - all I needed to do was have a few extra clauses in <code>bin_get/3</code> and insert multiple anonymous elements into the binary pattern match where needed. Since my largest binary is just a bit over 300MB I could get away with three clauses:<br /><br /><pre><code>bin_get(BytesOffset, BytesToRead, Bin) when BytesOffset =< 134217727 -><br /> <<_:BytesOfset/binary, Element:BytesToRead/binary>> = Bin,<br /> Element;<br />bin_get(BytesOffset, BytesToRead, Bin) when BytesOffset =< 268435454 -><br /> BytesOffset2 = BytesOffset - 134217727,<br /> <<_:134217727/binary, _:BytesOffset2/binary, Element:BytesToRead/binary>> = Bin,<br /> Element;<br />bin_get(BytesOffset, BytesToRead, Bin) -><br /> BytesOffset2 = BytesOffset - 268435454,<br /> <<_:134217727/binary, _:134217727/binary, _:BytesOffset2/binary, Element:BytesToRead/binary>> = Bin,<br /> Element.</code></pre><br /><br />I was happy to let a call for an offset greater than 402,653,181 to produce a runtime <code>badmatch</code> error, but not so happy to discover that the code above produced a compile error:<br /><br /><code>beam/beam_load.c(1551): Error loading function test:bin_get/3: op bs_skip_bits2 f x w u u: no specific operation found<br />{error,badfile}</code><br /><br />After judicious use of the 'comment out lines of code and recompile the program' debugging technique, I determined that the Erlang compiler really did not like having the two initial anonymous elements in that last binary pattern match. Even turning the underscores into named (but ignored) variables gave the same result.<br /><br /><br />The solution to the problem raised by my solution to the initial problem was to go recursive:<br /><br /><pre><code>-define(MAX_BIN_ELEM_SIZE, 134217727).<br />bin_get(BytesOffset, BytesToRead, Bin) when BytesOffset =< ?MAX_BIN_ELEM_SIZE -><br /> <<_:BytesOffset/binary, Element:BytesToRead/binary>> = Bin,<br /> Element;<br />bin_get(BytesOffset, BytesToRead, <<_:?MAX_BIN_ELEM_SIZE/binary,Bin/binary>>) -><br /> bin_get(BytesOffset - ?MAX_BIN_ELEM_SIZE, BytesToRead, Bin).</code></pre><br /><br />In other words, keep discarding the first 'magic number' bytes from the binary and subtract the magic number from the offset until our offset is equal to or less than the magic number, then access the element in the binary in the usual manner.<br /><br /><br />I am not particularly happy with the need for this hack, but the end result lets me use some large in-memory binaries instead of constantly seeking and reading from disk. (If this last attempt hadn't worked then I was going to try breaking the large binaries into 100-MB chunks, and that would have been a <span style="font-style: italic;">much</span> uglier workaround.)<br /><br /><br /><br />[1] Depending on how much RAM the machine I was running the code on had, I set certain read-only binary files to be either loaded straight into memory or to just have a file handle opened for them. <code>bin_get/3</code> was written to abstract that difference away from the rest of the code. I would have gotten away with it, too, if it weren't for those pesky errors.<br /><br />[2] This would be consistent with using a 32-bit word to store the size value (with 4 of those bits used for a type identifier and 1 bit for a sign indicator). I would expect the element limit on a 64-bit architecture to be somewhat larger, and I wouldn't have noticed this problem with the size of the binaries I am currently using.Philip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comtag:blogger.com,1999:blog-1893218858608219953.post-51324850843581603052007-06-28T21:14:00.000+10:002007-06-30T23:29:12.965+10:00Erlang Binary Map<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_6zQp6LtHMTo/RoOaPqOJxtI/AAAAAAAAADc/v4O89pbzA4M/s1600-h/SubdivideSphere4.bmp"><img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp0.blogger.com/_6zQp6LtHMTo/RoOaPqOJxtI/AAAAAAAAADc/v4O89pbzA4M/s320/SubdivideSphere4.bmp" alt="" id="BLOGGER_PHOTO_ID_5081074398426416850" border="0" /></a><br />(Updated to clean up some hypertext errors in the code examples.)<br /><br />The <a href="http://www.netflixprize.com/">Netflix Challenge</a> <a href="http://www.netflixprize.com/download">dataset</a> is very big, and there is not much RAM in my laptop at all.<br /><br />To fit as much as possible of the rating data into memory, I converted the data into a bunch of Erlang binaries. Erlang binaries are 'just' contiguous blocks of bytes, so they do not incur the overhead of list cons cells or dictionary indexing. They are fast, but pretty simple, and using them to store data like C arrays means that you have to write your own code to manage access to them. (At least you will get a runtime error if you try to access past the end of a binary, which is a major step up from C.)<br /><br /><br />While wandering through the wasteland that is my code I happened to notice that one section did not smell quite right. This code was taking a chunk of that binary data, converting the chunk into a list of elements, and then mapping a function over this list of elements in order to return a list of function results. Creating that intermediate list seemed like a bit of unnecessary overhead - I would much rather iterate directly across the binary itself and save all of that extra list consing - but I could not find any <code>binary:map</code>-like function anywhere.<br /><br />So I wrote my own.<br /><br />It's not very big.<br /><br /><br />The original code had used a <code>list_from_bin</code> function to turn a binary into a list of elements:<br /><br /><pre><code>% Convert a binary into a list of unsigned integer elements.<br />list_from_bin(Bin, BytesPerElem) -><br /> list_from_bin(Bin, BytesPerElem, BytesPerElem * 8, size(Bin), []).<br /><br />list_from_bin(_Bin, _BytesPerElem, _BitsPerElem, 0, Results) -> Results;<br />list_from_bin(Bin, BytesPerElem, BitsPerElem, BytesOffset, Results) -><br /> BytesDiscard = BytesOffset - BytesPerElem,<br /> <<_:BytesDiscard/binary,Element:BitsPerElem/unsigned-integer,_/binary>> = Bin,<br /> list_from_bin(Bin, BytesPerElem, BitsPerElem, BytesDiscard, [Element|Results]).</code></pre><br /><br />And <code>list_from_bin</code> was used in this manner (with a trivial "<code>2 * Elem</code>" function):<br /><br /><code>[2 * N || N <- list_from_bin(<<1,2,3,4>>, 1)].</code><br />-> [2,4,6,8]<br /><br /><code>[2 * N || N <- list_from_bin(<<1,2,3,4>>, 2)].</code><br />-> [516,1544]<br /><br />Note that <code>list_from_bin</code> iterates from the end of the binary backwards down to the beginning of the binary, so the list it builds is in the same order as the original binary and does not need reversing.<br /><br />(If all of my elements were one byte long then I could have just used the standard Erlang <code>binary_to_list/1</code> function, but sometimes the elements I used were two or three bytes in length. I should have probably included an initial clause of "<code>list_from_bin(Bin, 1) -> list_from_binary(Bin);</code>", but didn't think of it at the time.)<br /><br /><br />The new <code>map_bin</code> function maps a function over the elements in a binary, and returns a <em>list</em> of the function's results:<br /><br /><pre><code>map_bin(Fun, Bin, BytesPerElem) -><br /> map_bin(Fun, Bin, BytesPerElem, 0, size(Bin) div BytesPerElem, []).<br /><br />map_bin(_Fun, _Bin, _BytesPerElem, _BytesDiscard, 0, Results) -> lists:reverse(Results);<br />map_bin(Fun, Bin, BytesPerElem, BytesDiscard, CountElemRemain, Results) -><br /> <<_:BytesDiscard/binary,Elem:BytesPerElem/binary,_/binary>> = Bin,<br /> map_bin(Fun, Bin, BytesPerElem, BytesDiscard + BytesPerElem, CountElemRemain - 1, [Fun(Elem)|Results]).</code></pre><br /><br />The main function (<code>map_bin/3</code>) takes these arguments:<br /><ul><li><code>Fun</code>: A function that takes a single binary element as input. May return any value.</li><li><code>Bin</code>: The original binary data block. Its <code>size</code> should be an exact multiple of <code>BytesPerElem</code>. If the <code>size</code> of the binary is not an exact multiple of the <code>BytesPerElem</code> value then any excess bytes at the end of the binary are discarded.</li><li><code>BytesPerElem</code>: The number of bytes taken up by each element in the binary.<br /></li></ul><br />The helper function (<code>map_bin/6</code>) takes three additional arguments:<br /><ul><li><code>BytesDiscard</code>: The number of bytes to skip at the beginning of the binary, for the current iteration.</li><li><code>CountElemRemain</code>: The number of elements remaining to process.</li><li><code>Results</code>: An accumulated, reversed list of the function's results.<br /></li></ul><br />The <code>BytesDiscard</code> argument was added to avoid having to recalculate the number of bytes to skip for every iteration (with, for example, something like "<code>BytesDiscard = Offset * BytesPerElem</code>"). I am not sure if this was a good decision or if it reeks too much of premature optimisation. Old C habits die hard.<br /><br /><code>CountElemRemain</code> starts at the number of elements to process and decrements each iteration so the terminating condition can be written simply as <code>0</code>, rather than having to have a "<code>when Offset > CountElems</code>" guard on the function clause.<br /><br />And here is <code>map_bin</code> in action:<br /><br /><code>map_bin(fun(<< Elem:8>>) -> 2 * Elem end, <<1,2,3,4>>, 1).</code><br />-> [2,4,6,8]<br /><br /><code>map_bin(fun(<< Elem:16>>) -> 2 * Elem end, <<1,2,3,4>>, 2).</code><br />-> [516,1544]<br /><br /><br />Now with the new <code>map_bin</code> function my code can skip the creation of an intermediate list, and, quite entirely by accident, is actually more flexible than before. The original code always produced lists of unsigned integers from the binaries; the new code can be used to operate on multiple elements. For example:<br /><br /><code>map_bin(fun(<< Elem1:8,Elem2:16>>) -> Elem1 + Elem2 end, <<1,2,3,4,5,6>>, 3).</code><br />-> [516,1290]<br /><br />It's just not <span style="font-style: italic;">quite</span> as nice to look at as a good list comprehension.<br /><br /><br /><span style="font-size:130%;">Performance</span><br /><br />The <code>map_bin</code> function is all well and good, but the real question we all want answered is... does using this new code actually make our program run any faster?<br /><br />Well, according to my <span style="font-weight:bold;">very</span> informal use of <code>now/0</code> and <code>timer:now_diff/2</code>, with large binaries and a trivial "x2" function for each element, <code>map_bin</code> seems to be around 11% faster than using <code>list_from_bin</code>. That's... not too bad. But we could go faster with multiple processes, I think.<br /><br /><br /><span style="font-size:130%;">Parallelism</span><br /><br />Really, what is the point of using Erlang if we don't <code>spawn</code> a few hundred processes for every trivial piece of code?<br /><br />Luckily for my fingers it is only a minor modification to make a parallel version of <code>map_bin</code>:<br /><br /><pre><code>pmap_bin(Fun, Bin, BytesPerElem) -><br /> pmap_bin(Fun, Bin, BytesPerElem, 0, size(Bin) div BytesPerElem, []).<br /><br />pmap_bin(_Fun, _Bin, _BytesPerElem, _BytesDiscard, 0, Pids) -><br /> [receive {done, Pid, Result} -> Result end || Pid <- lists:reverse(Pids)];<br />pmap_bin(Fun, Bin, BytesPerElem, BytesDiscard, CountElemRemain, Pids) -><br /> PidThis = self(),<br /> <<_:BytesDiscard/binary,Elem:BytesPerElem/binary,_/binary>> = Bin,<br /> pmap_bin(Fun, Bin, BytesPerElem, BytesDiscard + BytesPerElem, CountElemRemain - 1,<br /> [spawn(fun() -> PidThis ! {done, self(), Fun(Elem)} end)|Pids]).</code></pre><br /><br />Sadly, I cannot comment much on the speed difference of this conversion because I do not (yet) have access to a multi-core machine. A routine like this is probably best avoided for a single-CPU system as the overhead of spawning many processes would be wasted, but it should perform better on multiple-CPU systems. (Feel free to try it out and let me know how it goes!)<br /><br /><br />There is still room for improvement in this function, if we wanted to take it further. We may not want to spawn a separate process for each element, for instance, but rather split the original binary into N chunks and spawn a process to apply a function to each chunk. Also, we might want to expand on the parallelism and include other nodes into the mix, to spread the calculations across multiple machines.<br /><br />Something for another day.<br /><br /><br /><span style="font-size:78%;">And now someone is going to tell me that binary comprehensions have been available in the standard Erlang distribution for a while, and I just haven't been paying enough attention to the Erlang mailing-list announcements.<br /></span>Philip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comtag:blogger.com,1999:blog-1893218858608219953.post-4407938042117253442007-06-22T13:28:00.000+10:002007-06-22T13:47:40.182+10:00Erlang Macro Oddness<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_6zQp6LtHMTo/RntCTbit3BI/AAAAAAAAADU/grrwczgjlcU/s1600-h/SubdivideSphere3.bmp"><img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://bp3.blogger.com/_6zQp6LtHMTo/RntCTbit3BI/AAAAAAAAADU/grrwczgjlcU/s320/SubdivideSphere3.bmp" border="0" alt=""id="BLOGGER_PHOTO_ID_5078725906368683026" /></a><br />I found a little oddity with Erlang macros while I was writing version 2 of the <a href="http://chlorophil.blogspot.com/2007/06/collaborative-filtering-weighted-slope_20.html">Weighted Slope One</a> algorithm. It seems that passing a multi-statement anonymous function as a parameter into a macro confuses the compiler.<br /><br />For example, this code works:<br /><pre><code>-module(macro_oddness).<br />-export([start/0]).<br />-define(A_MACRO(FunAnon), apply(FunAnon, [])).<br /><br />start() -><br /> ?A_MACRO(<br /> fun() -><br /> io:format("Single-element function works fine.~n")<br /> end).</code></pre><br /><br />But this code produces a compile-time error:<br /><pre><code>-module(macro_oddness).<br />-export([start/0]).<br />-define(A_MACRO(FunAnon), apply(FunAnon, [])).<br /><br />start() -> ?A_MACRO(<br /> fun() -><br /> io:format("Multiple-element function "),<br /> io:format("does not compile.~n")<br /> end).</code></pre><br /><br />An "argument mismatch for macro ''A_MACRO''" error, to be precise.<br /><br />Interestingly, multiple statements in a begin..end block seem to be okay:<br /><br /><pre><code>-module(macro_oddness).<br />-export([start/0]).<br />-define(A_MACRO(FunAnon), apply(FunAnon, [])).<br /><br />start() -> ?A_MACRO(<br /> fun() -><br /> begin<br /> io:format("Multiple-element function "),<br /> io:format("with a begin..end block is okay.~n")<br /> end<br /> end).</code></pre><br /><br />Something to keep an eye out for.Philip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comtag:blogger.com,1999:blog-1893218858608219953.post-66358471854652538672007-06-20T22:30:00.000+10:002007-06-20T22:46:26.100+10:00Collaborative Filtering: Weighted Slope One in Erlang (v2)<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_6zQp6LtHMTo/Rnkefrit3AI/AAAAAAAAADM/0WIo7Urb1kw/s1600-h/wave2.png"><img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://bp2.blogger.com/_6zQp6LtHMTo/Rnkefrit3AI/AAAAAAAAADM/0WIo7Urb1kw/s320/wave2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5078123584450059266" /></a><br />Okay, so my initial Weighted Slope One Erlang translation wasn't very Erlang-ish... not a single <code>spawn</code> in sight and side-effects galore. Yuck.<br /><br />I've ripped out ETS and replaced it with dictionaries, and modified the <code>build_matrix</code> loops to palm off the real work to <code>spawn</code>ed processes rather than do it themselves. <br /><br />The main change was with the <code>build_matric</code> function. A long-running process is <code>spawn</code>ed for every item in the system (the 'X' item in the {x, y} difference/frequency matrix), and short-term user rating processes are <code>spawn</code>ed to send off difference and frequency information to the relevant item and wait for confirmation that the message was received. Once all of the data has been sent the item processes are asked to return their individual dictionaries to the main process, which then merges them into one big dictionary.<br /><br />The item processes die naturally after they return their dictionaries, and the user rating processes only live long enough to ensure that their data is received by the relevant item process.<br /><br />The only other significant changes to the program were to make the <code>predict</code> function use a dictionary rather than an ETS table.<br /><br /><br />The concurrent bits of this program use this idiom[*]:<br /><br /><pre><code>process_flag(trap_exit, true),<br />[receive<br /> {'EXIT', Pid, normal} -> ok;<br /> {'EXIT', Pid, Reason} -> exit(self(), Reason)<br /> end<br /> || Pid <- [spawn_link(fun() -> <span style="font-weight:bold;">>>do something<<</span> end)<br /> || <span style="font-weight:bold;">>>value/s <- list of value/s<<</span>]]</code></pre><br /><br />In other words: <code>spawn</code> a process to do something for every element in the list of values, and wait for each of those processes to signal that they have finished normally. (I used a substitution macro to avoid typing in all that boilerplate.)<br /><br /><br />An unfortunate consequence of this modification is that the code is about 50% larger than the earlier Erlang version:<br /><br /><pre><code>%% slopeone2.erl<br />% Philip Robinson<br />% A parallel implementation of the weighted slope one<br />% algorithm in Erlang for item-based collaborative<br />% filtering.<br />% Based on the same algorithm in Java by Daniel Lemire and Marco Ponzi:<br />% http://www.daniel-lemire.com/fr/documents/publications/SlopeOne.java<br /><br />-module(slopeone2).<br />-export([start/0]).<br /><br />-define(SPAWN_AND_WAIT(Block, Data),<br /> process_flag(trap_exit, true),<br /> [receive<br /> {'EXIT', Pid, normal} -> ok;<br /> {'EXIT', Pid, Reason} -> exit(self(), Reason)<br /> end<br /> || Pid <- [spawn_link(fun() -> Block end)<br /> || Data]]).<br /><br />start() -><br /> % The rating database: A list of users, each containing a list of {item, rating} elements.<br /> Items = [{item1,"candy"}, {item2,"dog"}, {item3,"cat"}, {item4,"war"}, {item5,"strange food"}],<br /> DataRating = [{user1, "Bob", [{item1,1.0}, {item2,0.5}, {item4,0.1}]},<br /> {user2, "Jane", [{item1,1.0}, {item3,0.5}, {item4,0.2}]},<br /> {user3, "Jo", [{item1,0.9}, {item2,0.4}, {item3,0.5}, {item4,0.1}]},<br /> {user4, "StrangeJo", [{item1,0.1}, {item4,1.0}, {item5,0.4}]}],<br /> % The difference & frequency database: a dictionary of {{item X, itemY}, {diff, freq}}.<br /> % Create the predictor engine.<br /> DiffFreq = build_matrix(Items, DataRating),<br /> io:format("Here's the data I have accumulated...~n"),<br /> print_data(Items, DataRating, DiffFreq),<br /> io:format("Ok, now we predict...~n"),<br /> TestRatings1 = [{item5,0.4}],<br /> io:format("Inputting...~n"),<br /> print_user_ratings(Items, TestRatings1),<br /> io:format("Getting...~n"),<br /> print_user_ratings(Items, predict(Items, TestRatings1, DiffFreq)),<br /> TestRatings2 = [{item4,0.2}|TestRatings1],<br /> io:format("Inputting...~n"),<br /> print_user_ratings(Items, TestRatings2),<br /> io:format("Getting...~n"),<br /> print_user_ratings(Items, predict(Items, TestRatings2, DiffFreq)),<br /> ok.<br /><br />% Based on existing data, and using weights, try to predict all missing ratings.<br />% The trick to make this more scalable is to consider only difference entries<br />% having a large (> 1) frequency entry.<br />% Precondition: ItemRatings is sorted as per lists:sort/1 (to re-merge actual ratings).<br />predict(Items, ItemRatings, DiffFreq) -><br /> % Gather the sum of the rating differences and frequencies for all available items given the known item ratings.<br /> RatingsPredicted = lists:foldl(<br /> fun({IdItemX, IdItemY, RatingX}, Dict) -><br /> case dict:find({IdItemY, IdItemX}, DiffFreq) of<br /> error -> Dict;<br /> {ok, {Diff, DFreq}} -><br /> dict:update(IdItemY,<br /> fun({Pred, PFreq}) -> {Pred + ((RatingX + Diff) * DFreq), PFreq + DFreq} end,<br /> {(RatingX + Diff) * DFreq, DFreq},<br /> Dict)<br /> end<br /> end,<br /> dict:new(),<br /> [{IdItemX, IdItemY, RatingX}<br /> || {IdItemX, RatingX} <- ItemRatings,<br /> {IdItemY, _} <- Items]),<br /> % Put the original (actual) ratings back into our prediction list.<br /> RatingsPredictedAndActual = lists:foldl(<br /> fun({Item,Rating}, Ratings) -> dict:store(Item, {Rating, 1}, Ratings) end,<br /> RatingsPredicted, ItemRatings),<br /> % Divide the total rating difference by the frequency and return as a list.<br /> [{Item, Rating / Freq} || {Item, {Rating, Freq}} <- dict:to_list(RatingsPredictedAndActual)].<br /><br /><br />print_data(Items, DataRating, DiffFreq) -><br /> [begin io:format("~s~n", [Name]), print_user_ratings(Items, ItemRatings) end<br /> || {_Id, Name, ItemRatings} <- DataRating],<br /> [print_item_diffs(Items, Item, DiffFreq) || Item <- Items],<br /> io:format("~n").<br /><br />print_user_ratings(Items, ItemRatings) -><br /> [begin {value, {Item, NameItem}} = lists:keysearch(Item, 1, Items),<br /> io:format(" ~12s --> ~4.2f~n", [NameItem, Rating]) end<br /> || {Item, Rating} <- lists:sort(ItemRatings)].<br /><br />print_item_diffs(Items, {ItemX, Name}, DiffFreq) -><br /> io:format("~n~12s:", [Name]),<br /> [case dict:find({ItemX, ItemY}, DiffFreq) of<br /> error -> io:format(" ");<br /> {ok, {Diff, Freq}} -> io:format(" ~6.3f ~1B", [Diff, Freq])<br /> end || {ItemY, _} <- Items].<br /><br /><br />% Long-running itemX process to manage a dictionary of {{itemX, itemY}, {Diff, Freq}} entries.<br />dict_itemX(IdItemX, DictDiffFreq) -><br /> receive<br /> {data, PidSender, IdItemY, DiffNew} -><br /> PidSender ! {done, self(), IdItemY},<br /> dict_itemX(IdItemX,<br /> dict:update({IdItemX, IdItemY},<br /> fun({DiffOld, FreqOld}) -> {DiffOld + DiffNew, FreqOld + 1} end,<br /> {DiffNew, 1}, DictDiffFreq));<br /> {results, PidParent} -> PidParent ! {results, self(), DictDiffFreq}<br /> end.<br /><br />% Build a matrix (dictionary) of {{itemX, itemY}, Difference, Frequency} entries.<br />build_matrix(Items, DataRating) -><br /> PidThis = self(),<br /> % Create a bunch of long-running processes to manage itemX data.<br /> PidItems = dict:from_list(<br /> [{IdItemX, spawn(fun() -> dict_itemX(IdItemX, dict:new()) end)} || {IdItemX, _NameItem} <- Items]),<br /> % Spawn a short-term process for each user's ratings and wait until they are all done.<br /> ?SPAWN_AND_WAIT(process_user_ratings(PidItems, Ratings), {_, _, Ratings} <- DataRating),<br /> % Retrieve and merge all the item process dictionaries, then divide all differences by their frequency.<br /> dict:map(<br /> fun(_Key, {Diff, Freq}) -> {Diff / Freq, Freq} end,<br /> dict:fold(<br /> fun(_IdItemX, PidItemX, DictIn) -><br /> PidItemX ! {results, PidThis},<br /> receive<br /> {results, PidItemX, DiffFreq} -><br /> dict:merge(fun(_, _, _) -> io:format("Key collision~n") end, DictIn, DiffFreq)<br /> end<br /> end,<br /> dict:new(), PidItems)).<br /><br />process_user_ratings(PidItems, Ratings) -><br /> % Spawn a short-term process for each itemX rating and wait until they are all done.<br /> ?SPAWN_AND_WAIT(process_user_itemX_ratings(dict:fetch(IdItemX, PidItems), RatingX, Ratings),<br /> {IdItemX, RatingX} <- Ratings).<br /><br />process_user_itemX_ratings(PidItemX, RatingX, Ratings) -><br /> % Spawn a process for each itemY rating that sends a message to the long-running itemX process,<br /> % waits for confirmation that its message has been processed, then signals that it is done.<br /> ?SPAWN_AND_WAIT(begin<br /> PidItemX ! {data, self(), IdItemY, RatingX - RatingY},<br /> receive {done, PidItemX, IdItemY} -> ok end<br /> end, {IdItemY, RatingY} <- Ratings).</code></pre><br /><br /><br /><br />[*] If trapping exits is not your cup of tea then you could use plain old <code>spawn</code> like this:<br /><br /><pre><code>PidThis = self(),<br />[receive {done, Pid} -> ok end<br /> || Pid <- [spawn(fun() -><br /> <span style="font-weight:bold;">>>do something<<</span>,<br /> PidThis ! {done, self()}<br /> end)<br /> || <span style="font-weight:bold;">>>value/s <- list of value/s<<</span>]]</code></pre><br /><br />I had used this code until I thought of using <code>spawn_link</code>. I am not sure which version would be considered better by Erlang Gurus, but my personal preference leans towards <code>spawn_link</code>. <code>Spawn_link</code> seems easier to extend to handle multiple Erlang nodes, and if I was concerned about setting my main process to trap all exits then I could simply <code>spawn</code> a single process to manage all of the other <code>spawn_link</code>ing.Philip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comtag:blogger.com,1999:blog-1893218858608219953.post-57695300979790461602007-06-19T01:00:00.000+10:002007-06-19T01:32:20.718+10:00Collaborative Filtering: Weighted Slope One in Erlang<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_6zQp6LtHMTo/RnakRbit2_I/AAAAAAAAADE/45dS-70tKa8/s1600-h/wave.png"><img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://bp2.blogger.com/_6zQp6LtHMTo/RnakRbit2_I/AAAAAAAAADE/45dS-70tKa8/s320/wave.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5077426249264913394" /></a><br />I have been toying with the <a href="http://netflixprize.com/">Netflix Challenge</a> for a while now. It's fascinating stuff.<br /><br />I knew nothing about collaborative filtering when I started this project, but that it pretty normal for my coding hobbies. (Hey, why would I start something new if I already knew how to do it?)<br /><br />During my standard "collect and absorb everything" phase I ran across an article on Wikipedia that described the <a href="http://en.wikipedia.org/wiki/Slope_One">Slope One algorithm</a>. This article had links to various implementations of the algorithm, including a <a href="http://www.daniel-lemire.com/fr/documents/publications/SlopeOne.java">standalone Java program by Daniel Lemire</a>. More information on the Weighted Slope One algorithm may be found on <a href="http://www.daniel-lemire.com/fr/abstracts/SDM2005.html">Daniel's site</a>.<br /><br />I am not using Slope One in my current Netflix algorithm attempt, but I translated Daniel's Java code into Erlang as a learning exercise anyway:<br /><br /><pre><code>%% slopeone.erl<br />% Philip Robinson<br />% A simple implementation of the weighted slope one<br />% algorithm in Erlang for item-based collaborative<br />% filtering.<br />% Based on the same algorithm in Java by Daniel Lemire and Marco Ponzi:<br />% http://www.daniel-lemire.com/fr/documents/publications/SlopeOne.java<br /><br />-module(slopeone).<br />-export([start/0]).<br /><br />start() -><br /> % The rating database: A list of users, each containing a list of {item, rating} elements.<br /> Items = [{item1,"candy"}, {item2,"dog"}, {item3,"cat"}, {item4,"war"}, {item5,"strange food"}],<br /> DataRating = [{user1, "Bob", [{item1,1.0}, {item2,0.5}, {item4,0.1}]},<br /> {user2, "Jane", [{item1,1.0}, {item3,0.5}, {item4,0.2}]},<br /> {user3, "Jo", [{item1,0.9}, {item2,0.4}, {item3,0.5}, {item4,0.1}]},<br /> {user4, "StrangeJo", [{item1,0.1}, {item4,1.0}, {item5,0.4}]}],<br /> % The difference & frequency database: an ETS table of {{item X, itemY}, diff, freq}.<br /> ets:new(diff_freq, [private, set, named_table]),<br /> % Create the predictor engine.<br /> build_matrix(DataRating),<br /> io:format("Here's the data I have accumulated...~n"),<br /> print_data(Items, DataRating),<br /> io:format("Ok, now we predict...~n"),<br /> TestRatings1 = [{item5,0.4}],<br /> io:format("Inputting...~n"),<br /> print_user_ratings(Items, TestRatings1),<br /> io:format("Getting...~n"),<br /> print_user_ratings(Items, predict(Items, TestRatings1)),<br /> TestRatings2 = [{item4,0.2}|TestRatings1],<br /> io:format("Inputting...~n"),<br /> print_user_ratings(Items, TestRatings2),<br /> io:format("Getting...~n"),<br /> print_user_ratings(Items, predict(Items, TestRatings2)),<br /> ets:delete(diff_freq).<br /><br />% Based on existing data, and using weights, try to predict all missing ratings.<br />% The trick to make this more scalable is to consider only diff_freq entries<br />% having a large (> 1) frequency entry.<br />predict(Items, ItemRatings) -><br /> PredFreq = ets:new(pred_freq, []),<br /> [ets:insert(PredFreq, {Item, 0.0, 0}) || {Item, _} <- Items],<br /> [[case {ets:match(diff_freq, {{ItemY, ItemX}, '$1', '$2'}),<br /> ets:match(PredFreq, {ItemY, '$1', '$2'})} of<br /> {[], _} -> ok;<br /> {[[Diff, DFreq]], [[Pred, PFreq]]} -><br /> ets:insert(PredFreq, {ItemY, Pred + ((RatingX + Diff) * DFreq), PFreq + DFreq})<br /> end || {ItemY, _} <- Items] || {ItemX, RatingX} <- ItemRatings],<br /> ets:match_delete(PredFreq, {'_', '_', 0}), % Remove all zero-frequency predictions.<br /> [ets:insert(PredFreq, {Item, Rating, 1}) || {Item, Rating} <- ItemRatings], % Re-insert actual ratings.<br /> Results = [{Item, Rating / Freq} || {Item, Rating, Freq} <- ets:tab2list(PredFreq)],<br /> ets:delete(PredFreq),<br /> Results.<br /><br />print_data(Items, DataRating) -><br /> [begin io:format("~s~n", [Name]),<br /> print_user_ratings(Items, ItemRatings) end<br /> || {_Id, Name, ItemRatings} <- DataRating],<br /> [print_item_diffs(Items, Item) || Item <- Items],<br /> io:format("~n").<br /><br />print_user_ratings(Items, ItemRatings) -><br /> [begin {value, {Item, NameItem}} = lists:keysearch(Item, 1, Items),<br /> io:format(" ~12s --> ~4.2f~n", [NameItem, Rating]) end<br /> || {Item, Rating} <- lists:sort(ItemRatings)].<br /><br />print_item_diffs(Items, {Item, Name}) -><br /> io:format("~n~12s:", [Name]),<br /> [case ets:match(diff_freq, {{Item, Id}, '$1', '$2'}) of<br /> [] -> io:format(" ");<br /> [[Diff, Freq]] -> io:format(" ~6.3f ~1B", [Diff, Freq])<br /> end || {Id, _} <- Items].<br /><br />% Build a matrix (ETS table) of {{itemX, itemY}, Difference, Frequency} entries.<br />build_matrix(DataRating) -><br /> % Gather the sum difference and the total count (frequency).<br /> [[[case ets:lookup(diff_freq, {ItemX, ItemY}) of<br /> [] -> ets:insert(diff_freq, {{ItemX, ItemY}, RatingX - RatingY, 1});<br /> [{Key, Diff, Freq}] -> ets:insert(diff_freq, {Key, Diff + RatingX - RatingY, Freq + 1})<br /> end || {ItemY, RatingY} <- ItemRatings]<br /> || {ItemX, RatingX} <- ItemRatings]<br /> || {_, _, ItemRatings} <- DataRating],<br /> % Divide sum of difference by frequency to get mean difference.<br /> DivideFun = fun({Key, Diff, Freq}, _) -> ets:insert(diff_freq, {Key, Diff / Freq, Freq}) end,<br /> ets:foldl(DivideFun, undefined, diff_freq).</code></pre><br /><br />Some musings:<br /><br />* I do not really consider this code to be a typical "Erlang-style" program. In particular, I am making substantial use of side-effects via ETS tables; comparisons between this code and the Java original will probably not be representative of comparisons between Erlang and Java programs in general.<br /><br />* I may have gone a little overboard with the use of list comprehensions. I did not really like LCs when first exposed to them and it seems that I am overcompensating for that now.<br /><br />* While some functions are obviously shorter in this Erlang version (compare <code>build_matrix</code> and <code>buildDiffMatrix</code>, for example), I am not convinced that they are necessarily clearer in Erlang than in Java. At least one advantage of smaller functions is that I can fit more of them on my 8.9" screen, but if that was my main concern then I would be programming in something even less verbose.<br /><br />* While I haven't delved into this much, I suspect that using ETS has made this program harder to parallelise effectively. While the loops could easily be converted to use a parallel map algorithm, all CRUD activity still has to go through a single ETS process bottleneck. One possible way of utilising multiple CPUs might be to spawn a bunch of processes to manage individual items, send each of these messages regarding specific ratings, and then convert the results into a dictionary via a list comprehension of <code>receive</code> statements and <code>dict:from_list/1</code>.<br /><br />* I did run the Dialyzer over the code but I have not even considered optimising it for performance. It <span style="font-style:italic;">will</span> be slower than the Java version. :-)Philip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comtag:blogger.com,1999:blog-1893218858608219953.post-84681690419019844872007-04-22T20:54:00.000+10:002007-04-22T22:20:10.562+10:00Erlang Macro Processor (v2), Part V<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_6zQp6LtHMTo/Ris_WkGm_VI/AAAAAAAAAC8/3tr4VPEa5Y4/s1600-h/CoralCrabLobsterRockSmall.png"><img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp1.blogger.com/_6zQp6LtHMTo/Ris_WkGm_VI/AAAAAAAAAC8/3tr4VPEa5Y4/s320/CoralCrabLobsterRockSmall.png" alt="" id="BLOGGER_PHOTO_ID_5056204663534583122" border="0" /></a>The final step for EMP2 is to expand any remote macro function calls and insert the results back into the AST.<br /><br />Naively we would just follow the same pattern as the <code>macro</code> attribute expansion that we have just added:<br /><pre><code>node_parse(Node={call,Line,{remote,_,{atom,_,Mod},{atom,_,Fun}},Args}, Mods) -><br /> case lists:member(Mod, Mods) of<br /> true -><br /> ast_from_results(lists:flatten([apply(Mod,Fun,Args)|" "]), Line, []);<br /> false -> setelement(4,Node,node_parse(Args, Mods))<br /> end;<br /></code></pre>But if we do, we find that there are three(!) problems with this approach.<br /><br />Firstly, <code>ast_from_results</code> is currently using <code>erl_parse:parse_form</code> to turn the textual macro results into an AST. This only works for complete Erlang forms (function definitions) and not for, say, a set of three Erlang expressions to be inserted into a function. We can fix this by using <code>erl_parse:parse_exprs</code> instead, but we will also have to append a full-stop and space to the result string (instead of just a space) to get it to work properly.<br /><br />Secondly, the arguments for the function call are all in AST format with tuples and line numbers everywhere. We cannot just apply the function directly to these arguments; we need to convert them back to something more usable.<br /><br />Finally, we may receive more than one Erlang expression from the macro. To fit these back into the space of a single node we have to wrap them in a <code>block</code> expression.<br /><br /><br />To tackle the first issue we need to update <code>ast_from_results</code> a little:<br /><pre><code>ast_from_results(FunParse, ResultsString, LineStart, ASTResults) -><br /> case remove_leading_whitespace(ResultsString) of<br /> "" -> lists:flatten(lists:reverse(ASTResults));<br /> String -><br /> {done,{ok,Tokens,LineEnd},StringRest} =<br /> erl_scan:tokens([], String, LineStart),<br /> {ok, AST} = erl_parse:FunParse(Tokens),<br /> ast_from_results(FunParse, StringRest, LineEnd, [AST|ASTResults])<br /> end.<br /></code></pre><br />As an aside, you might like to have a closer look at that <code>erl_parse:FunParse</code> call.<br /><br />Yes, instead of hard-coding a function call or adding an extra <code>if</code> statement, we are calling the <code>erl_parse</code> function via a variable whose value we will not know until run-time<span style="font-weight: bold;">[1]</span>. Doesn't thinking about that just make you go all tingly inside? No? Me neither. Of course.<br /><br />We can now use <code>ast_from_results</code> for <code>erl_parse:parse_form</code> and <code>erl_parse:parse_exprs</code> situations with only a single additional "erl_parse function" argument.<br /><br /><br />For the second issue I am going to use a cheap and nasty hack. Because we are not (yet) supporting anything fancier than literal terms in the argument list, we can get away with this little bit of trickery to convert the arguments into something usable by our call to <code>apply</code>:<br /><pre><code>ArgsLiteral = [Value || {_Type,_Line,Value} <- Args]. </code></pre><br /><br />The third issue is also very easily fixed by wrapping the call to <code>ast_from_results</code> in a <code>block</code> expression tuple. We should only do this if there is more than one node in the results list:<br /><pre><code>node_parse(Node={call,Line,{remote,_,{atom,_,Mod},{atom,_,Fun}},Args}, Mods) -><br /> case lists:member(Mod, Mods) of<br /> true -><br /> ArgsLiteral = [Value || {_Type,_Line,Value} <- Args],<br /> Results = lists:flatten([apply(Mod,Fun,ArgsLiteral)|". "]),<br /> case length(Results) of<br /> 1 -> hd(Results);<br /> _ -> {block,Line,ast_from_results(parse_exprs, Results, Line, [])}<br /> end;<br /> false -> setelement(4,Node,node_parse(Args, Mods))<br /> end;<br /></code></pre><br />Oh, and of course we need to update the other <code>node_parse</code> function clause to include the new argument to <code>ast_from_results</code>:<br /><pre><code>node_parse({attribute,Line,macro,{Mod,Fun,Args}}, _Mods) -><br /> ast_from_results(parse_form, lists:flatten([apply(Mod,Fun,Args)|" "]), Line, []);<br /></code></pre><br /><br />And with any luck we are done. Let's try it out on our <a href="http://chlorophil.blogspot.com/2007/04/erlang-macro-processor-v2-part-i.html">example code</a>.<br /><pre><code><br />1> CL = fun(F) -> c(F), l(F) end.<br />#Fun<br />2> CL(emp2), CL(example_macro), CL(example).<br />{module, example}<br />3> [example:lookup(N) || N <- lists:seq(0, 3)]. [0,2,4,6] 4><br /></code></pre><br />Yep. EMP2 is done.<br /><br /><br />The full listing:<br /><pre><code><br />-module(emp2).<br />-author("Philip Robinson").<br />-vsn('1.0').<br />-export([parse_transform/2]).<br /><br />parse_transform(AST, _Options) -><br /> Mods = lists:flatten([Mods || {attribute,_Line,macro_modules,Mods} <- AST]),<br /> lists:flatten([node_parse(Node, Mods) || Node <- AST]).<br />node_parse({attribute,Line,macro,{Mod,Fun,Args}}, _Mods) -><br /> ast_from_results(parse_form, lists:flatten([apply(Mod,Fun,Args)|" "]), Line, []);<br />node_parse(Node={call,Line,{remote,_,{atom,_,Mod},{atom,_,Fun}},Args}, Mods) -><br /> case lists:member(Mod, Mods) of<br /> true -><br /> ArgsLiteral = [Value || {_Type,_Line,Value} <- Args],<br /> Results = lists:flatten([apply(Mod,Fun,ArgsLiteral)|". "]),<br /> case length(Results) of<br /> 1 -> hd(Results);<br /> _ -> {block,Line,ast_from_results(parse_exprs,Results,Line,[])}<br /> end;<br /> false -> setelement(4,Node,node_parse(Args, Mods))<br /> end;<br />node_parse(Node, Mods) when is_list(Node) -><br /> [node_parse(Element, Mods) || Element <- Node];<br />node_parse(Node, Mods) when is_tuple(Node) -><br /> list_to_tuple([node_parse(Element, Mods) || Element <- tuple_to_list(Node)]);<br />node_parse(Node, _Mods) -> Node.<br /><br />args_from_ast(AST) -> [Value || {_Type,_Line,Value} <- AST].<br /><br />ast_from_results(FunParse, ResultsString, LineStart, ASTResults) -><br /> case remove_leading_whitespace(ResultsString) of<br /> "" -> lists:flatten(lists:reverse(ASTResults));<br /> String -><br /> {done,{ok,Tokens,LineEnd},StringRest} =<br /> erl_scan:tokens([], String, LineStart),<br /> {ok, AST} = erl_parse:FunParse(Tokens),<br /> ast_from_results(FunParse, StringRest, LineEnd, [AST|ASTResults])<br /> end.<br /><br />remove_leading_whitespace([9 |String]) -> remove_leading_whitespace(String);<br />remove_leading_whitespace([10|String]) -> remove_leading_whitespace(String);<br />remove_leading_whitespace([32|String]) -> remove_leading_whitespace(String);<br />remove_leading_whitespace( String ) -> String.<br /></code></pre><br />EMP2: Entirely painful compile-time macros for functions and expressions, in 45 lines of obscure, uncommented, and unreadable Erlang code.<br /><span style="font-size:85%;">(If you think that this code is bad, wait until you see EMP3.)<br /></span><br /><br /><span style="font-weight: bold;">[1]</span> Run-time for EMP2 is, of course, compile-time for the module that we are using EMP2 to transform.Philip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comtag:blogger.com,1999:blog-1893218858608219953.post-77629033728324893522007-04-22T14:41:00.000+10:002007-04-22T14:56:54.043+10:00Erlang Macro Processor (v2), Part IV<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_6zQp6LtHMTo/RirnuUGm_UI/AAAAAAAAAC0/-MapHnBlMR4/s1600-h/GeckoFlySmall.png"><img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp3.blogger.com/_6zQp6LtHMTo/RirnuUGm_UI/AAAAAAAAAC0/-MapHnBlMR4/s320/GeckoFlySmall.png" alt="" id="BLOGGER_PHOTO_ID_5056108314533231938" border="0" /></a>Okay, now we are getting somewhere. Time to expand some macros!<br /><br />To begin with we will start with something easy, like duplicating EMP1's functionality. We already have code from EMP1 to expand the <code>-macro</code> attribute entries, but unfortunately we cannot just cut-and-paste the EMP1 code into EMP2; our AST-walking is slightly different and we need to adjust <code>ast_reversed_results</code>:<br /><pre><code><br />ast_from_results(ResultsString, LineStart, ASTResults) -><br /> case remove_leading_whitespace(ResultsString) of<br /> "" -> lists:reverse(ASTResults);<br /> String -><br /> {done,{ok,Tokens,LineEnd},StringRest} =<br /> erl_scan:tokens([], String, LineStart),<br /> {ok, AST} = erl_parse:parse_form(Tokens),<br /> ast_from_results(StringRest, LineEnd, [AST|ASTResults])<br /> end.<br /></code></pre><br />We change the <code>-macro</code> clause for <code>node_parse</code> to call the new function:<br /><pre><code><br />node_parse({attribute,Line,macro,{Mod,Fun,Args}}, _Mods) -><br /> ast_from_results(lists:flatten([apply(Mod,Fun,Args)|" "]), Line, []);<br /></code></pre><br />And that obscene <code>remove_leading_whitespace</code> function has returned:<br /><pre><code><br />remove_leading_whitespace([9 |String]) -> remove_leading_whitespace(String);<br />remove_leading_whitespace([10|String]) -> remove_leading_whitespace(String);<br />remove_leading_whitespace([32|String]) -> remove_leading_whitespace(String);<br />remove_leading_whitespace( String ) -> String.<br /></code></pre><br />The only difference between <code>ast_from_results</code> and <code>ast_reversed_results</code> is that <code>ast_from_results</code> keeps the resulting AST in the same order as the input ResultsString argument (it kindly reverses its already-reversed results for us before passing them back).<br /><br />Unlike EMP1, EMP2 does NOT want to receive the results of the expanded AST in reversed order. We are not following the "build a list in reverse and then reverse the result" model for our AST (which works just fine for traversing the top level only), but rather using a recursive descent model for AST parsing. In this situation we need to keep the results in the order that they appear.<br /><br /><br />Now we have the EMP2 module reproducing the functionality of EMP1, and at only a few more lines of code. The only thing left to do is identify macro function calls, apply them, and insert the parsed results into the AST in place of the original call.<br /><br />Ha!<br /><br /><br />For remote function calls we have two situations to handle:<br /><ul><li>The remote function call is to a macro, and</li><li>The remote function call is <span style="font-style: italic;">not</span> to a macro.</li></ul><br />The easier case is when the remote function call is <span style="font-style: italic;">not</span> to a macro function. We pretty much just want the default tuple node function to run on the node, but we cannot (easily) get there because this more-specific function clause will have intercepted the node before the default code gets a chance to run on it.<br /><br />We could encapsulate the common default code in another function (or a substitution macro), but for simplicity's sake I will just build the required node in place with the <code>setelement</code> function. It is not a large amount of code:<br /><pre><code><br />node_parse(Node={call,Line,{remote,_,{atom,_,Mod},{atom,_,Fun}},Args}, Mods) -><br /> case lists:member(Mod, Mods) of<br /> true -><br /> io:format("Function-level macro call: ~w~n", [Node]),<br /> Node;<br /> false -> setelement(4,Node,node_parse(Args, Mods))<br /> end;<br /></code></pre><br /><br />Next up: The final installment - expanding remote macro function calls.Philip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comtag:blogger.com,1999:blog-1893218858608219953.post-35548244930707889192007-04-22T09:24:00.000+10:002007-04-22T09:35:21.739+10:00Erlang Macro Processor (v2), Part III<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_6zQp6LtHMTo/RiqdbkGm_TI/AAAAAAAAACs/NEn-HWNRTr8/s1600-h/PalmTreesLagoonSmall.png"><img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://bp0.blogger.com/_6zQp6LtHMTo/RiqdbkGm_TI/AAAAAAAAACs/NEn-HWNRTr8/s320/PalmTreesLagoonSmall.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5056026628550229298" /></a><br />The top level of the AST is a list of nodes, rather than a node in its own right, so we might write our first attempt at an[other] AST walker like this:<br /><pre><code><br />parse_transform(AST, _Options) -><br /> Mods = lists:flatten([Mods || {attribute,_Line,macro_modules,Mods} <- AST]),<br /> lists:flatten([node_parse(Node, Mods) || Node <- AST]).<br /><br />node_parse(Node, _Mods) -> Node.<br /></code></pre><br />The <code>parse_transform</code> function calls <code>node_parse</code> on each top-level node in the AST. It calls <code>lists:flatten</code> on the result because - as we already know - the EMP1-variety of top-level macro expansion may return more than one function definition from a single macro call. These definitions all need to be at the same "height" as the others, so the resulting deep list of nodes needs to be flattened.<br /><br />These two functions together will traverse the top level of the AST but not examine any sub-nodes. To do that we need to split the atom... er, node tuples, and parse each element in sequence:<br /><pre><code><br />node_parse(Node, Mods) when is_tuple(Node) -><br /> list_to_tuple([node_parse(Element, Mods) || Element <- tuple_to_list(Node)]).<br /></code></pre><br />Now if we were to compile and run this on our example.erl file we would get a big fat error... it turns out that not every element in a node tuple is actually another node tuple (but we already knew that, too). Some of the elements are lists, and some of them are atoms or integers. A few extra clauses should take care of these conditions:<br /><pre><code><br />node_parse(Node, Mods) when is_list(Node) -><br /> [node_parse(Element, Mods) || Element <- Node];<br />node_parse(Node, _Mods) -> Node.<br /></code></pre><br /><br />Here is the whole module in one piece:<br /><pre><code><br />-module(emp2).<br />-export([parse_transform/2]).<br /><br />parse_transform(AST, _Options) -><br /> Mods = lists:flatten([Mods || {attribute,_Line,macro_modules,Mods} <- AST]),<br /> lists:flatten([node_parse(Node, Mods) || Node <- AST]).<br /><br />node_parse(Node, Mods) when is_list(Node) -><br /> [node_parse(Element, Mods) || Element <- Node];<br />node_parse(Node, Mods) when is_tuple(Node) -><br /> [Type,Line|ListElements] = tuple_to_list(Node),<br /> Results = [node_parse(Element, Mods) || Element <- ListElements],<br /> list_to_tuple([Type,Line|Results]);<br />node_parse(Node, _Mods) -> Node.<br /></code></pre><br />And that is all we need to generically walk the entire AST.<br /><br />Trapping the specific nodes we want to macro-expand is also rather trivial. We need to catch <code>macro</code> module attributes and remote function calls, and to do that we just add two new clauses to the <code>node_parse</code> function:<br /><pre><code><br />node_parse(Node={attribute,Line,macro,{Mod,Fun,Args}}, _Mods) -><br /> io:format("Line ~B: EMP1-style macro attribute found.~n", [Line]),<br /> % Do macro-expansion of attribute's Mod, Fun, and Args values.<br /> Node;<br />node_parse(Node={call,Line,{remote,L,{atom,_,Mod},{atom,_,Fun}},Args}, Mods) -><br /> io:format("Line ~B: EMP2-style remote function call found.~n", [Line]),<br /> % Test whether the remote call is to a macro module.<br /> % If so, expand it. Otherwise traverse node as usual.<br /> setelement(4, Node, node_parse(Args, Mods));<br /></code></pre><br /><br />Next up: Expanding the macros.Philip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comtag:blogger.com,1999:blog-1893218858608219953.post-16655199673882317022007-04-21T15:04:00.000+10:002007-04-21T21:29:02.252+10:00Erlang Macro Processor (v2), Part II<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_6zQp6LtHMTo/RimhyEGm_SI/AAAAAAAAACk/wiANDYvWW3g/s1600-h/FishingBootSmall.png"><img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp1.blogger.com/_6zQp6LtHMTo/RimhyEGm_SI/AAAAAAAAACk/wiANDYvWW3g/s320/FishingBootSmall.png" alt="" id="BLOGGER_PHOTO_ID_5055749938167086370" border="0" /></a>You know we need to do it eventually, so let's get the boring "find <code>-macro_modules</code> attributes and store their values" bit out of the way so we can move on to some more interesting stuff. Here we go:<br /><pre><code><br />-module(emp2).<br />-export([parse_transform/2]).<br /><br />parse_transform(AST, _Options) -><br /> Mods = lists:flatten([Mods || {attribute,_Line,macro_modules,Mods} <- AST]),<br /> io:format("Macro Modules: ~p~n", [Mods]),<br /> AST.<br /></code></pre><br />Ah, whoops. One of those <a href="http://www.erlang.org/doc/doc-5.5.4/doc/reference_manual/expressions.html#6.22">list comprehension</a> thingies seems to have slipped into the parse_transform function. To get rid of it we just have to change that line into something like this:<br /><br /><pre><code> Mods = lists:flatten(lists:map(<br /> fun ({attribute,_Line,macro_modules,Mods}) -> Mods;<br /> (_Node) -> []<br /> end,<br /> AST)),<br /></code></pre><br />Hmmm. On second thoughts, maybe we should keep the list comprehension.<br /><br />I believe that list comprehensions are a relatively new feature in Erlang so you may not see too many of them in existing code, but they really are worth learning. (Erlang is in good company: Python and Haskell have list comprehensions too.)<br /><br /><br />Back from that tangent and to the program at hand, we see that the macro module names are being stored in an ordinary list. I expect that only a few macro modules (probably only one at most) will be specified in any given module, and looking for an element in a one-element list is pretty quick, so we should not be needing the indexing overhead of a dictionary. I also don't particularly mind if someone specifies a macro module more than once, or if a specified macro module is never used. (If we were <span style="font-style: italic;">really</span> <span>concerned</span> about duplicate macro module names then we could use one of the <code>list</code> module functions to easily remove them.)<br /><br />We could also roll the gathering of the <code>macro_modules</code> attributes up into the AST-walking code, but conceptually it is nicer to keep it up here and out of the way. Also, as this code only traverses the very top level of the AST it should be quite quick. Pattern-matching one entry per module attribute and function definition is not a computationally expensive task.<br /><br />Right, the boring stuff is done; let's get into parsing that AST.<br /><br /><br />As I briefly mused at the bottom of a previous <a href="http://chlorophil.blogspot.com/2007/04/atomiser-part-iv.html">Atomiser</a> post:<br /><br /><blockquote>Rather than consisting of a bunch of pattern matching clauses, the walk_ast function could be made "smarter" by transforming the given node tuple into a list, and applying some rules-based logic to the elements of that list (from the third element onwards).</blockquote><br />I reckon we could give this a go and see where we end up. (Either it will work and we have learned something new, or it won't work and we will have learned something new, so it is a win-win situation either way.)<br /><br />You might recall that the Atomiser <code>walk_ast</code> function had a clause for each node type. This was a great way for me to implement the Atomiser because I got to see the AST nodes that made up my programs, but in the end it has turned out to be a pretty ugly function.<br /><br />Here are a few lines of the <code>walk_ast</code> function as a quick refresher (the substitution macro actually makes the code nicer than it could be):<br /><pre><code><br />?WALK_AST({call,_Line,_Fun,Args}, Args); % Handles local and module calls.<br />?WALK_AST({'case',_Line,Test,Clauses}, [Test|Clauses]);<br />?WALK_AST({'catch',_Line,Expr}, Expr);<br />?WALK_AST({char,_Line,_Char}, []);<br /></code></pre><br />And those clauses go on (and on!) for about forty different node types...<br /><br />I would much rather only have specific clause for handling each node that we are interested in, and use some generic code to handle the rest. But if we want to create some rules to manage these nodes generically then we had better find some patterns in all of that mess.<br /><br />...<br /><br />The first (and blindingly obvious) thing to notice about the nodes is that - without exception - they are all tuples. (I know, I know: I am a genius. Applause is not strictly necessary. Really. Oh, all right then, a little bit of applause is okay, if you insist.)<br /><br />Two of these tuple nodes are not quite the same as the others: {error, Details} and {warning, Details}. In all of the other nodes the first element is the node type and the second element is the line number of the source file that the node appears in. After that there are a variable number of elements (possibly none) with node-specific meanings.<br /><br />We are interested in catching <code>-macro</code> attributes (so EMP2 can do the work of EMP1) as well as remote function call nodes that are calling a macro function. Everything else is irrelevant, except that we want to recursively descend into sub-nodes to keep searching for other remote macro function calls.<br /><br />If we take a closer look at the elements of nodes we will note that the element is always either a list, a tuple, or atomic (i.e.: an atom or an integer). These elements might have a special meaning to the compiler (depending on their location in the current node tuple) but to us they are just potential sub-nodes. If the node does not match an attribute or remote function call pattern then the elements have no meaning to EMP2 and we can treat them as homogenous lumps of node matter.<br /><br />Of the additional elements in a node (if any), they are either<br /><ul><li>a list, which we can parse as its own sub-AST,</li><li>a tuple, which we can parse as another node, or</li><li>atomic (or integer), which we can pass back as-is.</li></ul><br />I think that all of these notes are probably enough to get us started coding.Philip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comtag:blogger.com,1999:blog-1893218858608219953.post-32114309003253732832007-04-19T22:50:00.000+10:002007-04-20T11:23:40.599+10:00Erlang Macro Processor (v2), Part I<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_6zQp6LtHMTo/Ridl_kGm_QI/AAAAAAAAACU/dDsAOTFAiW8/s1600-h/SunFlowerSmall.png"><img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp1.blogger.com/_6zQp6LtHMTo/Ridl_kGm_QI/AAAAAAAAACU/dDsAOTFAiW8/s320/SunFlowerSmall.png" alt="" id="BLOGGER_PHOTO_ID_5055121249444232450" border="0" /></a><a href="http://chlorophil.blogspot.com/2007/04/erlang-macro-processor-v1-part-i.html">EMP1</a> is all well and good, but it does have more than its fair share of idiosyncratic behaviour<span style="font-weight: bold;">[1]</span>:<br /><ul><li>EMP1 can only be used to create full functions at the top level of a module. This makes it a bit more difficult to use than strictly necessary, especially if we only want to generate a term to use within a function.</li><li>Arguments passed to the macro must be literal values - no function calls allowed!</li><li>Macros must be defined in a separate module, which must be compiled before the macro-using module is compiled.</li></ul>Quite frankly that first point bugs the hell out of me. I really should not have to write a macro that returns an <span style="font-style: italic;">entire</span> function definition if I only need to generate a small portion of a function.<br /><br />Today we will begin to tackle this issue with EMP2, but before we dive straight into the parse_transform code I would like to spend a few moments updating our example code. The rewrite will make the example_macro.erl and example.erl modules use the as-yet-unwritten EMP2 module functionality. I probably won't explicitly show it in these posts, but the compile errors I get from running EMP2 over example.erl will have a big influence over the direction that its development takes.<br /><br /><br />We will still need a separate macro module, but the macro function will only generate the lookup table itself rather than return a whole function definition:<br /><br /><span style="font-size:85%;">-module(example_macro).<br />-export([lookup_binary/1]).<br /><br />lookup_binary(Size) -><br />&nbsp;&nbsp;&nbsp;&nbsp;[[$,,FirstVal]|NumberString] = lists:map(<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;fun(Offset) -> io_lib:format(",~B", [Offset * 2]) end,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;lists:seq(0, Size - 1)),<br />&nbsp;&nbsp;&nbsp;&nbsp;"<<" ++ [FirstVal] ++ NumberString ++ ">>".</span><br /><br />We have lost the code that produces the whole function and only kept our lookup binary creation function, which also seems to have picked up a jaunty little Size argument from somewhere. As before, each element's value is twice its offset (modulo 256: we are only storing bytes after all).<br /><br />To check that the new macro code works correctly:<br /><br /><span style="font-size:85%;">1> CL = fun(F) -> c(F), l(F) end.<br />#Fun<br />2> CL(example_macro).<br />{module,example_macro}<br />3> M = fun(R) -> io:format("~s~n", [lists:flatten(R)]) end.<br />#Fun<br />4> M(example_macro:lookup_binary(4)).<br /><<0,2,4,6>><br />ok<br />5></span><br /><br />And we also have to rewrite the module that calls this lookup macro:<br /><br /><span style="font-size:85%;">-module(example).<br />-export([lookup/1]).<br /><br />-compile({parse_transform, emp2}).<br />-macro_modules([example_macro]).<br /><br />lookup(Offset) -><br />&nbsp;&nbsp;&nbsp;&nbsp;<<_:offset/binary,value:8/integer,_/binary>> =<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;example_macro:lookup_binary(4),<br />&nbsp;&nbsp;&nbsp;&nbsp;Value.</span><br /><br /><br />This does look a lot nicer than the EMP1 version. Only the snippet of code that needs to be dynamically generated is in the macro module; the rest of the code is in the standard module where it belongs, and the macro call is in a much more appropriate place - inside the function that uses it - than lurking within a module attribute.<br /><br />With EMP1 we had to peek inside another module to see that a lookup/1 function was being generated, but here we can see that fact already in front of us. We can even guess that a binary term will be created just from the context around the macro call.<br /><br />Note that 'emp1' has changed to 'emp2' in the parse_transform compiler directive, and that we need a new 'macro_modules' module attribute to tell EMP2 which remote function calls are to be expanded at compile-time.<br /><br />Once we have written EMP2 and compiled all the modules,we should be able to run the lookup function and receive the same results as we did before:<br /><br /><span style="font-size:85%;">1> lists:map(fun(N) -> example:lookup(N) end, lists:seq(0, 3)).<br />[0,2,4,6]<br />2></span><br /><br /><br />We shall see.<br /><br /><br /><span style="font-weight: bold;">[1]</span> And I cannot have all that competition floating around out there, you know.Philip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comtag:blogger.com,1999:blog-1893218858608219953.post-27611181761890173492007-04-17T01:43:00.000+10:002007-04-17T22:41:25.235+10:00The Atomiser, Redux<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_6zQp6LtHMTo/RiOR4WFNYYI/AAAAAAAAACM/vurFtgFX80o/s1600-h/At-omiser1.1Small.png"><img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://bp3.blogger.com/_6zQp6LtHMTo/RiOR4WFNYYI/AAAAAAAAACM/vurFtgFX80o/s320/At-omiser1.1Small.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5054043604025958786" /></a>I have received some great comments and suggestions regarding the Atomiser; as a result I have added a new (optional) feature to the module. (Don't worry - The Atomiser may be new and improved, but is still 100% backwardly-compatible!)<br /><br /><br />As usual, you may specify a list of globally-valid atoms:<br /><br /><span style="font-size:85%;">-atoms([atom1, atom2...]).</span><br /><br />You may now also specify function-specific atom lists in two ways. The first method is to add a function name (only) to an atoms declaration entry. The atoms specified will then be valid within all 'fun_name' functions, regardless of the arity of those function definitions:<br /><br /><span style="font-size:85%;">-atoms({fun_name, [atom1, atom2...]}).</span><br /><br />(Unfortunately we have to wrap this all information up in a single tuple: 'wild' module attributes can only contain one value.)<br /><br />To be even more specific you may add a function name <span style="font-style:italic;">and</span> an arity to an atoms declaration. These atoms will then be valid within that specific 'fun_name/arity' function definition:<br /><br /><span style="font-size:85%;">-atoms({fun_name, arity, [atom1, atom2...]}).</span><br /><br />Atoms declarations are cumulative: globally-valid atoms (if any) are included along with function and function/arity atoms when checking for valid atoms within a given function definition.<br /><br /><br />You might notice that in the code below I have added a few new clauses into the walk_ast function. I was a bit concerned that I may have missed some node types from the Erlang AST, so I cracked open the only reference I had seen of the <a href="http://www.erlang.org/doc/doc-5.5.4/erts-5.5.4/doc/html/absform.html#4">Abstract Format</a> and added a few more function clauses that I had initially overlooked. I am pretty sure that just about everything is in there now, but feel free to disabuse me of that notion. :-)<br /><br /><br />Finally, I cleaned up the ?WALK_AST macro a little so that it no longer requires a list of ASTs to process: it now works directly off a single AST. Removing embedded lists has simplified the use of this macro quite considerably.<br /><br /><br />The new Atomiser Module:<br /><br /><br /><span style="font-size:85%;">-module(atomiser).<br />-author("Philip Robinson").<br />-vsn('1.1.1').<br />-export([parse_transform/2]).<br />%-compile({parse_transform, atomiser}). % Uncomment after initial compile.<br /><br />-atoms([base_dict_key,error, ok]). % Atoms used in four or more functions.<br />-atoms({atoms_check, 5, [found]}).<br />-atoms({atoms_unused_print, 1, [found]}).<br />-atoms({key_more_general, 1, [function]}).<br />-atoms({parse_transform, 2, [report_warnings,true]}).<br />-atoms({walk_ast, 3, [atom, atoms, attribute, b_generate, bc, bin, bin_element,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;block, call, 'case', 'catch', char, clause, clauses, cons, eof, float,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;'fun', function, generate, 'if', integer, lc, match, nil, op, 'query',<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;'receive', record, record_field, string, 'try', tuple, var, warning]}).<br /><br />parse_transform(AST, Options) -><br />&nbsp;&nbsp;&nbsp;&nbsp;DictAtomsAll = dict:store(base_dict_key, dict:new(), dict:new()),<br />&nbsp;&nbsp;&nbsp;&nbsp;case lists:member(report_warnings, Options) of<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;true -> atoms_unused_print(walk_ast(AST, base_dict_key, DictAtomsAll));<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_ -> ok<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end,<br />&nbsp;&nbsp;&nbsp;&nbsp;AST.<br /><br />dict_with_added_atoms(Line, AtomList, DictInitial) -><br />&nbsp;&nbsp;&nbsp;&nbsp;AddAtom = fun(Atom, Dict) -><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;case dict:find(Atom, Dict) of<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{ok,LineAlreadyDefined} -><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;io:format(<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"~s:~B Warning: atom '~w' already defined on line ~B.~n",<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[?FILE, Line, Atom, LineAlreadyDefined]),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Dict;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;error -> dict:store(Atom, Line, Dict)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end,<br />&nbsp;&nbsp;&nbsp;&nbsp;lists:foldl(AddAtom, DictInitial, AtomList).<br /><br />atoms_from_attr(Line, Key, AtomList, Atoms) -><br />&nbsp;&nbsp;&nbsp;&nbsp;Dict = case dict:find(Key, Atoms) of {ok,D} -> D; error -> dict:new() end,<br />&nbsp;&nbsp;&nbsp;&nbsp;dict:store(Key, dict_with_added_atoms(Line, AtomList, Dict), Atoms).<br /><br />atoms_check(Atom, Line, KeyDict, Atoms) -><br />&nbsp;&nbsp;&nbsp;&nbsp;case dict:find(KeyDict, Atoms) of<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{ok,Dict} -> atoms_check(Atom, Line, KeyDict, Dict, Atoms);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;error -> atoms_check(Atom, Line, key_more_general(KeyDict), Atoms)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end.<br /><br />atoms_check(Atom, Line, KeyDict, Dict, Atoms) -><br />&nbsp;&nbsp;&nbsp;&nbsp;case dict:find(Atom, Dict) of<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{ok,found} -> Atoms;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{ok,_LineDefinedOn} -><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dict:store(KeyDict, dict:store(Atom,found,Dict), Atoms);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;error -><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;case KeyDict of<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;base_dict_key -><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;io:format("~s:~B Warning: atom '~w' unexpected.~n",<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[?FILE, Line, Atom]),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Atoms;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_ -> atoms_check(Atom, Line, key_more_general(KeyDict), Atoms)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end.<br /><br />key_more_general({function,Fun,_Arity}) -> {function,Fun};<br />key_more_general({function,_Fun}) -> base_dict_key.<br /><br />atoms_unused_print(Atoms) -><br />&nbsp;&nbsp;&nbsp;&nbsp;Filter = fun({_Atom,Line}) -> Line =/= found end,<br />&nbsp;&nbsp;&nbsp;&nbsp;DictsToList = fun({_DictKey,Dict}, UnusedAtoms) -><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;UnusedAtomsNew = lists:filter(Filter, dict:to_list(Dict)),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;UnusedAtomsNewSorted = lists:keysort(2, UnusedAtomsNew),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;lists:keymerge(2, UnusedAtomsNewSorted, UnusedAtoms)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end,<br />&nbsp;&nbsp;&nbsp;&nbsp;UnusedAtoms = lists:foldl(DictsToList, [], dict:to_list(Atoms)),<br />&nbsp;&nbsp;&nbsp;&nbsp;PrintUnusedAtom = fun({Atom,Line}) -><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;io:format("~s:~B Warning: atom '~w' unused.~n", [?FILE, Line, Atom])<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end,<br />&nbsp;&nbsp;&nbsp;&nbsp;lists:foreach(PrintUnusedAtom, UnusedAtoms).<br /><br />-define(WALK_AST(PatternToMatch, ExpressionsToProcess),<br />&nbsp;&nbsp;&nbsp;&nbsp;walk_ast([PatternToMatch|ASTRest], Key, Atoms) -><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;walk_ast(ASTRest, Key, walk_ast(ExpressionsToProcess, Key, Atoms))).<br /><br />walk_ast([], _Key, Atoms) -> Atoms;<br />walk_ast([{atom,Line,Atom}|RestAST], Key, Atoms) -><br />&nbsp;&nbsp;&nbsp;&nbsp;walk_ast(RestAST, Key, atoms_check(Atom, Line, Key, Atoms));<br />walk_ast([{attribute,Line,atoms,{Fun,Arity,AtomList}}|RestAST], Key, Atoms) -><br />&nbsp;&nbsp;&nbsp;&nbsp;AtomsNew = atoms_from_attr(Line, {function,Fun,Arity}, AtomList, Atoms),<br />&nbsp;&nbsp;&nbsp;&nbsp;walk_ast(RestAST, Key, AtomsNew);<br />walk_ast([{attribute,Line,atoms,{Fun,AtomList}}|RestAST], Key, Atoms) -><br />&nbsp;&nbsp;&nbsp;&nbsp;AtomsNew = atoms_from_attr(Line, {function,Fun}, AtomList, Atoms),<br />&nbsp;&nbsp;&nbsp;&nbsp;walk_ast(RestAST, Key, AtomsNew);<br />walk_ast([{attribute,Line,atoms,AtomList}|RestAST], Key, Atoms) -><br />&nbsp;&nbsp;&nbsp;&nbsp;AtomsNew = atoms_from_attr(Line, base_dict_key, AtomList, Atoms),<br />&nbsp;&nbsp;&nbsp;&nbsp;walk_ast(RestAST, Key, AtomsNew);<br />?WALK_AST({attribute,_Line,_Name,_Value}, []); % Ignore all other attributes.<br />?WALK_AST({b_generate,_Line,Pattern,Expression}, [Pattern, Expression]);<br />?WALK_AST({bc,_Line,Head,Tail}, [Head|Tail]);<br />?WALK_AST({bin,_Line,BinElements}, BinElements);<br />?WALK_AST({bin_element,_Line,_Name,_Size,_Type}, []);<br />?WALK_AST({block,_Line,Expr}, [Expr]);<br />?WALK_AST({call,_Line,_Fun,Args}, Args); % Handles local and module calls.<br />?WALK_AST({'case',_Line,Test,Clauses}, [Test|Clauses]);<br />?WALK_AST({'catch',_Line,Expr}, Expr);<br />?WALK_AST({char,_Line,_Char}, []);<br />walk_ast([{clause,_Line,Pattern,Guards,Body}|RestAST], Key, Atoms) -><br />&nbsp;&nbsp;&nbsp;&nbsp;AtomsGuard = lists:foldl(<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;fun(ASTGuard, AtomsGuard) -><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;walk_ast(ASTGuard, Key, AtomsGuard)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;walk_ast(Pattern, Key, Atoms), Guards),<br />&nbsp;&nbsp;&nbsp;&nbsp;walk_ast(ASTRest, Key, walk_ast(Body, Key, AtomsGuard));<br />?WALK_AST({cons,_Line,Left,Right}, [Left,Right]);<br />?WALK_AST({eof,_Line}, []);<br />?WALK_AST({error,_Details}, []); % Ignore compiler errors.<br />?WALK_AST({float,_Line,_Float}, []);<br />?WALK_AST({'fun',_Line,{clauses,Clauses}}, Clauses);<br />?WALK_AST({'fun',_Line,_ModuleFunArity}, []);<br />walk_ast([{function,_Line,Fun,Arity,Clauses}|RestAST], Key, Atoms) -><br />&nbsp;&nbsp;&nbsp;&nbsp;walk_ast(RestAST, Key, walk_ast(Clauses, {function,Fun,Arity}, Atoms));<br />?WALK_AST({generate,_Line,Pattern,Expression}, [Pattern, Expression]);<br />?WALK_AST({'if',_Line,Clauses}, Clauses);<br />?WALK_AST({integer,_Line,_Integer}, []);<br />?WALK_AST({lc,_Line,Head,Tail}, [Head|Tail]);<br />?WALK_AST({match,_Line,Pattern,Expression}, [Pattern, Expression]);<br />?WALK_AST({nil,_Line}, []);<br />?WALK_AST({op,_Line,_BinaryOp,Left,Right}, [Left, Right]);<br />?WALK_AST({op,_Line,_UnaryOp,_Operand}, []);<br />?WALK_AST({'query',_Line,ListComprehension}, [ListComprehension]);<br />?WALK_AST({'receive',_Line,Clauses}, Clauses);<br />?WALK_AST({'receive',_Line,Clauses1,_TimeAfter,Clauses2}, Clauses1 ++ Clauses2);<br />?WALK_AST({record,_Line,_Record,Fields}, Fields);<br />?WALK_AST({record_field,_Line,Field,Value}, [Field, Value]);<br />?WALK_AST({record_field,_Line,_Variable,_Record,Field}, [Field]);<br />?WALK_AST({string,_Line,_String}, []);<br />?WALK_AST({'try',_Line,Block,CaseClauses,CatchClauses,AfterClauses},<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Block] ++ CaseClauses ++ CatchClauses ++ AfterClauses);<br />?WALK_AST({tuple,_Line,Elements}, Elements);<br />?WALK_AST({var,_Line,_Name}, []);<br />?WALK_AST({warning,_Details}, []); % Ignore compiler warnings.<br />walk_ast([Node|ASTRest], Key, Atoms) -><br />&nbsp;&nbsp;&nbsp;&nbsp;io:format("Unknown node: ~p~n", [Node]),<br />&nbsp;&nbsp;&nbsp;&nbsp;walk_ast(ASTRest, Key, Atoms).</span><br /><br /><br />PS: Does anyone know of an easy way to get Blogger to indent code properly? I am getting a little tired of pasting loads of "&amp;nbsp;" everywhere...Philip Robinsonhttp://www.blogger.com/profile/17605642659657207381noreply@blogger.comtag:blogger.com,1999:blog-1893218858608219953.post-83335777880153255872007-04-15T22:23:00.000+10:002007-04-16T00:49:37.126+10:00"Dynamic" record access functions with EMP1<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_6zQp6LtHMTo/RiIuM2FNYXI/AAAAAAAAACE/juMaRw5fgHA/s1600-h/EMPProblemSmall.png"><img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp0.blogger.com/_6zQp6LtHMTo/RiIuM2FNYXI/AAAAAAAAACE/juMaRw5fgHA/s320/EMPProblemSmall.png" alt="" id="BLOGGER_PHOTO_ID_5053652530073788786" border="0" /></a>Brian Olsen (over at <a href="http://progexpr.blogspot.com/2007/04/simple-dynamic-record-access.html">Programming Experiments</a>) wrote a small set of functions to make record accesses/updates in Erlang nicer. Ayrnieu wrote a detailed response to this in a comment on <a href="http://programming.reddit.com/info/1i147/comments/c1i1kd">Reddit</a>.<br /><br />Brian wanted to hide some of the (admittedly pretty ugly) syntax of Erlang records in a simple way. He used some run-time list-searching to find the position in the record tuple that a particular field name occurs at, and then located the desired value at that position.<br /><br /><br />Now that we have EMP1 working I thought that perhaps I might see how I would use this particular tool to solve the same problem.<br /><br /><br />First of all we need to figure out what the functions we want should look like. I think something like this would do nicely:<br /><br /><span style="font-size:85%;">recval(FieldName, Record) -> Value.<br />setrecval(FieldName, Record, Value) -> Updated Record.</span><br /><br />Of course under the covers <span style="font-size:85%;">recval</span> and <span style="font-size:85%;">setrecval</span> would examine the record given and work out which field to retrieve / update.<br /><br />Both Brian and Ayrneiu have this work done at run-time. With EMP1 we can build the supporting functions at compile-time based on the record information (which is already known at compile-time).<br /><br />In detail, <span style="font-size:85%;">recval</span> and company would look something like this:<br /><br /><span style="font-size:85%;">recval(FieldName, Record) -> recval(element(1, Record), FieldName, Record).<br />recval(record1, field1, Record) -> element(2, Record);<br />recval(record1, field2, Record) -> element(3, Record);<br />recval(record2, field1, Record) -> element(2, Record);<br />...</span><br /><br />...and similarly for the <span style="font-size:85%;">setrecval</span> versions.<br /><br />These functions can all be created at compile-time with EMP1, like this:<br /><br /><span style="font-size:85%;">-module(dyrec_macro).<br />-export([recval_generate/1]).<br /><br />recval_field(NameRecord, NameField, Posn) -><br />&nbsp;&nbsp;&nbsp;&nbsp;io_lib:format("recval(~w, ~w, Record) -> element(~B, Record)",<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[NameRecord, NameField, Posn]).<br /><br />setrecval_field(NameRecord, NameField, Posn) -><br />&nbsp;&nbsp;&nbsp;&nbsp;io_lib:format(<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"setrecval(~w, ~w, Record, Value) -> setelement(~B, Record, Value)",<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[NameRecord, NameField, Posn]).<br /><br />recval_record(RecordDetails) -> recval_record(RecordDetails, 2, []).<br />recval_record({_NameRecord, []}, _Posn, Text) -> Text;<br />recval_record({NameRecord, [NameField|NameFieldsRest]}, Posn, Text) -><br />&nbsp;&nbsp;&nbsp;&nbsp;recval_record({NameRecord, NameFieldsRest}, Posn + 1,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Text ++ "; " ++ recval_field(NameRecord, NameField, Posn)).<br /><br />setrecval_record(RecordDetails) -> setrecval_record(RecordDetails, 2, []).<br />setrecval_record({_NameRecord, []}, _Posn, Text) -> Text;<br />setrecval_record({NameRecord, [NameField|NameFieldsRest]}, Posn, Text) -><br />&nbsp;&nbsp;&nbsp;&nbsp;setrecval_record({NameRecord, NameFieldsRest}, Posn + 1,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Text ++ "; " ++ setrecval_field(NameRecord, NameField, Posn)).<br /><br />recval_generate(ListRecordDetails) -><br />&nbsp;&nbsp;&nbsp;&nbsp;[$;,32|CodeGet] = lists:flatten(<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;lists:map(fun(E) -> recval_record(E) end, ListRecordDetails)),<br />&nbsp;&nbsp;&nbsp;&nbsp;[$;,32|CodeSet] = lists:flatten(<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;lists:map(fun(E) -> setrecval_record(E) end, ListRecordDetails)),<br />&nbsp;&nbsp;&nbsp;&nbsp;"recval(Field, Record) -> recval(element(1, Record), Field, Record). "<br />&nbsp;&nbsp;&nbsp;&nbsp;"setrecval(Field, Record, Value) -> "<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"setrecval(element(1, Record), Field, Record, Value). " ++<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;io_lib:format("~s. ~s.", [CodeGet, CodeSet]).</span><br /><br /><br />And here is a test program:<br /><br /><span style="font-size:85%;">-module(dyrec_test).<br />-export([start/0]).<br />-compile({parse_transform, emp1}).<br /><br />-record(data1, {this, that}).<br />-record(data2, {this, the_other}).<br /><br />-macro({dyrec_macro, recval_generate,<br />&nbsp;&nbsp;&nbsp;&nbsp;[[{data1, [this, that]}, {data2, [this, the_other]}]]}).<br /><br />start() -><br />&nbsp;&nbsp;&nbsp;&nbsp;D1 = #data1{this=a, that=b},<br />&nbsp;&nbsp;&nbsp;&nbsp;D2 = #data2{this=c, the_other=d},<br />&nbsp;&nbsp;&nbsp;&nbsp;D3 = setrecval(this, D1, e),<br />&nbsp;&nbsp;&nbsp;&nbsp;io:format("~p~n~p~n~p~n~p~n~p~n",<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[recval(this, D1), recval(that, D1),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;recval(this, D2), recval(the_other, D2),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;D3]).</span><br /><br /><br />After compiling both of them, we can run this at the REPL:<br /><br /><span style="font-size:85%;">1> dyrec_test:start().<br />a<br />b<br />c<br />d<br />{data1,e,b}<br />ok<br />2></span><br /><br /><br />Personally I would not use EMP1 for this (particular) purpose. I do not mind Erlang's record syntax, but if I really did not want to use it I would rather build a parse-transformation (a la Yariv's recless module) to convert a different syntax into the record tuples Erlang uses behind the scenes.<br /><br />By layering function calls on top of record/tuple field accesses we destroy the ability of Erlang's compiler to convert the usual record syntax into direct tuple <span style="font-size:85%;">element</span> l