tag:blogger.com,1999:blog-5048733824376310579.post-65314388975572915632007-07-07T20:49:00.000-05:002007-07-07T21:52:23.592-05:00optimizing the solver part III: a bit more about measurementThe code called `solve5' is reasonably fast. I suspect we could make it a little faster, but at this point we've gone about as far with it as I'd like. After all, we're just tuning a particular algorithm, and eventually we have to hit diminishing returns --- as it is we have a clean, easy to follow implementation that is many orders of magnitude faster than to naive code we started off with.<br /><br />We haven't really asked if the algorithm is good. It isn't, particularly. It's not terrible, but it really isn't that good. So it doesn't make a lot of sense to keep tuning it. Which brings me to the second issue I mentioned a few posts ago; in optimizing code we have to balance effort spent on better algorithms vs. effort spent on tuning.<br /><br />I talked a bit about profilers. If you're doing this sort of thing, you should acquaint yourself with yours. If you're using a free lisp, you may not have a lot of options (for example, sbcl on darwin/ppc doesn't seem to have a working statistical profiler). There's lots of information out there, but most of it is implementation dependent and I won't spend time on it now. There are other things you can do; what follows is an example.<br /><br />If you think about the kind of search problem we are doing here, you're bound to start wondering about the order of search. As it is, we just look at the sites in order, lowest to highest, and try and fill in the digits. It should be clear that if someone knew how you were doing this, they could construct a puzzle that was very difficult for you to solve (but might be quite easy if you searched highest-to-lowest, say). This comes back to the issue I earlier raised -- that performance on this problem is controlled by two things: how long it takes you to try a single vertex (i.e. position in the puzzle), and how many times you have to try. This latter part is strongly affected by the order in which we look.<br /><br />If we're worried about this sort of thing, it doesn't make a lot of sense to look at only one or two puzzles, does it? That tells us a lot about the first part of the issues (how fast can one vertex be tested) but nothing much about how many things I have to try on average. <br /><br />Before we do something about that, here is a little experiment. I made a new solver, solve6, with a small change:<br /><pre><br />...<br /> else do<br /> (setf (aref soln n) x))<br />(setf unassigned (shuffle! unassigned))<br />...<br /></pre><br />At the end of the loop setting things up, I randomly shuffle the list of unassigned sites. So every time I run solve6, I look at the sites in a different (random) order. This should have only a tiny effect on execution speed, because it a) only happens once, and b) is done with fast code. Look what happens:<br /><pre><br />BLOG> (avg-runtime 30 (solve5 *easy*))<br />1.3333333333333334d-4<br />1.1954022988505748d-7<br />BLOG> (avg-runtime 30 (solve6 *easy*))<br />0.47433333333333333d0<br />0.8706354712643679d0<br /></pre><br />My macro (avg-runtime n body) executes body n times and returns the mean and variance of the runtime in seconds (The details probably aren't important, but I'll note it is computing seconds based on get-internal-runtime. This is implementation dependent, but a handy userspace tool to measure things with. I'll describe it more at the end)<br /><br />I used a puzzle called *easy* this time. Why? Because solve6 takes too long to solve the *slow* puzzle, even once. I got tired of waiting for just one solution after 5 minutes or so (compared to solve5's 1.3e-4s, remember). This little change made a <span style="font-weight:bold;">huge</span> difference. Why? Because the order we visit sites really, really matters. Which suggests that rather than fiddling around with tuning what we have, we should really think a bit about how to visit sites in a smart way.<br /><br />There's another clue here. solve6 execution speed has high variance, which means picking any old order of visitation can get you in all kinds of trouble. But also note, solve5 is comparatively very, very fast. Why? We're doing something clever, without thinking about it. What we're doing is walking along in row-major order through the array, basically. But remember that rows are one of the constraints for this puzzle. So as we move along, we are constraining the next choice (and the one after that, etc.) each time. Looks like this works for us pretty well!<br /><br />We can do a lot better, and I'll talk about that next time. And we're going to have to talk about average performance across a larger number of examples. I poked around on the web and found some <a href="http://magictour.free.fr/sudoku.htm">here</a>, that are labeled as `hard to solve'. I'll be using the file `top 2635' if you want to play along next time.<br /><br />Before I end though, ss promised, a bit more on the macro I used. First off, I used a handy little function that computes the mean and variance of any sequence, something like this:<br /><pre><br />(defun mean-and-variance (seq)<br /> (let ((n 0)<br /> (mean 0)<br /> (sum^2 0))<br /> (flet ((update (x)<br /> (let ((dev (- x mean)))<br /> (incf n)<br /> (incf mean (/ dev n))<br /> (incf sum^2 (* dev (- x mean))))))<br /> (map 'nil #'update seq)<br /> (values mean (/ sum^2 (1- n))))))<br /></pre><br />I mention this particularly, because when I first started lisping I would far too often special case different sequence types in a loop (a c programmers approach, I guess) and forgo code like this that is clean, reasonably quick, and doesn't care if you hand it a list or a vector, etc. Common lisp is full of nice function like map, and if you haven't found them yet you should look.<br /><br />Here is avg-runtime, and a macro it depends on:<br /><pre><br />(defmacro tictoc (&body body)<br /> (with-unique-names (t0 t1 res)<br /> `(let (,t0 ,t1 ,res) <br /> (setf ,t0 (get-internal-run-time)<br /> res ,@body<br /> ,t1 (get-internal-run-time))<br /> (values (float (/ (- ,t1 ,t0) internal-time-units-per-second) 1.0d0) res))))<br /><br />(defmacro avg-runtime (n &body body)<br /> (with-unique-names (res)<br /> `(loop repeat ,n <br /> collecting (tictoc ,@body) into ,res<br /> finally (return (mean-and-variance ,res)))))<br /></pre><br /><br />Of course there are loads of ways to do this, and if I was worried about long lists I wouldn't collect one up to pass to mean-and-variance, but it's fine for our purposes. I'm sure there are any number of weaknesses in the above, too. It's something I whipped off both to measure the solve6 run, and more as an example of how easy it is to build this sort of thing to help you.skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.com