tag:blogger.com,1999:blog-2642265899447052902008-07-19T22:22:19.802-04:00God Plays DiceIsabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comBlogger606125tag:blogger.com,1999:blog-264226589944705290.post-49800303911594250122008-07-18T20:39:00.001-04:002008-07-18T20:40:35.362-04:00Lower speed limits, part twoOne thing people complain about with regard to lower speed limits, which I <a href="http://godplaysdice.blogspot.com/2008/07/five-miles-hour-30-cents-gallon.html">wrote about earlier today</a>, is that when speed limits are lower it takes longer to get places. This is, of course, true. But on the other hand you use less fuel.<br /><br />From Wikipedia on <a href="http://en.wikipedia.org/wiki/Fuel_economy_in_automobiles">fuel economy in automobiles</a>: "The power to overcome air resistance increases roughly with the cube of the speed, and thus the energy required per unit distance is roughly proportional to the square of speed." Furthermore, this is the dominant factor at high speeds.<br /><br />So let's say your fuel usage, measured in fuel used per unit of distance (say, gallons per mile), at velocity <i>v</i>, is kv<sup>2</sup>. (<i>k</i> is some constant that depends on the car. A typical value of k, for a car using 0.05 gallons per mile at 60 mph, is 0.000014.) Let's say you value your time at a rate <i>c</i> -- measured in, say, dollars per hour -- and the price of fuel is <i>p</i>. <br /><br />Then for a journey of length <i>d</i>, you'll spend dpkv<sup>2</sup> in fuel, and cd/<i>v</i> in time. Your total cost is <DIV STYLE="text-align: center;"><IMG SRC="http://snappy.at.org/~cola/tex2img/image.php?id=cdefdfd2b24874d5a5d58cb41517bf22" STYLE="vertical-align: middle;" ALT="f(v) = d\left(pkv^2 + {c\over v}\right)" HEIGHT="39" WIDTH="185"></DIV> and differentiating and setting f'(v) = 0, the optimal speed is (c/2pk)<sup>1/3</sup>. 
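<br /><br />To check the calculus numerically, here's a minimal Python sketch of this model. The function names are mine, and the default values (c = 10 dollars per hour, p = 4.05 dollars per gallon, k = 0.000014) are illustrative assumptions for one hypothetical driver and car, not measurements.

```python
def trip_cost(v, d=1.0, p=4.05, k=0.000014, c=10.0):
    """Total cost f(v) = d * (p*k*v^2 + c/v): fuel cost plus time cost."""
    return d * (p * k * v**2 + c / v)


def optimal_speed(p=4.05, k=0.000014, c=10.0):
    """Speed where f'(v) = d * (2*p*k*v - c/v^2) vanishes: v = (c / (2*p*k))^(1/3)."""
    return (c / (2 * p * k)) ** (1 / 3)


v_star = optimal_speed()
print(round(v_star, 1))  # optimal speed in mph for the assumed values

# Sanity check: the closed form should beat nearby speeds.
assert trip_cost(v_star) < trip_cost(v_star - 1)
assert trip_cost(v_star) < trip_cost(v_star + 1)
```

With these assumed inputs the optimum comes out in the mid-forties, and doubling p multiplies it by 2<sup>-1/3</sup>, or about 0.79.<br /><br />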
The cost of the journey at this speed is <br /><DIV STYLE="text-align: center;"><IMG SRC="http://snappy.at.org/~cola/tex2img/image.php?id=4b3ee026f11b8036d79b70341c7ad8a6" STYLE="vertical-align: middle;" ALT="f \left( \left( c/2pk\right)^{1/3} \right) = {3d \over 2} (2c^2 pk)^{1/3}" HEIGHT="44" WIDTH="269"></DIV><br />So according to this model, if you value your time more you should go faster; not surprisingly your value of time <i>c</i> and the price of fuel <i>p</i> enter the optimal speed only as the ratio c/p -- effectively, your value of time measured in terms of fuel.<br /><br />Also, the optimal speed doesn't go down that quickly as <i>p</i> increases -- it only goes as p<sup>-1/3</sup>. But a doubling in gas prices still leads to a 20 percent reduction in optimal speed -- perhaps roughly in line with what people are suggesting. Taking c = 10, p = 4.05, k = 0.000014 gives an optimal speed of 45 miles per hour, although given the crudeness of this model (I've assumed that <i>all</i> the fuel is used to fight air resistance) I'd take that with a grain of salt, and I won't even touch the fact that different people place different values on their time and get different fuel economy. We can't just let everyone drive at their optimal speed.<br /><br />Besides, part of the whole point of this is that if we use less fuel, demand for fuel will drop significantly below supply and oil prices will go down. So to forecast the effects of a lower speed limit I'd have to factor in that gasoline could get cheaper -- and let's face it, I can't predict the workings of the oil market.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-57297644066258996362008-07-18T18:44:00.004-04:002008-07-19T08:32:36.918-04:00Five miles an hour = 30 cents a gallon?"Every five miles an hour faster costs you an extra 30 cents a gallon." 
From <a href="http://www.nytimes.com/2008/07/17/nyregion/17towns.html">yesterday's New York Times</a>, among others. This is often mentioned in reference to bringing back the national 55 mile per hour speed limit.<br /><br />What does this even mean? I assume it means that it takes, say, seven percent more gasoline per mile to drive 65 mph than to drive 60 mph. (30 cents is around seven percent of the current average gasoline price, $4.10 or so per gallon.) Why not just say that? This also has the advantage that when gas prices change, the fact doesn't become outdated.<br /><br />Although as many people point out, the lower speed limit is a hard sell, in part because of the value of time. If you're about to drive 65 miles at 65 mph, it'll take you an hour; say you get 20 miles per gallon, so that uses 3.25 gallons of gasoline. Slowing to 60 mph, it takes five minutes longer, but saves seven percent of that gasoline, or 0.23 gallons -- perhaps $1 worth. So if you value an hour at more than $12 (more generally, at more than three gallons of gasoline), you should drive faster! Of course I've committed the twin fallacies of "everything is linear" and a bunch of sloppy arithmetic, and I've ignored that different cars get different gas mileage, but the order of magnitude is right -- and it's clear to me some people value their time at more than this and some at less. And a better analysis would take into account the probability of getting in accidents, speeding tickets, etc. 
(I'm mostly pointing this out because otherwise some of you will.)<br /><br />Oh, and on a related note, <a href="http://www.reuters.com/article/domesticNews/idUSN1739264020080718">people will do things for $100 worth of gas that they wouldn't do for $100 worth of money.</a>Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-8745353858706538592008-07-17T16:48:00.004-04:002008-07-17T16:55:15.400-04:00Population densities vary over nine orders of magnitudeThe United States has an area of 3,794,066 square miles, and a population, as of the 2000 census, of 281,421,906. This gives a population density of 74.2 people per square mile.<br /><br />But what is the average population density that Americans live at? It's not 74.2 per square mile. Only about 11 percent of Americans live in <A href="http://www.census.gov/geo/www/cob/bg_metadata.html#cbg">census block groups</a> (the smallest resolution the census goes down to; there are about 200,000 of these, corresponding to about 1,500 people each) lower than this density. That's not <i>too</i> surprising; that average includes lots of empty space.<br /><br />But the median American, it turns out, lives in a block group with a density of 2,521.6 per square mile. At least, when I asked the web site I was using for the distribution of block groups by population density that's what it said; the front page says this number is 2,059.23. I suspect the smaller number is actually the median population density <i>of block groups</i>, not <i>of individuals</i>; the block groups tend to have lower populations in less dense areas, which explains the difference. This number was surprisingly high to me, and seems to illustrate how concentrated population is.<br /><br />In case you're wondering, the most densely populated block group is one in New York County, New York -- 3,240 people in 0.0097 square miles, for about 330,000 per square mile. 
The least dense is in the North Slope Borough of Alaska -- 3 people in 3,246 square miles, or one per 1,082 square miles. The Manhattan block group I mention here is 360 million times more dense than the Alaska one; population densities vary over a <i>huge</i> range.<br /><br />Here's a table; in each pair of rows, the first row is a percentile n and the second is the population density such that n% of Americans live in a block group with that density (in people per square mile) or less. (Generating such a table at fakeisthenewreal.org is slow, which is why I'm providing it here.) <table><tr><td>Percentile</td><td>5</td><td>10</td><td>20</td><td>30</td><td>40</td><td>50</td></tr><tr><td>Density</td><td>29.3</td><td>64.9</td><td>226.9</td><td>677.5</td><td>1499.8</td><td>2521.6</td></tr><tr><td>Percentile</td><td>60</td><td>70</td><td>80</td><td>90</td><td>95</td></tr><tr><td>Density</td><td>3737.2</td><td>5257.1</td><td>7529.0</td><td>13261.9</td><td>24219.5</td></tr></table> I hesitate to interpret this. But I must admit that I'm curious if demographers have some way of predicting the general shape of this data. It's clear in the US that more people live at "intermediate" densities than at very high or low ones -- but that's not exactly a meaningful statement.<br /><br />(Facts from <a href="http://www.fakeisthenewreal.org/by_density/">fake is the new real</a>, crunching Census Bureau data.)<br /><br />By the way, Wikipedia has an article entitled <a href="http://en.wikipedia.org/wiki/List_of_U.S._states_by_area">list of U. S. states by area</a>. This includes an almost entirely useless map which colors the larger states darker. I can see which states are larger without the colors, because they're <i>larger</i>, which is kind of the point of a <i>map</i>. 
The area the state takes up on my screen should be proportional to its actual area.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-33309724196681954942008-07-16T12:50:00.003-04:002008-07-16T12:58:24.411-04:00Base sixty is kind of tricky<a href="http://www.thisisplymouth.co.uk/regional/Traffic-warden-couldn-t-tell-time/article-222675-detail/article.html">Base sixty is kind of tricky</a>. A traffic warden used a calculator to tell when the parking a driver had paid for would expire, got the wrong answer, and gave him a ticket. He got the wrong answer because he was treating time as a decimal -- so 2:49 became 2.49 -- and as you know, there are sixty minutes in an hour, not one hundred. The driver had paid for 75 minutes, so the warden found 2.49 + .75 = 3.24 and decided he had paid until 3.24. (I shudder to think what would have happened if the warden had noticed that 75 minutes is one hour fifteen minutes, and done the computation 2.49 + 1.15 = 3.64 -- obviously the time 3:64 doesn't exist.)<br /><br />Have there been cheap calculators that work in hours and minutes? I feel like there would be a demand for that; calculations involving time are probably among the most common ones in ordinary life. Then again, most people seem able to do them; this sounds like an isolated incident.<br /><br />(via <a href="http://ericberlin.com/?p=2281">Eric Berlin</a>.)Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-30914672426774905612008-07-15T15:54:00.002-04:002008-07-17T17:16:45.409-04:00Translating popular votes to electoral votesBy sheer chance, I came across the book <i>Predicting Party Sizes</i> by Rein Taagepera, a political scientist who was trained as a physicist. 
I was interested to run into a "theorem" (I'm not sure whether I can call it this, because the derivation in the book is rather heuristic) which states the following. Let V be the number of voters in a country like the United States which elects its president through an electoral college, and let E be the number of states in that country. Then let n = (log V)/(log E). For the United States at present, V is about 121 million (I'm using the turnout in the 2004 election), E is 51 (the District of Columbia is a "state" for the purposes of this discussion), and so n is about 4.7.<br /><br />This quantity n is called the "responsiveness" of the system, and its rough interpretation is that if the party in control receives (1/2 + &epsilon;) of the popular vote, then it will receive (1/2 + n&epsilon;) of the electoral vote, for small &epsilon;. More generally, let V<sub>D</sub> and V<sub>R</sub> be the number of popular votes obtained by the Democratic and Republican candidates, respectively; let E<sub>D</sub> and E<sub>R</sub> be their numbers of electoral votes. Then E<sub>D</sub>/E<sub>R</sub> is approximately (V<sub>D</sub>/V<sub>R</sub>)<sup>n</sup>. When V<sub>D</sub>/V<sub>R</sub> is close to 1 this reduces to the first statement.<br /><br />Anyway, Nate Silver at fivethirtyeight.com showed the results of some of his simulations about a month ago and claimed that <a href="http://www.fivethirtyeight.com/2008/06/popular-vote-v-electoral-vote.html">a one-percent swing in the popular vote corresponds to 25 electoral votes</a>. It turns out that 25 electoral votes is 4.6 percent of the electoral college as a whole, so based on his simulations n = 4.6. I take this as evidence that Silver is doing something right. 
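<br /><br />For anyone who wants to play with the numbers, here's a small Python sketch of the two formulas. The function names are my own, and the 121 million and 51 are the rough figures quoted above; treat it as a back-of-the-envelope check, not a forecasting model.

```python
import math


def responsiveness(voters, districts):
    """Taagepera's n = log(V) / log(E)."""
    return math.log(voters) / math.log(districts)


def electoral_share(popular_share, n):
    """Two-party electoral share implied by E_D/E_R = (V_D/V_R)^n."""
    ratio = (popular_share / (1 - popular_share)) ** n
    return ratio / (1 + ratio)


n = responsiveness(121e6, 51)      # 2004 turnout and 51 "states"
share = electoral_share(0.51, n)   # electoral share for 51% of the two-party vote
print(round(n, 2), round(share, 3))
```

A popular share of exactly one half maps to an electoral share of one half, and shares just above one half get amplified by roughly a factor of n, as the linear approximation predicts.<br /><br />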
(n is also in this neighborhood for data from actual elections.)Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-42950253606900927802008-07-15T12:02:00.004-04:002008-07-15T12:23:46.238-04:00Shortage of fours<a href="http://www.nytimes.com/2008/07/15/nyregion/15four.html">Gas stations have a shortage of fours.</a><br /><br />My American readers probably know why -- gas has been over $4 per gallon for a while. Apparently the numbers come in sets of forty, four of each digit. They can also be bought individually. But there aren't too many manufacturers.<br /><br />I'm kind of curious if there are more stations selling at $4.43 or $4.45 than $4.44 just because they don't have the appropriate digits. (I would have asked a similar question at $3.33 or $2.22. And I'll ask it again if we get to $5.55.) Stations might also price at $4.39 instead of $4.40, or $4.50 instead of $4.49, for similar reasons. It sounds like some of them are improvising digits, but reporters wouldn't know if a particular station charging $4.43 is doing this or not; it could only be figured out by looking at large amounts of data, and I'm not <i>that</i> curious.<br /><br />And in New Hampshire some stations are <a href="http://www.seacoastonline.com/apps/pbcs.dll/article?AID=/20080714/NEWS/80714015/-1/NEWS19&sfad=1">pricing gas by the half-gallon</a>, because their pumps can't handle prices higher than $3.999. So they indicate that they're doing so, set the pump at something like $2.05, and charge double what the pump reads, namely $4.10. Apparently some people are troubled by the mathematical demands this places on the consumer:<br /><blockquote>"If for no other reason, half pricing is confusing and can be inconvenient for the customer. When I buy gasoline I stop the pump at the dollar amount I want to spend. 
So let's say I have $60 to spend and the meter, if it's on half pricing — reads $31.50 and I forgot to stop it at $30, what do I do?" he said.</blockquote><br />I hope people can double and halve in their heads. But there's the psychological issue -- they might <i>forget</i> to.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-45691441814551857592008-07-14T18:17:00.002-04:002008-07-14T18:48:26.615-04:00Some quick statistics on the calibration quizOn Saturday I gave a quiz from Ian Ayres' book Super Crunchers which asked you to provide 90% confidence intervals for ten numerical questions with well-defined answers. Roughly speaking, you should select your answers so that you expect to get nine of the questions right and you believe you're equally likely to have gotten each of them wrong. <br /><br />Nineteen people have taken the quiz.<br />Out of the 190 individual answers received, 97 were correct -- slightly over half. The distribution of scores on the quiz is as follows:<br /><table><br /><tr><td><b>Score</b></td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td><td>8</td><td>9</td><td>10</td></tr><br /><tr><td><b>Number of people</b></td><td>1</td><td>4</td><td>3</td><td>4</td><td>2</td><td>3</td><td>1</td><td>0</td><td>1</td></tr><br /></table><br />In short, the respondents as a group confirm Ayres' claim that "almost everyone who answers these questions has the opposite problem of overconfidence -- they can't help themselves from reporting ranges that are too small." Ayres cites a book by J. Edward Russo and Paul J. H. Schoemaker, <i>Decision Traps: Ten Barriers to Brilliant Decision-Making and How to Overcome Them</i>, which I haven't read; supposedly "most" people get between three and six questions right. 
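<br /><br />To put a number on how far off this is, here's a short Python sketch. It rechecks the totals from the table and then asks how often a perfectly calibrated person would score six or less, under my simplifying assumption that the ten questions are independent and each interval has exactly a 90 percent chance of containing the answer.

```python
from math import comb

# Score distribution from the table above: score -> number of people.
scores = {2: 1, 3: 4, 4: 3, 5: 4, 6: 2, 7: 3, 8: 1, 9: 0, 10: 1}

people = sum(scores.values())
correct = sum(score * count for score, count in scores.items())


def prob_at_most(k, n=10, p=0.9):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


print(people, correct)             # 19 97
print(round(prob_at_most(6), 4))   # chance a well-calibrated person scores 6 or less
```

Under that assumption a score of six or less should happen to only about one person in eighty, yet fourteen of the nineteen respondents landed there.<br /><br />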
I'm actually somewhat surprised that you as a group don't seem all that different from the general population.<br /><br />I have some other comments -- which questions seem particularly difficult or easy, what we might say about confidence intervals other than 90 percent -- but I'm hoping more people might answer, so I'll wait for that. (Although if the remaining answers are suspiciously better-calibrated than the answers so far, that might turn out to be not such a good idea.)Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-29522593229290375212008-07-12T12:42:00.001-04:002008-07-12T12:44:24.272-04:00A prediction-making quizI just read Ian Ayres' book <a href="http://www.amazon.com/Super-Crunchers-Thinking-Numbers-Smart/dp/0553805401">Super Crunchers</a>, which talks about how the large amounts of data that are now routinely collected enable better predictions than before. Sort of like <i>Freakonomics</i> but a bit more statistical. (Although all the math is hidden -- but I knew that going in.)<br /><br />Now, there was a recent article <a href="http://www.wired.com/science/discoveries/magazine/16-07/pb_theory">The End of Theory</a> which predicts that we don't need theories, we can just mine our data for correlations; I don't believe this. And Ayres talks about how some predictive models need human input -- for example, a model for predicting how Supreme Court justices will vote needs people to read previous input on the cases in order to decide whether the ruling being appealed was liberal or conservative, and also to determine what the major issues involved in the case are. But he points out that people are bad at predicting things because we are overconfident about our predictions.<br /><br />This piqued my curiosity. Here's a quiz; I want to see how good you are at calibrating your own predictions. (This is taken from Ayres' book, p. 113.) 
For each of the following ten questions, give a <b>range</b> that you are 90 percent confident contains the correct answer. Ayres' test implicitly uses English units, but if you want to use metric (which I suspect a lot of you are more comfortable in) that's fine; I'll convert. <br /><br />So, for example, if one of the questions were "What is the population of Philadelphia?", and you gave the numbers "1.2 million, 1.6 million", that would indicate that you believe with probability 90 percent that the population of Philadelphia is in that interval. (The 2006 Census estimate for this, by the way, is 1,448,394.)<br /><br />Your goal is to get exactly <i>nine</i> of these right. Yes, I know that sounds weird! But the point is that if you get all ten right, you're probably underestimating your own abilities to predict things. If you get eight or fewer, you're probably overestimating them. <br /><br />Send your answers to me at <b>izzycat AT gmail DOT com</b>; <i>don't</i> leave them in comments. <br /><br />Here are the questions:<br />1. How old was Martin Luther King, Jr. at death?<br />2. What is the length of the Nile River?<br />3. How many countries belong to OPEC?<br />4. How many books are there in the Old Testament?<br />5. What is the diameter of the moon?<br />6. What is the weight of an empty Boeing 747-400?<br />7. In what year was Mozart born?<br />8. What is the gestation period of an Asian elephant?<br />9. What is the air distance from London to Tokyo?<br />10. What is the depth of the deepest known point in the ocean?<br /><br /><br />Also:<br />1. feel free to forward this quiz to other people. (I encourage it, although there's a non-negligible chance I might regret this if I get too many answers. I'll survive.)<br />2. if you have stories about how you made your guess, send them to me; I may use them in a future post.<br />I'm not going to post the answers; none of them are hard to find. 
Once answers stop coming in I'll make a post about how good you are at making these predictions.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-52098952043419806042008-07-11T20:26:00.003-04:002008-07-11T21:16:24.211-04:00Good's "singing logarithms"I've <a href="http://godplaysdice.blogspot.com/2008/01/street-fighting-mathematics.html">previously mentioned</a> <a href="http://www.inference.phy.cam.ac.uk/sanjoy/">Sanjoy Mahajan</a>'s <a href="http://mit.edu/6.099/">Street Fighting Mathematics</a>. (Yes, that's right, almost the entire sentence is links, deal with it.)<br /><br />One thing I didn't mention is <a href="http://mit.edu/6.099/handouts/singing-logarithms.pdf">approximating logarithms using musical intervals</a>, from that course. We all know 2<sup>10</sup> and 10<sup>3</sup> are roughly equal; this is the approximation that leads people to use the metric prefixes kilo-, mega-, giga-, tera- for 2<sup>10</sup>, 2<sup>20</sup>, 2<sup>30</sup>, and 2<sup>40</sup> in computing contexts. Take 120th roots; you get 2<sup>1/12</sup> &asymp; 10<sup>1/40</sup>.<br /><br />Now, 2<sup>1/12</sup> is the ratio corresponding to a <a href="http://en.wikipedia.org/wiki/Semitone">semitone</a> in <A href="http://en.wikipedia.org/wiki/Twelve-tone_equal_temperament">twelve-tone equal temperament</a>. So, for example, we know that 2<sup>7/12</sup> is approximately 3/2, because seven semitones make a perfect fifth. So log<sub>10</sub> 3/2 &asymp; 7/40 = 0.175; the correct value is 0.17609... Some more complicated examples are in Mahajan's handout.<br /><br />You might think "yeah, but when do I ever need to know the logarithm of something?" And that may be true; they're no longer particularly useful as an aid for calculation, except when you don't have a computer around. But I often find myself doing approximate calculations while walking, and I can't pull out a calculator or a computer! 
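<br /><br />If you do have a computer around, the trick is easy to check. Here's a minimal Python sketch; the function name is my own invention, and it only encodes the approximation 2<sup>1/12</sup> &asymp; 10<sup>1/40</sup> from above.

```python
import math


def log10_from_semitones(semitones):
    """Approximate log10 of a frequency ratio spanning this many semitones,
    using 2^(1/12) ~ 10^(1/40)."""
    return semitones / 40


# A perfect fifth (ratio 3/2) is seven semitones.
approx = log10_from_semitones(7)
exact = math.log10(3 / 2)
print(approx, round(exact, 5))  # 0.175 0.17609
```

The same function gives 12/40 = 0.3 for an octave, against a true log<sub>10</sub> 2 of 0.30103, which is just the original 2<sup>10</sup> &asymp; 10<sup>3</sup> approximation again.<br /><br />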
(To be honest I don't use this trick, but that's only because I have an arsenal of others.)<br /><br />Is this pointless? For the most part, yes. But amusingly so. <br /><br />The method is supposedly due to I. J. Good, who is annoyingly difficult to Google.<br /><br />Oh, and a few facts I find myself using quite often -- (2&pi;)<sup>1/2</sup> &asymp; 2.5, e<sup>3</sup> &asymp; 20.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-81742679620113363232008-07-10T21:41:00.004-04:002008-07-10T21:46:47.131-04:00Three beautiful quicksorts<a href="http://en.wikipedia.org/wiki/Jon_Bentley">Jon Bentley</a> gives a lecture called <a href="http://video.google.com/videoplay?docid=-1031789501179533828">Three Beautiful Quicksorts</a>, as three possible answers to the question "what's the most beautiful code you've ever written?" (An hour long, but hey, I've got time to kill.)<br /><br />Watch the middle third, in which some standard code for <A href="http://en.wikipedia.org/wiki/Quicksort">quicksort</a> is gradually transformed into code for performing an analysis of the number of comparisons needed in quicksort, and vanishes in a puff of mathematical smoke.<br /><br />Although I must admit, I'm kind of annoyed that he slips into the idea that an average-case analysis is the most important thing somewhere in there. The first moment of a distribution is not everything you need to know about it! 
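<br /><br />For reference, the average-case count in question can be sketched in a few lines of Python. This is not Bentley's code, just the standard recurrence for the expected number of comparisons in quicksort with a uniformly random pivot on distinct keys, checked against its well-known closed form; the function names are mine.

```python
def expected_comparisons(n_max):
    """Expected comparisons C(n) for randomized quicksort on n distinct keys,
    from C(n) = (n - 1) + (2/n) * sum(C(i) for i < n), with C(0) = 0."""
    c = [0.0]
    running_sum = 0.0  # sum of C(0), ..., C(n-1)
    for n in range(1, n_max + 1):
        c.append((n - 1) + 2.0 * running_sum / n)
        running_sum += c[n]
    return c


def closed_form(n):
    """Closed form 2(n+1)H_n - 4n, where H_n is the n-th harmonic number."""
    h_n = sum(1.0 / i for i in range(1, n + 1))
    return 2 * (n + 1) * h_n - 4 * n


c = expected_comparisons(100)
print(round(c[100], 2), round(closed_form(100), 2))  # the two should agree
```

That only gives the mean, of course, which is exactly the complaint.<br /><br />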
Although I admit that at times I subscribe to the school of thought that says "the first two moments are everything", but that's only because most distributions are normal.<br /><br />(Note to those who don't get sarcasm: I don't actually believe that most distributions are normal.)Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-70734175108534554892008-07-10T17:49:00.003-04:002008-07-10T17:56:46.410-04:00Why medians are dangerous<a href="http://gregmankiw.blogspot.com/2008/07/bimodality.html">Greg Mankiw provides a graph of the salaries of newly minted lawyers</a>, originally from <a href="http://www.elsblog.org/the_empirical_legal_studi/2007/09/distribution-of.html">Empirical Legal Studies</a>.<br /><br />There are two peaks, one centered at about $45,000 and one centered at about $145,000. The higher peak corresponds to people working for Big Law Firms; the lower to people working for nonprofits, the government, etc.<br /><br />The median is reported at $62,000, just to the right of the first peak, since the first peak contains slightly more people. But one gets the impression that if a few more people were to shift from the left peak to the right peak, the median would jump drastically upwards. We usually hear that it's better to look at the median than the mean when looking at distributions of incomes, house prices, etc. because these distributions are heavily skewed towards the right. But even that starts to break down when the distribution is bimodal.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-89048652842725059012008-07-09T14:49:00.002-04:002008-07-09T14:51:35.175-04:00Why devil plays dice?<a href="http://arxiv.org/abs/0806.4875">Why devil plays dice?</a>, by Andrzej Dragan, from the arXiv. 
I haven't read it; this post basically exists to forestall e-mails of the form "Have you seen the title of this paper?"<br /><br />(Hat tip to <A href="http://scienceblogs.com/pontiff/2008/07/devilish_dice_games.php">The Quantum Pontiff</a>.)Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-8908336706239314652008-07-09T10:54:00.000-04:002008-07-09T10:55:24.598-04:00Lottery tickets with really bad oddsA CNN.com article talks about <a href="http://www.cnn.com/2008/US/07/07/lottery.tickets/index.html">lottery tickets with zero probability of winning</a>.<br /><br />Why, you ask? Because some state lotteries continue selling the tickets for scratch-off games even after the top prize has been awarded. Therefore the odds stated on the ticket are, as of the time the ticket was purchased, incorrect.<br /><br />But let's say that half the tickets for some game have already been sold, and the top prize not awarded -- then the tickets that are still out there have double the probability of winning that they did originally. You wouldn't see anybody complaining about <i>that</i>.<br /><br />One way to fix this would be to have all the tickets be independent of each other, but drawn from the same distribution -- so instead of having one grand prize among the 100,000 tickets, each ticket <i>independently</i> has probability 0.00001 of being a grand prize ticket. But then there's a significant probability that there will be no grand prizes awarded, or that there would be two or more.<br /><br />And some lottery websites actually state which prizes have already been awarded. So it might be possible for somebody to use this information to their advantage, by betting only in lotteries where a disproportionate number of prizes remain to be awarded. This is basically the same principle as card-counting in blackjack, where the player bets more when the cards in the deck are more favorable. 
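<br /><br />Going back to the independent-tickets scheme for a moment, "significant probability" can be made concrete. Here's a quick Python sketch using the 100,000 tickets and the 0.00001 per-ticket probability from that example; the variable names are mine.

```python
import math

tickets = 100_000
p_grand = 0.00001  # each ticket is independently a grand-prize winner

p_none = (1 - p_grand) ** tickets
p_one = tickets * p_grand * (1 - p_grand) ** (tickets - 1)
p_two_or_more = 1 - p_none - p_one

print(round(p_none, 3), round(p_one, 3), round(p_two_or_more, 3))  # 0.368 0.368 0.264
```

So the lottery would award exactly one grand prize only a bit more than a third of the time; the number of winners is very nearly Poisson with mean 1, which is why 1/e shows up.<br /><br />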
I suspect, though, that this wouldn't work well because the house edge in lotteries is much higher than that in casinos.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-38098800814944751552008-07-08T22:57:00.001-04:002008-07-09T21:01:07.274-04:00On today's New York Times crosswordToday's New York Times crossword is by Tim Wescott. There is someone who's commented at <a href="http://sbseminar.wordpress.com/2008/06/16/request-long-distance-collaboration/">Secret Blogging Seminar</a> with that name.<br /><br />Anyway, here are some of the answers:<br /><br />4 down: EVEN TENOR<br />6 down: PERFECT GAME<br />11 down: ODD MEN OUT<br />25 down: SQUARE KNOTS<br />33 down: REAL MCCOY<br />37 down: PRIME TIME<br /><br />There was one more clue saying that the first word of each of those answers (which had a star before the clue) described the number of its clue. So 4 is even, 6 is perfect, 11 is odd, 25 is square, 33 is real, and 37 is prime.<br /><br />33 down seems like a bit of a cop-out to me. But I'm not saying I could do better at making a crossword. Crosswords (especially American-style ones) are hard to make; read the <a href="http://www.cs.toronto.edu/~mackay/itprnn/ps/260.262.pdf">information-theoretic argument</a> in <a href="http://www.inference.phy.cam.ac.uk/mackay/itila/book.html">MacKay's book</a> for some justification why.<br /><br />For the non-mathematicians who may have stumbled in (and the mathematicians who don't remember this particular bit of trivia), I feel like I should point out what a perfect number is. A number is perfect if it's equal to the sum of all the numbers it's divisible by. So 6 is divisible by 1, 2, and 3, and 1 + 2 + 3 = 6. 28 is the next perfect number; it's divisible by 1, 2, 4, 7, and 14, and 1 + 2 + 4 + 7 + 14 = 28. But 12 isn't perfect; it's divisible by 1, 2, 3, 4, and 6, and 1 + 2 + 3 + 4 + 6 = 16, which isn't 12. 
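<br /><br />In code, "the numbers it's divisible by" means the proper divisors, that is, every divisor except the number itself. Here's a brute-force Python sketch; the function names are mine, and "deficient" is the usual term for a number whose proper divisors sum to less than the number.

```python
def proper_divisor_sum(n):
    """Sum of the divisors of n other than n itself."""
    return sum(d for d in range(1, n) if n % d == 0)


def classify(n):
    """Label n by comparing it with the sum of its proper divisors."""
    s = proper_divisor_sum(n)
    if s == n:
        return "perfect"
    return "abundant" if s > n else "deficient"


print(proper_divisor_sum(12), classify(12))                     # 16 abundant
print([n for n in range(1, 1000) if classify(n) == "perfect"])  # [6, 28, 496]
```

This is fine for small numbers; it is far too slow to hunt for large perfect numbers.<br /><br />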
We call 12 "abundant" because 16 (the sum of its divisors) is more than 12. <a href="http://godplaysdice.blogspot.com/2007/11/density-of-abundant-numbers.html">Just under one quarter of integers are abundant</a>, which is entirely irrelevant.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-75300779156033125732008-07-07T13:00:00.001-04:002008-07-07T13:00:46.860-04:00A political minimum spanning treeThis morning, Nate Silver of fivethirtyeight.com posted <a href="http://www.fivethirtyeight.com/2008/07/state-similarity-scores.html">State Similarity Scores</a>. For each pair of states, Silver reports a score that gives the political "distance" between the two states. (He actually reports only the three states closest to each state.)<br /><br />These are based on an analysis of certain variables that appear to be important in US politics, weighted by their importance in determining state-by-state polling in the 2004 and 2008 presidential elections. As it turns out, the pair of states that are closest to each other in this metric are the Carolinas, followed by the Dakotas; Kentucky-Tennessee; Michigan-Ohio and Oregon-Washington.<br /><br />It occurred to me that the <a href="http://en.wikipedia.org/wiki/Minimum_spanning_tree">minimum-weight spanning tree</a> for this data might look interesting. And indeed it does. I'm having some trouble articulating <i>why</i> it's interesting, but I just wanted to post the tree. There may be a slight issue because I don't have the full set of similarity scores, but the tree generated from the subset of the data that I do have is probably pretty close to the "true" tree and is quite interesting to look at. 
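<br /><br />I built the tree by hand, but for anyone who would rather not, here's a minimal Python sketch of Kruskal's algorithm, taking each edge weight to be 100 minus the similarity score. The scores and the four placeholder "states" below are made up purely for illustration; they are not Silver's data, and the function name is mine.

```python
def minimum_spanning_tree(similarity):
    """Kruskal's algorithm; similarity maps (a, b) pairs to scores out of 100."""
    edges = sorted((100 - score, a, b) for (a, b), score in similarity.items())
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for weight, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:       # edge joins two components, so keep it
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


# Made-up scores for four placeholder "states"; not Silver's data.
toy_scores = {
    ("A", "B"): 90, ("A", "C"): 60, ("B", "C"): 80,
    ("B", "D"): 50, ("C", "D"): 70,
}
print(minimum_spanning_tree(toy_scores))
# [('A', 'B', 10), ('B', 'C', 20), ('C', 'D', 30)]
```

If some pairs are missing, as they are when only the closest few states are reported, the function still runs, but it returns a spanning forest whenever the known edges don't connect everything.<br /><br />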
(The weight for the edge between any two states is 100 minus Silver's similarity score for that pair of states; Silver's similarity scores have a theoretical maximum of 100.)<br /><br />Note that the positioning of the states in the drawing of the tree below is entirely irrelevant; I just attempted to draw the tree in such a way that people wouldn't be inclined to see edges that weren't actually there. In particular, Ohio is not somehow "unusual" even though the edges connecting it to adjacent states are long. (As a start, though, it does seem to be useful to think of Ohio as the center of the graph, in line with the conventional political wisdom that Ohio is at the political center of the US.) I thought about trying to make the distances in the drawing reflect the weights, but that was more trouble than I wanted to go to.<br /><br />Also, some states that are close to each other in Silver's metric aren't close in the tree. There may be errors, since I did this by hand.<br /><br /><a href="http://www.math.upenn.edu/~isabel/tree.gif">Here's the tree.</a>Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-47379885172815715962008-07-06T12:41:00.003-04:002008-07-06T12:44:32.261-04:00Nomenclature clash<a href="http://www.nytimes.com/imagepages/2008/07/06/weekinreview/20080706_WEEK_GRAPHIC.html">Prime Numbers</a> for June 29 to July 5, from today's New York Times. (I don't know if this is a weekly thing; it could be but I don't recall seeing it before.)<br /><br />The numbers are 46, 62000, 30, 18%, and 30000; each is important to some news story from this week. (If you want to get technical, 62000 and 30000 are approximations.)<br /><br />Presumably they mean "prime" in the sense of "important". 
Or in the sense of "composite", but that would be a bit perverse.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-59614795294469764372008-07-05T17:21:00.003-04:002008-07-05T17:41:56.967-04:00A couple of links1. <a href="http://quomodocumque.wordpress.com/2008/07/05/one-to-nine-review-in-the-nytimes/">Jordan Ellenberg's</a> <a href="http://www.nytimes.com/2008/07/06/books/review/Ellenberg-t.html">review of</a> <a href="http://www.synth.co.uk/main.html">Andrew Hodges'</a> book <a href="http://www.cryptographic.co.uk/onetonine/">One To Nine</a>. Read the review, if only because it uses the word "mathiness". Ellenberg's review seems to imply that the book has similar content to most popular math books; sometimes I wonder how the publishing industry manages to keep churning out these books, but then I remember that the same thing is true in most other subjects and I'm just more conscious of it in mathematics.<br /><br />2. <a href="http://garden.irmacs.sfu.ca/">Open Problem Garden</a>, which is a user-editable (?) repository of open problems in mathematics. Thanks to <a href="http://rigtriv.wordpress.com/2008/07/05/mathematical-odds-and-ends/#comments">Charles Siegel</a>, my fellow Penn mathblogger, for pointing this out. The majority of the problems given there are in graph theory; that seems to be because <a href="http://www.sfu.ca/~mdevos/">Matt Devos</a>, one of the most prolific contributors, is a graph theorist. <br /><br />But I have to say that "garden" feels like the wrong word here; gardens are calm and peaceful and full of well-organized plants, which doesn't seem like a good way to describe problems that haven't been solved yet. "Forest" seems like a better metaphor to me -- certainly when I'm working on a problem that's not solved, it feels like hacking my way through a forest, not walking around a garden. 
Also, the use of "forest" enables bad graph theory jokes -- the problem of <a href="http://garden.irmacs.sfu.ca/?q=op/negative_association_in_uniform_forests">"negative association in uniform forests"</a>, due to Robin Pemantle, in particular sounds like it could be about sketchy people you meet in the woods.<br /><br />(I gave a talk back in February where I mentioned this problem. I'm glad I didn't think of that joke then, because it's really bad and I would have just embarrassed myself.)Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-84364467366660048382008-07-03T23:33:00.003-04:002008-07-03T23:50:16.411-04:00Lightning and lotteriesFrom a rerun of <i>Friends</i>:<br /><blockquote>Ross: Do you know what your odds are of winning the lottery? You have a better chance of being struck by lightning 42 times.<br />Chandler: Yes, but there's six of us, so we'd only have to get struck by lightning 7 times.<br />Joey: I like those odds!<br /></blockquote> Unsurprisingly, Chandler seems to know that probability doesn't work this way; Joey doesn't.<br /><br />Also, Ross is wrong. It seems the record for getting struck by lightning is held by <a href="http://en.wikipedia.org/wiki/Roy_Sullivan">Roy Sullivan</a>, at seven times. So nobody's been hit 42 times, while plenty of people have won the lottery.<br /><br /> I don't know how to calculate the odds that someone gets hit 42 times by lightning in their life; the lifetime odds of getting hit are one in three thousand, as <a href="http://news.nationalgeographic.com/news/2004/06/0623_040623_lightningfacts.html">this article states</a>, and if you figure that lightning strikes are a Poisson process with rate 1/3000 per lifetime, then the probability that lightning hits one person seven times is something like (1/3000)<sup>7</sup>/7!, or about one in 10<sup>28</sup>. 
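The arithmetic can be checked with a short Python sketch. The variable names are mine, and the rate of 1/3000 per lifetime is the article's figure taken at face value; the script computes the Poisson probability of exactly seven strikes both with and without the normalizing factor.

```python
import math

rate = 1 / 3000   # assumed expected number of strikes per lifetime
strikes = 7       # Roy Sullivan's record

# Poisson probability of exactly `strikes` events, without the exp(-rate) factor.
approx = rate**strikes / math.factorial(strikes)

# The full Poisson probability includes the normalizing factor exp(-rate).
exact = math.exp(-rate) * approx

print(f"approximate: one in {1 / approx:.3g}")
print(f"exact:       one in {1 / exact:.3g}")
```

Both lines come out at roughly one in 10<sup>28</sup>; the normalizing factor changes the answer by only a few hundredths of a percent.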
(That's the probability that a Poisson with parameter 1/3000 takes the value exactly 7; I'm ignoring the normalizing factor of exp(-1/3000) and the even-more-negligible probability that someone gets hit eight or more times.)<br /><br />Since the number of people who have existed is much less than 10<sup>28</sup>, the existence of a person who's been hit seven times is very strong evidence that that's not the right model. My hunch is that the events of each person getting hit by lightning are a Poisson process, but with a separate parameter that depends on the person. Roy Sullivan was a park ranger.<br /><br />But the 1 in 3000 figure can't be trusted; the article also claims the annual risk of getting hit by lightning is one in 700,000. People don't live 700,000/3,000 (i.e. about 233) years.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-7543834915345685042008-07-03T17:40:00.004-04:002008-07-03T17:48:30.265-04:00Li's proof of Riemann has a flaw -- but all might not be lost?<a href="http://terrytao.wordpress.com/2008/02/07/structure-and-randomness-in-the-prime-numbers/#comment-30714">Terry Tao</a> claims that Li's proof of the Riemann hypothesis <a href="http://godplaysdice.blogspot.com/2008/07/lis-proof-of-riemann.html">(which I wrote about yesterday)</a> is flawed. (via <a href="http://www.arsmathematica.net/archives/2008/07/03/lis-preprint/">Ars Mathematica</a>.) 
But that was, I think, version 2 at the arXiv; the paper is now up to <a href="http://arxiv.org/abs/0807.0090">version 4</a>, which apparently attempts to fix the flaw Tao claims in version 2.<br /><br /><a href="http://noncommutativegeometry.blogspot.com/2008/06/fun-day-two.html?showComment=1215071400000#c8876982000013974667">Alain Connes</a> has also weighed in at his blog; Li's paper relies on his work.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-580090313467038762008-07-02T18:13:00.003-04:002008-07-02T18:21:31.433-04:00Obama isn't average -- and that's a good thing.<a href="http://www.washingtonpost.com/wp-dyn/content/article/2008/07/01/AR2008070103008_pf.html">Someone at the Washington Post is a bit confused about averages.</a><br /><br />Basically, Barack and Michelle Obama (you've heard of them, right?) got a mortgage at a rate of 5.625% at a time when the average rate was 5.93% -- and so the Obama campaign finds itself playing defense. But as <a href="http://www.fivethirtyeight.com/2008/07/most-irresponsible-piece-of-journalism.html">Nate Silver</a> pointed out, this is evidence that the Obamas have good credit, and as various people commenting there pointed out, it's an <i>average</i>. <br /><br />Some people get better-than-average rates. That's true by definition. (Although I suspect that more than half of people get a rate below the mean, because the right tail is probably longer than the left tail.) <br /><br />Personally, I <i>want</i> my presidential candidates to be getting a good interest rate -- because it's evidence that they have good credit, which in turn is evidence for some sort of financial prudence. (Yes, I know, some people with bad credit got there because they got dealt a bad hand. It's <i>evidence</i>, not a <i>proof</i>.) 
And if someone is good at managing their own money, they might be good at managing the country's money.<br /><br />And do we really want our president to be average?Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-39875585155921660302008-07-02T17:10:00.003-04:002008-07-02T17:20:08.822-04:00Li's proof of Riemann?<a href="http://arxiv.org/abs/0807.0090">A proof of the Riemann hypothesis</a>, by <a href="http://www.math.byu.edu/~xianjin/">Xian-Jin Li</a>.<br /><br />I'm not qualified to judge the correctness of this, but glancing through it, I see that it at least <i>looks</i> like mathematics. Most purported proofs of the Riemann hypothesis set off the crackpot alarm bells in my head; this one doesn't. Li also stated <a href="http://en.wikipedia.org/wiki/Li%27s_criterion">Li's criterion</a> in 1997, which is one of the many statements equivalent to RH, although I don't think it's used in the putative proof, and wrote a PhD thesis titled <a href="http://genealogy.math.ndsu.nodak.edu/id.php?id=16641">The Riemann Hypothesis For Polynomials Orthogonal On The Unit Circle</a> (1993), so this is at least coming from someone who's been thinking about the problem for a while and is part of the mathematical community.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-40266687000783078172008-07-01T20:33:00.002-04:002008-07-01T20:36:04.414-04:00Yudkowsky on Bayesian reasoning<a href="http://yudkowsky.net/bayes/bayes.html">An Intuitive Explanation of Bayesian Reasoning</a>, by <a href="http://yudkowsky.net/">Eliezer Yudkowsky</a> (of <a href="http://www.overcomingbias.com/">Overcoming Bias</a> fame).<br /><br />A sequel to this is <a href="http://yudkowsky.net/bayes/technical.html">A Technical Explanation of Technical Explanation</a> [sic].Isabel 
Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-78801815525986544652008-06-30T17:50:00.000-04:002008-06-30T17:52:05.113-04:00A tail bound for the normal distributionOften one wants to know the probability that a random variable with the standard normal distribution takes value above <i>x</i> for some positive constant <i>x</i>. <br /><br />(Okay, I'll be honest -- by "one" I mean "me", and the main reason I'm writing this post is to fix this idea in my head so I don't have to go looking for my copy of <a href="http://www.math.cornell.edu/~durrett/books.html">Durrett's</a> text <i>Probability: Theory and Examples</i> every time I want this result. Durrett gives a much shorter proof -- two lines -- on page 6 of that book, but it involves an unmotivated-seeming change of variables, which is why I have trouble remembering it.)<br /><br /> The probability density function of the standard normal is <IMG SRC="http://snappy.at.org/~cola/tex2img/image.php?id=96a1548f888bc35fd02bf1bc5f51ab21" STYLE="vertical-align: middle;" ALT="${1 \over \sqrt{2\pi}} \exp( -x^2/2)$" HEIGHT="29" WIDTH="142">, and so the probability in question is<br /><DIV STYLE="text-align: center;"><IMG SRC="http://snappy.at.org/~cola/tex2img/image.php?id=c7ed4c2e2da435cc817357b13c9680f2" STYLE="vertical-align: middle;" ALT="f(x) = \int_x^\infty {1 \over \sqrt{2\pi}} \exp (-t^2/2) \, dt" HEIGHT="50" WIDTH="280"></DIV> It's a standard fact, but one that I can never remember, that this is bounded above by <IMG SRC="http://snappy.at.org/~cola/tex2img/image.php?id=227be97e49a7a0510e2dbec5109e47be" STYLE="vertical-align: middle;" ALT="${1 \over \sqrt{2\pi} x} \exp(-x^2/2)$" HEIGHT="29" WIDTH="152"> (and furthermore bounded below by 1 - 1/x<sup>2</sup> times the upper bound, so the upper bound's not a bad estimate). <br /><br />How to prove this? 
Well, here's an idea -- approximate the tail of the standard normal distribution's density function by an exponential. Which exponential? The exponential of the <i>linearization</i> of the exponent at t = x. The exponent has negative second derivative, so the new exponent is larger (less negative) than the old one and this is an overestimate. That is,<br /><DIV STYLE="text-align: center;"><IMG SRC="http://snappy.at.org/~cola/tex2img/image.php?id=cba972dd625f2a28b198e4062370ca4d" STYLE="vertical-align: middle;" ALT="f(x) &lt; \int_x^\infty {1 \over \sqrt{2\pi}} \exp(-x^2/2-x(t-x)) \, dt" HEIGHT="50" WIDTH="383"></DIV> where the new exponent is the linearization of -t<sup>2</sup>/2 at t=x.<br /><br />Then pull out factors which don't depend on <i>t</i> to get <DIV STYLE="text-align: center;"><IMG SRC="http://snappy.at.org/~cola/tex2img/image.php?id=edf04b534b32befc768f107e783674c4" STYLE="vertical-align: middle;" ALT="{\exp(x^2/2) \over \sqrt{2\pi}}\int_x^\infty \exp(-xt) \, dt" HEIGHT="52" WIDTH="243"></DIV> and doing that last integral gives the desired bound.<br /><br />Basically, the idea is that since the density to the right of <i>x</i> is dropping off as the exponential of a quadratic, most of it's concentrated very close to <i>x</i>, so we might as well approximate the density by the exponential of a <i>linear</i> function, which is easier to work with.<br /><br />By similar means one can show that the expectation of a real number selected from the standard normal distribution, given that it's greater than <i>x</i>, is something like <i>x</i> + 1/<i>x</i>. The tail to the right of <i>x</i> looks like an exponential random variable with mean 1/<i>x</i>. For example, the expectation of a real number selected from the standard normal distribution, conditioned on being larger than 10, is 10.09809.... 
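A minimal numerical check of the bound and of the conditional expectation, sketched in Python. The function names are mine; the exact tail is computed from the complementary error function in the standard library, and the conditional mean uses the identity E[Z | Z > x] = (density at x) / (tail probability at x).

```python
import math

def density(x):
    """Standard normal probability density at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_tail(x):
    """P(Z > x) for standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def upper_bound(x):
    """The bound from the text: density(x) / x."""
    return density(x) / x

def lower_bound(x):
    """(1 - 1/x^2) times the upper bound."""
    return (1 - 1 / x**2) * upper_bound(x)

def conditional_mean(x):
    """E[Z | Z > x] = density(x) / P(Z > x)."""
    return density(x) / normal_tail(x)

for x in (1, 2, 5, 10):
    print(x, lower_bound(x), normal_tail(x), upper_bound(x))

print(conditional_mean(10), 10 + 1 / 10)
```

For each x the exact tail sits between the two bounds, and the last line gives a conditional mean near 10.098 for x = 10, a little below x + 1/x.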
But this is probably useless, because the probability of a real number selected from the standard normal distribution being larger than 10 is, by the previous bound, smaller than 1 in 10(2&pi;)<sup>1/2</sup>e<sup>50</sup>, or about one in 1.3 x 10<sup>23</sup>.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-58119496417635718562008-06-28T16:31:00.003-04:002008-06-28T16:40:00.921-04:00Baseball batsOh, apparently baseball people use "length-to-weight ratio" to describe a bat, as I learned from the people who talk too much before the Saturday afternoon game on Fox today. This is calculated by taking the weight (in ounces) minus the length (in inches), and in the major leagues can't be less than -3.5. <br /><br />Of course, it's actually a difference, not a ratio.<br /><br />It looks like some people call it the "differential", though, which is fine with me -- to me "differential" has other connotations, but expecting mathematical terminology not to collide with terminology used in other things is a Bad Idea. (Although why not just call it the "difference"?)Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.comtag:blogger.com,1999:blog-264226589944705290.post-70587316121800172522008-06-28T15:55:00.001-04:002008-06-28T15:56:42.371-04:00A thing I'm tired of hearing"X does especially well/badly in interleague play."<br /><br />Small sample size, people.Isabel Lugohttp://www.blogger.com/profile/15671307315028242949noreply@blogger.com