<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss'><id>tag:blogger.com,1999:blog-6555947</id><updated>2009-11-25T10:04:53.104-07:00</updated><title type='text'>The Geomblog</title><subtitle type='html'>Ruminations on computational geometry, algorithms, theoretical computer science and life</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default?start-index=26&amp;max-results=25'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>986</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6555947.post-5890744182516033062</id><published>2009-11-24T22:57:00.002-07:00</published><updated>2009-11-24T23:12:26.947-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='soda'/><category scheme='http://www.blogger.com/atom/ns#' term='community'/><title type='text'>Locating SODA outside the US</title><content type='html'>David Johnson forwards a note from the SODA Steering Committee about the possibility of having SODA outside the US. Summary: it can be done, with the kind of bid information that accompanies bids to other conferences with independent local arrangement. What follows is the note:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;To those who might be interested in having a future SODA conference take place outside North America:&lt;br /&gt;&lt;br /&gt;As you may know, local arrangements for SODA have to date always been handled by SIAM.  The  conference Department at SIAM has wide experience in organizing conferences in North America, but typically relies on help from local organizers when a conference they sponsor takes place elsewhere.&lt;br /&gt;&lt;br /&gt;This presented difficulties when, at last January's SODA, the Business Meeting voted a preference for holding the 2011 SODA in Paris.  SIAM and the SODA steering committee were unable to find any French organization that was prepared to handle the meeting in Paris.  One organization might have been willing to host the meeting in the Parisian suburbs, but we judged that most of the voters for Paris would not have voted for "suburbs of Paris".&lt;br /&gt;&lt;br /&gt;(No hotel was available in the second vote-getter, the Virgin Islands, and so SODA will be back at San Francisco, the 3rd-place choice, in 2011, January 22-25.)&lt;br /&gt;&lt;br /&gt;In light of the experience in finding a location for the 2011 SODA, we are going to change the ground rules a bit for the 2012 site selection. Proposals of sites outside of North America will still be welcome, but to be considered, they must include the details of what group will handle local arrangements, where precisely the conference will be held, and what the expected hotel costs will be.  In other words, the kind of information typically provided by groups proposing to host STOC or FOCS. (Proposals from groups that can collect registrations and take financial responsibility are preferred because of the added costs of SIAM providing these services in a country outside North America.)&lt;br /&gt;&lt;br /&gt;If you are interested in making such a proposal at the SODA 2009 Business Meeting, please contact Kirsten Wilden (wilden@siam.org) for more details about what kinds of information SIAM will need about your proposal.&lt;br /&gt;&lt;br /&gt;There has also been some thought that costs for SODA in North America could be reduced if local arrangements were placed in the hands of local volunteers, as they are for FOCS and STOC.  If anyone wishes to propose to host SODA 2012 in this way, their detailed proposals will also be considered at the Business Meeting.  (Again, contact Kirsten Wilden at SIAM in advance.)&lt;br /&gt;&lt;br /&gt;And of course, proposals for sites that the SIAM conference staff can handle on their own can be made as usual.&lt;br /&gt;&lt;br /&gt;Hope to see you all at SODA 2009 in Austin, January 16-19.&lt;br /&gt;&lt;br /&gt;David Johnson, SODA Steering Committee Chair.&lt;br /&gt;&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-5890744182516033062?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/5890744182516033062/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=5890744182516033062' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/5890744182516033062'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/5890744182516033062'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/11/locating-soda-outside-us.html' title='Locating SODA outside the US'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-7039038898865862995</id><published>2009-11-08T01:09:00.006-07:00</published><updated>2009-11-08T21:41:08.941-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='miscellaneous'/><category scheme='http://www.blogger.com/atom/ns#' term='society'/><title type='text'>Anathem, and mathematical stereotypes</title><content type='html'>Neal Stephenson is (or should be !) a familiar figure in the sci-fi/speculative fiction landscape: his &lt;a href="http://www.nealstephenson.com/crypt/"&gt;Cryptonomicon&lt;/a&gt; is a retelling of the story of Turing, along with much modern day drama involving advanced crypto and security. His new book, &lt;a href="http://www.nealstephenson.com/anathem/"&gt;Anathem&lt;/a&gt;, came out with much fanfare, and is a vast tale set in a place where mathematics is a pursuit conducted in monastery-like places with strong religious overtones.&lt;br /&gt;&lt;br /&gt;I'm reading Anathem right now, and am only at the beginning (so no spoilers please), but there's already a beautifully rendered discourse on stereotypes of the mathematican (or scientist in general). Inside the 'concent' (the monastery), these are termed 'Iconographies', as formal templates by which to understand how the "saecular" world perceives the mathematicians. I was reminded of this when writing I was writing &lt;a href="http://geomblog.blogspot.com/2009/11/soviet-style-mathematics.html"&gt;the post on Soviet-style mathematics&lt;/a&gt; and realized the stereotypes at the heart of the referenced WSJ article.&lt;br /&gt;&lt;br /&gt;So here are some of the iconographies discussed early on (you get one point for each one that you've encountered in real life):&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Tennestrian&lt;/span&gt;: seemingly clownish eccentric figures that have a darker side, luring impressionables and innocents into folly (&lt;a href="http://lzeit.blogspot.com/2009/02/anathem-iconographies.html"&gt;a play&lt;/a&gt; on the story of Socrates)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Doxan&lt;/span&gt;: brilliant, unemotional and cold, but as a consequence subordinate to passionate leaders with true human feeling. (&lt;span style="font-style: italic;"&gt;can you say 'Spock'?&lt;/span&gt;)&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Yorran&lt;/span&gt;: Criminally insane mad scientist shut up in a lab with plans to conquer the world. &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Rhetorian&lt;/span&gt;: Sinister conspiracy plotters, out to take over the world by planting minions out in the secular world to attain positions of power. &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Muncostran&lt;/span&gt;: Eccentric, lovable, dishevelled theoreticians, absent-minded and good-natured (&lt;span style="font-style: italic;"&gt;the subtext being: ignorable&lt;/span&gt;) &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Pendarthan&lt;/span&gt;: High-strung nervous meddling know-it-alls, unable to understand the realities of life, always subordinate to more 'masculine' (&lt;span style="font-style: italic;"&gt;his words, not mine&lt;/span&gt;) secular folk. &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Klevan&lt;/span&gt;: Awesomely wise elder statesman who can solve all the problems of the world (&lt;span style="font-style: italic;"&gt;apart from Einstein, I don't know of anyone who achieves such stature in our world&lt;/span&gt;)&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Baudan&lt;/span&gt;: Cynical frauds living in luxury at the expense of the common man (&lt;span style="font-style: italic;"&gt;this sounds like the viewpoint of letter-writers to the Salt Lake Tribune&lt;/span&gt;)&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Penthabrian&lt;/span&gt;: keepers of mystical secrets handed down from above, with 'theory' as a smokescreen used to fool the lay folk (&lt;span style="font-style: italic;"&gt;if only...&lt;/span&gt;)&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Moshianic&lt;/span&gt;: A combination of Klevan and Penthabrian - viewed as the most dangerous iconograph because of the high expectations placed on the theorist shoulders. &lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;It's a neat trick to identify the ways in which the outside world perceives the 'theors', as they are called, and in doing so understand where the outsider is coming from, and what kind of threat they pose. I suspect I'm going to start classifying people in "the real world" the same way when I describe what I do.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-7039038898865862995?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/7039038898865862995/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=7039038898865862995' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/7039038898865862995'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/7039038898865862995'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/11/anathem-and-mathematical-stereotypes.html' title='Anathem, and mathematical stereotypes'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-3951068804440291075</id><published>2009-11-07T23:32:00.002-07:00</published><updated>2009-11-08T00:11:16.677-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='miscellaneous'/><title type='text'>Soviet-style mathematics</title><content type='html'>Via Anand Kulkarni (aka &lt;a href="http://twitter.com/polybot/status/5509519576"&gt;polybot&lt;/a&gt;) comes an interesting article in the WSJ by Masha Gessen on Grigori Perelman, Soviet-era mathematics and the question of 'big math'. The premise of the article (Masha Gessen has a book out on Perelman and the Poincare conjecture) is that special environments are needed to prove big results, and the Soviet-era mathematical enclaves fostered this environment both because of, and inspite of the Soviet political system. &lt;br /&gt;&lt;br /&gt;It is indeed true that amazing work came out of the isolated confines of Soviet mathematical institutes, often parallel to or well before similar work in the Western world. There's a joke that goes around theoryCS circles that for every theorem proved before the 80s in the west, there's an equivalent result proved 10 years earlier by a Russian mathematician. We need look no further than the Cook-Levin theorem, the Koebe-Andreev-Thurston theorem (on circle packings), Kolmogorov-Chaitin-Solomonoff complexity (and according to some, the &lt;a href="http://geomblog.blogspot.com/2007/04/cauchy-bunyakovsky-schwarz-inequality.html"&gt;Cauchy-BUNYAKOVSKY-Schwarz&lt;/a&gt; inequality, &lt;a href="http://geomblog.blogspot.com/2007/04/cauchy-bunyakovsky-schwarz-inequality.html?showComment=1189265700000#c7516130803219960110"&gt;though this is disputed&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;But in the article is a more thought-provoking claim:&lt;br /&gt;&lt;blockquote&gt;The flow is probably unstoppable by now: A promising graduate student in Moscow or St. Petersburg, unable to find a suitable academic adviser at home, is most likely to follow the trail to the U.S.&lt;br /&gt;&lt;br /&gt;But the math culture they find in America, while less back-stabbing than that of the Soviet math establishment, is far from the meritocratic ideal that Russia's unofficial math world had taught them to expect. American math culture has intellectual rigor but also suffers from allegations of favoritism, small-time competitiveness, occasional plagiarism scandals, as well as the usual tenure battles, funding pressures and administrative chores that characterize American academic life. This culture offers the kinds of opportunities for professional communication that a Soviet mathematician could hardly have dreamed of, but it doesn't foster the sort of luxurious, timeless creative work that was typical of the Soviet math counterculture.&lt;br /&gt;&lt;br /&gt;For example, the American model may not be able to produce a breakthrough like the proof of the Poincaré Conjecture, carried out by the St. Petersburg mathematician Grigory Perelman.&lt;/blockquote&gt;&lt;br /&gt;This is a reflection of one of the enduring myths of mathematical research, "a mathematician would be happy in jail if they had paper and pen", with a bit of the 'a mathematician is a solitary (and slightly crazy) genius'. I can see the allure in the idea: mathematics requires great concentration, and removal of distractions would surely make it easier to focus on a big problem.&lt;br /&gt;&lt;br /&gt;But is this really impossible to achieve in the Western model of research ? After all, even Perelman's work built heavily on a program first outlined by Richard Hamilton from Columbia. Andrew Wiles proved Fermat's theorem while at Princeton. Ketan Mulmuley has been banging away at P vs NP while shuttling between Chicago and IIT Bombay (yes, I know it's not a perfect comparison because it hasn't been resolved yet). Stephen Cook proved that SAT is NP-Complete while at Toronto. And so on and so forth. &lt;br /&gt;&lt;br /&gt;Possibly one argument in favor of the 'isolation: good' theory is that Perelman didn't need to prove himself for 6-7 years, maintain a steady stream of funding, and teach lots of classes in order to "earn" the right to study such a hard problem. It's hard to imagine a researcher in the US being able to do this before they get some kind of job security (tenure, or otherwise).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-3951068804440291075?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/3951068804440291075/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=3951068804440291075' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/3951068804440291075'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/3951068804440291075'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/11/soviet-style-mathematics.html' title='Soviet-style mathematics'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-5106444063251275153</id><published>2009-11-02T01:19:00.004-07:00</published><updated>2009-11-02T10:42:36.511-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='ICS'/><title type='text'>Innovation in Computer Science</title><content type='html'>As the polylogblogdogslogglog blog &lt;a href="http://polylogblog.wordpress.com/2009/11/01/ics-accepts/"&gt;points out&lt;/a&gt;, the &lt;a href="http://conference.itcs.tsinghua.edu.cn/ICS2010/content/abstracts.html"&gt;ICS results are out&lt;/a&gt;. 39 papers were accepted in all - at some point I knew the number of submissions, but I've forgotten since.&lt;br /&gt;&lt;br /&gt;The ICS folks didn't make life easy for themselves by explicitly stating that they wanted "conceptual contributions". But looking over the list of papers, a few things come to mind:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;It's a great list of papers. Nothing to complain about really, and any of these could have been a credible paper at FOCS/STOC&lt;/li&gt;&lt;li&gt;The Arora et al paper on designing derivatives using NP-hard problems has already received so much chatter, one might argue that the conference mandate has already been satisfied. Similarly for the quantum money followup.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Even if the whole 'conceptual contributions' thing doesn't pan out, I see no harm in having a third conference inserted between FOCS and STOC - the more the merrier.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I guess innovation = "game theory + crypto + quantum + misc" :)&lt;/li&gt;&lt;/ul&gt;A side note. 8 of the 12 PC members have papers at the conference. This is a departure from the theory "norm". While I don't necessarily think it's a problem (everyone except theoryCS does this, and this is also a new conference), it's worth discussing. Is this something we'd like to have happen in other theory conferences as well ? Recall that the first year of the SODA 2-page experiment also had PC-submitted papers (and that was mainly to ensure submission volume, from what I recall).&lt;br /&gt;&lt;br /&gt;Update: &lt;a href="http://bit.ly/p6QtU"&gt;Shiva Kintali has PDFs&lt;/a&gt; for the accepted papers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-5106444063251275153?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/5106444063251275153/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=5106444063251275153' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/5106444063251275153'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/5106444063251275153'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/11/innovation-in-computer-science.html' title='Innovation in Computer Science'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-1451378175113158040</id><published>2009-11-01T17:39:00.006-07:00</published><updated>2009-11-02T01:19:11.410-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='topology'/><title type='text'>What is computational topology ?</title><content type='html'>&lt;a href="http://3dpancakes.typepad.com/ernie/2009/11/spoiled-spoiled-spoiled.html"&gt;Jeff's post&lt;/a&gt; on his computational topology class (and &lt;a href="http://compgeom.cs.uiuc.edu/~jeffe/teaching/comptop/schedule.html"&gt;the wonderful class outline&lt;/a&gt;), prompted me to write about something that I've tried to explain to people before. &lt;br /&gt;&lt;br /&gt;Computational topology is arguably the hottest thing in SoCG-land right now, and has been so for a number of years (for curious folk, the "other" hot topic is high dimensional approximate geometry). But if you ask different people, you'll get different answers to the question "what is computational topology ?". I've even had to explain to local students why my computational geometry class is different from the computational topology class being offered. So here goes:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;CT I: Using topological ideas to understand the 'shape' of data&lt;/span&gt;&lt;br /&gt;This is the version of CT that has taken up the largest fraction of CT mindshare in the SoCG land. Dating back to work by Edelsbrunner and Mucke on &lt;a href="http://biogeometry.duke.edu/software/alphashapes/index.html"&gt;alpha-shapes&lt;/a&gt;, the field now encompasses a range of ideas for extracting topological structure from data. New topological constructs that have come from this area include various notions of persistence, as well as related work on reconstruction, data smoothing, and visualization.&lt;br /&gt;&lt;br /&gt;Afra Zomorodian's thesis (&lt;a href="http://www.amazon.com/Computing-Cambridge-Monographs-Computational-Mathematics/dp/0521136091/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1254527757&amp;sr=1-1"&gt;now a book&lt;/a&gt;) is a nice introduction to these concepts for a CS audience. &lt;a href="http://www.amazon.com/Computational-Topology-Introduction-Herbert-Edelsbrunner/dp/0821849255/ref=sr_1_3?ie=UTF8&amp;s=books&amp;qid=1257149385&amp;sr=1-3"&gt;Herbert Edelsbrunner is coming out with a book&lt;/a&gt; on this topic very soon (Jan 16, 2010! Mark your amazons!), and Valerio Pascucci teaches a class on computational topology at the U. &lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;br /&gt;CT II: Asking computational questions about fundamental topological objects and concepts.&lt;/span&gt;&lt;br /&gt;This is closest to the 'Computational X' flavor of work, where a computational perspective is brought to the study of X. There are many interesting problems here, and they have a nice discrete flavor that makes them 'friendly' for theoryCS folk. For example, computing the Betti numbers of a simplicial complex efficiently, or finding homotopy equivalent paths on a surface, or lots of work on graphs on surfaces. &lt;br /&gt;&lt;br /&gt;I don't think there's a single book on this topic. JeffE has put together a fantastic course outline (with reading material), and there's also a great &lt;a href="http://www.fmf.uni-lj.si/~mohar/Book.html"&gt;book on graphs on surfaces&lt;/a&gt; by Mohar and Thomasson. It's worth noting that some of the deeper tools in the Robertson-Seymour graph  minor results take us into graphs-on-surfaces-of-large-genus land. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;CT III: Using topological ideas in the heart of complexity theory. &lt;/span&gt;&lt;br /&gt;Almost no one I know uses 'computational topology' in this sense, and there isn't a coherent and connected body of work to talk about as such. But there are some fascinating results at the core of complexity theory that rely on topological constructions. There's the algebraic complexity work of Yao/Steele/Ben-Or and others, showing that lower bounds for algebraic complexity of certain problems can be related to the sum of Betti numbers of associated surfaces. There's the Kahn-Saks-Sturtevant (and Chakrabarti-Khot-Shi) work on evasiveness of graph properties, and then the work (that I don't quite understand) on topology in distributed computing (that got Herlihy, Saks, Shavit and Zahoroglou the &lt;a href="http://en.wikipedia.org/wiki/G%C3%B6del_Prize"&gt;Godel Prize&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;This is one of the reasons I think a course in topology (of some flavor, whether it be combinatorial, algebraic or point-set) should be required mathematical background for all theoryCS students.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-1451378175113158040?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/1451378175113158040/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=1451378175113158040' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/1451378175113158040'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/1451378175113158040'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/11/what-is-computational-topology.html' title='What is computational topology ?'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-5733411864284967348</id><published>2009-10-28T21:19:00.001-06:00</published><updated>2009-10-28T21:21:23.232-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='clustering'/><title type='text'>Clustering interlude: Time series</title><content type='html'>In the absence of an actual post on clustering, I encourage all of you to go and read Sorelle's (or should I say ***SORELLE***'s) post on &lt;a href="http://kdphd.blogspot.com/2009/10/clustering-over-space-and-time.html"&gt;time series clustering&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-5733411864284967348?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/5733411864284967348/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=5733411864284967348' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/5733411864284967348'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/5733411864284967348'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/10/clustering-interlude-time-series.html' title='Clustering interlude: Time series'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-5432654261555437789</id><published>2009-10-19T01:56:00.002-06:00</published><updated>2009-10-19T02:06:02.435-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='productivity'/><category scheme='http://www.blogger.com/atom/ns#' term='software'/><title type='text'>"Wave"ing...</title><content type='html'>(&lt;span style="font-style: italic;"&gt;while I wait for actual inspiration to strike&lt;/span&gt;)&lt;br /&gt;&lt;br /&gt;I just acquired an invite to Google Wave (thanks to &lt;a href="http://ontopo.wordpress.com/"&gt;John Moeller&lt;/a&gt;), and have been finding it quite intriguing (&lt;span style="font-style: italic;"&gt;note: you can't "get" Google Wave unless you're invited, and no, I don't have any invites to give out&lt;/span&gt;).&lt;br /&gt;&lt;br /&gt;Google Wave is a mysterious combination of email, chat, wikis and browser extensions that's hard to describe - you can read descriptions of it all over the web, but unless you are able to get in and start playing, it's like trying to buy wine based on text descriptions of the taste.&lt;br /&gt;&lt;br /&gt;So far I've used it to:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;discuss an outline for next semester's seminar with my students (Topics in Graph Algorithms, for those curious)&lt;/li&gt;&lt;li&gt;Start working with a collaborator on an outline for a tutorial we want to do&lt;/li&gt;&lt;li&gt;set up administrative struts for our research group svn server (I've started using svn for writing papers - it's an amazing experience - more on that some other time)&lt;/li&gt;&lt;/ul&gt;By far the coolest feature of Wave for me is that you can include a LaTeX extension into any "wave"or conversation, and it automatically converts things between $$...$$ marks into latex, which makes me hopeful that it will be useful for more substantive discussions (since I often wish I had such capabilities in Skype/Google chat)&lt;br /&gt;&lt;br /&gt;Although the preview is painfully slow, and is crippled in various ways, the potential is clearly there, and as more people get on it, it will only start getting more effective. I'm looking forward to it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-5432654261555437789?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/5432654261555437789/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=5432654261555437789' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/5432654261555437789'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/5432654261555437789'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/10/waveing.html' title='&quot;Wave&quot;ing...'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-745158511677942557</id><published>2009-10-12T03:47:00.002-06:00</published><updated>2009-10-12T04:00:35.353-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='writing'/><title type='text'>On the virtue of NOT working on a problem</title><content type='html'>Semester has hit with a vengeance, and while I've been busy (among other things) with &lt;a href="http://apollonius.cs.utah.edu/web/2009/10/10/computing-hulls-in-positive-definite-space/"&gt;this&lt;/a&gt; and &lt;a href="http://apollonius.cs.utah.edu/web/2009/10/10/matching-shapes-using-the-current-distance/"&gt;this&lt;/a&gt;, my &lt;a href="http://geomblog.blogspot.com/search/label/clustering"&gt;clustering series&lt;/a&gt; has gone on temporary hiatus, hopefully to return shortly.&lt;br /&gt;&lt;br /&gt;In all the pages and pages of advice given to grad students, postdocs, and starting faculty, I think one item tends to get left by the wayside, or at least is not explicitly stated.&lt;br /&gt;&lt;blockquote&gt;You always underestimate the time spent managing a project from start to finish.&lt;/blockquote&gt;What I mean is this: problems (at least in theoryCS) are easy to state, and fun to work on. Sometimes they take a while to crack, and sometimes they give up their secrets easily. But the time you spend on any given project is much more than the actual time spent thinking about it. There's&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Writing up the first few drafts&lt;/li&gt;&lt;li&gt;Iterating to get a polished submission version&lt;/li&gt;&lt;li&gt;(...this step repeats until the paper is accepted)&lt;/li&gt;&lt;li&gt;Preparing the final (often very cramped) version&lt;/li&gt;&lt;li&gt;Making slides for the talk/talks you'll be giving&lt;/li&gt;&lt;li&gt;Preparing a full/journal/arxiv version, which often involves simplifying, rewriting, reproving, adding new references, etc etc. &lt;/li&gt;&lt;li&gt;Submitting to a journal, and waiting endlessly for updates on its status.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Addressing reviewer concerns, and resubmitting&lt;/li&gt;&lt;li&gt;And finally, getting it into print.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;The few-months thrill of actually thinking about the problem and solving it ends up being a multi-year odyssey filled with many fallow periods punctuated by bursts of activity.&lt;br /&gt;&lt;br /&gt;It's not so much the time involved - papers tend to time-multiplex quite well so you're usually in different phases of the above sequence for different papers.&lt;br /&gt;&lt;br /&gt;It's more a matter of motivation. I don't think I'm the only person who feels this, but once I have some nice results, and especially if there isn't follow-on work to be done, I get bored with a paper. Having to deal with it for months and months afterwards is then as excruciating as killing off zombies that keep coming back (not to mention what happens if it keeps getting rejected).&lt;br /&gt;&lt;br /&gt;So be careful when you choose a project: make sure it can last through at least a few papers, or you'll be spending a lot of time cursing yourself for the time you spend.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-745158511677942557?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/745158511677942557/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=745158511677942557' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/745158511677942557'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/745158511677942557'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/10/on-virtue-of-not-working-on-problem.html' title='On the virtue of NOT working on a problem'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-6550905757256912364</id><published>2009-09-27T00:34:00.003-06:00</published><updated>2009-09-27T01:28:56.737-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rajeev motwani'/><category scheme='http://www.blogger.com/atom/ns#' term='memorial'/><title type='text'>Rajeev Motwani Memorial</title><content type='html'>I just returned from the &lt;a href="http://cs.stanford.edu/randomalg/"&gt;technical workshop and memorial&lt;/a&gt; in honor of Rajeev Motwani. The workshop was excellently run by &lt;a href="http://www.stanford.edu/~ashishg/"&gt;Ashish Goel&lt;/a&gt;, and designed (extremely well, I thought) as a mix of retrospective and technical content, with one speaker presenting a brief retrospective and introduction, and a technical speaker laying out a body of work. &lt;br /&gt;&lt;br /&gt;The topics were varied: they started with &lt;a href="http://www.cis.upenn.edu/~sanjeev/"&gt;Sanjeev Khanna&lt;/a&gt; discussing matchings in random graphs (Rajeev's thesis work), and going onto brand new results in this area: for example, an expected time O(n log n) bound for matchings in d-regular bipartite graphs. &lt;a href="http://people.csail.mit.edu/indyk/"&gt;Piotr Indyk&lt;/a&gt; talked about locality-sensitive hashing, &lt;a href="http://people.csail.mit.edu/karger/"&gt;David Karger&lt;/a&gt; talked about randomized min cuts, &lt;a href="http://theory.stanford.edu/~aneeshs/"&gt;Aneesh Sharma&lt;/a&gt; discussed monetization problems in social networks, and &lt;a href="http://www.cis.upenn.edu/~sudipto/"&gt;Sudipto Guha&lt;/a&gt; concluded with a overview of data streams. &lt;br /&gt;&lt;br /&gt;The talks were surprisingly technical: if I closed my eyes, I could have easily imagined being in a SODA conference room. The only difference was that people were actually paying attention, as opposed to clicking away on laptops (or tweeting!). It was a large crowd: over 100 people, by my casual count.&lt;br /&gt;&lt;br /&gt;There were many retrospectives, given by &lt;a href="http://www.eecs.berkeley.edu/~karp/"&gt;Dick Karp&lt;/a&gt;, &lt;a href="http://infolab.stanford.edu/~ullman/"&gt;Jeff Ullman&lt;/a&gt;, &lt;a href="http://www.cs.uiuc.edu/~chekuri/"&gt;Chandra Chekuri&lt;/a&gt;, and &lt;a href="http://en.wikipedia.org/wiki/Ron_Conway"&gt;Ron Conway&lt;/a&gt;. Chandra spoke in particular about the experience of being Rajeev's student, and as a former student myself, his words touched me the most. He talked with feeling, compassion and honesty, and drew a compelling picture of a man that we began to know all over again. &lt;br /&gt;&lt;br /&gt;There was a beautiful memorial service in the Stanford Church, with words in English and Sanskrit, a hauntingly beautiful hymn from the Vedas sung by Rajeev's elder daughter, and testimonials from colleagues and friends old and new. Don Knuth was the organist for the entire ceremoney, and played pieces you didn't think could be played on a church organ. After the service, and a reception, there was a concert by one of Rajeev's favorite bands, &lt;a href="http://www.indianoceanmusic.com/"&gt;Indian Ocean&lt;/a&gt;. They played amazing music, and I'm downloading their songs as we speak, but that's a tale for another time. &lt;br /&gt;&lt;br /&gt;It was good to go back and meet people who I knew so well for a brief period of time, and then lost touch with. Many (if not all) of Rajeev's former students were there, and there were many others who cohabited the Gates Building along with me. All of us older, a little grayer, but still recognizable :). Spread-apart families often only get together at weddings or at funerals, and this was one of those occasions where it was great to see everyone, but as we all kept murmuring "unfortunately it had to happen like this". &lt;br /&gt;&lt;br /&gt;If I had to describe the feeling that dominated my thinking that day, it was a sense of being robbed. Upon hearing testimonial after testimonial, anecdote after anecdote, listening to this divine rock group that Rajeev listened to and loved, I could only wonder at the many sides of this person whom I knew so little of. I wished I had known more about him: that our interactions had been more multidimensional than that of advisor and student, and that I (and my fellow students at the time) had seen more of the ebullience and vivacity that others spoke so vividly of. &lt;br /&gt;&lt;br /&gt;By the end, a new picture began to emerge, of a 'hub', a 'connector' and a 'facilitator', someone who had the clarity to know what people really needed to succeed, and the self-effacement to stand back and make it happen, by connecting people together. He helped legions, and legions came to bid him farewell. &lt;br /&gt;&lt;br /&gt;It therefore seems oddly fitting that his career in research started with studying random matchings, and ended with new explorations of social networks. His life, one might think, has always been about creating, analyzing and enriching connections.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-6550905757256912364?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/6550905757256912364/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=6550905757256912364' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/6550905757256912364'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/6550905757256912364'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/09/rajeev-motwani-memorial.html' title='Rajeev Motwani Memorial'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-5988792808403322252</id><published>2009-09-24T02:41:00.002-06:00</published><updated>2009-09-24T02:49:37.585-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='focs'/><category scheme='http://www.blogger.com/atom/ns#' term='fwcg'/><category scheme='http://www.blogger.com/atom/ns#' term='conferences'/><title type='text'>Recent items</title><content type='html'>Deadlines are keeping me busy, and away from blogging. &lt;br /&gt;&lt;br /&gt;Speaking of which, the &lt;a href="http://www.cs.tufts.edu/research/geometry/FWCG09/"&gt;Fall Workshop on Computational Geometry&lt;/a&gt; has a submission deadline this Friday. The Fall workshop is a "true" workshop, in that you go, give a talk on whatever you're working on, and there's no messing around with proceedings, publications and things like that. It's a pure meet-and-chat kind of venue, which means the pressure is low, and the visibility is quite high (many of the east-coast geometry folks show up). &lt;br /&gt;&lt;br /&gt;This year it's in Tufts, so the location is even better. So get those 2-page abstracts in ! &lt;br /&gt;&lt;br /&gt;In other news, &lt;a href="http://www.cc.gatech.edu/focs2009/"&gt;FOCS registration deadline&lt;/a&gt; is looming. Bill mentions a &lt;a href="http://www.aco.gatech.edu/conference/focs-aco/"&gt;workshop to celebrate 50 years of FOCS&lt;/a&gt; (which dates back to when it was the conference on switching and circuits (!)) and 20 years of the GATech ACO program. &lt;br /&gt;&lt;br /&gt;The workshop is NOT like the Fall workshop :). It's a series of invited talks by a star-studded cast: KARP !! YANNAKAKIS !! ALON !! BLUM !! All together, for one brief engagement !!&lt;br /&gt;&lt;br /&gt;Sounds like a great idea: the kind of thing we probably need to do more of to &lt;a href="http://blog.computationalcomplexity.org/2009/09/why-you-shouldn-not-go-to-focs.html"&gt;get more folks to the conference&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-5988792808403322252?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/5988792808403322252/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=5988792808403322252' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/5988792808403322252'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/5988792808403322252'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/09/recent-items.html' title='Recent items'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-4247062473919805862</id><published>2009-09-16T03:17:00.003-06:00</published><updated>2009-09-16T03:20:13.800-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rajeev motwani'/><category scheme='http://www.blogger.com/atom/ns#' term='memorial'/><title type='text'>Memorial Workshop for Rajeev Motwani: Update</title><content type='html'>&lt;a href="http://geomblog.blogspot.com/2009/09/memorial-workshop-for-rajeev-motwani.html"&gt;I had mentioned&lt;/a&gt; a memorial workshop for Rajeev Motwani to be held at Stanford Friday Sep 25. &lt;a href="http://cs.stanford.edu/randomalg/"&gt;Registration is now open&lt;/a&gt; for the workshop and the memorial service to follow. &lt;br /&gt;&lt;br /&gt;Registration is free, but mandatory. So if you plan on attending either the workshop or the service or both, make sure you register.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-4247062473919805862?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/4247062473919805862/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=4247062473919805862' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/4247062473919805862'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/4247062473919805862'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/09/memorial-workshop-for-rajeev-motwani_16.html' title='Memorial Workshop for Rajeev Motwani: Update'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-2548274879348122325</id><published>2009-09-16T02:57:00.002-06:00</published><updated>2009-09-16T03:16:22.649-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ipe'/><category scheme='http://www.blogger.com/atom/ns#' term='latex'/><category scheme='http://www.blogger.com/atom/ns#' term='beamer'/><title type='text'>Beamer + Ipe + views...</title><content type='html'>At this point, mostly everyone is aware of how to use beamer to make presentations in LaTeX. However, many fewer people (mostly only geometry folk) are aware of &lt;a href="http://tclab.kaist.ac.kr/ipe/"&gt;Ipe&lt;/a&gt;, a kind of next-generation xfig.&lt;br /&gt;&lt;br /&gt;You may stop reading right now if&lt;br /&gt;&lt;ul&gt;&lt;li&gt;you always use powerpoint for slides OR&lt;/li&gt;&lt;li&gt;you rarely have to use LaTeX in slides OR&lt;/li&gt;&lt;li&gt;you really hate non-WYSIWYG presentation software&lt;/li&gt;&lt;/ul&gt;Still here ? ok...&lt;br /&gt;&lt;br /&gt;I wanted to share  a workflow tip that I found quite useful when making slides with step-through animations. Suppose you have a sequence of slides in a presentation that unfold a figure step by step, or animate some algorithm execution etc. Ipe, coupled with a few LaTeX commands, provides a really nifty way of rendering the animation without jumps, misalignments, or misdrawings.&lt;br /&gt;&lt;br /&gt;Ipe (and many other drawing programs) has the notion of a layer. More powerfully, Ipe also has the notion of a 'view', which you can think of as a (sub)set of layers. For example, if you have a drawing with layers 1,2,3,4,5, then view 1 might consist of {1,2,3}, and view 2 might consist of {1,2,5}, and so on.&lt;br /&gt;&lt;br /&gt;What this means is that when you want to do step-animation, it's really easy. Each time you move to a new step, you create a new view in Ipe (which also usually creates a new layer), and you can select whichever subset of the current set of layers you want to render, as well as drawing new ones.&lt;br /&gt;&lt;br /&gt;Ipe stores all the views as separate pages in a PDF file, so your final animation consists of a multi-page PDF file. And now comes the cool part.&lt;br /&gt;&lt;br /&gt;Suppose you want to include this sequence of views in a beamer slide, with each view appearing after the next in response to a mouse click. You need two things:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;pdftk (which comes standard in most linux installations), which allows you to split a PDF file into multiple files (one per page), with any format for the filename that you specify. For example, I have a command called 'splitpdf' that does this:&lt;blockquote&gt;pdftk $1.pdf burst output $1-%d.pdf&lt;/blockquote&gt;&lt;br /&gt;which takes a file name.pdf and splits it into name-1.pdf, name-2.pdf and so on.&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Next, you need the (standard) LaTeX package 'xmpmulti' which gives you the command \multiinclude. It allows you to include multiple figures that share a common prefix. So for example, to include all the figures created by the previous pdftk command, you merely write &lt;blockquote&gt;\multiinclude[start=1,format=pdf]{name}&lt;/blockquote&gt;.The 'start=1' starts counting from 1 instead of 0, and you can also specify an end-counter.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;But the best part is when you instead use&lt;br /&gt;&lt;blockquote&gt;\multiinclude[&amp;lt;+&amp;gt;][start=1,format=pdf]{name}&lt;/blockquote&gt;Now, the files are included with an automatic 'replace each by the next' mode (&amp;lt;+&amp;gt; is standard beamer notation for this). At this point, you have a sequence of animations ready to go. In fact, when I give lectures, I have a number of slides that just look like this:&lt;br /&gt;&lt;blockquote&gt;\begin{frame}&lt;br /&gt;  \frametitle{Title}&lt;br /&gt;  \mi{name}&lt;br /&gt;\end{frame}&lt;/blockquote&gt;&lt;br /&gt;where \mi is a macro I defined for the above multiinclude. Ipe does all the layering and viewing work for me, and multiinclude takes care of the rest. This has made making complicated animations really simple and fast. &lt;br /&gt;&lt;br /&gt;p.s if you're still wondering why one should use Ipe instead of xfig, the LaTeX integration in Ipe is superb. No nonsense with special flags, and pstex_t and craziness like that. You get WYSIWYG LaTeX inside Ipe, you can use whatever macros you have in your text, and the various nifty alignment tools make an Ipe drawing look really clean.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-2548274879348122325?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/2548274879348122325/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=2548274879348122325' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/2548274879348122325'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/2548274879348122325'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/09/beamer-ipe-views.html' title='Beamer + Ipe + views...'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-2877403246708414847</id><published>2009-09-11T00:22:00.002-06:00</published><updated>2009-09-11T00:26:26.985-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rajeev motwani'/><category scheme='http://www.blogger.com/atom/ns#' term='memorial'/><title type='text'>Memorial Workshop for Rajeev Motwani</title><content type='html'>Plans have been in the offing for a memorial workshop for Rajeev Motwani, and now the details are available. Below is the announcement (sent by Ashish Goel):&lt;br /&gt;&lt;blockquote&gt;Dear friends,&lt;br /&gt;&lt;br /&gt;As you might know, our dear friend and colleague, Rajeev Motwani, passed away in a tragic accident on June 5, 2009. We are holding a technical workshop titled&lt;br /&gt;&lt;br /&gt;        Randomized Algorithms: Theory and Applications&lt;br /&gt;&lt;br /&gt;at Stanford University on Sep 25th, 2009, from 10am - 2:30pm to honor Rajeev's research in the area of algorithms and their applications. If you did not know Rajeev's research, please see &lt;a class="moz-txt-link-freetext" href="http://www.stanford.edu/%7Eashishg/cgi-bin/RememberingRajeev/"&gt;http://www.stanford.edu/~ashishg/cgi-bin/RememberingRajeev/&lt;/a&gt; for a brief introduction.&lt;br /&gt;&lt;br /&gt;The workshop will take place at the Bechtel Conference Center in Encina Hall. Workshop attendees are also invited to a memorial celebration starting at 4:00pm at the Stanford Memorial church, followed by a performance by one of Rajeev Motwani's favorite bands, Indian Ocean. The registration URL, web-page information, parking directions, and talk titles will follow in a later email.&lt;br /&gt;&lt;br /&gt;Registration will be free, but mandatory.  Please feel free to bring this to the attention of any colleagues/students who you think might wish to attend, and send me an email if you have any questions.&lt;br /&gt;&lt;br /&gt;Workshop program:&lt;br /&gt;&lt;br /&gt;10 - 10:45 am&lt;br /&gt;Welcome remarks&lt;br /&gt;Retrospective by Richard Karp, UC Berkeley&lt;br /&gt;Technical talk by Sanjeev Khanna, University of Pennsylvania&lt;br /&gt;&lt;br /&gt;10:45 - 11:30 am&lt;br /&gt;Retrospective by Jeff Ullman, Stanford University&lt;br /&gt;Technical talk by Piotr Indyk, Massachusetts Institute of Technology&lt;br /&gt;&lt;br /&gt;11:30 am - 12:15 pm&lt;br /&gt;Retrospective by Chandra Chekuri, University of Urbana-Champaign&lt;br /&gt;Technical talk by David Karger, Massachusetts Institute of Technology&lt;br /&gt;&lt;br /&gt;12:15 - 1:30 pm&lt;br /&gt;Lunch for registered attendees&lt;br /&gt;&lt;br /&gt;1:30 - 2:30 pm&lt;br /&gt;Retrospective by Ron Conway, Venture Capitalist&lt;br /&gt;Technical talk by Sudipto Guha, University of Pennsylvania&lt;br /&gt;Technical talk by Aleksandra Korolova, Stanford University&lt;br /&gt;&lt;br /&gt;The Scientific committee for the workshop consisted of:&lt;br /&gt;Moses Charikar&lt;br /&gt;Ashish Goel&lt;br /&gt;Richard Karp&lt;br /&gt;Prabhakar Raghavan&lt;br /&gt;Tim Roughgarden&lt;br /&gt;&lt;/blockquote&gt;I'll update this post with the registration URL when it becomes available.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-2877403246708414847?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/2877403246708414847/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=2877403246708414847' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/2877403246708414847'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/2877403246708414847'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/09/memorial-workshop-for-rajeev-motwani.html' title='Memorial Workshop for Rajeev Motwani'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-1732376682762657003</id><published>2009-09-09T12:06:00.003-06:00</published><updated>2009-09-09T21:34:17.903-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='esa'/><category scheme='http://www.blogger.com/atom/ns#' term='conferences'/><title type='text'>Geometry at ESA (guest post)</title><content type='html'>&lt;span style="font-style: italic;"&gt;(ed. note: Jeff Phillips is a researcher in computational geometry, data analysis and statistics. He's also one of the newly minted &lt;a style="font-style: italic;" href="http://cifellows.org/"&gt;CIFellows&lt;/a&gt;, and will be working with me starting in a week. He kindly agreed to file this report from ESA.&lt;/span&gt;)&lt;br /&gt;&lt;br /&gt;ESA (along with ICALP) is one of the top two European conferences for Theory A.  And probably since the ICALP deadline is typically between the SoCG submission and notification, ESA is the home of many of the best computational geometry talks.  This year was no exception, with two "geometry" sessions, as well as many other geometry talks mixed in other sessions.&lt;br /&gt;&lt;br /&gt;Another interesting pattern at ESA which makes it different to other algorithms conferences is the juxtaposition of theoretical and experimental work.  The two tracks (A: Design and Analysis, B: Engineering and Applications) have separate submissions, but are not treated separately in the program.  This leads to the fun game of trying to determine whether a talk was from track A or track B.  Sometimes you are surprised when a talk starts with a nice theoretical results, and then the speaker presents several experiments to show the algorithm works well in practice, or mentions that it has already been added to CGAL.&lt;br /&gt;&lt;br /&gt;I think this a great practice and should be considered by other conferences such as SODA or SoCG.  For instance ALENEX talks could be mixed in with SODA talks.  This would encourage more algorithms people to actually implement their algorithms, because it would allow them to apply to a separate track that would give more value to practical algorithms.  How would this not be a positive for our community and its perception from other parts of computer science!&lt;br /&gt;&lt;br /&gt;A couple of related data points:  There were 56 track A papers and 10 track B papers, and both were accepted at a rate of about 25%.&lt;br /&gt;&lt;br /&gt;The invited talks followed the theme of theory and practice as well.  Michael Mitzenmacher began with a very clear talk explaining many open problems in Cuckoo hashing.  He has been working on small variations in the algorithm that lead to large differences in performance, and then has been going to great lengths to explain why these variations make such a large difference.&lt;br /&gt;&lt;br /&gt;Erik Demaine gave a very entertaining talk describing how a certain line of his work has alternated between art and the theory behind the art.  He also performed a couple magic tricks, reminding us that most of the best magic requires lots of research, and sometimes algorithms!&lt;br /&gt;&lt;br /&gt;On the third day, Noam Nisan presented a great high level view of how Google's TV ad auctions work.  Much of his talk served to point out all of the important details often ignored in theoretical analysis, but critical in implementing such a system effectively.&lt;br /&gt;&lt;br /&gt;I'd like to note a few papers I found interesting.  This is not by any means an exclusive list, but just enough to give a taste of the (geometric) results.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Manuel Caroli and Monique Teillaud. &lt;/span&gt;&lt;a style="font-weight: bold;" href="http://hal.inria.fr/inria-00356871/en/"&gt;Computing 3D Periodic Triangulations&lt;/a&gt;&lt;span style="font-weight: bold;"&gt;. &lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;They presented a set of rules and an algorithm for computing triangulations in 3D which are periodic, that is they tesselate to fill an infinite space, but can be defined as a complex within a unit cube.  These triangulations are needed for many simulations where it is difficult to deal with boundary conditions, so these can be avoided, but effectively having no boundary.  Despite, the usefulness of these triangulations, they had not really been properly formally defined until this paper.  Furthermore, their algorithm has been implemented and will soon me in CGAL, if not already. &lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Mohammad Ali Abam, Mark de Berg, Mohammad Farshi, Joachim Gudmundsson and Michiel Smid. &lt;a href="http://www.scs.carleton.ca/%7Emichiel/weightedspanner_esa09.pdf"&gt;Geometric Spanners for Weighted Point Sets&lt;/a&gt;.&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;They study Euclidean a size n point sets with weights so that a weighted distance between two points p and q is described&lt;br /&gt; d_w(p,q) = w(p) + d(p,q) + w(q)&lt;br /&gt;where d(.,.) is the Euclidean distance and w(.) is the weight of a point.  This may, for instance, represent a potential road network where it takes w(p) time to get to the center of a city from its outskirts.  This paper shows that typical spanner-type results can be found for a weighted point set using this weighted distance.  I.e. in R^d a (5+eps)-spanner can be found with O(n) edges, and (2+eps)-spanner can be found with O(n log n) edges.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Sumit Ganguly and Christian Sohler. d-dimensional Knapsack in the Streaming Model.&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;This paper studies the d-dimensional knapsack problem, that is given a set of n items with a value v_i and a size in d-dimensions s_{i,1}...s_{i,d} find a set of items I with maximum total value such that for all j \in [1,d] that sum_{i in I} s_{i,j} &lt;= 1.  This problem is studied in the streaming model, which I found interesting, because they were unable to store the actual solution, because it might have linear size.  Instead, they approximate the optimal cost and present a "template" solution where they give an approximate size of each element in their approximate solution I'. &lt;/blockquote&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Bernhard Haeupler, Siddhartha Sen, and Robert E. Tarjan. &lt;a href="http://www.cs.princeton.edu/%7Esssix/papers/rp-heaps.pdf"&gt;Rank-Pairing Heaps&lt;/a&gt;.&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;A data structure paper, but for one very useful in geometry, among other places.  This paper does not present any new results from a classical theoretical perspective.  They describe a heap data structure which has the same running time for all operations as Fibonacci heaps.  The contribution is that their data structure is considerably simpler than any variant of Fibonacci heaps, in particular, the structure of the heap never needs to be restructured.  As an added benefit, their data structure easily outperforms Fibonacci heaps.&lt;br /&gt;&lt;/blockquote&gt;&lt;span style="font-weight: bold;"&gt;Other notes (mainly from business meeting):&lt;/span&gt;  &lt;a href="http://mybiasedcoin.blogspot.com/2009/09/esa-2009_08.html"&gt;See also Michael Mitzenmacher's post&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;There were 37 computational geometry submissions, 10 were accepted.&lt;/li&gt;&lt;li&gt;Mark deBerg (next years track A chair), declared that next year the page size will be STRICTLY ENFORCED and he is not opposed to automatically rejecting papers if they play too many games to squeeze more text in the alloted number of pages.  Be warned.&lt;/li&gt;&lt;li&gt;next year, ESA 2010 (and ALGO) will be in Liverpool, England, and it was voted for ESA 2011 to be in Saarbrucken.  Which won over a bid from Greece. (&lt;span style="font-style: italic;"&gt;ed: HOW ON EARTH ? !!!!!!&lt;/span&gt;)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Proceedings were not included in registration, but could be bought.  On the second day, after politely asking Springer, participants were able to download a single pdf of the proceedings from a couple USB keys.  This was very useful!, but it would have been better earlier in the conference.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Things I missed&lt;/span&gt; (I had to leave partway through the third day).&lt;br /&gt;&lt;ul&gt;&lt;li&gt; - best student paper:  Heidi Gebauer. &lt;a href="http://arxiv.org/abs/0904.2541"&gt;Disproof of the Neighborhood Conjecture with Implications to SAT&lt;/a&gt;.&lt;/li&gt;&lt;li&gt; - best paper: Christoph Dürr, Flavio Guiñez and Martín Matamala. &lt;a href="http://arxiv.org/abs/0904.3169"&gt;Reconstructing 3-colored grids from horizontal and vertical projections is NP-hard&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Attached workshops which are part of ALGO on Thursday and Friday&lt;/span&gt;:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;IWPEC: &lt;a href="http://algo2009.itu.dk/4th-IWPEC"&gt;International Workshop on Parameterized and Exact Computation&lt;/a&gt;&lt;/li&gt;&lt;li&gt;WAOA: &lt;a href="http://algo2009.itu.dk/7th-waoa"&gt;Workshop on Approximation and Online Algorithms&lt;/a&gt;&lt;/li&gt;&lt;li&gt;ATMOS: &lt;a href="http://algo2009.itu.dk/atmos"&gt;Workshop on Algorithmic Approaches for Transportation Modeling, Optimization, and Systems&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-1732376682762657003?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/1732376682762657003/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=1732376682762657003' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/1732376682762657003'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/1732376682762657003'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/09/geometry-at-esa-guest-post.html' title='Geometry at ESA (guest post)'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-5331821637579283437</id><published>2009-09-04T22:22:00.002-06:00</published><updated>2009-09-04T22:37:31.790-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='soda'/><title type='text'>SODA 2010</title><content type='html'>Ironically (I'm on the PC), I'm probably the last one to post about the &lt;a href="http://bit.ly/RfT1G"&gt;SODA acceptances&lt;/a&gt; (although I did&lt;a href="http://twitter.com/geomblog/status/3751226432"&gt; tweet it a while ago&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;In case you don't already know, one major change this year is the 20 page limit on final versions, brought about by the all-electronic proceedings. It's worth pointing out here that this is a major change that I don't think ANY conference (theoretical or otherwise) has put into place (I can say this because I had nothing to do with it :)). 20 pages is huge: there's no way you can wiggle out of writing full proofs at this point, and I hope that this trend will continue as more of our conferences go all-electronic.&lt;br /&gt;&lt;br /&gt;One caveat: I don't know what the final version format is. It seems unnecessary to go with the butt-ugly Sheridan 2-column format, but we'll see.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;On the papers: &lt;/span&gt;&lt;br /&gt;I'm not going to comment on the papers, since I reviewed many of them. Suffice it to say that (even though it sounds like a cliche), I was impressed by the quality of the submissions, and there were many good papers that unfortunately couldn't make it. We're in the process of writing decision summaries for the authors: these are intended to capture more of the discussion surrounding a paper. Hopefully, they will give a better sense of what the reviewers liked and disliked about the paper than just the actual reviews.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;On the process:&lt;/span&gt;&lt;br /&gt;I felt a bit more disconnected from the overall deliberations this time, and maybe others felt this way too. I spent a lot of time on "my pile", but I didn't have a lot of time to look over papers that I wasn't in some way responsible for. Given the number of submissions, this is unavoidable I guess.&lt;br /&gt;&lt;br /&gt;I actually think the reason I felt this way was because of the software interface. Instead of the (clunky) SIGACT server interface used in years gone by, we used the snazzy HotCRP interface. In almost every way, this is a vastly superior interface (and I've used EasyChair, the Microsoft CMT, and other packages as well). It feels lightweight, it has great searching and tagging capabilities, and most of the interfaces are one or two clicks from the home page.  It cleanly combines email notifications and uploads/downloads with web-based editing, and I'd recommend it very strongly for anyone organizing a conference in the future.&lt;br /&gt;&lt;br /&gt;The only feature the SIGACT server had which this doesn't, was a landing page where you got a stream of all comments on all papers. It was a serendipitious way of picking up on discussions not related to papers you were in charge of, and I remember in the past getting interested in a discussion and a paper and actually reading and submitting a review myself. In HotCRP, you land at a page containing your reviews, and it doesn't give you a direct view into the global picture (except maybe for the PC chair).&lt;br /&gt;&lt;br /&gt;One amazing feat of social engineering that HotCRP also does: at this landing page, it tells you how many reviews you've put in compared to the average. There was a time during the review process where I'd reload the page every few hours and see myself falling behind the PC average, increasing the pressure on me to submit reviews :).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-5331821637579283437?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/5331821637579283437/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=5331821637579283437' title='24 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/5331821637579283437'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/5331821637579283437'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/09/soda-2010.html' title='SODA 2010'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>24</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-2435452722309593642</id><published>2009-08-30T23:55:00.002-06:00</published><updated>2009-10-30T08:22:06.427-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='data-mining'/><category scheme='http://www.blogger.com/atom/ns#' term='clustering'/><title type='text'>Spectral Clustering.</title><content type='html'>&lt;span style="font-style: italic;"&gt;(this is part of an &lt;/span&gt;&lt;a style="font-style: italic;" href="http://geomblog.blogspot.com/2009/06/clustering-occasional-series.html"&gt;occasional series of essays on clustering&lt;/a&gt;&lt;span style="font-style: italic;"&gt;: for all posts in this topic, &lt;/span&gt;&lt;a style="font-style: italic;" href="http://geomblog.blogspot.com/search/label/clustering"&gt;click here&lt;/a&gt;&lt;span style="font-style: italic;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;I don't like you, and I like them, but maybe if we change, we can get along. &lt;/blockquote&gt;Spectral clustering is not clustering: it's actually a metric modification method. It's another way of capturing things that are close versus things that far, using a more global method.&lt;br /&gt;&lt;p&gt;We saw how correlation clustering could be used to capture both the ``I don't like you'' and ``I like them'' aspects of clustering by assigning positive and negative weights to individual element pairs. However, it's not easy to come by such a labelling of pairs. More often than not, all we have is a distance function. We could threshold it, declaring all distances within some parameter r to be positive, and all distances larger to be negative, but this is completely arbitrary and depends on the choice of r.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Let's think of the two-class problem for now: we merely want to divide the points into two clusters. Suppose we were able to guess a function that assigns one label (0) to points in one cluster and another label (1) to points in the other cluster. Then we could write down the cost of the clustering (in a correlation sense) by summing up, over all pairs, the difference &lt;/p&gt;&lt;img src="http://www.cyberroadie.org/cgi-bin/mathtex.cgi?f_i%20-%20f_j" align="middle" border="0" /&gt;. In fact, just to make sure it's always positive, we'll square it, and thus will sum up the function &lt;img src="http://www.cyberroadie.org/cgi-bin/mathtex.cgi?%28f_i%20-%20f_j%29%5E2" align="middle" border="0" /&gt;. We'll also assume a weight function &lt;img src="http://www.cyberroadie.org/cgi-bin/mathtex.cgi?w_%7Bij%7D" align="middle" border="0" /&gt; on the edge &lt;img src="http://www.cyberroadie.org/cgi-bin/mathtex.cgi?ij" align="middle" border="0" /&gt;, so that what we're really summing up is &lt;img src="http://www.cyberroadie.org/cgi-bin/mathtex.cgi?%5Csum%20w_%7Bij%7D%28f_i%20-%20f_j%29%5E2" align="middle" border="0" /&gt;.&lt;br /&gt;&lt;p&gt;&lt;/p&gt;This is where things start to get very interesting. It's not hard to see that this can be written in term of a variant of the &lt;a href="http://en.wikipedia.org/wiki/Laplacian_matrix"&gt;graph Laplacian&lt;/a&gt;, treating the f values as a vector. If we now allow the f values to be reals, rather than 0 or 1, we get a classical eigenvalue problem on the Laplacian, in effect determining the second smallest eigenvalue of the Laplacian.&lt;br /&gt;&lt;p&gt;Once we do that, the corresponding eigenvector gives us an assignment for f. we can either merely take positive or negative values of f to group the data, or &lt;em&gt;we can use the f values as a feature representation&lt;/em&gt; and recluster the data into two groups based on that.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;This is the key insight of spectral ``clustering''. We perform a spectral decomposition of the Laplacian, and take the top k eigenvectors. Those vectors give us a k-dimensional feature represntation of the data in an inner product space, and we can now recluster. The real question is: why should this feature representation help us in any way at all ?&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;Here's my take on what's really going on. We know that the graph Laplacian is a discrete equivalent of the &lt;a href="http://en.wikipedia.org/wiki/Laplace%E2%80%93Beltrami_operator"&gt;Laplace-Beltrami operator&lt;/a&gt; on a manifold, which in turn (via the &lt;a href="http://en.wikipedia.org/wiki/Heat_equation"&gt;heat equation&lt;/a&gt;) captures the ``flow'' of material in a manifold. Return for a second to our cut-based view of the distance function. In essence, two points are close if there are many ways of getting from one to the other (i.e the min cut between them is large) and they are far if not. This is a flow-based way of thinking about distances, and in effect, what the Laplacian does is map the original data into a new metric space in which distance is based on flow, rather than just on metric distance.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;This intuition can be formalized: if we switch now to the stochastic viewpoint, in which we're doing random walks on the graph, then the commute time between two vertices turns out to be precisely the Euclidean distance (&lt;span style="font-style: italic;"&gt;squared: thanks, &lt;a href="http://twitter.com/dsivakumar"&gt;dsivakumar&lt;/a&gt;&lt;/span&gt;) between the feature vectors constructed from the eigenvalues of the Laplacian.&lt;br /&gt;&lt;br /&gt;... take a deep breath...&lt;br /&gt;&lt;br /&gt;There are a number of different concepts swirling around here that are worth keeping track of. The cut-based viewpoint of node similarity in graphs lends itself naturally to a Laplacian-based treatment, which in turn leads us to random walks and the heat equation on manifolds. At a deeper level, the Laplacian of a graph, by virtue of how it acts on functions, tells us something very basic about the structure of the distances involved. For more on this particular aspect, you should check out the discussion '&lt;a href="http://en.wikipedia.org/wiki/Hearing_the_shape_of_a_drum"&gt;Can you hear the shape of a drum&lt;/a&gt;'.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;But it's very important to keep in mind that spectral clustering is not so much a clustering technique as a data transformation method. Although k-means is the clustering algorithm of choice once we get down to the feature representation, other methods could be used as well. Also, as relaxations go, this paricular one (relaxing from the hypercube to the reals), isn't that great: the ``integrality gap'' can be quite high. It turns out that this doesn't matter terribly though.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;As an aside, the idea of using the graph Laplacian to define a ``stretched'' distance that looks Euclidean is not limited to this problem. The most "famous" version of this idea is the &lt;a href="http://www.mitpressjournals.org/doi/abs/10.1162/089976603321780317"&gt;Laplacian eigenmaps&lt;/a&gt; work by Belkin and Niyogi in the realm of manifold learning. There's also work by &lt;a href="http://www.stanford.edu/%7Esunjian/paper_ppt/global_symmetry.pdf"&gt;Guibas, Ovsjanikov and Sun&lt;/a&gt; that uses similar ideas to create signatures of shapes. The key idea, once again, is that if you're on a manifold of some kind, the Laplacian gives you a handle on the shape of this manifold, as well as how to ``stretch it out'' with local coordinates to make it look flat (and therefore Euclidean). It's a useful trick to keep in your data analysis toolbox.&lt;/p&gt;&lt;p&gt;For further reading, Ulrich von Luxburg has a &lt;a href="http://www.kyb.mpg.de/publications/attachments/Luxburg06_TR_%5B0%5D.pdf"&gt;great survey on spectral clustering&lt;/a&gt;.&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-2435452722309593642?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/2435452722309593642/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=2435452722309593642' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/2435452722309593642'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/2435452722309593642'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/08/spectral-clustering.html' title='Spectral Clustering.'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-6850625428870671055</id><published>2009-08-08T03:43:00.012-06:00</published><updated>2009-08-09T03:19:58.073-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='kernels'/><category scheme='http://www.blogger.com/atom/ns#' term='metrics'/><title type='text'>Negative-type distances and kernels</title><content type='html'>This is a story of two constructions that actually end up being essentially the same thing, as I recently discovered.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;1. Kernels&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cs.utah.edu/~hal/docs/daume04rkhs.pdf"&gt;The story of kernels&lt;/a&gt; in machine learning goes somewhat like this:&lt;br /&gt;&lt;br /&gt;Take a positive definite function $$K(x,y)$$ that captures some notion of similarity between objects (a standard example of such a kernel is the Gaussian $$K(x,y) = \exp(-\|x - y\|^2)$$). A positive definite function, btw, is like the generalization of a p.d matrix: the integral $$\int f(x)f(y)k(x,y)dx dy \ge 0$$.&lt;br /&gt;&lt;br /&gt;You can then construct a Hilbert space that captures the structure of the kernel. Specifically, for a fixed set of points S, construct a vector space from the basis $$\{k_x | x \in S\}$$, where $$k_x(y) = K(x,y)$$, and then define an inner product of two vectors in this space in the usual way: If $$v_a = \sum a_x k_x$$ and $$v_b = \sum b_x k_x$$, then $$v_a \cdot v_b = \sum a_x b_y K(x,y)$$.&lt;br /&gt;&lt;br /&gt;You get nice properties from this construction: the so-called reproducing property is one of them, which tells you that $$K(x,y) = k_x \cdot k_y$$. In other words, we can capture the similarity function K(x,y) by a "standard" inner product in a Hilbert space. &lt;br /&gt;&lt;br /&gt;What's even neater is that by invoking Mercer's theorem, you can construct an orthogonal basis, and make sure that the Hilbert space is actually a Euclidean space. The squared Euclidean distance in this space can be written in kernel form, as&lt;br /&gt;\[d_K(x,y) = K(x,x) + K(y,y) - 2K(x,y)\]&lt;br /&gt;which is what you'd expect when treating K(.) as an inner product.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;2. Negative-type distance.&lt;/span&gt;&lt;br /&gt;The second story comes from &lt;a href="http://www.amazon.com/Geometry-Cuts-Metrics-Algorithms-Combinatorics/dp/354061611X/ref=sr_1_3?ie=UTF8&amp;qid=1249809406&amp;sr=8-3"&gt;Deza and Laurent&lt;/a&gt;. When can a distance function  be embedded in Euclidean space ? It turns out that it's more convenient to talk about the square of the distance function, which we'll call D.&lt;br /&gt;&lt;br /&gt;There's an elegant characterization of when D can be embedded isometrically into $$\ell^2_2$$: this can be done if and only if D satisfies the negative-type inequality:&lt;br /&gt;\[ \sum b_i b_j D(x_i, x_j) \le 0, \sum b_i = 0\]&lt;br /&gt;for all possible assignments to $$b_i$$.&lt;br /&gt;&lt;br /&gt;The proof works via a construction called a covariance mapping that takes $D$ to the function $$k_D$$ defined as:&lt;br /&gt;\[ k_D(x,y) = (1/2)(d(x, x_0) + d(y, x_0) - d(x,y)) \]&lt;br /&gt;which differential geometry folks will recognize as the &lt;a href="http://en.wikipedia.org/wiki/Gromov_product"&gt;Gromov product&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The proof completes by showing that the negative-type condition on D implies positive definiteness of $$k_D$$, and this in turn means that $$k_D$$ can be expressed as an R-covariance:&lt;br /&gt;\[ k_D(x,y) = \int f_x(\omega)f_y(\omega) d\mu(\omega) \]&lt;br /&gt;for some measure space $\mu$. &lt;br /&gt;&lt;br /&gt;Note that the RHS of the equation is an infinite-dimensional inner product. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;3. Where the two stories come together&lt;/span&gt;&lt;br /&gt;The mapping that takes a kernel to a distance is the inverse of the covariance mapping used to map a distance to a metric. In other words, if we take a kernel K, compute $$d_K$$, and then use the distance to kernel mapping to compute $$k_{d_K}$$, we get back K. Further, since we can show that the negative-type condition on D implies a positive-definite condition on $$k_D$$, we can start off with either "D satisfies negative-type inequalities" or "K is a positive definite kernel" and yield the same conclusion on $$\ell_2^2$$ embeddability. &lt;br /&gt;&lt;br /&gt;What's interesting is that the two ways of thinking about this don't seem to have been connected together explicitly before. &lt;br /&gt;&lt;br /&gt;p.s This is not a fully rigorous correspondence except possibly in the finite dimensional case, but I find that it's often convenient to replace the negative-type argument with a kernel positive-definiteness argument in my head.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-6850625428870671055?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/6850625428870671055/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=6850625428870671055' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/6850625428870671055'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/6850625428870671055'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/08/negative-type-distances-and-kernels.html' title='Negative-type distances and kernels'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-1818265551109014111</id><published>2009-08-02T02:11:00.005-06:00</published><updated>2009-08-02T02:34:11.813-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='data-mining'/><category scheme='http://www.blogger.com/atom/ns#' term='clustering'/><title type='text'>Correlation Clustering: I don't like you, but I like them...</title><content type='html'>&lt;span style="font-style: italic;"&gt;(this is part of an &lt;/span&gt;&lt;a style="font-style: italic;" href="http://geomblog.blogspot.com/2009/06/clustering-occasional-series.html"&gt;occasional series of essays on clustering&lt;/a&gt;&lt;span style="font-style: italic;"&gt;: for all posts in this topic, &lt;/span&gt;&lt;a style="font-style: italic;" href="http://geomblog.blogspot.com/search/label/clustering"&gt;click here&lt;/a&gt;&lt;span style="font-style: italic;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Whether it's k-clustering, or any kind of hierarchical clustering, the world we live in is still the world of ``I don't like you'' clustering, where the geometric landscape is defined by distances between points. Consider the following example:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_2-lbnzMwtfQ/SnVNhN81i8I/AAAAAAAAAR4/6IHNyiHU6EE/s1600-h/correlation-example.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 200px; height: 40px;" src="http://1.bp.blogspot.com/_2-lbnzMwtfQ/SnVNhN81i8I/AAAAAAAAAR4/6IHNyiHU6EE/s200/correlation-example.jpg" alt="" id="BLOGGER_PHOTO_ID_5365279764157664194" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;There is an intuitive sense in which the first clustering is not ``real'' and the second one is: the idea of 'well-separatedness'' is a pervasive component of how a good clustering is perceived. But if we only measure distances between points, and only measure the cost of a clustering in terms of how costly each cluster is, we'll never be able to distinguish between these two examples.&lt;br /&gt;&lt;br /&gt;What's needed is a way of declaring likes (similarity) as well as dislikes (distance), and then, critically:&lt;br /&gt;&lt;/p&gt;&lt;blockquote&gt; penalizing similar items in different clusters AS WELL AS different items in the same cluster. &lt;/blockquote&gt;&lt;br /&gt;By that measure, we'd be able to distinguish the first and second clusterings, because in the first case, presumably elements close to each other that lie in different clusters will make the clustering look more expensive. This point is worth reiterating. Unless we have some way of penalizing mis-separations as well as mis-groupings, we'll always be at the mercy of the tradeoff between k and cost.&lt;br /&gt;&lt;br /&gt;Continuing the "clustering via self-help metaphors" theme to these essays, I call this the "I don't like you, but I like them" way of modelling data.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Correlation Clustering&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This is where &lt;a href="http://www.cs.cmu.edu/~avrim/Papers/clustering.ps"&gt;correlation clustering&lt;/a&gt; enters the picture. The correlation clustering model is as follows: every pair of elements is assigned a 1 or -1, encoding a similarity or dissimilarity. The goal is to find a clustering (note: no k!) in which any pair of points in a cluster is penalized for being dissimilar, and any pair of points in two different clusters is penalized for being similar. This can be generalized to arbitrary weights: for each pair of elements you assign weights w+ and w-, with the possible caveat that w+ + w- = 1.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;So now the goal is merely to minimize the cost of a clustering. The elegance of correlation clustering lies in the natural way that clusters merge or split depending on the number of similar or dissimilar pairs. Do note though that you need to have input data that can be written in that manner: thresholding a distance function will give you the desired input, but is an ad hoc way of doing it, since the thresholding is arbitrary.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;There are a number of different algorithms for correlation clustering, and also a very simple one that &lt;a href="http://www.cs.yale.edu/homes/el327/papers/clustering.pdf"&gt;yields a good approximation guarantee&lt;/a&gt;: pick a point at random, and pull in all its neighbors (all points similar to it). Repeat this process with a new unpicked point, until all points have been selected. This algorithm gives a 3-approximation for correlation clustering.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;Correlation clustering is also useful as a way of combining clusterings. We'll talk about this later (&lt;span style="font-style:italic;"&gt;ed: how many times have I said that !&lt;/span&gt;), but the problem of &lt;em&gt;conensus clustering&lt;/em&gt; is to ``cluster the clusterings'', or aggregate them into a single average clustering. An easy way to see the connection is this: given a collection of clusterings of a data set, create an instance of correlation clustering with the positive weight for a pair corresponding to the fraction of clusterings that ``vote'' for that pair being in the same cluster, and the negative weight being the fraction of clusterings that ``vote'' for that pair being separated. &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-1818265551109014111?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/1818265551109014111/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=1818265551109014111' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/1818265551109014111'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/1818265551109014111'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/08/correlation-clustering-i-dont-like-you.html' title='Correlation Clustering: I don&apos;t like you, but I like them...'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_2-lbnzMwtfQ/SnVNhN81i8I/AAAAAAAAAR4/6IHNyiHU6EE/s72-c/correlation-example.jpg' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-3897317099951898167</id><published>2009-07-23T23:32:00.002-06:00</published><updated>2009-07-24T01:34:29.054-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='data-mining'/><category scheme='http://www.blogger.com/atom/ns#' term='clustering'/><title type='text'>Clustering: Hierarchical methods</title><content type='html'>&lt;span style="font-style: italic;"&gt;(this is part of an &lt;/span&gt;&lt;a style="font-style: italic;" href="http://geomblog.blogspot.com/2009/06/clustering-occasional-series.html"&gt;occasional series of essays on clustering&lt;/a&gt;&lt;span style="font-style: italic;"&gt;: for all posts in this topic, &lt;/span&gt;&lt;a style="font-style: italic;" href="http://geomblog.blogspot.com/search/label/clustering"&gt;click here&lt;/a&gt;&lt;span style="font-style: italic;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;k-center, k-median, k-means, k-medoids, .... The list is endless, but that pernicious k comes up everywhere. What can we do about it ? There are really two ways to go: &lt;br /&gt;&lt;ol&gt; &lt;li&gt; Figure out the "right" k for a problem. This is a complicated matter, and will be the topic of a later post &lt;li&gt; Don't choose: give the user a universal representation from which they can decide what k they want. &lt;br /&gt;&lt;/ol&gt;&lt;br /&gt; This latter formulation takes us into the world of hierarchical clustering, the topic of this post.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;There are two different ways to think about hierarchical clustering: a representational view and an algorithmic view. The algorithmic view is quite simple: let's try to design a clustering via greedy operations. In the top-down view, we find a good split of the data into two parts, and then recurse on each side. In the bottom-up view, we select two clusters for merging (in the beginning, all items are in separate clusters), and merge our way up.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Perhaps the most well known of the bottom-up methods is the 'single link' clustering algorithm, in which at each step, you merge the two clusters that are closest to each other. If this sounds familiar, it should: this is merely Kruskal's MST algorithm run partially to completion.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;The "single-link" method is optimistic: it creates clusters via connected components, assuming that transitivity is enough to construct a cluster. The "complete-link" approach is more pessimistic: rather than defining the distance between two clusters as their nearest pair distance, it defines it as the furthest pair distance, ensuring a clique-like structure for clusters. Merging happens as before.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Other variants that are more robust to noise average the pairwise distances to obtain the clustering distance. With the right choice of distance, these amount to comparing the centroids of clusters.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;You'll notice that I didn't actually define a problem that these algorithms solve, keeping in with the grand tradition of clustering :). There is a way of defining a general optimization function that single-link clustering solves optimally. For the others though, although we can define a cost function, we can't show that they're optimal.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;So much for the algorithmic view of HC. The representational view goes somewhat deeper.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;I've had at least one person (you know who you are) tell me that one of the only two clustering algorithms that worked for them is hierarchical clustering. And indeed, the idea is very seductive. Rather than present the user with a single k-clustering - a snapshot, if you will, of the data - we give them a tree of merges, in which there is one leaf for each object. The central idea here, and one that bears emphasizing, beacuse it's so different to how we've thought about clustering thus far, is this:&lt;br /&gt;&lt;p&gt;&lt;br /&gt; &lt;em&gt;we understand the structure of data from its transitions, rather than from its states.&lt;/em&gt;  &lt;br /&gt;&lt;p&gt;&lt;br /&gt;What I mean is this: rather than defining a clustering as a static partitioning of the data, we watch the data ``evolve'' as we start merging points together, and use the merge history to figure out what the real structure is. There are a couple of ways of doing this: the most common way is to imagine drawing the tree from top to bottom, and then &lt;em&gt;cutting it&lt;/em&gt; by a y-monotone curve (one that intersects any vertical line in exactly one point). This can be done to ensure we have k clusters, or it can be done at points where the merge appears to be ``stable'' (the subtree doesn't merge with any other subtrees for a long time).&lt;br /&gt;&lt;p&gt;&lt;br /&gt;A particularly effective visual metaphor for this is the &lt;em&gt;dendrogram&lt;/em&gt;, which is very popular in evolutionary tree analysis.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.mathworks.com/access/helpdesk/help/toolbox/stats/dendrogram.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 236px;" src="http://www.mathworks.com/access/helpdesk/help/toolbox/stats/dendrogram.gif" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt; The dendogram is visually appealing because it does two things: firstly, it depicts the tree of merges, permuting the nodes so that there aren't crossings. Secondly, and more importantly, it uses the lengths of edges as a visual marker to indicate the 'time of merge' for clusters. If we think of starting at time 0 and merging clusters, then a cluster that sticks around for a long time will have a tall edge connecting it to its parent, in comparison with a cluster that gets merged quickly. (as an aside, visual metaphors are possibly underrated when it comes to thinking about clustering algorithms. My firm belief is that one of the reasons soft clusterings aren't used as much as hard clusterings is because it's hard to visualize them)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Sidebar:&lt;/b&gt;&lt;br /&gt; And this is the key operational idea behind ``clustering from the transitions'': clusters that stay unmerged longer are likely to be more interesting than clusters that get merged quickly. Till recently, this was merely a heuristic way of thinking about hierarchical clusterings. However, we now have the idea of &lt;em&gt;persistence&lt;/em&gt; that comes from the realm of computational topology. In short, persistence is a way of quantifying topological features of a shape that persist across different scales. Persistence has been studied extensively as a rigorous way of quantifying the triviality (or non-triviality) of features in a shape, and there's &lt;a href="http://portal.acm.org/citation.cfm?doid=1496770.1496881"&gt;a recent paper&lt;/a&gt; that applies persistence-like concepts to clustering as well. It's still early days for this direction, but I think it's a promising one. &lt;br /&gt;&lt;p&gt;&lt;br /&gt;Returning to hierarchical clustering, one major problem with this approach is that it's local: make the wrong choice of merge early on, and you'll never get to the optimal solution for a k-clustering problem. And since there's no easy way to change your mind (i.e split a clustering), it's hard to reverse bad decisions. If you're so inclined (and I am nowadays), this is the problem of finding monotone paths in the merge-split lattice on partitions of [1..n], but I digress...&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Ultimately, the value of hierarchical clustering comes in its ability to represent an entire history of clusterings, rather than just one, and the relative ease with which a bottom-up algorithm can be written. That, coupled with its uses where you really do want to find a tree (evolutionary or otherwise) is what makes it a popular implement in the clustering toolbag.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-3897317099951898167?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/3897317099951898167/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=3897317099951898167' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/3897317099951898167'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/3897317099951898167'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/07/clustering-hierarchical-methods.html' title='Clustering: Hierarchical methods'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-7742846132823847860</id><published>2009-07-15T23:31:00.002-06:00</published><updated>2009-07-15T23:39:06.091-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='latex'/><category scheme='http://www.blogger.com/atom/ns#' term='bibtex'/><category scheme='http://www.blogger.com/atom/ns#' term='publishing'/><title type='text'>Consistent BibTeX formatting</title><content type='html'>I try not to write BibTeX by hand any more: too easy to introduce errors. So I usually use either DBLP or the ACM digital library to get BibTeX for papers. Sometimes the journal has BibTeX, or some format that can be converted. As an aside, IEEE is extremely lame: you have to login to their digital library even to get a citation !&lt;br /&gt;&lt;br /&gt;For the most part, I don't need to go beyond ACM or DBLP, which is great. But here's the problem: their formats are different ! I needed the BibTeX for a recent paper of mine, and found it on both sites. Here's what ACM gave me:&lt;br /&gt;&lt;blockquote&gt;@inproceedings{1516372,&lt;br /&gt;author = {Ahmadi, Babak and Hadjieleftheriou, Marios and Seidl, Thomas and Srivastava, Divesh and Venkatasubramanian, Suresh},&lt;br /&gt;title = {Type-based categorization of relational attributes},&lt;br /&gt;booktitle = {EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology},&lt;br /&gt;year = {2009},&lt;br /&gt;isbn = {978-1-60558-422-5},&lt;br /&gt;pages = {84--95},&lt;br /&gt;location = {Saint Petersburg, Russia},&lt;br /&gt;doi = {http://doi.acm.org/10.1145/1516360.1516372},&lt;br /&gt;publisher = {ACM},&lt;br /&gt;address = {New York, NY, USA},&lt;br /&gt;}&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;and here's what DBLP gave me:&lt;br /&gt;&lt;blockquote&gt;@inproceedings{DBLP:conf/edbt/AhmadiHSSV09,&lt;br /&gt; author    = {Babak Ahmadi and&lt;br /&gt;              Marios Hadjieleftheriou and&lt;br /&gt;              Thomas Seidl and&lt;br /&gt;              Divesh Srivastava and&lt;br /&gt;              Suresh Venkatasubramanian},&lt;br /&gt; title     = {Type-based categorization of relational attributes},&lt;br /&gt; booktitle = {EDBT},&lt;br /&gt; year      = {2009},&lt;br /&gt; pages     = {84-95},&lt;br /&gt; ee        = {http://doi.acm.org/10.1145/1516360.1516372},&lt;br /&gt; crossref  = {DBLP:conf/edbt/2009},&lt;br /&gt; bibsource = {DBLP, http://dblp.uni-trier.de}&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;@proceedings{DBLP:conf/edbt/2009,&lt;br /&gt; editor    = {Martin L. Kersten and&lt;br /&gt;              Boris Novikov and&lt;br /&gt;              Jens Teubner and&lt;br /&gt;              Vladimir Polutin and&lt;br /&gt;              Stefan Manegold},&lt;br /&gt; title     = {EDBT 2009, 12th International Conference on Extending Database&lt;br /&gt;              Technology, Saint Petersburg, Russia, March 24-26, 2009,&lt;br /&gt;              Proceedings},&lt;br /&gt; booktitle = {EDBT},&lt;br /&gt; publisher = {ACM},&lt;br /&gt; series    = {ACM International Conference Proceeding Series},&lt;br /&gt; volume    = {360},&lt;br /&gt; year      = {2009},&lt;br /&gt; isbn      = {978-1-60558-422-5},&lt;br /&gt; bibsource = {DBLP, http://dblp.uni-trier.de}&lt;br /&gt;}&lt;br /&gt;&lt;/blockquote&gt;So as you can see, we have a problem. The formats are not consistent, which means that if I need to get some references from DBLP, and others from the ACM, my references file is going to look very irregular.&lt;br /&gt;&lt;br /&gt;Other critiques:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;I have never understood why DBLP splits up the conference and the paper: with BibTeX, if you cite three or more papers that use the same crossref, the crossref is included itself as a reference, which is just strange.&lt;/li&gt;&lt;li&gt;Unless you use double curly braces, capitalizations inside a string get removed, which is mucho annoying: It's "Riemannian", not "riemannian".&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The DBLP name for the conference is too cryptic: who'd even know what EDBT is outside the database community. On the other hand, the ACM citation is clunky, and is a page-length disaster waiting to happen.&lt;/li&gt;&lt;/ul&gt;Thoughts ?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-7742846132823847860?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/7742846132823847860/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=7742846132823847860' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/7742846132823847860'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/7742846132823847860'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/07/consistent-bibtex-formatting.html' title='Consistent BibTeX formatting'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-7797791710702404906</id><published>2009-07-09T17:24:00.004-06:00</published><updated>2009-07-09T17:29:47.185-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='miscellaneous'/><category scheme='http://www.blogger.com/atom/ns#' term='awards'/><category scheme='http://www.blogger.com/atom/ns#' term='blogosphere'/><title type='text'>Quick note</title><content type='html'>&lt;span style="font-style:italic;"&gt;(am posting this from 37000 ft. isn't technology cool ?)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The clustering series will be on hiatus for a few days while I wrap up some more pressing deadlines. It will restart two weeks from now. &lt;br /&gt;&lt;br /&gt;Congratulations to Adam Smith and Sean Hallgren on getting a &lt;a href="http://bit.ly/13mMOI"&gt;PECASE award&lt;/a&gt;. See what &lt;a href="http://adamdsmith.wordpress.com/"&gt;creating a new blog&lt;/a&gt; can do !&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-7797791710702404906?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/7797791710702404906/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=7797791710702404906' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/7797791710702404906'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/7797791710702404906'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/07/quick-note.html' title='Quick note'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-1187765454327736034</id><published>2009-07-09T11:58:00.003-06:00</published><updated>2009-07-09T12:10:30.990-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='nsf'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><title type='text'>NSF EDA Workshop: Wrap up</title><content type='html'>Today morning was wrap up day. The theory group was "tasked" with coming up with a few slides suggested avenues for further cooperation. Dick talked about creating (or re-creating) an environment in academia where researchers are building chips and talking to other experts down the corridor (like theoreticians) rather than the 'planted' model where a theoretician is planted in a corporate design environment for 6 months or so. &lt;br /&gt;&lt;br /&gt;I talked briefly about what I called "new" theory: the twin ideas of randomization and approximation that have been so powerful in algorithm design (even for intractable prolems), and how these play into the problems of dealing with massive high dimensional data. &lt;br /&gt;&lt;br /&gt;Vijaya gave a synopsis of cache-obliviousness, multicore research, and her recent work in this area. There's a lot of interest in EDA and multicore work, and she made the argument that theoreticians are learning how to design efficient multicore algorithms and can be of great help in adapting EDA methods.&lt;br /&gt;&lt;br /&gt;There were also four sub-panels that addressed various aspects of EDA. What to me was interesting that many of the panels brought up "EDA needs in theory" that matched some of the things we talked about. For example, there's a pressing need for methods that run linear (and even sublinear) time, tools that are incremental, in that they can progressively generate better and better solutions given more time, and tools that can parallelize well.&lt;br /&gt;&lt;br /&gt;They also talked about providing benchmarks: for example, a 10,000 variable SAT instance and code that solves it. I thought (and said) that this would be an excellent way to get people dabbling in this area. &lt;br /&gt;&lt;br /&gt;Randomization was a trickier matter. Although the EDA folks recognize the power of randomization, they are concerned about reproducibility. The DA pipelne is long and involved, and once you've fixed the output of a piece of the pipeline, you'd like to keep it in place and not change from iteration to iteration.&lt;br /&gt;&lt;br /&gt;Of course this is something that can be fixed with seed control. For more portability, it's conceivable that you can cache the random coin tosses once you've settled on a reasonable solution to any piece in the pipeline. &lt;br /&gt;&lt;br /&gt;Overall, it was quite interesting, although exhausting. The general idea with such workshops is that the findings make their way into the next call for proposals (sometime in October/November), so if you have EDA people you'd like to collaborate it, this might be a good opportunity. &lt;br /&gt;&lt;br /&gt;It's time to go home.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-1187765454327736034?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/1187765454327736034/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=1187765454327736034' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/1187765454327736034'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/1187765454327736034'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/07/nsf-eda-workshop-wrap-up.html' title='NSF EDA Workshop: Wrap up'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-9101619299838130794</id><published>2009-07-09T07:55:00.002-06:00</published><updated>2009-07-09T09:26:17.800-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='nsf'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='eda'/><title type='text'>NSF EDA Workshop: A comment</title><content type='html'>Paul Beame had a brilliant comment on my post about the EDA Workshop, and I liked it so much I'm reproducing it in full here:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;One fault is that we don't even teach the proper tools in our algorithms courses! There are relatively few algorithms texts that even support the teaching of backtracking as an algorithmic paradigm. Moreover, the sort of very nice work with provable small exponential upper bounds that some theory researchers (e.g., David Eppstein, Richard Beigel, Martin Furer) have done in backtracking algorithms for combinatorial problems is only a very small part of the landscape for algorithmic solutions to hard problems.&lt;br /&gt;&lt;br /&gt;Too often, we in theory leave the important hard problems to those in AI and formal methods. One of the standard paradigms in practical verification is to attack problems that in their full generality are PSPACE-hard (or worse), find a useful NP subclass, apply a reduction from these problems to SAT, and use SAT solvers that involve worst-case exponential backtracking (or to a much lesser extent local search) algorithms that are fast in many cases. The key techniques used in these solvers are powerful heuristics that are usually studied in AI but rarely in algorithms research.&lt;br /&gt;&lt;br /&gt;The potential for practical impact of improved versions of these algorithms (even with only polynomial speed-up) could be huge. Traditional algorithmic research has occasionally had something useful to say about these topics (such as Moser's recent beautiful analysis of variants of the random walk algorithm on the special classes of CNF formulas satisfying the Lovasz Local Lemma conditions) but it is somehow not viewed as part of the mainstream.&lt;br /&gt;&lt;br /&gt;Applications in EDA are only part of the utility of some of the exponential algorithms used in formal methods. There has been a large and surprisingly successful body of work on software model checking that uses quite clever theoretical ideas. One of my favorite examples is the following: Though techniques for checking properties of communicating finite state machines (one of those nasty PSPACE-complete problems) had been in use for EDA since the early 1990's, model checking software presented an additional problem because of the need to handle the call stack. Suppose that one needs to check a safety property (i.e., an invariant). The key observation (which goes back to the early 1960's) is that the language consisting of the possible stack contents of a PDA is always a regular language for which an NFA can be easily constructed from the PDA! One can then apply a small extension the previous algorithm to check safety properties (which must hold for all possible stack contents).&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-9101619299838130794?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/9101619299838130794/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=9101619299838130794' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/9101619299838130794'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/9101619299838130794'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/07/nsf-eda-workshop-comment.html' title='NSF EDA Workshop: A comment'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-332867922452184238</id><published>2009-07-08T20:20:00.005-06:00</published><updated>2009-07-09T11:58:04.081-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='nsf'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><title type='text'>NSF Workshop: Electronic Design Automation</title><content type='html'>I'm currently at an NSF Worshop on Electronic Design Automation (the thing that used to be called VLSI design). I'm here as part of the 'theory audience' along with Vijaya Ramachandran and Dick Lipton (blogger power!).&lt;br /&gt;&lt;br /&gt;Thankfully, Dick has posted an &lt;a href="http://rjlipton.wordpress.com/2009/07/08/nsf-workshop-on-design-automation-and-theory/"&gt;extensive summary&lt;/a&gt; of the day of talks, so I don't have to. Our mandate here is to listen in on the discussions and come up with a response that suggests avenues where theory folk might have something useful to contribute. &lt;br /&gt;&lt;br /&gt;From a purely biased algorithms perspective, one thing that strikes me about the EDA community (at least in the formal methods/verification realm) is their unwillingness to give up in the face of beyond-NP-completeness. What I mean is this: most of my training in algorithms (and this is likely true for you as well) is in the polynomial-time regime: all the algorithic paradigms we learn are effective at reducing the complexity of an algorithm from one polynomial to another. &lt;br /&gt;&lt;br /&gt;When we engage with NP-hardness, we switch modes to approximations, and focus on the issue of quality: even the approximation algorithms themselves run in poly time. There are very few people (David Eppstein comes to mind) who work on algorithms in the exponential/subexponential realm and will worry about (say) reducing the base of the exponent for SAT or graph coloring or other hard problems. &lt;br /&gt;&lt;br /&gt;The verification folks don't necessarily solve their (very hard) problems exactly, but they do design all kinds of tricks (and heuristics) to deal with these problems, because they actually need to solve them ! In my view, it wouldn't be a bad idea for students learning algorithms to learn at least a few tricks for designing algorithms that might run in exponential time, but are efficient. Remember that &lt;a href="http://rjlipton.wordpress.com/2009/07/03/is-pnp-an-ill-posed-problem/"&gt;exponential might be better than n^100 for many values of n&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;One thing that came to mind as I listened to talks. With the exception of a talk by Rupak Mazumdar on faulty computations, and a talk by Ed Clarke (yes, &lt;a href="http://geomblog.blogspot.com/2008/02/2007-acm-turing-award-and-perspective.html"&gt;that Ed Clarke&lt;/a&gt;) on statistical model checking (based on the Neyman-Pearson hypothesis testing framework), there was little talk of the role that randomization might have to play in the problems of EDA. &lt;br /&gt;&lt;br /&gt;A second thought was how the lessons of massive data analysis might be useful in the realm of DA. One speakr described one critical problem as being the degree of complexity associated with current DA tools: there are over 4000 "knobs" to turn in one such tool ! It's believed that these knobs are not independent, and might even be contradictory. If we think of each "run" of the DA tool, outputing some kind of chip layout, as a point in this 4000+ dimensional space, I wonder whether techniques for dimensionality reduction and manifold analysis might be useful to find a set of "core knobs" that control the process. &lt;br /&gt;&lt;br /&gt;I have to say that it's nice to attend a workshop with a community that throws out terms like NP-Complete, \Sigma_2, and PSPACE so freely :).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-332867922452184238?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/332867922452184238/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=332867922452184238' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/332867922452184238'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/332867922452184238'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/07/nsf-workshop-electonic-design.html' title='NSF Workshop: Electronic Design Automation'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6555947.post-4253584603827229177</id><published>2009-07-03T10:30:00.000-06:00</published><updated>2009-07-03T10:57:50.574-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='focs'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><title type='text'>FOCS 2009 results.</title><content type='html'>The FOCS list is out, (with abstracts). 73 papers accepted. Some papers that popped up on my radar screen (post more in the comments since I'll probably miss many good ones):&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a target="_top" href="http://cs.uiowa.edu/%7Emrgibson"&gt;Matt Gibson&lt;/a&gt; and &lt;a target="_top" href="http://www.cs.uiowa.edu/%7Ekvaradar/"&gt;Kasturi Varadarajan&lt;/a&gt;. Decomposing Coverings and the Planar Sensor Cover Problem.&lt;br /&gt;&lt;br /&gt;This paper continues a line of work that I like to think we started in a SODA paper a few years ago. The problem is this: you're given a collection of sensors that monitor a region. But they have limited battery life, and they're also redundant (many different subsets of sensors have the range to cover the region). Can you schedule the sensors to go on and off so that at all times,  the entire region is covered, and the region is covered for as long as possible ?&lt;br /&gt;&lt;br /&gt;Early work on this problem formulated it in a combinatorial setting, which immediately led to things like log n-approximability lower bounds via set cover variants. Our belief was that the geometry of the regions would make things a bit easier. We made some small progress, getting a sublogarithmic approximation ratio for intervals on the line, and a constant factor for ranges with special structure. Subsequent work made steady improvements, and this work has brought things down to a constant  for general classes of regions in the plane&lt;/li&gt;&lt;li&gt;&lt;a target="_top" href="http://www.math.purdue.edu/%7Esbasu"&gt;Saugata Basu&lt;/a&gt; and &lt;a target="_top" href="http://ww2.lr.edu/mat/zell/zell.htm"&gt;Thierry Zell&lt;/a&gt;. Polynomial hierarchy, Betti numbers and a real analogue of Toda's Theorem&lt;br /&gt;&lt;br /&gt;Betti number computation has been a hot topic in computational geometry of late, but the idea of using Betti numbers to bound the (algebraic complexity) of basic geometric operations dates back to work by Dobkin and Lipton that relates the complexity of certain computations to the number of connected components of "invariant paths" in a computation. Ben-Or generalized these results to more complicated algebraic computation trees, and noting that "number of connected components" is the zero-th Betti number of the set of computations, asked whether higher-order Betti numbers might be useful for proving lower bounds. A series of results by Bjorner, Lovasz and Yao, Bjorner and Lovasz, and finally Yao showed that indeed the higher-order Betti numbers can be used to lower bound the complexity of algebraic computations. &lt;br /&gt;&lt;br /&gt;So Betti numbers are important as a measure of "counting" in algebraic complexity. This paper reinforces this intuition, by relating #P over the reals to the complexity of computing Betti numbers over semi-algebraic sets. Interestingly, this paper mentions the difficult nature of Toda's original proof, and Lance just recently tweeted a new very simple proof of Toda's theorem that he just published in ToC.  &lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;David Arthur, &lt;a target="_top" href="http://www-cc.cs.uni-saarland.de/manthey/"&gt;Bodo Manthey&lt;/a&gt; and &lt;a target="_top" href="http://www.roeglin.org/"&gt;Heiko Roeglin&lt;/a&gt;. k-Means has Polynomial Smoothed Complexity&lt;br /&gt;&lt;br /&gt;Read my post on k-means for more on the significance of such results. This paper finally resolves the smoothed complexity issue, giving a polynomial bound (expected).&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a target="_top" href="http://www.daimi.au.dk/%7Epeyman/"&gt;Peyman Afshani&lt;/a&gt;, Jeremy Barbay and &lt;a target="_top" href="http://www.cs.uwaterloo.ca/%7Etmchan"&gt;Timothy M. Chan&lt;/a&gt;. Instance-Optimal Geometric Algorithms&lt;br /&gt;&lt;br /&gt;An interesting result that seems to be able to take away order-dependence from some geometric problems.&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a target="_top" href="http://www.almaden.ibm.com/cs/people/jayram"&gt;T.S. Jayram&lt;/a&gt; and             &lt;a target="_top" href="http://www.almaden.ibm.com/cs/people/dpwoodru/"&gt;David Woodruff&lt;/a&gt;. The Data Stream Space Complexity of Cascaded Norms&lt;br /&gt;&lt;br /&gt;Cascaded norms (where you compute one aggregate on each of a collection of streams, and compute another aggregate on these aggregates) are tricky for streaming algorithms, and this paper provides some interesting results. Again, I'm waiting for the document to show up, and I'd be interesting in seeing the kinds of techniques they use.&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;Papers I'd like to learn more about:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a target="_top" href="http://cs.yale.edu/people/kannan.html"&gt;Ravindran Kannan&lt;/a&gt;. A new probability using typical moments and concentration results&lt;/li&gt;&lt;li&gt;&lt;a target="_top" href="http://www.cs.uu.nl/%7Ehansb"&gt;Hans Bodlaender&lt;/a&gt;, &lt;a target="_top" href="http://www.ii.uib.no/%7Efomin/"&gt;Fedor Fomin&lt;/a&gt;, &lt;a target="_top" href="http://www.ii.uib.no/%7Edaniello"&gt;Daniel Lokshtanov&lt;/a&gt;, &lt;a target="_top" href="http://www.cs.uu.nl/staff/penninkx.html"&gt;Eelko Penninkx&lt;/a&gt;, &lt;a target="_top" href="http://www.imsc.res.in/%7Esaket/"&gt;Saket Saurabh&lt;/a&gt; and &lt;a target="_top" href="http://www.thilikos.info/"&gt;Dimitrios Thilikos&lt;/a&gt;. (Meta) Kernelization&lt;/li&gt;&lt;li&gt;&lt;a target="_top" href="http://lightless.org/"&gt;Flavio Chierichetti&lt;/a&gt;, &lt;a target="_top" href="http://research.yahoo.com/%7Eravi_k53"&gt;Ravi Kumar&lt;/a&gt;, Silvio Lattanzi, Alessandro Panconesi and Prabhakar Raghavan. Models for the compressible Web&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;Comments ?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6555947-4253584603827229177?l=geomblog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://geomblog.blogspot.com/feeds/4253584603827229177/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6555947&amp;postID=4253584603827229177' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/4253584603827229177'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6555947/posts/default/4253584603827229177'/><link rel='alternate' type='text/html' href='http://geomblog.blogspot.com/2009/07/focs-2009-results.html' title='FOCS 2009 results.'/><author><name>Suresh</name><uri>http://www.blogger.com/profile/15898357513326041822</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05767624388557425525'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry></feed>