tag:blogger.com,1999:blog-70714965062983723272008-07-18T01:03:36.691-07:00The Making of MarkMailJason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comBlogger37125tag:blogger.com,1999:blog-7071496506298372327.post-81641036570273999112008-07-18T00:44:00.000-07:002008-07-18T01:03:36.703-07:00A Tale of Two Search EnginesIf you're local to the Bay Area, you may be interested in attending an upcoming talk from the SDForum <span style="font-style: italic;">Software Architecture &amp; Modeling</span> SIG on August 27th. It's titled <a href="http://sdforum.com/index.cfm?fuseaction=Calendar.eventDetail&amp;eventID=13137"><span style="font-style: italic;">A Tale of Two Search Engines</span></a> and will be given by our own John Mitchell, one of the developers on <a href="http://markmail.org/">MarkMail</a>. Here's his abstract:<br /><blockquote>Betwixt the rigid structure of relational databases and the unbridled chaos of random content lies the world of search engines. Search engines shine in the middle ground where the messy complexity of reality makes everything harder than we imagine.<br /><br />While the soap operas of general-purpose search engines dominate the news, specialized search engines are coming to dominate their vertical niches. Special-purpose search engines can aggressively leverage domain-specific intelligence to return highly relevant results.<br /><br />This talk will present the architecture, implementation, and stories behind the creation of two specialized search engines for code and email: Krugle and MarkMail.<br /></blockquote>If you're interested in MarkMail I think you'll enjoy it. 
(And no, John doesn't really use words like "betwixt" in daily conversations.)<br /><br />Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-84086360301874581222008-07-02T18:15:00.000-07:002008-07-02T18:24:11.909-07:00The Perl Review: Now with VideoThe folks at <a href="http://theperlreview.com/">The Perl Review</a> recently enhanced the <a href="http://www.theperlreview.com/Interviews/jason-hunter-markmail-200805.html">interview</a> I mentioned here Monday with a <a href="http://vimeo.com/1226043">new screencast video showing MarkMail in action</a>. The intro is terrific. There's a guy hitting his Mac with a hammer!<br /><br />It's a strange (happy) feeling to have others produce advertising videos for you. Thanks, <a href="http://use.perl.org/articles/08/07/02/193221.shtml">brian d foy</a>!Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-51441997994410064072008-06-30T12:20:00.000-07:002008-06-30T14:27:22.686-07:00Interview with The Perl Review<a href="http://theperlreview.com/">The Perl Review</a>, a quarterly newsletter about all things Perl, recently published an <a href="http://www.theperlreview.com/Interviews/jason-hunter-markmail-200805.html">interview with us</a> where we discuss several topics relating to <a href="http://markmail.org/">MarkMail</a>:<br /><ul><li>How we load mail</li><li>Our choice between Java and Perl<br /></li><li>Our model of permalinking</li><li>Comparative community sizes<br /></li><li>What's in store for the future</li><li>How this is different than Google</li></ul>It's a more technical interview than some of the ones we've done previously with <a href="http://feathercast.org/?p=60">Apache</a>, <a
href="http://www.thecontentwrangler.com/people/forget_listserv_digests_youve_got_markmail_intervew_with_jason_hunter_mark/">The Content Wrangler</a>, and <a href="http://www.infoq.com/news/2008/01/markmail">InfoQ</a>.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-89667059908460811422008-06-10T13:59:00.000-07:002008-06-10T14:59:52.058-07:00Diacritics, or should I say dịẫçritícsWe changed our indexing this week regarding how we handle diacritics -- those accent marks you see on vowels and some consonants in many languages.<br /><br />Previously we resolved all queries in a <span style="font-style: italic;">diacritic insensitive</span> manner. That meant that a search for "francois" would match both "francois" and "<strong style="font-weight: normal;">françois</strong>", and a search for "<strong style="font-weight: normal;">françois</strong>" would do the same. Basically we specified in our MarkLogic Server configuration that the c versus <strong style="font-weight: normal;">ç difference be ignored.<br /></strong><br /><strong style="font-weight: normal;">Now we've changed the configuration so the diacritic sensitivity choice depends on the search term. A term containing diacritics will trigger a diacritic <span style="font-style: italic;">sensitive</span> match, while a term without diacritics will remain diacritic <span style="font-style: italic;">insensitive</span>. 
That means a search for "francois" will match with and without diacritics (the same as before), but a search for </strong>"<strong style="font-weight: normal;">françois</strong>" will respect the <strong style="font-weight: normal;">ç character constraint and won't match "francois" anymore.<br /><blockquote>To summarize: If you care enough to type a diacritic, we'll care enough to match it for you!</blockquote></strong>This is a particularly helpful change as we've expanded from English-only content into lists written in <a href="http://markmail.org/search/?q=list%3Aja">Japanese</a>, <a href="http://markmail.org/search/?q=list%3Avi">Vietnamese</a>, <a href="http://markmail.org/search/?q=list%3Aes">Spanish</a>, <a href="http://markmail.org/search/?q=list%3Ade">German</a>, <a href="http://markmail.org/search/?q=list%3Ait">Italian</a>, <a href="http://markmail.org/search/?q=list%3Anl">Dutch</a>, <a href="http://markmail.org/search/?q=list%3Apt">Portuguese</a>, <a href="http://markmail.org/search/?q=list%3Ask">Slovak</a>, <a href="http://markmail.org/search/?q=list%3Apl">Polish</a>, and <a href="http://markmail.org/search/?q=list%3Afa">Farsi</a>. We even have one mail in <a href="http://markmail.org/search/?q=list%3Aorg.kde.kde-i18n-fry">Frisian</a>. Who knew!<br /><br /><strong><strong style="font-weight: normal;"><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_iZU42gX80ZU/SE72qewL_ZI/AAAAAAAAAGI/8o9yqTx21Hw/s1600-h/Picture+1.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_iZU42gX80ZU/SE72qewL_ZI/AAAAAAAAAGI/8o9yqTx21Hw/s400/Picture+1.png" alt="" id="BLOGGER_PHOTO_ID_5210373028584357266" border="0" /></a></strong></strong><br />On a per-message basis, we get more traffic from these lists than our English lists. Perhaps they're underserved by other email archive systems? 
Maybe the other systems have issues hosting messages with the non-ASCII characters. We've definitely had trouble finding "clean" historical archive records for non-English lists, ones where the diacritics were reliably preserved. Luckily for us, being built on XML, we have native support for all Unicode characters.<br /><br />We hope you find the new indexing logic helpful.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-6430425035694409472008-05-21T17:17:00.000-07:002008-05-21T17:31:20.271-07:00Loaded TAXACOM: A List about BiodiversityA couple days ago we received a request from <a href="http://iphylo.blogspot.com/">Roderic Page</a> to load <a href="http://taxacom.markmail.org/">TAXACOM</a>. He describes it as:<br /><blockquote>...a mailing list that dates back to the early '90's, and is a forum for taxonomists and other researchers interested in biodiversity. It is lively, with some long conversations. It's also featured as the source for sociological research, such as <a href="http://www.amazon.com/dp/026208371X">"Systematics as Cyberscience: Computers, Change, and Continuity in Science"</a>. Given interest in the <a href="http://www.eol.org/">Encyclopedia of Life</a> (see also <a href="http://www.ted.com/index.php/talks/view/id/83">Ed Wilson's TED talk</a><a class="moz-txt-link-freetext" href="http://www.ted.com/index.php/talks/view/id/83"></a>), which could be viewed as one response to the issues raised on TAXACOM posts over the years, I think it would be a very timely addition to MarkMail.<br /></blockquote>With a description like that, how could we resist! 
So I'm happy to say we've <a href="http://taxacom.markmail.org/">loaded the list</a>, and (for trivia buffs) it even sets a new earliest list record in MarkMail, with archives starting back in 1992.<br /><br />For more, see Rod's <a href="http://iphylo.blogspot.com/2008/05/taxacom-indexed-by-markmail.html">blog</a> and <a href="http://markmail.org/message/fxl33z4s75d6xjqg">email</a> announcements.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-54339115202830037882008-05-19T20:41:00.000-07:002008-05-19T22:21:49.818-07:00Loaded OpenMoko: An Open Source Smartphone PlatformLast week we received on our <a href="http://markmail.org/docs/feedback.xqy">feedback form</a> a request to load the <a href="http://openmoko.markmail.org/"><span class="blsp-spelling-error" id="SPELLING_ERROR_0">OpenMoko</span> mailing lists</a>. <a href="http://openmoko.org/">These folks</a> are creating an open source <span class="blsp-spelling-error" id="SPELLING_ERROR_1">smartphone</span> platform, very cool stuff. Along with the request, the requester explained the benefit he saw in having <span class="blsp-spelling-error" id="SPELLING_ERROR_2">MarkMail</span> archive the <span class="blsp-spelling-error" id="SPELLING_ERROR_3">OpenMoko</span> lists, from the perspective of a project participant. I've reprinted it here with permission:<br /><pre wrap="" style="font-family:georgia;"><blockquote>I would LOVE to see the <span class="blsp-spelling-error" id="SPELLING_ERROR_4">OpenMoko</span> lists get into <span class="blsp-spelling-error" id="SPELLING_ERROR_5">MarkMail</span>.. for 2 reasons..<br /><br />1. From a developer perspective, I'm new to the <span class="blsp-spelling-error" id="SPELLING_ERROR_6">OpenMoko</span> platform and still learning the build system, etc. and am eager to start writing my own applications. 
But there's only so much info on the wiki and like many young communities all the juicy info is buried in the Mailing lists. So I'd love to be able to search all the lists for things like installing the <span class="blsp-spelling-error" id="SPELLING_ERROR_7">sim</span> card, what new hardware bugs they've found on the <span class="blsp-spelling-error" id="SPELLING_ERROR_8">dev</span> list, how to modify the dialer application, etc. This is where <span class="blsp-spelling-error" id="SPELLING_ERROR_9">MarkMail</span> really shines and is the best platform out there for this type of information gathering from community lists. If these lists were in <span class="blsp-spelling-error" id="SPELLING_ERROR_10">MarkMail</span> it would be one of the ONLY places one could find some of this information because of the advanced search functions in <span class="blsp-spelling-error" id="SPELLING_ERROR_11">MarkMail</span>. I think this holds true for a lot of young open source communities and <span class="blsp-spelling-error" id="SPELLING_ERROR_12">MarkMail</span> can really help out.<br /><br />2. I think it would help the community in general by giving users an avenue to find the information they need to start better participating and contributing back. There would be less duplication of questions which distracts everyone on the list and hopefully more "I see how things are being done because I got all caught up by searching <span class="blsp-spelling-error" id="SPELLING_ERROR_13">MarkMail</span>, how about we do it this way.." etc.</blockquote>Of course we loaded the lists for him. 
Here's the activity chart:<pre style="font-family: georgia;" wrap=""><pre style="font-family: georgia;" wrap=""><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://openmoko.markmail.org/"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_iZU42gX80ZU/SDJLxq-lx7I/AAAAAAAAAGA/b3ev21KUtLs/s400/Picture+3.png" alt="" id="BLOGGER_PHOTO_ID_5202303836288829362" border="0" /></a></pre></pre><br /></pre>Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-83103187862683466602008-05-12T11:38:00.000-07:002008-05-12T12:12:10.469-07:00Loaded Perforce: High-End Revision ControlRecently we loaded <a href="http://perforce.markmail.org/">seven mailing lists dedicated to the Perforce SCM system</a>. If you haven't heard of Perforce, they're a <a href="http://perforce.com/perforce/products.html">high-end revision control system</a>, with a <a href="http://www.perforce.com/perforce/customers/byindustry.html">long list of corporate clients</a>. They're known for speed and features.<br /><br />I've been using a Perforce system to <a href="http://www.onlamp.com/pub/a/onlamp/2006/11/02/personal_document_management.html">manage my own files</a> for over a decade now, appreciating their free individual license.<br /><br />Their mailing lists have a lot of technical Q&amp;A discussion, so I hope having these lists more easily searched will help people find the answers they need. 
Here's the historic traffic pattern:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_iZU42gX80ZU/SCiVxa-lx6I/AAAAAAAAAF4/X7daEITLtfY/s1600-h/Picture+4.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_iZU42gX80ZU/SCiVxa-lx6I/AAAAAAAAAF4/X7daEITLtfY/s400/Picture+4.png" alt="" id="BLOGGER_PHOTO_ID_5199570446087210914" border="0" /></a><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_iZU42gX80ZU/SCiO8q-lx5I/AAAAAAAAAFw/_RlnD-4INe8/s1600-h/Picture+3.png"><br /></a>Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-20282947839353302002008-05-01T17:08:00.000-07:002008-05-01T18:11:29.784-07:00Loaded Eclipse and NetcoolUsersYesterday we loaded the email archive histories for two new communities: <a href="http://eclipse.markmail.org/">Eclipse</a> and <a href="http://netcoolusers.markmail.org/">NetcoolUsers</a>. Normally I wouldn't talk about these two communities in the same blog post, but after the load it occurred to me that both projects (coincidentally) relate to IBM. More about that at the end.<br /><br />Eclipse (<a href="http://eclipse.org/">eclipse.org</a>) is an extremely popular open source development tool project, initiated by IBM back in 2001. (The name was widely seen as an attack on Sun.) It took off and lots of Java developers use it as their IDE. 
They have a beautiful growth chart:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://eclipse.markmail.org/"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp1.blogger.com/_iZU42gX80ZU/SBpgf3tmGAI/AAAAAAAAAFg/zSjyJp7IprY/s400/Picture+4.png" alt="" id="BLOGGER_PHOTO_ID_5195571220772689922" border="0" /></a><br />NetcoolUsers (<a href="http://netcoolusers.org/">netcoolusers.org</a>) is a user community focused on IBM Tivoli Netcool. For a single list it's quite hopping (25 posts/day):<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_iZU42gX80ZU/SBpgtntmGBI/AAAAAAAAAFo/P_fAZh2Z7EQ/s1600-h/Picture+7.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_iZU42gX80ZU/SBpgtntmGBI/AAAAAAAAAFo/P_fAZh2Z7EQ/s400/Picture+7.png" alt="" id="BLOGGER_PHOTO_ID_5195571456995891218" border="0" /></a><br />The fact both these communities relate to IBM is purely coincidental, but it's also interesting because it reflects the direction of pull we're seeing from the MarkMail user base. It's an early sign of what you can expect in the future: more technical content beyond open source.<br /><br />Technical lists come in many flavors: pure open source (<a href="http://apache.markmail.org/">Apache</a>, <a href="http://jdom.markmail.org/">JDOM</a>), corporate-sponsored open source (<a href="http://eclipse.markmail.org/">Eclipse</a>, <a href="http://xensource.markmail.org/">Xen</a>), standards development (<a href="http://w3.markmail.org/">W3</a>), technical user groups (<a href="http://nanog.markmail.org/">NANOG</a>), and groups focused on proprietary technology (<a href="http://netcoolusers.markmail.org/">NetcoolUsers</a>). 
We plan to expand along each of these axes.<br /><br />If you have a list you'd like us to load, <a href="http://markmail.org/docs/feedback.xqy">let us hear about it</a>.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-46903534322717841162008-04-29T13:49:00.000-07:002008-04-29T15:18:35.350-07:00Loaded NANOG: North American Network Operators' GroupToday we loaded a new list, <a href="http://markmail.org/list/edu.merit.nanog">NANOG</a>, a discussion forum for the North American Network Operators' Group. In its 100,000 messages it holds some fascinating discussions about internet operations. The chatter around <a href="http://markmail.org/search/?q=list%3Ananog+date%3A20010911">9/11</a>, <a href="http://markmail.org/search/?q=list%3Ananog+katrina+order%3Adate-forward+date%3A20050828-20050921">Katrina</a>, and <a href="http://markmail.org/search/?q=list%3Ananog+y2k+date%3A19991231-20000101">Y2K</a> stands out especially.<br /><br />The list extends back to April 1994, two months earlier than any list we previously loaded. It's always fun to break little records like that. 
It could be a while before we break this one again.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_iZU42gX80ZU/SBeei3tmF_I/AAAAAAAAAFY/pXGriC9xP2s/s1600-h/Picture+3.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp2.blogger.com/_iZU42gX80ZU/SBeei3tmF_I/AAAAAAAAAFY/pXGriC9xP2s/s400/Picture+3.png" alt="" id="BLOGGER_PHOTO_ID_5194795017103087602" border="0" /></a><br />If you're intrigued by internet operations, the <a href="http://www.nanog.org/listfaq.html">NANOG FAQ</a> has lots of good factoids.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-23185929500420082242008-04-15T17:49:00.000-07:002008-04-15T19:01:27.411-07:00Loaded Python: A Cool Million MessagesHappy news: We've just finished loading the <a href="http://python.markmail.org/">Python Software Foundation mailing lists</a>. (Python is a popular programming language, overseen by the PSF.) With this load we're breaking a few records:<br /><ul><li>Weighing in at 1,022,479 total messages, Python is now the largest community ever loaded since our initial launch. (We went live back in November with roughly 4 million Apache messages.)</li><li>Half of those million mails are from a single list, <a href="http://markmail.org/list/org.python.python-list">python-list</a>. That means python-list holds the new record for Crazy Huge What The Heck Can They Talk About So Much list. (And, would you believe, there's even more python-list histories from 1992-1995 still to load.)<br /></li><li>This puts our total combined MarkMail message count above 10,000,000. There was much hooting and hollering (and page refreshing) around here as the numbers clicked over.</li><li>It's the biggest community ever loaded by our new hire Evan Paull. OK, it's the only community ever loaded by Evan. He started just a couple weeks ago. 
We figure after he's wrangled together a million-message history, everything else will look easy.<br /></li></ul>Among the million mails are the archives for the Mailman project, something I'm especially happy about because much of our work here involves interfacing with Mailman, and this should help us understand it better.<br /><br />As always, here's the traffic chart:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://python.markmail.org/"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_iZU42gX80ZU/SAVNjqmvejI/AAAAAAAAADw/39UqUzpyDcM/s400/Picture+17.png" alt="" id="BLOGGER_PHOTO_ID_5189639420741909042" border="0" /></a>Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-70502813995744347592008-04-07T15:22:00.000-07:002008-04-07T16:03:33.401-07:00Loaded GNOME: 750,000 emailsOver the weekend we loaded the <a href="http://gnome.markmail.org/">mailing list history for the GNOME project</a>. GNOME is an immensely popular GNU project, a free software desktop environment and development framework. Their message traffic shows it has a vibrant and active community. They <a href="http://bethesignal.org/blog/2008/04/08/gnome-in-markmail/">boast</a> a history of 750,000 emails across more than 200 lists:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://gnome.markmail.org/search/"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp2.blogger.com/_iZU42gX80ZU/R_qgBdWBpuI/AAAAAAAAADg/-9bCaVDbMEM/s400/Picture+12.png" alt="" id="BLOGGER_PHOTO_ID_5186633867787871970" border="0" /></a>The peak in 2007? 
That's because in 2007 they started a new <a href="http://markmail.org/list/org.gnome.svn-commits-list">svn-commits-list</a> (a list that captures emails about code check-ins) and it's been archived while the older cvs-commits-list wasn't. If we add -type:checkins to the query, we can graph the history without that list:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://gnome.markmail.org/search/-type:checkins"> <img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_iZU42gX80ZU/R_qgF9WBpvI/AAAAAAAAADo/SvQM2Te6fBw/s400/Picture+11.png" alt="" id="BLOGGER_PHOTO_ID_5186633945097283314" border="0" /></a>It took a fair amount of work to load the GNOME history because the archives had more spam and virus mails than could suitably be removed by hand. We had to use procmail and <a href="http://spamassassin.markmail.org/">SpamAssassin</a> to remove the junk.<br /><br />One neat factoid: It's easier to remove spam from mail sent in 2004 than mail sent today. Spam blocking has always been a competitive arms race, but in this case we're fighting yesterday's war with today's technology! Even running in offline mode, SpamAssassin did a darn fine job.<br /><br />I just wish it ran faster. If anyone out there is a SpamAssassin performance guru, please <a href="http://markmail.org/docs/feedback.xqy">let us know</a>.<br /><br />Our thanks to <a href="http://gnome.markmail.org/search/?q=from%3A%22jeff+waugh%22">Jeff Waugh</a> for helping us get the histories.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-76407098268847606122008-03-17T20:52:00.000-07:002008-03-17T21:33:26.050-07:00World Wide Web Consortium Lists: 400,000 emailsHTML 4.0, XML, PNG, CSS, DOM, and XQuery: These are but a few of the technologies to come out of the World Wide Web Consortium, commonly referred to as the W3C. 
We're proud to announce that MarkMail (which by the way uses all of those technologies!) has loaded the full <a href="http://w3.markmail.org/">W3C public mailing lists</a>. They start in 1994 and cover 400,000 emails across 200 mailing lists.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_iZU42gX80ZU/R99E5v8xG3I/AAAAAAAAADY/yTyKYkcC4zk/s1600-h/Picture+4.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_iZU42gX80ZU/R99E5v8xG3I/AAAAAAAAADY/yTyKYkcC4zk/s400/Picture+4.png" alt="" id="BLOGGER_PHOTO_ID_5178933855413934962" border="0" /></a><br />With such a long and deep history it's fun to do a little archaeology: You can find the <a href="http://markmail.org/message/uvhz4o2uxlpbq5kl">first mention of XML</a> back in 1996. I tried to find the formal "XML 1.0" announcement and saw there wasn't one, but on launch day (February 10, 1998) you can find people <a href="http://markmail.org/message/hzeiqtnernjbaclb">complaining about rendering issues with the spec</a>. Isn't that always the way with mailing lists? By the way, it's fun to use XML to search on the birth of XML.<br /><br /><a href="http://w3.markmail.org/search/?q=google+order%3Adf">Google first came up as a topic</a> in August 1998, back when its domain ended "stanford.edu". That beats any other list by 5 months. The <a href="http://w3.markmail.org/search/?q=xquery+order%3Adf">first mention of XQuery</a> didn't come until January 2001, well after xml-dev and other lists were talking about it. I expect there's more chatter in the private W3C archives.<br /><br />Finally, the <a href="http://markmail.org/message/tn537hr4u5nm2ayz">first mention of MarkMail</a> came in December 2007. And what a great post it was! 
:)Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-64932008600449626472008-03-13T01:04:00.000-07:002008-03-13T01:05:30.999-07:00Loaded Perl: 530,000 emailsPerl is the duct tape of the internet. Created by <a href="http://history.perl.org/PerlTimeline.html">Larry Wall in 1987</a> and made famous with his <a href="http://www.oreilly.com/catalog/pperl3/">Programming Perl</a> "camel book" published by O'Reilly, it's the tool sysadmins use to keep things running.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_iZU42gX80ZU/R9djE_8xG1I/AAAAAAAAADI/btLbB4hU5uY/s1600-h/0596000278_cat.gif"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 120px; height: 157px;" src="http://bp1.blogger.com/_iZU42gX80ZU/R9djE_8xG1I/AAAAAAAAADI/btLbB4hU5uY/s400/0596000278_cat.gif" alt="" id="BLOGGER_PHOTO_ID_5176715234222611282" border="0" /></a><br />We're proud to announce we've finished loading the Perl.org mailing list history into MarkMail. A total of <a href="http://perl.markmail.org/search/list:org.perl">530,000 emails</a> across 75 lists. The lists don't go back to 1987 (boy that'd be cool if they did). But that's all right; who really needs tech support against Perl 1.000?<br /><br />What we have here is traffic starting with the migration to the Perl.org setup in 1999:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_iZU42gX80ZU/R9djff8xG2I/AAAAAAAAADQ/HNJQ6xICMFE/s1600-h/Picture+4.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_iZU42gX80ZU/R9djff8xG2I/AAAAAAAAADQ/HNJQ6xICMFE/s400/Picture+4.png" alt="" id="BLOGGER_PHOTO_ID_5176715689489144674" border="0" /></a><br />Enjoy! 
And if <a href="http://perl.markmail.org/search/?q=from%3A%22larry%20wall%22">anyone</a> has earlier archives, <a href="http://markmail.org/docs/feedback.xqy">let us know</a>.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-330308399433032232008-03-11T22:13:00.000-07:002008-03-11T22:48:25.863-07:00New Search Feature: "opt:nostem"In the science of Information Retrieval there's a constant tug of war between <a href="http://en.wikipedia.org/wiki/Information_retrieval#Precision">precision</a> and <a href="http://en.wikipedia.org/wiki/Information_retrieval#Recall">recall</a>. As Wikipedia defines the terms, <span style="font-style: italic;">precision</span> is the fraction of the documents retrieved that are relevant to the user's information need, and <span style="font-style: italic;">recall</span> is the fraction of the documents that are relevant to the query that are successfully retrieved. Or as I define the terms, <span style="font-style: italic;">precision</span> is how much of what you got is what you wanted, and <span style="font-style: italic;">recall</span> is how much of what you wanted you actually got.<br /><br />MarkMail increases recall by running stemmed searches. This loosens the query constraint so that searching for <span style="font-style: italic;">proxies</span> will match <span style="font-style: italic;">proxy</span> as well. Sometimes this is good, and sometimes we hear from users who don't like the behavior all that much! They want more precision.<br /><br />So we're happy to announce a new feature, <span style="font-family:courier new;">opt:nostem</span>, that when added to the search string turns off stemming for that query. 
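<br /><br />To make the trade-off concrete, here is a minimal Python sketch of how stemming moves precision and recall in opposite directions. It is not how MarkMail or MarkLogic Server implement stemming: the <span style="font-family:courier new;">toy_stem</span> function, the four sample documents, and the relevance judgments are all made up for illustration.

```python
def toy_stem(word):
    """Crude suffix stripper; a stand-in for a real stemmer (assumption)."""
    word = word.lower()
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def search(term, docs, stem=True):
    """Return the set of document indices whose words match the term."""
    normalize = toy_stem if stem else str.lower
    target = normalize(term)
    return {i for i, doc in enumerate(docs)
            if target in {normalize(w) for w in doc.split()}}


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved docs that are relevant.
    Recall: share of relevant docs that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical corpus; the user cares about network proxies (docs 0, 1, 2).
docs = [
    "configure the proxy server",
    "list of open proxies",
    "proxies and firewalls",
    "vote by proxy at the meeting",
]
relevant = {0, 1, 2}

stemmed = search("proxies", docs, stem=True)   # like a default search
exact = search("proxies", docs, stem=False)    # like adding opt:nostem

print("stemmed:", sorted(stemmed), precision_recall(stemmed, relevant))
print("nostem: ", sorted(exact), precision_recall(exact, relevant))
```

With these made-up judgments, the stemmed search returns all four documents (precision 0.75, recall 1.0), while the unstemmed search returns only the two that contain the literal word (precision 1.0, recall about 0.67).<br /><br />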
You can try it for yourself:<br /><br /><a href="http://markmail.org/search/?q=proxies">http://markmail.org/search/?q=proxies</a><br /><a href="http://markmail.org/search/?q=proxies+opt%3Anostem">http://markmail.org/search/?q=proxies+opt%3Anostem</a>Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-6903251084420647772008-03-07T11:41:00.000-08:002008-03-07T12:27:04.412-08:00Average Load Time: 0.1 SecondsThere are many challenges in running a high-traffic web site. Performance is a challenge we particularly focus on at MarkMail because users get frustrated if they have to wait more than a second for a reply.<br /><br />The challenge in maintaining performance increases as more of a site's content gets built dynamically -- meaning on the fly in response to user requests rather than ahead of time where it can be directly served (like a McDonald's hamburger).<br /><br />With MarkMail we build every page dynamically using XQuery. 
Even a page that at first blush seems as if it could be pre-built, like an individual email message, we actually build dynamically because we want to highlight the search terms from your query.<br /><br />All this is why I was so happy to notice that Alexa.com calls us a "Very Fast" site...<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.alexa.com/data/details/main/markmail.org"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp2.blogger.com/_iZU42gX80ZU/R9Gduv8xG0I/AAAAAAAAADA/M1g2j8-iAi8/s400/Picture+2.png" alt="" id="BLOGGER_PHOTO_ID_5175090873296296770" border="0" /></a><ul><li>Markmail.org has a traffic rank of: 128,666 (UP 745,248)</li><li>Speed: Very Fast (99% of sites are slower), Avg Load Time: 0.1 Secs</li></ul>Here's some <a href="http://www.alexa.com/site/help/?index=109">background on how Alexa tracks performance</a>.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-41797020437179994962008-02-27T23:57:00.000-08:002008-02-28T00:23:17.713-08:00New Feature: Top 10 expands to Top 100Every time you do a search on MarkMail the leftmost pane shows you the top 10 lists, senders, attachments, and message types for all emails matching your query. OK, it's not always 10 that you see. Sometimes it's more, sometimes less. Exactly how many you see depends on your browser size. But even if you're the proud owner of one of those new 17" MacBook Pro laptops with the 1920x1200 screen, the view maxes out around 25.<br /><br />We've added a new feature to help improve this. 
When there are more values than will fit in the selection box, you'll see a "View more" link in the top right corner.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_iZU42gX80ZU/R8ZtJGGVvcI/AAAAAAAAACw/1gUkG1d8WS0/s1600-h/Picture+10.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp1.blogger.com/_iZU42gX80ZU/R8ZtJGGVvcI/AAAAAAAAACw/1gUkG1d8WS0/s400/Picture+10.png" alt="" id="BLOGGER_PHOTO_ID_5171941225105046978" border="0" /></a><br />Clicking on "View more" shows the top 100 in an overlay.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_iZU42gX80ZU/R8ZtqmGVvdI/AAAAAAAAAC4/tvRc78rsIns/s1600-h/Picture+9.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_iZU42gX80ZU/R8ZtqmGVvdI/AAAAAAAAAC4/tvRc78rsIns/s400/Picture+9.png" alt="" id="BLOGGER_PHOTO_ID_5171941800630664658" border="0" /></a><br />Clicking on any of the values in the overlay will limit your search, same as clicking on a value in the short list. Enjoy!Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-79011976321940108442008-02-25T21:06:00.000-08:002008-02-25T21:39:57.138-08:00A Place for XenAt MarkMail you can now find your Zen. Or, to be more accurate, you can find your <a href="http://xen.markmail.org/">Xen</a>.<br /><br /><a href="http://xen.org/">Xen</a> is an open source "hypervisor" (similar to VMWare) that enables operating system virtualization. It's supported by Citrix and used by Amazon EC2, among others.<br /><br />I can joke about finding Xen at MarkMail because we recently loaded a bit over <a href="http://xen.markmail.org/">100,000 messages from the Xen community</a>. 
If you're into virtualization, enjoy!<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_iZU42gX80ZU/R8Ok9HFDEQI/AAAAAAAAACo/tazuGmmYlws/s1600-h/Picture+1.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_iZU42gX80ZU/R8Ok9HFDEQI/AAAAAAAAACo/tazuGmmYlws/s400/Picture+1.png" alt="" id="BLOGGER_PHOTO_ID_5171158166930657538" border="0" /></a><br />If you want to compare VMWare with Xen, you'll find some <a href="http://markmail.org/message/3afdczqihyyhtf3l">good discussion</a> in the archive.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-21498541781083627532008-02-20T06:00:00.000-08:002008-02-19T19:47:07.465-08:00PostgreSQL: More Traffic than MySQL (and a first Google spotting)When we <a href="http://markmail.blogspot.com/2007/12/weve-loaded-mysql-lists-and-their.html">announced</a> back in December we'd loaded the <a href="http://mysql.markmail.org/">MySQL database mailing lists</a>, we heard from several people who asked us to load the PostgreSQL lists also. We said we'd be happy to, and MarkMail now has <a href="http://postgresql.markmail.org/">635,000 PostgreSQL emails</a> loaded and searchable.<br /><br />Comparing PostgreSQL and MySQL is kind of interesting. With all the talk about the LAMP (Linux/Apache/MySQL/PHP-Perl-Python) architecture you'd think MySQL had a lock on the open source database market, but based on simple message traffic analytics, PostgreSQL has a much higher level of community involvement. 
Looking at January 2000 onward, the MySQL lists have amassed 340,000 messages with about 3,000 new messages each month:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_iZU42gX80ZU/R7OnL3FDENI/AAAAAAAAACQ/NiohpBUEmaQ/s1600-h/Picture+1.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_iZU42gX80ZU/R7OnL3FDENI/AAAAAAAAACQ/NiohpBUEmaQ/s400/Picture+1.png" alt="" id="BLOGGER_PHOTO_ID_5166657019729613010" border="0" /></a><br />In the same time period, the PostgreSQL lists have hit 583,000 messages with 7,000 new each month:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_iZU42gX80ZU/R7OnbHFDEOI/AAAAAAAAACY/avhjH9SzzrM/s1600-h/Picture+2.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_iZU42gX80ZU/R7OnbHFDEOI/AAAAAAAAACY/avhjH9SzzrM/s400/Picture+2.png" alt="" id="BLOGGER_PHOTO_ID_5166657281722618082" border="0" /></a><br />I wouldn't have thought it, but there it is.<br /><br />Also in the PostgreSQL lists we find the very first mention of Google in all of the messages loaded so far! The first Google sighting was on the pgsql-interfaces list, January 28, 1999, <a href="http://markmail.org/message/yw5xxrpgvhrntlcx">in a post by James Thomson</a>:<blockquote>"I've been using the Oracle Pro*C precompiler manual. I don't have the URL here at work but I found an online copy using www.<strong>google</strong>.com"<br /></blockquote>The first mention in another community happened on the xml-dev list a couple months later, March 10, 1999, <a href="http://markmail.org/message/uid4rqzii36f6nms">in a post by Andrew McNaughton</a>:<blockquote>"You need a new search engine. 
I've recently been using www.<strong>google</strong>.com with results an order of magnitude better than what I got from altavista (though altavista still has it's place for more complex query definitions)."<br /></blockquote>Here's the query if you want to look for yourself:<br /><br /><a href="http://markmail.org/search/google+order:df">http://markmail.org/search/google+order:df</a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_iZU42gX80ZU/R7OrC3FDEPI/AAAAAAAAACg/WomCuKapLmw/s1600-h/Picture+3.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_iZU42gX80ZU/R7OrC3FDEPI/AAAAAAAAACg/WomCuKapLmw/s400/Picture+3.png" alt="" id="BLOGGER_PHOTO_ID_5166661263157301490" border="0" /></a><br />Maybe as we load more community archives we'll get even earlier sightings.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-84383242301299560242008-02-13T16:17:00.000-08:002008-02-13T15:59:46.221-08:00Announcing an Informal Partnership with CodehausWe're happy to announce we've developed an informal partnership with <a href="http://codehaus.org/">Codehaus</a> to load <a href="http://codehaus.markmail.org/">all their mail archives</a> and receive automatic notification of new Codehaus lists as they get created.<br /><br />The automatic update is particularly important because Codehaus is a fast-growing home to open source projects with new lists being created all the time. How fast is Codehaus growing? Looking at the traffic chart, it shows a beautiful upward trend line. 
For comparison, it has the same level of activity as Apache had in late 2000.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_iZU42gX80ZU/R7DFenFDEMI/AAAAAAAAACI/qlfd2ffbpIU/s1600-h/Picture+1.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_iZU42gX80ZU/R7DFenFDEMI/AAAAAAAAACI/qlfd2ffbpIU/s400/Picture+1.png" alt="" id="BLOGGER_PHOTO_ID_5165845902270861506" border="0" /></a>Previously we loaded the <a href="http://groovy.markmail.org/">Groovy</a>, <a href="http://mule.markmail.org/">Mule</a>, and <a href="http://xfire.markmail.org/">XFire</a> archives from Codehaus. We now have the archives from <a href="http://grails.markmail.org/">Grails</a>, <a href="http://castor.markmail.org/">Castor</a>, <a href="http://mojo.markmail.org/">Mojo</a>, <a href="http://jruby.markmail.org/">JRuby</a>, <a href="http://plexus.markmail.org/">Plexus</a>, <a href="http://picocontainer.markmail.org/">PicoContainer</a>, <a href="http://cargo.markmail.org/">Cargo</a>, <a href="http://drools.markmail.org/">Drools</a>, <a href="http://openejb.markmail.org/">OpenEJB</a>, and <a href="http://xstream.markmail.org/">XStream</a> as well as almost a hundred more. In total we're archiving 400,000 emails across the <a href="http://codehaus.markmail.org/">Codehaus lists</a>.<br /><br />P.S. Curious what happened in May 2006? They had some ISP troubles that month. And of course the last month is short since February has only just begun.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-43093175624010548532008-02-13T13:19:00.000-08:002008-02-13T15:58:08.833-08:00Squid Cache: Searching our own Dog FoodYesterday we loaded 115,000 messages from the <a href="http://squid.markmail.org/">Squid mailing lists</a>. 
We're particularly pleased about this because <a href="http://squid-cache.org/">Squid</a> plays a prominent role in the MarkMail site architecture and we plan to use these searchable archives to help with our own development.<br /><br />Squid is probably the most famous caching proxy out there. It's been around for years, is fire-tested, and has oodles of configuration options. At MarkMail we use Squid as our "reverse proxy cache". In case you're not familiar with that term, let me explain.<br /><br />On the web a "proxy" is a piece of software that sits between the user and the web server. When a user wants a web page, the user makes the request to the proxy and the proxy makes the request to the web server. Simple proxies provide a means to poke through firewalls, mask user identity, and things like that.<br /><br />A "caching proxy" is a proxy that remembers the traffic passing through it, so later requests for the same content can (subject to configurable rules) be delivered to the user without actually connecting to the destination server. Schools, companies, and even countries use caching proxies to reduce their bandwidth costs and speed their users' web browsing. For example, once any user has pulled a logo image from a web site, every other user at that organization can just pull the proxy's version. Caching proxies make web browsing better, faster, and cheaper.<br /><br />A "reverse proxy cache" is a caching proxy that runs on the server-side instead of the client-side. It gets first crack at each user request. In many cases, like when the requested page is already in its cache, a reverse proxy cache can handle the user request on its own and reduce the load on the actual web server.<br /><br />On MarkMail, Squid sits in front of our MarkLogic Server instance (our web server) and gets first crack at all user requests. It handles several tasks:<br /><ul><li><span style="font-weight: bold;">URL rewriting</span>. 
This lets us present friendly URLs like /message/xyzzy to our users, while we actually serve the content from a .xqy XQuery page in the MarkLogic Server back-end. We use a Squid plug-in called Squirm for this. It lets us map public URLs to private URLs.</li><li><span style="font-weight: bold;">Caching</span>. Almost every page in MarkMail is dynamic, even the home page with all those count statistics, but that doesn't mean we should regenerate the page on every request. We let Squid cache the results of each page for a few minutes. If you're looking at a page that anyone else saw recently, we're probably serving it to you from cache.</li><li><span style="font-weight: bold;">Connection pooling</span>. On the web there's a feature called Keep-Alive that lets users hold open connections to the web server in case they make later requests. A common Keep-Alive period is 30 seconds. Keep-Alive saves the cost of opening up new connections but holding all the open connections can be resource intensive for a web server. By using Squid, we let Squid hold all the Keep-Alive connections to end users (hundreds of connections) while MarkLogic Server only talks to Squid. This reduces the load on the actual web server, leaving it free to focus its energy on searching, rendering, counting, etc.<br /></li></ul>Of course there's more we'd like Squid to do for us. We'd like some help in blocking abusive users, automatically gzipping content, and things like that. We'll probably look for those features in a new load balancer. 
More on that later.<br /><br />In the meantime, we hope you enjoy the <a href="http://squid.markmail.org/">Squid archives</a>.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-37235265538428117732008-02-09T11:14:00.000-08:002008-02-11T13:41:19.703-08:00New Feature: Sweep the Chart to Select a Date RangePeople often write us saying they want to click or sweep on the chart to select a date range. We're happy to announce that's now possible.<br /><br />To demonstrate, if you search for <a href="http://markmail.org/search/javaone">JavaOne</a> you see a repeating yearly spike that correlates with the dates of the annual Java developer conference. Let's say you want to investigate what people said in just the last few years about the show. You can click and swipe your mouse over the time period of interest:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_RADq9isGzpw/R64N57krbqI/AAAAAAAAABQ/YepdzALBA4o/s1600-h/javaone.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp2.blogger.com/_RADq9isGzpw/R64N57krbqI/AAAAAAAAABQ/YepdzALBA4o/s400/javaone.png" alt="" id="BLOGGER_PHOTO_ID_5165081111535775394" border="0" /></a>This adds a date: constraint to the query and automatically updates the search results. 
You can remove the date constraint by clicking on the "Remove date refinements" link in the top right of the graph.<br /><br />You can sweep to select or click on individual months, and if you hold down control (command on a Mac) it toggles the selection, enabling you to create non-contiguous selections.Ryan Grimmhttp://www.blogger.com/profile/14329280406283683610noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-78839720896488655222008-02-04T17:11:00.000-08:002008-02-04T19:54:32.336-08:00Saxon: Loaded 10,000 emails about XSLT and XQueryWe've recently begun archiving the <a href="http://saxon.markmail.org/">saxon-help mailing list</a>, with its 10,000 emails about the famous XSLT and XQuery processor written and maintained by Michael Kay.<br /><br />Michael's a great guy, in person and online, and he writes long detailed emails answering people's questions. He stays considerate even when the receiver is being a little "dense", ignoring people's help and exasperating the others who try to help out. A recent <a href="http://markmail.org/message/47zd2ji6jhu7svol">quote</a> from Michael on the xquery-talk list: <p></p><blockquote>I've spent five or ten minutes writing this response in the hope that you will learn from it and not make the same mistake again, which will save everyone time in the future. If you come back with another query showing the same error in a week's time, I shall give up.</blockquote> <p></p>I hope that by making the saxon-help archives more easily searchable than the built-in SourceForge search we'll be able to save him and the readers even more time.<br /><br />Extra tidbit: If you admin a project on SourceForge and want your archives in MarkMail, there's an easy way to work with SourceForge to make that happen. 
Just <a href="http://markmail.org/docs/feedback.xqy">let us know</a>.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-40583794541203789662008-01-29T18:33:00.000-08:002008-01-29T22:44:16.873-08:00Give us a Date, and We'll Search ItIn MarkMail we like to have both a search box way to do something (easy for experts) and a graphical way to do something (easy for novices). Recently we added support for date-based query constraints. At the moment it's only available in the search box, but we thought it worth talking about anyway. A graphical version will be coming soon (we know, we can't wait to click and swipe the months on the chart either).<br /><br />With the new feature you specify a date or date range by adding a date: term to a query. For example, let's say you'd like to investigate the cause of the sudden spike in messages from <a href="http://php.markmail.org/">the PHP lists</a> for the query "<a href="http://php.markmail.org/search/register+globals">register globals</a>". You see this histogram:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_RADq9isGzpw/R5lauy9NhuI/AAAAAAAAABA/hdtgbibkRk0/s1600-h/Picture+1.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_RADq9isGzpw/R5lauy9NhuI/AAAAAAAAABA/hdtgbibkRk0/s400/Picture+1.png" alt="" id="BLOGGER_PHOTO_ID_5159254608128607970" border="0" /></a>So let's satisfy that curiosity. In April of 2002 things really start to heat up and it remains a pretty hot topic until roughly November 2003. To restrict our query to these dates all we have to do is add "date:2002/04-2003/11" to our register globals query, yielding "<a href="http://php.markmail.org/search/register+globals+date%3A2002%2F04-2003%2F11">register globals date:2002/04-2003/11</a>". Two dates separated by a hyphen indicate a range. 
This gives you only the matching messages from April 2002 through November 2003. The chart even highlights the selection range:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_RADq9isGzpw/R5ld4C9NhvI/AAAAAAAAABI/RnzIUB8zVvQ/s1600-h/Picture+2.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_RADq9isGzpw/R5ld4C9NhvI/AAAAAAAAABI/RnzIUB8zVvQ/s400/Picture+2.png" alt="" id="BLOGGER_PHOTO_ID_5159258065577281266" border="0" /></a>Looking at the statistics for this query, most of the messages were posted to the discuss list so it's likely that users are having problems. We might assume that something changed with the language so let's add "release" to the query. Sure enough, looks like they <a href="http://markmail.org/message/2grqhofbpin7xavw">changed the default behavior of register globals in the 4.2.0 release</a> which was released on April 22nd, 2002.<br /><br />We support more than just date ranges. 
Here are just a few of the formats that we support:<br /><table style="width: 750px;"><br /><tbody><tr><td><a href="http://markmail.org/search/date:today">date:today</a></td><td>Messages posted today</td></tr><tr><td><a href="http://markmail.org/search/date:%22last+week%22">date:"last week"</a></td><td>Messages posted in the last 7 days</td></tr><tr><td><a href="http://markmail.org/search/date:%22last+month%22">date:"last month"</a></td><td>Messages posted in the last 30 days</td></tr><tr><td><a href="http://markmail.org/search/date:lastmonth">date:lastmonth</a></td><td>Spaces are optional, for convenience</td></tr><tr><td><a href="http://markmail.org/search/?q=date%3A2008%2F01%2F01">date:2008/01/26</a></td><td>Selection by day</td></tr><tr><td><a href="http://markmail.org/search/date:20080126">date:20080126</a></td><td>Slashes are optional, if you prefer</td></tr><tr><td><a href="http://markmail.org/search/date:2008/01">date:2008/01</a></td><td>Selection by month</td></tr><tr><td><a href="http://markmail.org/search/?q=date%3A2007">date:2007</a></td><td>Selection by year</td></tr><tr><td><a href="http://markmail.org/search/date:2005/06-">date:2005/06-</a></td><td>Everything from June 2005 onward, because of the trailing hyphen</td></tr><tr><td><a href="http://markmail.org/search/date:-2005/06">date:-2005/06</a></td><td>Everything up to the end of June 2005, because of the leading hyphen</td></tr><tr><td><a href="http://markmail.org/search/date:2007-2008">date:2007-2008</a></td><td>This year and last</td></tr><tr><td><a href="http://markmail.org/search/?q=date%3A%22July+4th%2C+2007%22">date:"July 4th, 2007"</a></td><td>Human-readable formats are supported too</td></tr><tr><td><a href="http://markmail.org/search/?q=date%3A%22Julio+4th%2C+2007%22">date:"Julio 4th, 2007"</a></td><td>For all of you Spanish speakers</td></tr><tr><td><a href="http://markmail.org/search/?q=date%3At90d">date:t90d</a></td><td>Messages from the last 90 days, don't forget the 
t</td></tr><tr><td><a href="http://markmail.org/search/-date:2008">-date:2008</a></td><td>Negation is also allowed, put the hyphen in front of the date:</td></tr></tbody></table><br />Don't worry if you can't remember all this. The question mark graphic next to the search box will pop up a reminder. So now while we continue to work on allowing you to interact with the graphs, give this a spin and let us know what you think.<br /><br />P.S. Jason really likes this because it lets him examine all the <a href="http://jdom.markmail.org/search/list:commits+date:20040910-20071118">changes between JDOM 1.0 and JDOM 1.1</a>.Ryan Grimmhttp://www.blogger.com/profile/14329280406283683610noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-23952574289933773002008-01-27T13:13:00.000-08:002008-01-27T13:53:50.083-08:00Ruby vs Groovy: What Can List Traffic Tell Us?Over the weekend we loaded the main <a href="http://ruby-lang.markmail.org/">Ruby lists from ruby-lang.org</a>, about 300,000 messages across the last six years. The <a href="http://ruby-talk.markmail.org/">ruby-talk</a> list alone weighs in at 245,000 messages and is our new second place traffic champ, trailing only php-general.<br /><br />With both the Ruby and <a href="http://groovy.markmail.org/">Groovy</a> archives loaded, we have an exciting opportunity to compare the two communities. In my experience talking with developers at Java conferences, they often look to both Ruby and Groovy as possible next languages to learn. The Java developers have a natural desire to go toward Groovy because it lets them keep their Java stack, but they're concerned about Groovy's level of support relative to Ruby, which has been around much longer and has a larger community.<br /><br />How much larger? Is the community growing or shrinking? It can be hard to tell with open source, having no revenue numbers and with download counts skewed by bundling. 
I think looking at email list traffic patterns is about as good a gauge as anything.<br /><br />Below you'll see a composite graphic showing the traffic from the five Ruby lists compared to the five Groovy lists, with the Groovy lists inlaid at matching scale:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_iZU42gX80ZU/R5z5JhSQwJI/AAAAAAAAACA/V3eXsC1R-vk/s1600-h/ruby-groovy.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_iZU42gX80ZU/R5z5JhSQwJI/AAAAAAAAACA/V3eXsC1R-vk/s400/ruby-groovy.png" alt="" id="BLOGGER_PHOTO_ID_5160273215008850066" border="0" /></a><br />The Ruby lists are more active, by about double. In the months before the Groovy 1.0 launch in January 2007, the spread was even larger. Both communities seem to have plateaued in 2007. I look forward to seeing what 2008 brings.<br /><br />P.S. The Ruby lists are half English and half Japanese. <a href="http://ruby-lang.markmail.org/search/from:%22Yukihiro%20Matsumoto%22">Yukihiro (Matz) Matsumoto</a>, who created Ruby, is Japanese, and the language first took off in Japan. If you speak Japanese, feel free to search for Japanese words. It should work but do <a href="http://markmail.org/docs/feedback.xqy">let us know</a> if you spot any issues.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.comtag:blogger.com,1999:blog-7071496506298372327.post-61288114432157179692008-01-24T17:29:00.000-08:002008-01-24T20:52:51.167-08:00Groovy: Traffic Doubled with a Formal ReleaseA few days ago we loaded the <a href="http://groovy.markmail.org/">Groovy lists</a> and their 70,000 messages. The list traffic chart helped change my mind about the language.<br /><br />The <a href="http://groovy.codehaus.org/">Groovy project</a> calls itself "an agile and dynamic language for the Java Virtual Machine". 
I'd call it a cool scripting language that compiles to Java bytecode and so lets you write in a scripting language while accessing the vast set of Java libraries out there.<br /><br />The first time I saw Groovy, years back, I got very excited -- but then it didn't seem to be catching on, and I thought it was slowly on the downturn. In fact it's not like that at all; it just <a href="http://markmail.org/message/iqmnj4z4jbq4st3c">takes time to develop a language</a>. Look at the shape of the message traffic histogram:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_iZU42gX80ZU/R5lAlxSQwGI/AAAAAAAAABs/m2GNFqjxfIo/s1600-h/Picture+1.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp2.blogger.com/_iZU42gX80ZU/R5lAlxSQwGI/AAAAAAAAABs/m2GNFqjxfIo/s320/Picture+1.png" alt="" id="BLOGGER_PHOTO_ID_5159225865758883938" border="0" /></a><br />Looks like the project caught some fire. <a href="http://glaforge.free.fr/weblog/index.php?itemid=228">Guillaume Laforge</a> blogged that the big jump you see here starting in January 2007 was due to the release of <a href="http://markmail.org/message/qspd5ufq35v7yk3a">Groovy 1.0</a>. I see they're on <a href="http://markmail.org/message/uolyl46mpr5w243o">Groovy 1.5</a> now, as of a month ago. The 100+ messages per day rate will probably continue. My friend <a href="http://www.javaworld.com/podcasts/jtech/2007/110107jtech004.html">Scott Davis</a> was right.Jason Hunterhttp://www.blogger.com/profile/00854855078730758915noreply@blogger.com