Old Blog
Moved to Wordpress: http://cafenate.wordpress.com/
Nathan Fiedler

Comparing Drobo and DroboShare to an OpenSolaris storage server
Posted 2009-02-16, updated 2009-02-17

In my previous entry I recalled the trial of recovering from a corrupted HFS+J volume that contained my Time Machine backups. During that process, I became aware of some of the drawbacks of the storage appliance on which my backups were stored, the <a href="http://www.drobo.com/">Drobo</a>. During all of that intensive reading and writing of large chunks of data, it was obvious that the Drobo was a bit on the slow side. Meanwhile, I had come across a <a href="http://blogs.sun.com/storage/entry/video_the_utlimate_zfs_tutorial">video tutorial</a> of <a href="http://en.wikipedia.org/wiki/Zfs">ZFS</a>, the file system that <a href="http://www.sun.com/">Sun</a> created a few years back. It was so impressive, I decided then and there that the solution was to replace the Drobo with a server running <a href="http://www.opensolaris.com/">OpenSolaris</a> and ZFS. Granted, there are some advantages to an appliance like the Drobo, and in this entry I'll outline the pluses and minuses.<br /><br /><span style="font-size:130%;">Drobo Advantages</span><br />It's an appliance: you plug it in, stick in the drives, and that's basically all you need to do. In fact, the disks do not even have to be the same size, although if they are then it won't waste any of the capacity. With four 200GB drives, the Drobo provided 554GB of usable storage space. And it does this while consuming a modest 36W of power.<br /><br /><span style="font-size:130%;">Drobo Disadvantages</span><br />While the list of advantages was rather brief, this list is going to be significantly longer. 
First, if you are using a Drobo, you almost certainly want to share it on a LAN, in which case you need to buy a DroboShare (another $200 on top of the $500 for the empty Drobo, plus whatever you had to pay for the disks). To make this work properly, you need to run the Drobo Dashboard, which has a couple of idiosyncrasies. First, you have to completely disable the firewall in Mac OS X in order for the Dashboard to detect the DroboShare. Second, the Dashboard application files must be owned by a particular user, the one that installed the software. That would not be a problem except that the application won't launch for any other user. While the DroboShare can be mounted without using the Dashboard, it seems there is a file ownership issue with at least some of the files on the Drobo, making them appear to be corrupted. The Drobo support tech was stumped and unable to resolve that particular issue, and could not bring themselves to admit that the Dashboard software was flawed.<br /><br />During all of that Time Machine volume copying I was doing, the used space on the Drobo climbed upward. But as I deleted the botched copies, the Drobo was reluctant to return the space to the free side of the usage dial. Weeks went by and mysteriously one day the space all came back. I'll never know why, just one of the mysteries of the black box that is the Drobo. In fact, that is exactly what the Drobo is: a closed, black box. There is no access into it, no remedy when something goes wrong, no tool for diagnosis. If the box fails for any reason, you have exactly one place to turn to for help (if you're <a href="http://billstreeter.net/2008/03/18/do-not-buy-a-drobo/">Bill Streeter</a>, you'll know exactly how true that is). You better hope your support contract is still good ($50/year, $150 if you let it lapse). In the worst case, you have to buy a new Drobo just so you can get the data off of your disks. Yes, that is indeed the truth. 
Like any proprietary RAID-like device, the Drobo's on-disk format is a closely guarded secret. If the device fails and you can't find a replacement, your data is gone forever.<br /><br />While its power consumption is modest, it has insufficient airflow around the disks and as a result they run a bit too hot for my taste. Now that they are in the new storage server, they are much cooler. Speaking of power, the power connector on the Drobo is notoriously loose. It's fallen out numerous times over the year that I've used my Drobo. The tech support person suggested taping the power cord to the side of the Drobo. Brilliant.<br /><br />Last, but not least, the Drobo has a weird capacity "limit" of 2TB. Apparently it has something to do with the fact that its only interface to the world is through a USB port. I would assume the second generation Drobo is better at this, but I'm not willing to spend $500 to find out. Regardless, your capacity is limited to whatever you can find in four disks, as that is the maximum number of disks any single Drobo can take. But, if you've got the money, you can plug two Drobos into a single DroboShare. I doubt too many people have done that, as it would cost $1200 plus the cost of the disks. Meanwhile, I could build a system to hold 10+ disks for a fraction of that cost.<br /><br /><span style="font-size:130%;">OpenSolaris and ZFS</span><br />The only disadvantage to running a server with OpenSolaris is that you have to install, configure, and maintain it. But hey, I've been doing that for years so it's no trouble for me. In fact, the recent releases of OpenSolaris are remarkably easy to set up and administer. Most of the standard configuration is done using the graphical interface, and for everything else there are well-written manual pages. As for the advantages of OpenSolaris and ZFS, there are many. First of all, ZFS is the most amazing file system on the planet. It's incredibly easy to set up a storage pool and create file systems. 
It handles stripes, mirrors, and data/parity formats (the equivalents of RAID 0, 1, 5, and 6) depending on your replication needs. You can add as many disks as your hardware can handle, and you can configure them any way you like.<br /><br />ZFS has exceptionally strong data integrity. Unlike most, if not all, RAID 5 implementations, it does not suffer from the infamous <a href="http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5_performance">write hole</a>. Instead, it never overwrites live data, so all writes go to free blocks, with data being written first, then the metadata blocks, and finally the uberblock. If power is lost at any point, when the system comes back, it will only see valid data and metadata. What's more, all blocks are checksummed, and each checksum is stored in the parent block. This checksum carries upward to the uberblock, which effectively has a fingerprint of the entire file system. This guards against the worst kind of data loss, the silent kind, as it detects the occasional bit rot that some disks can suffer. With built-in data replication, these problems can be automatically corrected on the fly.<br /><br />But what about data portability, which was a major issue for me with the Drobo? Well, get this: ZFS is open source. What's more, it's been ported to BSD and Mac OS X. So even if my storage server were to suddenly die, I not only have a choice of vendors to repair/replace the hardware, but I also have a choice of operating systems to access the data in the storage pool. But, in my opinion OpenSolaris is the best choice as it has the reference implementation of ZFS. What's more, OpenSolaris supports SMB, NFS, iSCSI, and AFP (via netatalk; see earlier blog entry), so I have several choices for how to access the storage over the network.<br /><br />ZFS has built-in support for snapshots. In fact, it has a feature similar to Time Machine, called Time Slider, that makes automatic snapshots and manages their expiration, much like Time Machine. 
With snapshots in place, if I ever run into a corrupted HFS disk image again, I can roll back the file system to an earlier snapshot. Granted, I may lose some data, but it's better than losing the entire image to an uncorrectable error.<br /><br /><span style="font-size:130%;">Final Notes<br /></span>I lied earlier when I said there was only one disadvantage to running a storage server instead of a Drobo. The one other issue is that the power consumption of most server-class systems is over 100W. In fact, my current server consumes at least 102W and often hits 120W while actively doing work. But, I have <a href="http://www.logicsupply.com/blog/2008/11/05/the-chenbro-es34069-case-review-part-2-the-perfect-mainboard/">a plan</a> to replace the hardware with low power parts, ones primarily aimed at the mobile and embedded market. I plan to talk about that more in a future entry.<br /><br />While I could certainly format the Drobo using ZFS, I would only gain snapshots and on-disk consistency. Performance would still be rather poor, and disk management and repair would rely entirely on the Drobo itself. As far as ZFS would know, the Drobo would appear as one big disk. That is not the recommended scenario for ZFS according to its creators.<br /><br />Earlier I mentioned that Drobo provided 554GB of usable space with the four 200GB disks I had installed. In comparison, ZFS provided just 548GB. Not too bad considering I'm getting rock solid data integrity and automated snapshots.<br /><br />One final point, with regard to reclaiming disk space after deleting files. With ZFS, the freed space was returned within seconds, unlike the many weeks that it took the Drobo to realize I had deleted 100GB of data.<br /><br />All in all, I'm very happy with the decision I've made. 
I now have a fast, reliable, serviceable, and manageable storage box that I can update with newer software versions indefinitely, and whose capacity I can easily grow as my needs change.

Time Machine and invalid sibling link, now what?
Posted 2009-02-16

Last month I was playing around with the <a href="http://code.google.com/p/timedog/">timedog</a> script to determine where all the disk space was going on my Time Machine backups. I had mounted the remote disk image using <span style="font-family:courier new;">hdiutil attach</span>, when a few minutes later Time Machine kicked off a backup. In a moment of poor judgment, I tried to stop the backup and unmount the second mount of the backup volume that TM had established. At first nothing appeared to have gone wrong as a result of that action, but the next time TM tried to make a backup, it said the volume was corrupted and it couldn't make another backup. No matter, I was sure Disk Utility or <span style="font-family:courier new;">fsck</span> could correct the problem; surely the error wasn't too serious. Ah, but this error was no run of the mill error. It was in fact the dreaded <span style="font-style: italic;">invalid sibling link</span> error. This is the error that makes most disk repair tools turn pale and shrink away into the corner. In most cases, nothing will fix the broken link, and there's no telling just what might be lost as a result.<br /><br />But, I wasn't going to give up too easily. After all, I had over a year's worth of backups that I wanted to recover. 
I Googled around for days, reading forum posts and blog entries, and anything else that might bring some hope to my dire situation. In many cases, others who had this problem tried <span style="font-family:courier new;">fsck</span> or Disk Warrior, and some of them were successful. By this time, I had tried <span style="font-family:courier new;">fsck -r</span> several dozen times, to no avail. Being the inventive type, I tried using <span style="font-family:courier new;">hdiutil convert</span> to create a new disk image from the corrupt one. But, as you can probably guess, all that accomplished was creating a new disk image with the same invalid sibling link. That meant that a simple disk block copy was not going to work, I had to try something that would copy the files one by one from the corrupt volume to a new one. Knowing that Time Machine makes gratuitous use of hard links, I needed a copy program that knew how to manage the hard links. Otherwise, a simple copy would result in a TM volume that was many times the size of the original.<br /><br />Using <span style="font-family:courier new;">rsync -H</span> was the first thing I tried, but it ran out of memory before it managed to copy anything. Next, I tried SuperDuper!, which was recommended by quite a few people. The free version would only perform a whole disk copy, erasing the destination and starting from scratch. That was fine since that was exactly what I wanted. Sadly, it too failed after about 36 hours, reporting a "type 8 error". It had managed to copy quite a bit of the TM backups, but I wasn't satisfied, I wanted everything.<br /><br />At this point I began to realize that there was only one way I was going to make a faithful copy of the Time Machine volume in a reasonable amount of time. Yep, that's right, I would have to write a script that would accomplish my goal. 
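<br /><br />The hard-link requirement is worth making concrete. What follows is a minimal sketch in Python of a file-by-file copy that recreates hard links instead of duplicating data. It is not timecopy.py itself, and the function name <span style="font-family:courier new;">copy_preserving_hard_links</span> is a hypothetical one chosen for illustration. It assumes the usual POSIX behaviour that two entries with the same device and inode number are the same file, and it only handles hard-linked files, not the hard-linked directories that Time Machine also creates.

```python
import os
import shutil


def copy_preserving_hard_links(src_root, dst_root):
    """Copy a directory tree, recreating file hard links in the destination.

    Illustrative sketch only: symlinked directories are skipped and
    extended attributes are not copied.
    """
    # (device, inode) of a multiply-linked source file -> first destination path
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            st = os.lstat(src)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                # Same inode already copied once: link to that copy
                # rather than writing the data a second time.
                os.link(seen[key], dst)
                continue
            shutil.copy2(src, dst, follow_symlinks=False)
            if st.st_nlink > 1:
                seen[key] = dst
```

The dictionary holds one entry per multiply-linked inode, so memory use grows with the number of linked files rather than with the size of the volume.<br /><br />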
Being that Python is the language in which I am strongest, after Java, I chose to put its built-in file manipulation routines to good use. I figured I could use the same approach that timedog was using, comparing the inode values of the directory entries from one snapshot to the next. In this way, I could know which entries were hard links and which were new files. After just four weeks of spending my spare time hacking away in Python, I finally succeeded.<br /><br />Introducing <a href="http://code.google.com/p/timedog/wiki/UsingTimecopy">timecopy.py</a>, the fruit of my labor. It's a Python script that traverses a Time Machine volume and reproduces its contents to the destination of your choice. Aside from knowing about the hard links that Time Machine creates, it also copies over the extended attributes that convince TM to accept the copied backups. As a result of writing this tool, I have recovered my Time Machine backups and everything seems to be working fine once again. Granted, I'll never know what files may have been lost due to the invalid sibling link error, but at least I managed to save the vast majority of the data.

Building netatalk on OpenSolaris 2008.11
Posted 2009-02-08, updated 2009-02-09

For a few months now, basically since the <a href="http://drobo.com/">Drobo</a> started supporting third party applications, I have been using my Drobo, via a DroboShare, as a Time Machine backup for my MacBook Pro. 
I used the <a href="http://code.google.com/p/backmyfruitup/">BackMyFruitUp</a> toolkit to set up the DroboShare as an <a href="http://en.wikipedia.org/wiki/Apple_Filing_Protocol">AFP</a> server, so the Mac saw it as an Apple-compatible network file share. One particularly fun step in that process was migrating my existing TM volume over to the Drobo, but that's another story. This story is about how I replaced the Drobo and DroboShare with a server running OpenSolaris.<br /><br /><span style="font-size:130%;">Installing OpenSolaris</span><br /><br />To start with, I took the old web/file server I had sitting around since last year and installed <a href="http://www.opensolaris.com/">OpenSolaris</a> 2008.11. Why OpenSolaris, you may ask, considering I had been a Linux user for over a decade? Well, I have one word for you: ZFS. If you don't think <a href="http://opensolaris.org/os/community/zfs/">ZFS</a> is the most rockin' file system on the planet, then you haven't watched the three hour <a href="http://blogs.sun.com/storage/entry/video_the_utlimate_zfs_tutorial">presentation</a>. Yeah, 3 hours, but it's absolutely fascinating. But seriously, I can't give ZFS all of the credit for prompting the switch to OpenSolaris. There are plenty of very <a href="http://www.opensolaris.com/learn/features/">good reasons</a> to use OpenSolaris and ZFS is just one of them.<br /><br />Okay, so once OpenSolaris was on the system disk, what next? Well, you should update the installed packages and reboot into the new boot environment. Next we'll need the C compiler and related packages: <span style="font-family:courier new;">pfexec pkg install gcc-dev</span><br /><br />The system is now capable of compiling software from source, in particular <a href="http://netatalk.sourceforge.net/">netatalk</a>. Skip the 2.0.3 release and go straight for whatever is the latest, which as of today is 2.0.4 beta2. We'll need that in order to work around a weird permissions issue introduced in Leopard. 
But first, we must install a compatible version of Berkeley DB.<br /><br /><span style="font-size:130%;">Installing Berkeley DB</span><br /><br />For now, netatalk works best with a slightly older release of <a href="http://www.oracle.com/technology/products/berkeley-db/">Berkeley DB</a>, version 4.2.52. Compiling this is pretty straightforward. Start by adding <span style="font-family:courier new;">/usr/local/lib</span> to the library load path (<span style="font-family:courier new;">pfexec crle -u -l /usr/local/lib</span>). Then compile and install Berkeley DB like so (consult their build instructions for details, but it basically goes like this):<br /><ol><li style="font-family: courier new;">cd build_unix</li> <li style="font-family: courier new;">../dist/configure --prefix=/usr/local</li> <li style="font-family: courier new;">make</li> <li style="font-family: courier new;">pfexec make install</li></ol><span style="font-size:130%;">Installing netatalk</span><br /><br />Recent versions of netatalk expect the Solaris directory structure to look different from what OpenSolaris has these days. To accommodate this, make a symbolic link from <span style="font-family:courier new;">/usr/ucbinclude</span> to <span style="font-family:courier new;">/usr/include</span> so that netatalk builds. Then edit <span style="font-family:courier new;">sys/Makefile.in</span>, removing 'solaris' from line 294 (to skip building the modules we don't really need), then save the file and you're ready to compile. Here I'm skipping the DDP bits that don't compile cleanly on OpenSolaris, and I'm giving PAM a miss because it's more work to set it up.<br /><ol style="font-family: courier new;"><li>./configure --disable-ddp --without-pam</li><li>make</li><li>pfexec make install</li></ol>Now comes the configuration stage. 
This setup suits my own needs, so if you want additional services then check out the netatalk <a href="http://netatalk.sourceforge.net/2.0/htmldocs/configuration.html">documentation</a> for more information. In general though, you will probably want to make similar changes to the default configuration, so I'll detail what I've done for my environment.<br /><ul><li>Edit <span style="font-family:courier new;">/usr/local/etc/netatalk/afpd.conf</span>, adding the following line at the end of the file (this sets up the encrypted password authentication method and tells clients not to save the password, although that seems to be ignored on OS X):</li></ul><blockquote style="font-family: courier new;">- -transall -uamlist uams_dhx.so -nosavepassword</blockquote><ul><li>Edit <span style="font-family:courier new;">/usr/local/etc/netatalk/netatalk.conf</span>, changing "yes" to "no" for the atalk and papd services (atalk is for pre-OSX systems, and papd is for printer sharing).</li></ul><ul><li>Edit <span style="font-family:courier new;">/usr/local/etc/netatalk/AppleVolumes.default</span>, adding the following (changing the default ~ line as well):</li></ul><blockquote style="font-family: courier new;">~ options:usedots,invisibledots,upriv perm:0770<br />/zeepool/shared "Shared" allow:@staff options:usedots,invisibledots,upriv perm:0770<br />/zeepool/nathan_backup "Nathan Backup" allow:nfiedler options:usedots,invisibledots,upriv perm:0770<br />/zeepool/antonia_backup "Antonia Backup" allow:akwok options:usedots,invisibledots,upriv perm:0770</blockquote>That's four lines of text above; the blog editor breaks the lines unfortunately. The usedots option tells netatalk to use dots instead of ":2e" for encoding dot files, while invisibledots says to make the dot files invisible by default. 
Now about the permissions issue alluded to above (see this <a href="http://groups.google.com/group/linux.debian.bugs.dist/browse_thread/thread/649e575e7f4094ce/7380c402ceca9eaf">discussion</a> for details). With Tiger, newly created files would be writable by others, but in Leopard the permissions are wacky, so the latest netatalk has a workaround for that. Add the upriv option and perm:0770 to force the permissions for new files to allow other members of the group to read and write to them. After all, this is a shared volume; it's silly if no one else can access the files.<br /><br />With the configuration complete, you can start the netatalk services. I'm assuming that it's not running already, in which case you can just run this command: <span style="font-family:courier new;">pfexec /etc/init.d/atalk start</span><br /><br /><span style="font-size:130%;">Connecting and Permissions</span><br /><br />Now at this point you should be able to connect to the server from your Mac, using the <span style="font-style: italic;">Connect to Server</span> feature in <span>Finder</span> (you can use the <span style="font-family:courier new;">Cmd<span style="font-family:arial;">+</span>K</span> shortcut). Type in something like "afp://myserver" in the dialog, replacing myserver with the name of your server, and you will be prompted for a name and password. Use whatever you have for your user accounts on the OpenSolaris server. You could configure netatalk to use PAM, allowing authentication against LDAP or some other service, but for simplicity I just use the system accounts. Once you've authenticated, you will be prompted to select an available shared volume. It doesn't seem to matter which one you pick since the server will be added to the Finder sidebar, and from there you can browse to any of the shared volumes. As for accessing the files on the server, make sure the ownership and permissions are set up such that the user you connect as can read and write to those areas. 
For instance, the <span style="font-family:courier new;">nfiedler</span> user has read/write permission to <span style="font-family:courier new;">/zeepool/nathan_backup</span>, and that same user is a member of the <span style="font-family:courier new;">staff</span> group, and the <span style="font-family:courier new;">/zeepool/shared</span> area is owned by the <span style="font-family:courier new;">staff</span> group and is group writable. So far this seems to be working for us, but if you have better ideas then by all means please leave a comment.<br /><br />I can't take the credit for uncovering this information. In fact, this entry is just pulling together the different bits of information into a single, concise set of instructions. The <a href="http://darkdust.net/writings/opensolaris/compilingnetatalkonopensolaris">original blog</a> that I encountered was written by Marc Haisenko, and for step-by-step instructions on configuring netatalk on Linux, I found the <a href="http://www.kremalicious.com/2008/06/ubuntu-as-mac-file-server-and-time-machine-volume/">kremalicious</a> blog by Matthias Kretschmann.

Burstsort for Java
Posted 2008-12-27

Shortly before working at Quantcast, I became interested in a sorting algorithm that I had not heard of before, called <a href="http://en.wikipedia.org/wiki/Burstsort">Burstsort</a>. I found it while browsing Wikipedia, reading about various methods of sorting. Burstsort, in case you haven't heard of it already, is very fast for large sets of strings, much faster than quicksort and its friends, including multikey quicksort and radixsort. 
It works by inserting the strings to be sorted into a shallow trie structure, where buckets are used to store the string references, to reduce memory usage. A bucket is "burst" into a new trie node with its own buckets when it exceeds a certain size, and once all of the strings have been inserted, each remaining bucket is sorted using a multikey quicksort. The structure is then traversed in order to retrieve the sorted strings. As a result, Burstsort is cache friendly and thus runs considerably faster than algorithms that are not cache-aware.<br /><br />Along with the original paper is a C implementation, but as far as I could tell, there was no Java implementation, at least not in open source. So, after reading all of the Burstsort papers several times, I finally started writing a Java implementation of the original algorithm. You can find the project on Google Code, at the <a href="http://code.google.com/p/burstsort4j/">burstsort4j</a> project page. The initial implementation is basically a rewrite of the original C code. After fixing a few bugs that I introduced during the rewrite, it appears to be working well and is indeed much faster than the other algorithms (quicksort and its multikey variant). Of course, I also rewrote those based on their C implementations, so it could be due to mistakes made on my part. Hopefully, since this is all open source now, others can evaluate the code and point out any mistakes I may have made.<br /><br />In the meantime, I'll be working on the newer algorithms, in particular the CP-burstsort and the "bucket redesign" Burstsort. 
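<br /><br />For reference, the basic trie-and-bucket scheme described above can be sketched in a few lines of Python. This is a simplified illustration and not the burstsort4j code or the original C implementation: the names <span style="font-family:courier new;">Node</span>, <span style="font-family:courier new;">burstsort</span>, and <span style="font-family:courier new;">burst_limit</span> are hypothetical, the trie children live in a dictionary rather than a fixed array, and Python's built-in <span style="font-family:courier new;">sorted()</span> stands in for the multikey quicksort used on the buckets.

```python
class Node:
    """One trie node: children keyed by character, plus exhausted strings."""

    def __init__(self):
        self.children = {}  # character -> Node, or list acting as a bucket
        self.ended = []     # strings with no characters left at this depth


def _insert(node, s, depth, burst_limit):
    while True:
        if depth == len(s):
            node.ended.append(s)
            return
        c = s[depth]
        child = node.children.get(c)
        if isinstance(child, Node):
            node, depth = child, depth + 1  # descend one level in the trie
            continue
        if child is None:
            node.children[c] = [s]          # start a new bucket
            return
        child.append(s)
        if len(child) > burst_limit:
            # Burst: replace the bucket with a trie node and redistribute
            # its strings by their next character.
            new_node = Node()
            node.children[c] = new_node
            for t in child:
                _insert(new_node, t, depth + 1, burst_limit)
        return


def _collect(node, out):
    out.extend(node.ended)                  # shorter strings sort first
    for c in sorted(node.children):
        child = node.children[c]
        if isinstance(child, Node):
            _collect(child, out)
        else:
            out.extend(sorted(child))       # stand-in for multikey quicksort


def burstsort(strings, burst_limit=32):
    root = Node()
    for s in strings:
        _insert(root, s, 0, burst_limit)
    out = []
    _collect(root, out)
    return out
```

Because each bucket only ever holds strings that share a prefix, the per-bucket sorts work on small, related groups of strings, which is where the cache friendliness comes from.<br /><br />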
The goal of those newer variants is to reduce memory usage without giving up much run time.

Finally a new product that ships
Posted 2008-09-07

During my last 18 months at Sun, I have been working on a project whose name changed several times, and whose purpose changed almost as often. For the time being, it is just another source forge, but at least it "shipped". For all of the projects that I have worked on that were breaking new ground, I believe this is the first one to see the light of day. Another remarkable quality of my part in this project is that I worked on just one area, source code management. The project is called <a href="http://kenai.com/">Project Kenai</a>, and at the moment you need an invitation to create a new project. Given its relative newness, it's the sanest way to grow the site and scale the infrastructure to meet demand.<br /><br />Let me give you a brief history of the project, from an insider's point of view. To start with, I was introduced to the newly assembled project as the fifth developer, to work with other developers poached from NetBeans-related products (Creator and Enterprise Pack). We learned we were building a web site, a developer collaboration site, it was eventually called. The plan was to build a prototype for the upcoming JavaOne conference, two months away. We were to learn a new language, Ruby, and a new framework, Rails. None of us had done anything quite like this before, and we were going to do it in a very un-Sun-like manner. 
We were packed into a small room, told in rough terms what to build, and encouraged to use the "agile" development methodology. I use lower-case because we never really knew what Agile was, nor did we ever actually practice it.<br /><br />The prototype came and went; the demo at JavaOne was cancelled. We got Craig McClanahan and started the entire system from scratch. Instead of a single Rails app that basically did nothing more than scrape HTML from other services (e.g. Hudson, Mailman, WebSVN, MediaWiki), we were to build a set of RESTful web services, all in Rails, each implementing one particular aspect of the system. Because of my familiarity with Subversion, I took on the source code management portion of the project.<br /><br />Initially almost everything was written in Rails, except that we chose Sympa for the mailing list support, and being written in Perl, the mailing list web service was also written in Perl. My part was still Rails at that time, because Ruby support in Subversion was quite satisfactory. That changed, however, when we took on Mercurial support in addition to Subversion. My first attempt was to invoke the <span style="font-family: courier new;">hg</span> script from Ruby and capture the output. This led to dreadful performance and was very buggy. Of my own volition, and on my own time, I learned Python and Django and rewrote the SCM web service so I could properly support both Mercurial and Subversion.<br /><br />And that was basically the last bit of coding I did for this project, several months ago. Since then I've been fighting all of the integration issues and exploring options for improving the broken deployment process (at this time, it consists of a set of poorly written shell scripts, invoked by hand on each of the production systems, in a long and complicated process). 
Not developing new code or solving interesting problems, while debating with management about priorities (yes, releasing is good, but having an infrastructure to get you there is at least as important), is very disheartening. Ultimately, this is the reason I have left Sun. The project shipped on the very same day I left the company; it shipped and I shipped out.

Analysis of Java implementations of Fibonacci Heap
Posted 2008-05-31

Some years ago, around 1997 or so, I wrote a Java implementation of the Fibonacci Heap data structure, as described in <a href="http://www.introductiontoalgorithms.com/">Introduction to Algorithms</a>, by Cormen, Leiserson, Rivest, and Stein. A data structure such as the Fibonacci Heap is useful in graphing applications, such as the one I spent some time working on, <a href="http://code.google.com/p/graphmaker/">GraphMaker</a>.<br /><br />Unfortunately, there were mistakes not only in my implementation, but in the pseudo-code published in the book. Due to the fact that my version was one of the first ever written in Java, and it was open source, it eventually spread to other open source software. A few months back, Jason Lenderman and John Sichi brought up an issue in the implementation via an email to me. In particular, John felt that the size of the degree array in the <span style="font-family:courier new;">consolidate()</span> method was too large. In fact, I had it set to the size of the heap, which meant the consolidate method had a running time of O(n). Oops, so much for the amortized O(log n) we were hoping for. 
After spending some time looking at other implementations, and studying the CLRS book, I realized that calculating the degree array size at all was a waste of time (n can never be greater than <code>Integer.MAX_VALUE</code>, and log base phi of that is just under 45). Terrific! The method was much faster now that it had an appropriately sized degree array.<br /><br />Not being satisfied that everything was working perfectly, I proceeded to write a unit test that would stretch the heap to its limits. Inserting a large number of random values, and <span style="font-weight: bold;">then</span> extracting the minimum value, would cause a massive heap consolidation. This exposed a problem, and it was bewildering. I couldn't for the life of me see what was going wrong. Then my wife, Antonia, came to the rescue. At the time she was between jobs and had some time on her hands, so she took a look at it and found that the original pseudo-code in CLRS was missing two important steps. Elated, I submitted a bug report to Dr. Cormen and subsequently the fix has made its way into the example Java source code in an upcoming edition of the book. However, Antonia was not the first to realize there was a problem in the pseudo-code. It seems that Doug Cutting wrote a version of Fibonacci Heap in Java for the <a href="http://lucene.apache.org/nutch/">Apache Nutch</a> project, and it didn't have the problems that my wife had uncovered.<br /><br />Curious what other Java implementations looked like, I found several and have collected some notes on their implementations. In particular, I was looking at the consolidate operation, which is the only complex bit of code in a Fibonacci Heap, as everything else is fairly trivial. 
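<br /><br />As a rough illustration of the degree array sizing argument made earlier in this entry, the bound can be checked with a few lines of Java. This is only a sketch, not the code from GraphMaker or the book; the class and method names are made up for the example. It assumes the standard result that the maximum degree of any node in an n-node Fibonacci heap is at most the floor of log base phi of n, so one extra slot is needed for degree zero:

```java
public final class DegreeBound {
    // Golden ratio, phi = (1 + sqrt(5)) / 2.
    private static final double PHI = (1.0 + Math.sqrt(5.0)) / 2.0;

    private DegreeBound() {
        // Utility class (hypothetical, for illustration only).
    }

    /**
     * Number of entries needed in the degree array for a heap of n nodes.
     * The maximum degree is floor(log_phi(n)); add one slot for degree 0.
     */
    public static int degreeArraySize(long n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        return (int) Math.floor(Math.log(n) / Math.log(PHI)) + 1;
    }

    public static void main(String[] args) {
        System.out.println(degreeArraySize(Integer.MAX_VALUE)); // prints 45
        System.out.println(degreeArraySize(50_000));            // prints 23
    }
}
```

Since the result never exceeds 45 for a heap whose size fits in an <code>int</code>, a constant-size array is enough and the calculation can be skipped at run time.<br /><br />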
The "array" referred to below is the degree array used to keep track of the root nodes by their degree.<br /><ul><li><a href="http://www-verimag.imag.fr/%7Ecotton/">Scott Cotton</a><br /></li><ul><li>Calculates array size using binary search of lookup table<br /></li><li>Pre-fills array with nulls</li><li>Allocates an additional buffer for iterating root list -- can be very expensive</li><li>Rebuilds root list</li></ul><li><a href="http://www.cs.northwestern.edu/%7Eagupta/">Ashish Gupta</a><br /></li><ul><li>Calculates array size</li><li>Pre-fills array with nulls</li><li>Breaks when "w" or "nextW" is made a child</li><li>Rebuilds root list</li></ul><li><a href="http://lucene.apache.org/nutch/">Apache Nutch</a> (removed in Nov 2007)<br /></li><ul><li>Calculates array size</li><li>Involves a hash table which kills performance</li></ul><li><a href="http://www.introductiontoalgorithms.com/">CLRS</a></li><ul><li id="ccwm3">Calculates array size</li><li>Pre-fills array with nulls</li><li>Breaks when "nextW" is made a child<br /></li><li>Rebuilds root list</li></ul><li><a href="http://www.jgrapht.org/">John Sichi</a><br /></li><ul><li>Degree array is of size N<br /></li><li>Pre-fills array with nulls<br /></li><li>Counts number of root list elements (R)<br /></li><li>Iterates over root list R times, possibly wasting additional time</li><li>Does not handle the "w" and "nextW" issue<br /></li><li>Rebuilds root list<br /></li></ul></ul>It should be noted that all of the problems in Sichi's implementation are entirely my fault, as his code is a fork of my original implementation. Since our email discussion, Jason and John are aware of all of the bugs and their appropriate fixes.<br /><br />In the process of analyzing the various implementations, I learned a few things.<br /><ul><li>Allocate a fixed-size array for the degrees of the root nodes. 
At most it will be 45 entries to hold log base phi of <code>Integer.MAX_VALUE</code> elements, and it's cheaper to create that than to perform a series of floating point operations to arrive at a number that is slightly smaller than 45 (e.g. it's 23 to hold 50,000 elements).</li><li>Do not fill the degrees array with nulls -- all arrays in Java are automatically initialized to zero/false/null.</li><li id="v4i42">Do not waste time rebuilding the root list at the end of consolidate; the order of the root elements is of no consequence to the algorithm.<br /></li><li>Use the Cormen "splice in" technique in removeMin() to save significant time (see the Java implementation in the recent editions of the book, or the version in GraphMaker).</li></ul>My new implementation, as found in <a href="http://code.google.com/p/graphmaker/">GraphMaker</a>, has all of these improvements, so take a look if you're at all interested. In all, this was a fascinating exercise for me. I hope you had a chance to learn something, too.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-8071183261528641641?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com11tag:blogger.com,1999:blog-7593087619590047967.post-71919008256470199822008-05-21T00:06:00.001-07:002008-05-21T00:14:48.029-07:00More Subversion vs. Mercurial metricsI wrote a while back about the disk usage of Subversion versus Mercurial, but all of my examples were rather small. Today I managed to create a migration of a Subversion repository, as complete and intact as possible (with trunk, tags, and branches directories just like in Subversion). 
The repository contains 3526 revisions and 28977 files, stored in Subversion 1.4.6 and migrated to Mercurial 1.0, using a hacked hg convert to keep the TTB structure intact.<br /><br />Keeping the entire revision history, the Subversion repository used 238 MB of disk, while the Mercurial repository (sans working copy) occupied 403 MB. Compare that with a "snapshot" repository (i.e. a repository with one revision) that consists of the latest version of all the files: Subversion 160 MB, Mercurial 229 MB. You can probably guess from these numbers that our repository has a lot of third-party code that we do not change frequently. Interestingly, in both cases, Subversion comes out ahead in terms of disk usage, by a significant margin.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-7191900825647019982?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-45696933968946038742008-05-20T23:45:00.000-07:002008-05-21T00:03:42.173-07:00Daemonizing memcached on MacAs a follow-up to the previous entry on running Apache using the Mac OS X launchd service, here are the basic steps for daemonizing <a href="http://www.danga.com/memcached/">memcached</a>, the distributed memory-sensitive cache.<br /><ol><li>Start by installing <a href="http://www.monkey.org/%7Eprovos/libevent/">libevent</a>: <span style="font-family:courier new;">./configure</span>, <span style="font-family:courier new;">make</span>, <span style="font-family:courier new;">sudo make install</span><br /></li><li>Install memcached in the same way: <span style="font-family:courier new;">./configure</span>, <span style="font-family:courier new;">make</span>, <span style="font-family:courier new;">sudo make install</span></li><li>Create a <span style="font-family:courier 
new;">/Library/LaunchDaemons/com.danga.memcached.plist</span> file with the contents shown below.</li><li style="font-family: courier new;">sudo launchctl load /Library/LaunchDaemons/com.danga.memcached.plist</li></ol>At this point memcached will be running, listening on the default port (11211). Below is a working plist file for memcached when installed using the default options.<br /><br /><blockquote><span style="font-family:courier new;">&lt;?xml version="1.0" encoding="UTF-8"?&gt;<br />&lt;!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"&gt;<br />&lt;plist version="1.0"&gt;<br />&lt;dict&gt;<br />&lt;key&gt;Label&lt;/key&gt;<br />&lt;string&gt;com.danga.memcached&lt;/string&gt;<br />&lt;key&gt;ProgramArguments&lt;/key&gt;<br />&lt;array&gt;<br /> &lt;string&gt;/usr/local/bin/memcached&lt;/string&gt;<br /> &lt;string&gt;-d&lt;/string&gt;<br /> &lt;string&gt;-u&lt;/string&gt;<br /> &lt;string&gt;root&lt;/string&gt;<br />&lt;/array&gt;<br />&lt;key&gt;RunAtLoad&lt;/key&gt;<br />&lt;true/&gt;<br />&lt;/dict&gt;<br />&lt;/plist&gt;</span><br /></blockquote><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-4569693396894603874?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-6980991691232588012008-05-15T02:28:00.000-07:002008-05-21T00:02:52.811-07:00Starting Apache on Mac OS XFor the last year I've been using an ugly hack to have my custom compiled Apache started at bootup on my Mac OS X systems. Basically, and this is really bad, I would replace the <span style="font-family:courier new;">/usr/sbin/httpd</span> binary with a symbolic link to the version I compiled. Why is that bad, you might ask? 
Because if Apple ever updates their installed Apache, it will probably overwrite my version of <span style="font-family:courier new;">httpd</span>. Thanks to Google, I recently found a much better way, and wanted to document it as a blog entry all its own, rather than as a comment or sidenote to something else.<br /><br />Say you've compiled your version of Apache and it's now installed in the default location of <span style="font-family:courier new;">/usr/local/apache2</span>, and let's say you want it to be started at bootup, effectively replacing the version of Apache that comes with Mac OS X. Here are the simplest steps I know of to do this, and without using a single hack.<br /><ol><li>Before doing anything else, turn off the <span style="font-weight: bold;">Web Sharing</span> item in the Sharing panel of System Preferences, since in a few minutes, you'll be running your own Apache instance.</li><li>Create the file <span style="font-family:courier new;">/Library/LaunchDaemons/org.apache.httpd.plist</span> with the contents shown below, using your favorite editor (e.g. emacs, nano, vi).</li><li>Use the launchd service to manage the Apache instance by invoking the command: <span style="font-family:courier new;">sudo launchctl load -w /Library/LaunchDaemons/org.apache.httpd.plist</span></li></ol><span style="font-family:courier new;"></span>And that will do it. You now have a managed Apache instance of your own creation, and it should be running on the configured port (e.g. 80). 
Now just make a request and check the logs to make sure it's really working as you expected.<br /><br />Here's the plist file mentioned in the steps above:<br /><blockquote><span style="font-family:courier new;">&lt;?xml version="1.0" encoding="UTF-8"?&gt;<br />&lt;!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"&gt;<br />&lt;plist version="1.0"&gt;<br /> &lt;dict&gt;<br /> &lt;key&gt;Label&lt;/key&gt;<br /> &lt;string&gt;org.apache.httpd&lt;/string&gt;<br /> &lt;key&gt;ProgramArguments&lt;/key&gt;<br /> &lt;array&gt;<br /> &lt;string&gt;/usr/local/apache2/bin/httpd&lt;/string&gt;<br /> &lt;string&gt;-k&lt;/string&gt;<br /> &lt;string&gt;start&lt;/string&gt;<br /> &lt;/array&gt;<br /> &lt;key&gt;RunAtLoad&lt;/key&gt;<br /> &lt;true/&gt;<br /> &lt;/dict&gt;<br />&lt;/plist&gt;</span></blockquote>If you ever want to turn off your version of Apache, you can invoke the <span style="font-family:courier new;">launchctl</span> command shown above, replacing "load" with "unload". In the meantime, you can manage the Apache instance using the <span style="font-family:courier new;">apachectl</span> command as usual (e.g. <span style="font-family:courier new;">/usr/local/apache2/bin/apachectl graceful</span>).<br /><span style="font-family:courier new;"></span><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-698099169123258801?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com1tag:blogger.com,1999:blog-7593087619590047967.post-10519378977571446172008-04-11T00:34:00.000-07:002008-04-11T00:41:19.103-07:00What's the best programming language?The other day I came across an esoteric programming language called <a href="http://en.wikipedia.org/wiki/Brainfuck">brainfuck</a>. It gives new meaning to the term "esoteric". 
If you can write something practical in that, you're pretty damn hard-core. You also have entirely too much time on your hands. This made me wonder, "What is the worst programming language?", and I began collecting my thoughts based on my personal experience. But rather than focus on the negative, what about turning the question around and asking what is the <span style="font-style: italic;">best</span> programming language? To start with, let me explain my recent experiences with a handful of programming languages.<br /><br />In the past year at work, I've been involved in a project that involves mostly <a href="http://www.ruby-lang.org/">Ruby</a> on <a href="http://www.rubyonrails.org/">Rails</a> development. Coming up to speed with Ruby, after many years of <a href="http://java.sun.com/">Java</a> programming, was an interesting experience. Ruby is certainly easy to pick up. Its syntax is quite elegant, more expressive than Java, and closures are an appealing alternative to inner classes. However, after reading through <a href="http://glyphobet.net/blog/essay/228">glyphobet's analysis</a> of Ruby, I've come to appreciate how little I really learned. From that essay, and Gilad Bracha's rant on <a href="http://gbracha.blogspot.com/2008/03/monkey-patching.html">monkey patching</a>, and this very <a href="http://groups.google.com/group/comp.lang.python/msg/28422d707512283">insightful analysis</a> by Alex Martelli, I'm strongly in favor of using Python over Ruby.<br /><br />Getting back to my recent experiences, just when I felt comfortable with Ruby, it was time to shift gears and write an Apache module. Given that my C skills are pretty rusty, I opted to use <a href="http://perl.apache.org/">mod_perl</a> instead, as I had at least used <a href="http://www.perl.org/">Perl</a> a few times in the last decade. This, as it turned out, was not nearly as easy as I had hoped. 
It wasn't long before I decided that Perl was a time-wasting abomination, having evolved in a rather poor manner over a period of many years. I'm by no means a language expert, but if I get confused by a language, and waste entirely too much time on the lousy syntax, then there is a problem.<br /><br />As a reprieve, I had the opportunity to pick up <a href="http://www.python.org/">Python</a>. This came about because we wanted to integrate with a third party tool written in Python. Since Ruby can't call Python code, I initially tried invoking the tool's shell script wrapper, capturing the output and parsing it. Not surprisingly, performance was dreadful, and it was unstable as well. To solve the problem, I broke down and started learning Python, as well as <a href="http://www.djangoproject.com/">Django</a>, in an effort to write a new web service to handle integration with this third party tool. In a matter of weeks, I had a functional web service, complete with unit tests, and its performance was fantastic compared to the Rails service.<br /><br />In addition to these scripting languages, I had the chance to write some Java code (again, another integration piece). Naturally that went very quickly since I have a decade of basically nothing but Java experience. But it isn't just about experience. The tool support for Java is simply phenomenal -- code completion, error-checking, and flawless refactoring are amazing time savers. A little experience and great tools make Java a very productive language.<br /><br />So what is the best programming language? To make the selection a little easier, let's consider the popular languages, those in the top 20 on the <a href="http://www.tiobe.com/index.php/content/paperinfo/tpci/">TIOBE</a> list. Having written code in 11 of them myself, I can honestly say that Java, in my opinion, is the major contender. 
Its syntax is much easier to learn than most others, it has incredibly good tool support, and the fact that you can run it just about anywhere (enterprise, workstation, mobile phone, sensors, etc) makes it very versatile. Granted, Ruby and Python are fun to learn and have some advantages, but they come at a significant cost that makes me appreciate the durability of Java.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-1051937897757144617?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com1tag:blogger.com,1999:blog-7593087619590047967.post-16890812082665891832008-04-10T23:21:00.000-07:002008-04-11T00:02:40.829-07:00Hadoop Summit 2008: My Take (Part IV)Continuing on with the user testimonials, Steve Schlosser (of Intel?) spoke on using Hadoop to create graphs of ground motion models, using interpolated data generated via two different systems. A technique he had learned was to massage the key values returned from the Mapper to trick Hadoop into naturally gathering tuples for the reduce stage. He reminded everyone that while MapReduce sounds simple, it's actually a number of stages: InputFormat, Map, ManipulateKey, Bucket, Combine, Shuffle, Sort, Reduce, and finally OutputFormat. Their cluster, by the way, consisted of 50 8-core blades with 8GB memory and 300GB disk. That's a pretty hefty set of machines, but I think they had a lot of calculations to run through to create their pretty pictures.<br /><br />Mike Haley from Autodesk got a turn to tell how he uses Hadoop in an effort to simplify the cataloging and searching of building materials. At least I think that's what it was about. He spent a lot of time talking about building materials and just a couple of minutes on Hadoop, so I kinda missed the point of it. 
Still, it was interesting; it's not often you get a chance to take a peek into a completely different industry.<br /><br />To provide a perspective on Yahoo's use of Hadoop, Christian Kunz spoke on the WebMap change-over. He apparently was not slated to speak for this presentation, as the <a href="http://developer.yahoo.com/hadoop/summit/">schedule</a> shows Arnab Bhattacharjee as the speaker. Consequently, the talk was brief and it was obvious Christian was not at all comfortable in front of a large audience. In short, Yahoo went from a set of in-house scripts and programs, developed over a number of years, to Hadoop and saw a lot of advantages and improvements. The talk was literally 10 minutes, so there wasn't much else to it.<br /><br />Next up was a guy from Google, who spoke briefly on the problem of hiring people who know anything about parallel computing, much less MapReduce. Well, yeah, duh. Most colleges aren't teaching distributed/parallel computing techniques, and except for Yahoo and Google, who uses MapReduce? He then introduced Jimmy Lin from the University of Maryland, who spoke on a new course being offered at the university in which students use Hadoop to solve interesting problems. At this point in the day, I was a bit tired and wasn't really getting much from these last presentations. Interesting, yes, but not remarkable in my opinion. But I liked the irony that Google has a hard time finding people who are ready to dive into M-R, and their solution is to use their competitor's open-source clone of their very own infrastructure. Come to think of it, the students aren't just learning M-R, they are learning Hadoop, so it's just as likely they would get snatched up by Yahoo! upon graduation.<br /><br />To wrap up the exciting day, a panel of folks from Yahoo, Powerset, and Mahout spoke on the future direction of Hadoop. 
For instance, the core developers want to eventually have Hadoop auto-configure itself, but still have options for power-users to tweak the settings (my thought was of the Java VM, which has dozens of settings for fine tuning its behavior, but out of the box, it automatically adjusts itself to suit the environment). Another wish was to support Kerberos for protecting the HDFS data; right now anyone can read anything in the system, which is not good for privacy and isolation. Another comment was that building the community is a challenge. Hopefully as committers become more senior, contributors can be mentored into committers and help foster new involvement from others.<br /><br />Well, that was it, the whole Hadoop Summit in a nutshell. There was a happy hour but I had to get home to my family, so I gave it a miss. For additional reading material, check out these: <a href="http://blog.blist.com/index.php/2008/03/26/hadoop-summit-best-in-show/" rel="bookmark" title="Permanent Link: Hadoop Summit - Best in Show">Hadoop Summit - Best in Show</a>, <a href="http://www.csdhead.cs.cmu.edu/blog/2008/03/26/at-the-hadoop-summit/" rel="bookmark" title="Permalink: At the Hadoop Summit">At the Hadoop Summit</a>, <a href="http://mikaelronstrom.blogspot.com/2008/03/visited-hadoop-conference.html">Visited Hadoop Conference</a>, <a href="http://parand.com/say/index.php/2008/03/25/hadoop-summit-notes/">Hadoop Summit Notes</a>.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-1689081208266589183?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-78632334174058241342008-04-09T01:39:00.000-07:002008-04-09T02:56:45.343-07:00Hadoop Summit 2008: My Take (Part III)After an exciting morning, we all had the chance to partake in a free lunch. 
I had the (mis)fortune of sitting next to a gentleman from CMU whose job was to travel the world and meet researchers to learn the latest in cool inventions. Probably there was more to it than that, but that was the part I understood. He and two other gentlemen my age that he had met earlier in the day pontificated on the future directions of technology and its effects on our lives, while I struggled to keep up and not look dumb. It's hard being a heads-down sort of techie and finding yourself thrust into a conversation with people whose job is to think big.<br /><br />After that minor ordeal, the technical presentations started up again with Michael Stack explaining the virtues of <a href="http://hadoop.apache.org/hbase/">HBase</a>. One of his early remarks was that HBase doesn't have any of that sissy RDBMS stuff. Well that's good, we wouldn't want to rile the DB folks, who already have a hard time <a href="http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html">grasping the obvious</a> (to be fair, they have a follow-up article that is at least a little bit better than their original post). One interesting revelation was that HBase ran significantly slower than the numbers published in the BigTable paper, as little as 20%. There was speculation that HDFS was to blame. That Hadoop is still a little immature is not a surprise, and clearly performance is not yet a major concern.<br /><br />Next up was Bryan Duxbury of Rapleaf, who spoke of their use of HBase and Hadoop. Bryan is the one responsible for the Ruby support in HBase, by the way. He primarily discussed the performance of HBase. They are using a 64-node cluster, with a total of 2TB disk space and 64GB memory. 
After some experimentation, Bryan found that the compression built into HBase was a little slow, and that compressing the data in the client helped (less data going over the network, less processing for HBase when data is stored as-is).<br /><br />Speaking of databases, the next talk was by two developers at Facebook who are working on Hive. It sounded to me like another database implemented on top of HDFS, with yet another query language. The good news was it resembled SQL and supports streaming (to programs written in languages other than Java). As far as I can tell, the project is not open source, so this is of little interest to me. Sure, I like hearing about these projects, but as with the Microsoft presentation, it's all just "research" until you share your work with others to enable active discussion and collaboration.<br /><br />Jinesh Varia from Amazon then presented on their use of Hadoop in relation to Amazon Web Services. He primarily spoke of the architecture of EC2 and SQS. A couple of interesting points were that SQS uses message queues to deal with machine failure, and S3 is a bottleneck when used from EC2. No word on how they are planning to address that. I got the impression that Amazon uses their own infrastructure extensively, with basically everything running on EC2. 
Definitely an interesting talk by an energetic speaker.<br /><br />I'll finish up with this mini-series on the Hadoop Summit in my next post.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-7863233417405824134?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-70415956328279674712008-03-29T01:03:00.000-07:002008-03-29T02:07:03.230-07:00Hadoop Summit 2008: My Take (Part II)After the initial introductory speakers, the conference quickly became very technical with a preso by Chris Olston of Yahoo! on <a href="http://incubator.apache.org/pig/">Pig</a>, the system for performing data analysis on large quantities of data distributed over a Hadoop cluster. It's basically the equivalent of <a href="http://labs.google.com/papers/sawzall.html">Sawzall</a>, if you're at all familiar with Google's technologies. One of the instigators for a system such as Pig is that many times developers are writing the same sorts of joins, sorts, and merges over and over again. Chris says there's even a mailing list at Yahoo for sharing M-R snippets for such tasks. Pig makes it very easy to describe what data you want, and what you want to do with it. It compiles your Pig Latin script into a set of M-R applications and runs them on the cluster. A point that Chris stressed is that Pig Latin is not a query language, but rather a data flow description. Because it has an imperative style, the order of actions is better defined than in, say, SQL.<br /><br />I found Chris' talk so engrossing that I wanted to join Yahoo! just to work on Pig. He's a very good speaker and Pig strikes me as a very practical, and yet easily overlooked, part of the Hadoop system. 
For some reason that I have yet to understand, I fancy working on systems like this, the underlying infrastructure that most people (as in users, not developers) never see or give any thought to. Sure, working on GMail would be cool and all, but I'd rather develop something like Pig.<br /><br />Following that was a talk by Kevin Beyer of IBM, on JAQL, the JavaScript version of Pig. Well, that's not entirely fair, but I did get the sense they were very similar. Same sorts of basic operations, the difference being the syntax for JAQL is basically a cross between JavaScript and JSON. There may have been some advantages one way or the other, but they were subtle. In any case, both JAQL and Pig work closely with Hadoop and do essentially the same things.<br /><br />Next up was possibly the most "interesting" talk, by Michael Isard from Microsoft. Yeah, that Microsoft, the one that tried (and is still trying) to buy Yahoo! one way or another. He described his research project <a href="http://research.microsoft.com/research/sv/dryad/">Dryad</a> LINQ, or at least I think that's what he works on. His description of Dryad was that of a system that analyzes a graph and performs a set of tasks over a distributed system. The graph describes the tasks, the data flow, and dependencies. It's a very different approach than M-R, more general purpose. Naturally Dryad made a few trade-offs to improve overall performance. For instance, he believes in general Dryad performs very well, but its failure handling is rather inefficient. Hmm, interesting approach. I believe Google made the realization that in a large enough cluster, you are <span style="font-style: italic;">always</span> going to have failures, so you had better deal with them gracefully and efficiently. And naturally these points came up during the Q&amp;A section, to which he responded that he needs to come up with some comparison numbers. When asked if he'd looked at Hadoop in terms of performance, he flat out said "no". 
Not a surprise there; frankly I'd be surprised if he even ran Hadoop once, let alone read any of the Google papers.<br /><br />It goes without saying (but I'm saying it anyway) that all of this is implemented in, and on top of, Microsoft technologies (e.g. .NET, Windows). And you can surely bet that because it's still in the research group, it will be a while before it sees the light of day, and it will most certainly not be open source in any reasonable way. One really funny part was some shill in the front row said "Well, I think judging by the reaction in the room, you're kicking everybody's butt, congratulations." Um, yeah, I don't think it was at all obvious that Dryad was better than Hadoop. They made certain choices and ended up with a very different system, with very different performance characteristics and features. It seemed to me that each node in Dryad was some arbitrary program, so they forfeited all of the advantages that M-R provides. Also, there's no distributed file system (his actual response to a question from the audience). Presumably everything is stored in a SQL Server instance. Like I said, I really don't see how that's better than Hadoop.<br /><br />For a pleasant change of pace, the next talk was about <a href="http://www.x-trace.net/">X-Trace</a>, given by Andy Konwinski from UC Berkeley. He and Matei Zaharia (presumably) created hooks in Hadoop to enable monitoring events in the system as they occur within the cluster. Andy had a very appealing self-deprecating style, and made a few jokes about pretty graphs and dumb programmer mistakes, which warmed up the room after the rather dry and strange talk about Dryad. For instance, he and his colleagues used X-Trace to identify a silly configuration mistake they had made in Hadoop. They had set up 30 map workers but left the number of reducers at its default of 1, which caused their sample job to run for hours longer than it should have. 
This became blindingly obvious when they rendered a few graphs to show what was going on in the cluster. Clearly if you're running into problems with Hadoop, X-Trace would be an excellent debugging tool. Andy gave another example in which a graph made very clear that one machine in particular was having disk problems, of a sort that impacted performance without necessarily taking the machine out of working order (and thus out of the cluster).<br /><br />For the last talk before lunch, Ben Reed presented on the <a href="http://zookeeper.sourceforge.net/">ZooKeeper</a> project, which is both a distributed lock manager of sorts, and a distributed file system for very small files (everything is kept in main memory). Its purpose is to facilitate configuration of the nodes in a cluster, enabling them to elect leaders and define membership, as well as serving as a name server. It's actually very similar to Google's <a href="http://labs.google.com/papers/chubby.html">Chubby</a> lock service, just written in Java. Everything is stored in memory for fast response times, with a disk-based log, I assume for reconstructing the data if the node is restarted. The ZooKeeper team found that a system consisting of about three to five nodes works best. Fewer and reliability goes down; more and performance becomes an issue as the leader tries to keep all of the servers up-to-date.<br /><br />Then there was lunch, which I'll continue with in the next installment.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-7041595632827967471?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-34325291078857884932008-03-27T01:28:00.000-07:002008-03-29T02:04:12.304-07:00Hadoop Summit 2008: My Take (Part I)On Tuesday I attended the first Hadoop Summit held in the TechMart building in Sunnyvale. 
It was a terrific experience, much better than I had expected. It doesn't help that my expectations have been tempered by such events as the first NetBeans Day, wherein many of the people that showed up were only there to see James Gosling speak for the first few minutes, after which no more than ~50 people came through (and half of them were Sun employees). It probably didn't help that they were competing with JavaOne as well, which they have corrected since then.<br /><br />But getting back to the Hadoop Summit. Everything worked very well. Parking was easy, registration was a breeze, and the staff was friendly and accommodating. I asked politely about whether water would be supplied and they quickly chased down the TechMart folks to locate the bottles -- they were already in the auditorium waiting for us. Strike one for dumb attendee. The breakfast spread was as good as any conference I'd been to.<br /><br />Upon entering the auditorium, it was immediately evident this was an event for computer nerds. There were multiple WiFi routers sitting on a table on one side of the room, and power strips were on the floor beneath every row of chairs. Wow. Too bad the WiFi connections were flaky -- I got an IP address for only a few minutes, then lost it. Someone came up and asked if the Internet connections were working for me, which gave me the sense I was not the only one suffering from bad connectivity. No matter, I was only going to take notes anyway.<br /><br />By the time Anjay was ready to open the presentations at 8:55, the room was nearly full. On the wall was a sign stating that the maximum occupancy was 299. I'd say it was a safe bet that about 300 people were crammed into the room, as there were folks standing along the walls.<br /><br />The first to speak was Doug Cutting, the guy who created Hadoop as a subproject of Lucene. His opening line was "Are you sure you are all in the right place, there's an awful lot of people here." 
He gave the history of Nutch and how Hadoop got started. Getting the history of it all was fascinating; I always like hearing how projects get started. Another fascinating aspect of Hadoop is that it's been barely two years from an almost nothing subproject to a large project with a conference consisting of hundreds of attendees. I guess that means distributed computing is more appealing than Java IDEs. That or Yahoo does a better job organizing these things than Sun. In all fairness, subsequent NetBeans Days have grown exponentially, and are much, much, much better than the first one.<br /><br />Okay, back to the Hadoop Summit. Next up was Eric14 (some bizarre nickname because his high school friends couldn't pronounce Baldeschwieler), who explained how Hadoop is being used and developed within Yahoo. It's growing very fast and they are struggling to hire people who have any experience with distributed computing, let alone Hadoop in particular. He says they have tens of thousands of nodes (i.e. machines running in clusters in case you are not familiar with this overloaded term) and each machine typically has about 8 cores. Yeah, that's a lot of horsepower, but Yahoo must feel they can make good use of it with Hadoop, which supports many MapReduce jobs running in parallel. I believe Google stated they use systems with 4 cores, each with two IDE disk drives.<br /><br />Eric was curious what the attendees were doing in terms of their Hadoop usage, and asked for an informal poll. There were many people, a little less than half, that had at least 20-node clusters, whereas only a few attendees were running clusters with more than 100 nodes. I was surprised; I didn't expect that many folks to be running such sizable clusters.<br /><br />There's lots more to say, so breaking this up into several parts makes sense to me. 
Look for at least three more entries, roughly along the lines of presentations given, with my impression on each of them.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-3432529107885788493?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-26300926918584382892008-03-18T22:46:00.000-07:002008-03-19T00:31:28.858-07:00Geoworks: the inspirationFrom time to time I have thought about writing of my experiences at my first employer, Geoworks. It was a fun place to work, there were a lot of really <a href="http://www.osnews.com/story/1864">smart developers</a>, and I met my wife there. The company is gone now, although there's a <a href="http://www.geoworks.com/">website</a> holding on to the domain. Wondering if anyone else was writing about their Geoworks experiences, I came across a <a href="http://www.thebishop.net/geodog/archives/2003/01/08/geoworks_sic_transit_gloria_mundi.html">couple</a> <a href="http://steve-yegge.blogspot.com/2007/12/codes-worst-enemy.html">of</a> <a href="http://steve-yegge.blogspot.com/2006/04/psh-whatever.html">entries</a>. The last one is pretty amazing, a beautiful example of how I wish <span style="font-style: italic;">I</span> could write. I never met Dave, but I did get to work closely with Stevey (I just called him Steve) for two weeks in Bartlett, Tennessee in the summer of 1997. Like everyone else at Geoworks, when Dave died, I sent my condolences to Steve and Mike, who was also a friend of mine. At the time I wanted to say something meaningful, hoping to help them feel even just a little better. I knew it was impossible given I didn't know them very well, but it was what I wished for. Even now, more than 10 years later, I can't think of what to say. At this point, I'm just glad Steve didn't "off" himself. 
He is an inspiration in more ways than one. In addition to being a <a href="http://steve-yegge.blogspot.com/2007/08/how-to-make-funny-talk-title-without.html">thoughtful and entertaining writer</a>, he's also a <a href="http://blip.tv/file/319044/">talented speaker</a>.<br /><br />Reading Steve's blog has forced me to evaluate how I spend my time on the computer. He has a knack for putting what I've often taken for granted into an enlightening perspective. What's interesting to me is that I didn't discover his blog until <span style="font-style: italic;">after</span> I ventured outside of the Java realm. Working with nothing but Java for a decade, and not just <span style="font-style: italic;">Java</span> but Java <span style="font-style: italic;">tools</span>, left me so insulated from everything else going on in the industry that I was completely unaware of <a href="http://rubyonrails.org/">Rails</a>, <a href="http://del.icio.us/">del.icio.us</a>, or the many clever people blogging on every conceivable topic, including Steve.<br /><br />Cheers to you, Steve, for surviving, thriving, and inspiring others to be more than they started out being. If you ever read this, please forgive my lamentable writing, but know that you are the single most remarkable person I ever met at Geoworks, and we only barely knew each other. Thanks.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-2630092691858438289?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com2tag:blogger.com,1999:blog-7593087619590047967.post-42802881889460914932008-03-13T02:11:00.000-07:002008-03-13T23:16:47.445-07:00Converting Subversion to Mercurial updateOur development team at work is moving from a single Subversion repository to multiple Mercurial repositories. Being the SCM "expert" in the group, I was nominated to carry out the migration. 
My first instinct was to use what I know, which up until now had been <a href="http://cheeseshop.python.org/pypi/hgsvn">hgsvn</a>, which works fairly well, but lacked a few key features we needed. Fortunately, our team lead, <a href="http://blog.nicksieger.com/">Nick Sieger</a>, pointed out the <a href="http://www.selenic.com/mercurial/wiki/index.cgi/ConvertExtension">hg convert</a> extension, which appeared to be perfect.<br /><br />Armed with this enticing new option, which didn't exist the last time I had done a Subversion to Mercurial migration, I set about updating my Ultra 20 to the latest Solaris Express DE (build 79b of OpenSolaris). I was pleasantly surprised by the much-improved installer, and the fact that all of the tools and languages our team uses are pre-installed, and most are the latest versions. But I digress...<br /><br />To perform the migration, I had to implement a couple of work-arounds for bugs in that build of OpenSolaris. First was the fact that the Subversion language bindings are not in the default load path, and secondly, the neon library isn't <a href="http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6612347">loaded properly</a> for Python. I ended up with a command like this, to convert a single application within our source repository:<br /><blockquote><tt>LD_LIBRARY_PATH=/lib:/usr/lib:/usr/apache2/2.2/lib LD_PRELOAD=libneon.so hg convert -A ~/author_map.txt --filemap scm_map.txt file:///export/home/nfiedler/svn_mirror/trunk/scm</tt><br /></blockquote>With that, I was able to rename authors to match what we're using in our new infrastructure, and rename some of the poorly named directories in our source tree. The only missing piece is the svn:externals we were using to manage shared dependencies, but we solved that by using symbolic links. 
By the way, we won't shed a tear for losing the svn:externals feature, the most poorly implemented feature in all of Subversion.<br /><br />The best part of the migration process is that it's iterative, meaning I can run it again and again as each of the developers is ready to check in their pending work and switch over to Mercurial. I made the switch first to test the waters, and it's working very well. So far as I can tell, Mercurial is a lot faster than Subversion ever was for us, even when we were using an internal server. Granted, we were always using http over Apache, so perhaps using the svn protocol would have helped. We'll be implementing that in our system soon enough, but that's for another post.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-4280288188946091493?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-7504426655648114722008-01-22T00:55:00.000-08:002008-01-23T02:02:29.064-08:00How I fit my life onlineFor several years I managed a web server for the domain bluemarsh.com, which I had purchased in 1999. This was primarily for the purpose of publishing my Java debugger and its source code. This grew to encompass a source code repository (<a href="http://subversion.tigris.org/">Subversion</a>), an issue tracker (<a href="http://www.bugzilla.org/">Bugzilla</a>), a wiki (<a href="http://www.mediawiki.org/">MediaWiki</a>), mailing lists (<a href="http://www.list.org/">Mailman</a>), and all of the infrastructure needed to manage that, including <a href="http://www.postfix.org/">Postfix</a>, <a href="http://dovecot.org/">Dovecot</a>, and <a href="http://www.openldap.org/">OpenLDAP</a>. 
There was also a photo gallery (<a href="http://gallery.menalto.com/">Gallery</a>) for sharing photos with friends and family, and a blog (<a href="http://typosphere.org/">Typo</a>), which eventually migrated to the site that you're looking at now. This was all well and good, and gave me some of the experience I needed when I was assigned to a new team at work, which involved my expertise in working with these open-source components. Then, things changed.<br /><br />The new project that I had been assigned to forced me to learn a new programming language, and a web application framework I had barely even heard of prior to that time. I'm not actually sure why, but this resulted in me losing interest in my Java debugger, enough that I decided to move it whole hog to <a href="http://sourceforge.net/">SourceForge</a>. Prior to that, only the downloads were hosted there, while everything else was on BlueMarsh. With the debugger moving away from home, it left an empty nest behind. It was then that I realized how much of my life was managed by the components running on the server. Not only was I using Bugzilla for tracking bugs in my debugger (there's a joke in there somewhere, but I'm not that nerdy), but I had been using it to track tasks to be done around the house (okay, I <span style="font-style: italic;">am</span> that nerdy). So what was wrong with that?<br /><br />Well, nothing per se, but can you guess how long it takes to upgrade such an installation? Bear in mind I was using Fedora Linux, and mostly using the pre-packaged software that it includes. Basically it takes about an hour for the bits to be laid on disk. But that's not the time consuming part, it's the configuration of all of this stuff that takes hours and hours. Especially things like Postfix, which has a myriad of options for controlling spam, takes me at least an hour to set up. Sure, that doesn't sound too bad, until you consider I don't usually have time like that. 
I've got a 2 year old, you know, and this is <span style="font-style: italic;">not</span> my day job. It was then that I decided to find ways to make everything on this server disappear.<br /><br />First there were the things I simply didn't need; they were just infrastructure. This included rsync, the mail servers, and the directory service. Once I could give up the server, these services simply ceased to exist. It was the other stuff that was more challenging to push online. For instance, I had a collection of cron job scripts that would send email to me at specific times. These were reminders for things like "take out the trash" and "backup my documents" -- all the sorts of things I'd forget otherwise. Lo and behold, there are at least two very good solutions to this: Yahoo! Calendar and Google Calendar. I went with the latter because it has a slick and intuitive interface, although the former is nice, too. This process of discovery continued for weeks. I found I could do everything on one web site or another. Oh, the joy I felt was immense. I could finally free myself of the burden of managing this machine. And it wasn't just one machine, I had adopted a second computer to be the "staging" machine. After a really nasty update one day, it became clear that I needed a better process for managing upgrades. So now I've got not just one, but two computers, for which I have to find new homes.<br /><br />Today I had a peculiar feeling: I had finally turned off the machines and I realized that I could no longer read the <a href="http://logwatch.org/">logwatch</a> emails that I had grown accustomed to receiving every day. What was I going to <span style="font-style: italic;">do</span> with myself if I couldn't look forward to managing my server? Then I remembered that there's plenty to do, and I've just freed up a chunk of time for me to do them. 
Which includes updating this blog more often.<br /><br />In case you're curious, I'm using <a href="http://mail.yahoo.com/">Yahoo!</a> for mail, <a href="http://www.google.com/calendar/">Google</a> for reminders, <a href="http://www.passpack.com/">PassPack</a> for password management, <a href="http://docs.google.com/">Google Docs</a> for keeping documents and notes, <a href="http://www.rememberthemilk.com/">RememberTheMilk</a> for tracking tasks, <a href="http://www.blogger.com/">Blogger</a> for blogging, <a href="http://tumblr.com/">Tumblr</a> for my tumble log, <a href="http://sourceforge.net/">SourceForge</a> for my coding projects, <a href="http://picasaweb.google.com/">Picasa</a> for sharing photos, and <a href="http://www.bingodisk.com/">BingoDisk</a> for online backups. Oh, and my favorite web site of all, <a href="http://del.icio.us/">del.icio.us</a>, for managing my bookmarks and helping me find fun reading material.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-750442665564811472?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-52925233253980618892007-11-23T00:42:00.000-08:002007-11-23T01:08:46.971-08:00Foodie in the makingIf you have seen my <a href="http://cafenate.tumblr.com/">tumblelog</a> then you will know that I have been learning to bake bread at home. I was first inspired by this <a href="http://www.thesimpledollar.com/2007/11/04/homemade-bread-cheap-delicious-healthy-and-easier-than-you-think/">article</a> when it came up on <a href="http://del.icio.us/">del.icio.us</a> a few weeks ago. Just this week another article came up on del.icio.us, this time it was the <a href="http://smittenkitchen.com/2006/11/one-for-the-sling-files">no-knead bread</a> recipe. Naturally, I was intrigued. 
Not because the recipe is easier to do, but because the results were so much better than what you typically get at home. Well, after many hours of waiting, the results are indeed spectacular. The bread is delicious and moist, and the crust is crunchy but not tough. Truly remarkable, considering how little effort goes into the process. Now the only problem is I need to find a new use for the mixer we bought two weeks ago.<br /><br />In related news, I baked these <a href="http://wednesdaychef.typepad.com/the_wednesday_chef/2007/01/thomas_kellers_.html">chocolate bouchons</a>. I used the recipe from the children's cookbook, What's Cooking?, but the one on the web is exactly the same. To say that these fancy little brownies are rich is like saying the Sun is warm. I took my first batch into the office for my friends to try. Shannon, our resident foodie, liked them so much she ate two. Actually, she inhaled them. While the four of us were talking, they just disappeared. I'm assuming she didn't just stuff them into her backpack. But honestly, they are quite easy to eat, like potato chips -- you don't want to stop at just one. And of course, Baby <span style="font-style: italic;">really</span> likes them.<br /><br />This past week I ran out of the Doge Nero beans from Caffe del Doge, so I had to buy a bag of Peet's Espresso Forte. I used to think Peet's coffee was quite good, until I had the Doge Nero. The difference is astounding. The Espresso Forte is thin and just a little too bitter compared to the Doge Nero, which is rich and full-bodied. And the Doge Nero produces a nice crema, although it is short-lived. Now if they would just get more beans then I could buy another bag, but they aren't expecting the next shipment until next week. Oh, the suffering I must endure.<br /><br />All of this made me realize something. I'm turning into a foodie. Even Shannon labeled me a "budding foodie" earlier this week. 
What that means, I'm not entirely sure, but I think it means we'll be spending more money on kitchen accessories, and spending more time using them. My wife says our interests are evolving. Curious, I hadn't really thought of it that way, but she's right.<br /><br />Well, I'm off to make another loaf of the no-knead bread. Most of it disappeared during the Thanksgiving dinner tonight, and I can't face another morning without this bread. If you like baking, and you have not yet tried this bread, I highly recommend doing so immediately. It's worth far more than the trivial effort you will put into it.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-5292523325398061889?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-62132262215019745382007-10-07T17:14:00.000-07:002007-11-11T01:44:14.981-08:00What Real Friends Are<div><p>Thirty-four years is the time it took me to realize something that probably most others have known considerably earlier in their lives. I’m stupid, I’ll be the first to admit that. All signs indicate my nearly-two-year old daughter will be smarter than me by age 10—she can already say most letters of the alphabet, with just a little prompting. But my stupidity is not the subject of this entry. What I realized this past week is something I had not really ever thought about before. Something, that as an avid computer programmer, was not high on my list of priorities. Until now.</p> <p>Real friends. These are not the people that you greet casually at work in the morning, or that you occasionally talk to at the local watering hole. No, real friends are those people in your life that you remember forever. They’re the ones for which you actually commit brain cells to remembering their stories, their tribulations, and their triumphs. 
I have two such friends that fall into this classification, aside from my wife. To help me understand what makes these friendships special, I’ve put some thought into just what it is that makes them important to me.</p> <p>Loyalty. This is an important, though not necessarily the most important, quality of real friends. Loyalty is that special quality friends exhibit, that even when you’ve said something stupid, they still talk to you. Even if it means you just unintentionally insulted them. I especially appreciate this quality because I frequently make mistakes like this. It’s a particular problem for me because I’m white, male, and straight, so I often have the typical thought processes that go along with a white, straight male. What’s more, I grew up in rural Pennsylvania, nowhere near the massive melting pot of the big city. I rode the bus to school with dozens of kids that were just like me, and as a result I have the cultural diversity of a sea cucumber. That my friends are willing to put up with my occasionally idiotic statements is, in a word, precious.</p> <p>Discretion. This has more to do with my level of ability in this regard than that of my friends, though they retain this quality as well. Personally I don’t have anything too super-secret to hide from others, but rather I have a certain level of discretion when it comes to keeping the secrets of others. Both of my friends have shared things with me that I will never tell another person. Fortunately for me this is getting a little easier since all three of us are close friends, so we’re sharing more and more with each other all the time. This is good, because I’d rather use my brain cells for remembering what’s important than for what I need to keep private.</p> <p>Respect. Too often I am not respectful of others. I suppose it is a function of my cynicism and the jaded lens in front of my eyes. In general, I hate people. 
They are annoying, self-centered, and quite often stupid to the point of making me wonder how the human race got this far. I don’t mean my friends and family, but rather the nameless masses that I pass by each day. I can’t speak for others, but I am willing to bet this is not an uncommon trait among people living in a megalopolis like the Bay Area. What my friends have made me realize is that I want to be more loving and giving of myself. I want to make myself available to my friends and family in ways I had not considered before. I used to think spending more time in front of the computer screen was the best use of my resources, but now I realize the less time I spend sitting in this chair, the better.</p> <p>There are other qualities that real friends exemplify, but there’s something else I want to say first. Real friends bridge divides.</p> <p>Real friends bridge the racial divide. Does it matter that one of my friends is <a href="http://en.wikipedia.org/wiki/Hapa">hapa</a>? Not at all. Of course, why would it, I married a Chinese and have a beautiful hapa child. Sitting here at my desk I look out the front window and see that most of the kids walking by have formed groups along racial boundaries. In five years, I’m convinced those same friends will have gone their separate ways, never to see each other again. My friends and I are close because we like one another, not because we have suitable ethnic backgrounds. And therein lies one of the key elements of what I consider true friendship, blindness to all things superficial.</p> <p>Real friends bridge the gender divide. One of my friends happens to be a girl, and she happens to be attractive. While this makes sorting out my feelings a little more difficult, it too has brought me a new understanding that I otherwise would have lacked. Significant Others cannot do this to the same extent, as each of you is often trying to make the other happy by whatever reasonable means are necessary. 
Friends, however, do not have to make the same compromises. I appreciate the lessons I’ve learned from my girl friend, and I hope she’ll continue to help me have a better understanding of the fairer sex.</p> <p>Real friends bridge the orientation divide. Fifteen years ago I would have choked if you told me that one of my best friends was gay. Fifteen years later, one of my best friends is gay. How, or even when, that change came about I’ll never quite understand, but I have grown more in the last three years than in the last 15. This friend has been especially patient with me, as you can probably imagine, and for that I am extremely thankful.</p> <p>Jim. Shannon. There are no words that I know of that can express the impact you have had on my life. Thank you for putting up with me. Thank you for showing me what real friends are, and what’s important in life. For that, I will never forget you, no matter where life takes us in the years to come. For now, let’s keep having lunch at the usual time and place. 
Unless the food offerings are especially dismal, in which case we should go off-campus.</p></div><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-6213226221501974538?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-72886952821712605732007-09-07T01:08:00.000-07:002007-10-24T22:36:35.655-07:00Follow-up to Perl not for dummies<div><p>So if I were to write that bit of Perl code I posted about 3 days ago in Ruby (using the additional functionality provided by Rails), it would look like this:<br /></p> <pre><code>hash = Hash.from_xml(res.content)<br />service_id = hash['service']['id']<br />hash['service']['activities'].each do |item|<br /> if item['name'] == activity<br /> activity_id = item['id']<br /> break<br /> end<br />end</code><br /></pre> <p>I’m fairly certain that code is an accurate translation of the Perl snippet, but without reproducing the same situation, I can’t guarantee it. 
The point is, it is representative of Ruby code, and its significantly more readable syntax.</p></div><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-7288695282171260573?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-60496470471328927022007-09-04T01:03:00.000-07:002007-10-24T22:37:01.363-07:00Perl is not for dummies, I'm living proof<div><p>Let’s start with the following (ugly) Perl code snippet.<br /></p> <pre><code>my $xs = new XML::Simple(ForceArray => 1, KeyAttr => [ ]);<br />my $xml = $xs->XMLin($res->content);<br />$service_id = $xml->{service}->[0]->{id}->[0]->{content};<br />foreach my $item (@{$xml->{service}->[0]->{activities}->[0]->{activity}}) {<br />if ($item->{name}->[0] eq $activity) {<br /> $activity_id = $item->{id}->[0]->{content};<br /> last;<br />}<br />}</code><br /></pre> <p>Where to begin? Actually, I want to start with the fact that trying to understand what’s happened to Perl in the years between the time I first learned and enjoyed using it, and the present time, is like trying to understand a crazy person. At first they sort of make sense, and even seem familiar, but then they say something that doesn’t make sense. And you wonder, “What the hell was that?”, and because you have to know what was intended, you struggle, turning your brain inside-out to understand.</p> <p>This all started last week when I volunteered myself to write a Perl module for a set of access control, authentication, and authorization handlers for Apache (collectively known as <span class="caps">AAA</span>). At first this seemed to be easy enough, I had used Perl for several years off-and-on, so I felt comfortable with it. 
That, as I have come to realize, was long enough ago that the memory has been overwritten by Java and Ruby.</p> <p>Learning about <code>mod_perl</code> and how to write Perl handlers for Apache was the easy part, surprisingly enough. The painful bits were trying to understand why Perl is so stupid about <code>use</code> and methods that can’t be found just because I didn’t <code>use</code> some damn class. Ruby doesn’t have this problem—if the object has a method with a given name, you can invoke it. There’s no need to include, import, require, or “use” anything. The method is there whether you know it or not. Perl, it seems, has to be told that it’s okay to invoke the method. What’s up with that?</p> <p>The other major pain point was the <code>XML::Simple</code> library. It seems to return these objects that kinda look like hashes and lists, but they’re really something else. And since this is Perl, and not Ruby, I have to access them differently than I would if they had been hashes and arrays. Just iterating over an array of elements took me over an hour to figure out (the fourth line in the snippet above). Naturally, the answer was buried somewhere in the <span class="caps">FAQ</span>, but of course I had to bang my head on the problem for a while before I turned to the <span class="caps">FAQ</span>. Why doesn’t the documentation have any decent examples? Why can’t it just work like it’s supposed to? If this were Ruby, I’d have been done last Friday.</p> <p>I realize all of this is my fault. Perl is wonderful, it’s been around for ages, billions of people knock themselves out using it every day, so this is clearly a problem with me. I’m a dummy, I’ll be the first to admit that. My conclusion is, Perl is not for dummies.</p> <p>Of course, that implies that Ruby <em>is</em> for dummies. Uh, yeah… that’s not what I meant. Ruby fits me and my way of thinking much better than Perl. Now that I think about it, Ruby actually suits me better than Java, too. 
Now if only <code>mod_ruby</code> were as up-to-date as <code>mod_perl</code>, then I could rewrite everything I spent the last three days writing in the span of 15 minutes.</p></div><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-6049647047132892702?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-52387294579510362012007-08-11T23:20:00.000-07:002007-10-21T08:21:21.051-07:00If my "fast" took on physical form...<div><p>A couple of months ago a small package arrived at the house. It was from some random person in Ohio, and otherwise had no indication what it was about. Naturally, being paranoid, I took the package to the back yard to open it, just in case it might explode. I’d rather blow up just myself than take out my family, too. Alas, it wasn’t nearly that dangerous. Instead, it appeared to be an odd little plastic toy. After opening the box and looking through the small booklet, I learned that it was my “fast”, as conceived by a fictitious scientist at Volkswagen. There’s a <span class="caps">DVD</span> in the VW showrooms that has a mockumentary about understanding people’s “fast”. The video is silly, but the booklet that accompanied the gremlin-looking toy is funny. At the back are a series of questions and answers regarding your “fast”. The best one goes like this:</p> <blockquote> <p>Q: My Fast is too fast.</p> </blockquote> <blockquote> <p>A: You are too old.</p> </blockquote> <p>The booklet also provides suggestions for positions within the car where your fast can sit. The problem, I would imagine, is that unless you affix it to a surface using an adhesive, it’s going to act as a projectile in any rapid acceleration/deceleration scenario. 
It’s heavier than you’d expect and hard as heck, so I’ve left my fast on a shelf at home.</p> <p>Anyway, the fast looks like a black ball, with hind legs like a dog, and stubby little paws, and his ears are folded back as if he’s moving at high speed. His eyes are beady, and his mouth is making a devilish sort of grin. The best part is, he comes with multiple tails; you choose which one he gets to wear by plugging it into his butt. Mine’s got the pointy tail as that most resembles what I think a fast’s tail should look like.</p> <p>In case you were wondering, this little gimmick is related to the <span class="caps">VW GTI</span>. I imagine everyone who buys a new <span class="caps">GTI</span> gets one of these little guys. I wonder how long it will be before other car manufacturers do something similar. Imagine what Mitsubishi might send you for buying an Evo. Maybe a small booklet on how to talk your way out of a speeding ticket. I could have used one of those.</p></div><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-5238729457951036201?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com1tag:blogger.com,1999:blog-7593087619590047967.post-11771727484577870662007-08-07T00:06:00.000-07:002007-10-21T08:22:35.561-07:00One month with the iPhone<div><p>It’s been over a month since I got my iPhone, and not a day has gone by where I didn’t appreciate it. Today was a great example of that. On my way home from work I encountered stopped traffic on the highway. It was noticeable even before I reached the entrance ramp, so instead I turned around and took an alternate route. Thanks to the iPhone, I could pull up Google Maps and get a view of the area, with the traffic status showing as color coded lines over the major roadways. 
It clearly indicated that the traffic was barely moving for about two miles up the highway, then after that it was all green. So, I simply took surface streets for that distance and merged onto the highway. I almost certainly saved myself at least 15 minutes.</p> <p>It’s one thing for the iPhone to be able to do that sort of thing, as I’m sure it’s not the only smart phone that can. But, the iPhone makes it so mind-numbingly easy to do the things that other phones make ridiculously annoying. Even my daughter, who’s not quite two years old, has figured out how to flick through the photo album, and as she demonstrated today, scroll around the map in Google Maps. Granted, she doesn’t know it’s a map, but she’s mastered the basic interface for the iPhone. I predict that by age 4 she’ll know how to use every feature of the device, including setting it up to connect to an <span class="caps">IMAP</span> server to retrieve mail.</p> <p>It’s been one month and I still love this phone. I surprised myself, in fact. I had assumed that by now I’d have found some major issue with it that would have me regretting its purchase. On the contrary, I’d gladly pay $500 for it again, without even thinking about it.</p></div><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-1177172748457787066?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-89904057670752016902007-07-05T17:29:00.000-07:002007-10-21T08:23:38.295-07:00Strange street names<div><p>Near my parents’ house is a street named “2nd Place”. The only name worse than that would be “Last Place”, which is not a very likely street name, of course, but you could be led to wonder when you see a name like Second Place. In Pennsylvania there is/was a street called “Gunpowder Lane” (or Street, or whatever). My Mom knew someone that lived there. 
They said the name used to be something reasonable, but some committee decided to change it. Needless to say, some of the homeowners were not too happy about that.</p> <p>Just past Newberg, on highway 99W, there is a road named “Veritas Lane”. Imagine, if you live on that road you must always tell the truth.</p> <p>I’m sure there are many more names, just as strange as these, but that’s all I can remember.</p> <p>By the way, I’m writing this from my iPhone, which is working remarkably well. I found that if the phone gets too hot, it freezes until its temperature lowers. It would be nice if it would warn you, though.</p> <p><strong>Update:</strong> the reason for my remarks regarding street names is that, as a developer, I have to think about names all the time. Every chunk of code that a developer writes has to be given some sort of name, and you don’t want to be the loser that names your functions “myfunction”, “function1”, or “doSomething”. So, I see these street signs and immediately pick apart the name, look at alternate meanings, and try to guess what the “developer” who named them was thinking.</p></div><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-8990405767075201690?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0tag:blogger.com,1999:blog-7593087619590047967.post-47472779763950925332007-07-05T08:04:00.000-07:002007-10-21T13:41:56.202-07:00Two men try to outsmart a rope<div><p>At my parents’ home in Oregon, there is a large walnut tree in the front yard. This part of Dundee was built where a walnut orchard had been, and several of the original trees are still there. Prior to their moving in a few years ago, a swing had been hung from their walnut tree. The chain was rusting and the seat had lichen growing on it, so I thought it would be worthwhile to replace it. 
Using a thick rope seemed more appealing than a chain, and a wooden seat would be more comfortable than the rubber one (with lichen).</p> <p>So we all set out to the local hardware store. On the Fourth of July the only place open is Lowe’s. Naturally my Dad and I have to traverse the entire length of the store before we can find the rope. Big hardware stores are nice in that they have almost everything, but finding what you want is an exercise in frustration. Anyway, we find some polypropylene rope and get a length of 25’, which should be more than we need. Right next to the rope are clamps and thimbles and the like, for connecting ropes together and making loops and such. Of course, being men that like to build things, we decide we need to use clamps to make the required loops in the rope.</p> <p>Back at home, I dismantle the old swing and Pop starts hanging the new one. He looped the rope over the branch, had a length of it hanging down, on which the seat would rest, and then looped back up and around the branch again. To hold the loops on the branch, he pinched the ends together with the metal clamps. We each took turns testing it out, and deemed the swing functional.</p> <p>Now Baby enters the picture. I sit on the swing, with Baby on my lap. We take one swing and <span class="caps">THUMP</span>! Thankfully my ass is soft enough to take most of the force of the impact, but my wrist hurt a little. Baby, of course, appears to be fine. The metal clamp had spread apart and the rope slid out, causing a critical failure in the swing system.</p> <p>So, back to the drawing board. It’s not long before I remember that there are these things that ancient people would tie using rope, and they called them “knots”. I had tied a few myself, in Boy Scouts. This was long enough ago that I couldn’t remember anything but the names of the knots. 
Thanks to the Internet, I found a few good <a href="http://www.realknots.com/">knots</a>, and one in particular that would do the job very well, the Dutch Marine Bowline. After studying the graphic and trying to imagine it upside down so I could apply it to the swing situation, I managed to tie both ends of the rope around the tree branch. First came the one-man test, then the one-woman test, followed by the man-and-baby test. All tests passed with flying (or swinging) colors.</p> <p>In conclusion, two men, and some arguably unsuitable technology, could “knot” outwit the rope.</p></div><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7593087619590047967-4747277976395092533?l=cafenate.blogspot.com'/></div>Nathan Fiedlerhttp://www.blogger.com/profile/18361875200770548606noreply@blogger.com0