<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss'><id>tag:blogger.com,1999:blog-36768584</id><updated>2009-11-24T22:20:22.701-05:00</updated><title type='text'>Omics! Omics!</title><subtitle type='html'>A computational biologist's personal views on new technologies &amp; publications on genomics &amp; proteomics and their impact on drug discovery</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default?start-index=26&amp;max-results=25'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>319</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-36768584.post-7297635183990996990</id><published>2009-11-22T21:54:00.004-05:00</published><updated>2009-11-22T22:29:30.389-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rare diseases'/><category scheme='http://www.blogger.com/atom/ns#' term='genome sequencing'/><title type='text'>Targeted Sequencing Bags a Diagnosis</title><content type='html'>A nice complement to the one paper (Ng et al) &lt;a 
href="http://omicsomics.blogspot.com/2009/11/targeted-sequencing-bags-rare-disease.html"&gt;I detailed last week&lt;/a&gt; is a paper that actually came out just beforehand (Choi et al).  Whereas the Ng paper used whole exome targeted sequencing to find the mutation for a previously unexplained rare genetic disease, the Choi et al paper used a similar scheme (though with a different choice of targeting platform) to find a mutation in a known disease gene in a patient, thereby diagnosing the patient.&lt;br /&gt;&lt;br /&gt;The patient in question has a tightly interlocked pedigree (Figure 2), with two different consanguineous marriages shown.  Put another way, this person could trace 3 paths back to one set of great-great-grandparents.  Hence, they had quite a bit of DNA which was identical-by-descent, which meant that in these regions any low-frequency variant call could be safely ignored as noise.  A separate scan with a SNP chip was used to identify such regions independently of the sequencing.&lt;br /&gt;&lt;br /&gt;The patient was a 5-month-old male, born prematurely at 30 weeks and with "failure to thrive and dehydration".  Two spontaneous abortions and a death of another premature sibling at day 4 also characterized this family; a litany of miserable suffering.  Due to imbalances in the standard blood chemistry (which I wish the reviewers had insisted on explaining further for those of us who don't frequent that world), a kidney defect was suspected but other causes (such as infection) were not excluded.&lt;br /&gt;&lt;br /&gt;The exome capture was this time on the Nimblegen platform, followed by Illumina sequencing.  This is not radically different from the Ng paper, which used Agilent capture and Illumina sequencing.  At the moment Nimblegen &amp; Agilent appear to be the only practical options for whole exome-scale capture, though there are many capture schemes published and quite a few available commercially. Lots of variants were found.  
One that immediately grabbed attention was a novel missense mutation which was homozygous and in a known chloride transporter, SLC26A3.  This missense mutation (D652N) targets a position which is almost utterly conserved across the family, and makes a significant change in the side chain (acid group to polar non-charged).  Most importantly, SLC26A3 has already been shown to cause "congenital chloride-losing diarrhea" (CLD) when mutated in other positions.  Clinical follow-up confirmed that fluid loss was through the intestines and not the kidneys.&lt;br /&gt;&lt;br /&gt;One of the genetic diseases of the kidney that had been considered was Bartter syndrome, which the more precise blood chemistry did not match.  Given that one patient had been suspected of Bartter but instead had CLD, the group screened 39 more patients with Bartter but lacking mutations in 4 different genes linked to this syndrome.  5 of these patients had homozygous mutations in SLC26A3, 2 of which were novel.  190 control chromosomes were also sequenced; none had mutations.  3 of the 5 patients had further follow-up &amp; confirmation of water loss through the gastrointestinal tract.&lt;br /&gt;&lt;br /&gt;This study again illustrates the utility of targeted sequencing for clinical diagnosis of difficult cases.  While a whole exome scan is currently in the neighborhood of $20K, more focused searches could be run far cheaper.  The challenge will be in designing economical panels which allow scanning the most important genes at low cost.  Presumably one could go through OMIM and find all diseases &amp; syndromes which alter electrolyte levels and their known causative gene(s).  Such panels might be doable for perhaps as low as $1-5K per sample; too expensive for routine newborn screening but far better than an endless stream of tests.  
Of course, such panels would miss novel genes or really odd presentations, so follow-up of negative results with whole exome sequencing might be required.  With &lt;a href="http://omicsomics.blogspot.com/2009/11/three-blows-against-tyranny-of.html"&gt;newer sequencing platforms available&lt;/a&gt;, the costs for this may plummet to a few hundred dollars per test, which is probably on par with what the current screening of newborns for inborn errors runs.  One impediment to commercial development in this field may well be the rapid evolution of platforms; companies may hesitate for fear of betting on a technology that will not last.&lt;br /&gt;&lt;br /&gt;Of course, to some degree the distinction between the two papers is artificial.  The Ng et al paper actually, as I noted, did diagnose some of their patients with known genetic disease.  Similarly, the patients in this study who are now negative for known Bartter syndrome genes and for CLD would be candidates for whole exome sequencing.  
In the end, what matters is to make the right diagnosis for each patient so that the best treatment or supportive care can be selected.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Proceedings+of+the+National+Academy+of+Sciences+of+the+United+States+of+America&amp;rft_id=info%3Apmid%2F19861545&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Genetic+diagnosis+by+whole+exome+capture+and+massively+parallel+DNA+sequencing.&amp;rft.issn=0027-8424&amp;rft.date=2009&amp;rft.volume=106&amp;rft.issue=45&amp;rft.spage=19096&amp;rft.epage=101&amp;rft.artnum=&amp;rft.au=Choi+M&amp;rft.au=Scholl+UI&amp;rft.au=Ji+W&amp;rft.au=Liu+T&amp;rft.au=Tikhonova+IR&amp;rft.au=Zumbo+P&amp;rft.au=Nayir+A&amp;rft.au=Bakkalo%C4%9Flu+A&amp;rft.au=Ozen+S&amp;rft.au=Sanjad+S&amp;rft.au=Nelson-Williams+C&amp;rft.au=Farhi+A&amp;rft.au=Mane+S&amp;rft.au=Lifton+RP&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CHealth%2CGenetics+%2C+Medicine"&gt;Choi M, Scholl UI, Ji W, Liu T, Tikhonova IR, Zumbo P, Nayir A, Bakkaloğlu A, Ozen S, Sanjad S, Nelson-Williams C, Farhi A, Mane S, &amp; Lifton RP (2009). Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. 
&lt;span style="font-style: italic;"&gt;Proceedings of the National Academy of Sciences of the United States of America, 106&lt;/span&gt; (45), 19096-101 PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19861545"&gt;19861545&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-7297635183990996990?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/7297635183990996990/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7297635183990996990' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/7297635183990996990'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/7297635183990996990'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/11/targeted-sequencing-bags-diagnosis.html' title='Targeted Sequencing Bags a Diagnosis'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-5697159460047247673</id><published>2009-11-19T23:17:00.003-05:00</published><updated>2009-11-20T00:06:16.177-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='biotech companies'/><category scheme='http://www.blogger.com/atom/ns#' term='genome sequencing'/><title type='text'>Three Blows Against the Tyranny of Expensive Experiments</title><content type='html'>Second generation sequencing is 
great, but one of its major issues so far is that the cost of one experiment is quite steep.  Just looking at reagents, going from a ready-to-run library to sequence data is somewhere in the neighborhood of $10K-25K on 454, Illumina, Helicos or SOLiD (I'm willing to take corrections on these values, though they are based on reasonable intelligence).  While in theory you can split this cost over multiple experiments by barcoding, that can be very tricky to arrange.  Perhaps if core labs would start offering '1 lane of Illumina - Buy It Now!' on eBay the problem could be solved, but finding a spare lane isn't easy. &lt;br /&gt;&lt;br /&gt;This issue manifests itself in other ways.  If you are developing new protocols anywhere along the pipeline, your final assay is pretty expensive, making it challenging to work inexpensively.  I've heard rumors that even some of the instrument makers feel inhibited in process development.  It can also make folks a bit gun shy; Amanda heard first hand tonight from someone lamenting a project stymied under such circumstances.  Even for routine operations, the methods of QC are pretty inexact, insofar as they don't really test whether the library is any good, just whether some bulk property (size, PCRability, quantity) is within a spec.  This huge atomic cost is also a huge barrier to utilization in a clinical setting; does the clinician really want to wait some indefinite amount of time until enough patient samples are queued to make the cost/sample reasonable?&lt;br /&gt;&lt;br /&gt;Recently, I've become aware of three hopeful developments on this front.  The first is the &lt;a href="http://omicsomics.blogspot.com/2009/10/pondering-polonators.html"&gt;Polonator&lt;/a&gt;, which according to Kevin McCarthy has a consumable cost of only about $500 per run (post library construction).  $500 isn't nothing to risk on a crazy idea, but it sure beats $10K.  
There aren't many Polonators around, but for method development in areas such as targeted capture it would seem like a great choice.&lt;br /&gt;&lt;br /&gt;Today, another shoe fell.  Roche &lt;a href="http://www.roche.com/media/media_releases/med_dia_2009-11-19.htm"&gt;has announced a smaller version of the 454 system, the GS Junior&lt;/a&gt;.  While the instrument cost wasn't announced, it will supposedly generate 1/10th as much data (&lt;a href="http://www.gsjunior.com/instrument-workflow.php"&gt;35+Mb from 100K reads with 400 Q20 bases&lt;/a&gt;) for the same cost per basepair, suggesting that the reagent cost for a run will be in the neighborhood of $2.5K.  Worse than what I described above, but rather intriguing.  This is a system that may have a good chance to start making clinical inroads; $2.5K is a bit steep for a diagnostic but not ridiculous -- or you simply need to multiplex fewer samples to get the cost per sample decent.  The machine is going to boast 400+bp reads, playing to the current comparative strength of the 454 chemistry.  While I doubt anyone would buy such a machine solely as an upfront QC for SOLiD or Illumina, with some clever custom primer design one probably could make libraries usable on 454 plus one other platform.  &lt;br /&gt;&lt;br /&gt;It's an especially auspicious time for Roche to launch their baby 454, as Pacific Biosciences released some specs &lt;a href="http://www.genomeweb.com/sequencing/pacbio-reveals-commercial-specs-initial-focus-long-reads-short-runs-low-experime"&gt;through GenomeWeb's In Sequence&lt;/a&gt;.  From what I've been able to &lt;a href="http://www.clcngs.com/2009/11/3rd-gen-sequencing-company-pacbio-reveals-commercial-specs/"&gt;scrounge about&lt;/a&gt; (I can't quite talk myself into asking for a subscription), this is going to put some real pressure across the market, but particularly on 454.  
The key specs I can find are a per run cost of $100 which will get you approximately 25K-30K reads of 1.5Kb each -- or around 45Mb of data.  It may also be possible to generate 2X the data for nearly the same cost; apparently the reagents packed with one cell are really good for two runs in series.  Each cell takes 10-15 minutes to run (at least in some workflows) and the instrument can be loaded up with 96 of them to be handled serially.  This is a similar ballpark to what the GS Junior is being announced with, though with fewer reads but longer read lengths.  I haven't been able to find any error rate estimates or the instrument cost.  I'll assume, just because it is new and single molecule, that the error rate will give Roche some breathing room.  &lt;br /&gt;&lt;br /&gt;But in general, PacBio looks set to really grab the market where long reads, even noisy ones, are valuable.  One obvious use case is transcriptome sequencing to find alternative splice forms.  Another would be to provide 1.5Kb scaffolds for genome assembly; what I've found also suggests PacBio will offer a 'strobe sequencing' mode which is akin to Helicos' dark filling technology, which is a means to get widely spaced sequence islands.  This might provide scaffolding information in much larger fragments.  10Kb?  20Kb?  And again, though you probably wouldn't buy the machine just for this, at $100/run it looks like a great way to QC samples going into other systems. Imagine checking a library after initial construction, then after performing hybridization selection and then after another round of selection!  After all, the initial PacBio instrument won't be great for really deep sequencing.  It appears it would be $5K-10K to get approximately 1X coverage of a mammalian genome -- but likely with a high error rate.  &lt;br /&gt;&lt;br /&gt;The ability to easily sequence 96 samples at a time (though it isn't clear what sample prep will entail) does suggest some interesting possibilities.  
For example, one could do long survey sequencing of many bacterial species, with each well yielding 10X coverage of an E. coli-sized genome (a lot of bugs are this size or smaller).  The data might be really noisy, but for getting a general lay-of-the-land it could be quite useful -- perhaps the data would be too noisy to tell which genes were actually functional vs. decaying pseudogenes, but you would be able to ask "what is the upper bound on the number of genes of protein family X in genome Y".  If you really need high quality sequence, then a full run (or targeted sequencing) could follow.&lt;br /&gt;&lt;br /&gt;At $100 per experiment, the sagging Sanger market might take another hit.  If a quick sample prep to convert plasmids to usable form is released, then ridiculous oversampling (imagine 100K reads on a typical 1.5Kb insert in pUC scenario!) might overcome a high error rate.&lt;br /&gt;&lt;br /&gt;One interesting impediment which PacBio has acknowledged is that they won't be able to ramp up instrument production as quickly as they might like and will be trying to place (ration) instruments strategically.  I'm hoping at least one goes to a commercial service provider or a core lab willing to solicit outside business, but I'm not going to count on it.  &lt;br /&gt;&lt;br /&gt;Will Illumina &amp; Life Technologies (SOLiD) try to create baby sequencers?  Illumina does have a scheme to convert their array readers to sequencers, but from what I've seen these aren't expected to save much on reagents.  
Life does own the VisiGen technology, which is apparently similar to PacBio's but hasn't yet published a real proof-of-concept paper -- at least that I could find; &lt;a href="http://www.freepatentsonline.com/7329492.html"&gt;their key patent&lt;/a&gt; has issued -- reading material for another night.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-5697159460047247673?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/5697159460047247673/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=5697159460047247673' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/5697159460047247673'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/5697159460047247673'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/11/three-blows-against-tyranny-of.html' title='Three Blows Against the Tyranny of Expensive Experiments'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-4901757353874067114</id><published>2009-11-17T23:41:00.002-05:00</published><updated>2009-11-18T00:02:47.906-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='biotech companies'/><title type='text'>Decode -- Corpse or Phoenix?</title><content type='html'>The &lt;a 
href="http://scienceblogs.com/geneticfuture/2009/11/decode_genetics_finally_goes_u.php"&gt;news that Decode has filed for bankruptcy&lt;/a&gt; is a sad milestone in the history of genomics companies.  Thus falls either the final or penultimate human gene mapping company, with everyone else having either disappeared entirely or exited that business.  A partial list would include Sequana, Mercator, Myriad, Collaborative Research/Genome Therapeutics, Genaera and (of course) Millennium.  I'm sure I'm missing some others.  The one possible survivor I can think of is &lt;a href="http://www.perlegen.com/"&gt;Perlegen&lt;/a&gt;, though their website is pretty bare bones, suggesting they have exited as well.&lt;br /&gt;&lt;br /&gt;The challenge all of these companies faced, and rarely beat, was how to convert mapping discoveries into a cash stream which could pay for all that mapping.  Myriad could be seen as the one success, having generated the controversial BRCA tests from their data, but (I believe) they no longer are actively looking.  Instead, new tests are in-licensed from academics.&lt;br /&gt;&lt;br /&gt;Most other companies shed their genomics efforts as part of becoming product companies; the real money is in therapeutics.  Mapping turned out to be a weak contributor to that value stream.  A major problem is that mapping information rarely led to a clear path to a therapeutic; too many targets nicely validated by genetics were complete head-scratchers as to how to create a therapeutic.  Not that folks didn't try; Decode even in-licensed a drug and acquired all the pieces for a full drug development capability.  &lt;br /&gt;&lt;br /&gt;Of course, perhaps Decode's greatest notoriety came from their deCodeMe DTC genetic testing business.  Given the competition &amp; controversy in this field, that was unlikely to save them.  The Icelandic financial collapse I think did them some serious damage as well.  
That's a reminder that companies, regardless of how they are run, sometimes have their fate channeled by events far beyond their control.  A similar instance was the loss of Lion's CFO in the 9/11 attacks; he was soliciting investors at the WTC that day.  The 9/11 deflation of the stock market definitely crimped a lot of money-losing biotechs' plans for further fund raising.&lt;br /&gt;&lt;br /&gt;Bankruptcies were once very rare for biotech, but quite a few have been announced recently.  The old strategy of selling off the company at fire sale prices seems to be less in style these days; assets are now being sold as part of the bankruptcy proceedings.  Apparently, &lt;a href="http://scienceblogs.com/geneticfuture/2009/11/details_on_the_future_of_the_d.php"&gt;this and perhaps other functions will continue&lt;/a&gt;.  Bankruptcy in this case is a way of shedding incurred obligations viewed as nuisances; anyone betting on another strategy by buying the stock is out of luck.&lt;br /&gt;&lt;br /&gt;Personally, I wish that the genetic database and biobanks which deCode has created could be transferred to an appropriate non-profit such as the Sanger.  I doubt much of that data will ever be convertible into cash, particularly at the scale most investors are looking for.  
But a non-profit could extract the useful information and get it published, which was deCode's forte but I doubt they've mined everything that can be mined.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-4901757353874067114?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/4901757353874067114/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=4901757353874067114' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/4901757353874067114'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/4901757353874067114'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/11/decode-corpse-or-phoenix.html' title='Decode -- Corpse or Phoenix?'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-5654189444672563048</id><published>2009-11-15T23:29:00.003-05:00</published><updated>2009-11-16T00:23:11.555-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rare diseases'/><category scheme='http://www.blogger.com/atom/ns#' term='genome sequencing'/><title type='text'>Targeted Sequencing Bags a Rare Disease</title><content type='html'>Nature Genetics on Friday released the paper from Jay Shendure, Debra Nickerson and colleagues which used targeted sequencing to identify the damaged gene in a rare Mendelian 
disorder, Miller syndrome.  The work had been presented at least in part at recent meetings, but now all of us can digest it in entirety.&lt;br /&gt;&lt;br /&gt;The impressive economy of this paper is that they targeted (using Agilent chips) less than 30Mb of the human genome, which is less than 1%.  They also worked with very few samples; only about 30 cases of Miller Syndrome have been reported in the literature.  While I've &lt;a href="http://omicsomics.blogspot.com/2009/10/why-im-not-crazy-about-term-exome.html"&gt;expressed some reservations about "exome sequencing"&lt;/a&gt;, this paper does illustrate why it can be very cost effective, and my objection (perhaps not made clear enough before) is more a worry about being too restricted to "exomes" and less about targeting.  &lt;br /&gt;&lt;br /&gt;Only four affected individuals (two siblings and two individuals unrelated to anyone else in the study) were sequenced, each at around 40X coverage of the targeted regions.  Since Miller is so vanishingly rare, the causative mutations should be absent from samples of human diversity such as dbSNP or the HapMap, so these were used as a filter.  Non-synonymous (protein-altering) mutations, splice site mutations &amp; coding indels were considered as candidates.  Both dominant models and recessive models were considered.  Combining the data from both siblings, 228 candidate dominant genes and 9 recessive ones fell out.  Looking then to the unrelated individuals zeroed in on a single gene, DHODH, under the recessive model (but 8 in the dominant model).  Using a conservative statistical model, the odds of finding this by chance were estimated at 1.5x10^-5.&lt;br /&gt;&lt;br /&gt;An interesting curve was thrown by nature.  If predictions were made as to whether mutations would be damaging, then DHODH was excluded as a candidate gene under a recessive model.  
Both siblings carried one allele (G605A) predicted to be neutral but another allele predicted to be damaging.&lt;br /&gt;&lt;br /&gt;Another interesting curve is a second gene, DNAH5, which was a candidate considering only the siblings' data but ruled out by the other two individuals' data.  However, this gene is already known to be linked to a Mendelian disorder.  The two siblings had a number of symptoms which do not fit with any other Miller case -- and fit the symptoms of DNAH5 mutation well.  So these two individuals have two rare genetic diseases!&lt;br /&gt;&lt;br /&gt;Getting back to DHODH, is it the culprit in Miller?  Sequencing three further unrelated patients found them all to be compound heterozygotes for mutations predicted to be damaging.  So it becomes reasonable to infer that a false prediction of non-damaging was made for G605A.  Sequencing of DHODH in parents of the affected individuals confirmed that each was a carrier, ruling out DHODH as a causative gene under a dominant model.  &lt;br /&gt;&lt;br /&gt;DHODH is known to encode dihydroorotate dehydrogenase, which catalyzes a biochemical step in the de novo synthesis of pyrimidines.  This is a pathway targeted in some cancer chemotherapies, with the unfortunate result that some individuals are exposed to these drugs in utero -- and these persons manifest symptoms similar to Miller syndrome.  Furthermore, another genetic disease (Nager) has great overlap in symptoms with Miller -- but sequencing of DHODH in 12 unrelated patients failed to find any coding mutations in DHODH.   &lt;br /&gt;&lt;br /&gt;The authors point to the possible impact of this approach. They note that there are 7,000 diseases which affect fewer than 200K patients in the U.S. (a widely used definition of rare disease), but in aggregate this is more than 25M persons.  
Identifying the underlying mutations for a large fraction of these diseases would advance our understanding of human biology greatly, and with a bit of luck some of these mutations will suggest practical therapeutic or dietary approaches which can ameliorate the disease.  &lt;br /&gt;&lt;br /&gt;Despite the success here, they also underline opportunities for improvement.  First, in some cases variant calling was difficult due to poor coverage in repeated regions.  Conversely, some copy number variation manifested itself in false positive calls of variation.  Second, the SNP databases for filtering will be most useful if they are derived from similar populations; if studying patients with a background poorly represented in dbSNP or HapMap then those databases won't do.  &lt;br /&gt;&lt;br /&gt;How economical a strategy would this be?  Whole exome sequencing on this scale can be purchased for a bit under $20K/individual; to try to do this by Sanger would probably be at least 25X that.  So whole exome sequencing of the 4 original individuals would be less than $100K for sequencing (but clearly a bunch more for interpretation, sample collection, etc).  The follow-up sequencing would add a bit, but probably less than one exome's worth of sequencing.  Even if a study turned up a lot of candidate variants, smaller scale targeted sequencing can be had for $5K or less per sample.  Digging into the methods, the study actually used two passes of array capture -- the second to clean up what wasn't captured well by the first array design &amp; to add newer gene predictions.  This is a great opportunity to learn from these projects -- the array designs can keep being refined to provide even coverage across the targeted genes.  And, of course, as the cost per base of the sequencing portion continues its downwards slide this will get even more attractive -- or possibly simply be displaced by really cheap whole genome sequencing.  
If the cost of the exome sequencing can be approximately halved, then perhaps a project similar to this could be run for around $100K.&lt;br /&gt;&lt;br /&gt;So, if 700 diseases could each be examined at $100K/disease, that would come out to $70M -- hardly chump change.  This underlines the huge utility of getting sequencing costs down another order of magnitude.  At $1000/genome, the sequencing costs of the project would stop grossly overshadowing the other key areas - sample collection &amp; data interpretation.  If the total cost of such a project could be brought down closer to $20K, then now we're looking at $14M to investigate all described rare genetic disorders.  That's not to say it shouldn't be done at $70M or even several times that, but ideally some of the money saved by cheaper sequencing could go to elucidating the biology of the causative alleles such a campaign would unearth, because certainly many of them will be much more enigmatic than DHODH.&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Z3988" 
title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+genetics&amp;rft_id=info%3A%2Fdoi%3A10.1038%2Fng.499&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Exome+sequencing+identifies+the+cause+of+a+mendelian+disorder&amp;rft.issn=&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fng%2Fjournal%2Fvaop%2Fncurrent%2Fabs%2Fng.499.html&amp;rft.au=Sarah+B.+Ng&amp;rft.au=Kati+J.+Buckingham&amp;rft.au=Choli+Lee&amp;rft.au=Abigail+W.+Bigham&amp;rft.au=Holly+K.+Tabor&amp;rft.au=Karin+M.+Dent&amp;rft.au=Chad+D.+Huff&amp;rft.au=Paul+T.+Shannon&amp;rft.au=Ethylin+Wang+Jabs&amp;rft.au=Deborah+A.+Nickerson&amp;rft.au=Jay+Shendure&amp;rft.au=Michael+J.+Bamshad&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CClinical+Research%2CComputational+Biology%2C+Genetics+%2C+Genetics%2C+Metabolism%2C+Pathology"&gt;Sarah B. Ng, Kati J. Buckingham, Choli Lee, Abigail W. Bigham, Holly K. Tabor, Karin M. Dent, Chad D. Huff, Paul T. Shannon, Ethylin Wang Jabs, Deborah A. Nickerson, Jay Shendure, &amp; Michael J. Bamshad (2009). 
Exome sequencing identifies the cause of a mendelian disorder &lt;span style="font-style: italic;"&gt;Nature genetics&lt;/span&gt; : &lt;a rev="review" href="doi:10.1038/ng.499"&gt;doi:10.1038/ng.499&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-5654189444672563048?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/5654189444672563048/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=5654189444672563048' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/5654189444672563048'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/5654189444672563048'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/11/targeted-sequencing-bags-rare-disease.html' title='Targeted Sequencing Bags a Rare Disease'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-2451449178883239716</id><published>2009-11-12T22:44:00.004-05:00</published><updated>2009-11-12T23:15:13.904-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='genome sequencing'/><category scheme='http://www.blogger.com/atom/ns#' term='dogs'/><title type='text'>A 10,201 Genomes Project</title><content type='html'>With valuable information emerging from the &lt;a 
href="http://www.politigenomics.com/2009/05/1000-genomes-phase-change.html"&gt;1000 (human) genomes project&lt;/a&gt; and now a proposal for a &lt;a href="http://www.genomeweb.com/sequencing/consortium-prepares-sequence-10000-vertebrate-genomes-once-cost-comes-down"&gt;10,000 vertebrate genome project&lt;/a&gt;, it's well past time to expose to public scrutiny a project I've been spitballing for a while, which I now dub the 10,201 genomes project.  Why that?  Well, first it's a bigger number than the others.  Second, it's 101 squared.&lt;br /&gt;&lt;br /&gt;Okay, perhaps my faithful assistant is swaying me, but I still think it's a useful concept, even if for the time being it must remain a &lt;i&gt;gehunden&lt;/i&gt; experiment.  All kidding aside, the goal would be to sequence the full breadth of caninity with the prime focus on elucidating the genetic machinery of mammalian morphology.  In my biological world, that would be more than enough to justify such a project once the price tag comes down to a few million.  With some judicious choices, some fascinating genetic influences on complex behaviors might also emerge.  And yes, there is a possibility of some of this feeding back to useful medical advances, though one should be honest to say that this is likely to be a long and winding road.  It really devalues saying something will impact medicine when we claim every project will do so.&lt;br /&gt;&lt;br /&gt;The general concept would be to collect samples from multiple individuals of every known dog breed, paying attention to important variation within breed standards.  It would also be valuable to collect well-annotated samples from individuals who are not purebred but exhibit interesting morphology.  For example, I've met a number of "labradoodles" (Labrador retriever x poodle) and they exhibit a wide range of sizes, coat colors and other characteristics -- precisely the fodder for such an experiment.  
In a similar manner, it is said that the same breed from geographically distant breeders may be quite distinct, so it would be valuable to collect individuals from far and wide.  But going beyond domesticated dogs, it would be useful to sequence all the wild species as well.  With genomes at $1K a run, this would make good sense.   Of particular interest for a non-dog genome is the &lt;a href="http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=17284676"&gt;case of lines of foxes&lt;/a&gt;, which have been bred over just a half century into a very docile line and a second line selected for aggressive tendencies.&lt;br /&gt;&lt;br /&gt;What realistically could we expect to find?  One would expect a novel gene, &lt;a href="http://www.sciencemag.org/cgi/content/abstract/sci;325/5943/995?maxtoshow=&amp;HITS=10&amp;hits=10&amp;RESULTFORMAT=&amp;fulltext=shih+tzu&amp;searchid=1&amp;FIRSTINDEX=0&amp;resourcetype=HWCIT"&gt;as is the case with short legged breeds&lt;/a&gt;, to leap out.  Presumably regions which have undergone selective sweeps would be spottable as well and linkable to traits.  A wealth of high-resolution copy number information would certainly emerge.&lt;br /&gt;&lt;br /&gt;Is it worth funding?  Well, I'm obviously biased.  But already the 10,000 vertebrate genome project has &lt;a href="http://www.nature.com/news/2009/091104/full/462021a.html"&gt;kicked up some dust&lt;/a&gt; from some who are disappointed that the genomics community has not had "an inordinate fondness for beetles" (only one sequenced so far).  Genome sequencing is going to get much cheaper, but never "too cheap to meter".  De novo projects will always be inherently more expensive due to more extensive informatics requirements -- the first annotation of the genome is highly valuable but requires extensive effort.  I too am disappointed that a greater sampling of arthropods hasn't been sequenced -- and it's hard to imagine folks in the evo-devo world being fond of this point either.  
&lt;br /&gt;&lt;br /&gt;It's hard for me to argue against sequencing thousands of human germlines to uncover valuable medical information or to sequence tens of thousands of somatic cancer genomes for the same reason.  But, even so I'd hate to see that push out funding for filling in more information about the tree of life.  Still, do we really need 10,000 vertebrate genomes in the near future or 10,201 dog genomes?  If the trade for doing only 5,000 additional vertebrates is doing 5,000 diverse invertebrates, I think that is hard to argue against.  Depth vs. breadth will always be a challenging call, but perhaps breadth should be favored a bit more -- at least once I'm funded for my ultra-deep project!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-2451449178883239716?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/2451449178883239716/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=2451449178883239716' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/2451449178883239716'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/2451449178883239716'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/11/10201-genomes-project.html' title='A 10,201 Genomes Project'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total 
xmlns:thr='http://purl.org/syndication/thread/1.0'>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-482090515875695878</id><published>2009-11-11T23:50:00.004-05:00</published><updated>2009-11-12T00:10:26.419-05:00</updated><title type='text'>A call for new technological minds for the genome sequencing instrument fields</title><content type='html'>There's a &lt;a href="http://www.nature.com/nbt/journal/v27/n11/abs/nbt.1585.html"&gt;great article&lt;/a&gt; in the current Nature Biotechnology (alas, you'll need a subscription to read the full text) titled "The challenges of sequencing by synthesis", detailing the challenges around the current crop of sequencing-by-synthesis instruments.  The paper was written by a number of the PIs on grants for $1K genome technology.&lt;br /&gt;&lt;br /&gt;While there is one short section on the problem of sample preparation, the heart of the paper can be found in the other headings: &lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;surface chemistry&lt;/li&gt;&lt;li&gt;fluorescent labels&lt;/li&gt;&lt;li&gt;the enzyme-substrate system&lt;/li&gt;&lt;li&gt;optics&lt;/li&gt;&lt;li&gt;throughput versus accuracy&lt;/li&gt;&lt;li&gt;read-length and phasing limitations&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Each section is tightly written and well-balanced, with no obvious playing of favorites or bashing of anti-favorites present.  Trade-offs are explored &amp; the dreaded term (at least amongst scientists) "cost models" shows up; indeed there is more than a little bit of a nod to accounting -- but if sequencing is really going to be $1K/person on an ongoing basis the beans must be counted correctly!&lt;br /&gt;&lt;br /&gt;I won't try to summarize much in detail; it really is hard to distill such a concentrated draught any further.  
Most of the ideas presented as possible solutions can be viewed as evolutionary relative to the current platforms, though a few exotic concepts are floated as well (such as &lt;a href="http://graphics.stanford.edu/papers/confocal/"&gt;synthetic aperture optics&lt;/a&gt;).  It is noteworthy that an explicit goal of the paper is to summarize the problem areas so that new minds can approach the problem; as implied by the section title list above, this is clearly a multidisciplinary problem.  It does somewhat raise the question of whether Nature Biotechnology, a journal I am quite fond of, was the best place for this.  If new minds are desired, perhaps Physical Review Letters would have been better.  But that's a very minor quibble.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+biotechnology&amp;rft_id=info%3Apmid%2F19898456&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=The+challenges+of+sequencing+by+synthesis.&amp;rft.issn=1087-0156&amp;rft.date=2009&amp;rft.volume=27&amp;rft.issue=11&amp;rft.spage=1013&amp;rft.epage=23&amp;rft.artnum=&amp;rft.au=Fuller+CW&amp;rft.au=Middendorf+LR&amp;rft.au=Benner+SA&amp;rft.au=Church+GM&amp;rft.au=Harris+T&amp;rft.au=Huang+X&amp;rft.au=Jovanovich+SB&amp;rft.au=Nelson+JR&amp;rft.au=Schloss+JA&amp;rft.au=Schwartz+DC&amp;rft.au=Vezenov+DV&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CChemistry%2CEngineering%2CPhysics%2CMaterials%2C+Optics%2C+Biochemistry%2C+Genetics+%2C+Nanoscience%2C+Biochemistry%2C+Materials%2C+Nanoscience"&gt;Fuller CW, Middendorf LR, Benner SA, Church GM, Harris T, Huang X, Jovanovich SB, Nelson JR, Schloss JA, Schwartz DC, 
&amp; Vezenov DV (2009). The challenges of sequencing by synthesis. &lt;span style="font-style: italic;"&gt;Nature biotechnology, 27&lt;/span&gt; (11), 1013-23 PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19898456"&gt;19898456&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-482090515875695878?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/482090515875695878/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=482090515875695878' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/482090515875695878'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/482090515875695878'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/11/call-for-new-technological-minds-for.html' title='A call for new technological minds for the genome sequencing instrument fields'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-8667234261354114472</id><published>2009-11-10T23:04:00.002-05:00</published><updated>2009-11-10T23:18:46.752-05:00</updated><title type='text'>Occult Genetic Disease</title><content type='html'>A &lt;a href="http://thegenesherpa.blogspot.com/2009/10/glucowha-parkinsons-disease-and.html"&gt;clinical aside&lt;/a&gt; by Dr. 
Steve over at Gene Sherpas piqued my interest recently.  He mentioned a 74 year old female patient of his with lung difficulties who turned out positive both by the sweat test and genetic testing for cystic fibrosis.  One of her grandchildren had CF, which appears to have been a key hint in this direction.  This anecdote was particularly striking to me because I had recently finished &lt;a href="http://www.amazon.com/Better-Surgeons-Performance-Atul-Gawande/dp/0312427654/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1257912938&amp;sr=8-1"&gt;Atul Gawande's "Better"&lt;/a&gt; (highly recommended), which had a chapter on CF.  Even today, a well treated CF patient living to such an age would be remarkable; when this woman was born, living to 20 would have been lucky.  Clearly she either has a very modest deficit or some interesting modifier or such (late onset?) which allowed her to live to this age.&lt;br /&gt;&lt;br /&gt;Now, if this patient didn't have any CF in her family, would one test for this?  Probably not.  But thinking more broadly, will this scenario be repeated frequently in the future when complete genome sequencing becomes a routine part of large numbers of medical files?  Clearly we will have many "&lt;a href="http://thegenesherpa.blogspot.com/2009/11/long-qt-syndrome-location-matters.html"&gt;variants of unknown significance&lt;/a&gt;", but will we also find many cases of occult (hidden) genetic disease in which a patient shows clinical symptoms (but perhaps barely so)?  Having a sensitive and definitive phenotypic test will assist this greatly; showing excess saltiness of sweat is pretty clear.&lt;br /&gt;&lt;br /&gt;From a clinical standpoint, many of these patients may be confusing -- if someone is nearly asymptomatic should they be treated?  But from a biology standpoint, they should prove very informative by helping us define the biological thresholds of disease or by uncovering modifiers.  
Even more enticing would be the very small chance of finding examples of partial complementation -- cases where two defective alleles somehow work together to generate enough function.  One example I've thought of (admittedly a bit far-fetched, but not total science fiction) would be two alleles which each produce a protein subject to instability but when heterodimerized stabilize the protein just enough.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-8667234261354114472?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/8667234261354114472/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=8667234261354114472' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/8667234261354114472'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/8667234261354114472'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/11/occult-genetic-disease.html' title='Occult Genetic Disease'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-9182146484383837394</id><published>2009-10-29T22:53:00.002-04:00</published><updated>2009-10-29T23:39:25.386-04:00</updated><title type='text'>My Most Expensive Paper</title><content type='html'>Genome Research has a &lt;a 
href="http://genome.cshlp.org/content/early/2009/10/28/gr.095976.109.abstract"&gt;paper detailing the Mammalian Gene Collection (MGC)&lt;/a&gt;, and if you look way down on the long author list (which includes Francis Collins!) you'll see mine there along with two Codon Devices colleagues.  This paper cost me a lot -- nothing in legal tender, but a heck of a lot of blood, sweat &amp; tears.&lt;br /&gt;&lt;br /&gt;The MGC is an attempt to have every human &amp; mouse protein coding sequence (plus more than a few rat) available as an expression clone, with native sequence.  Most of the genes were cloned from cDNA libraries, but coding sequences which couldn't be found that way were farmed out to a number of synthetic biology companies.  Codon decided to take on a particularly challenging tranche of mostly really long ORFs, hoping to demonstrate our proficiency in this difficult task.&lt;br /&gt;&lt;br /&gt;At the start, the attitude was "can-do".  When it appeared we couldn't parse some targets into our construction scheme, I devised a new algorithm that captured a few more (&lt;a href="http://omicsomics.blogspot.com/2007/08/personal-breakthrough.html"&gt;which I blogged about cryptically&lt;/a&gt;).  It was going to be a huge order which would fill our production pipeline in an expansive new facility we had recently moved into, replacing a charming but cramped historic structure.  A new system for tracking constructs through the facility was about to be rolled out that would let us finally track progress across the pipeline without a human manager constantly looking over each plasmid's shoulder.  The delivery schedule for MGC was going to be aggressive but would show our chops.  We were going to conquer the world!&lt;br /&gt;&lt;br /&gt;Alas, almost as soon as we started (and had sunk huge amounts of cash into oligos) we discovered ourselves in a small wicker container which was growing very hot.  Suddenly, nothing was working in the production facility.  
A combination of problems, some related to the move (a key instrument incorrectly recalibrated) and another problem whose source was never quite nailed down, forced a complete halt to all production activity for several months -- which soon meant that MGC was going to be the only trusty source of revenue -- if we could get MGC to release us from our now utterly undoable delivery schedule.&lt;br /&gt;&lt;br /&gt;Eventually, we fixed the old problems &amp; got new processes in place and pushed a bunch of production forward.  We delivered a decent first chunk of constructs to MGC, demonstrating that we were for real (but still with much to deliver).  Personnel were swiped from the other piece of the business (protein engineering) to push work forward.  More and more staff came in on weekends to keep things constantly moving.&lt;br /&gt;&lt;br /&gt;Even so, trouble still was a constant theme.  Most of the MGC project consisted of large constructs, which were built by a hierarchical strategy.  That meant the first key task was to build all the parts -- and some parts just didn't want to be built.  We had two processes for building "leaves", and both underwent major revisions and on-the-fly process testing.  We also started screening more and more plasmids by sequencing, sometimes catching a single correct clone in a mountain of botched ones (but running up a higher and higher capillary sequencing bill).  Sometimes we'd get almost right pieces, which could be fixed by site directed mutagenesis -- yet another unplanned cost in reagents &amp; skilled labor.  I experimented with partial redesigns of some builds -- but with the constraint of not ordering more costly oligos.  Each of these pulled in a few more constructs, a few more delivered -- and a frustrating pile of still unbuilt targets. &lt;br /&gt;&lt;br /&gt;Even when we had all the parts built, the assembly of them to the next stage was failing at alarming rates -- usually by being almost right.  
Yet more redesigns requiring fast dancing by the informatics staff to support.  More constructs pushed through.  More weekend shifts.&lt;br /&gt;&lt;br /&gt;In the end, when Codon shut down its gene synthesis business -- about 10 months after starting the MGC project -- we delivered a large fraction of our assignment -- but not all of it.  For a few constructs we delivered partial sequences for partial credit.  It felt good to deliver -- and awful to not deliver.  &lt;br /&gt;&lt;br /&gt;Now, given all that I've described (and more I've left out), I can't help but feel a bit guilty about that author list.  It was decided at some higher level that the author list would not be several miles long, and so some sort of cut had to be made.  Easily 50 Codon employees played some role in the project, and certainly there were more than a dozen for whom it occupied a majority of their attention.  An argument could have been easily made for at least that many Codon authors.  But, the decision was made that the three of us who had most shared the project management aspect would go on the paper.  In my case, I had ended up the main traffic cop, deciding which pieces needed to be tried again through the main pipeline and which should be directed to the scientist with magic hands.  For me, authorship is a small token for the many nights I ran SQL queries at midnight to find out what had succeeded and what had failed in sequencing -- and then checked again at 6 in the morning before heading off to work.  Even on weekends, I'd be hitting the database in the morning &amp; night to find out what needed redirecting -- and then using SQL inserts to redirect them.  I realized I was on the brink of madness when I was sneaking in queries on a family ski weekend.&lt;br /&gt;&lt;br /&gt;Perhaps after such a checkered experience it is natural to question the whole endeavor.  The MGC effort means that researchers who want to express a mammalian protein from a native coding sequence can do so.  
But how much of what we built will actually get used?  Was it really necessary to build the native coding sequence -- which often gave us headaches in the builds from repeats &amp; GC-rich regions (or, as we belatedly discovered, certain short runs of G could foul us up)?  MGC is a great resource, but the goal of a complete catalog of mammalian genes wasn't realized -- some genes still aren't available from MGC or any of the commercial human gene collections.&lt;br /&gt;&lt;br /&gt;MGC also torture-tested Codon's construction processes, and the original ones failed badly.  Our in-progress revisions fared much better, but still did not succeed as frequently as they should have.  When we could troubleshoot things, we could ascribe certain failures to almost every conceivable source -- bad enzymes, a bad oligo well, failure to follow procedures, laboratory mix-ups, etc.  But an awful lot could not be pinned to any cause, despite investigation, suggesting that we simply did not understand our system well enough to use it in a high-throughput production environment.&lt;br /&gt;&lt;br /&gt;I do know one thing: while I hope to stay where I am for a very long time, should I ever be looking for a job again I will avoid a production facility.  Some gene synthesis projects were worse than MGC in terms of demanding customers with tight timelines (which is no knock on the customers; now I'm that customer!), but even with MGC I found it's just not the right match for me.  It's no fun to burn so much effort on just getting something through the system so that somebody else can do the cool biology.  I don't ever want to be in a situation where I'm on vacation and thinking about which things are stalled in the line.  Some people thrive in the environment; I found it draining. &lt;br /&gt;&lt;br /&gt;But, there is something to be said for the experience.  I learned a lot which can be transferred to other settings.  
That which doesn't kill us makes us stronger -- MGC must have made me Superman.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-9182146484383837394?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/9182146484383837394/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=9182146484383837394' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/9182146484383837394'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/9182146484383837394'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/10/my-most-expensive-paper.html' title='My Most Expensive Paper'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-4735991589714223046</id><published>2009-10-26T00:39:00.002-04:00</published><updated>2009-10-26T00:44:02.662-04:00</updated><title type='text'>DTC CNVs?</title><content type='html'>Curiosity question: do the current DTC genomics companies report out copy number variations (CNVs) to their customers?  Are any of their technologies unable to read these?  Clearly Knome (or Illumina, which isn't DTC but sort of competing with them) should be able to get this info from the shotgun sequencing.  But what about the array-based companies such as Navigenics &amp; 23andMe?  
My impression is that any high density SNP array data can be mined for copy number info, but perhaps there are caveats or restrictions on that.  &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;It would seem that with CNVs so hot in the literature and a number of complex diseases being associated to them, this would be something the DTC companies would jump at.  But have they?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-4735991589714223046?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/4735991589714223046/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=4735991589714223046' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/4735991589714223046'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/4735991589714223046'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/10/dtc-cnvs.html' title='DTC CNVs?'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-4624918914570119248</id><published>2009-10-24T22:53:00.003-04:00</published><updated>2009-10-24T23:12:28.954-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='conferences'/><title type='text'>Now where did I misplace that genome segment of mine?</title><content type='html'>One of the many interesting ASHG tidbits from the Twitter feed is a 
&lt;a href="http://twitter.com/suganthibala/statuses/5130226737"&gt;comment from "suganthibala"&lt;/a&gt; which I'll quote in full&lt;br /&gt;&lt;blockquote&gt;On average we each are missing 123 kb. homozygously. An incomplete genome is the norm. What a goofy species we are.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;I'm horribly remiss in tracking the CNV literature, but this comment makes me wonder whether this is atypical at all.  How extensively has this been profiled in other vertebrate species and how do other species look in terms of the typical amount of genome missing?  I found two papers for dogs, &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/19015322"&gt;one of which features a former lab mate as senior author&lt;/a&gt; and the &lt;a href="http://www.ncbi.nlm.nih.gov/entrez/utils/fref.fcgi?PrId=3051&amp;itool=AbstractPlus-def&amp;uid=19129542&amp;nlmid=9518021&amp;db=pubmed&amp;url=http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=19129542"&gt;other one&lt;/a&gt; has Evan Eichler in the author list.  Some work has clearly been done in &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?term=17965714%2017206864%2018032724%2017989247[uid]%20AND%20pubmed%20pmc%20local[sb]%20AND%20loprovpmc[sb]&amp;log$=pmcad6_more"&gt;mouse as well&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;Presumably there is some data for Drosophila, but how extensive?  Are folks going through their collections of D. melanogaster collected from all over the world and looking for structural variation?  With a second gen sequencer, this would be straightforward to do -- though a lot of libraries would need to be prepped!  Many flies could be packed into one lane of Illumina data, so this would take some barcoding.  
Even cheaper might be to do it on a Polonator (reputed to cost about $500 in consumables per run, not including library prep).&lt;br /&gt;&lt;br /&gt;Attacking this by paired-end/mate-pair NGS rather than arrays (which have been the workhorse so far) would enable detecting balanced rearrangements, which arrays are blind to (though there is another &lt;a href="http://sciencepond.com/search/Eichler"&gt;tweeted item&lt;/a&gt; in which Eichler states "Folks you can't get this kind of information from nextgen sequencing; you need old-fashioned capillaries" -- I'd love to hear the background on that).  That leads to another proto-thought: will the study of structural variation lead to better resolution of the conundrum of speciation and changes in chromosome structure -- i.e. it's easy to see how such rearrangements could lead to reproductive isolation but not easy to see how they would be sufficiently non-isolating to allow for enough founders?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-4624918914570119248?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/4624918914570119248/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=4624918914570119248' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/4624918914570119248'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/4624918914570119248'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/10/now-where-did-i-misplace-that-genome.html' title='Now where did I misplace that genome segment of mine?'/><author><name>Keith 
Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-2585524258580019545</id><published>2009-10-24T00:06:00.004-04:00</published><updated>2009-10-24T00:13:46.195-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='conferences'/><title type='text'>ASHG Tweets: Minor Fix or Slow Torture?</title><content type='html'>Okay, I'll admit it: I've been ignoring Twitter.  It doesn't help that I never really learned to text (I might have sent one in my life).  Maybe if I ever get a phone with a real keyboard, but even then I'm not sure.  Live blogging from meetings seemed a bit interesting -- but in those tiny packets?  I even came up with a great post on Twitter -- alas a few days after the first of April, when it would have been appropriate.&lt;br /&gt;&lt;br /&gt;But now I've gotten myself hooked on the Twitter feed coming from attendees at the &lt;a href="http://twitter.com/#search?q=%23ASHG"&gt;American Society of Human Genetics&lt;/a&gt;.  It's an interesting mix -- some well established bloggers, lots of folks I don't know plus various vendors hawking their booths or off-conference tours and such.  Plus, you don't even need a Twitter account!&lt;br /&gt;&lt;br /&gt;The only real problem is that it's really making me wish I was there.  I've never been to Hawaii, despite a nearly lifelong interest in going.  And such a cool meeting!  But, you can't go to every meeting unless you're a journalist or event organizer (or sales rep!), so I had to stay home and get work done.  &lt;br /&gt;&lt;br /&gt;I suspect I'm hooked &amp; will be repeating this exercise whenever I miss good conferences.  Who knows? 
Maybe I'll catch the Twitter bug yet!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-2585524258580019545?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/2585524258580019545/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=2585524258580019545' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/2585524258580019545'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/2585524258580019545'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/10/ashg-tweets-minor-fix-or-slow-torture.html' title='ASHG Tweets: Minor Fix or Slow Torture?'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-2618026474784290662</id><published>2009-10-22T22:33:00.002-04:00</published><updated>2009-10-22T23:34:06.981-04:00</updated><title type='text'>Physical Maps IV: Twilight of the Clones?</title><content type='html'>I've been completely slacking on completing my self-imposed series on how second generation sequencing (I'm finally trying to kick the "next gen" term) might reshape the physical mapping of genomes.  It hasn't been that my brain has been ignoring the topic, but somehow I've not extracted the thoughts through my fingertips.  
And I've figured out part of the reason for my reticence -- my next installment was supposed to cover BACs and other clone-based maps, and I'm increasingly thinking these aren't going to be around much longer.&lt;br /&gt;&lt;br /&gt;Amongst the many ideas I turned over was how to adapt BACs to the second generation world.  BACs are very large segments -- often a few hundred kilobases -- cloned into low copy (generally single copy) vectors in E.coli. &lt;br /&gt;&lt;br /&gt;One approach would be to simply sequence the BACs.  One key challenge is that a single BAC is poorly matched to a second generation sequencer; even a single lane of a sequencer is gross overkill.  So good high-throughput multiplex library methods are needed.  Even so, there will be a pretty constant tax of resequencing the BAC vector and the inevitable contaminating host DNA in the prep.  That's probably going to run about 10% wastage -- not unbearable but certainly not pretty.&lt;br /&gt;&lt;br /&gt;Another type of approach is end-sequencing.  For this you really need long reads, so 454 is probably the only second generation machine suitable.  But, you need to smash down the BAC clone to something suitable for emulsion PCR.  I did see something in Biotechniques on a vectorette PCR to accomplish this, so it may be a semi-solved problem.  &lt;br /&gt;&lt;br /&gt;A complementary approach is to landmark the BACs, that is to identify a set of distinctive features which can be used to determine which BACs overlap. At the Providence conference one of the posters discussed getting 454 reads from defined restriction sites within a BAC. &lt;br /&gt;&lt;br /&gt;But, any of these approaches still requires picking the individual BACs and prepping DNA from them and performing these reactions.  
While converting to 454 might reduce the bill for the sequence generation, all that picking &amp; prepping is still going to be expensive.&lt;br /&gt;&lt;br /&gt;BACs' baby cousins are fosmids, which are essentially the same vector concept but designed to be packaged into lambda phage.  Fosmids carry approximately 40Kb of DNA.  I've already seen ads from Roche/454 claiming that their 20Kb mate pair libraries obviate the need for fosmids.  While 20Kb is only half the span, many issues that fosmids solve are short enough to be fixed by a 20Kb span, and the 454 approach enables getting lots of them.&lt;br /&gt;&lt;br /&gt;This is all well and good, but perhaps it's time to look just a little bit further ahead.  Third generation technologies are getting close to reality (those who have early access to Pacific Biosciences machines might claim they are reality).  Some of the nanopore systems detailed in Rhode Island are clearly far away from being able to generate sequences you would believe.  However, physical mapping is a much less demanding application than trying to generate a consensus sequence or identify variants.  Plenty of times in my early career it was possible using BLAST to take amazingly awful EST sequences and successfully map them against known cDNAs.  &lt;br /&gt;&lt;br /&gt;Now, I don't have any inside information on any third generation systems.  But, I'm pretty sure I saw a claim that Pacific Biosciences has gotten reads close to 20Kb. Now, this could have been a "magic read" where all the stars were aligned.  But imagine for a moment if this technology can routinely hit such lengths (or even longer) -- albeit with quality that makes it unusable for true sequencing but sufficient for aligning to islands of sequence in a genome assembly.  
If such a technology could generate sufficient numbers of such reads in reasonable time, the 454 20Kb paired libraries could start looking like buggy whips.&lt;br /&gt;&lt;br /&gt;Taking this logic even further, suppose one of the nanopore technologies could really scan very long DNAs, perhaps 100Kb or more.  Perhaps the quality is terrible, but again, that is fine as long as it's just good enough.  For example, suppose the error rate was 15%, or a phred 8 score.  AWFUL!  But, in a sequence of 10,000 (standing for the size of a fair-sized sequence island in an assembly) you'd expect to find nearly 3 runs of 50 correct bases (0.85^50 is roughly 0.0003 per starting position).  Clearly some clever algorithmics would be required (especially since with nanopores you don't know which direction the DNA is traversing the pore), but this would suggest that some pretty rotten sequencing could be used to order sequence islands along long reads.  &lt;br /&gt;&lt;br /&gt;Yet another variant on this line of thinking would be to use nanopores to read defined sequence landmarks from very long fragments.  Once you have an initial assembly, a set of unique sequences can be selected for synthesis on microarrays.  While PCR is required to amplify those oligos, it also offers an opportunity to subdivide the huge pool.  Furthermore, with sufficiently long oligos on the chip one could even have multiple universal primer targets per oligo, enabling a given landmark to be easily placed in multiple orthogonal pools.  With an optical nanopore reading strategy, 4 or more color-coded pools could be hybridized simultaneously and read.  Multiple colors might be used for more elaborate coding of sequence islands -- i.e. one island might be encoded with a series of flashing lights, much like some lighthouses.  Again, clever algorithmics would be needed to design such probe strategies.  &lt;br /&gt;&lt;br /&gt;How far away would such ideas be?  Someone more knowledgeable about the particular technologies could guess better than I could.  
But, it would certainly be worth exploring, at least on paper, for anyone wanting to show that nanopores are close to prime time.  While really low quality reads or just landmarking molecules might not seem exciting, it would offer a chance to get the technology into routine operation -- and from such routine operation comes continuous improvement.  In other words, the way to push nanopores into routine sequencing might be by carefully picking something other than sequence -- but making sure that it is a path to sequencing and not a detour.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-2618026474784290662?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/2618026474784290662/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=2618026474784290662' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/2618026474784290662'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/2618026474784290662'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/10/physical-maps-iv-twilight-of-clones.html' title='Physical Maps IV: Twilight of the Clones?'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total 
xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-2319834782608363012</id><published>2009-10-14T23:17:00.002-04:00</published><updated>2009-10-14T23:57:27.397-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='genome sequencing'/><title type='text'>Why I'm Not Crazy About The Term "Exome Sequencing"</title><content type='html'>I find myself worrying sometimes that I worry too much about the words I use -- and worry some of the rest of the time that I don't worry enough.  What can seem like the right words at one time might seem wrong some other time.  The term "&lt;a href="http://www.genomeweb.com/search/google?cx=001523166877881412738:giguif19v6c&amp;cof=FORID:11&amp;query=killer+app&amp;op=Search&amp;form_build_id=form-e20537b5c2c535c4b4d00720841ac9e4&amp;form_token=910030c86f3d9349f727ec5cac30cb24&amp;form_id=google_cse_results_searchbox_form#978"&gt;killer app&lt;/a&gt;" is thrown around a lot in the tech space, but would you really want to hear it used about sequencing a genome if you were the patient whose DNA was under scrutiny?&lt;br /&gt;&lt;br /&gt;One term that sees a lot of traction these days is "exome sequencing".  I listened in on a free &lt;a href="http://w.on24.com/r.htm?e=165328&amp;s=1&amp;k=672F81556EC6B80FF26A54B43FE5554B"&gt;Science magazine webinar&lt;/a&gt; today on the topic, and the presentations were all worthwhile.  The focus was on the Nimblegen capture technology (Roche/Nimblegen/454 sponsored the webinar), though other technologies were touched on.&lt;br /&gt;&lt;br /&gt;By "exome sequencing" what is generally meant is to capture &amp; sequence the exons in the human genome in order to find variants of interest.  Exons have the advantage of being much more interpretable than non-coding sequences; we have some degree of theory (though quite incomplete) which enables prioritizing these variants.  
The approach also has the advantage of being significantly cheaper at the moment than whole genome sequencing (one speaker estimated $20K per exome).  So what's the problem?&lt;br /&gt;&lt;br /&gt;My concern is that the term "exome sequencing" is taken a bit too literally.  Now, it is true that these approaches catch a bit of surrounding DNA due to library construction and the targeting approaches cover splice junctions, but what about some of the other important sequences?  According to my poll of practitioners of this art, their targets are entirely exons (confession: N=1 for the poll).  &lt;br /&gt;&lt;br /&gt;I don't have a general theory for analyzing non-coding variants, but conversely there are quite a few well annotated non-coding regions of functional significance.  An obvious case is promoters.  Annotation of human promoters and enhancers and other transcriptional doodads is an ongoing process, but some have been well characterized.  In particular, the promoters for many drug metabolizing enzymes have been scrutinized because these may have significant effects on how much of the enzyme is synthesized and therefore drug metabolism. &lt;br /&gt;&lt;br /&gt;Partly coloring my concern is the fact that exome sequencing kits are becoming standardized; at least two are on the market currently.  Hence, the design shortcomings of today might influence a lot of studies.  Clearly sequencing every last candidate promoter or enhancer would tend to defeat the advantages of exome sequencing, but I believe a reasonable shortlist of important elements could be rapidly identified.&lt;br /&gt;&lt;br /&gt;My own professional interest area, cancer genomics, adds some additional twists.  At least one major cancer genome effort (at the Broad) is using exome sequencing.  On the one hand, it is true that there are relatively few recurrent, focused non-coding alterations documented in cancer.  However, few is not none.  
For example, in lung cancer the c-Met oncogene has been documented to be &lt;a href="http://cancerres.aacrjournals.org/cgi/content/full/63/19/6272"&gt;activated by mutations within an intron&lt;/a&gt;; these mutations cause skipping of an exon encoding an inhibitory domain.  Some of these alterations are about 50 nucleotides away from the nearest splice junction -- a distance that is likely to result in low or no coverage using the &lt;a href="http://www.ncbi.nlm.nih.gov/entrez/utils/fref.fcgi?PrId=3494&amp;itool=AbstractPlus-nondef&amp;uid=19182786&amp;nlmid=9604648&amp;db=pubmed&amp;url=http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&amp;pubmedid=19182786"&gt;Broad's in-solution capture technology&lt;/a&gt; (confession #2: I haven't verified this with data from that system).  &lt;br /&gt;&lt;br /&gt;The drug metabolizing enzyme promoters I mentioned before are a bit greyer for cancer genomics.  On the one hand, one is generally primarily interested in what somatic mutations have occurred in the tumor.  On the other hand, the norm in cancer genomics is tending towards applying the same approach to normal (cheek swab or lymphocyte) DNA from the patient, and why not get the DME promoters too?  After all, these variants may have &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/19823875?ordinalpos=1&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum"&gt;influenced the activity of therapeutic agents or even development of the disease&lt;/a&gt;.  Just as some somatic mutations seem to cluster enigmatically with patient characteristics, perhaps some somatic mutations will correlate with germline variants which contributed to disease initiation.&lt;br /&gt;&lt;br /&gt;Whatever my worries, they should be time-limited.  Exome sequencing products will be under extreme pricing pressure from whole genome sequencing.  
The $20K cited (probably using 454 sequencing) is already potentially matched by one vendor (Complete Genomics).  Now, in general the cost of capture will probably be a relatively small contributor compared to the cost of data generation, so exome sequencing will ride much of the same cost curve as the rest of the industry.  But, it probably is $1-3K for whole exome capture due to the multiple chips required and the labor investment (anyone have a better estimate?).  If whole mammalian genome sequencing really can be pushed down into the $5K range, then mammalian exome sequencing will not offer a huge cost advantage if any.  I'd guess interest in mammalian exome sequencing will peak in a year or two, so maybe I should stop worrying and learn to love the hyb.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-2319834782608363012?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/2319834782608363012/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=2319834782608363012' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/2319834782608363012'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/2319834782608363012'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/10/why-im-not-crazy-about-term-exome.html' title='Why I&apos;m Not Crazy About The Term &quot;Exome Sequencing&quot;'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total 
xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-5739116942368040164</id><published>2009-10-09T14:32:00.007-04:00</published><updated>2009-10-09T14:39:19.756-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='administration'/><title type='text'>Bad blog!  Bad, bad, bad blog!</title><content type='html'>Thanks to Dan Koboldt from Mass Genomics, I've discovered that another blog (the &lt;a href="http://medicalcenterinfo.com/"&gt;Oregon Personal Injury Law Blog&lt;/a&gt;) had copied my breast cancer genome piece.  Actually, it appears that since it started this summer it may have copied every one of my posts here at &lt;a href="http://omicsomics.blogspot.com"&gt;Omics! Omics!&lt;/a&gt; without any attribution or apparent linking back.  I've left a comment (which is moderated) protesting this.&lt;br /&gt;&lt;br /&gt;Curiously, the author of this blog (I assume it has one) doesn't seem to have left any identifying information or contact info, so for the moment the comments section is my only way of communicating.  Perhaps this is some sort of weird RSS-driven bug; that's the only charitable explanation I can contemplate.  But it is strange -- most of these have no possible link to personal injury -- or can PNAS sue me for complaining about their RSS feed?&lt;br /&gt;&lt;br /&gt;We'll see if the author fixes this, or at least replies with something along the lines of "head down, ears flat &amp; tail between the legs".  &lt;br /&gt;&lt;br /&gt;Just to double-check the RSS hypothesis, I'm actually going to explicitly sign this one -- Keith Robison from &lt;a href="http://omicsomics.blogspot.com"&gt;Omics! 
Omics!&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-5739116942368040164?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/5739116942368040164/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=5739116942368040164' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/5739116942368040164'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/5739116942368040164'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/10/bad-blog-bad-bad-bad-blog.html' title='Bad blog!  Bad, bad, bad blog!'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-603880617736730152</id><published>2009-10-09T00:06:00.004-04:00</published><updated>2009-10-09T00:53:09.757-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='proteomics'/><category scheme='http://www.blogger.com/atom/ns#' term='metagenomics'/><title type='text'>Nano Anglerfish Snag Orphan Enzymes</title><content type='html'>The new Science has an extremely impressive paper tackling the problem of orphan enzymes.  Due primarily to Watson-Crick basepairing, our ability to sequence nucleic acids has shot far past our ability to characterize the proteins they may encode.  
If I want to measure an RNA's expression, I can generate an assay almost overnight by designing specific real-time PCR (aka RT-PCR aka TaqMan) probes.  If I want to analyze any specific protein's expression, it generally involves a lot of teeth gnashing &amp; frustration.  If you're lucky, there is a good antibody for it -- but most times there is either no antibody or one of unknown (and probably poor) character.  Mass spec based methods continue to improve, but still don't have an "analyze any protein in any biological sample anytime" character (yet?).&lt;br /&gt;&lt;br /&gt;One result of this is that there are a lot of ORFs of unknown function in any sequenced genome.  Bioinformatic approaches can make guesses for many of these and those guesses are often around enzymatic activity, but a bioinformatic prediction is not proof and the predictions are often quite vague (such as "hydrolase").  Structural genomics efforts sometimes pull in additional proteins whose sequence didn't resemble anything of known function, but whose structure has enzymatic characteristics such as nucleotide binding pockets.  There have been one or two such structures de-orphaned by virtual screening, but these are a rarity.&lt;br /&gt;&lt;br /&gt;Attempts have been made at high-throughput screening of enzyme activities.  For example, several efforts have been published in which cloned libraries of proteins from a proteome were screened for enzyme activity.  While these produced initial papers, they've never seemed to really catch fire.&lt;br /&gt;&lt;br /&gt;The new paper is audacious in providing an approach to detecting enzyme activities and subsequently identifying the responsible proteins, all from protein extracts.  The key trick is an array of golden nano anglerfish -- well, that's how I imagine it.  Like an anglerfish, the gold nanoparticles dangle their chemical baits off long spacers (poly-A, of all things!).  
In reverse of an anglerfish, the bait complex glows &lt;span style="font-style:italic;"&gt;after&lt;/span&gt; it has been taken by its prey, with a clever unquenching mechanism activating the fluorophore and marking that a reaction took place.  But the real kicker is that like an anglerfish, the nanoparticles seize their prey!  Some clever chemistry around a bound cobalt ion (which I won't claim to understand) results in linking the enzyme to the nanoparticle, from which it can be cleaved, trypsinized and identified by mass spectrometry.  1676 known metabolites and 807 other compounds of interest were immobilized in this fashion.  &lt;br /&gt;&lt;br /&gt;As one test, the researchers separately applied extracts of the bacteria Pseudomonas putida and Streptomyces coelicolor to arrays.  Results were in quite strong agreement with the existing bioinformatic annotations of these organisms, in that the P.putida extract's pattern of metabolized and not metabolized substrates strongly coincided with what the informatics would predict and the same was true for S.coelicolor (with a P&lt;5.77^-177 for the latter!). But, agreement was not perfect -- each species catalyzed additional reactions on the array which were absent from the databases.  By identifying the bound proteins, numerous assignments were made which were either novel or significant refinements of the prior annotation. Out of 191 proteins identified in the P.putida set, 31 hypothetical proteins were assigned function, 47 proteins were assigned a different function and the previously ascribed function was confirmed for the remaining 113 proteins.&lt;br /&gt;&lt;br /&gt;Further work was done with environmental samples.  However, given the low protein abundance from such samples, these were converted into libraries cloned into E.coli and then the extracts from these E.coli strains analyzed.  
Untransformed E.coli was used to estimate the backgrounds to subtract -- I must confess a certain disappointment that the paper doesn't report any novel activities for E.coli, though it isn't clear that they checked for them (but how could you not!).  The samples came from three extreme environments -- one from a hot, heavy metal rich acidic pool, one from oil-contaminated seawater and a third from a deep sea hypersaline anoxic region.  From each sample a plethora of enzyme activities were discovered.&lt;br /&gt;&lt;br /&gt;Of course, there are limits to this approach.  The tethering mechanism may interfere with some enzymes acting on their substrates.  It may, therefore, be desirable to place some compounds multiple times on the array but with the linker attached at different points.  It is unlikely we know all possible metabolites (particularly for strange bugs from strange places), so some enzymes can't be deorphaned this way.  And sensitivity issues may challenge finding some enzyme activities if very few copies of the enzyme are present.&lt;br /&gt;&lt;br /&gt;On the other hand, as long as these issues are kept in mind this is an unprecedented &amp; amazing haul of enzyme annotations.  Application of this method to industrially important fungi &amp; yeasts is another important area, and certainly only the bare surface of the bacterial world was scratched in this paper.  Arrays with additional unnatural -- but industrially interesting -- substrates are hinted at in the paper.  Finally, given the reawakened interest in small molecule metabolism in higher organisms &amp; their diseases (such as cancer), application of this method to human samples can't be far behind.  
&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Science&amp;rft_id=info%3A%2F10.1126%2Fscience.1174094&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Reactome+array%3A+Forging+a+link+between+metabolome+and+genome&amp;rft.issn=&amp;rft.date=2009&amp;rft.volume=326&amp;rft.issue=5950&amp;rft.spage=252&amp;rft.epage=257&amp;rft.artnum=http%3A%2F%2Fwww.sciencemag.org%2Fcgi%2Fcontent%2Fabstract%2F326%2F5950%2F252&amp;rft.au=Ana+Beloqui&amp;rft.au=Mar%C3%ADa-Eugenia+Guazzaroni&amp;rft.au=Florencio+Pazos&amp;rft.au=Jos%C3%A9+M.+Vieites&amp;rft.au=Marta+Godoy&amp;rft.au=Olga+V.+Golyshina%2C&amp;rft.au=Tatyana+N.+Chernikova&amp;rft.au=Agnes+Waliczek&amp;rft.au=Rafael+Silva-Rocha&amp;rft.au=Yamal+Al-ramahi&amp;rft.au=Violetta+La+Cono&amp;rft.au=Carmen+Mendez&amp;rft.au=Jos%C3%A9+A.+Salas&amp;rft.au=Roberto+Solano&amp;rft.au=Michail+M.+Yakimov&amp;rft.au=Kenneth+N.+Timmis&amp;rft.au=Peter+N.+Golyshin&amp;rft.au=Manuel+Ferrer&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CChemistry%2CBiotechnology%2C+Biochemistry%2C+Bioinformatics%2C+Microbiology+%2C+Chemical+Biology%2C+Biochemistry%2C+Biological+Chemistry"&gt;Ana Beloqui, María-Eugenia Guazzaroni, Florencio Pazos, José M. Vieites, Marta Godoy, Olga V. Golyshina, Tatyana N. Chernikova, Agnes Waliczek, Rafael Silva-Rocha, Yamal Al-ramahi, Violetta La Cono, Carmen Mendez, José A. Salas, Roberto Solano, Michail M. Yakimov, Kenneth N. Timmis, Peter N. Golyshin, &amp; Manuel Ferrer (2009). 
Reactome array: Forging a link between metabolome and genome &lt;span style="font-style: italic;"&gt;Science, 326&lt;/span&gt; (5950), 252-257 : &lt;a rev="review" href="http://www.sciencemag.org/cgi/content/abstract/326/5950/252"&gt;10.1126/science.1174094&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-603880617736730152?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/603880617736730152/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=603880617736730152' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/603880617736730152'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/603880617736730152'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/10/nano-anglerfish-snag-orphan-enzymes.html' title='Nano Anglerfish Snag Orphan Enzymes'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-7615799363377654638</id><published>2009-10-07T23:37:00.002-04:00</published><updated>2009-10-08T00:29:56.426-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cancer'/><title type='text'>The genomic history of a breast cancer revealed</title><content type='html'>Today's Nature contains a great paper which is one more step forward for cancer genomics.  
Using Illumina sequencing, a group in British Columbia sequenced both the genome and transcriptome of a metastatic lobular (estrogen receptor positive) breast cancer.  Furthermore, they searched a sample of the original tumor for mutations found in the genome+transcriptome screen in order to identify those that may have been present early vs. those which were acquired later.&lt;br /&gt;&lt;br /&gt;From the combined genome sequence and RNA-Seq data they found 1456 non-synonymous changes, which were then trimmed to 1178 after removing pseudogenes and HLA sequences.  1120 of these could be re-assayed by Sanger sequencing of PCR amplicons from both normal DNA and the metastatic samples -- 437 of these were confirmed.  Most of these (405) were found in the normal sample.  Of the 32 remaining, 2 were found only in the RNA-Seq data, a point addressed below.  Strikingly, none of the mutated genes were found in the previous whole-exome sequencing (by PCR+Sanger) of breast cancer, though those samples were of a different subtype (estrogen receptor negative).&lt;br /&gt;&lt;br /&gt;There are a bunch of cool tidbits in the paper, which I'm sure I won't do full justice to here but I'll do my best.  For example, several other papers using RNA-Seq on solid cancers have identified fusion proteins, but in this paper none of the fusion genes suggested by the original sequencing came through their validation process.  Most of the coding regions with non-synonymous mutations have not been seen to be mutated before in breast cancer, though ERBB2 (HER2, the target of Herceptin) is in the list along with PALB2, a gene which when mutated predisposes individuals to several cancers (and is also associated with BRCA2).  
The algorithm (SNVMix) used for SNP identification &amp; frequency estimation is a good example of an &lt;a href="http://omicsomics.blogspot.com/2007/10/scientific-easter-eggs.html"&gt;easter egg&lt;/a&gt;, a supplementary item that could easily be its own paper.&lt;br /&gt;&lt;br /&gt;One great little story is HAUS3.  This was found to have a truncating stop codon mutation and the data suggests that the mutation is homozygous (but at normal copy number) in the tumor.  A further screen of 192 additional breast cancers (112 lobular and 80 ductal) for several of the mutations found no copies of the same hits seen in this sample, but two more truncating mutations in HAUS3 were found (along with 3 more variations in ERBB2 within the kinase domain, a hotspot for cancer mutations).  HAUS3 is particularly interesting because until about a year ago it was just C4orf15, an anonymous ORF on chromosome 4.  Several papers have recently described a complex ("augmin") which plays a role in genome stability, and HAUS3 is a component of this complex.  This starts smelling like a tumor suppressor (truncating mutations seen repeatedly; truncating mutation homozygous in tumor; protein involved in a function often crippled in cancer), and I'll bet HAUS3 will be showing up in some functional studies in the not too distant future.&lt;br /&gt;&lt;br /&gt;Resequencing of the primary tumor was performed using amplicons targeting the mutations found in the metastatic tumor.  These amplicons were small enough to be spanned directly by paired-end Illumina reads, obviating the need for library construction (a trick which has shown up in some other papers).  By using Illumina sequencing for this step, the frequency of the mutation in the sample could be estimated.  It is also worth noting that the primary tumor sample was a Formalin Fixed Paraffin Embedded slide, a way to preserve histology which is notoriously harsh on biomolecules and prone to sequencing artifacts. 
Appropriate precautions were taken, such as sequencing two different PCR amplifications from two different DNA extractions.  The sequencing of the primary tumor suggests that only 10 of the mutations were present there, with only 4 of these showing a frequency consistent with being present in the primary clone and the others probably being minor components.  This is another important filter to suggest which genes are candidates for being involved in early tumorigenesis and which are more likely late players (or simply passengers).&lt;br /&gt;&lt;br /&gt;One more cool bit I parked above: the 2 variants seen only in the RNA-Seq library.  This suggested RNA editing and, consistent with this, an RNA editase (ADAR) was found to be highly represented in the RNA-Seq data.  Two genes (COG3 and SRP9) showed high frequency editing.  RNA editing is beginning to be recognized as a widespread phenomenon in mammals (e.g. the nice work by Jin Billy Li in the Church lab); the possibility that cancers can hijack this for nefarious purposes should be an interesting avenue to explore.  COG3 is a Golgi protein &amp; &lt;a href="http://www.nature.com/nature/journal/v459/n7250/full/nature08109.html"&gt;links of the Golgi to cancer&lt;/a&gt; are starting to be teased out.  SRP9 is part of the signal recognition particle involved in protein translocation into the ER -- which of course feeds the Golgi.  Quite possibly this is coincidental, but it certainly rates investigating.&lt;br /&gt;&lt;br /&gt;One final thought: the next year will probably be filled with a lot of similar papers.  Cancer genomics is &lt;a href="http://www.genomeweb.com/cancer-genome-atlas-gets-275m-funding-stimulus-nci-and-nhgri"&gt;gearing up in a huge way&lt;/a&gt;, with &lt;a href="http://www.massgenomics.org/2009/08/wucgi-washu-cancer-genomics-initiative.html"&gt;Wash U alone planning 150 genomes&lt;/a&gt; well before a year from now.  
It seems unlikely that those 150 genomes will end up as 150 distinct papers; moreover, it will be a challenge to do the level of follow-up in this paper on such a grand scale.  A real challenge to the experimental community -- and the funding establishment -- is converting the tantalizing observations which will come pouring out of these studies into validated biological findings.  With a little luck, biotech &amp; pharma companies (such as my employer) will be able to convert those findings into new clinical options for doctors and patients.  &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3A%2F10.1038%2Fnature08489&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Mutational+evolution+in+a+lobular+breast+tumor+profiled+at+single+nucleotide+resolution&amp;rft.issn=&amp;rft.date=2009&amp;rft.volume=461&amp;rft.issue=&amp;rft.spage=809&amp;rft.epage=813&amp;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fnature%2Fjournal%2Fv461%2Fn7265%2Fabs%2Fnature08489.html&amp;rft.au=Sohrab+P.+Shah&amp;rft.au=Ryan+D.+Morin&amp;rft.au=Jaswinder+Khattra&amp;rft.au=Leah+Prentice&amp;rft.au=Trevor+Pugh&amp;rft.au=Angela+Burleigh&amp;rft.au=Allen+Delaney&amp;rft.au=Karen+Gelmon&amp;rft.au=Ryan+Guliany&amp;rft.au=Janine+Senz&amp;rft.au=Christian+Steidl&amp;rft.au=Robert+A.+Holt&amp;rft.au=Steven+Jones&amp;rft.au=Mark+Sun&amp;rft.au=Gillian+Leung&amp;rft.au=Richard+Moore&amp;rft.au=Tesa+Severson&amp;rft.au=Greg+A.+Taylor&amp;rft.au=Andrew+E.+Teschendorff&amp;rft.au=Kane+Tse&amp;rft.au=Gulisa+Turashvili&amp;rft.au=Richard+Varhol&amp;rft.au=Ren%C3%A9+L.+Warren&amp;rft.au=Peter+Watson&amp;rft.au=
Yongjun+Zhao&amp;rft.au=Carlos+Caldas&amp;rft.au=David+Huntsman&amp;rft.au=Martin+Hirst&amp;rft.au=Marco+A.+Marra&amp;rft.au=Samuel+Aparicio&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CCancer%2C+Genetics"&gt;Sohrab P. Shah, Ryan D. Morin, Jaswinder Khattra, Leah Prentice, Trevor Pugh, Angela Burleigh, Allen Delaney, Karen Gelmon, Ryan Guliany, Janine Senz, Christian Steidl, Robert A. Holt, Steven Jones, Mark Sun, Gillian Leung, Richard Moore, Tesa Severson, Greg A. Taylor, Andrew E. Teschendorff, Kane Tse, Gulisa Turashvili, Richard Varhol, René L. Warren, Peter Watson, Yongjun Zhao, Carlos Caldas, David Huntsman, Martin Hirst, Marco A. Marra, &amp; Samuel Aparicio (2009). Mutational evolution in a lobular breast tumor profiled at single nucleotide resolution &lt;span style="font-style: italic;"&gt;Nature, 461&lt;/span&gt;, 809-813 : &lt;a rev="review" href="10.1038/nature08489"&gt;10.1038/nature08489&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-7615799363377654638?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/7615799363377654638/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7615799363377654638' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/7615799363377654638'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/7615799363377654638'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/10/genomic-history-of-breast-cancer.html' title='The genomic history of a breast cancer revealed'/><author><name>Keith 
Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-8434385511641647816</id><published>2009-10-06T23:46:00.002-04:00</published><updated>2009-10-07T00:06:29.833-04:00</updated><title type='text'>Diagramming the Atari Pathway</title><content type='html'>Okay, it was an outside speaker at work who planted this seed in my brain, and now I can't shake the image -- but perhaps by writing this I will (but also perhaps I will infect my loyal readers with it).&lt;br /&gt;&lt;br /&gt;The stated observation was that some biological pathway diagrams "look like Space Invaders".  Now, I hold such games dear to my heart -- they were quite the rage in our neighborhood growing up, though we didn't own one &amp; I was never very good.  Nowadays one can buy replicas which play many of the old games -- except the entire system fits inside the replica of the old joysticks.  My hardware-oriented brother loves to point out all the interesting workarounds which are now fossilized in these players -- such as limits on the number of moving graphics ("sprites") which could occupy a scan line.&lt;br /&gt;&lt;br /&gt;But which video game seems to be the model for some of these diagrams?  Space Invaders is an obvious candidate (or one of the knockoffs or follow-ons such as Galaga), but my old favorite Centipede (or its successor Millipede) is even closer -- they even had spiders trying to spin webs.&lt;br /&gt;&lt;br /&gt;It would be a pretty funny visual joke -- saved for precisely the right time (the wrong time could be disaster!) -- to have a pathway display morph into a game.  
The transcription factors start moving about and crashing into the kinases which in turn blast away at the receptors.  &lt;br /&gt;&lt;br /&gt;Versions of the reverse have sometimes occupied my mind -- what if we could make scientific programs more game-like?  The notion I most commonly ponder is a flight simulator for protein structures.  Even that could be taken to another level -- your X-wing is flying down a canyon of the giant structure, ready to unleash a boronic warhead to destroy the evil proteasomic death star!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-8434385511641647816?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/8434385511641647816/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=8434385511641647816' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/8434385511641647816'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/8434385511641647816'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/10/diagramming-atari-pathway.html' title='Diagramming the Atari Pathway'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-8371831931300523784</id><published>2009-10-06T00:04:00.002-04:00</published><updated>2009-10-06T00:10:18.218-04:00</updated><title type='text'>Why does PNAS clip their RSS 
feeds?</title><content type='html'>Okay, minor pet peeve.  I've pretty much switched over to using Outlook as an RSS reader to keep up with journals of interest.  I still get a few ToC by email, but the RSS mechanism has lots of advantages.  First, I'm in Outlook all the time, so it's a natural place.  Second, I can leave behind copies of the papers of interest, with all the tools in Outlook for moving them or tagging them &amp; such.  One minor annoyance is you can't (at least as far as I can tell) force a scan of the RSS feeds.  Sure, mostly this is obsessive or time-killing, but when you have intermittent net access it's really handy.&lt;br /&gt;&lt;br /&gt;But there is one big difference among the feeds.  Most ToC feeds send out one entry per article and that entry contains the title, authors &amp; abstract.  But PNAS sends out only the authors, title &amp; a very short head end of the abstract.  Aaaaarrrrrgggghhhh!   Lost is much of the ability to vet my level of interest in an article plus the additional keywords which would enhance searching for it.&lt;br /&gt;&lt;br /&gt;I realize PNAS is already busy with torquing their acceptance channels, but could someone who knows someone there in power please get them to fix this?!!?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-8371831931300523784?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/8371831931300523784/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=8371831931300523784' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/8371831931300523784'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/8371831931300523784'/><link 
rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/10/why-does-pnas-clip-their-rss-feeds.html' title='Why does PNAS clip their RSS feeds?'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-8780865675423049126</id><published>2009-10-01T22:19:00.003-04:00</published><updated>2009-10-01T22:49:52.648-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='conferences'/><title type='text'>Pondering Polonators</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_N2AOZejgjyA/SsVmR4RoSoI/AAAAAAAAAEU/AotDROtd4rc/s1600-h/DSC01984.JPG"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://3.bp.blogspot.com/_N2AOZejgjyA/SsVmR4RoSoI/AAAAAAAAAEU/AotDROtd4rc/s400/DSC01984.JPG" border="0" alt="" id="BLOGGER_PHOTO_ID_5387824986568804994" /&gt;&lt;/a&gt;&lt;br /&gt;Standing next to the Polonator like a proud relative is Kevin McCarthy, who leads the Polonator effort at Dover Systems.  I had remembered him giving permission to photograph it at the first day of the Providence meeting &amp; brought my camera along the second day.  When I mentioned it was for my blog, Kevin leaped into the frame.  All in good fun!&lt;br /&gt;&lt;br /&gt;The Polonator is an intriguing gadget.  No other next-gen sequencer can be had for under $200K -- or about 1/2 to 1/4 the price of any of the other instruments.  
But it's no tinfoil-and-paperclip contraption -- not only does it look very solid &amp; professional, with everything laid out neatly in the cabinet, but in one small test it was quite robust.  Kevin had it running mock sequencing cycles and he said "if you put your hand on the stage".  I thought he was being hypothetical, but then he politely insisted I do just that. Clearly he wasn't worried about anything going wrong (and somehow I was convinced my hand would emerge unscathed!).  &lt;a href="http://omicsomics.blogspot.com/2009/09/chi-next-gen-conference-day-1.html"&gt;In his talk&lt;/a&gt;, Kevin pointed out the various vibration isolation schemes engineered in -- you need not tiptoe past it during operation, despite the fact that it is doing some amazingly high-precision imaging.&lt;br /&gt;&lt;br /&gt;The truly intriguing angle on the Polonator is that it is a completely open architecture.  If you want to play around with different chemistries, go ahead (but please respect appropriate licenses!).  I'm guessing you could probably run any of the existing amplification-based chemistries on it (again, licenses might be an issue) -- presumably with a loss of performance.  Of course, with 454 you need continuous watching of a small bit of the flowcell, so the machine isn't ideal.  But that isn't the point -- you could use this as a general hardware &amp; software chassis to experiment.  I &lt;a href="http://omicsomics.blogspot.com/2009/09/chi-next-gen-conference-day-2.html"&gt;speculated previously that some new sequencing-by-synthesis chemistries&lt;/a&gt; could be run on the Polonator, and on further reflection I'm wondering if the optical-based nanopore scheme could be prototyped on a Polonator.  
NAR published earlier this year &lt;a href="http://nar.oxfordjournals.org/cgi/content/full/37/1/e5?maxtoshow=&amp;HITS=10&amp;hits=10&amp;RESULTFORMAT=&amp;fulltext=cyclic&amp;searchid=1&amp;FIRSTINDEX=0&amp;resourcetype=HWCIT"&gt;another proposed chemistry&lt;/a&gt; that would seem Polonator-friendly.  &lt;br /&gt;&lt;br /&gt;If you wish to reprogram the fluidics, go ahead!  If you wish to image only in 1 color (the default chemistry requires 4), that's programmable.  &lt;span style="font-style:italic;"&gt;Everything&lt;/span&gt; is programmable.&lt;br /&gt;&lt;br /&gt;That's pretty enticing from a techie angle, but it's also a pretty risky business strategy.  Generally such an expensive gadget is either paid for with a hefty markup up front and/or a hefty premium on reagents.  But, while standard reagent kits are on the way, there's nothing proprietary about them.  Anyone can whip up their own.  Just like the hardware &amp; software, the wetware is all open as well.&lt;br /&gt;&lt;br /&gt;There's also the issue of the current chemistry, which appears to be the original Church lab sequencing-by-ligation scheme.  That means a bunch of sample prep steps and very short reads -- &lt;a href="http://www.polonator.org/protocols/polony.aspx"&gt;26 nucleotides of tag&lt;/a&gt;.  The tags are derived from the original sequence in a predictable way, but the result isn't quite like getting two simple paired-end or mate-pair reads.  That may be a barrier to many software toolsmiths including Polonator support in their code, though perhaps with wide acceptance that would happen.  But, with 10Gbases of data after 80 hours of running, it may attract some attention!&lt;br /&gt;&lt;br /&gt;I'm trying to figure out how I would use one if I had one.  In the abstract sense, polony sequencing has already been shown quite capable of sequencing bacterial genomes.  Also, Complete Genomics' chemistry generates reads in the same ballpark and they are tackling human.  
But would I have the courage to try that?  Certainly in my current professional situation it would be going out a bit on a limb.  Plus, even at under $200K it really needs to be kept busy to look like a good buy.  Does almost make me wish I was back in graduate school, as that is the time to experiment with such cool toys!&lt;br /&gt;&lt;br /&gt;On the other hand, I do have some notions of what I might try out on one.  Not enough notions to be able to justify buying one, but certainly if I could rent some time on one at a reasonable price I'd jump at the notion.  With luck, a service provider or two will decide to offer Polonating as a service.  Or, perhaps someone who has bought one might be interested in collaborating on some interesting clinically-relevant projects?  If so, leave me a comment here (which I won't make visible) &amp; we can talk!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-8780865675423049126?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/8780865675423049126/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=8780865675423049126' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/8780865675423049126'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/8780865675423049126'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/10/pondering-polonators.html' title='Pondering Polonators'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' 
value='08368724497474381730'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_N2AOZejgjyA/SsVmR4RoSoI/AAAAAAAAAEU/AotDROtd4rc/s72-c/DSC01984.JPG' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-7910595058025295165</id><published>2009-09-28T22:31:00.003-04:00</published><updated>2009-09-28T23:08:21.108-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='structural biology'/><category scheme='http://www.blogger.com/atom/ns#' term='evolution'/><title type='text'>Locking in new functions</title><content type='html'>The September 24th Nature came in the mail today and, as always with this journal (otherwise I wouldn't pay for it!), it is full of interesting stuff.   One paper of particular interest is a cool merger of evolution, computational biology, structural biology and protein engineering.&lt;br /&gt;&lt;br /&gt;An interesting question in evolution is to what degree changes are reversible.  In the simplest case, of purely neutral characteristics, the answer would seem to be largely that they are.  However, even a purely neutral change will have a certain probability of reverting.  For example, since transversions (mutation of a pyrimidine to a purine or vice versa) are less common than transitions (purine-&gt;purine or pyrimidine-&gt;pyrimidine), a C-&gt;G mutation (transversion) is less likely to return to C than a C-&gt;T (transition).  Similarly, if a C is methylated but that methylation serves no purpose, the methylation will favor conversion to a T, but the T has no such biochemical slanting to mutate to a C.  But even these will be small changes.&lt;br /&gt;&lt;br /&gt;But throw in some function, and the question gets more complicated.  The system that this paper addresses is a specific receptor, the glucocorticoid receptor. 
A previous paper by the group showed that the inferred ancestral form was promiscuous,  primarily bound some related steroids, but did have some affinity for glucocorticoids.  This ancestral form existed in the last common ancestor of cartilaginous and bony fishes but by the time of the last common ancestor for bony fishes and tetrapods (such as us) it had fixed a specificity for corticosteroids.  These inferred ancestral receptors are referred to respectively as AncGR1 and AncGR2.&lt;br /&gt;&lt;br /&gt;While there are 37 amino acid replacements between AncGR1 and AncGR2, it takes only two of these (group X) to switch the preference of AncGR1 to corticosteroids.  The change is accomplished by substantially swinging a helix (helix 7) to a new position in the ligand binding pocket.  Only three more substitutions (group Y) enforce specificity for corticosteroids; make all 5 of these changes and you convert a promiscuous receptor with weak activity towards corticosteroids to one activated only by them.  But the interesting kicker is you can't make this second set of specificity-locking mutations until 2 other mutations (group Z) are made.  The issue is that the first two X mutations cause a significant structural shift which is not entirely stable; without the stability of the group Z pair of mutations the group Y specificity trio can't be tolerated.&lt;br /&gt;&lt;br /&gt;But there's another kicker.  If you engineer the AncGR2 protein back to having the ancestral states for groups X, Y and Z, the resulting protein is non-functional for any ligand.  Something is going on somewhere in those other 30 changes.  Some further phylogenetic filtering, along with the solution of the X-ray structure of the AncGR2 ligand binding domain (though it turns out the prior homology model of this structure was apparently almost dead on), suggested 6 strong candidates.  Five of the candidates (group W) turn out to either be in or to contact that swung helix 7.  
The structure of AncGR1 had been previously solved and a comparison of the AncGR1 and AncGR2 structures showed that the ancestral (AncGR1) forms at these 5 positions stabilize the ancestral position of helix 7 and the derived (AncGR2) amino acids at these positions actually clash with the AncGR1 positioning of helix 7.  Synthesis of AncGR2 with the ancestral amino acids at groups X, Y, Z and W yielded a receptor whose specificity is very like AncGR1.  One group W substitution had a strong enough effect it could imbue the ancestral phenotype even without the other group W changes but some of the other group W changes could be made only in pairs to show an effect.  Finally, they made receptors with the ancestral state for combinations of the x, y and z mutations (e.g. Xyz -- AncGR2 for X but AncGR1-like at y and z) and found that any combination with xW is non-functional.  AncGR2 with ancestral amino acids at x,y,z &amp; w is not as good a receptor as AncGR1 -- suggesting that at least some of the remaining 25 positions contribute.&lt;br /&gt;&lt;br /&gt;So, this is a well-detailed case where evolutionary change eventually blocked the route back to the start.  A receptor which made the group X changes could still bind the original ligands but that would be lost once the group Y changes were layered on.  Group Y changes were probably preceded by group Z changes which would have made reversion to the original binding specificity unlikely -- and the group W mutations really nail shut the door.&lt;br /&gt;&lt;br /&gt;This particular system was a single polypeptide chain.  But it is not difficult to see how the concept could extend to other biological systems.  Co-evolution of interacting proteins, such as a protein and its receptor, or modification of a developmental system could similarly proceed in a stepwise fashion that ultimately prevents retreat.  
We are a bit lucky in this case that the evolutionary traces are all preserved where we can find them; it is not difficult to imagine a scenario where part of the ancestral form is lost from all extant lineages and therefore invisible to our current vision.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Apmid%2F19779450&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=An+epistatic+ratchet+constrains+the+direction+of+glucocorticoid+receptor+evolution.&amp;rft.issn=0028-0836&amp;rft.date=2009&amp;rft.volume=461&amp;rft.issue=7263&amp;rft.spage=515&amp;rft.epage=9&amp;rft.artnum=&amp;rft.au=Bridgham+JT&amp;rft.au=Ortlund+EA&amp;rft.au=Thornton+JW&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CStructural+Biology%2C+Evolutionary+Biology%2C+Computational+Biology"&gt;Bridgham JT, Ortlund EA, &amp; Thornton JW (2009). An epistatic ratchet constrains the direction of glucocorticoid receptor evolution. 
&lt;span style="font-style: italic;"&gt;Nature, 461&lt;/span&gt; (7263), 515-9 PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19779450"&gt;19779450&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I'll probably add to my spam issues by pointing this out, but this&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-7910595058025295165?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/7910595058025295165/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7910595058025295165' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/7910595058025295165'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/7910595058025295165'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/09/locking-in-new-functions.html' title='Locking in new functions'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-958044851312071593</id><published>2009-09-25T20:20:00.005-04:00</published><updated>2009-09-25T21:21:06.224-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='evolution'/><category scheme='http://www.blogger.com/atom/ns#' term='controversies'/><title type='text'>How many genomes did I just squash?</title><content type='html'>Yesterday was a good day for catching up on the literature; not only did I 
finally get around to the IL28B papers I blogged about yesterday, but I also took a run through the genome fusion paper which is being seen as the fitting marker of the end of the "Communicated by" mechanism of PNAS (sample coverage by &lt;a href="http://pipeline.corante.com/archives/2009/09/23/pnas_shuts_a_door.php"&gt;In The Pipeline&lt;/a&gt; and &lt;a href="http://www.sciencemag.org/cgi/content/full/325/5947/1486-b"&gt;Science&lt;/a&gt;, though the latter requires a subscription).&lt;br /&gt;&lt;br /&gt;The paper, by Donald Williamson and communicated by &lt;a href="http://en.wikipedia.org/wiki/Lynn_Margulis"&gt;Lynn Margulis&lt;/a&gt;, takes the position that  " in animals that metamorphose, the basic types of larvae originated as adults of different lineages, i.e., larvae were transferred when, through hybridization, their genomes were acquired by distantly related animals".  This is a whopper of a proposal and definitely interesting.&lt;br /&gt;&lt;br /&gt;Margulis is famous for proposing the endosymbiont hypothesis to explain mitochondria and chloroplasts and other organelles.  The gist of it is that some ancestral eukaryote took in a guest species and in the long run integrated it fully into its operations so that the two could not be separated.  An important observation which this explained is the fact that mitochondria and chloroplasts have their own genomes, which encode (almost?) entirely for proteins and RNAs used in these structures.  However, their genomes do not encode many of the proteins required -- indeed in metazoans such as ourselves only a tiny pittance of genes are encoded by the mitochondrial genome.  A further observation which fits into this framework is the curious case of Cyanophora paradoxa, a photosynthetic organism whose chloroplast-like structure is surrounded by a rudimentary cell wall.&lt;br /&gt;&lt;br /&gt;When I was an undergraduate, there was still significant controversy on the validity of the endosymbiont hypothesis.  
I remember this well, as I wrote a term paper on the subject. What really nailed it down was the careful comparison of gene trees in the cases where the same function is required both in the organelle and in the cytoplasm and both are nuclear encoded.  In the vast majority of these cases, the two are evolutionarily distant from one another and in the case of chloroplasts the gene whose protein goes to the chloroplast looks more like homologs in cyanobacteria and the copy producing cytoplasmic protein looks more like homologs in non-photosynthetic eukaryotes.  There are some fascinating exceptions, such as cases in which one gene does double duty -- via (for example) alternative splicing or promoters including or excluding the chloroplast targeting sequences.&lt;br /&gt;&lt;br /&gt;Margulis and others have tried to extend this notion to other systems.  There are definitely other successes -- unicellular organisms which appear to carry three genomes &amp; the always challenging to classify Euglena, which appears to be a genome fusion.  But there have also been some prominent non-successes, such as the eukaryotic flagellum/cilium.  Also when I was an undergraduate a &lt;a href="http://linkinghub.elsevier.com/retrieve/pii/0092-8674(89)90875-1"&gt;Cell paper&lt;/a&gt; made a big splash claiming to find a chromosome associated with the basal body, the organelle associated with flagellum synthesis.  However, this work was never repeated and the &lt;a href="http://www.sciencemag.org/cgi/content/full/318/5848/245"&gt;publication of the Chlamydomonas genome&lt;/a&gt; failed to find such a chromosome. &lt;br /&gt;&lt;br /&gt;After reading the paper at hand, I'm both confused and disappointed.  The confusion is embarrassing, but the paper goes into a lot of detail on taxonomy and gross development of which I'm horribly ignorant.  But, conversely the disappointment comes from what I do understand and how cursorily that is treated.  
And since it is the stuff I understand which is the route Williamson proposes to test his hypothesis, that is a big let down.&lt;br /&gt;&lt;br /&gt;A key part that I do understand (minus a few terms I hadn't encountered before), with my emphasis:&lt;br /&gt;&lt;blockquote&gt;Many corollaries of my hypothesis are testable. If insects acquired larvae by hybrid transfer, the total base pairs of DNA of exopterygote insects that lack larvae will be smaller than those of endopterygote (holometabolous) species that have both larvae and pupae. &lt;span style="font-style:italic;"&gt;Genome sequences are known for the fruitfly, Drosophila melanogaster, the honeybee, Apis mellifera, the malarial mosquito, Anopheles gambiae, the red flour beetle, Tribolium castaneum, and the silkworm, Bombyx mori: holometabolous species, with marked metamorphoses.&lt;/span&gt; &lt;span style="font-weight:bold;"&gt;I predict that an earwigfly (Mercoptera Meropeidae), an earwig (Dermaptera), a cockroach (Dictyoptera), or a locust (Orthoptera) will have not necessarily fewer chromosomes but will have fewer base pairs of protein-coding chromosomal DNA than have these holometabolans. Also the genome of an onychophoran that resembles extant species will be found in insects with caterpillar or maggot-like larvae.&lt;/span&gt; Onychophoran genomes will be smaller than those of holometabolous insects. Urochordates, comprising tunicates and larvaceans, present a comparable case. Larvaceans are tadpoles throughout life. Garstang  regarded larvaceans as persistent&lt;br /&gt;tunicate larvae, and, if so, their genomes would resemble those of tunicates. But if larvaceans provided the evolutionary source of marine tadpole larvae, their genomes would be smaller and included in those of adult tunicates. 
The genome of the larvacean Oikopleura dioica is about one-third that of the tunicate Ciona intestinalis, consistent with my thesis&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Williamson is obviously not an expert on genomics, but Margulis should have known better and pushed him to improve this section.  In the "communicated by" path, the academy member can basically hand-pick the reviewers and is supposed to act as an editor would.  &lt;br /&gt;&lt;br /&gt;The first problem is a rather naive view of genome size and evolution.  Genome sizes vary all over the map even within related species; Fugu to salmon is several fold as is fruit fly to malaria vector.  The latter pair is particularly relevant since these are both dipteran insects, and therefore in the same bin by Williamson's standard (as stated in the quoted text).  Now, that is overall genome size; if you restrict to protein coding regions these pairs are more similar, which leaves some wiggle room.  But, by the same token the &lt;a href="http://www.sciencemag.org/cgi/pmidlookup?view=long&amp;pmid=11752568"&gt;Oikopleura&lt;/a&gt; and &lt;a href="http://www.sciencemag.org/cgi/pmidlookup?view=long&amp;pmid=12481130"&gt;Ciona genomes&lt;/a&gt; contain about the same number of genes (~15-16K).&lt;br /&gt;&lt;br /&gt;But furthermore, his hypothesis should be quite testable &lt;span style="font-style:italic;"&gt;right now&lt;/span&gt;, at least in a basic form.  If a genome fusion occurred, then genes active in larval stages and genes active in the adult should show different gene trees if they are homologs.  Given that there is a lot of data to annotate which Drosophila genes are active when, this should be a practical exercise.  While I leave this as an exercise for the student, I would point out that it is already known that in Drosophila many proteins are active in both phases.  This can probably also be tallied in some fashion.  
I'm guessing that the fraction of genes shared between stages will be quite large, which would not be very supportive of the fusion hypothesis.&lt;br /&gt;&lt;br /&gt;Should a paper like this get into a journal such as PNAS?  Given what I've written above, I think not, simply on its demerits.  On the other hand, crazy hypotheses do need a place to go because they are sometimes the right hypotheses -- Margulis's formulation of endosymbiont hypothesis had very tough sledding on its path to the textbooks.  However, in the modern world there is a place for odd speculations and journeying outside your expertise.  It's called a blog!&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Proceedings+of+the+National+Academy+of+Sciences+of+the+United+States+of+America&amp;rft_id=info%3Apmid%2F19717430&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Caterpillars+evolved+from+onychophorans+by+hybridogenesis.&amp;rft.issn=0027-8424&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=Williamson+DI&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CEvolutionary+Biology%2C+Computational+Biology"&gt;Williamson DI (2009). Caterpillars evolved from onychophorans by hybridogenesis. 
&lt;span style="font-style: italic;"&gt;Proceedings of the National Academy of Sciences of the United States of America&lt;/span&gt; PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19717430"&gt;19717430&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-958044851312071593?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/958044851312071593/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=958044851312071593' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/958044851312071593'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/958044851312071593'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/09/how-many-genomes-did-i-just-squash.html' title='How many genomes did I just squash?'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-7527382912728523948</id><published>2009-09-24T19:25:00.009-04:00</published><updated>2009-09-25T10:06:03.548-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='GWAS'/><title type='text'>Unwarranted pessimism on IL28A/B &amp; HCV?</title><content type='html'>I finally got around to reading the Nature News &amp; Views article by Iadonato and Katze summarizing and opining on the recent quartet of papers linking genetic 
variation around IL28B and the response to standard therapy for Hepatitis C Virus.  The N&amp;V has at least one glaring flaw and also (IMHO) goes down the cliched route of concluding that the result will be clinically useless.&lt;br /&gt;&lt;br /&gt;The four GWAS studies found the same cluster of SNPs around IL28B, nicely cross-validating the studies.  One curious statement in the N&amp;V is&lt;br /&gt;&lt;blockquote&gt;Although all of the identified variants in the three studies lie in or near the IL28B gene, none of them has an obvious effect on the function of this gene, which encodes interferon-λ3, a growth factor with similarities to the interferon-α preparations used as treatment.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Two of the papers provide direct evidence as to at least one effect of these SNPs; &lt;a href="http://dx.doi.org/10.1038/ng.447"&gt;one&lt;/a&gt; showed that the SNPs are linked to the expression of both IL28B and the nearby related gene IL28A; &lt;a href="http://dx.doi.org/10.1038/ng.449"&gt;the other&lt;/a&gt; looked only at IL28B.  Lower expression of these loci was correlated with the genotype with worse prognosis.&lt;br /&gt;&lt;br /&gt;The N&amp;V goes on with some boilerplate pessimism about GWAS studies' impact on medicine&lt;br /&gt;&lt;blockquote&gt;The question remains, however, as to how readily these and other observations from GWAS can be translated into meaningful changes in patient care. The field of human genetics has described many associations between specific mutations and medically important outcomes, but rarely have these observations resulted in new therapies to treat disease or in major shifts in existing treatments. This failure is exemplified by the lack of clinical benefit that followed the cloning in 1989 of the gene responsible for cystic fibrosis11 — the first example of the use of molecular genetics to discover the cause of an otherwise poorly understood condition. 
Although some progress has been made in treating patients with cystic fibrosis, in the ensuing 20 years neither of the two newly approved drugs for this condition were developed using knowledge of the gene mutations that cause it. Apart from a few well-characterized beneficial mutations (for example, those resulting in resistance to HIV infection), genetics has been an inefficient tool for drug discovery.&lt;br /&gt;&lt;br /&gt;So although these findings raise the tantalizing prospect of a more personalized approach to treating HCV by tailoring treatment to patients who are most likely to benefit, the reality is more sobering. Diagnostic testing to identify likely responders to interferon may be a future possibility, but clinical decision-making will be clouded by the fact that the effect of the advantageous variant is not absolute — not all carriers of the variant clear the virus, nor do all patients lacking the variant fail to benefit from treatment. Furthermore, there is currently no alternative to interferon therapy for the HCV-infected population.&lt;/blockquote&gt;They also pile on with graphs showing the exponential growth of Genbank and dbSNP vs. the flat numbers for INDs (new drugs into trials) and NMEs (new approvals). &lt;br /&gt;&lt;br /&gt;Of course, I could respond with the boilerplate response (found in at least one of the papers) that patients with the "poor response" genotype could be steered toward other therapies.  And indeed, new HCV therapies are in the pipeline, perhaps most prominently a compound under development by Vertex.  Understanding if these variants affect response to the new compounds now becomes an important research question.  &lt;br /&gt;&lt;br /&gt;But, it's also stunning that the N&amp;V authors didn't suggest a rather obvious approach suggested by these papers.  Not only do patients with the "high expression" genotype respond better to therapy, but this genotype also predicts spontaneous clearance of the virus.  
Furthermore, these loci encode secreted immune factors.  So to me at least, this can be viewed as a classic protein replacement therapy candidate -- a subset of patients produce too little of a natural protein (or two natural proteins) and providing them with recombinant protein might provide therapeutic benefit.  I suspect that whatever companies hold patent claims on IL28A &amp; IL28B are contemplating just such a strategy.  This is also in stark contrast to cystic fibrosis, where the affected protein is damaged rather than underexpressed and is a membrane protein, not a secreted protein.  By focusing on the general difficulty of converting genetic information to therapy rather than the specific circumstances of these papers, the N&amp;V authors completely blew it.&lt;br /&gt;&lt;br /&gt;IL28A &amp; IL28B loci produce proteins classified as interferons and it is another interferon (alpha) which is a key part of the standard therapy.  A more extreme version (or a bit of the flip side) of the protein shortage theory would posit that the sum of the interferons is important for response -- and perhaps also for side effects.  If this were the case, then increasing the dose of alpha interferon in the "low expression" genotype (or better yet, actually typing patients' white cells for expression of these proteins) might be a reasonable clinical approach.  Given that interferon alpha is already approved, this is the sort of clinical experimentation that goes on all the time.&lt;br /&gt;&lt;br /&gt;Yet another angle suggested by the "IL28A/B deficiency hypothesis" is that a viable therapeutic discovery approach is to find compounds which increase expression of IL28A and/or IL28B in leukocytes.  This has been a successful strategy for generating new therapeutic hypotheses in oncology.  
Better yet, hints may already exist -- some enterprising student should search the Broad's Connectivity Map or other databases of expression data for cell lines treated with compounds to identify compounds which upregulate IL28A/B transcripts.  A hit in such a search or a broader screen of already approved compounds could potentially rapidly lead to clinical experiments.&lt;br /&gt;&lt;br /&gt;The one time I had an opportunity to write an N&amp;V (as a grad student) I got writer's block and missed the boat.  It will always irk me.  But, perhaps it's better to blow a chance silently rather than write such an awful, unimaginative one which stuck to stock genomics negativity rather than creatively exploring the topic at hand.&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Apmid%2F19759611&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Genomics%3A+Hepatitis+C+virus+gets+personal.&amp;rft.issn=0028-0836&amp;rft.date=2009&amp;rft.volume=461&amp;rft.issue=7262&amp;rft.spage=357&amp;rft.epage=8&amp;rft.artnum=&amp;rft.au=Iadonato+SP&amp;rft.au=Katze+MG&amp;rfe_dat=bpr3.included=1;bpr3.tags="&gt;Iadonato SP, &amp; Katze MG (2009). Genomics: Hepatitis C virus gets personal. 
&lt;span style="font-style: italic;"&gt;Nature, 461&lt;/span&gt; (7262), 357-8 PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19759611"&gt;19759611&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Adoi%2F10.1038%2Fnature08309&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Genetic+variation+in+IL28B+predicts+hepatitis+C+treatment-induced+viral+clearance&amp;rft.issn=0028-0836&amp;rft.date=2009&amp;rft.volume=461&amp;rft.issue=7262&amp;rft.spage=399&amp;rft.epage=401&amp;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fdoifinder%2F10.1038%2Fnature08309&amp;rft.au=Ge%2C+D.&amp;rft.au=Fellay%2C+J.&amp;rft.au=Thompson%2C+A.&amp;rft.au=Simon%2C+J.&amp;rft.au=Shianna%2C+K.&amp;rft.au=Urban%2C+T.&amp;rft.au=Heinzen%2C+E.&amp;rft.au=Qiu%2C+P.&amp;rft.au=Bertelsen%2C+A.&amp;rft.au=Muir%2C+A.&amp;rft.au=Sulkowski%2C+M.&amp;rft.au=McHutchison%2C+J.&amp;rft.au=Goldstein%2C+D.&amp;rfe_dat=bpr3.included=1;bpr3.tags="&gt;Ge, D., Fellay, J., Thompson, A., Simon, J., Shianna, K., Urban, T., Heinzen, E., Qiu, P., Bertelsen, A., Muir, A., Sulkowski, M., McHutchison, J., &amp; Goldstein, D. (2009). 
Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance &lt;span style="font-style: italic;"&gt;Nature, 461&lt;/span&gt; (7262), 399-401 DOI: &lt;a rev="review" href="http://dx.doi.org/10.1038/nature08309"&gt;10.1038/nature08309&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Apmid%2F19759533&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Genetic+variation+in+IL28B+and+spontaneous+clearance+of+hepatitis+C+virus.&amp;rft.issn=0028-0836&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=Thomas+DL&amp;rft.au=Thio+CL&amp;rft.au=Martin+MP&amp;rft.au=Qi+Y&amp;rft.au=Ge+D&amp;rft.au=O%27huigin+C&amp;rft.au=Kidd+J&amp;rft.au=Kidd+K&amp;rft.au=Khakoo+SI&amp;rft.au=Alexander+G&amp;rft.au=Goedert+JJ&amp;rft.au=Kirk+GD&amp;rft.au=Donfield+SM&amp;rft.au=Rosen+HR&amp;rft.au=Tobler+LH&amp;rft.au=Busch+MP&amp;rft.au=McHutchison+JG&amp;rft.au=Goldstein+DB&amp;rft.au=Carrington+M&amp;rfe_dat=bpr3.included=1;bpr3.tags="&gt;Thomas DL, Thio CL, Martin MP, Qi Y, Ge D, O'huigin C, Kidd J, Kidd K, Khakoo SI, Alexander G, Goedert JJ, Kirk GD, Donfield SM, Rosen HR, Tobler LH, Busch MP, McHutchison JG, Goldstein DB, &amp; Carrington M (2009). Genetic variation in IL28B and spontaneous clearance of hepatitis C virus. 
&lt;span style="font-style: italic;"&gt;Nature&lt;/span&gt; PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19759533"&gt;19759533&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+genetics&amp;rft_id=info%3Apmid%2F19749758&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=IL28B+is+associated+with+response+to+chronic+hepatitis+C+interferon-alpha+and+ribavirin+therapy.&amp;rft.issn=1061-4036&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=the+Hepatitis+C+Study&amp;rft.au=Suppiah+V&amp;rft.au=Moldovan+M&amp;rft.au=Ahlenstiel+G&amp;rft.au=Berg+T&amp;rft.au=Weltman+M&amp;rft.au=Abate+ML&amp;rft.au=Bassendine+M&amp;rft.au=Spengler+U&amp;rft.au=Dore+GJ&amp;rft.au=Powell+E&amp;rft.au=Riordan+S&amp;rft.au=Sheridan+D&amp;rft.au=Smedile+A&amp;rft.au=Fragomeli+V&amp;rft.au=M%C3%BCller+T&amp;rft.au=Bahlo+M&amp;rft.au=Stewart+GJ&amp;rft.au=Booth+DR&amp;rft.au=George+J&amp;rfe_dat=bpr3.included=1;bpr3.tags="&gt;the Hepatitis C Study, Suppiah V, Moldovan M, Ahlenstiel G, Berg T, Weltman M, Abate ML, Bassendine M, Spengler U, Dore GJ, Powell E, Riordan S, Sheridan D, Smedile A, Fragomeli V, Müller T, Bahlo M, Stewart GJ, Booth DR, &amp; George J (2009). IL28B is associated with response to chronic hepatitis C interferon-alpha and ribavirin therapy. 
&lt;span style="font-style: italic;"&gt;Nature genetics&lt;/span&gt; PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19749758"&gt;19749758&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+genetics&amp;rft_id=info%3Apmid%2F19749757&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Genome-wide+association+of+IL28B+with+response+to+pegylated+interferon-alpha+and+ribavirin+therapy+for+chronic+hepatitis+C.&amp;rft.issn=1061-4036&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=Tanaka+Y&amp;rft.au=Nishida+N&amp;rft.au=Sugiyama+M&amp;rft.au=Kurosaki+M&amp;rft.au=Matsuura+K&amp;rft.au=Sakamoto+N&amp;rft.au=Nakagawa+M&amp;rft.au=Korenaga+M&amp;rft.au=Hino+K&amp;rft.au=Hige+S&amp;rft.au=Ito+Y&amp;rft.au=Mita+E&amp;rft.au=Tanaka+E&amp;rft.au=Mochida+S&amp;rft.au=Murawaki+Y&amp;rft.au=Honda+M&amp;rft.au=Sakai+A&amp;rft.au=Hiasa+Y&amp;rft.au=Nishiguchi+S&amp;rft.au=Koike+A&amp;rft.au=Sakaida+I&amp;rft.au=Imamura+M&amp;rft.au=Ito+K&amp;rft.au=Yano+K&amp;rft.au=Masaki+N&amp;rft.au=Sugauchi+F&amp;rft.au=Izumi+N&amp;rft.au=Tokunaga+K&amp;rft.au=Mizokami+M&amp;rfe_dat=bpr3.included=1;bpr3.tags="&gt;Tanaka Y, Nishida N, Sugiyama M, Kurosaki M, Matsuura K, Sakamoto N, Nakagawa M, Korenaga M, Hino K, Hige S, Ito Y, Mita E, Tanaka E, Mochida S, Murawaki Y, Honda M, Sakai A, Hiasa Y, Nishiguchi S, Koike A, Sakaida I, Imamura M, Ito K, Yano K, Masaki N, Sugauchi F, Izumi N, Tokunaga K, &amp; Mizokami M (2009). Genome-wide association of IL28B with response to pegylated interferon-alpha and ribavirin therapy for chronic hepatitis C. 
&lt;span style="font-style: italic;"&gt;Nature genetics&lt;/span&gt; PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19749757"&gt;19749757&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;(ugh: had a serious typo in the title on first posting; now fixed &amp; revised)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-7527382912728523948?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/7527382912728523948/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7527382912728523948' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/7527382912728523948'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/7527382912728523948'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/09/unwarranted-pessimism-on-il26b-hcv.html' title='Unwarranted pessimism on IL28A/B &amp; HCV?'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-6359271542002463498</id><published>2009-09-23T22:27:00.002-04:00</published><updated>2009-09-23T23:12:21.530-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='conferences'/><title type='text'>CHI Next-Gen Conference, Day 3 (final)</title><content type='html'>Final day of conference, with some serious fatigue setting in (my hotel room was too close to, 
and faced, a highway. Doh!)&lt;br /&gt;&lt;br /&gt;Discovered that I was indeed getting a reputation.  Two people I met today asked about my recurrent interest in FFPE (Formalin Fixed, Paraffin Embedded) -- which is how most of the nucleic acids I want to work with are stored.  FFPE is notoriously difficult for molecular studies, with the informational macromolecules having been chemically and physically abused in the fixation process, but it is also famously stable, preserving histological features for years.&lt;br /&gt;&lt;br /&gt;RainDance sponsored the breakfast &amp; announced that their maximum primer library size has gone up to 20K.  To back up, RainDance uses microfluidics to create libraries of very tiny (single digit picoliter) droplets in which each droplet contains a primer pair.  The precise volume control &amp; normalization of the concentrations means that each primer droplet contains about the same number of oligos, which allows each droplet in a PCR to be run to completion -- meaning that efficient PCRs and inefficient ones in theory end up both having the same number of product molecules.  Another set of droplets are created which contain your template DNA, and these are cleverly merged &amp; the whole emulsion cycled.  Break up the emulsion &amp; you have lots of PCR amplicons ready to go into a fragmentation protocol.  Their movies of droplets marching around, splitting, merging, etc. are dangerously mesmerizing!&lt;br /&gt;&lt;br /&gt;Jin Billy Li of the Church group reviewed all the really cool stuff they've done using padlock probes (and confirmed that IP conflicts are retarding indefinitely any commercialization of these).  A padlock probe is a long DNA which primes on both sides of a targeted region.  Filling the gap between &amp; ligating the gap yields a circle, which can be purified away from any uncircularized DNA and then amplified with universal primers.  Turns the multiplex PCR problem into a very diverse set of uniplex PCRs.  
Various tweaks have substantially improved uniformity, though there is still room for improvement (but the same is true for the hybridization approaches).&lt;br /&gt;&lt;br /&gt;Nicolas Bergman presented data on transcriptomic complexity in B. anthracis. I think most of this is published, but I hadn't seen it.  A very striking result is that an awful lot (~88%) of transcripts in a supposedly uniform culture are present at much less than 1 copy per cell.  He mentioned that small numbers of spores are seen in log cultures, and this might explain it.  Also showed that many unannotated genes -- including some that had been truly UNannotated (originally annotated but then removed from the catalogs) are clearly transcribed.  Operon structures could be worked out, with 90% matching computational predictions -- and in ~30 tested experimentally by RT-PCR there was 100% concordance.  &lt;br /&gt;&lt;br /&gt;Epicentre gave an overview of their clever system for fragmenting DNA upstream of either 454 or Illumina.  By hijacking a transposase in a clever way, they not only break up the DNA but add on defined sequences.  For 454 you then jam on the 454 primers &amp; just get stuck reading 19nt of transposase each time; for Illumina you must use custom sequencing primers.&lt;br /&gt;&lt;br /&gt;Eric Wommack &amp; Shawn Polson of University of Delaware (Go Hens!) described work on metagenomics of bacteriophages in seawater.  Here's a stunning estimate: if you lined all the world's phages end-to-end, they would stretch 60 &lt;span style="font-style:italic;"&gt;light years&lt;/span&gt;.   Also striking is the high level of bacteriophage-driven turnover of oceanic bacteria -- in about 1/2 to 2 days there is 100% turnover.  This is a huge churn of the biochemical space. &lt;br /&gt;&lt;br /&gt;Stacey Gabriel gave an update on the Broad's Cancer Genomics effort.  
Some whole genomes (25 tumor+normal pairs so far) and a lot of exonic sequencing.  So far, not a lot of lightning though -- in one study the only thing popping out so far is p53, which is disappointing.  Using the Agilent system (developed at the Broad), they can scan 20K genes in 1/2 an Illumina run, with 82% of their targeted sequences having at least 14 reads covering.&lt;br /&gt;&lt;br /&gt;Matthew Ferber at the Mayo described trying to replace Sanger assays for inherited disorders with 454 and Illumina based approaches.  He underscored that this isn't for research -- these are actual diagnostic tests used to determine treatments, such as prophylactic removal of the colon if inherited colon cancer is likely.  Capture of the targets on the Nimblegen chips was done and the recovered DNA split to do 454 &amp; Illumina sequencing in parallel. The two next gen approaches came close -- but neither found enough that they could be relied on.  Also, some targets are just not recoverable by array capture and would need to be backstopped by something else. One caveat: older technology was used in both cases, so it may be that with longer read lengths on both platforms the higher coverage &amp; higher mapping confidence needed would be obtained.  On the other hand, some of the mutations were picked to be difficult for the platforms (small indel for Illumina, homopolymer run of &gt;20 for 454) and might remain problems even with more coverage.  PCR amplification in place of chip capture is another approach that might improve coverage and get some targets missed by the chip (this is certainly a claim RainDance made in their presentation).&lt;br /&gt;&lt;br /&gt;The last talk I took notes on was by Michael Zody on signatures of domestication in chickens.  If I had organized things, this would have been just before or after the phage talk!  Alas, while the Rhode Island Red was amongst the lines sequenced (apropos the location) Blue Hens were missing -- how could that be?  
Seriously, the basic design was to sequence pools of DNA from either various domestic chicken lines or the Red Jungle Fowl (representing pre-domestication chicken).  Some of these lines were commercial egg layer strains and others commercial broiler (meat) strains.  He commented that this level of specification occurred very recently (forgot to write down when, but I think it was around a century ago).  Two other strains are interesting as they have been selected for about 50 years for one to be very heavy and the other lean -- apparently the heavy line will eat itself silly and the other nearly starves itself.  1 SOLiD slide on each of the 10 pools was used to call out SNPs and various strategies were used to filter out errors in the new data as well as variation due to errors in the reference sequence (in some cases, even typing the reference DNA to demonstrate the need for correction).  Reduced heterozygosity was seen around BCDO2, which gives modern chickens their yellow skin (positive control) and also a bunch of other loci -- but those are still under wraps.  They also looked for deletions in exons which appear to have been fixed in various lines, and found 1284 which are fixed in one or more domestic lines relative to the Red Jungle Fowl.  One interesting one (which is present in the Red Jungle Fowl at low frequency) has gone homozygous (I think; my notes here show fatigue) in the high growth line but is either absent or heterozygous in the low growth line (terrible notes!).  
It's a 19kb deletion that clips out exons 2-5 (based on the human homolog; there isn't a good transcript sequenced for chicken) and RT-PCR confirms the gene is expressed in the hypothalamus, which has been previously implicated in controlling the feeding behavior.&lt;br /&gt;&lt;br /&gt;I took almost no notes on the last talk, looking at dietary influences on gut microbiome (and also, regrettably, had to leave early to make sure I made school night) but it did feature some more "extreme genomics" -- microbiome studies on Burmese pythons!&lt;br /&gt;&lt;br /&gt;One last thought: sequencing techs represented were either here-and-now (the players you can actually buy) or pretty-distant-future; absent were PacBio and Oxford Nanopore and the host of other companies (save NABSys) announced in the last 3-4 years in this space.  Have the others just disappeared quietly or are they in stealth mode? It's hard to imagine the conference would have deliberately snubbed them, which would be a third possibility.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-6359271542002463498?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/6359271542002463498/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=6359271542002463498' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/6359271542002463498'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/6359271542002463498'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/09/chi-next-gen-conference-day-3-final.html' title='CHI Next-Gen Conference, Day 3 (final)'/><author><name>Keith 
Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-4802940967423838292</id><published>2009-09-22T21:34:00.004-04:00</published><updated>2009-09-22T22:16:22.196-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='conferences'/><title type='text'>CHI Next-Gen Conference, Day 2</title><content type='html'>I'll confess that in the morning I took notes on only one talk, but the afternoon got back into gear.&lt;br /&gt;&lt;br /&gt;The morning talk was by John Quackenbush over at Dana-Farber Cancer Institute and covered a wide range of topics.  Some was focused on various database approaches to tracking clinical samples but a lot of the talk was on microarrays.  He described a new database his group has curated from the cancer microarray literature called GeneSigDb.  He also described some work on inferring networks from such data &amp; how it is very difficult to do with no prior knowledge, but with a little bit of network information entered, a lot of other interactions fall out which are known to be real.  He also noted that if you look at the signatures collected in GeneSigDb, most human genes are in at least one -- suggesting either cancer affects a lot of genes (probable) and/or a lot of the microarray studies are noisy (certainly!).  I did a similar curation at MLNM (whose results were donated to Science Commons when the group dissolved, though I think it never quite emerged from there) &amp; saw the same pattern.  
I'd lean heavily on "bad microarray studies", as far too many studies on similar diseases come up with disjoint results, whereas there are a few patterns which show up in far too many results (suggesting, for example, that they are signatures of handling cells not signatures of disease).  He also described some cool work initiated in another group but followed up by his group of looking at trajectories of gene expression during the forced differentiation of a cell line.  Using two agents that cause the same final differentiated state (DMSO &amp; all-trans retinoic acid), the trajectories are quite different even with the same final state.  Some talk at the end of attractors &amp; such.&lt;br /&gt;&lt;br /&gt;In the afternoon I slipped over to the "other conference" -- in theory there are two conferences with some joint sessions &amp; a common vendor/poster area, but in reality there isn't much reason to hew to one or the other &amp; good-sounding talks are split between them.  I did, alas, accidentally stick myself with a lunch ticket for a talk on storage -- bleah!  But, the afternoon was filled with talks on "next next" generation approaches, and despite (or perhaps because of, as the schedule had been cramped) two cancellations, it was a great session.&lt;br /&gt;&lt;br /&gt;All but one of the talks at least mentioned nanopore approaches, which have been thought about for close to two decades now.  Most of these had some flavor of science fiction to them in my mind, though I'll freely admit the possibility that this reflects more the limitations of my experience than wild claims by the speakers.&lt;br /&gt;&lt;br /&gt;One point of (again, genteel) contention between the speakers was around readout technology, with one camp arguing that electrical methods are the way to go, because that is the most semiconductor-like (there is a bit of a cult worship of the semiconductor industry evident at the meeting).  
Another faction (well, one speaker) argued that optics is better because it can be more naturally multiplexed.  Another speaker had no multiplexing in his talk, but that will be covered below.&lt;br /&gt;&lt;br /&gt;Based on the cluster of questioners (including myself) afterwards, the NABSys talk by John Oliver had some of the strongest buzz.  The speaker showed no data from actual reads and was circumspect about a lot of details, but some important ones emerged (at least for me; perhaps I'm the last to know).  Their general scheme is to fragment DNA to ~150Kb (well, that's the plan -- so far they go only to 50Kb) and create 384 such pools of single-stranded DNA.  Each pool is probed with a set of short (6-10) oligonucleotide probes.  Passing a DNA through a machined pore creates a distinct electrical signal for an aligned probe vs. a single stranded region.  You can't tell which probe just rode through, but the claim is that by designing the pools carefully and comparing fingerprints you can infer a complete "map" and ultimately a sequence, with some classes of sequence which can't be resolved completely (such as long simple repeats).  While no actual data was shown, in conversation the speaker indicated that they could do physical mapping right now, which I doubt is a big market but would be scientifically very valuable (and yes, I will get back to &lt;a href="http://researchblogging.org/post/gotourl/id/135627"&gt;my series on physical maps&lt;/a&gt; &amp; finish it up soon).&lt;br /&gt;&lt;br /&gt;Oliver did have a neat trick for downplaying the existing players.  It is his contention that any system that can't generate 10^20 bases per year isn't going to be a serious player in medical genomics.  This huge figure is arrived at by multiplying the number of cancer patients in the developed world by 100 samples each and 20X coverage.  The claim is that any existing player would need 10^8 sequencers to do this (Illumina is approaching 10^3 and SOLiD 10^2).  
I'm not sure I buy this argument -- there may be value in collecting so many samples per patient, but good luck doing it!  It's also not clear that the marginal gain from the 11th sample is really very much (just to pick an arbitrary number).  Shave a factor of 10 off there &amp; increase the current platforms by a factor of 10 and, well, you're down to 10^6 sequencers.  Hmm, that's still a lot.  Anyway, only if the cost gets down to tens of dollars could national health systems afford any such extravagance.  &lt;br /&gt;&lt;br /&gt;Another speaker, Derek Stein of Brown University (whose campus I stumbled on today whilst trying to go from my distant hotel to the conference on foot) gave an interesting talk on trying to marry nanopores to mass spec.  The general concept is to run the DNA through the pore, break off each nucleobase on the other side &amp; slurp that into the mass spec for readout.  It's pretty amazing -- on one side of the membrane a liquid and on the other a vacuum!  It's just beginning and a next step is to prove that each nucleotide gives a distinct signal. Of course, one possible benefit of this readout is that covalent epigenetic modifications will probably be directly readable -- unless, of course, the modified base has a mass too close to one of the other bases.  &lt;br /&gt;&lt;br /&gt;Another nanoporist, Amit Meller at Boston University, is back in the optical camp. The general idea here is for the nanopore to strip off probes from a specially modified template.  The probes make a rapid fluorescent flash -- they are "molecular beacons" which are inactive when hybridized to template, become unquenched when they come off but then immediately fold onto themselves and quench again.  Meller was the only nanopore artist to actually show a read -- 10nt!!!  One quirk of the system is that a cyclic TypeIIS digestion &amp; ligation process is used to substitute each base in the original template with 2 bases to give more room for the beacon probes.  
He seemed to think read lengths of 900 will be very doable and much longer possible.&lt;br /&gt;&lt;br /&gt;One other nanopore talk was from Peiming Zhang at Arizona State, who is tackling the readout problem by having some clever molecular probes to interrogate the DNA after it exits the nanopore.  He also touched on sequencing-by-hybridization &amp; using atomic microscopy to try to read DNA.&lt;br /&gt;&lt;br /&gt;The one non-nanopore talk is one I'm still wrestling with my reaction to.  Xiaohua Huang at UCSC described creating a system that marries some of the best features of 454 with some of the features of the other sequencing-by-synthesis systems.  His talk helped crystallize in my mind why 454 has such long read lengths but also is a laggard in density space.  He attributed the long reads to the fact that 454 uses natural nucleotides rather than the various reversible terminator schemes. But, since pyrosequencing is real-time you get fast reads but the camera must always watch every bead on the plate.  In contrast, the other systems can scan the camera across their flowcells, enabling one camera to image many more targets -- but the terminators don't always reverse successfully.  His solution is to use 90% natural nucleotides and 10% labeled nucleotides -- but &lt;span style="font-style:italic;"&gt;no&lt;/span&gt; terminators.  After reading one nucleotide, the labels are stripped (he mentioned photobleaching, photolabile tags and chemical removal as all options he is working with) and the next nucleotide is flowed in.  It will have the same trouble with long mononucleotide repeats as 454 -- but also should have very long read lengths.  He puts 1B beads on his plates -- and has some clever magnetic and electric field approaches to jiggle the beads around so that nearly every well gets a bead.  
In theory I think you could run his system on the Polonator, but he actually built his own instrument.&lt;br /&gt;&lt;br /&gt;If I had to rate the approaches by which is most likely to start generating real sequence data, I'd vote for Huang -- but is that simply because it seems more conservative?  NABSys talks like they are close to being able to do physical maps -- but will that be a dangerous detour?  Or simply too financially uninteresting to attract their attention?  The optically probed nanopores actually showed read data -- but what will the errors look like?  Will the template expansion system cause new errors? &lt;br /&gt;&lt;br /&gt;One minor peeve: pretty much universally, simulations look too much like real data and need more of a scarlet S on them.  On the other hand, I probably should have a scarlet B on my forehead, since I've only once warned someone that I blog.  One movie today of DNA traversing a nanopore looked very real, but was mentioned later to be simulated.  Various other plots were not explained to be simulations until near the end of the presentation of that slide.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-4802940967423838292?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/4802940967423838292/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=4802940967423838292' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/4802940967423838292'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/4802940967423838292'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/09/chi-next-gen-conference-day-2.html' title='CHI 
Next-Gen Conference, Day 2'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-3676414067556160966</id><published>2009-09-21T22:46:00.002-04:00</published><updated>2009-09-21T23:16:30.231-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='conferences'/><title type='text'>CHI Next-Gen Conference, Day 1</title><content type='html'>Interesting set of talks today.  I never did explicitly check on the blogging policy, but given that the session chair kidded a speaker that I would be blogging her live, it wouldn't seem to be a problem.  I would honor a ban (particularly since blogging is a bit hard to hide after the fact!), but quite a few folks were photographing slides despite an admonition not to (one person was clearly worried neither about being caught nor being courteous nor being clever; he had his flash on, which is clearly useless for projected images!).&lt;br /&gt;&lt;br /&gt;The morning talks ended up as just a trio.  The best of the three was Robert Cook-Deegan's talk "So my genome costs less than my bike, what's the big deal?".  He obviously has more expensive tastes in bicycles than I do -- or knows a really cheap genome shop!  He covered a lot of the ground around what sort of regulatory model will encompass personal genome sequencing.  The U.S. weakly and Germany strongly have gone with the model that genome sequencing should be treated like a diagnostic with M.D.s as the absolute gatekeeper (a position which is rather vocally promoted by certain bloggers).  
Cook-Deegan pointed out something that increasingly worries me, which is that this locks genome sequencing into a very expensive cost model which doesn't improve with scale; you are locking in some very pricey labor that will only increase in price.  Cook-Deegan also felt that M.D.s were being picked as the gatekeeper primarily because they are who the regulators are comfortable with historically, not because they are particularly well-trained for the job.&lt;br /&gt;&lt;br /&gt;Jonathan Rothberg gave an entertaining talk on his various ventures, which built up to Ion Torrent -- but where the audience expected the crescendo, there was instead a request for audience questions.  Ion Torrent seems to be a company (Joule is another) which is still trying to be in the public eye without releasing any key information.  Understandable, but frustrating.&lt;br /&gt;&lt;br /&gt;Henry Erlich gave a nice presentation on using PCR amplification and 454 sequencing to do HLA typing for transplantation.  All sorts of advantages to 454 over Sanger here, but cost will probably remain an issue and definitely corral this into very large centers (one 454 run, with multiplexing, can type ~20 samples).&lt;br /&gt;&lt;br /&gt;Lunch was given over to IT stuff.  CycleComputing presented their bioinformatics-friendly gateway to Amazon's cloud computing stuff (plus some benchmarking).  I'll confess to checking email during the presentation on compressing data on servers; far too IT for me.&lt;br /&gt;&lt;br /&gt;The afternoon was devoted to a series of presentations by the 6 next gen sequencing platforms with some flavor of being here-and-now: 454, SOLiD, GA2, Helicos, Dover (Polonator) &amp; Complete Genomics.  Actually, that was an interesting theme running through some talks, with Illumina saying "we're now gen, not next gen" whereas Complete Genomics calls themselves "third gen".  The talks were all genteel but contained pokes at each other.  
&lt;br /&gt;&lt;br /&gt;For example, 454 trumpeted a comparison of two unpublished cucumber genome sequences, one by Illumina+Sanger and one by 454.  The 454 16X assembly had a contig N50 of 87Kb vs. 9Kb for a 50X Illumina assembly (no mention made of the amount of paired end data in either, I think -- though now I'm not sure).  454 also declared they've had one perfect read 997 bases long, though they were open that commercial runs near this are long in the future.  &lt;br /&gt;&lt;br /&gt;The SOLiD speaker emphasized all the different applications of their technology, using a published graphic that later turned out to have been commissioned by Helicos.  Illumina's speaker emphasized the simpler sample prep over emulsion PCR systems (i.e. 454, SOLiD &amp; Polonator).  &lt;br /&gt;&lt;br /&gt;Helicos promised even simpler sample prep and offered tantalizing hints of good stuff to come -- such as my nemesis of sequencing from FFPE slides.  Helicos did detail their paired-end protocol, which is very clever (after reading a bunch of sequence, a set of timed extensions with all 4 nucleotides gives jumps of various distributions which are then followed by more reads).  Clearly this will only work with single molecule sequencing, at least in that form (must ponder thought of how to either improve this or get it to work on an Illumina-style platform).   Helicos also tantalized with a bunch of data from different applications, suggesting that some more publications from this platform are imminent.&lt;br /&gt;&lt;br /&gt;Danaher's talk was mostly on details of the instrument, which is the only one actually at the conference &amp; is running.  Always fascinated by moving machines, I watched it for a while -- and it demos very nicely, with the stage moving &amp; illuminator flashing &amp; filter wheel spinning.  Polonator has very short reads compared to the other platforms, but is promising very low cost which could make it a contender.  
&lt;br /&gt;&lt;br /&gt;Finally, Complete showed off their sequencing center approach.  One striking fact is that their read lengths are actually extremely short -- but they extract a quartet of paired short reads.  Clearly their recently announced delivery of genomes has improved their credibility &amp; they also detailed some very neat medical genetics results which are presumably going to hit the journals very soon -- in which case they will have complete lab cred.  It was pointed out in the discussion panel &amp; in several talks that human sequencing is not the whole world, but even their competitors did not violently object (and therefore seemed to grudgingly acquiesce) that Complete may grab the lion's share of the human genome sequencing market, with the other players going after non-human sequencing or human areas like FFPE or transcriptome sequence where Complete isn't positioning themselves.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-3676414067556160966?l=omicsomics.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://omicsomics.blogspot.com/feeds/3676414067556160966/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=36768584&amp;postID=3676414067556160966' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/3676414067556160966'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/36768584/posts/default/3676414067556160966'/><link rel='alternate' type='text/html' href='http://omicsomics.blogspot.com/2009/09/chi-next-gen-conference-day-1.html' title='CHI Next-Gen Conference, Day 1'/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty 
xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='08368724497474381730'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>9</thr:total></entry></feed>