Post a Comment On: Computational Complexity

"Is Cheminformatics the new Bioinformatics? (Guest Post by Aaron Sterling)"

29 Comments

Anonymous John Sidles said...

Please let me say that Aaron's guest post is one of the best-ever here on Computational Complexity.

At my university I regularly attend the weekly seminar on synthetic biology that is hosted by David Baker's group ... this seminar is standing-room-only for a predominantly young audience ... and all the computational themes that Aaron's guest-post emphasizes are prominently on display.

As a unifying context for the various links that Aaron provides, I would like to recommend the International Roadmap Committee (IRC) "More than Moore" White Paper. The IRC abstracts five elements for rapid progress in STEM enterprises: FOM, a Figure of Merit for assessing progress; LEP, a Law of Expected Progress describing the cadence of expected improvement; WAT, Wide Applicability of Technology; SHR, willingness to SHaRe the key elements responsible for progress; and ECO, an Existing COmmunity upon which to build what the IRC calls "virtuous circles" of technological progress.

In abstracting these elements, the IRC has, I think, done us all a tremendous service.

As the links of Aaron's post show implicitly, the Moore-style acceleration of progress in computational biology is associated with a culture that broadly and consciously embraces the IRC's "More than Moore" enterprise elements of FOM, LEP, WAT, SHR, and ECO. And I can personally testify, from attending the Baker Group's lively synthetic biology seminars, that this roadmap for linking math, science, and engineering is workable and fun! :)

Hmmmm ... to borrow a theme from Lance's previous post The Ideal Conference: if computer science were to consciously embrace FOM, LEP, WAT, SHR, and ECO, then what informatic themes might a CS conference emphasize? Here I think Bill Thurston provides us with a mighty good answer:

----------------------
"Mathematics is an art of human understanding. ... Mathematical concepts are abstract, so it ends up that there are many different ways that they can sit in our brains. A given mathematical concept might be primarily a symbolic equation, a picture, a rhythmic pattern, a short movie---or best of all, an integrated combination of several different representations."
----------------------

To conceive of computational complexity as a Thurston-style "integrated combination of mathematical representations" that "sit in our brains in many different ways," with a deliberate view toward fostering in CS the "More than Moore" enterprise elements of FOM, LEP, WAT, SHR, and ECO, in order to grasp the opportunities and challenges in chemoinformatics (and many other enterprises) that Aaron's post identifies ... aye, lasses and laddies ... now *that* would be "An Ideal Conference". Not least because job opportunities in this field are burgeoning. Good! :)

I have to say, though, that at any such conference, the computer scientists will learn at least as much from the biologists, chemists, and medical researchers, as the biologists, chemists, and medical researchers will learn from the computer scientists. Also good! :)

12:21 PM, January 31, 2011

Blogger Suresh said...

Clearly I was ahead of my time :). My thesis in 1999 was on chemoinformatic algorithms, specifically finding pharmacophores for drug design.

12:33 PM, January 31, 2011

Anonymous Anonymous said...

The problem you referred to is similar to problems in computer vision and machine learning.

1:04 PM, January 31, 2011

Anonymous matt said...

It is sad that the first example in this post involves patent law as a motivation for a computational problem. There are already enough interesting problems given to us by nature! But great post otherwise.

1:53 PM, January 31, 2011

Blogger Egon Willighagen said...

Dear Aaron, would you be so kind as to update your book review to list my actual last name, "Willighagen"?

Thanx!

1:55 PM, January 31, 2011

Blogger Egon Willighagen said...

(Returning after having read the full blog post, and a good part of the review.)

@Aaron, interesting review! I believe you conclude that the algorithms in the book are pretty basic... sadly, that was deliberate... the book is oriented more at showing the casual chemist what cheminformatics algorithms are about than at explaining the cutting-edge algorithms there are in cheminformatics (...), which would make the book unreadable to the target audience. But in doing so, we alienated the people we (well, I do) would love to collaborate more with! :(

As was clear particularly from Rajarshi's chapter, there is a large open source cheminformatics community that is very open (at least I am!) to collaboration, and within the CDK we have actually had such collaborations in the past.

One more exciting problem in chemistry you may find attractive is the enumeration, or even counting, of possible graphs given a number of atoms and bonds (vertices and edges). Now, a chemical graph is a colored graph, and not all edges are allowed. Moreover, we are only interested in graphs that are distinct up to symmetry.

One outstanding problem here is to calculate the number of chemical graphs for a given number of atoms (as in a molecular formula, like C4H10O) without enumerating all structures.

Secondly, 'we' would love an open source implementation of an efficient algorithm to enumerate all chemical graphs. The efficiency here lies primarily in not computing a solution when a symmetrically equivalent solution has already been computed.

Now, this problem has been solved in the proprietary MOLGEN software, but it may provide you with the right amount of complexity you are looking for.
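
To give a concrete feel for the "count each symmetry class exactly once" core of this problem, here is a minimal Python sketch (assuming the networkx library; restricting to acyclic, single-bonded carbon skeletons is a simplification for illustration, not the full problem): it counts the constitutional isomers of the alkanes CnH2n+2 by enumerating non-isomorphic trees on n carbon vertices with maximum degree 4. Enumeration with heteroatoms, rings, and bond orders, as MOLGEN handles, is far harder.

    # Toy stand-in for chemical-graph enumeration: count alkane carbon skeletons
    # as non-isomorphic trees with maximum degree 4 (carbon's valence budget).
    # Assumes the networkx library is installed.
    import networkx as nx

    def alkane_isomer_count(n_carbons):
        """Constitutional isomers of C{n}H{2n+2}, counted one per symmetry class."""
        return sum(
            1
            for tree in nx.nonisomorphic_trees(n_carbons)
            if max(deg for _, deg in tree.degree()) <= 4
        )

    for n in range(4, 9):
        print(n, alkane_isomer_count(n))  # expected: 2, 3, 5, 9, 18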

2:16 PM, January 31, 2011

Blogger Joerg Kurt Wegner said...

Thank you for the long and critical review. I very much appreciate a critical view on the "computer science" side of cheminformatics, before I have to read yet another questionable analysis that mixes combinatorial and continuous problems and then concludes that it just works fine for "this non-benchmark data set".

As pointed out multiple times, the lack of multi-label graph standards is still a major problem in this area, and things are just getting worse when going large-scale, or to 3D modelling problems.

And it's Wegner, not Wagner (chapter 4). Thanks again; it was fun to read.

2:54 PM, January 31, 2011

Comment deleted

This comment has been removed by the author.

2:55 PM, January 31, 2011

Blogger Joerg Kurt Wegner said...

I would like to throw another (depressing) book into the discussion: the Handbook of Molecular Descriptors. There you will find yet another 1001 descriptors, leaving at least me with the feeling that there are too many names for the very same algorithm with a tunable parameter changed (often the labeling function). So, I do not understand why people create unrelated and unoptimizable combinatorial problems when they could just turn things into smooth and optimizable problems. Then it would be much clearer to everyone that the optimal parameter set has to differ with the underlying data, while the computing procedure remains the same. Besides, in the long term we would learn so much more. Though, I do not believe that many computer scientists even remotely understand the challenges we observe in the life science arena; at the end of the day, silicon and carbon are different.
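
To illustrate the "same algorithm, different name" point with one concrete (and merely illustrative) case, assuming the open source RDKit toolkit: the fingerprints the literature calls ECFP4 and ECFP6 are the same Morgan procedure run with a different radius parameter.

    # Sketch (assuming RDKit): two "different" descriptors are one algorithm
    # with one tunable parameter -- the neighborhood radius.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

    ecfp4 = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)  # radius 2
    ecfp6 = AllChem.GetMorganFingerprintAsBitVect(mol, 3, nBits=1024)  # radius 3
    print(ecfp4.GetNumOnBits(), ecfp6.GetNumOnBits())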

Anyway, developing a common understanding is already a good start.

4:46 PM, January 31, 2011

Blogger Suresh said...

One big problem with getting involved in this work is access to data. From my understanding, a lot of the more interesting data sets (molecule collections etc.) are proprietary and under lock and key in drug companies. Or am I wrong here?

5:19 PM, January 31, 2011

Anonymous John Sidles said...

Suresh asks: "a lot of the more interesting data sets (molecule collections etc.) are proprietary and under lock and key in drug companies. Or am I wrong here?"

Nowadays, especially for younger scientists, the scientific ideal of "data" is flexing and bending like the Tacoma Narrows Bridge ... e.g., from NMR spectra ("raw data") is deduced a set of distance constraints ("reduction #1") from which is deduced a set of candidate ground-state structures ("reduction #2") from which is deduced a set of binding energies ("reduction #3") from which is deduced a set of enhanced-binding mutations ("reduction #4") ... which are synthesized and tested for binding affinity ... at which point the synthetic cycle begins anew.

After a while, the notion of a linear hierarchy of quality becomes indistinct ... rather, quality in synthetic biology is regarded in much the same way that Terry Tao regards quality in mathematics:

--------------------
"the concept of mathematical quality [read 'quality in synthetic biology'] is a high-dimensional one, and lacks an obvious canonical total ordering. ... There does however seem to be some undefinable sense that a certain piece of mathematics [read 'synthetic biology'] is 'on to something,' that it is a piece of a larger puzzle waiting to be explored further."
--------------------

6:15 PM, January 31, 2011

Anonymous John Sidles said...

Aaron Sterling concludes: "I believe chemoinformatics, like bioinformatics, will provide an important source of problems for computer scientists."

The following conclusion is logically equivalent, yet psychologically opposite: "I believe computer science, like quantum information theory, is providing important new computational resources to synthetic chemists and biologists."

This illustrates how "mathematics can sit in our brains in different ways" (in Bill Thurston's phrase) ... these choices obviously relate to Reinhard Selten's thought-provoking aphorism "game theory is for proving theorems, not for playing games."

These cognitive choices are not binary. Instead, mixed cognitive strategies like "Theorems help computer scientists to conceive new computational resources" are globally more nearly optimal.

That is why it is very desirable—essential, really—that everyone not think alike, regarding these key mathematical issues.

For this reason, the emergence of diverse, open mathematical forums, like Computational Complexity, Gödel's Lost Letter, Shtetl Optimized, Combinatorics and More, and Math Overflow (and many more), is contributing greatly to accelerating progress across a broad span of STEM enterprises.

The resulting cross-disciplinary fertilization can be discomfiting, ridiculous, and even painful ... but also irresistibly thought-provoking, playful, and fun. Good! :)

8:37 AM, February 01, 2011

Anonymous Aaron Sterling said...

Thanks to everyone for their interest, and for the exciting discussion -- and my apologies for misspelling names, now fixed. To respond to a few points:

@matt: I explicitly chose a problem that included the building of a patent estate as a parameter because I wanted to construct a snapshot intuition of the type of problems encountered in HCA. Sad or not, economic profit is a central player in multiple chapters, both in what types of problems are "interesting," and in what kinds of tools are readily available. I believe the same could be said for most problems in computer science, though perhaps the profit influence is more veiled in TCS. As a reviewer, I felt my responsibility was to convey the lay of the land, as best I understood it -- and, as a computer scientist, I feel it's unwise to consider ourselves "purer" than chemists because we are somehow above financial or corporate pressures. (We're not.)

@Egon W: I'm quite intrigued by your project suggestions, though I doubt I fully understand them. I will follow up with you directly, if that is ok.

@Joerg KW: Thanks very much for your comments; it's intriguing to hear your perspective, seeing some of these issues from "inside." If you don't mind, could you (or any of the chemists reading) elaborate on "the lack of multi-label graph standards is still a major problem in this area, and things are just getting worse when going large-scale, or to 3D modelling problems"? I am not sure what you are referring to here.

@Suresh: My (very limited) understanding is that both data and functionality are slowly becoming more accessible. The PubChem Project now has 31 million chemicals in its database, and access is free. On the other hand, manipulation of that data can be expensive. Many of the descriptors mentioned in Joerg Kurt Wegner's comment can only be calculated by expensive proprietary software. In addition to cost, the lack of code transparency means that an error in code could propagate errors into many results in the academic literature without being discovered. That is part of the motivation for the open-source chemoinformatics projects currently underway, like CDK, which Egon Willighagen has been part of.
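
As a concrete illustration of the free-access side, here is a minimal sketch, assuming PubChem's PUG REST interface and its usual JSON property-table layout (CID 2244, aspirin, is just an arbitrary example, and only the Python standard library is used):

    # Sketch: fetch basic properties for one compound from PubChem.
    # Assumes the PUG REST interface; URL layout and JSON keys follow its docs.
    import json
    import urllib.request

    url = ("https://pubchem.ncbi.nlm.nih.gov/rest/pug/"
           "compound/cid/2244/property/MolecularFormula,MolecularWeight/JSON")
    with urllib.request.urlopen(url) as response:
        table = json.load(response)

    for record in table["PropertyTable"]["Properties"]:
        print(record["CID"], record["MolecularFormula"], record["MolecularWeight"])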

@John Sidles: Your constant enthusiasm for interdisciplinary research is inspiring. :-)

Finally, I will say a word about the review, as it seems a fair number of people outside theoretical computer science might be reading this. I wrote this review for the newsletter of SIGACT (Special Interest Group on Algorithms and Computation Theory), a professional association of theoretical computer scientists. Bill Gasarch, co-owner of this blog, is the SIGACT book review editor; and Lance Fortnow, the other co-owner of this blog, is SIGACT Chair. Previous book reviews can be found here. It will be a while (Bill might be able to provide a timeframe) before this goes to press, so I can correct inaccuracies or address concerns before this becomes unchangeable.

Thanks again to everyone.

9:51 AM, February 01, 2011

Anonymous John Sidles said...

Aaron says: @John Sidles: Your constant enthusiasm for interdisciplinary research is inspiring. :-)

It's not enthusiasm, Aaron ... it's a concrete roadmap ... a roadmap that was originally laid out by von Neumann, Shannon, and Feynman for sustained exponential expansion in sensing, metrology, and dynamical simulation capabilities.

Recent advances in CS/QIT are providing new, concrete math-and-physics foundations for sustaining the expansion that von Neumann, Shannon, and Feynman envisioned. Good!

Several posters have diffidently expressed concerns relating to increasing tensions between openness, curation, and property rights. But there is no need to address these key concerns with diffidence ... my wife highly recommends Philip Pullman's thoughtful and plain-spoken analysis of these issues.

All in all, as Al Jolson sang in 1919, "You Ain't Heard Nothing Yet!" ... in the specific sense that the capabilities and challenges that Aaron's review addresses are almost surely destined to continue their "More than Moore" expansion. Good! :)

11:07 AM, February 01, 2011

Blogger Rajarshi said...

Hi Aaron, wrt your comment about descriptors - I'd argue that you can actually evaluate many commonly used descriptors with open source software. For example, the CDK implements many descriptors - certainly not all noted in the Handbook of Molecular Descriptors - but, as Joerg points out, many descriptors are minor variations of others. A recent paper (dx.doi.org/10.1124/dmd.110.034918) showed that CDK descriptors give results equivalent to those obtained using a commercial tool (MOE). But it is also true that certain descriptors (logP, for example) depend on having access to large datasets, for which there isn't always a freely available version.

12:05 PM, February 01, 2011

Anonymous matt said...

@Aaron: I didn't say that the use of patents in the example was sad because of the financial aspect. Of course, without the financial aspect no one would have the money to do this research. Rather, I felt it was sad because in many cases now patent law harms innovation rather than helping it. I would have been completely happy with an example that involved other important financial considerations, such as return on investment, time to market, first mover advantage, economy of scale, and so on, all of which are important even when separated from patent law.

12:29 PM, February 01, 2011

Blogger Joerg Kurt Wegner said...

"the lack of multi-label graph standards is still a major problem in this area, and things are just getting worse when going large-scale, or to 3D modelling problems"

I will break it down into some examples.

A molecular graph is ... a connection of some atoms with certain bonds. However, chemists will look at such a pattern and might say that this ring (e.g. benzene) is an aromatic and conjugated system, while cyclohexane is a non-aromatic, aliphatic system. So, implicitly, many chemists assign multiple properties to a molecular graph at the same time: aromaticity, hybridization, electronegativity, and so on.
Now, algorithms like PATTY [1] and MQL [2] can be used to assign such implicit properties in explicit form, allowing us to work with them. This converts an unchemical graph into a multi-labelled molecular graph with 'chemistry' knowledge. One of the remaining problems is that, to this day, we have not one standard grammar definition but only mixed assignment cascades in various software packages. For details of this dilemma, see OpenSmiles.org. To make things worse, sometimes the assignment cascades are cyclic and depend on execution order; that is, the assignment process is 'unstable' and can produce varying results.

Now, let us simply call the whole process a 'chemical expert system' (with all the limitations of an expert system) or a 'cheminformatics kernel'.

The resulting problem is that any subsequent analysis (e.g. descriptor calculation, 3D conformer generation, docking, and so on) depends on the initial assignment, which might be unstable. Approximately it might work, but strictly speaking it remains a problem; in the end people are comparing docking X against docking Y, while there are many steps in between that are only 'approximately the same'.

One classical example is a paper showing that 3D conformer generation can be influenced by different SMILES inputs (a line notation for molecules) for the very same molecule [3]. Under the assumption that a SMILES string defines a multi-label molecular graph, why should an internal numbering (the order within the SMILES) change the resulting 3D conformations? This stochastic element at that stage looks strange to me, but I know it is 'daily business' we have to account for.

[1] B. L. Bush and R. P. Sheridan, PATTY: A Programmable Atom Typer and Language for Automatic Classification of Atoms in Molecular Databases, J. Chem. Inf. Comput. Sci., 33, 756-762, 1993.

[2] E. Proschak, J. K. Wegner, A. Schüller, G. Schneider, U. Fechner, Molecular Query Language (MQL)-A Context-Free Grammar for Substructure Matching, J. Chem. Inf. Model., 47, 295-301, 2007. doi:10.1021/ci600305h

[3] G. Carta, V. Onnis, A. J. S. Knox, D. Fayne and D. G. Lloyd, Permuting Input for More Effective Sampling of 3D Conformer Space, J. Comput.-Aided Mol. Des., 20(3), 179-190, 2006. doi:10.1007/s10822-006-9044-4
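
To make the ordering point in [3] concrete, here is a small sketch (assuming the open source RDKit; ethanol is a toy example): the two inputs below differ only in atom order, and canonicalization maps both to a single string, which is exactly the stability one would like every pipeline stage to have.

    # Sketch (assuming RDKit): two atom orderings of one molecule, one canonical form.
    from rdkit import Chem

    variants = ["CCO", "OCC"]  # ethanol, written with two different atom orders
    canonical = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in variants}
    print(canonical)  # one canonical SMILES, independent of the input ordering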

2:58 PM, February 01, 2011

Anonymous Barry Bunin said...

Thanks for the thoughtful post. We plan to open up this public data further in the very near future for folks to analyze (the idea being that community SAR is a nice foundation for future community QSAR): http://www.collaborativedrug.com/pages/public_access

4:05 AM, February 02, 2011

Anonymous John Sidles said...

To provide a mathematical context for appreciating Barry Bunin's post (above), and in particular for appreciating the objectives of his company Collaborative Drug Discovery Inc., a recommended recent article is "The Mycobacterium tuberculosis drugome and its polypharmacological implications" (2010, free online at PLoS Computational Biology).

There are several different ways that this PLOS article can "sit in our minds" (to borrow Bill Thurston's wonderful phrase) ... and there is no need to restrict ourselves to just one way.

The Bill & Melinda Gates Foundation regards these new methods as essential to finding cures for diseases like tuberculosis and malaria. From an engineering point-of-view, the PLOS article shows how new enterprises in systems and synthetic biology are embracing the systems engineering methods that NASA applies in space missions ... except that the "autonomous vehicles" are nanoscale molecules that are designed to navigate cellular environments and seek specific molecular targets.

So it is natural to ask, in what various ways might these ideas "rest in our brains" mathematically?

This week on Gödel's Lost Letter and P=NP, Dick Lipton and Ken Regan give high praise to Mircea Pitici's new collection Best Writing on Mathematics: 2010, and I would like to draw attention especially to the Foreword by Bill Thurston, which begins:

---------------------------
"Mathematics is commonly thought to be the pursuit of universal truths, of patterns that are not anchored to any single fixed concept. But on a deeper level the goal of mathematics is to develop enhanced ways for humans to see and think about the world. Mathematics is a transforming journey, and progress in it can better be measured by changes in how we think than by the external truths we discover."
---------------------------

Thurston goes on to suggest that as we read articles, we ask ourselves "What's the author trying to say? What is the author really thinking?"

Let's apply Thurston's reading methods to the PLOS mycobacterium article. For me as a medical researcher, whose primary interest is in regenerative medicine, the best answers to Thurston's question are intimately bound-up with mathematical advances in complexity theory and quantum metrology. Because what the PLOS article is talking about, and what we are really thinking about in Thurston's sense, is a roadmap that was laid down decades ago by von Neumann and Feynman, to "see individual atoms distinctly" (Feynman 1959), and thereby to find out "where every individual nut and bolt is located ... by developments of which we can already foresee the character, the caliber, and the duration" (von Neumann 1946).

Progress along this decades-old roadmap—it is a centuries-old roadmap, really—has in recent decades begun accelerating at a more-than-Moore rate, in part because of advances in sensing and metrology, in part because of advances in simulation algorithms, but most of all because of advances in our Thurston-style mathematical understanding of biological dynamics and complexity (both classical and quantum).

Provided that this progress continues, in sensing and metrology, and in simulation capability, and most essentially of all, in our Thurston-style mathematical understanding of how it all works, then it seems to me that enterprises like Collaborative Drug Discovery Inc. have unbounded scope for growth, and even more significantly (for medical researchers) there is unbounded scope for progress in 21st century medicine. Good! :)

9:16 AM, February 02, 2011

Anonymous Anonymous said...

"To provide an intuition for the type of problems considered, suppose you want to find a molecule that can do a particular thing. We assume that if the new molecule is structurally similar to other molecules that can do the thing, then it will have the same property. (This is called a "structure-activity relationship," or SAR.) However, we also need the molecule to be sufficiently different from known molecules so that it is possible to create a new patent estate for our discovery."



I refuse to believe that this is a valid form of research. Yes, it has been mentioned before. The very idea is still outrageous.

9:20 AM, February 02, 2011

Anonymous Anonymous said...

"Those who love justice and good sausage should never watch either one being made." -- attributed to Otto von Bismarck

9:50 AM, February 02, 2011

Anonymous Aaron Sterling said...

@Rajarshi: Thanks for the correction and explanation.

(Rajarshi is Rajarshi Guha, author of Chapter 12 of HCA, the open source software chapter.)

@Joerg KW: What an amazing comment! Thank you. I will read those references very soon.

@Barry B: Thank you too. Your news was exciting.

10:20 AM, February 02, 2011

Anonymous GASARCH said...

(I emailed Steve Salzberg, biocomp prof at UMCP, a pointer to Aaron's post. He emailed me this response. I asked him if I could post it as a comment and he agreed.)

Interesting guest post. Chemoinformatics is definitely an important area. In my opinion, it is not a "hot" field, though, in part for some of the reasons mentioned in the post - particularly the fact that the data in the field is mostly proprietary and/or secret. So they hurt themselves by that behavior. But the other reason I don't think it is moving that fast is that, unlike bioinformatics, chemoinformatics is not being spurred by dramatic new technological advances. In bioinformatics, the amazing progress in automated DNA sequencing has driven the science forward at a tremendous pace.

I'm at a conference this week (by coincidence) with about 1000 people, all discussing the latest advances in sequencing technology. There are many academics here, and also vendors from all the major sequencing companies. DNA sequencing also has multiple very, very high profile successes to point to, such as the Human Genome Project and others. Chemoinformatics, in contrast, does not - at least I'm not aware of any.

So it's important, yes, but it's harder to argue that it is a rapidly advancing field. Maybe if they shared all their data that would change.

6:46 PM, February 02, 2011

Blogger Joerg Kurt Wegner said...

When comparing chemistry and biology, I must agree that data production and throughput are lower in chemistry. Still, data growth is exponential and we are simply drowning in structural and activity data, not only on the small-molecule level but also on the structural-biology level (X-ray, NMR, protein-ligand complexes). See also this data explosion collection. Besides, I would encourage more cross-disciplinary work, which in itself can create "hot"ness, no matter whether other disciplines produce more data. If people think that, we should all work for Google analyzing YouTube videos.

1:42 AM, February 03, 2011

Anonymous John Sidles said...

Joerg Kurt Wegner is correct that there is a gaping capability mismatch between (fast and accelerating) sequence throughput and (relatively slow) structure throughput. An even more serious mismatch is that sequence coverage is strikingly comprehensive, while structure coverage is exceedingly sparse.

Chromatin structure provides a good example. What Francis Crick called in the 1950s "the central dogma of molecular biology: the one-way flow of information from genome to cell" is now understood to be grossly wrong.

Broadly speaking, the heritable trait of being a neuron (brain cell) rather than a hepatocyte (liver cell) is associated not to DNA, but to the conformational winding of DNA around histones. Thus, for purposes of regenerative medicine (my own main interest), sequences alone are very far from being all we need to know; conformational information is equally vital.

We have wonderfully comprehensive instruments for showing us the pair-by-pair sequence of the DNA strands, but (at present) no similarly comprehensive instruments for showing us the histone-by-histone structural winding of DNA in the cell nucleus.

Still, structure determination capabilities are advancing at an incredibly rapid pace, and are largely paced by advances in CCT/QIT/QSE.

There is every reason to anticipate that eventually (even reasonably soon) our structure-determining capabilities will begin to match our sequence-determining capabilities in comprehensive scope, speed, and cost. These fundamental capabilities will be much-discussed at the ENC Conference in Asilomar this coming April. It will be exciting!

It is striking too that the 11-nanometer size of histone complexes is comparable to the resist half-pitch dimensions of coming generations of VLSI technologies ... according to the ITRS Roadmaps, anyway. Thus problems of structure-determination in biology, and in nanoelectronics, are foreseeably going to be solved together (or not at all).

Just as Hilbert's motto for the 20th century was "We must know, we will know", so for the 21st century, in fields as various as biology, astronomy, and chemistry, the motto is "We must see, we will see." This age-old dream was shared by von Neumann and Feynman, and now in our century it is coming true. Good!

7:16 AM, February 03, 2011

Anonymous Anonymous said...

How does Cheminformatics intersect with QSAR and Systems Biology?

Has there been much progress with bioinfo in the last several years? After the hype of the HGP, the proteome is a long way from being mapped. I was under the impression that there are approximately 500,000 proteins in the human body, most of which are hidden by high-abundance proteins such as albumin. Most discovered proteins have not had their 3D structure determined (X-ray crystallography / NMR are expensive), and in silico structure prediction has hit a wall.

7:36 AM, February 09, 2011

Blogger Egon Willighagen said...

Dear Steve Salzberg (via GASARCH), I do think it is rapidly moving. The Blue Obelisk movement has repeated in some 15 years of open source cheminformatics what the whole community did in the 30 years before that, and more. Indeed, one problem when I started in this field 15 years ago was that cheminformatics was not considered academic, and it was long pushed into commercial entities built on source code as IP, resulting in a slowdown. But with the open source cheminformatics movement, things have picked up speed again, and very fast too.

That bioinformatics is going faster is not intrinsic to these problems; that just reflects the amount of funding, IMHO. In fact, most cheminformaticians I know actually work as bioinformaticians. Moreover, do not underestimate the contributions that bioinformatics fields like metabolomics, flux analysis, assay data, and chemogenomics have made to cheminformatics. That said, 98% of the current cheminformatics literature is about applications rather than methodological work.

The adoption of XML (CML) and RDF as semantic representations of chemical information is a nice example where the open source cheminformatics community is ahead of its field, and it highlights many of the simplifications that proprietary solutions made in the past.

1:39 AM, February 10, 2011

Anonymous Aaron Sterling said...

Hi Anon26,

Several of the commenters on this thread are more qualified to respond to you than I am, but the post is a week old now, and I don't know who will see it. I'll say one thing.

It seems to me that the opening up of dramatically understudied chemical databases should provide areas for new research, even if a wall has been hit elsewhere (and I don't know that it has). Here's a link to a 2010 Wall Street Journal article you might find interesting.

http://online.wsj.com/article/SB10001424052748703341904575266583403844888.html?mod=WSJ_Tech_RIGHTTopCarousel

2:43 PM, February 10, 2011

Blogger Joerg Kurt Wegner said...

Here we go again: do you think cheminformatics graph canonicalization is a solved problem? Think again! And any subsequent (large-scale) data mining efforts are impacted.

BTW, how many large-scale molecule mining tools do you know of that are part of active scientific research?
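
One way to see why canonicalization stays hard, as a sketch (assuming networkx 2.5+, whose Weisfeiler-Lehman hash stands in here for any cheap graph invariant): such invariants are useful filters but not canonical forms, because non-isomorphic graphs can collide.

    # Sketch (assuming networkx >= 2.5): a cheap graph invariant is not a canonizer.
    import networkx as nx

    g1 = nx.cycle_graph(6)                                        # one hexagon
    g2 = nx.disjoint_union(nx.cycle_graph(3), nx.cycle_graph(3))  # two triangles

    # Both are 2-regular on 6 vertices, so 1-WL color refinement cannot separate them.
    h1 = nx.weisfeiler_lehman_graph_hash(g1)
    h2 = nx.weisfeiler_lehman_graph_hash(g2)
    print(h1 == h2, nx.is_isomorphic(g1, g2))  # True False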

12:39 PM, February 13, 2011
