Mauro Cavalcanti brought Chris Anderson's
The End of Theory article in Wired to my attention, part of the July issue on "The End of Science".
Of course, the end of science is hyperbole of the highest order (as, indeed, is the "end of theory"). It is also ironic that, in the same issue, Wired confesses that five of its earlier predictions of the death of something (including web browsers and online music swapping, no less) proved hopelessly wrong. However, I guess the reason Mauro sent me the link is this section:
The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.
If the words "discover a new species" call to mind Darwin and drawings of finches, you may be stuck in the old way of doing science. Venter can tell you almost nothing about the species he found. He doesn't know what they look like, how they live, or much of anything else about their morphology. He doesn't even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.
This sequence may correlate with other sequences that resemble those of species we do know more about. In that case, Venter can make some guesses about the animals — that they convert sunlight into energy in a particular way, or that they descended from a common ancestor. But besides that, he has no better model of this species than Google has of your MySpace page. It's just data. By analyzing it with Google-quality computing resources, though, Venter has advanced biology more than anyone else of his generation.
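As an aside, the comparison Anderson describes amounts to a similarity search: a read from an environmental sample is matched against sequences already in a database, and either it resembles something we know, or it doesn't. Here is a minimal sketch of that idea using Biopython's interface to NCBI BLAST (the read below is an invented placeholder, and this illustrates the concept rather than Venter's actual pipeline):

```python
from Bio.Blast import NCBIWWW, NCBIXML

# Invented fragment standing in for a read from an environmental sample.
read = "ATGGCGTGCAACACGAGTAGTCGCTAGGCTTACGATGACTATCGATCGGATTACAGCT"

# Submit the read to NCBI BLAST against the nucleotide database "nt".
result_handle = NCBIWWW.qblast("blastn", "nt", read)
record = NCBIXML.read(result_handle)

if not record.alignments:
    # No significant similarity to anything in the database: the
    # "statistical blip" suggesting something not yet represented there.
    print("No hits - possibly something new")
else:
    # Otherwise the read resembles sequences from organisms we already
    # know something about, and we can make guesses on that basis.
    for alignment in record.alignments[:5]:
        hsp = alignment.hsps[0]
        print(alignment.title, hsp.expect)
```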
Leaving aside whether Venter has indeed "advanced biology more than anyone else of his generation" (how, exactly, would one measure that?), the passage started me thinking about the yawning chasm between efforts such as the
Encyclopedia of Life and the Catalogue of Life on the one hand, and, say, metagenomics on the other. EoL and CoL have a view of life that is taxon-centric, indeed species-centric, one that appeals to our sense of what matters (basically, those organisms we can see comfortably with the naked eye and interact with). But if you browse the NCBI taxonomy, not only do you see an attempt to classify organisms phylogenetically, you will also encounter "taxa" that are
metagenomes (e.g., NCBI Taxonomy ID
408169). These metagenomes are the result of shotgun sequencing environmental samples, and they comprise multiple taxa. In this way they resemble large-scale sampling events such as plankton netting or tree fogging, which result in masses of material, much of it unidentified. One difference is that the metagenomes are digitised (i.e., sequenced), and hence can be analysed further (as opposed to a mass of specimens in jars). Indeed, this is one motivation behind DNA barcoding -- the ability to digitise massive samples of organisms.
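For the curious, these metagenome "taxa" can be pulled from NCBI like any other taxonomy record. A quick sketch using NCBI's Entrez E-utilities via Biopython, fetching the taxonomy ID mentioned above (the email address is a placeholder you would replace with your own):

```python
from Bio import Entrez

Entrez.email = "you@example.org"  # NCBI asks for a contact address; placeholder

# Fetch the Taxonomy record for the metagenome "taxon" mentioned above.
handle = Entrez.efetch(db="taxonomy", id="408169", retmode="xml")
records = Entrez.read(handle)
handle.close()

for record in records:
    # Name, rank and lineage that NCBI assigns to this environmental
    # sample -- a "taxon" that is really a bag of sequences from many
    # unidentified organisms.
    print(record["ScientificName"], record["Rank"])
    print(record["Lineage"])
```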
So, perhaps if we overlook the "end of theory" bit (although that bit is appealing, given that some critiques of DNA barcoding have made overblown claims for taxonomy as a hypothesis-driven science), the key point here is that much of what in an earlier age might have been provisional knowledge unfit for public consumption (e.g., a bunch of unidentified samples) is now very public. In the past, taxonomists wouldn't describe new taxa without sufficient information for a decent description; now the most actively growing taxonomic database (NCBI) has "taxa" that are aggregates of unidentified, unknown (and possibly unknowable) organisms.