Monday, May 05, 2008

iPhylo Demo

I started this blog with the goal of documenting my own efforts to make a database of evolutionary trees, based on ideas sketched in hdl:10.1038/npre.2007.1028.1. I've felt that the major task is link phylogenies to other information, such as taxon names, specimens, localities, images, publications, etc. That is, to embed trees in a broader context. Discovering how to engage with that broader context led to a bunch of experiments, toys, and diversions:
  1. iSpecies, a toy to aggregate information on a species.
  2. Semant, experiments with RDF and triple stores (AKA the Semantic Web).
  3. bioGUID, an attempt to make identifiers resolvable, with an increasing focus on developing an OpenURL resolver for biodiversity literature.

iSpecies and bioGUID are still operational, but the ant work fell victim to server crashes, and a growing frustration with the limitations of triple stores. Blogs for all three projects document their histories: iSpecies, Semant, and bioGUID. In a sense, these blogs document the steps along the way to iPhylo.

Based on this experience, I started again with what I've previously referred to as a database of everything. The first public demo is online at iphylo.org/~rpage/demo1. It's very crude, but may give a sense of what I'm trying to do.

The goal of iPhylo is to treat biodiversity objects as equal citizens. Each object has a unique identifier, associated metadata, and is linked to other objects (for example, a specimen is linked to sequences, sequences are linked to publications, etc.). By following the links it is possible to generate new views on existing information, such as a map for study that doesn't have any maps. For example, below is a map generated for Brady et al. (doi:10.1073/pnas.0605858103), based on links between sequences and specimens (if you can't see the map you need a SVG-capable web browser, such as Safari 3 or Firefox 2).


Error, browser must support "SVG"




Ironically there are no phylogenies yet. At this stage I'm trying to link the bits together.

How does it work?
More on this later. Briefly, iPhylo uses a entity-attribute-value model database to store objects and their relationships. Like bioGUID, iPhylo relies on a suite of web services (most external, some I've developed locally) to locate and resolve identifiers. iPhylo resolves identifiers for PubMed records, GenBank sequences, museum specimens, publications, etc. and adds the associated metadata to a local database. Wherever possible it resolves any links in the metadata (e.g., if a GenBank record mentions a specimen, iPhylo will try and retrieve information on that specimen). When you view an object in iPhylo, these links are displayed. iPhylo will also try and convert bibliographic records to identifiers (such as DOIs) if no identifiers are provided, and also extracts georeferences for specimens and sequences, either from original records or by using a georeferencing service. Taxonomic names are resolved using uBio, and are treated as "tags."

At present iPhylo is being populated by various scripts, there is no facility for users to add data. This is something I will add in the future.

Getting the data
One of the biggest challenges is getting data (or, to be more precise, figuring out how to harvest available data). iPhylo builds on code for bioGUID. I've also been exploring bulk harvesting data sources. Sometimes this is easy. Many sequences in Genbank are linked to records in PubMed, so if you know the Pubmed id for a paper, you can harvest the sequences. For example, even though the Bulletin of the American Museum of Natural History isn't indexed in PubMed, it is possible to retrieve all the sequences GenBank records as being from papers published in the Bulletin. You can retrieve this list in XML form by clicking here.


Why?

There are all sorts of things which could be done with this. For example, by linking objects together we can also track the provenance of data, and ultimately build "citation networks" of specimens, sequences, etc. For background see my paper on "Biodiversity informatics: the challenge of linking data and the role of shared identifiers" (doi:10.1093/bib/bbn022, preprint at hdl:10101/npre.2008.1760.1).

As mentioned above, we can generate maps even if the original study didn't include one (by following the links). Given that we can geotag studies, this opens up the possibility to query studies spatially. For example, this study on bats and this study on rodents deal with taxa with very similar distributions. A spatial query could find these easily. Imagine interested in, say, Madagascar, and being able to find phylogenetic studies , even if the title and abstract of the paper don't mention Madagascar by name.

There also potential to clean data. One of the first studies I uploaded is Grant et al.'s study of dart-poison frogs. The map for this study shows a outlying point in California:


Error, browser must support "SVG"




This point is MVZ 199828, which is a specimen of the salamander Aneides flavipunctatus. In GenBank, MVZ 199828 is listed as the voucher for seven sequences from the frog Mannophryne trinitatis. Oops. A quick iSpecies search, and a click on the GBIF map reveals that there is a specimen MVZ 199838 of Mannophryne trinitatis. i suspect that this is the true voucher for these sequences, and the GenBank records contain a typo.

Future
This is all still very, very crude. The demo is slow, the queries aren't particularly clever, and I've probably gone overboard on using Javascript to populate the web pages. The real value isn't in the web pages, but the links between the data objects. This is my main focus -- extracting and adding links. For now the data is displayed but you can't edit it. However, this is coming. Very basic RDF and RSS feeds are available for each object, and fans of microformats will find some goodies, and sociologists of science may find some of the coauthorship graphs intriguing

2 comments:

Anonymous said...

And suddenly I find my name in iPhylo...
This project is awesome - keep up the good work i feel it's going to be a big thing!

Katharina

sexy said...

情趣用品,情趣用品,情趣用品,情趣用品,情趣用品,情趣用品,情趣用品,情趣用品,情趣,情趣,情趣,情趣,情趣,情趣,情趣,情趣,A片,視訊聊天室,聊天室,視訊,視訊聊天室,080苗栗人聊天室,上班族聊天室,成人聊天室,中部人聊天室,一夜情聊天室,情色聊天室,視訊交友網

免費A片,AV女優,美女視訊,情色交友,免費AV,色情網站,辣妹視訊,美女交友,色情影片,成人影片,成人網站,A片,H漫,18成人,成人圖片,成人漫畫,情色網,日本A片,免費A片下載,性愛

A片,色情,成人,做愛,情色文學,A片下載,色情遊戲,色情影片,色情聊天室,情色電影,免費視訊,免費視訊聊天,免費視訊聊天室,一葉情貼圖片區,情色,情色視訊,免費成人影片,視訊交友,視訊聊天,視訊聊天室,言情小說,愛情小說,AIO,AV片,A漫,avdvd,聊天室,自拍,情色論壇,視訊美女,AV成人網,色情A片,SEX,成人論壇

情趣用品,A片,免費A片,AV女優,美女視訊,情色交友,色情網站,免費AV,辣妹視訊,美女交友,色情影片,成人網站,H漫,18成人,成人圖片,成人漫畫,成人影片,情色網


情趣用品,A片,免費A片,日本A片,A片下載,線上A片,成人電影,嘟嘟成人網,成人,成人貼圖,成人交友,成人圖片,18成人,成人小說,成人圖片區,微風成人區,成人文章,成人影城,情色,情色貼圖,色情聊天室,情色視訊,情色文學,色情小說,情色小說,臺灣情色網,色情,情色電影,色情遊戲,嘟嘟情人色網,麗的色遊戲,情色論壇,色情網站,一葉情貼圖片區,做愛,性愛,美女視訊,辣妹視訊,視訊聊天室,視訊交友網,免費視訊聊天,美女交友,做愛影片

av,情趣用品,a片,成人電影,微風成人,嘟嘟成人網,成人,成人貼圖,成人交友,成人圖片,18成人,成人小說,成人圖片區,成人文章,成人影城,愛情公寓,情色,情色貼圖,色情聊天室,情色視訊,情色文學,色情小說,情色小說,色情,寄情築園小遊戲,情色電影,aio,av女優,AV,免費A片,日本a片,美女視訊,辣妹視訊,聊天室,美女交友,成人光碟

情趣用品.A片,情色,情色貼圖,色情聊天室,情色視訊,情色文學,色情小說,情色小說,色情,寄情築園小遊戲,情色電影,色情遊戲,色情網站,聊天室,ut聊天室,豆豆聊天室,美女視訊,辣妹視訊,視訊聊天室,視訊交友網,免費視訊聊天,免費A片,日本a片,a片下載,線上a片,av女優,av,成人電影,成人,成人貼圖,成人交友,成人圖片,18成人,成人小說,成人圖片區,成人文章,成人影城,成人網站,自拍,尋夢園聊天室