Wednesday, May 09, 2007

Catalogue of Life design flaw


A bit more browsing of the Catalogue of Life annual checklist for 2007 reveals a rather annoying feature that, I think, cripples the Catalogue's utility. With each release the checklist grows in size. From their web site:
The Species 2000 & ITIS Catalogue of Life is planned to become a comprehensive catalogue of all known species of organisms on Earth by the year 2011. Rapid progress has been made recently and this, the seventh edition of the Annual Checklist, contains 1,008,965 species.

However, with each release the identifiers for each taxon change. For example, if I were to link to the record for the pea crab Pinnotheres pisum this year (2007), I would link to record 3803555, but last year I would have linked to 872170. Record 872170 no longer exists in the 2007 edition.

So, what would a user who based their taxonomic database on the Catalogue of Life do? All their links would break (not just because the URL interface has changed, but because the underlying identifiers have changed as well). It's as if the authors of the catalogue have been oblivious to the discussion on globally unique identifiers (GUIDs) and the need for stable, persistent identifiers.

Anybody building a database that gets updated, and possibly rebuilt, needs to think about how their identifiers will change. If identifiers are simply the primary keys in a table, then they will likely be unstable unless great care is taken. Alternatively, databases that are essentially aggregations of data available elsewhere could use GUIDs as the primary keys. This means that even if the database is restructured, the keys (and hence the identifiers) don't change. For the user, everything still works.
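To make the difference concrete, here is a minimal sketch in Python. It is not how the Catalogue of Life is built; the two "releases", the `urn:example:taxon:...` GUIDs, and the function names are all made up for illustration. It assumes each source record arrives with its own GUID, and compares rebuilding a table with keys assigned in load order (as an auto-increment primary key would be) against rebuilding it with the source GUID as the key.

```python
# Hypothetical source records as (guid, scientific name).
# The GUIDs are invented for this example; they are not real identifiers.
RELEASE_2006 = [
    ("urn:example:taxon:a1", "Pinnotheres pisum"),
    ("urn:example:taxon:b2", "Carcinus maenas"),
]
RELEASE_2007 = [
    ("urn:example:taxon:c3", "Cancer pagurus"),  # taxon added in the new release
    ("urn:example:taxon:a1", "Pinnotheres pisum"),
    ("urn:example:taxon:b2", "Carcinus maenas"),
]


def build_with_serial_keys(records):
    """Rebuild the table from scratch, numbering rows in load order
    (this mimics an auto-increment primary key)."""
    return {key: name for key, (_guid, name) in enumerate(records, start=1)}


def build_with_guid_keys(records):
    """Rebuild the table using the GUID supplied by the source as the key."""
    return {guid: name for guid, name in records}


serial_2006 = build_with_serial_keys(RELEASE_2006)
serial_2007 = build_with_serial_keys(RELEASE_2007)
guid_2006 = build_with_guid_keys(RELEASE_2006)
guid_2007 = build_with_guid_keys(RELEASE_2007)

# Links a user saved against the 2006 build.
saved_serial = 1
saved_guid = "urn:example:taxon:a1"

# The serial key now resolves to a different taxon; the GUID does not.
print(serial_2006[saved_serial], "->", serial_2007[saved_serial])
print(guid_2006[saved_guid], "->", guid_2007[saved_guid])
```

In this toy case the serial key silently points at a different record after the rebuild, which is arguably worse than a broken link. The GUID-keyed table resolves to the same taxon no matter how the rows were loaded.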

Despite the favourable press about its progress (e.g., doi:10.1038/news050314-6, Environmental Research Web, and CNN), I think the catalogue needs some serious rethinking if it is to be genuinely useful. For more on this, see my earlier posting on how the catalogue handles literature.

Image of Pinnotheres pisum by Hans Hillewaert obtained from Wikimedia Commons.

2 comments:

David said...

This has long been a problem with Species2000 and ITIS. This is why it is far better to use uBio's web services to pull classifications and their associated LSIDs, which are stable. However, what's not clear to me is how uBio will deal with this issue in their ClassificationBank, which is supposed to act as an aggregation of synonymies, etc. Since The Catalogue of Life will necessarily be the scaffolding upon which the Encyclopedia of Life will be built, this will be really interesting.
