Tuesday, February 26, 2008

Encyclopedia of Life - first impressions

Some thoughts on the first release of the Encyclopedia of Life. I am being deliberately critical. This is a high-profile project with tens of millions of dollars in funding and lots of people involved, and it is accompanied by some of the most overblown hype in organismal biology. In a sense I think EOL has set itself up by overpromising and underdelivering.

Before continuing, I should point out that I am involved in EOL in an advisory capacity, but not in actually making anything. Some of the tools I've blogged about have made their way into EOL, such as Pygmybrowse and reference parsing (see David Shorthouse's excellent work on this).

Lack of content
I think the first release of EOL should have provided, at a minimum, as much information as I can get from iSpecies and Wikipedia. Other projects, such as Freebase, have pre-populated their databases with content from Wikipedia and other sources. Why didn't EOL? If the argument is that they want authenticated content, then this doesn't wash. Their authenticated content is minimal, and waiting for authentication will, in my view, cripple EOL.

Exemplars are incomplete
The first release contains 25 exemplars. Pages for these taxa
...show the kind of rich environment, with extensive information, to which all the species pages will eventually grow. The information on the exemplar pages has been authenticated (endorsed) by the scientists whose names are listed on these pages.
Well, I hope this isn't the standard EOL aspires to. The pages are incomplete and not interlinked. One of the 25 chosen exemplars is Anolis carolinensis. EOL lists its distribution as:
Widely-distributed throughout the southeastern United States: North Carolina to Key West, Florida, and west to southest Oklahoma and central Texas.

However, the GBIF map EOL displays shows lots of dots in Hawaii:


The EOL account is silent on this interesting distribution pattern. It will come as no surprise that the Wikipedia account of the same species tells us that it has been introduced into Hawaii. Wikipedia 1, EOL 0.

Links

If two pages talk about species that are ecologically associated, then surely those pages should be linked? Among the exemplars is Pissodes strobi, the white pine weevil. In the EOL account, among the hosts listed is Pinus strobus, another exemplar taxon. The accounts of these two taxa are not linked. No hyperlink, nothing. The reader has no idea that there is an exemplar account for Pinus strobus. Furthermore, when reading the account for Pinus strobus there is no indication that it is host to the white pine weevil.
Surely the point of having all this information in one place is so that it can be linked together?

BHL
EOL also exposes some limitations of the Biodiversity Heritage Library. Consider the exemplar page for Pinus strobus L. The "L." indicates that this species was described by Linnaeus. Among the many references listed by BHL, none are by Linnaeus. What gives?

Well, the IPNI record reveals that this species was described on p. 1001 of Species Plantarum. BHL has digitised Species Plantarum, and page 1001 has Pinus strobus:



Now, BHL relies on uBio's tools to extract names, and Linnaeus didn't make this easy (the specific epithet strobus is in the right hand margin, separate from Pinus), but one would have thought that for the exemplar taxa an effort would have been made to link Linnaean names to BHL content -- what better place to showcase the link between a name and its publication? It's quite easy to do, given that IPNI has page numbers for plant names. Just map page numbers to BHL URLs, and you're done.
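The page-number mapping suggested above can be sketched roughly as follows. This is only an illustration: the page IDs in the lookup table are made up, the URL pattern is an assumption about how BHL addresses individual pages, and the record layout is a simplified stand-in for what IPNI provides.

```python
from typing import Optional

# Hypothetical lookup for one digitised work: printed page number -> BHL page ID.
# These IDs are invented for illustration; real ones would come from BHL's page metadata.
SPECIES_PLANTARUM_PAGES = {
    1000: 900001,
    1001: 900002,
    1002: 900003,
}

# Assumed URL pattern for viewing a single BHL page.
BHL_PAGE_URL = "https://www.biodiversitylibrary.org/page/{page_id}"


def bhl_url_for_name(ipni_record: dict, page_index: dict) -> Optional[str]:
    """Return a BHL page URL for the page cited in an IPNI-style record, or None."""
    page_id = page_index.get(ipni_record["page"])
    if page_id is None:
        return None  # the cited page is not in the scanned work
    return BHL_PAGE_URL.format(page_id=page_id)


# Simplified stand-in for an IPNI record: name, author, publication, page.
record = {"name": "Pinus strobus", "author": "L.", "publication": "Sp. Pl.", "page": 1001}
print(bhl_url_for_name(record, SPECIES_PLANTARUM_PAGES))
```

The hard part in practice is building the page-number-to-page-ID table for each scanned volume; once that exists, the link itself is a dictionary lookup.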

Inconsistency
Going down the taxonomic hierarchy, weird things happen. When viewing the plant genus Morus I can see a picture of Morus nigra (presumably this is "authenticated" content). If I drill down to the species Morus nigra, I'm told there is no authenticated content for this species. Either the image is Morus nigra or it isn't. If it is, why not show it? If it isn't, why claim that it is?



Logos

Way too much space is devoted to logos of various contributors, BHL being the worst offender (it doesn't help that the BHL content is incomplete, lacking links for Linnaean names). I don't care about logos. Contributors may care about getting their logos displayed, but users couldn't care less. They get in the way. On some pages, there's more screen space devoted to logos than information (e.g., the page for Apomys datae). This is, frankly, ridiculous, and reflects a warped set of priorities.

What's worse, all these logos are associated with links that take people away from EOL. Hence EOL becomes little more than a collection of web links to other sites.

Search
The search is based on the Catalogue of Life, and inherits the same problems. For example, if I search for "Morus" I get a list in alphabetical order of taxonomic names that contain the string "morus". The two names that are an exact match occur as items three and four on the list -- they should be first and second.
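Putting exact matches first is not hard. Here is a minimal sketch, assuming the search has already produced a list of candidate name strings; the function name and the sample names are illustrative, not EOL's or the Catalogue of Life's actual code or output.

```python
from typing import List


def rank_matches(query: str, names: List[str]) -> List[str]:
    """Order names so exact matches come first, then whole-word matches, then substrings."""
    q = query.lower()

    def sort_key(name: str):
        lowered = name.lower()
        if lowered == q:
            tier = 0  # the name is exactly the query
        elif q in lowered.split():
            tier = 1  # the query is a whole word in the name, e.g. the genus
        else:
            tier = 2  # the query only occurs inside a longer word
        return (tier, lowered)  # alphabetical order within each tier

    hits = [n for n in names if q in n.lower()]
    return sorted(hits, key=sort_key)


# Illustrative candidate list, not real search output.
sample_names = ["Hemimorus", "Morus", "Morus alba", "Sycomorus", "Morus nigra", "Pinus strobus"]
print(rank_matches("Morus", sample_names))
```

With this ordering the exact match "Morus" heads the list, followed by the species in that genus, and names that merely contain the string "morus" come last.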

It gets worse if I search on "Tyrannosaurus rex". EOL doesn't do dinosaurs, and so doesn't contain anything on T. rex, but the search results tell me that "The following 116 search results contain 'Tyrannosaurus rex'". Nope, none of them do.

The search engine is poorly done: it fails to rank results sensibly, incorrectly reports what it does find, and has no support for spelling mistakes.
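Tolerating spelling mistakes, and reporting honestly when nothing matches, can be sketched with Python's standard difflib module. The function name, the similarity cutoff, and the sample names below are my own assumptions for illustration; a real name index would need something that scales better than comparing against every name.

```python
import difflib
from typing import List


def suggest_names(query: str, names: List[str], cutoff: float = 0.8) -> List[str]:
    """Return known names that closely resemble the query, or an empty list."""
    # Compare in lower case, but hand back the names as originally written.
    by_lower = {name.lower(): name for name in names}
    close = difflib.get_close_matches(query.lower(), list(by_lower), n=5, cutoff=cutoff)
    return [by_lower[match] for match in close]


# Illustrative name list, not real EOL data.
known_names = ["Morus nigra", "Pinus strobus", "Anolis carolinensis"]

print(suggest_names("Morus nigr", known_names))         # misspelt query
print(suggest_names("Tyrannosaurus rex", known_names))  # a name the list does not contain
```

An empty result should be shown to the user as "no matches", rather than a count of results that supposedly contain the search term.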

Authenticated content
This is probably the thing that, if left as it is, will strangle EOL. The insistence on "authenticated (endorsed)" content places a severe brake on what EOL can offer.

It's a web site
EOL's web site has no mechanism for people to extract data (e.g., RSS feeds, microformats, links to RDF, etc.). It's intended to be read by humans, not machines. This greatly diminishes its utility.

So, I've got that off my chest. The first release was always going to be a disappointment, especially given the hype. What frustrates me, however, is just how far the first release is from what it could have been.

The real question is whether the issues I've raised are things that are easy to fix given time, or whether they reflect underlying problems with the way the project is conceived.

12 comments:

David Shorthouse said...

These are but a small portion of the issues now on the table and we were/are fully aware of all of them. Some we could have fixed, some we spotted and did fix, but other functions were needlessly crippled because we had to perform major fixes. I won't bore with the details. However, there are some shining lights, which have largely gone undiscussed or unnoticed.

First, all front-end materials are built off RESTful web services. Granted, these web services may only be of value for reproducing an EOL page (and there will be a review to assess that), but it nonetheless affords us an opportunity to open the doors more widely than they are now. Hats off to Patrick Leary.

Second, we forged an agreement with CrossRef so we will be assigning DOIs to pages. The exact mechanism needs more thought (attribution - how with multiple authors & a ton of AJAX?).

Indeed the release was rushed. Because it was released as is, warts and all, we are open to plenty of criticism. That's the only way we can now proceed. If we flew through this release without so much as a whimper, it would surely have been a disappointment. Not having sufficient time to test loads on the servers was by far the biggest upset for us.

We are also tracking all comments in our forum, all responses sent via email on the servers, and comments in our blog. These are all being triaged and will be part of the critical post-launch review.

By the way, there's no reason to nod my work on the ref parser. I merely made a pretty front-end to your deeper work on this ;)~ Did ya catch my mildly informative and apologetic screencasts...that is, when the site is up?

Roderic Page said...

The fact that EOL buckled under the (enormous) load suggests that something like Akamai might be needed.

Regarding DOIs and attribution, I'm not convinced that having "authors" makes sense. Many pages will be automatically generated from remote sources. In this sense, the "author" is EOL. By all means have a means to list who did what, but I wonder whether the model of having an author makes sense. It would be a shame if the combination of authorship and "authentication/endorsement" got in the way of things.

Anonymous said...

the plant genus Morus

Yay, another intercode homonym. Hooray.

Chris Freeland said...

Rod - Any idea where BHL can grab names from Syst. Nat.? We can get Sp. Pl. names from Tropicos & IPNI, but at a loss for where to get names from Syst. Nat. without manual entry.

Roderic Page said...

Chris, as far as I know there isn't an equivalent list. For generic and subgeneric names I guess you could harvest uBio's Nomenclator Zoologicus data, which includes page-level citations (e.g., the record for Pediculus humanus).

For species names, unless somebody has made a universal list, I guess you're stuck with what taxon-specific nomenclators can provide.

Donat Agosti said...

Chris

ask Rich Pyle at ZooBank, who has the entire 10th edition of Syst. Nat. in a database.

Then you might get in touch with the animalbase.de team, who are also trying to extract names from legacy publications, working their way up from 1758, and who just got a new grant to continue their scanning operations.

The third person you might want to ask about this is Dave Remsen at GBIF who, to the best of my knowledge, has been talking to the animalbase.de organizers.

gener said...

Hi. It would be a shame if the combination of authorship and "authentication/endorsement" got in the way of things.