Tuesday, January 11, 2011

Why won't The Plant List let me do this?

In my last post I discussed why I thought the decision of The Plant List to use a restrictive license (CC-BY-NC-ND) was such a poor choice. CC-BY-NC-ND states that:
"You may not alter, transform, or build upon this work."
To make this point more concrete, I've created this site:

Experiments with The Plant List

to show the kinds of things that The Plant List's choice of license prevents the taxonomic community from doing. As a first step I'm exploring linking the names in the list to the primary scientific literature, as this video demonstrates:

The Plant List from Roderic Page on Vimeo.

For example, we can take a name like Begonia zhengyiana Y.M.Shui, parse the bibliographic citation provided by The Plant List (via IPNI), and locate the actual paper online. In this case it's freely available as a PDF:

Now we can see a drawing of the plant, and instead of simply trusting that the compilers of The Plant List have correctly interpreted this paper, we can see for ourselves. Down the track, we could imagine mining this paper for details about the plant, such as its morphology and geographic distribution. This requires the link to the original literature, which The Plant List lacks.
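
As a rough illustration of the citation-parsing step, here is a minimal Python sketch. It assumes the citation follows the common short form "journal volume(issue): page (year)". The regular expression, the function name parse_citation, and the sample string are all hypothetical; the sample is not the actual record for Begonia zhengyiana, and real nomenclator citations vary far more than this pattern allows.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for short nomenclator-style citations such as
# "J. Example Bot. 12(3): 45 (2002)". Real citations are messier.
CITATION_RE = re.compile(
    r"""^\s*
    (?P<journal>.+?)\s+          # abbreviated journal title (lazy match)
    (?P<volume>\d+)              # volume number
    (?:\((?P<issue>[^)]+)\))?    # optional issue in parentheses
    \s*:\s*
    (?P<page>\d+)                # first page of the description
    (?:\s*-\s*\d+)?              # optional end page, ignored
    [^()]*                       # trailing detail such as ", fig. 1"
    \((?P<year>\d{4})\)          # four-digit year in parentheses
    \s*\.?\s*$""",
    re.VERBOSE,
)


def parse_citation(citation: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a short citation into its parts, or return None if it does not fit."""
    match = CITATION_RE.match(citation)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    # Made-up citation used purely for illustration.
    print(parse_citation("J. Example Bot. 12(3): 45 (2002)"))
```

Returning None for anything that does not fit the pattern keeps the failures visible, which matters here because a wrongly parsed volume or page would lead to the wrong paper.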

A good chunk of the recent plant taxonomic literature has DOIs, for example journals such as the Kew Bulletin and Novon. Playing with some scripts, I've managed to associate nearly 9000 accepted names with a DOI, and that's by looking at only a few journals. There are lots more DOIs to be found, but because of the way botanical nomenclators record references (see my post Nomenclators + digitised literature = fail) it can be something of a challenge to find them. This task isn't helped by the fairly lax way some publishers enter data in CrossRef (Cambridge University Press, I'm looking at you). The other obvious source of digitised literature is, of course, BHL, and that's next on the list of resources to play with.
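
To give a flavour of the DOI lookup, here is a hedged Python sketch built around the present-day CrossRef REST API (the /works endpoint with its query.bibliographic and rows parameters). This is not necessarily how the scripts behind the site work. The function names, the sample values, and the DOI in the usage example are hypothetical. The sketch only builds the query URL and filters a list of candidate records, so it makes no network calls.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(journal: str, volume: str, page: str, year: str,
                       rows: int = 5) -> str:
    """Build a CrossRef /works URL for a free-text bibliographic query."""
    params = {
        "query.bibliographic": f"{journal} {volume} {page} {year}",
        "rows": rows,
    }
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def pick_doi(items: List[Dict], volume: str, page: str) -> Optional[str]:
    """Return the DOI of the first candidate whose volume and page range fit.

    `items` is assumed to look like the "items" list in a CrossRef response,
    where "page" is a string such as "40-52". Publishers do not always fill
    these fields in, so missing values are treated as a non-match.
    """
    target = int(page)
    for item in items:
        if item.get("volume") != volume:
            continue
        pages = item.get("page", "").split("-")
        try:
            first = int(pages[0])
            last = int(pages[-1])
        except ValueError:
            continue  # missing or non-numeric page metadata
        # The name may be published on any page within the article.
        if first <= target <= last:
            return item.get("DOI")
    return None


if __name__ == "__main__":
    print(crossref_query_url("J. Example Bot.", "12", "45", "2002"))
    # Made-up candidate records standing in for a real API response.
    candidates = [
        {"DOI": "10.9999/example.1", "volume": "12", "page": "1-10"},
        {"DOI": "10.9999/example.2", "volume": "12", "page": "40-52"},
    ]
    print(pick_doi(candidates, "12", "45"))
```

Checking that the cited page falls inside the article's page range, rather than matching the first page exactly, reflects the fact that nomenclators usually cite the page where the name appears, not the page where the paper starts.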

Experiments with The Plant List is very crude, and I've barely scratched the surface of linking names to primary literature. That said, given that there are exactly zero links between names and digital literature in The Plant List, I'd argue that my site adds value to the data in The Plant List. And that's my point: by making data available for others to play with, you enable others to add value to that data. By choosing a CC-BY-NC-ND license, The Plant List has killed that possibility.

So, my question for The Plant List is "why did you do that?"