TL;DR: The Plant List is now in GBIF: http://doi.org/10.15468/btkum2.
Readers of this blog may recall that I've had a somewhat jaundiced view of The Plant List. The first version was released under a Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND) license, which allowed copying so long as you didn't create a derived work (The Plant List: nice data, shame it's not open). This is frankly about the silliest possible license for a data set as, from my perspective, the whole reason for releasing data is so that it can be combined and enhanced with other data.
So, for the last week I've been working on getting a version of The Plant List into GBIF, and I've finally managed to achieve this. There isn't a single place you can grab the whole Plant List, so you have to scrape the web site for CSV files, then glue them together. I could argue that converting the data into a Darwin Core Archive is a derived work, but in case this seems not derivative enough (of course, nobody seems ready to define just what "derived" actually means) I started to augment the list of names by adding bibliographic identifiers. I've long argued (see e.g. Surfacing the deep data of taxonomy) that a fundamental limitation of existing taxonomic databases is that they don't explicitly link to the primary literature. This is why I built BioNames, and why I've been working to link the "micro citations" in IPNI to identifiers such as DOIs, JSTOR links, BioStor URLs and BHL page links (see project on GitHub). So, I've added about 120,000 DOIs and JSTOR links to names in the Plant List. This is a subset of the links I've found for IPNI, but for this first release I've tried to keep things simple. I've also made the link between a Plant List name and its DOI or JSTOR link via the IPNI identifier for that name, and the Plant List has omitted quite a few IPNI ids for reasons which aren't clear.
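The gluing and linking steps can be sketched roughly as follows. This is a minimal illustration, not the code actually used for the GBIF upload: the column name "IPNI id", the sample rows, and the IPNI id "0000-1" are placeholders assumed for the example, and the lookup table stands in for whatever mapping from IPNI ids to DOIs or JSTOR links you have built.

```python
import csv
import io


def merge_csv_texts(csv_texts):
    """Concatenate the rows of several CSV documents that share a header."""
    rows = []
    for text in csv_texts:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


def add_publication_links(rows, links_by_ipni_id, id_column="IPNI id"):
    """Return copies of the rows with a 'link' field looked up by IPNI id.

    Rows with no IPNI id, or with an id that has no known link, get an
    empty string. The column name is an assumption for this sketch.
    """
    enriched = []
    for row in rows:
        ipni_id = (row.get(id_column) or "").strip()
        new_row = dict(row)
        new_row["link"] = links_by_ipni_id.get(ipni_id, "")
        enriched.append(new_row)
    return enriched


# Two made-up per-family CSV files; the second row lacks an IPNI id.
family_a = "ID,Genus,Species,IPNI id\nx-1,Haniffia,albiflora,0000-1\n"
family_b = "ID,Genus,Species,IPNI id\nx-2,Examplea,missing,\n"

# Placeholder IPNI id mapped to the DOI quoted in this post.
links = {"0000-1": "http://doi.org/10.1111/j.1756-1051.2000.tb00745.x"}

merged = merge_csv_texts([family_a, family_b])
linked = add_publication_links(merged, links)
```

Names without an IPNI id simply end up with an empty link, which is why missing ids in the source data translate directly into names without a DOI or JSTOR link.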
The Plant List version I've created is available in GBIF (http://doi.org/10.15468/btkum2 and http://www.gbif.org/dataset/d9a4eedb-e985-4456-ad46-3df8472e00e8). Having another list of plant names will be a useful addition to the checklists that GBIF already has, even if the Plant List is already somewhat out of date.
DOIs

One feature of the enhanced Plant List in GBIF is that for a subset of names (currently about 10%) there are direct links to the original publication of that name. For example, the record for Haniffia albiflora in the Plant List has a fairly cryptic bibliographic citation, "Nordic J. Bot. 20: 287 2000", and no link to that publication. In the version I've uploaded to GBIF, the record for Haniffia albiflora shows the full citation. More importantly, the "Publisher record" link is the DOI http://doi.org/10.1111/j.1756-1051.2000.tb00745.x, so clicking on it takes you to the original description of this species.

There is a lot of plant taxonomic literature available in JSTOR, sadly most of it (along with specimen images) behind a paywall (see Why are botanists locking away their data in JSTOR Plant Science?). Some of the links from GBIF take you to JSTOR.

The DOI landscape is evolving, and there are now multiple DOI registration agencies minting DOIs for scientific papers. CrossRef provides easily the best services for discovery and metadata harvesting; other agencies often have no equivalent, which makes it hard to discover DOIs for those papers. I've spent some time getting this information for Chinese and Taiwanese articles, e.g. http://dx.doi.org/10.6165/tai.1985.30.5 and http://dx.doi.org/10.3969/j.issn.2095-0845.2005.04.002, to give two examples of articles that are now linked to from the corresponding species page in GBIF.
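For anyone wanting to check metadata for DOIs like these, one option is DOI content negotiation: ask the resolver for CSL JSON instead of the landing page. The sketch below only builds the request and does not send it. The function names are made up for this example, and whether a given registration agency answers this kind of request is not guaranteed, which is part of the discovery problem described above.

```python
import urllib.parse
import urllib.request

# Prefixes commonly seen in front of a bare DOI.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def bare_doi(value):
    """Strip a resolver URL or 'doi:' prefix, leaving just the DOI."""
    text = value.strip()
    lowered = text.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            return text[len(prefix):]
    return text


def csl_json_request(value):
    """Build (but do not send) a request asking the resolver for CSL JSON."""
    url = "https://doi.org/" + urllib.parse.quote(bare_doi(value), safe="/")
    headers = {"Accept": "application/vnd.citationstyles.csl+json"}
    return urllib.request.Request(url, headers=headers)


request = csl_json_request("http://doi.org/10.1111/j.1756-1051.2000.tb00745.x")
# Passing the request to urllib.request.urlopen would perform the lookup;
# agencies without content negotiation may return a web page or an error.
```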