Showing posts with label Open access.

Friday, May 08, 2015

Putting some bite into the Bouchout Declaration

There are no requirements for signing up. A signature is first and foremost a statement of support for open data. Each signatory can determine how best to make progress towards the goal. Some recommendations are included in the declaration. We hope that signatories will become early adopters of the open access approach, that they will promote change in their institutions, societies and journals, and will position themselves and their institutions as leaders. (from http://www.bouchoutdeclaration.org/faqs/)
I've put off writing this post about the Bouchout Declaration for a number of reasons. I attended the meeting that launched the declaration last year, and from my perspective that was a frustrating meeting. There was much talk about "Open Biodiversity Knowledge Management" with nobody seemingly willing or able to define it (see The vision thing - it's all about the links for some comments I made before attending the meeting), and as much as the signing of the Bouchout Declaration provided good theatre, it struck me as essentially an empty gesture. Public pronouncements are all well and good, but they are ultimately of little value unless backed up by action. We have institutions that have signed the declaration yet have much of their intellectual output locked behind paywalls (e.g., JSTOR Global Plants). So much for being open.

So, since Donat challenged me, here's what I'd like to see happen. I'd like to see metrics of "openness" that we can use to evaluate just how open the signatories actually are. These metrics could be used to persuade institutions to share data and other information, as a league table we can use to apply pressure, or as a way to survey the field and find out what the impediments to being open are (financial, legal, cultural, lack of resources, etc.).

Below are some of the things I think we could use to "score" the openness of biodiversity institutions.

Is the collection digitised and in GBIF?

A simple criterion that is easy to measure. If an institution has specimens or other biological material, are data and/or metadata on the collection freely available? What fraction of the collection has been digitised? How good is that digitisation (e.g., what fraction has been georeferenced)? We could define digitisation more broadly to include imaging and sequencing (both are methods of converting analogue specimens into digital objects).
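GBIF's occurrence search API reports the total number of records matching a query, which makes the first two questions cheap to approximate. Here is a minimal sketch, assuming an institution's records can be picked out by their `institutionCode` (`EXAMPLE` below is a placeholder) and that the institution supplies its own estimate of collection size, since GBIF can only count what has already been digitised. It only builds the query URLs; fetching each one and reading `count` from the JSON response is left out.

```python
from urllib.parse import urlencode

# Public GBIF occurrence search endpoint; limit=0 asks for the total count only.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def count_query_url(institution_code: str, georeferenced_only: bool = False) -> str:
    """Build a GBIF query whose JSON response has a 'count' of matching records."""
    params = {"institutionCode": institution_code, "limit": 0}
    if georeferenced_only:
        # Restrict to records that have coordinates, i.e. are georeferenced.
        params["hasCoordinate"] = "true"
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def digitisation_metrics(collection_size: int, records_in_gbif: int,
                         georeferenced: int) -> dict:
    """Fraction of a collection in GBIF, and fraction of those records georeferenced.

    collection_size is assumed to be the institution's own estimate of its holdings.
    """
    if collection_size <= 0:
        raise ValueError("collection_size must be positive")
    return {
        "fraction_digitised": records_in_gbif / collection_size,
        "fraction_georeferenced": (georeferenced / records_in_gbif
                                   if records_in_gbif else 0.0),
    }
```

For instance, `digitisation_metrics(1000, 250, 100)` reports a quarter of the collection digitised, and 0.4 of those records georeferenced.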

Are the institutional publications digitised? Are they open access?

Some institutions have a history of digitising their in-house publications and making them freely available online (e.g., the AMNH), and some even make them fully citable with CrossRef DOIs (e.g., the Australian Museum). But some institutions have, sadly, signed over their publications to commercial publishers or archives that charge for access (e.g., Kew's publications have been digitised by JSTOR, which limits their accessibility). As a footnote, I suspect that the institutions that lost confidence in their in-house publishing operations and outsourced them are the ones that have ended up losing control of their intellectual output, some of which is now closed off (e.g., some of the NHM London's journals are now the property of Cambridge University Press). Those institutions that maintained a culture of in-house publishing are the ones at the vanguard of digitising and opening up those publications.

Does the institution take part in the Biodiversity Heritage Library?

There are at least two ways to participate in the Biodiversity Heritage Library (BHL). One is to become a member and start scanning books from institutional libraries. The other is to grant BHL permission to scan institutional publications. BHL is often viewed as an archive of "old" literature, but in fact it has some very recent content. Some farsighted organisations have let BHL scan their journals, helping make BHL an indispensable resource for biodiversity research.

Do institutional staff publish in open access journals?

A while ago I complained about how few new species descriptions were in open access journals (The top-ten new species described in 2010 and the failure of taxonomy to embrace Open Access publication). A measure of openness is whether an institution encourages its staff to publish their work in open access journals, and to make their data freely available as well. Some prefer to chase Nature and Science papers, but I'd like to think we could prioritise openness over journal impact factor.
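If these criteria were turned into a league table, the simplest score would just average them. The sketch below does that, assuming each criterion can be expressed as a number between 0 and 1 and weighting them equally; both the weighting and the institution names in the usage note are hypothetical, and a real scheme would need to justify its weights.

```python
from dataclasses import dataclass


@dataclass
class OpennessReport:
    """Answers to the four questions above for one institution (hypothetical inputs)."""
    name: str
    fraction_digitised: float          # share of the collection with data in GBIF
    fraction_publications_open: float  # share of in-house publications free to read
    contributes_to_bhl: bool           # BHL member, or has let BHL scan its publications
    fraction_staff_papers_oa: float    # share of staff papers in open access journals

    def score(self) -> float:
        """Unweighted mean of the four criteria, from 0 (closed) to 1 (open)."""
        parts = [
            self.fraction_digitised,
            self.fraction_publications_open,
            1.0 if self.contributes_to_bhl else 0.0,
            self.fraction_staff_papers_oa,
        ]
        for value in parts:
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{self.name}: criterion out of range: {value}")
        return sum(parts) / len(parts)


def league_table(reports: list) -> list:
    """Return (name, score) pairs, most open first."""
    ranked = sorted(reports, key=lambda r: r.score(), reverse=True)
    return [(r.name, round(r.score(), 3)) for r in ranked]
```

For example, `league_table([OpennessReport("Museum B", 0.2, 0.0, False, 0.2), OpennessReport("Museum A", 0.5, 1.0, True, 0.5)])` ranks Museum A (0.75) above Museum B (0.1).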

These are just some of the more obvious things that could be used to measure openness. At the same time, it would be useful to develop ways to show the benefits of being open. For example, I've long argued that we could develop citation tracking for specimens. This gives researchers a means to track the provenance of information (who said what about the identity of a specimen), and it also gives institutions a way to measure the impact of their collections. Doing this at scale is only going to be possible if collections are digitised, specimens have identifiers of some sort, and we can text mine the literature and associated data for those identifiers (in other words, the data and publications need to be open). So, perhaps one way to help make the case for being open is to develop metrics that are useful for the institutions themselves.
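The text-mining step can start very simply. Here is a minimal sketch, assuming specimens are cited using the common "acronym plus catalogue number" convention (e.g., `USNM 123456`) and restricting matches to a short, illustrative list of collection acronyms to keep false positives down. The pattern and the list are assumptions, not an established standard; real catalogue numbers are far more varied.

```python
import re
from collections import Counter

# Illustrative collection acronyms; a real system would draw on a registry of codes.
COLLECTION_CODES = ("USNM", "AMNH", "MCZ", "BMNH")

# Acronym, optional space or colon, then a catalogue number such as 123456 or 1901.2.3.4
SPECIMEN_PATTERN = re.compile(
    r"\b(" + "|".join(COLLECTION_CODES) + r")[\s:]*(\d+(?:[.\-]\d+)*)\b"
)


def find_specimen_codes(text: str) -> list:
    """Return specimen identifiers found in text, normalised to 'ACRONYM number'."""
    return [f"{code} {number}" for code, number in SPECIMEN_PATTERN.findall(text)]


def citations_per_collection(documents: list) -> Counter:
    """Count specimen citations per collection acronym across a set of documents."""
    counts = Counter()
    for doc in documents:
        for identifier in find_specimen_codes(doc):
            counts[identifier.split()[0]] += 1
    return counts
```

Run over the sentence "Holotype USNM 123456; paratypes BMNH 1901.2.3.4 and USNM 98765.", this finds three identifiers, two of them from USNM. Summed over a whole corpus, counts like these are the raw material for a collection impact metric.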

I guess I would have been much more enthusiastic about the Bouchout Declaration if these sorts of things had been in place at the start. Anyone can sign a document. Ideas are cheap, execution is everything.

Tuesday, June 19, 2012

70,000 articles extracted from the Biodiversity Heritage Library

Just noticed that BioStor now has just over 70,000 articles extracted from the Biodiversity Heritage Library. This number is a little "soft" as there are some duplicates in the database that I need to clean out, but it's a nice sounding number. Each article has full text available, and in most cases reasonably complete metadata.

Most of the articles in BioStor have been added using semi-automated methods, but there's been rather more manual entry than I'd like to admit. One task that does have to be done manually is attaching plates to papers. This is largely an issue for older publications, where printing text and figures required different processes, resulting in text and figures often being widely separated in the publication. Technology evolved, and the more recent literature doesn't have this problem.

Future plans include adding the ability to download the articles as searchable PDFs, and to support OCR correction, amongst other things. BioStor also underpins some of my other projects, such as the EOL Challenge entry, which as of now has around 80,000 animal names linked to their original description in BioStor (and some 300,000 in total linked to some form of digital identifier). One day I may also manage to get the article locations into BHL itself, so that when you browse a scanned item in BHL you can quickly find individual articles. Oh, and it would be cool to have all this on the iPad...

Friday, October 28, 2011

Sherborn presentation on Open Taxonomy

Here is my presentation from today's Anchoring Biodiversity Information: From Sherborn to the 21st century and beyond meeting.


All the presentations will be posted online, along with podcasts of the audio. Meantime, presentations by Dave Remsen and Chris Freeland are already online.

Wednesday, May 25, 2011

The top-ten new species described in 2010 and the failure of taxonomy to embrace Open Access publication

Each year the grandly titled International Institute for Species Exploration (IISE) publishes a list of the top 10 species described in the previous year. This year's list is reproduced below, to which I've added links to the original publications (why do people still think it's OK to omit links to the primary literature when all of these articles are online?).

The striking thing is that only 2 of the 10 species were described in Open Access publications (and I use that term loosely, as Arthropod Systematics & Phylogeny PDFs are freely available but the licensing isn't clear). Sadly, much of our knowledge of the planet's diversity is still locked up behind a paywall.

| Species | Common name | Reference | DOI/PDF | Open Access |
|---|---|---|---|---|
| Caerostris 5 | Darwin's Bark Spider | Kuntner, M. and I. Agnarsson. 2010. Web gigantism in Darwin's bark spider, a new species from Madagascar (Araneidae: Caerostris). The Journal of Arachnology 38(2):346-356 | 10.1636/B09-113.1 | No |
| Mycena 2 | Bioluminescent Mushroom | Desjardin, D.E., B.A. Perry, D.J. Lodge, C.V. Stevani, and E. Nagasawa. 2010. Luminescent Mycena: new and noteworthy species. Mycologia 102(2):459-477 | 10.3852/09-197 | No |
| Halomonas | Bacterium | Sanchez-Porro, C., B. Kaur, H. Mann and A. Ventosa. 2010. Halomonas titanicae sp. nov., a halophilic bacterium isolated from the RMS Titanic. International Journal of Systematic and Evolutionary Microbiology 60(12):2768-2774 | 10.1099/ijs.0.020628-0 | No |
| Varanus | Monitor Lizard | Welton, L.J., C.D. Siler, D. Bennett, A. Diesmos, M.R. Duya, R. Dugay, E.L.B. Rico, M. van Weerd and R.M. Brown. 2010. A spectacular new Philippine monitor lizard reveals a hidden biogeographic boundary and a novel flagship species for conservation. Biology Letters 6(5):654-658 | 10.1098/rsbl.2010.0119 | No |
| Glomeremus | Pollinating cricket | Hugel, S., C. Micheneau, J. Fournel, B.H. Warren, A. Gauvin-Bialecki, T. Pailler, M.W. Chase and D. Strasberg. 2010. Glomeremus species from the Mascarene islands (Orthoptera, Gryllacrididae) with the description of the pollinator of an endemic orchid from the island of Réunion. Zootaxa 2545:58-68 | PDF | No |
| Philantomba 2 | Duiker | Colyn, M., J. Hulselmans, G. Sonet, P. Oudé, J. de Winter, A. Natta, Z.T. Nagy and E. Verheyen. 2010. Discovery of a new duiker species (Bovidae: Cephalophinae) from the Dahomey Gap, West Africa. Zootaxa 2637:1-30 | PDF | No |
| Tyrannobdella | Leech | Phillips, A.J., R. Arauco-Brown, A. Oceguera-Figueroa, G.P. Gomez, M. Beltran, Y.-T. Lai and M.E. Siddall. 2010. Tyrannobdella rex n. gen. n. sp. and the evolutionary origins of mucosal leech infestations. PLoS ONE 5(4):e10057 | 10.1371/journal.pone.0010057 | Yes |
| Psathyrella | Underwater mushroom | Frank, J.L., R.A. Coffan and D. Southworth. 2010. Aquatic gilled mushrooms: Psathyrella fruiting in the Rogue River in southern Oregon. Mycologia 102(1):93-107 | 10.3852/07-190 | No |
| Saltoblattella | Jumping cockroach | Bohn, H., M. Picker, K.-D. Klass and J. Colville. 2010. A jumping cockroach from South Africa, Saltoblattella montistabularis, gen. nov., spec. nov. (Blattodea: Blattellidae). Arthropod Systematics and Phylogeny 68(1):53-39 | PDF | Yes |
| Halieutichthys | Pancake Batfish | Ho, H.-C., P. Chakrabarty and J.S. Sparks. 2010. Review of the Halieutichthys aculeatus species complex (Lophiiformes: Ogcocephalidae), with descriptions of two new species. Journal of Fish Biology 77(4):841-869 | 10.1111/j.1095-8649.2010.02716.x | No |

Tuesday, July 06, 2010

ZooKeys publishes articles of the future

The open access taxonomic journal ZooKeys has published a special issue with four papers, each available in HTML, PDF, and XML, the latter being extensively marked up. Penev et al. ("Semantic tagging of and semantic enhancements to systematics papers: ZooKeys working examples", doi:10.3897/zookeys.50.538) describe the process involved in creating these XML files. Two papers (doi:10.3897/zookeys.50.506 and doi:10.3897/zookeys.50.505) were created using authoring tools available in Scratchpads, as outlined by Blagoderov et al. ("Streamlining taxonomic publication: a working example with Scratchpads and ZooKeys", doi:10.3897/zookeys.50.539). When you view the HTML for these articles you can toggle the highlighting of citations, taxonomic names, and geographic co-ordinates on or off. For example, mousing over a taxonomic name displays a popup with links to GBIF, NCBI, EOL, BHL, Wikipedia, etc.


I think these papers represent one view of the future of scientific publishing ("article 2.0"), and I'm flattered that Penev et al. cite my Elsevier challenge work (doi:10.1016/j.websem.2010.03.004, preprint at hdl:10101/npre.2009.3173.1) as one of the sources of inspiration (along with the landmark Shotton et al. "Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article" doi:10.1371/journal.pcbi.1000361, which I've discussed previously). It is also good to see the TaxPub XML schema used by a publisher, and Scratchpads being a part of the process of publishing taxonomic information.

Deep linking

My initial impression is that there is huge potential here, although I think there is still lots to do. I'm not totally convinced that popups are the way to go (although I've dabbled with them as well), and we need to move beyond simply linking to other sites to a deeper form of integration. For example, a ZooKeys article might link to BHL via a taxonomic name, but how about deeper linking? The paper by Brake and von Tschirnhaus (doi:10.3897/zookeys.50.505), for instance, contains the following citations:

Biró L (1899) Commensalismus bei Fliegen. Természetrajzi füzetek 22: 198–204.

Kertész K (1899) Verzeichnis einiger, von L. Biró in Neu-Guinea und am Malayischen Archipel gesammelten Dipteren. Természetrajzi füzetek 22: 173–19

Neither reference has any links in the HTML, so the user is under the impression that they aren't available online, but both references have been scanned by BHL. You can see full text for these articles in BioStor (references 52005 and 52004, respectively -- note that the pagination for Biró 1899 is given incorrectly in the paper). This is one area where BHL has a lot to offer publishers, and it would be great to see BHL provide the services publishers need to add these links to their articles.
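One way a publisher could add such links is to turn each citation into an OpenURL query and send it to a resolver. Here is a minimal sketch, assuming single-author citations in the format used above (`Surname I (Year) Title. Journal Volume: start–end.`). `RESOLVER` is a placeholder, and the regular expression will fail on, for example, multi-author references or abbreviated journal titles containing full stops.

```python
import re
from urllib.parse import urlencode

# Placeholder address; substitute a real OpenURL resolver.
RESOLVER = "https://resolver.example.org/openurl"

# "Surname I (Year) Title. Journal Volume: start–end." The title is greedy, so it
# may itself contain ". " (e.g. an initial); the journal title may not.
CITATION = re.compile(
    r"^(?P<aulast>\S+)\s+\S+\s+\((?P<date>\d{4})\)\s+"
    r"(?P<atitle>.+)\.\s+(?P<jtitle>.+?)\s+(?P<volume>\d+):\s*"
    r"(?P<spage>\d+)\s*[–-]\s*(?P<epage>\d+)\.?$"
)


def parse_citation(text: str) -> dict:
    """Split a simple citation string into OpenURL-style fields."""
    match = CITATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised citation: {text!r}")
    return match.groupdict()


def openurl(fields: dict) -> str:
    """Build an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    params.update({f"rft.{key}": value for key, value in fields.items()})
    return f"{RESOLVER}?{urlencode(params)}"
```

Parsing the Biró citation yields author Biró, year 1899, volume 22, pages 198 to 204. Since the pagination in the paper is wrong, a resolver would need to tolerate mismatched pages rather than demand an exact match.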

This integration should go both ways. It's odd that the paper by Brake and von Tschirnhaus contains the LSID used by ZooBank for this paper (urn:lsid:zoobank.org:pub:DABB03F4-A128-43BB-990C-02F25D656B00, see the <self-uri> tag in the XML), but ZooBank doesn't know about the DOI for the paper, so the ZooBank page for this article has no link to the article itself. It's time to join this stuff together.
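Both identifiers are already in the article XML, so pairing them is mostly a matter of reading them out. Here is a sketch using Python's standard `xml.etree.ElementTree`, assuming NLM/JATS-style `<article-id pub-id-type="doi">` and `<self-uri>` elements. I'm not certain whether ZooKeys puts the LSID in an `xlink:href` attribute or in the element text, so the code checks both. The `SAMPLE` fragment is hypothetical apart from the DOI and LSID quoted above.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical minimal fragment shaped like NLM/JATS article metadata.
SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front><article-meta>
    <article-id pub-id-type="doi">10.3897/zookeys.50.505</article-id>
    <self-uri xlink:href="urn:lsid:zoobank.org:pub:DABB03F4-A128-43BB-990C-02F25D656B00"/>
  </article-meta></front>
</article>"""


def article_identifiers(xml_text: str) -> dict:
    """Collect the DOI and any LSIDs declared in an article's XML."""
    root = ET.fromstring(xml_text)
    doi = None
    for el in root.iter("article-id"):
        if el.get("pub-id-type") == "doi":
            doi = (el.text or "").strip()
    lsids = []
    for el in root.iter("self-uri"):
        # The LSID may sit in an xlink:href attribute or in the element text.
        value = el.get(XLINK_HREF) or (el.text or "").strip()
        if value.lower().startswith("urn:lsid:"):
            lsids.append(value)
    return {"doi": doi, "lsids": lsids}


def lsid_parts(lsid: str) -> dict:
    """Split urn:lsid:authority:namespace:object[:revision] into its components."""
    parts = lsid.split(":")
    if len(parts) < 5 or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid}")
    return {"authority": parts[2], "namespace": parts[3], "object": parts[4],
            "revision": parts[5] if len(parts) > 5 else None}
```

The resulting DOI/LSID pair is exactly what ZooBank would need to link its record for the paper to the article itself.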

What's next?

What I'd really like to see is article XML repurposed as, say, RDF, and used to populate a database so that we can query it. In this way we can start to atomise the article into useful parts, and recombine them in new and interesting ways. Might be something to play with over the summer.

On a practical level, I'm somewhat bemused by the variety of XML formats being used by open access publishers. PLoS use version 2.0 of the NLM Journal Archiving and Interchange Tag Suite, and I wrote an XSLT stylesheet to transform PLoS articles for viewing on an iPad. TaxPub is based on version 3.0 of the NLM DTD, which breaks quite a bit of my code relating to citations, so I'll have to tweak this to get it to display ZooKeys articles correctly. Handling TaxPub itself will also require some additional work. Then there are the BMC journals, which have their own flavour of XML (based on something called the "KETON DTD"). It's all a bit messy. But I guess it'd be no fun if it were too easy...
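A first step in coping with the mix is working out which version a given file uses. This sketch assumes that version 3.0 files carry a `dtd-version` attribute on `<article>` and that older files can be recognised from the version string in the DOCTYPE public identifier. The stylesheet file names are hypothetical, and BMC's format is not handled.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Public identifiers look like "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN".
PUBLIC_ID_VERSION = re.compile(r'<!DOCTYPE\s+article\s+PUBLIC\s+"[^"]*DTD v(\d+\.\d+)')

# Hypothetical stylesheet names, one per major NLM version.
STYLESHEETS = {"2": "nlm2-to-html.xsl", "3": "nlm3-to-html.xsl"}


def nlm_version(xml_text: str) -> Optional[str]:
    """Guess the NLM DTD version from dtd-version, falling back to the DOCTYPE."""
    root = ET.fromstring(xml_text)  # the external DTD itself is not fetched
    version = root.get("dtd-version")
    if version:
        return version
    match = PUBLIC_ID_VERSION.search(xml_text)
    return match.group(1) if match else None


def pick_stylesheet(xml_text: str) -> str:
    """Choose a stylesheet by major NLM version, or fail loudly."""
    version = nlm_version(xml_text)
    major = version.split(".")[0] if version else None
    if major not in STYLESHEETS:
        raise ValueError(f"no stylesheet for NLM version {version!r}")
    return STYLESHEETS[major]
```

Failing loudly on an unknown version seems preferable to silently rendering a TaxPub or BMC article with the wrong citation handling.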


Wednesday, August 20, 2008

ZooKeys, DOIs, Open Access, and RSS, but why?


ZooKeys (ISSN 1313-2970) is a new journal for the rapid publication of taxonomic names, rather like Zootaxa. At first glance it has some nice features, such as being Open Access (using the Creative Commons Attribution license), DOIs, and RSS feeds, although the feeds don't validate, partly due to an error at the bottom of the feeds:
<b>Warning</b>:  Cannot modify header information - headers already sent by (output started 
at /home/pensofto/public_html/zookeys/cache/t_compile/%%C2^C2D^C2D18A7A%%rss.tpl.php:5)
in <b>34</b><br />
So, something to fix there.

The RSS feeds are reasonably informative, although they don't include the DOI, which somewhat defeats the point of having them. DOIs need to be first-class citizens in taxonomic literature.
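Both problems are easy to check automatically. Here is a minimal sketch, assuming an RSS 2.0 feed and looking for a DOI only in each item's `link`, `guid`, and `description`. A feed that puts the DOI in a namespaced element such as Dublin Core's `dc:identifier` would need an extra lookup. If, as it appears, the PHP warning is appended after the closing `</rss>` tag, an XML parser rejects the whole document.

```python
import re
import xml.etree.ElementTree as ET

# Loose DOI matcher: "10." prefix, registrant code, slash, then a suffix.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def feed_problems(rss_text: str) -> list:
    """List problems with an RSS 2.0 feed: not well-formed, or items lacking a DOI."""
    try:
        root = ET.fromstring(rss_text)
    except ET.ParseError as err:
        # Text after </rss>, such as a PHP warning, makes the document ill-formed.
        return [f"feed is not well-formed XML: {err}"]
    problems = []
    for item in root.iter("item"):
        title = item.findtext("title") or "(untitled)"
        text = " ".join(item.findtext(tag) or "" for tag in ("link", "guid", "description"))
        if not DOI_PATTERN.search(text):
            problems.append(f"no DOI in item: {title}")
    return problems
```

An empty list means the feed parses and every item carries something that looks like a DOI.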

But these are technical matters; the real question is why? Why create a new journal when Zootaxa is pumping out new taxonomic papers at an astonishing rate? Why not combine forces (DOIs and RSS for Zootaxa, yay!)? There is an editorial (doi:10.3897/zookeys.1.11) that is rather coy about this. Yes, Open Access is a Good Thing™, but Zootaxa has some Open Access articles. Why dilute the effort to transform zoological taxonomy by creating a new journal?

Thursday, May 08, 2008

Open Access logo - help


Trivial as this may seem, I'm trying to find out who designed this "Open Access" logo, and whether there are original files for it. I've seen this logo (or variations on it) on the PLoS web site and on that of the open access publisher Hindawi Publishing, and the Mac OS X program Papers uses it.

It's driving me nuts that I can't find the original. Other widely used logos typically have a site where a designer or organisation provides a bunch of versions in different formats, such as the Creative Commons symbols, the ubiquitous RSS feed icon, and other projects such as the Geotag icon. It's sometimes desirable to have an icon in different formats, ideally including a vector-based version (e.g., in EPS or SVG format) that can be used to create images at different resolutions, and these projects provide such files.

Apart from the interesting fact that there doesn't seem to be a standard logo or symbol for Open Access, does anybody know where this logo came from?