Tuesday, October 15, 2013

What can the Global Biodiversity Information Facility (GBIF) do for you?

I've recently been appointed Chair of the Science Committee of the Global Biodiversity Information Facility (GBIF) http://www.gbif.org [1]. The committee is a small group of people with a range of backgrounds, and one of our roles is to advise GBIF on matters scientific (e.g., what kinds of data should GBIF collect, and what kinds of scientific questions should GBIF help answer?).

There have been formal surveys (see the papers in the journal "Biodiversity Informatics" https://journals.ku.edu/index.php/jbi/issue/view/370/showToc ), meetings, and a "vision" statement (the "Global Biodiversity Informatics Outlook", http://www.biodiversityinformatics.org/ ). But there's always the chance that these fora may miss some points of view, so I'm keen to get feedback on what GBIF could do to better help people tackle the scientific questions they are interested in.

For example, is there some fundamental limitation that prevents GBIF from being useful to you? Is there some feature, data type, or gap in geographic coverage that, if addressed, would make it more useful? Is there a role that GBIF should take on but hasn't yet? A useful analogy might be the central role GenBank plays in genomics: it is a place to archive your data (sequences), a repository of other people's data that you can access, and a research tool (e.g., BLAST searches to locate similar sequences). Is that the sort of thing you'd want from GBIF, or is it something entirely different?

I'd welcome any comments, suggestions, views, etc. Feel free to add them as comments to this blog, or email me (rdmpage at gmail.com).

I should stress that this is simply me trying to calibrate my perception of GBIF's role against what others think. Also, note that if you have specific comments on things such as the GBIF web site, please use the feedback tab on the site (that way they will reach the people who can do something about them).

[1] For those unfamiliar with GBIF, its mission "is to make the world's biodiversity data freely and openly available via the Internet". At present the bulk of the data are observations of organisms (mostly multicellular eukaryotes, i.e., animals, plants and fungi) based on either museum collections or observations of living organisms. You can get an idea of the kind of science that uses GBIF-hosted data from this list of papers on Mendeley http://www.mendeley.com/groups/1068301/gbif-public-library/


Based on responses so far, I'll compile a list of suggestions/themes below.


  • Have the ability to annotate records (e.g., flag errors) and some mechanism where those annotations get incorporated into GBIF and/or primary data providers.

Dashboard/gap analysis

  • For any search provide information on how complete and/or representative the data is likely to be (for example, are vertebrates over-represented, what is the extent of sampling in this area, etc.).

Geographic coverage

  • Fill big gaps in coverage (e.g., Russia, China, much of the tropics).


  • Link GBIF occurrence records to sequences in GenBank


  • Who identified the specimen?
  • Details on georeferencing (esp. if not GPS)

Data types

  • DNA sequences
  • abundance

Data sources

  • GenBank
  • Literature records (e.g., data mining published papers)
    MEIER, R., & DIKOW, T. (2004). Significance of Specimen Databases from Taxonomic Revisions for Estimating and Mapping the Global Species Diversity of Invertebrates and Repatriating Reliable Specimen Data. Conservation Biology, 18(2), 478–488. doi:10.1111/j.1523-1739.2004.00233.x
  • "Gray" literature, e.g. field books, reports


  • Lack of stable identifiers for occurrences
  • Contributors of specimen data not (yet) in an institution have to mint their own identifiers, with no way of linking those to any future identifier minted by the institution that will eventually house their collection


  • Being able to refine taxon search by geographic region
  • Search on any Darwin Core field
  • Wild card search
  • Support for GIS data formats
  • Search using arbitrary bounding polygons (e.g., draw a shape on a map)
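
To make the last suggestion concrete, here is a minimal sketch of what a "draw a shape on a map" search amounts to once the shape is a list of vertices: keep the occurrence records whose coordinates fall inside the polygon. This is not how GBIF does (or would) implement it. The records are made up, the function names are my own, and the only thing borrowed from the real world is the Darwin Core field names (scientificName, decimalLatitude, decimalLongitude). The test is a plain ray-casting check that treats longitude and latitude as flat x/y coordinates, so it ignores complications such as polygons that cross the antimeridian.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: cast a ray east from the point and count edge crossings."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def filter_occurrences(records: List[Dict], polygon: Sequence[Point]) -> List[Dict]:
    """Keep records whose Darwin Core coordinates fall inside the polygon."""
    return [
        r for r in records
        if point_in_polygon((r["decimalLongitude"], r["decimalLatitude"]), polygon)
    ]


# Hypothetical occurrence records (invented for illustration).
records = [
    {"scientificName": "Puma concolor", "decimalLatitude": -3.0, "decimalLongitude": -60.0},
    {"scientificName": "Panthera onca", "decimalLatitude": 10.0, "decimalLongitude": -84.0},
]

# A user-drawn shape: here a simple rectangle, given as (longitude, latitude) vertices.
polygon = [(-65.0, -5.0), (-55.0, -5.0), (-55.0, 0.0), (-65.0, 0.0)]

matches = filter_occurrences(records, polygon)
print([r["scientificName"] for r in matches])
```

In practice a service would run this kind of test against a spatial index rather than scanning every record, but the user-facing idea is the same: supply a polygon, get back the records inside it.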