This note accompanies a dataset that I uploaded to Zenodo (https://doi.org/10.5281/zenodo.15824274). My goal in creating this dataset is to link data created on the Barcode of Life Data Systems to the DOIs for those datasets, and then to link those data DOIs to DOIs for the papers (if any) that created those datasets, and/or cited them.
For example, the paper “DNA barcodes enable higher taxonomic assignments in the Acari” (Young et al., 2021) cites three barcode datasets: DS-BINFL, DS-5FLR, and DS-10FLR. Each of these datasets has a DOI of the form https://doi.org/10.5883/ plus the DS number. One reason I want to make these links is so that when the dataset is displayed, say, in my BOLD View app, I could also show the papers that created or cited the dataset, providing some context for the data (e.g., why was the data collected?). Another reason is that once we link data to papers we can do some interesting things, such as assign credit (Zeng et al., 2020), or discover which organisations funded the work. I hope to explore these topics in the future.
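The DOI pattern described above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the Zenodo dataset: the function name `dataset_doi` and the `links` structure are hypothetical, and I am assuming that the full dataset code, including the "DS-" prefix, is appended to the DOI prefix.

```python
# Illustrative sketch only; names below are hypothetical, not from the dataset's code.
BOLD_DOI_PREFIX = "https://doi.org/10.5883/"


def dataset_doi(dataset_code: str) -> str:
    """Build a DOI URL from a BOLD dataset code such as 'DS-BINFL'.

    Assumption: the whole code (with its 'DS-' prefix) follows the DOI prefix.
    """
    code = dataset_code.strip()
    if not code.startswith("DS-"):
        raise ValueError(f"Expected a code starting with 'DS-', got {dataset_code!r}")
    return BOLD_DOI_PREFIX + code


# The three datasets cited by Young et al. (2021), paired with the paper's DOI.
paper_doi = "https://doi.org/10.1038/s41598-021-95147-8"
cited_codes = ["DS-BINFL", "DS-5FLR", "DS-10FLR"]
links = [(dataset_doi(code), paper_doi) for code in cited_codes]

for data_doi, paper in links:
    print(f"{data_doi} is cited by {paper}")
```

Each pair in `links` is one data-to-paper citation, which is essentially the kind of record the dataset collects.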
Matching datasets to publications was a tedious process; there are more details in the GitHub repository. I started with a Google Scholar search, then did lots of manual filtering and cleaning. Most of the articles have DOIs, and only these articles are included in the Zenodo dataset, which is intended as a contribution to Make Data Count.
This only scratches the surface of what could be done. There are many datasets that I could not find in the literature (they may never have been cited). I also want to retrieve links between individual DNA barcodes and the papers that published them. Apart from context and metrics, I’m also interested in whether these papers might contain more detailed information about the sequences, such as geographic localities. In this way we could potentially enrich the BOLD database, as part of the “virtuous cycle” envisioned by David Schindel (Schindel and Page, 2024).
References
Page, R. (2025). Citations of datasets published by Barcode of Life Data Systems (BOLD) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15824274
Schindel, D. E., & Page, R. M. P. (2024). Creating Virtuous Cycles for DNA Barcoding: A Case Study in Science Innovation, Entrepreneurship, and Diplomacy. DNA Barcoding, 7–32. https://doi.org/10.1007/978-1-0716-3581-0_1
Young, M. R., deWaard, J. R., & Hebert, P. D. N. (2021). DNA barcodes enable higher taxonomic assignments in the Acari. Scientific Reports, 11(1). https://doi.org/10.1038/s41598-021-95147-8
Zeng, T., Wu, L., Bratt, S., & Acuna, D. E. (2020). Assigning credit to scientific datasets using article citation networks. Journal of Informetrics, 14(2), 101013. https://doi.org/10.1016/j.joi.2020.101013