Tuesday, October 29, 2024

Internet Archive as a single point of failure

How to cite: Page, R. (2024). Internet Archive as a single point of failure https://doi.org/10.59350/1r3m1-c5e22

This is just a placeholder to mark the ongoing impact of the attack on the Internet Archive (see here, here and here for details).

The impact of this on the Biodiversity Heritage Library (BHL) has been huge, and reveals the extent to which BHL depends on the Archive. The Archive is:

  • BHL’s long-term archival storage of book scans
  • BHL’s processing pipeline for converting images to text
  • BHL’s store for additional metadata (e.g., page numbers)
  • BHL’s image server (i.e., all the images of scanned books on the BHL website are served from the Archive)
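As a rough illustration of why this list amounts to a single point of failure, here is a minimal sketch that maps each of the four roles above to the services able to fill it and reports any role with only one provider. The role labels, the `DEPENDENCIES` map, the function name, and the "hypothetical mirror" are my own illustrative assumptions, not a description of BHL's actual systems.

```python
# Illustrative only: role names paraphrase the list above, and the
# "hypothetical mirror" is an assumption, not an existing BHL service.
DEPENDENCIES = {
    "archival storage of book scans": {"Internet Archive"},
    "image-to-text processing": {"Internet Archive"},
    "additional metadata store": {"Internet Archive"},
    "image serving": {"Internet Archive"},
}


def single_points_of_failure(dependencies):
    """Group roles that have exactly one provider by that provider."""
    result = {}
    for role, providers in dependencies.items():
        if len(providers) == 1:
            (provider,) = providers  # unpack the sole provider
            result.setdefault(provider, []).append(role)
    return {provider: sorted(roles) for provider, roles in result.items()}


# Every role currently hangs off one provider.
spof = single_points_of_failure(DEPENDENCIES)

# Adding a second source for one role removes that role from the report.
with_mirror = {**DEPENDENCIES,
               "image serving": {"Internet Archive", "hypothetical mirror"}}
spof_with_mirror = single_points_of_failure(with_mirror)
```

With the map as written, all four roles are reported under the Archive. Giving image serving a second provider leaves three, which is the kind of change a rethink would need to make role by role.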

The attack on the Archive has crippled BHL (parts are slowly coming back). I think it is time for a fundamental rethink of how BHL manages its data, runs its processing pipeline, and serves images.
