These snippets come from Wikipedia, well actually, from the DBpedia project. Behind the scenes I have a script that takes the GBIF taxon id for an ALA taxon (if it exists), queries Wikidata for the corresponding taxon and any associate identifiers of interest, and if there's a link to an English language Wikipedia page I do a quick SPARQL query to DBpedia to retrieve the snippet of text. At some point all of this could be sped up by adding the relevant data to the triple store and doing the query locally but for now it works well enough.
Of course, many snippets are little more than stubs, e.g. the snippet for another dung beetle genus Diorygopyx doesn't tell us much more than we can get from the information already displayed.
But having a text summary still seems worthwhile, which raises the question of what to do when Wikipedia doesn't know anything about a taxon? Obviously, we could start editing Wikipedia to flesh out its content, but that will take a while to filter into databases such as DBpedia. Another approach is to generate snippets from the triple store itself, in other words, generate natural language summaries from structured data. For example, we could generate summaries such as "Diorygopyx is a genus of Scarabaeidae or scarab beetles in the superfamily Scarabaeoidea" fairly easily from knowing the taxonomic hierarchy and a few common names. But we could also do more. In browsing Ozymandias I'm struck at times by how much our knowledge of one taxon depends on a major piece of taxonomic work, often done some time ago. For example, The Australian Crickets (Orthoptera: Gryllidae). Academy of Natural Sciences of Philadelphia, Monograph 22 by Otte and Alexander (1983) is a monumental taxonomic monograph, and many Australian cricket genera had most (or all) of their species described in that work. Imagine having a snippet that mentioned that (e.g., "Most species in this genus were described in 1983, no species have been discovered since."). That would give the reader some useful information, and perhaps also prompt them to ask "so, why haven't any more species been described?".
I think there's scope here to make the output from triple stores (and other databases) more approachable using natural language generation. This is obviously a big area, and there are some very sophisticated approaches for outputting very natural language (think chatbots), perhaps the most striking example of which is Google Duplex.
But we don't need quite this level of sophistication, something using much simpler techniques (e.g., nalgene-js) would probably be enough. Armed with some basic facts from the triple store, and some simple templates, we could probably generate some useful text snippets for many taxa in Ozymandias, and indeed for other entities. For example, David Shorthouse is outputting simple text summaries of the contribution of taxonomists to specimen collection and identification:
Arthur Loveridge identified Scolecomorphidae and collected Chamaeleonidae https://t.co/NimKzmUuOX— Bloodhound (@BloodhoundTrack) May 24, 2019
Imagine extending this to take into account publications, geography, etc. I think there's lots of scope here for moving beyond just displaying data and trying to generate human-friendly summaries of data.