One metaphor from "A Vast Machine" is the difference between "global data" and "making data global". Getting data from around the world ("global data") is one challenge, but then comes making that data global:
building complete, coherent, and consistent global data sets from incomplete, inconsistent, and heterogeneous data sources
The focus of GBIF's data portal is global data, bringing together specimen records and observations from all around the world. This is global data, but one could argue that it's not yet ready to be used for most applications. For example, GBIF doesn't give you the geographic distribution of a given species, merely where it's been recorded from (based on that subset of records that have been digitised). That's a very important start, but if we had for each species an estimated distribution based on museum records, observations, published maps, together with habitat modelling, then we'd be closer to a dataset that we could use to tackle key questions about the distribution of biodiversity.
But if we continue with the theme that microbiology is the dark matter of biology, and if we look at projects like the Earth Microbiome Project, then we could argue that focussing on eukaryotes, particularly macro-eukaryotes such as plants, fungi, and animals, may be a mistake. To use a crude analogy, perhaps we have been focussing on the big phenomena (equivalent to thunderstorms, flash floods, tornadoes, etc.) rather than the underlying drivers (equivalent to climatic processes such as those captured in global climate models). Certainly, any attempt to model the biosphere is going to have to include the microbiome, and indeed perhaps the microbiome alone would be enough for a working model of the biosphere?
I'm simply waving my arms around here (no, really?), but it's worth thinking about whether the macroecology that conservation and biodiversity focus on is actually the important thing to consider if you want to model fundamental biological processes. Might macro-organisms be like the weather, and the microbiome like the climate? As humans we notice the weather, because it is at a scale that affects us directly. But if the weather is a (not entirely predictable) consequence of the climate, what is the equivalent of a global climate model for biodiversity?