One metaphor from "A Vast Machine" is the difference between "global data" and "making data global". Getting data from around the world ("global data") is one challenge, but then comes making that data global:
building complete, coherent, and consistent global data sets from incomplete, inconsistent, and heterogeneous data sources
The focus of GBIF's data portal is global data, bringing together specimen records and observations from all around the world. This is global data, but one could argue that it's not yet ready to be used for most applications. For example, GBIF doesn't give you the geographic distribution of a given species, merely where it has been recorded (based on the subset of records that have been digitised). That's a very important start, but if we had for each species an estimated distribution based on museum records, observations, and published maps, together with habitat modelling, then we'd be closer to a dataset that we could use to tackle key questions about the distribution of biodiversity.
I'm simply waving my arms around here (no, really?), but it's worth thinking about whether the macroecology that conservation and biodiversity research focus on is actually the important thing to consider if you want to model fundamental biological processes. Might macro-organisms be like the weather, and the microbiome like the climate? As humans we notice the weather, because it is at a scale that affects us directly. But if the weather is a (not entirely predictable) consequence of the climate, what is the equivalent of a global climate model for biodiversity?