Nystactes Bohlke | 2735131 |
Nystactes | 2787598 |
Nystactes Gloger 1827 | 4888093 |
Nystactes Kaup 1829 | 4888094 |
If I want to map these names to GBIF, these are the corresponding taxa with the name "Nystactes":
Nystactes Böhlke, 1957 | 2403398 |
Nystactes Gloger, 1827 | 2475109 |
Nystactes Kaup, 1829 | 3239722 |
Clearly the names are almost identical, but there are enough small differences (presence or absence of a comma, "o" versus "ö") to make things interesting. To make the mapping, I construct a bipartite graph in which the nodes are taxon names, divided into two sets according to which database they came from. I then connect nodes in different sets by edges weighted by how similar the two names are. For example, here is the graph for "Nystactes" (displayed using Google images):
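The edge weights for a graph like this could be computed along the following lines. This is only a minimal sketch: the post does not say which similarity measure is used, so the `similarity` function (based on Python's `difflib.SequenceMatcher`) and the names `names_a`, `names_gbif`, and `build_edges` are assumptions made for illustration.

```python
from difflib import SequenceMatcher

# First database: identifier -> name string (taken from the listing above).
names_a = {
    2735131: "Nystactes Bohlke",
    2787598: "Nystactes",
    4888093: "Nystactes Gloger 1827",
    4888094: "Nystactes Kaup 1829",
}

# GBIF: identifier -> name string.
names_gbif = {
    2403398: "Nystactes Böhlke, 1957",
    2475109: "Nystactes Gloger, 1827",
    3239722: "Nystactes Kaup, 1829",
}

def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: a score in [0, 1], where 1 means identical."""
    return SequenceMatcher(None, a, b).ratio()

def build_edges(left: dict, right: dict) -> dict:
    """Connect every node on the left to every node on the right, weighted by similarity."""
    return {
        (left_id, right_id): similarity(left_name, right_name)
        for left_id, left_name in left.items()
        for right_id, right_name in right.items()
    }

edges = build_edges(names_a, names_gbif)
for (left_id, right_id), weight in sorted(edges.items()):
    print(f"{names_a[left_id]!r} -- {names_gbif[right_id]!r}: {weight:.2f}")
```

With this measure, small differences such as a missing comma cost very little, so "Nystactes Gloger 1827" scores much higher against "Nystactes Gloger, 1827" than against "Nystactes Kaup, 1829".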
I then compute the maximum weighted bipartite matching using a C++ program I wrote. This matching corresponds to the solid lines in the graph above.
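The C++ program is not shown here, so as a rough illustration of what a maximum weighted bipartite matching does, here is a small Python sketch. It is not the algorithm my program uses: it simply tries every possible assignment, which is only practical for tiny name sets like this one. The `similarity` measure (Python's `difflib.SequenceMatcher`) and the function name `best_matching` are assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import permutations

def similarity(a: str, b: str) -> float:
    """Assumed edge weight: a score in [0, 1], where 1 means identical."""
    return SequenceMatcher(None, a, b).ratio()

def best_matching(left: list, right: list) -> list:
    """Exhaustive maximum weighted bipartite matching.

    Returns (left_name, right_name) pairs with the largest total similarity,
    using each name at most once. Names on the larger side may stay unmatched.
    """
    swapped = len(left) < len(right)
    big, small = (right, left) if swapped else (left, right)
    best_score, best_pairs = -1.0, []
    # Try every way of assigning distinct nodes from the larger set to the smaller set.
    for chosen in permutations(big, len(small)):
        pairs = list(zip(chosen, small))
        score = sum(similarity(b, s) for b, s in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    # Always report pairs in (left, right) order.
    return [(s, b) if swapped else (b, s) for b, s in best_pairs]

left = ["Nystactes Bohlke", "Nystactes", "Nystactes Gloger 1827", "Nystactes Kaup 1829"]
right = ["Nystactes Böhlke, 1957", "Nystactes Gloger, 1827", "Nystactes Kaup, 1829"]

matching = best_matching(left, right)
for left_name, right_name in matching:
    print(f"{left_name} -> {right_name}")
```

On the names above, this pairs each name that carries an author with its GBIF counterpart and leaves the bare "Nystactes" unmatched. For larger sets, a polynomial-time method such as the Hungarian algorithm would be the usual choice.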
In this way we can make a sensible guess as to how names in the two databases relate to one another.