|Nystactes Gloger 1827||4888093|
|Nystactes Kaup 1829||4888094|
If I want to map these names to GBIF, these are the corresponding taxa with the name "Nystactes":
|Nystactes Böhlke, 1957||2403398|
|Nystactes Gloger, 1827||2475109|
|Nystactes Kaup, 1829||3239722|
Clearly the names are almost identical, but there are enough small differences (presence or absence of a comma, "o" versus "ö") to make things interesting. To make the mapping I construct a bipartite graph where the nodes are taxon names, divided into two sets according to which database they came from. I then connect the nodes with edges weighted by how similar the names are. For example, here is the graph for "Nystactes" (displayed using Google images):
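I then compute the maximum weighted bipartite matching using a C++ program I wrote. That is, each name is paired with at most one name from the other database so that the total similarity of the chosen pairs is as large as possible. This matching corresponds to the solid lines in the graph above.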
In this way we can make a sensible guess as to how names in the two databases relate to one another.
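As a rough sketch of the idea (in Python, and not the C++ program mentioned above), the snippet below weights every cross-database pair of names by string similarity and then picks the best one-to-one pairing. The similarity measure (`difflib.SequenceMatcher`) and the brute-force search over all pairings are assumptions made for illustration, and the names `source_names`, `similarity` and `max_weight_matching` are hypothetical. Only the names and identifiers listed above are used.

```python
from difflib import SequenceMatcher
from itertools import permutations

# Names and identifiers copied from the two lists above.
source_names = {
    "4888093": "Nystactes Gloger 1827",
    "4888094": "Nystactes Kaup 1829",
}
gbif_names = {
    "2403398": "Nystactes Böhlke, 1957",
    "2475109": "Nystactes Gloger, 1827",
    "3239722": "Nystactes Kaup, 1829",
}


def similarity(a, b):
    """Edge weight in [0, 1]. SequenceMatcher is an assumed stand-in
    for whatever similarity measure the real program uses."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def max_weight_matching(left, right):
    """Brute-force maximum weighted bipartite matching.

    Tries every way of assigning each left node to a distinct right node
    and keeps the assignment with the largest total edge weight. This is
    only practical for tiny graphs such as a single genus name.
    """
    if len(left) > len(right):
        raise ValueError("put the smaller set of names on the left")
    left_ids = list(left)
    best_score, best_pairs = -1.0, []
    for chosen in permutations(list(right), len(left_ids)):
        pairs = list(zip(left_ids, chosen))
        score = sum(similarity(left[l], right[r]) for l, r in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    return best_pairs


matching = max_weight_matching(source_names, gbif_names)
for left_id, right_id in matching:
    print(source_names[left_id], "<->", gbif_names[right_id])
```

For larger sets of names the brute-force search quickly becomes too slow, and a proper assignment algorithm (for example the Hungarian method) would be the natural replacement.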