Monday, October 27, 2008

Modelling GUIDs and taxon names in Mediawiki

I have been thinking more and more about using Mediawiki (or, more precisely, Semantic Mediawiki) as a platform for storing and querying information, rather than writing my own tools completely from scratch. This means I need ways of modelling some relationships between identifiers and objects.

The first is the relationship between document identifiers such as DOIs and metadata about the document itself. One approach which seems natural is to create a wiki page for the identifier, and have that page consist of a #REDIRECT statement which redirects the user to the wiki page on the actual article.



This seems a reasonable solution because:
  • The user can find the article using the GUID (in effect we replicate the redirection DOIs make use of)
  • The GUID itself can be annotated
  • It is trivial to have multiple GUIDs linking to the same paper (e.g., PubMed identifiers, Handles, etc.).
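As a rough sketch of the idea above, the Python below generates the wikitext for one redirect page per GUID, all pointing at the same article page. The `#REDIRECT [[...]]` line is standard Mediawiki syntax; the function names, the article title "Example article", and the PubMed identifier are placeholders made up for illustration, and the `doi:` / `pmid:` page-title convention is only an assumption about how the GUID pages might be named.

```python
from typing import Dict, List


def redirect_page(target_title: str) -> str:
    """Return the wikitext of a page that redirects to target_title."""
    return f"#REDIRECT [[{target_title}]]"


def guid_redirects(article_title: str, guids: List[str]) -> Dict[str, str]:
    """Map each GUID page title to the redirect wikitext for the article."""
    return {guid: redirect_page(article_title) for guid in guids}


# "Example article" and the pmid value are hypothetical placeholders.
pages = guid_redirects(
    "Example article",
    ["doi:10.1038/nature05634", "pmid:00000000"],
)

for title, wikitext in pages.items():
    print(f"{title} -> {wikitext}")
```

Each key would become the title of a wiki page and each value its content, so adding another identifier for the same paper is just one more entry in the list.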

Taxon names present another set of problems, mainly because of homonyms (the same name being given to two or more different taxa). The obvious approach is to do what Wikipedia does (e.g., Morus), namely have a disambiguation page that enables the user to choose which taxon they want. For example:



In this example, there are two taxon names Pinnotheres, so the user would be able to choose between them.

For names which have only one corresponding taxon name we would still have two pages (one for the name string, and one for the taxon name), which would be linked by a REDIRECT:



The advantage of this is that if we subsequently discover a homonym we can easily handle it by changing the REDIRECT page to a disambiguation page. In the meantime, users can simply use the name string because they will be automatically redirected to the taxon name page (which will have the actual information about the name, for example, where it was published).
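To make the switch from redirect to disambiguation page concrete, here is a minimal Python sketch that builds the wikitext for a name string page from the list of taxon name pages that share that string. The function name, the "may refer to" wording, and the taxon name page titles are all hypothetical; only the `#REDIRECT` and `* [[...]]` list syntax come from Mediawiki itself.

```python
from typing import List


def name_string_page(name: str, taxon_name_pages: List[str]) -> str:
    """Build wikitext for a name string page.

    One taxon name: a plain redirect. Two or more (homonyms): a simple
    disambiguation list so the user can choose between them.
    """
    if not taxon_name_pages:
        raise ValueError(f"no taxon name pages known for {name!r}")
    if len(taxon_name_pages) == 1:
        return f"#REDIRECT [[{taxon_name_pages[0]}]]"
    lines = [f"'''{name}''' may refer to:"]
    lines += [f"* [[{title}]]" for title in taxon_name_pages]
    return "\n".join(lines)


# Page titles below are placeholders, not real taxon name records.
single = name_string_page("Pinnotheres", ["Pinnotheres (taxon name A)"])
homonyms = name_string_page(
    "Pinnotheres",
    ["Pinnotheres (taxon name A)", "Pinnotheres (taxon name B)"],
)

print(single)
print(homonyms)
```

Discovering a homonym then amounts to regenerating the name string page with a longer list; the taxon name pages themselves do not need to change.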

Of course, we could do all of this in custom software, but the more I look at it, the more attractive Semantic Mediawiki looks: it gives us the power to edit the relationships between objects as well as the metadata, and also to make inferences.

3 comments:

denny said...

Hi Rod,

even though you could redirect the DOI, I'd actually think that the DOI is rather a property of a paper. In this case you could not only offer a specific search by DOI page (using the search by property special page), but also when exporting the data in RDF the knowledge within your wiki could be consolidated with knowledge in other databases that also use the DOI as a property, and thus a metasearch for all entries on a specific article can be created.

Just a thought, I hope your project moves on well,
cheers,
denny

Roderic Page said...

My plan was to do both, in the sense that the DOI would be listed on the page (say, using [[dcterms:identifier::doi:10.1038/nature05634]] ), but would also have its own page so that users could find the page describing the article simply by using the DOI in a URL.

denny said...

Ah yes, this sounds like the best solution.