Thursday, January 15, 2009

Wikis versus Scratchpads

Yes, I know this is ultimately a case of the "genius of and", but the more I play with the Semantic Mediawiki extension the more I think this is going to be the most productive way forward. I've had numerous conversations with Vince Smith about this. Vince and colleagues at the NHM have been doing a lot of work on "Scratchpads" -- Drupal-based web sites that tend to be taxon-focussed. My worry is that in the long term this is going to create lots of silos that some poor fool will have to aggregate to do anything synthetic with. Silos make inference difficult, and also raise issues of duplication (for example, in bibliographies).

I've avoided wikis for a while because of the reliance on plain text (i.e., little structure) (see this old post of mine on Semant), but Semantic Mediawiki provides a fairly simple way to structure information, and it also provides some basic inference. This makes it possible to create wiki pages that are largely populated by database queries, rather than requiring manual editing. For example, I have queries now that will automatically populate a page about a person with that person's publications, and any taxa named after that person. The actual wiki page itself has hardly any text (basically the name of the person). That is, nobody has to manually edit the wiki page to update lists of published papers. Similarly, maps can be generated in situ using queries that aggregate localities mentioned on a wiki page with localities for GenBank sequences and specimens. Very quickly relationships start to emerge without any manual intervention. The combination of templates and Semantic Mediawiki queries seems a pretty powerful way to aggregate information. There are, of course, limitations. The queries are fairly basic, and there's not the power of something like SPARQL, but it's a start. Coupled with the ease of editing to fix the errors in the contributing databases, and the ease of redirecting to handle multiple identifiers, I think a wiki-based approach has a lot of promise.
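To make the idea of a query-populated page concrete, here is a minimal sketch in Python. It is not Semantic Mediawiki's own query syntax or API: the `FACTS` list, the property names ("Has author", "Named after"), the sample pages, and the `ask` and `render_person_page` functions are all hypothetical stand-ins for annotations stored on wiki pages and the queries a template would run over them.

```python
# Hypothetical in-memory stand-in for semantic annotations; this is NOT
# the Semantic Mediawiki API. Each fact is (page, property, value).
FACTS = [
    ("Paper A", "Has author", "Jane Doe"),
    ("Paper A", "Has year", "2001"),
    ("Paper B", "Has author", "Jane Doe"),
    ("Paper B", "Has year", "1998"),
    ("Paper C", "Has author", "John Roe"),
    ("Aus doei", "Named after", "Jane Doe"),
]


def ask(prop, value, facts=FACTS):
    """Return the sorted pages that carry the annotation prop = value."""
    return sorted({page for page, p, v in facts if p == prop and v == value})


def render_person_page(name, facts=FACTS):
    """Build the page body from queries; only the name is supplied by hand."""
    lines = [name, "", "Publications:"]
    lines += [f"  * {page}" for page in ask("Has author", name, facts)]
    lines += ["", "Taxa named after this person:"]
    lines += [f"  * {page}" for page in ask("Named after", name, facts)]
    return "\n".join(lines)


print(render_person_page("Jane Doe"))
```

The point of the sketch is that the person page holds nothing but a name. Adding an annotation to a paper's own page (here, appending a tuple to `FACTS`) changes what the person page shows the next time it is rendered, with no edit to the person page itself.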

So, I've been teasing Vince that Drupal (or another CMS) is probably the wrong approach, and that semantic wikis are much more powerful (something Gregor Hagedorn has also been arguing). Vince would probably counter that the goal of Scratchpads is to move taxonomists into the digital age by providing them with a customisable platform on which to store and display their data, hence his mission is to capture data. My focus is more on aggregating and synthesising the large amount of data we already have (and are struggling to do anything exciting with). Hence, the "genius of and". However, I still worry that when we have a world with loads of Scratchpads with overlapping data, some poor fool will still have to merge them together to make sense of it all.


Drycafe said...

I can't say that I'm a big fan of the Drupal CMS (why on earth are URLs like /node/23 for human-intended content a good idea?), but isn't it database driven as well, and just as Semantic Mediawiki is based on Mediawiki, couldn't one create Semantic Drupal with similar benefits? This may also nicely build upon the taxonomy work that the EOL codesprint last year tried to further.

Roderic Page said...

Yes, they are both database driven, but I think a wiki environment is much more flexible. For example, templates provide a way for users to program the wiki itself, generating new ways to display content. Drupal (based on my experience with the Systematic Biology web site) is much more geared towards modular content (such as a blog).

I tend to view Mediawiki as a development environment (it comes with a text editor, built-in version control, and a scripting language). Hence, it becomes easy to quickly prototype a complete system. To put it another way: you could quite easily build EOL itself on Mediawiki, but I doubt anyone would try to do that using Drupal.

Ed said...

The aim of the Scratchpads is to be a useful tool for scientists and it allows the flexibility for them to create a resource useful both to them and their peers. They are, therefore, generally taxon-focussed. It's a good thing. Many people work on one (or a few) groups of organisms and don't need, or want, to be presented with irrelevant information.

The panels feature of the Scratchpads allows for aggregation from external sources. It uses one of the site's taxonomies to do this based on the taxon names. If you fed it a list of author names it could feasibly show a list of their papers (this is already possible to a certain extent using Google Scholar).

It's all a matter of what you want to do. Drupal can already do (or, if not, could be expanded to do) everything you have mentioned. Whether people prefer to use a Wiki or Drupal is down to personal preference. Aggregators of data have always been, and probably will be for a long while yet, faced with processing data that's less than ideal for aggregation.

The fact is most people don't create data for it to be aggregated, rather they create it to be useful for what they and their peers want to achieve.

Roderic Page said...

Ed, I'm not trying to knock Drupal as such. It's just that, to me, Mediawiki (specifically with the Semantic Mediawiki extensions) feels much more like an environment within which I can do inference.

I'm struggling to find an analogy to express the difference. Perhaps it's akin to the difference between writing HTML and writing code to generate HTML. One is much more powerful than the other. And it's not just about aggregation (a la panels, or my own iSpecies), it's about integration, which ultimately leads to discovering things we didn't previously know.

Ed said...

My scepticism may come from not playing with Semantic Mediawiki enough?

I guess my previous comment should have questioned whether we can compare Drupal as it's used for Scratchpads with Mediawiki as you're using it. Similar tools, very different uses.

I'm all for integration, but we also have to put content online in the first place. The people putting information online won't always be able to integrate it, or be interested in doing so. A tool like Scratchpads means the data these people generate can be at least partially integrated into a bigger picture.

Roderic Page said...

Ed, I agree we need to get stuff online. My concern, though, is that people sometimes independently put the same data online, in some cases repeating old errors or introducing new ones that would have been caught if we had decent tools for integration and data cleaning in place.

Anonymous said...

Rod - see here for my response. It was a bit too long to post here, and would break my principle of keeping my content together ;-)


Roderic Page said...

Vince, re "the data aggregation problem (aka - make Rod's life easier)" -- I suspect the problem is a little larger than ensuring my life is easy (although all efforts in that direction are always appreciated) ;)

Anonymous said...

I know Rod – I’m being facetious. My point is that there are a lot more taxonomists out there than aggregators, and those aggregators are much more tech-savvy than the average taxonomist. It’s in the taxonomists’ interest to have their content aggregated for all the reasons you’ve espoused. But it is a question of priorities, and of taking things gradually to bring about the kind of cultural shift I’m trying to induce. Without the taxonomists’ needs being met first, there would be nothing to aggregate except legacy content, and no one (i.e. no taxonomists) left to sort out this legacy. Doubtless we could have done more from the outset to make aggregation easier, but with the very limited resources we have, we had to prioritize.

Udi Oron said...


I believe that in recent years many web-based projects and frameworks have developed so many features that they have become very similar. Both Drupal and (Semantic) Mediawiki have grown huge communities building extensions/plugins/modules, which usually provide 80-90% of any web app you could think of, including digital biodiversity apps.
Drupal has borrowed a lot of winning features from wikis, while Semantic Mediawiki can be set up almost as a regular CMS (for example, I have pointed Roderic to Semantic Forms, a cool extension which lets end users enter very complex data via forms into live semantic pages). I guess in both systems you will need to write almost the same 5-10% of custom code to make it behave more nicely in a taxonomic context (whether it's templates, views, content types, or God forbid, actual PHP...).

Personally I think the big question is whether the underlying technology, Drupal or Mediawiki, is truly good enough for advanced taxonomic apps. For example, I dislike mixing my data with a lot of non-scientific information in a single SQL database, and SQL might not be the right solution for complex semantic relationships in the first place. That is, I see Semantic Mediawiki and Drupal CCK/Views as clever hacks, but for complex queries you need to do some loops in the air, hope the server won't choke, and never, never look at the SQL itself... I am really looking forward to a truly object-oriented/semantic database to use as the underlying technology, with a smart framework on top to allow smart content publishing and manipulation.

[Cross posted @ vsmith ]