Wednesday, November 19, 2025

Model Context Protocol (MCP) and triple stores: natural language queries for knowledge graphs

Some quick notes based on experiments with Model Context Protocol (MCP) and [Claude](https://claude.ai).

Model Context Protocol (MCP) is all the rage right now, and I’ve been slow to take a look. Kingsley Idehen recently wrote “The Semantic Web Project Didn’t Fail — It Was Waiting for AI (The Yin of its Yang)”, in which he argued that Large Language Models (LLMs) finally provide a user-friendly way to query triple stores (i.e., knowledge graphs). Instead of simply presenting users with an empty SPARQL query box, we can now let them formulate a query in natural language and have AI convert it into SPARQL.

That eases the challenge of learning a new query language, but it gets better. MCP enables us to connect an AI with another service, acting a bit like a broker. You tell the AI what you want to do; the AI talks to the MCP server to figure out how to do it, gets the results, then converts them into a natural language (or other format) response that you can use. Hence you can have a conversation with a knowledge graph!
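Under the hood, MCP messages are JSON-RPC 2.0: the client asks a server which tools it offers (`tools/list`) and then invokes one (`tools/call`). A minimal sketch of the server side of that exchange follows, in Python rather than PHP. The tool name `sparql_query`, its input schema, and the stubbed `run_sparql` function are hypothetical, and a real server would also handle the `initialize` handshake and read and write these messages over a transport such as stdio.

```python
import json

# Hypothetical tool advertised by the server; the name and schema are
# illustrative, not taken from php-mcp-server.
TOOLS = [
    {
        "name": "sparql_query",
        "description": "Run a SPARQL SELECT query against the local triple store",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
]


def run_sparql(query: str) -> str:
    """Stub standing in for a real call to the triple store."""
    return f"(results for: {query})"


def error(request_id, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id,
                       "error": {"code": code, "message": message}})


def handle(message: str) -> str:
    """Dispatch a single JSON-RPC 2.0 request and return the JSON response."""
    request = json.loads(message)
    request_id = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        params = request.get("params", {})
        if params.get("name") != "sparql_query":
            return error(request_id, -32602, "Unknown tool")
        query = params.get("arguments", {}).get("query", "")
        result = {"content": [{"type": "text", "text": run_sparql(query)}],
                  "isError": False}
    else:
        return error(request_id, -32601, "Method not found")
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})


if __name__ == "__main__":
    # Simulate the AI client discovering the tool, then calling it.
    print(handle('{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'))
    print(handle(json.dumps({
        "jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": "sparql_query",
                   "arguments": {"query": "SELECT * WHERE { ?s ?p ?o } LIMIT 1"}},
    })))
```

The tool description is what lets the AI decide on its own that a natural language question should become a SPARQL query sent to this tool.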

There are examples of MCP servers that speak SPARQL, such as MCP Server SPARQL by Eric Zhu. Since I mostly program in PHP (gasp) version 7 (gasp), I ended up asking ChatGPT to help write a simple MCP server. There then followed a dance between ChatGPT and Claude: ChatGPT would very confidently declare that the code was done, and Claude would get increasingly exasperated that I appeared to be trying to do something that wasn’t working. At one point I had to tell Claude to back the f**k off with its snarky comments and maybe be more helpful in its messages. Eventually I got a simple server up and running.

The code, php-mcp-server, is very basic, but it supports SPARQL queries against an instance of Oxigraph running on my Mac. For example, I can ask:

What publication cites the sequence https://identifiers.org/insdc/MH493846

and Claude will respond:

I’ll query for publications that cite that sequence.
Good! I found a publication. Let me get more details about it:
Perfect! The sequence https://identifiers.org/insdc/MH493846 is cited by:
Publication: “Integrative taxonomy reveals a new Gammarus species (Crustacea, Amphipoda) surviving in a previously unknown southeast European glacial refugium”
Authors: Copilaș-Ciocianu, D., Zimta, A.-A., and Petrusek, A.
DOI: https://doi.org/10.1111/jzs.12248

It will also show me the SPARQL queries it makes to find this information.
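The exact queries depend on the graph's vocabulary, so as a hedged sketch here is one way a server could forward a query to Oxigraph using the standard SPARQL 1.1 Protocol. It is written in Python rather than PHP, assumes Oxigraph's HTTP server is listening on `localhost:7878` with its query endpoint at `/query`, and uses a deliberately schema-agnostic query (find anything that points at the sequence IRI) so that it doesn't assume any particular predicates. The function names are hypothetical.

```python
import urllib.request

# Assumption: Oxigraph's HTTP server on its default local address, with the
# SPARQL query endpoint at /query.
ENDPOINT = "http://localhost:7878/query"

# Characters not allowed inside a SPARQL IRI reference (<...>).
FORBIDDEN = set('<>"{}|^`\\')


def citing_query(target_iri: str) -> str:
    """Schema-agnostic query: anything that links to target_iri, and via which predicate."""
    if any(ch in FORBIDDEN or ord(ch) <= 0x20 for ch in target_iri):
        raise ValueError(f"not a valid IRI for SPARQL: {target_iri!r}")
    return f"SELECT ?subject ?predicate WHERE {{ ?subject ?predicate <{target_iri}> . }}"


def sparql_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (without sending) a SPARQL 1.1 Protocol POST that asks for JSON results."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = sparql_request(citing_query("https://identifiers.org/insdc/MH493846"))
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

With Oxigraph actually running, `json.load(urllib.request.urlopen(req))` would return the answer in the SPARQL 1.1 JSON results format. Sending the query as the POST body avoids having to URL-encode long queries.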

This feels like a game changer. The MCP server I’ve written is incredibly crude, but I can now start to query a knowledge graph about DNA barcodes and associated literature in plain English, and get back useful results.
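Getting back useful results also depends on how the server hands raw SPARQL results to the AI. One plausible approach, sketched here in Python with hypothetical function names, is to flatten the standard SPARQL 1.1 JSON results format into compact tab-separated text and wrap it as an MCP text content item; the AI then turns that into prose like the answer above. This is an illustration only, not necessarily how php-mcp-server does it.

```python
import json


def results_to_text(results: dict) -> str:
    """Render SPARQL 1.1 JSON results as tab-separated text, header row first.

    Variables left unbound in a solution become empty cells.
    """
    variables = results["head"]["vars"]
    lines = ["\t".join(variables)]
    for binding in results["results"]["bindings"]:
        lines.append("\t".join(binding.get(v, {}).get("value", "") for v in variables))
    return "\n".join(lines)


def tool_result(results: dict) -> dict:
    """Wrap the text in an MCP tools/call result for the AI to read."""
    bindings = results["results"]["bindings"]
    text = results_to_text(results) if bindings else "No results."
    return {"content": [{"type": "text", "text": text}], "isError": False}


if __name__ == "__main__":
    # Made-up results using example.org IRIs, purely for illustration.
    sample = {
        "head": {"vars": ["subject", "predicate"]},
        "results": {"bindings": [
            {"subject": {"type": "uri", "value": "https://example.org/pub/1"},
             "predicate": {"type": "uri", "value": "https://example.org/cites"}},
        ]},
    }
    print(json.dumps(tool_result(sample), indent=2))
```

Plain tab-separated text keeps the response short, which matters when the result has to fit alongside the conversation in the model's context.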

What I really want to do is combine this with details from the actual papers (for example, lists of specimens sequenced, whether they are type specimens, where the samples were collected, etc.) as a way to help curate databases such as BOLD. I recently released BOLD View (see blog post BOLD View: exploring DNA barcodes) to make it easier to explore DNA barcode data, and I’m fascinated by how much scope there is for curation to add taxonomic identifications, geographic locations, etc.

To make this curation easier I’ve started to assemble a knowledge graph linking barcodes, GenBank sequences, and taxonomic names to the associated scientific literature, with the ultimate goal of being able to ask: “given this barcode that lacks a proper scientific name, is there anything in the published literature that can tell me what it actually is?” The idea of being able to literally ask that question using a combination of an AI and an MCP server is very exciting.
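The question in quotes above boils down to following a chain of links through the graph. Here is a toy sketch in Python that uses an in-memory list of triples instead of a triple store; the identifiers and predicates (`ex:sameSequenceAs`, `ex:cites`, `ex:mentionsName`) are hypothetical placeholders, not the vocabulary of the actual knowledge graph.

```python
# Toy knowledge graph as (subject, predicate, object) triples.
# Every identifier and predicate here is a hypothetical placeholder.
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("ex:barcode/B1", "ex:sameSequenceAs", "ex:sequence/S1"),
    ("ex:publication/P1", "ex:cites", "ex:sequence/S1"),
    ("ex:publication/P1", "ex:mentionsName", "ex:name/N1"),
    ("ex:barcode/B2", "ex:sameSequenceAs", "ex:sequence/S2"),
]


def objects(triples: list[Triple], subject: str, predicate: str) -> list[str]:
    """All o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(triples: list[Triple], predicate: str, obj: str) -> list[str]:
    """All s such that (s, predicate, obj) is in the graph."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def literature_for_barcode(triples: list[Triple], barcode: str) -> dict[str, list[str]]:
    """Follow barcode -> sequence -> citing publication -> names mentioned in it."""
    found: dict[str, list[str]] = {}
    for sequence in objects(triples, barcode, "ex:sameSequenceAs"):
        for publication in subjects(triples, "ex:cites", sequence):
            if publication not in found:
                found[publication] = objects(triples, publication, "ex:mentionsName")
    return found


if __name__ == "__main__":
    print(literature_for_barcode(TRIPLES, "ex:barcode/B1"))
    print(literature_for_barcode(TRIPLES, "ex:barcode/B2"))
```

In SPARQL the same chain would be a single graph pattern, which is exactly the kind of query the AI could write on the user's behalf; an empty result, as for `ex:barcode/B2`, means nothing in the graph yet links that barcode to the literature.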
