iPhylo: The Business of Extracting Knowledge from Academic Publications

Roderic D. M. Page

Saturday, December 11, 2021

The Business of Extracting Knowledge from Academic Publications

Markus Strasser (@mkstra) wrote a fascinating article entitled "The Business of Extracting Knowledge from Academic Publications".

I spent months working on domain-specific search engines and knowledge discovery apps for biomedicine and eventually figured that synthesizing "insights" or building knowledge graphs by machine-reading the academic literature (papers) is *barely useful*: https://t.co/eciOg30Odc
— Markus Strasser (@mkstra) December 7, 2021

His TL;DR:

TL;DR: I worked on biomedical literature search, discovery and recommender web applications for many months and concluded that extracting, structuring or synthesizing "insights" from academic publications (papers) or building knowledge bases from a domain corpus of literature has negligible value in industry.

Close to nothing of what makes science actually work is published as text on the web.

After recounting the many problems of knowledge extraction - including a swipe at nanopubs which "are ... dead in my view (without admitting it)" - he concludes:

I’ve been flirting with this entire cluster of ideas including open source web annotation, semantic search and semantic web, public knowledge graphs, nano-publications, knowledge maps, interoperable protocols and structured data, serendipitous discovery apps, knowledge organization, communal sense making and academic literature/publishing toolchains for a few years on and off ... nothing of it will go anywhere.

Don’t take that as a challenge. Take it as a red flag and run. Run towards better problems.

Well worth a read, and much food for thought.