ExplorerPharmaceutical ResearchBiochemistry
Research PaperResearchia:202605.27026

MetaboKG: An Analysis-centric Knowledge Graph Framework for Untargeted Metabolomics

Matthieu Féraud

Abstract

Untargeted metabolomics generates large volumes of tandem mass spectrometry (MS/MS) data and computational annotations that can reveal molecular mechanisms across organisms and environments. Public reuse has improved through harmonized repository metadata and access infrastructures such as Pan-ReDU, and through metabolomics knowledge graphs such as ENPKG and METRIN-KG. Yet the analytical layer remains fragmented: spectra, features, workflow outputs, annotations, confidence evidence, and contextu...

Submitted: May 27, 2026Subjects: Biochemistry; Pharmaceutical Research

Description / Details

Untargeted metabolomics generates large volumes of tandem mass spectrometry (MS/MS) data and computational annotations that can reveal molecular mechanisms across organisms and environments. Public reuse has improved through harmonized repository metadata and access infrastructures such as Pan-ReDU, and through metabolomics knowledge graphs such as ENPKG and METRIN-KG. Yet the analytical layer remains fragmented: spectra, features, workflow outputs, annotations, confidence evidence, and contextual metadata are still scattered across repositories and tabular artifacts. We present MetaboKG, an analysis-centric knowledge graph framework for engineering reusable metabolomics knowledge from public repositories, metadata, and GNPS molecular network results. MetaboKG contributes a transformation workflow that preserves links between repository exports, analytical files, spectra, features, and annotation results; a semantic model grounded in PROV-O and SIO and aligned with the Mass Spectrometry ontology (MS), ChEBI, NCBITaxon, ENVO, and NCIT to represent provenance, analytical evidence, metadata attributes, and controlled vocabulary terms; and a Universal Annotation Identifier strategy extending the Universal Spectrum Identifier (USI) with workflow-specific components for late binding, incremental ingestion, and post hoc linkage across analyses. We demonstrate MetaboKG at the public-repository scale on 680 GNPS molecular networking results and evaluate it through competency questions covering biochemical enrichment, environmental specificity, and cross instrument analytical variation. Results show that graph-based integration supports traceable annotation reuse and reproducible SPARQL exploration of biochemical relationships that remain fragmented across repository-native resources.


Source: arXiv:2605.24706v1 - http://arxiv.org/abs/2605.24706v1 PDF: https://arxiv.org/pdf/2605.24706v1 Original Link: http://arxiv.org/abs/2605.24706v1

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Access Paper
View Source PDF
Submission Info
Date:
May 27, 2026
Topic:
Pharmaceutical Research
Area:
Biochemistry
Comments:
0
Bookmark
MetaboKG: An Analysis-centric Knowledge Graph Framework for Untargeted Metabolomics | Researchia