Research Paper | Researchia:202606.23037

A Benchmark of (MRI-) Foundation Models to Predict IDH Mutational Status in Glioma

Nathan Hollet

Submitted: June 23, 2026
Subjects: Engineering; Biomedical Engineering

Description / Details

Non-invasive prediction of glioma molecular status from routine magnetic resonance imaging (MRI) has shown promising performance, but model generalization remains challenging given small-scale matched imaging-genomic datasets. Foundation models may address this bottleneck, but a comprehensive benchmark is needed to establish the impact of diverse architectures, pre-training domains, and objectives. Using isocitrate dehydrogenase (IDH) mutation prediction from FLAIR and post-contrast T1 MRIs as the use case, we compared four image-based foundation models (BrainIAC, MRI-CORE, BiomedCLIP, and BrainDINO) against radiomics-based TabPFN and logistic regression baselines. Prediction performance and calibration were assessed across four public adult glioma cohorts and an external post-treatment cohort. All results are reported as mean (SD). Within-cohort, TabPFN matched or outperformed all visual encoders, achieving an AUROC of 0.92 (0.03) and an AUPRC of 0.74 (0.17) across all datasets. Among visual encoders, BiomedCLIP performed best (AUROC 0.85 (0.08)), with BrainDINO competitive (AUROC 0.82 (0.09)), while the MRI-specific encoders (BrainIAC, MRI-CORE) consistently underperformed. Cross-cohort transfer showed moderate AUROC degradation, whereas AUPRC was more sensitive to prevalence shifts. On the external cohort, BiomedCLIP achieved the highest AUROC (0.74 (0.07)), whereas TabPFN provided superior calibration (Expected Calibration Error 0.07 (0.01)). These results indicate that representation modality and evaluation context critically influence foundation-model performance in MRI-based molecular prediction. Tabular foundation models on radiomic features provide a strong, well-calibrated baseline, while image foundation models may offer complementary value under clinically distinct distribution shifts. Code is available at https://github.com/nathanhollet/idh-status-prediction
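
The abstract reports calibration as Expected Calibration Error (ECE) but does not say how it was binned or weighted. As an illustration only, the sketch below computes one common binary variant: predictions are grouped into equal-width probability bins, and the absolute gap between the mean predicted probability and the observed positive fraction is averaged across bins, weighted by bin size. The function name `expected_calibration_error`, the default of 10 bins, and the equal-width binning are assumptions for this example and are not taken from the paper or its repository.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binary ECE with equal-width bins (illustrative; binning is an assumption).

    y_true: 0/1 labels (e.g. 1 = IDH-mutant).
    y_prob: predicted probability of the positive class, in [0, 1].
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    # Interior bin edges; np.digitize then yields bin indices 0 .. n_bins - 1.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(y_prob, edges[1:-1]), 0, n_bins - 1)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        mean_predicted = y_prob[mask].mean()
        observed_rate = y_true[mask].mean()
        # Weight each bin's calibration gap by its share of the samples.
        ece += mask.mean() * abs(observed_rate - mean_predicted)
    return float(ece)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    probs = [0.1, 0.1, 0.9, 0.9]
    print(round(expected_calibration_error(labels, probs), 3))
```

On this toy input, each of the two occupied bins has a gap of about 0.1 between predicted probability and observed frequency, so the result is roughly 0.1. Lower values indicate better calibration. Because the result depends on the number of bins and the binning scheme, values computed this way are not directly comparable to the figures reported above unless the same settings are used.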


Source: arXiv:2606.23172v1 (http://arxiv.org/abs/2606.23172v1)
PDF: https://arxiv.org/pdf/2606.23172v1
