Research Paper | Researchia:202603.13002 | Artificial Intelligence > AI

SciMDR: Benchmarking and Advancing Scientific Multimodal Document Reasoning

Ziyu Chen

Abstract

Constructing scientific multimodal document reasoning datasets for foundation model training involves an inherent trade-off among scale, faithfulness, and realism. To address this challenge, we introduce the synthesize-and-reground framework, a two-stage pipeline comprising: (1) Claim-Centric QA Synthesis, which generates faithful QA pairs with explicit reasoning from isolated, focused document segments, and (2) Document-Scale Regrounding, which programmatically re-embeds these pairs into full-document tasks to ensure realistic complexity. Using this framework, we construct SciMDR, a large-scale training dataset for cross-modal comprehension, comprising 300K QA pairs with explicit reasoning chains across 20K scientific papers. We further construct SciMDR-Eval, an expert-annotated benchmark to evaluate multimodal comprehension within full-length scientific workflows. Experiments demonstrate that models fine-tuned on SciMDR achieve significant improvements across multiple scientific QA benchmarks, particularly on tasks that require complex document-level reasoning.
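The two-stage pipeline described above can be sketched in code. This is a minimal illustration, not the authors' implementation: Stage 1 in the paper relies on model-generated QA synthesis (represented here only by the `QAPair` container), and the `reground` function, the `QAPair` fields, and the `evidence_span` key are hypothetical names standing in for the programmatic re-embedding step.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    """A segment-level QA pair, as produced by Stage 1
    (Claim-Centric QA Synthesis) from a focused document segment."""
    question: str
    answer: str
    reasoning: str
    segment: str  # the focused segment the pair was synthesized from


def reground(pair: QAPair, full_document: str) -> dict:
    """Stage 2 (Document-Scale Regrounding): re-embed a segment-level
    QA pair into the full document, recording where its supporting
    segment occurs so the resulting task demands document-scale
    retrieval in addition to the original segment-level reasoning."""
    start = full_document.find(pair.segment)
    if start == -1:
        raise ValueError("supporting segment not found in document")
    return {
        "context": full_document,
        "question": pair.question,
        "answer": pair.answer,
        "reasoning": pair.reasoning,
        "evidence_span": (start, start + len(pair.segment)),
    }
```

The key design point the sketch captures is that faithfulness is established on a small, isolated segment first, and realism is restored afterward by anchoring that segment inside the full document rather than by generating over the whole document at once.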


Source: arXiv:2603.12249v1 (http://arxiv.org/abs/2603.12249v1)
PDF: https://arxiv.org/pdf/2603.12249v1

Submission: 3/13/2026
Subjects: AI; Artificial Intelligence

