Research Paper
Researchia:202606.17002

ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues

Shanda Li

Submitted: June 17, 2026
Subjects: AI; Artificial Intelligence

Description / Details

Reproducing research results from papers and released code is central to scientific progress. Existing work has introduced benchmarks to evaluate whether LLM agents can assist with reproducibility, but these benchmarks are difficult to scale because they rely on substantial manual effort for data curation and evaluation. We introduce ReproRepo, a scalable framework for reproducibility evaluation that uses human-raised GitHub issues as naturally occurring supervision on realistic reproduction blockers. We instantiate ReproRepo on 1,149 recent machine learning papers from major conferences and evaluate four frontier model-agent configurations. Our results show that LLM agents, even without executing code, can identify many real-world reproducibility problems from paper-repository pairs: the best agent in our study, Codex with GPT-5.5, surfaces at least one semantically related human-reported blocker for ~90% of the papers. Further analysis shows that agents are particularly effective at surfacing visible failures and identifying the right semantic region, but may still fall short on exact localization. ReproRepo can serve as a reusable, scalable framework for future evaluations of LLM agents on real-world reproducibility auditing. Our code is released at https://github.com/LithiumDA/ReproRepo.
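The headline figure above is a paper-level rate: a paper counts as a hit when at least one agent finding is semantically related to at least one human-reported issue. The following is a minimal sketch of how such a rate could be computed, not the authors' implementation. The `PaperAudit` structure, the `token_overlap` matcher, and the `0.3` threshold are all hypothetical; in particular, word overlap is only a crude stand-in for whatever semantic matching the framework actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class PaperAudit:
    """One paper-repository pair (hypothetical structure, not from the paper)."""
    paper_id: str
    human_issues: list[str] = field(default_factory=list)    # text of GitHub issues
    agent_findings: list[str] = field(default_factory=list)  # text of agent-reported blockers


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets.

    Assumption: a placeholder for semantic matching, chosen only so the
    sketch runs without external dependencies.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def paper_hit(audit: PaperAudit, threshold: float = 0.3) -> bool:
    """True if any agent finding matches any human issue at or above the threshold."""
    return any(
        token_overlap(finding, issue) >= threshold
        for finding in audit.agent_findings
        for issue in audit.human_issues
    )


def hit_rate(audits: list[PaperAudit], threshold: float = 0.3) -> float:
    """Fraction of papers with at least one matched human-reported blocker."""
    if not audits:
        return 0.0
    return sum(paper_hit(a, threshold) for a in audits) / len(audits)


# Toy data, invented for illustration only.
audits = [
    PaperAudit(
        paper_id="paper-a",
        human_issues=["missing requirements file for training script"],
        agent_findings=["requirements file missing for the training script"],
    ),
    PaperAudit(
        paper_id="paper-b",
        human_issues=["checkpoint download link is broken"],
        agent_findings=["learning rate differs from paper"],
    ),
]
rate = hit_rate(audits)
```

Because the metric only asks for one match per paper, it rewards finding the right general region of a problem and says nothing about how precisely a finding pinpoints the faulty file or line, which is consistent with the localization caveat in the abstract.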


Source: arXiv:2606.18237v1 (http://arxiv.org/abs/2606.18237v1)
PDF: https://arxiv.org/pdf/2606.18237v1
