ExplorerArtificial IntelligenceAI
Research PaperResearchia:202604.20001

ASMR-Bench: Auditing for Sabotage in ML Research

Eric Gan

Abstract

As AI systems are increasingly used to conduct research autonomously, misaligned systems could introduce subtle flaws that produce misleading results while evading detection. We introduce ASMR-Bench (Auditing for Sabotage in ML Research), a benchmark for evaluating the ability of auditors to detect sabotage in ML research codebases. ASMR-Bench consists of 9 ML research codebases with sabotaged variants that produce qualitatively different experimental results. Each sabotage modifies implementati...

Submitted: April 20, 2026Subjects: AI; Artificial Intelligence

Description / Details

As AI systems are increasingly used to conduct research autonomously, misaligned systems could introduce subtle flaws that produce misleading results while evading detection. We introduce ASMR-Bench (Auditing for Sabotage in ML Research), a benchmark for evaluating the ability of auditors to detect sabotage in ML research codebases. ASMR-Bench consists of 9 ML research codebases with sabotaged variants that produce qualitatively different experimental results. Each sabotage modifies implementation details, such as hyperparameters, training data, or evaluation code, while preserving the high-level methodology described in the paper. We evaluated frontier LLMs and LLM-assisted human auditors on ASMR-Bench and found that both struggled to reliably detect sabotage: the best performance was an AUROC of 0.77 and a top-1 fix rate of 42%, achieved by Gemini 3.1 Pro. We also tested LLMs as red teamers and found that LLM-generated sabotages were weaker than human-generated ones but still sometimes evaded same-capability LLM auditors. We release ASMR-Bench to support research on monitoring and auditing techniques for AI-conducted research.


Source: arXiv:2604.16286v1 - http://arxiv.org/abs/2604.16286v1 PDF: https://arxiv.org/pdf/2604.16286v1 Original Link: http://arxiv.org/abs/2604.16286v1

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Access Paper
View Source PDF
Submission Info
Date:
Apr 20, 2026
Topic:
Artificial Intelligence
Area:
AI
Comments:
0
Bookmark
ASMR-Bench: Auditing for Sabotage in ML Research | Researchia