Research Paper
Researchia:202605.16039

Training ML Models with Predictable Failures

Will Schwarzer

Submitted: May 16, 2026
Subjects: Machine Learning; Data Science

Description / Details

Estimating how often an ML model will fail at deployment scale is central to pre-deployment safety assessment, but a feasible evaluation set is rarely large enough to observe the failures that matter. Jones et al. (2025) address this by extrapolating from the largest k failure scores in an evaluation set to predict deployment-scale failure rates. We give a finite-k decomposition of this estimator's forecast error and show that it has a built-in bias toward over-prediction in the typical case, which is the safety-favorable direction. This bias is offset when the evaluation set misses a rare high-failure mode that the deployment set contains, leaving the forecast to under-predict at deployment scale. We propose a fine-tuning objective, the forecastability loss, that addresses this failure mode. In two proof-of-concept experiments, a language-model password game and an RL gridworld, fine-tuning substantially reduces held-out forecast error while preserving primary-task capability and achieving safety similar to that of supervised baselines.
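To make the idea of extrapolating from the largest k failure scores concrete, here is a minimal sketch of one generic approach: fit an exponential tail to the top-k scores and read off an exceedance probability beyond the observed range. This is an illustration under stated assumptions, not the estimator of Jones et al. (2025) and not the forecastability loss proposed in this paper. The exponential-tail form, the function names `tail_exceedance_forecast` and `prob_any_failure`, and the notion of a fixed failure `threshold` are all hypothetical choices made for the example.

```python
import math


def tail_exceedance_forecast(scores, k, threshold):
    """Estimate P(score > threshold) from the k largest evaluation scores.

    Assumption (not from the paper): above the (k+1)-th largest score u,
    the excess (score - u) is exponentially distributed, so
    P(score > t) ~= (k / n) * exp(-(t - u) / mean_excess) for t > u.
    """
    n = len(scores)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(scores)")
    ordered = sorted(scores, reverse=True)
    u = ordered[k]  # tail cutoff: the (k+1)-th largest score
    if threshold <= u:
        # Inside the well-observed range: use the empirical frequency.
        return sum(s > threshold for s in scores) / n
    mean_excess = sum(s - u for s in ordered[:k]) / k
    if mean_excess <= 0:
        raise ValueError("top-k scores do not exceed the cutoff")
    return (k / n) * math.exp(-(threshold - u) / mean_excess)


def prob_any_failure(p, n_deploy):
    """Probability of at least one failure in n_deploy independent queries."""
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)^n, computed stably for very small p.
    return -math.expm1(n_deploy * math.log1p(-p))


# Toy evaluation set: ten scores, of which the two largest form the tail.
toy_scores = [0.0] * 8 + [1.0, 3.0]
p_fail = tail_exceedance_forecast(toy_scores, k=2, threshold=4.0)
p_deploy = prob_any_failure(p_fail, n_deploy=1000)
```

A fit of this kind only sees the modes that are present in the evaluation scores. If deployment contains a rare high-failure mode that the evaluation set missed, the extrapolated rate will be too low, which is the under-prediction case the abstract describes.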


Source: arXiv:2605.15134v1, http://arxiv.org/abs/2605.15134v1
PDF: https://arxiv.org/pdf/2605.15134v1

