Cross-Audit Projection for Model Risk Prediction
Abstract
For training-data-based model risk prediction, $K$-fold cross-validation (CV) is widely used to mitigate the well-known over-optimism of the empirical risk and is often regarded as reliable. However, for binary classification via empirical risk minimization, our numerical studies reveal a surprising phenomenon: $K$-fold CV may perform poorly in estimating class-specific risks, even worse than the empirical estimator. We perform a higher-order asymptotic analysis showing that $K$-fold CV may conv...
Description / Details
For training-data-based model risk prediction, $K$-fold cross-validation (CV) is widely used to mitigate the well-known over-optimism of the empirical risk and is often regarded as reliable. However, for binary classification via empirical risk minimization, our numerical studies reveal a surprising phenomenon: $K$-fold CV may perform poorly in estimating class-specific risks, even worse than the empirical estimator. We perform a higher-order asymptotic analysis showing that $K$-fold CV may converge at a slower rate, whereas the empirical estimator exhibits a second-order asymptotic bias that explains its over-optimism. These findings motivate a novel two-step procedure for model risk prediction, termed cross-audit projection (CAP). The cross-audit step adopts the same resampling scheme as $K$-fold CV to estimate over-optimism in subsamples, while the asymptotic-theory-informed projection step adjusts for the reduced sample size in bias correction of the empirical risk. The resulting CAP estimator is first-order asymptotically equivalent to the empirical risk while achieving second-order asymptotic unbiasedness. An accompanying inference procedure is also developed. Simulation studies support theoretical advantages of CAP and demonstrate favorable finite-sample performance. An application to breast cancer detection further illustrates the proposed method.
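To make the two baseline estimators in the abstract concrete, the following is a minimal sketch of the empirical and $K$-fold CV estimates of class-specific misclassification risk for a logistic-regression classifier fitted by empirical risk minimization. Everything in it is an assumption made for illustration: the simulated data, the choice of logistic loss, the pooling of out-of-fold predictions, and the helper names `fit_logistic`, `predict`, `class_risks`, and `kfold_class_risks`. It does not implement CAP, whose cross-audit and projection steps are not specified in the abstract.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Hypothetical helper: logistic-loss ERM via Newton's method."""
    Xb = np.column_stack([np.ones(len(X)), X])      # add intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        z = np.clip(Xb @ beta, -30.0, 30.0)         # guard against overflow
        p = 1.0 / (1.0 + np.exp(-z))
        grad = Xb.T @ (p - y)
        hess = (Xb * (p * (1.0 - p))[:, None]).T @ Xb
        hess += ridge * np.eye(Xb.shape[1])         # tiny ridge for stability
        beta -= np.linalg.solve(hess, grad)
    return beta

def predict(beta, X):
    """Classify as 1 when the fitted linear score is positive."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return (Xb @ beta > 0).astype(int)

def class_risks(y_true, y_pred):
    """Misclassification rate within each true class, ordered (0, 1)."""
    return np.array([np.mean(y_pred[y_true == c] != c) for c in (0, 1)])

def kfold_class_risks(X, y, K=5, seed=0):
    """K-fold CV: each point is predicted by a model that never saw it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)
    oof = np.empty(len(y), dtype=int)               # out-of-fold predictions
    for idx in folds:
        train = np.setdiff1d(np.arange(len(y)), idx)
        oof[idx] = predict(fit_logistic(X[train], y[train]), X[idx])
    return class_risks(y, oof)

# Simulated data (an assumption, not the paper's design).
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 3))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X[:, 0] - 0.5))))

# Empirical estimator: evaluate the fitted model on its own training data.
emp = class_risks(y, predict(fit_logistic(X, y), X))
# K-fold CV estimator of the same two class-specific risks.
cv = kfold_class_risks(X, y, K=5, seed=0)

print("empirical class risks:", emp)
print("5-fold CV class risks:", cv)
```

Repeating this over many simulated datasets and comparing each estimator with the risk measured on a large independent test sample is one way to probe the over-optimism of the empirical estimator and the behavior of $K$-fold CV that the abstract describes.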
Source: arXiv:2607.02328v1 - http://arxiv.org/abs/2607.02328v1 PDF: https://arxiv.org/pdf/2607.02328v1
Jul 3, 2026
Data Science
Statistics