Research Paper
Researchia:202606.05084

How abundant are good interpolators?

August Y. Chen


Submitted: June 5, 2026
Subjects: Machine Learning; Data Science

Description / Details

Let $S$ be the set of unit norm linear classifiers $θ\in \mathbb{R}^d$ which correctly classify every point of a labeled dataset $(X_i,y_i)_{i=1}^n$, $X_i \in \mathbb{R}^d$, $y_i \in \{-1,+1\}$, with a possibly negative margin $κ$ fixed in advance. Under two natural data-generating distributions of the $(X,y)$ pairs -- a Gaussian mixture model and a logistic model with Gaussian features -- and in the proportional regime $n/d \to α$ with small enough $α$, we establish a large deviation principle on the event that a point $θ$ chosen uniformly at random from $S$ achieves a given generalization error, with high probability over the choice of the data. The associated large deviation rate function is deterministic and describes the proportion, at the exponential scale in $d$, of interpolating classifiers having a given desired performance. As a consequence, we establish the following concentration phenomenon: all but an exponentially small fraction of interpolating classifiers have approximately the same generalization performance given by the unique maximizer of this rate function. We numerically compare this maximizer to the performance of empirical risk minimization by gradient descent and to the performance of a natural linear program, both finding a point in $S$, and deduce that in the overparametrized regime of small $α$, these efficient procedures outperform the vast majority of interpolators, pointing to their nontrivial benign overfitting in this setting.
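To make the objects in the abstract concrete, the following is a minimal Python sketch of drawing points uniformly from $S$ in a tiny dimension and evaluating their generalization error. It is an illustration under stated assumptions, not the paper's method: the mixture is assumed to be $X = yμ + Z$ with $Z$ standard Gaussian and $y$ uniform on $\{-1,+1\}$, membership in $S$ is taken to mean $y_i\langle θ, X_i\rangle \ge κ$ without any further normalization of the margin, generalization error is taken to be the misclassification probability on a fresh sample, and the uniform draw is done by brute-force rejection sampling, which is only feasible for very small $d$. All function names and parameter values are hypothetical.

```python
import math
import random


def unit(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sample_gmm(n, mu, rng):
    """Assumed model: y uniform in {-1,+1}, X = y*mu + standard Gaussian noise."""
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        x = [y * m + rng.gauss(0.0, 1.0) for m in mu]
        data.append((x, y))
    return data


def interpolates(theta, data, kappa):
    """True if every labeled point has margin at least kappa under theta."""
    return all(y * dot(theta, x) >= kappa for x, y in data)


def sample_interpolators(data, d, kappa, count, rng, max_tries=200_000):
    """Rejection sampling: uniform points on the sphere, kept only if they lie in S."""
    kept = []
    for _ in range(max_tries):
        theta = unit([rng.gauss(0.0, 1.0) for _ in range(d)])
        if interpolates(theta, data, kappa):
            kept.append(theta)
            if len(kept) == count:
                break
    return kept


def gmm_error(theta, mu):
    """Misclassification probability P(y<theta,X> < 0) = Phi(-<theta,mu>) for unit theta."""
    return 0.5 * (1.0 + math.erf(-dot(theta, mu) / math.sqrt(2.0)))


rng = random.Random(0)
d, n, kappa = 6, 3, 0.0           # n/d = 0.5, an illustrative choice
mu = [1.0] + [0.0] * (d - 1)      # assumed class mean with norm 1
data = sample_gmm(n, mu, rng)
thetas = sample_interpolators(data, d, kappa, count=200, rng=rng)
errors = [gmm_error(t, mu) for t in thetas]
if errors:
    print(f"{len(thetas)} interpolators, mean error {sum(errors) / len(errors):.3f}")
```

At this scale the errors of the sampled interpolators are still widely spread; the concentration described in the abstract is an asymptotic statement as $d$ grows with $n/d \to α$, which rejection sampling cannot reach because the acceptance rate decays exponentially in $n$.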


Source: arXiv:2606.06469v1 - http://arxiv.org/abs/2606.06469v1
PDF: https://arxiv.org/pdf/2606.06469v1
