A Gap Between Decision Trees and Neural Networks
Abstract
We study when geometric simplicity of decision boundaries, used here as a notion of interpretability, can conflict with accurate approximation of axis-aligned decision trees by shallow neural networks. Decision trees induce rule-based, axis-aligned decision regions (finite unions of boxes), whereas shallow ReLU networks are typically trained as score models whose predictions are obtained by thresholding. We analyze the infinite-width, bounded-norm, single-hidden-layer ReLU class through the Radon total variation ($\mathcal{R}\mathrm{TV}$) seminorm, which controls the geometric complexity of level sets. We first show that the hard tree indicator has infinite $\mathcal{R}\mathrm{TV}$. Moreover, two natural split-wise continuous surrogates, piecewise-linear ramp smoothing and sigmoidal (logistic) smoothing, also have infinite $\mathcal{R}\mathrm{TV}$ in dimensions $d>1$, while Gaussian convolution yields finite $\mathcal{R}\mathrm{TV}$ but with an explicit exponential dependence on . We then separate two goals that are often conflated: classification after thresholding (recovering the decision set) versus score learning (learning a calibrated score close to the tree indicator). For classification, we construct a smooth barrier score with finite $\mathcal{R}\mathrm{TV}$ which, thresholded at a fixed level, exactly recovers the box. Under a mild tube-mass condition near the decision boundary, we prove an calibration bound that decays polynomially in a sharpness parameter, along with an explicit upper bound in terms of face measures. Experiments on synthetic unions of rectangles illustrate the resulting accuracy–complexity tradeoff and how threshold selection shifts where training lands along it.
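To make the three split-wise surrogates concrete, here is a minimal Python sketch for a single axis-aligned box $[\text{lo}, \text{hi}] \subset \mathbb{R}^d$, viewed as the AND of $2d$ tree splits. The helper names (`hard_box`, `ramp_split`, `logistic_split`, `smoothed_box`, `gaussian_box`) and the parameterizations by a transition width `eps`, a sharpness `k`, and a bandwidth `sigma` are illustrative assumptions, not the paper's definitions. The sketch only evaluates the scores pointwise; it does not compute $\mathcal{R}\mathrm{TV}$. The Gaussian case uses the standard closed form of convolving a box indicator with an isotropic Gaussian, a product of differences of normal CDFs.

```python
import math

# Hypothetical helpers for illustration; a box is given by corners lo <= hi.

def hard_box(x, lo, hi):
    """Hard tree-style indicator 1{lo <= x <= hi}: an AND of axis-aligned splits."""
    return float(all(l <= xi <= h for xi, l, h in zip(x, lo, hi)))

def ramp_split(z, eps):
    """Piecewise-linear surrogate for the split 1{z >= 0}, transition width eps."""
    return min(1.0, max(0.0, 0.5 + z / eps))

def logistic_split(z, k):
    """Logistic surrogate for 1{z >= 0} with sharpness k (overflow-safe form)."""
    t = k * z
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def smoothed_box(x, lo, hi, split):
    """Split-wise smoothing: replace each of the 2d splits by `split`, then multiply."""
    out = 1.0
    for xi, l, h in zip(x, lo, hi):
        out *= split(xi - l) * split(h - xi)
    return out

def gaussian_box(x, lo, hi, sigma):
    """Box indicator convolved with N(0, sigma^2 I), i.e. P(x + sigma*Z in box)."""
    def phi(t):  # standard normal CDF
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    out = 1.0
    for xi, l, h in zip(x, lo, hi):
        out *= phi((h - xi) / sigma) - phi((l - xi) / sigma)
    return out

if __name__ == "__main__":
    lo, hi = (0.0, 0.0), (1.0, 1.0)
    for p in [(0.5, 0.5), (1.0, 1.0), (1.5, 0.5)]:
        print(p,
              hard_box(p, lo, hi),
              smoothed_box(p, lo, hi, lambda z: ramp_split(z, 0.1)),
              smoothed_box(p, lo, hi, lambda z: logistic_split(z, 50.0)),
              gaussian_box(p, lo, hi, 0.05))
```

In this sketch all three smoothed scores equal (or are numerically close to) 1/4 at a corner of the unit square, since two of the four factors sit at their midpoint value 1/2. Thresholding them at 1/2 therefore shaves off neighborhoods of the corners, a small illustration of why exactly recovering the decision set by thresholding is a different goal from smoothing the indicator itself.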
Jan 7, 2026
Data Science