Explorerβ€ΊData Scienceβ€ΊData Science
Research PaperResearchia:202601.0797a176

A Gap Between Decision Trees and Neural Networks

Akash Kumar

Abstract

We study when geometric simplicity of decision boundaries, used here as a notion of interpretability, can conflict with accurate approximation of axis-aligned decision trees by shallow neural networks. Decision trees induce rule-based, axis-aligned decision regions (finite unions of boxes), whereas shallow ReLU networks are typically trained as score models whose predictions are obtained by thresholding. We analyze the infinite-width, bounded-norm, single-hidden-layer ReLU class through the Rado...

Submitted: January 7, 2026Subjects: Data Science; Data Science

Description / Details

We study when geometric simplicity of decision boundaries, used here as a notion of interpretability, can conflict with accurate approximation of axis-aligned decision trees by shallow neural networks. Decision trees induce rule-based, axis-aligned decision regions (finite unions of boxes), whereas shallow ReLU networks are typically trained as score models whose predictions are obtained by thresholding. We analyze the infinite-width, bounded-norm, single-hidden-layer ReLU class through the Radon total variation (RTV\mathrm{R}\mathrm{TV}) seminorm, which controls the geometric complexity of level sets. We first show that the hard tree indicator 1A1_A has infinite RTV\mathrm{R}\mathrm{TV}. Moreover, two natural split-wise continuous surrogates--piecewise-linear ramp smoothing and sigmoidal (logistic) smoothing--also have infinite RTV\mathrm{R}\mathrm{TV} in dimensions d>1, while Gaussian convolution yields finite RTV\mathrm{R}\mathrm{TV} but with an explicit exponential dependence on dd. We then separate two goals that are often conflated: classification after thresholding (recovering the decision set) versus score learning (learning a calibrated score close to 1A1_A). For classification, we construct a smooth barrier score SAS_A with finite RTV\mathrm{R}\mathrm{TV} whose fixed threshold Ο„=1Ο„=1 exactly recovers the box. Under a mild tube-mass condition near βˆ‚A\partial A, we prove an L1(P)L_1(P) calibration bound that decays polynomially in a sharpness parameter, along with an explicit RTV\mathrm{R}\mathrm{TV} upper bound in terms of face measures. Experiments on synthetic unions of rectangles illustrate the resulting accuracy--complexity tradeoff and how threshold selection shifts where training lands along it.

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Access Paper
View Source PDF
Submission Info
Date:
Jan 7, 2026
Topic:
Data Science
Area:
Data Science
Comments:
0
Bookmark
A Gap Between Decision Trees and Neural Networks | Researchia