Research Paper | Researchia:202607.01050

Radial Suppression Accelerates Algorithmic Generalization: A Geometric Analysis of Delayed Generalization

Srijan Tiwari

Submitted: July 1, 2026
Subjects: AI; Artificial Intelligence

Description / Details

Why do neural networks memorize algorithmic training data long before they generalize? We present a geometric case study demonstrating that, on tasks where generalization requires discovering structured low-dimensional circuits, the memorization-generalization delay is driven by radial inflation of hidden representations under cross-entropy optimization. We formalize a radial-angular decomposition of activation-space dynamics and derive three testable propositions: (i) that penalizing radial inflation induces anisotropic, data-dependent weight regularization; (ii) that it suppresses radial gradient energy below the isotropic random baseline, forcing predominantly angular updates; and (iii) that it biases convergence toward flatter minima. To validate these propositions empirically, we study a single-hyperparameter norm penalty that softly constrains activations to a hypersphere of radius sqrt(d). On modular arithmetic, this penalty accelerates grokking by up to 6x across MLPs and Transformers, and it halves the training steps needed by a 10M-parameter nanoGPT on 3-digit addition.
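To make the two central quantities concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes the penalty takes the form lam * mean((||h|| - sqrt(d))^2), which is only one plausible reading of "softly constrains activations to a hypersphere of radius sqrt(d)"; the abstract does not give the exact formula. The function names `radial_penalty`, `radial_angular_split`, and `radial_energy_fraction`, and the default `lam=0.1`, are hypothetical and chosen for illustration.

```python
import math


def radial_penalty(activations, lam=0.1):
    """Assumed penalty form: lam * mean((||h||_2 - sqrt(d))^2).

    `activations` is a list of equal-length vectors (lists of floats).
    `lam` stands in for the single hyperparameter; its value is illustrative.
    """
    d = len(activations[0])
    target = math.sqrt(d)  # target radius sqrt(d)
    total = 0.0
    for h in activations:
        norm = math.sqrt(sum(x * x for x in h))
        total += (norm - target) ** 2
    return lam * total / len(activations)


def radial_angular_split(h, g):
    """Split an update g at activation h into radial and angular parts.

    The radial part is the projection of g onto h (it changes ||h||);
    the angular part is the remainder, orthogonal to h.
    """
    norm_sq = sum(x * x for x in h)
    coeff = sum(a * b for a, b in zip(g, h)) / norm_sq
    radial = [coeff * x for x in h]
    angular = [a - r for a, r in zip(g, radial)]
    return radial, angular


def radial_energy_fraction(h, g):
    """Fraction of the squared norm of g that lies along h."""
    radial, _ = radial_angular_split(h, g)
    return sum(x * x for x in radial) / sum(x * x for x in g)


if __name__ == "__main__":
    # d = 4, so the target radius is 2: the first vector sits on the sphere,
    # the second is inflated to radius 4.
    batch = [[2.0, 0.0, 0.0, 0.0], [4.0, 0.0, 0.0, 0.0]]
    print(radial_penalty(batch))
    print(radial_angular_split([1.0, 0.0], [3.0, 4.0]))
    print(radial_energy_fraction([1.0, 0.0], [3.0, 4.0]))
```

For a direction drawn uniformly at random in d dimensions, the expected radial energy fraction is 1/d. That is one natural interpretation of the "isotropic random baseline" in proposition (ii): under the penalty, the measured fraction would be expected to fall below 1/d.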


Source: arXiv:2606.32000v1 (http://arxiv.org/abs/2606.32000v1)
PDF: https://arxiv.org/pdf/2606.32000v1
