Research Paper
Researchia:202606.26061

Beyond the Hard Budget: Sparsity Regularizers for More Interpretable Top-k Sparse Autoencoders

Nathanaël Jacquier

Submitted: June 26, 2026
Subjects: AI; Artificial Intelligence

Description / Details

Sparse autoencoders (SAEs) have become a leading tool for interpreting the representations of vision foundation models, decomposing their polysemantic activations into a larger set of sparse, more monosemantic features. The Top-$k$ SAE, a now-standard variant, enforces sparsity architecturally through its activation function, retaining only the $k$ most active latents per input. Because it was designed precisely to avoid the $\ell_1$ penalty used by earlier SAEs and its known drawbacks, it has not been combined with an explicit sparsity regularizer, despite retaining limitations of its own, such as a budget $k$ that is fixed regardless of input complexity and a tendency to overfit to the training value of $k$. We introduce two sparsity regularizers compatible with the Top-$k$ architecture, both acting on the activations before the Top-$k$ selection: an $\ell_1$ penalty on the unselected (off-support) units, and a scale-invariant $\ell_1/\ell_2$-ratio penalty that concentrates the code onto fewer effective units. Both penalties are applied only to the batch-active units, those selected by the Top-$k$ operator at least once within the batch. Across two datasets, three vision foundation models, and a range of $k$, both regularizers consistently improve monosemanticity at no cost to reconstruction quality. The $\ell_1/\ell_2$ penalty further concentrates information into fewer latents, making reconstruction more robust to the inference-time choice of $k$ and improving small-budget linear probing. Our central finding is that hard architectural sparsity and soft sparsity regularization are complementary rather than mutually exclusive.
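
To make the two penalties concrete, the following is a minimal sketch in plain Python of how they could be computed from a batch of pre-selection activations. It is an illustration under stated assumptions, not the paper's implementation: the function names, the use of absolute values, the averaging over the batch, and the `eps` stabilizer are all hypothetical choices that the abstract does not specify.

```python
import math

def topk_mask(pre, k):
    """Per-row boolean masks marking the k largest pre-activations."""
    masks = []
    for row in pre:
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(order[:k])
        masks.append([j in keep for j in range(len(row))])
    return masks

def batch_active(masks):
    """Units selected by Top-k for at least one input in the batch."""
    n_units = len(masks[0])
    return [any(m[j] for m in masks) for j in range(n_units)]

def off_support_l1(pre, masks, active):
    """l1 penalty on unselected (off-support) units, restricted to
    batch-active units. Batch mean is an assumption of this sketch."""
    total = 0.0
    for row, m in zip(pre, masks):
        total += sum(abs(z) for z, sel, a in zip(row, m, active)
                     if a and not sel)
    return total / len(pre)

def l1_over_l2(pre, active, eps=1e-8):
    """l1/l2 ratio over batch-active units. The ratio is unchanged when
    a row is rescaled, and shrinks as mass concentrates on fewer units.
    eps is a hypothetical guard against division by zero."""
    total = 0.0
    for row in pre:
        vals = [abs(z) for z, a in zip(row, active) if a]
        l1 = sum(vals)
        l2 = math.sqrt(sum(v * v for v in vals))
        total += l1 / (l2 + eps)
    return total / len(pre)

# Toy batch: 2 inputs, 4 latent units, budget k = 2 (made-up numbers).
pre = [[3.0, 1.0, 0.5, 0.1],
       [0.2, 4.0, 2.0, 0.1]]
masks = topk_mask(pre, k=2)
active = batch_active(masks)   # unit 3 is never selected, so it is excluded
print(off_support_l1(pre, masks, active))
print(l1_over_l2(pre, active))
```

In this toy batch the last unit is never selected by Top-$k$, so neither penalty touches it; only units that compete for a slot somewhere in the batch are regularized. In training, either term would presumably be added to the reconstruction loss with a weighting coefficient, whose value is not given in the abstract.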


Source: arXiv:2606.27321v1 (http://arxiv.org/abs/2606.27321v1)
PDF: https://arxiv.org/pdf/2606.27321v1
