Research Paper · Researchia:202606.01027

On the Relationship Between Activation Outliers and Feature Death in Sparse Autoencoders

Elana Simon

Submitted: June 1, 2026
Subjects: Machine Learning; Data Science

Description / Details

Sparse autoencoders (SAEs) decompose neural network activations into interpretable features, but many learned features never activate, a problem called feature death that wastes dictionary capacity and can reintroduce superposition. Death rates vary dramatically between models: near-zero on GPT-2, over 70% on AlphaFold3 with identical configurations. We find that dimension-level activation outliers (dimensions whose mean magnitude is large relative to per-token variation) cause this by shifting pre-activations at initialization based on each feature's alignment with the activation mean. Features anti-aligned with the mean receive permanently negative pre-activations and never fire. We formalize outlier severity as $\gamma = \|\mu\| / \|\sigma\|$; it predicts initial death rates (Spearman $\rho = 0.89$ for dead-by-TopK, $0.82$ for dead-by-ReLU) across 454 model-layer combinations spanning language, vision, protein, and genomic models. Dead features can revive during training, but recovery requires the SAE bias to learn the activation mean, a process that is prohibitively slow at high $\gamma$. Mean-centering (subtracting the activation mean) sidesteps this and eliminates outlier-induced death across all tested models, confirming the mechanism and providing a principled basis for when and why this preprocessing step is necessary.
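The mechanism described above can be illustrated on synthetic data. The following is a minimal sketch, not the paper's experimental setup: the activations are Gaussian noise with a single large-mean dimension, the encoder is a random unit-norm matrix with zero bias, and a feature counts as dead-by-ReLU if its pre-activation is never positive on the batch. The function names `outlier_severity` and `dead_fraction_relu`, the offset of 50, and all sizes are assumptions chosen for illustration; here $\mu$ and $\sigma$ are taken to be the per-dimension mean and standard deviation vectors.

```python
import numpy as np


def outlier_severity(acts):
    """gamma = ||mu|| / ||sigma|| with per-dimension mean and std (assumed reading)."""
    mu = acts.mean(axis=0)
    sigma = acts.std(axis=0)
    return float(np.linalg.norm(mu) / np.linalg.norm(sigma))


def dead_fraction_relu(acts, W_enc, b_enc=0.0):
    """Fraction of features whose pre-activation is never positive on this batch."""
    pre = acts @ W_enc + b_enc            # shape: (n_tokens, n_features)
    fires = (pre > 0).any(axis=0)         # True if the feature fires on any token
    return float(np.mean(~fires))


rng = np.random.default_rng(0)
n_tokens, d_model, n_features = 2048, 64, 512

# Synthetic activations: unit-variance noise plus one outlier dimension
# whose mean is large relative to its per-token variation (illustrative value).
acts = rng.standard_normal((n_tokens, d_model))
acts[:, 0] += 50.0

# Assumed initialization: random unit-norm encoder directions, zero bias.
W_enc = rng.standard_normal((d_model, n_features))
W_enc /= np.linalg.norm(W_enc, axis=0, keepdims=True)

# Mean-centering: subtract the per-dimension activation mean.
mu = acts.mean(axis=0)
centered = acts - mu

gamma_raw = outlier_severity(acts)
gamma_centered = outlier_severity(centered)
dead_raw = dead_fraction_relu(acts, W_enc)
dead_centered = dead_fraction_relu(centered, W_enc)

# Alignment of each feature direction with the activation mean.
alignment = mu @ W_enc
dead_mask = ~((acts @ W_enc) > 0).any(axis=0)

print(f"gamma raw={gamma_raw:.2f}, centered={gamma_centered:.2e}")
print(f"dead fraction raw={dead_raw:.2f}, centered={dead_centered:.2f}")
print("all dead features anti-aligned with mean:",
      bool((alignment[dead_mask] < 0).all()))
```

In this toy setting every dead feature has negative alignment with the mean, since a feature whose pre-activations are all non-positive must also have a non-positive average pre-activation, and that average equals its dot product with $\mu$. After centering, each feature's pre-activations average to zero over the batch, so none can be negative on every token.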


Source: arXiv:2605.31518v1 (http://arxiv.org/abs/2605.31518v1)
PDF: https://arxiv.org/pdf/2605.31518v1
