Back to Explorer
Research PaperResearchia:202603.03044[Artificial Intelligence > AI]

Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification

Amir Asiaee

Abstract

Neural networks are hypothesized to implement interpretable causal mechanisms, yet verifying this requires finding a causal abstraction -- a simpler, high-level Structural Causal Model (SCM) faithful to the network under interventions. Discovering such abstractions is hard: it typically demands brute-force interchange interventions or retraining. We reframe the problem by viewing structured pruning as a search over approximate abstractions. Treating a trained network as a deterministic SCM, we derive an Interventional Risk objective whose second-order expansion yields closed-form criteria for replacing units with constants or folding them into neighbors. Under uniform curvature, our score reduces to activation variance, recovering variance-based pruning as a special case while clarifying when it fails. The resulting procedure efficiently extracts sparse, intervention-faithful abstractions from pretrained networks, which we validate via interchange interventions.


Source: arXiv:2602.24266v1 - http://arxiv.org/abs/2602.24266v1 PDF: https://arxiv.org/pdf/2602.24266v1 Original Link: http://arxiv.org/abs/2602.24266v1

Submission:3/3/2026
Comments:0 comments
Subjects:AI; Artificial Intelligence
Original Source:
View Original PDF
arXiv: This paper is hosted on arXiv, an open-access repository
Was this helpful?

Discussion (0)

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification | Researchia | Researchia