Research Paper - Researchia:202602.25002

Test-Time Training with KV Binding Is Secretly Linear Attention

Junchen Liu

Submitted: February 25, 2026
Subjects: Machine Learning; Data Science

Description / Details

Test-time training (TTT) with KV binding as a sequence modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping at test time. However, our analysis reveals multiple phenomena that contradict this memorization-based interpretation. Motivated by these findings, we revisit the formulation of TTT and show that a broad class of TTT architectures can be expressed as a learned linear attention operator. Beyond explaining previously puzzling model behaviors, this perspective yields multiple practical benefits: it enables principled architectural simplifications, admits fully parallel formulations that preserve performance while improving efficiency, and provides a systematic reduction of diverse TTT variants to a standard linear attention form. Overall, our results reframe TTT not as test-time memorization, but as learned linear attention with enhanced representational capacity.
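The simplest instance of this equivalence can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's construction: it assumes a linear inner model (a single fast-weight matrix initialised at zero), a dot-product binding loss L(W) = -<W k_t, v_t>, and one plain gradient step per token. Under those assumptions the online update and unnormalized causal linear attention give the same outputs. All function names and parameters are hypothetical, and the broader class of architectures covered by the abstract is not reproduced here.

```python
import random


def ttt_linear_outputs(keys, values, queries, lr=1.0):
    """Online view (assumed setup): update a fast weight W, then read it with q_t.

    W has shape (d_v, d_k) and starts at zero. For the assumed loss
    L(W) = -<W k_t, v_t>, the gradient is -v_t k_t^T, so one gradient
    descent step adds lr * v_t k_t^T to W.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for k, v, q in zip(keys, values, queries):
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += lr * v[i] * k[j]
        # Read out with the current query: o_t = W_t q_t
        outputs.append([sum(W[i][j] * q[j] for j in range(d_k)) for i in range(d_v)])
    return outputs


def causal_linear_attention(keys, values, queries, lr=1.0):
    """Attention view: o_t = lr * sum_{i <= t} (k_i . q_t) v_i, with no softmax."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        out = [0.0] * d_v
        for k, v in zip(keys[: t + 1], values[: t + 1]):
            score = sum(a * b for a, b in zip(k, q))
            for i in range(d_v):
                out[i] += lr * score * v[i]
        outputs.append(out)
    return outputs


def max_abs_diff(a, b):
    """Largest element-wise gap between two lists of vectors."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def demo(seed=0, seq_len=6, d_k=4, d_v=3, lr=0.5):
    """Compare both views on random inputs and return the largest gap."""
    rng = random.Random(seed)

    def rand_vec(n):
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    keys = [rand_vec(d_k) for _ in range(seq_len)]
    values = [rand_vec(d_v) for _ in range(seq_len)]
    queries = [rand_vec(d_k) for _ in range(seq_len)]
    online = ttt_linear_outputs(keys, values, queries, lr=lr)
    attention = causal_linear_attention(keys, values, queries, lr=lr)
    return max_abs_diff(online, attention)
```

Calling `demo()` returns the largest gap between the two views, which should differ from zero only by floating-point rounding. A squared-error loss in place of the dot-product loss would add a correction term to each update, so this sketch covers only the most basic case.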


Source: arXiv:2602.21204v1 - http://arxiv.org/abs/2602.21204v1
PDF: https://arxiv.org/pdf/2602.21204v1
