Research Paper (Researchia:202605.26005)

Language Models Need Sleep

Sangyun Lee

Abstract

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. At inference time, this shifts the extra computation into sleep while preserving wake-time prediction latency. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as on a realistic math reasoning task, on which both a standard transformer and SSM-attention hybrid models fail. We further show that increasing the sleep duration $N$ improves our models' performance, with the largest gains on examples that require deeper reasoning.

Submitted: May 26, 2026 · Subjects: AI; Artificial Intelligence
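The sleep/wake cycle in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's method: a single matrix `W` stands in for the SSM fast weights, and a fixed delta rule, W ← W + β(v − Wk)kᵀ, is swapped in for the paper's learned local rule. The class `SleepyMemory`, the write strength `beta`, and the toy keys and values are all hypothetical. Wake steps only append (key, value) pairs to the cache; `sleep(n_passes)` replays the cached context N times, writing it into `W`, and then clears the cache.

```python
class SleepyMemory:
    """Toy fast-weight memory consolidated from a KV cache during 'sleep'.

    Hypothetical sketch: a fixed delta rule stands in for the paper's learned
    local rule, and a single d x d matrix W stands in for the SSM fast weights.
    """

    def __init__(self, dim: int, beta: float = 0.5) -> None:
        self.dim = dim
        self.beta = beta  # hypothetical write strength of the delta rule
        self.W = [[0.0] * dim for _ in range(dim)]  # persistent fast weights
        self.cache: list[tuple[list[float], list[float]]] = []  # wake-time KV cache

    def _apply(self, x: list[float]) -> list[float]:
        # Matrix-vector product W @ x.
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.W]

    def wake_step(self, key: list[float], value: list[float]) -> None:
        # Wake: append to the KV cache only; W is not updated, so each
        # wake step stays cheap.
        self.cache.append((key, value))

    def read(self, query: list[float]) -> list[float]:
        # Fast-weight readout only. A real model would also attend over the
        # current cache; that path is omitted here for brevity.
        return self._apply(query)

    def sleep(self, n_passes: int) -> None:
        # Sleep: n_passes offline sweeps over the cached context, each
        # applying W <- W + beta * (v - W k) k^T, then clear the cache.
        for _ in range(n_passes):
            for k, v in self.cache:
                err = [vi - pi for vi, pi in zip(v, self._apply(k))]
                for i in range(self.dim):
                    for j in range(self.dim):
                        self.W[i][j] += self.beta * err[i] * k[j]
        self.cache.clear()


def recall_error(n_passes: int) -> float:
    # Store three key/value pairs during wake, sleep, then measure how well
    # the fast weights alone reproduce each value from its key.
    mem = SleepyMemory(dim=3, beta=0.5)
    keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    values = [[2.0, 0.0, 0.0], [0.0, 0.0, 4.0], [1.0, 1.0, 0.0]]
    for k, v in zip(keys, values):
        mem.wake_step(k, v)
    mem.sleep(n_passes)
    assert mem.cache == []  # the KV cache is cleared after sleep
    return max(abs(r - t) for k, v in zip(keys, values)
               for r, t in zip(mem.read(k), v))


for n in (1, 2, 3):
    print(n, recall_error(n))
```

With orthonormal keys and β = 0.5, each pass halves the remaining recall error, so after N passes the worst-case error is 0.5^N times the largest value entry. In this toy, a longer sleep therefore yields more faithful consolidation, loosely echoing the abstract's observation about sleep duration $N$. Nothing here reproduces the paper's models, learned rule, or results.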


Source: arXiv:2605.26099v1 (http://arxiv.org/abs/2605.26099v1); PDF: https://arxiv.org/pdf/2605.26099v1
