
A Separable Architecture for Continuous Token Representation in Language Models

Reza T. Batley

Abstract

Transformer scaling-law analyses typically treat parameters as interchangeable, an abstraction that accurately predicts loss-compute relationships. Yet in sub-billion-parameter small language models (SLMs), embedding matrices dominate the parameter budget. This work argues that this allocation is as suboptimal as it is counterintuitive. Leviathan is an architecture that replaces the discrete embedding lookup tables of canonical models with a continuous embedding generator. Evaluated on the Pile dataset under isoparametric settings, Leviathan consistently outperforms a standard LLaMA-style architecture. An empirical power-law fit shows that Leviathan has a markedly higher effective parameter capacity: across the regime studied, it behaves as a dense model with 1.47× to 2.11× more parameters.
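The abstract does not specify how Leviathan's continuous embedding generator is built, so the following is only a minimal sketch of the general idea: each token keeps a compact learned code, and a shared network expands it to the model dimension, so the per-token cost no longer scales with d_model. The class name ContinuousEmbedding, the code width d_code, and the two-layer generator are hypothetical choices for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class ContinuousEmbedding(nn.Module):
    """Hypothetical continuous embedding generator (illustrative only).

    A standard lookup table stores V x d_model parameters. Here each
    token owns only a small d_code vector, and a shared MLP generates
    the full d_model embedding, so per-token storage shrinks.
    """

    def __init__(self, vocab_size: int, d_model: int, d_code: int = 64):
        super().__init__()
        self.codes = nn.Embedding(vocab_size, d_code)   # compact per-token codes
        self.generator = nn.Sequential(                 # shared expansion network
            nn.Linear(d_code, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> embeddings: (batch, seq_len, d_model)
        return self.generator(self.codes(token_ids))

# Rough parameter comparison for a typical SLM configuration.
V, d = 32_000, 768
table_params = V * d                                    # standard lookup table
emb = ContinuousEmbedding(V, d)
gen_params = sum(p.numel() for p in emb.parameters())
print(f"lookup table: {table_params:,}  continuous: {gen_params:,}")
```

For these illustrative sizes the generator needs roughly 2.7M parameters against 24.6M for the table, which is the kind of reallocation the abstract argues matters in the sub-billion-parameter regime.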
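The "effective parameter capacity" claim can be made concrete: fit a power law to the baseline family's loss-vs-size curve, then invert it to ask what dense size would reach Leviathan's loss. The abstract gives neither the functional form nor the coefficients, so the constants below are purely illustrative; only the 1.47× to 2.11× multiplier range comes from the paper.

```python
# Hypothetical baseline scaling law L(N) = a * N**(-b); the fitted form and
# coefficients are not given in the abstract, so these values are made up.
a, b = 12.0, 0.08

def effective_params(loss: float) -> float:
    # Invert the baseline fit: the dense-model size that would reach `loss`.
    return (loss / a) ** (-1.0 / b)

N = 100e6                                  # hypothetical model size
baseline_loss = a * N ** (-b)              # baseline loss at N parameters
leviathan_loss = 0.97 * baseline_loss      # illustrative ~3% lower loss
print(f"capacity multiplier: {effective_params(leviathan_loss) / N:.2f}x")
```

With these toy numbers a ~3% loss reduction inverts to a ~1.46× effective-capacity multiplier, showing how modest loss gains translate into the large multipliers the abstract reports.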


Source: arXiv:2601.22040v1 - http://arxiv.org/abs/2601.22040v1
PDF: https://arxiv.org/pdf/2601.22040v1

Submission: 1/29/2026
Subjects: Artificial Intelligence
