Research Paper (Researchia:202601.29028)

A Separable Architecture for Continuous Token Representation in Language Models

Reza T. Batley

Submitted: January 29, 2026
Subjects: Artificial Intelligence

Abstract

Transformer scaling-law analyses typically treat parameters as interchangeable, an abstraction that accurately predicts loss-compute relationships. Yet in sub-billion-parameter small language models (SLMs), embedding matrices dominate the parameter budget. This work argues that this allocation is as suboptimal as it is counterintuitive. Leviathan is an architecture that replaces the discrete embedding lookup tables of canonical models with a continuous embedding generator. Evaluated on the Pile dataset under isoparametric (equal-parameter) settings, Leviathan consistently outperforms a standard LLaMA-style architecture. An empirical power-law fit shows that Leviathan has a markedly higher effective parameter capacity: across the regime studied, it behaves like a dense model with 1.47 to 2.11× more parameters.
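The effective-capacity claim rests on fitting a power law to dense-model loss versus parameter count and then inverting it. The sketch below illustrates that general technique under stated assumptions: it assumes a pure power law L(N) = a·N^(-b) with no irreducible-loss term, fits it by least squares in log-log space, and then asks what dense parameter count would reproduce a given model's loss. The functions `fit_power_law` and `effective_params`, and every number in the usage block, are hypothetical and not taken from the paper.

```python
import math


def fit_power_law(params, losses):
    """Fit L(N) = a * N**(-b) by ordinary least squares in log-log space.

    Assumption: a pure power law with no irreducible-loss offset. The form the
    paper actually fits may differ.
    """
    if len(params) != len(losses) or len(params) < 2:
        raise ValueError("need at least two (params, loss) pairs of equal length")
    xs = [math.log(n) for n in params]
    ys = [math.log(l) for l in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # log L = log a + slope * log N, so slope = -b
    intercept = mean_y - slope * mean_x  # intercept = log a
    return math.exp(intercept), -slope


def effective_params(loss, a, b):
    """Invert the dense fit: the N_eff for which a * N_eff**(-b) equals `loss`."""
    return (a / loss) ** (1.0 / b)


if __name__ == "__main__":
    # Hypothetical, synthetic numbers (not from the paper): dense baselines
    # that follow L = 20 * N**(-0.1) exactly.
    dense_n = [50e6, 100e6, 200e6, 400e6]
    dense_loss = [20.0 * n ** -0.1 for n in dense_n]
    a, b = fit_power_law(dense_n, dense_loss)

    # A hypothetical 100M-parameter alternative model whose loss equals what
    # the dense fit predicts for a 150M-parameter dense model.
    alt_n = 100e6
    alt_loss = 20.0 * 150e6 ** -0.1
    n_eff = effective_params(alt_loss, a, b)
    print(f"a={a:.3f}, b={b:.4f}, effective multiplier={n_eff / alt_n:.2f}x")
```

The ratio N_eff / N is the "behaves like a dense model with k× more parameters" quantity. If the fit includes an irreducible-loss offset c, so that L(N) = a·N^(-b) + c, the inversion becomes N_eff = (a / (L - c))^(1/b) instead.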


Source: arXiv:2601.22040v1 (http://arxiv.org/abs/2601.22040v1)
PDF: https://arxiv.org/pdf/2601.22040v1
