Research Paper · Researchia:202605.14034

A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning

Jason Gaitonde

Abstract

We introduce a family of synthetic languages with hierarchical structure -- generated by a broadcast process on trees -- for which the role of context length and reasoning in autoregressive generation can be analyzed precisely. At the heart of our analytic approach is an \emph{exact $k$-gram ansatz} in place of transformers with context length $k$, a substitution we then validate empirically. Using this ansatz we derive explicit asymptotic predictions for distributional statistics of the sequences produced by a trained model, instantiated in two settings. For the \emph{Ising broadcast process} (a soft-constrained language), we prove that the variance of the generated sum scales log-linearly in the context depth and its kurtosis converges to that of a Gaussian -- both deviating from the true language for any sublinear context. For the \emph{coloring broadcast process} (a hard-constrained language) in the freezing regime, bounded-context autoregression produces sequences that, with high probability, are inconsistent with \emph{any} valid coloring of the underlying tree. Together these results imply an $\Omega(n)$ lower bound on the context length required to faithfully sample length-$n$ sequences. In contrast, we prove that an autoregressive \emph{reasoning} model with only $\Theta(\log n)$ working memory can sample exactly from the true language -- an exponential improvement. We confirm both the lower-bound predictions and the reasoning-based upper bound empirically with transformers trained on the synthetic language; the trained models track our asymptotic predictions quantitatively across a wide range of context sizes.

Submitted: May 14, 2026
Subjects: Statistics; Data Science
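To make the construction concrete, here is a minimal Python sketch of an Ising-style broadcast process on a complete binary tree. This is an illustrative assumption, not the paper's exact construction: the tree arity, flip probability `eps`, and leaf ordering are all placeholders. The first generator expands the tree level by level (frontier memory grows to the full leaf count $n$); the second uses a depth-first stack whose size stays $O(\text{depth}) = O(\log n)$, loosely illustrating why logarithmic working memory can suffice for exact sampling.

```python
import random

def ising_broadcast(depth, eps=0.1, rng=None):
    """Leaf spins of a complete binary tree, generated level by level.

    The root spin is uniform in {-1, +1}; each child copies its parent
    with probability 1 - eps and flips with probability eps. Returns the
    2**depth leaf spins; the frontier held in memory grows to size n.
    """
    rng = rng or random.Random()
    level = [rng.choice([-1, 1])]  # current level, initially just the root
    for _ in range(depth):
        nxt = []
        for spin in level:
            for _ in range(2):  # two children per node
                nxt.append(spin if rng.random() > eps else -spin)
        level = nxt
    return level

def ising_broadcast_dfs(depth, eps=0.1, rng=None):
    """Same leaf-spin distribution, generated depth-first.

    The explicit stack holds at most O(depth) (spin, remaining-depth)
    pairs at any time, so the working memory for generation is
    logarithmic in the sequence length. Children are visited
    right-to-left, which is harmless since the process is symmetric.
    """
    rng = rng or random.Random()
    leaves = []
    stack = [(rng.choice([-1, 1]), depth)]
    while stack:
        spin, d = stack.pop()
        if d == 0:
            leaves.append(spin)  # emit the leaf; only the stack is "memory"
            continue
        for _ in range(2):
            child = spin if rng.random() > eps else -spin
            stack.append((child, d - 1))
    return leaves

# Both produce a length-2**depth sequence of +/-1 spins.
seq = ising_broadcast(depth=4, eps=0.1, rng=random.Random(0))
print(len(seq), all(s in (-1, 1) for s in seq))
```

The sum of the leaf spins is the kind of distributional statistic the abstract's variance and kurtosis predictions concern; comparing it between the true process and a bounded-context model is the natural experiment this sketch supports.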


Source: arXiv:2605.13687v1 - http://arxiv.org/abs/2605.13687v1
PDF: https://arxiv.org/pdf/2605.13687v1
Original Link: http://arxiv.org/abs/2605.13687v1


Submission Info
Date: May 14, 2026
Topic: Data Science
Area: Statistics