Research Paper | Researchia 202601.29156 | Numerical Analysis > Mathematics

LAMP: Look-Ahead Mixed-Precision Inference of Large Language Models

Stanislav Budzinskiy

Abstract

Mixed-precision computations are a hallmark of the current stage of AI, driving the progress in large language models towards efficient, locally deployable solutions. This article addresses the floating-point computation of compositionally-rich functions, concentrating on transformer inference. Based on the rounding error analysis of a composition f(g(x)), we provide an adaptive strategy that selects a small subset of components of g(x) to be computed more accurately, while all other computations can be carried out with lower accuracy. We then explain how this strategy can be applied to different compositions within a transformer and illustrate its overall effect on transformer inference. We study the effectiveness of this algorithm numerically on GPT-2 models and demonstrate that even very low recomputation rates yield accuracy improvements of up to two orders of magnitude.
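The core idea in the abstract can be sketched in NumPy: evaluate a composition f(g(x)) by first computing g in low precision, then recomputing only a small fraction of its components in higher precision before applying f. This is a hypothetical illustration, not the paper's algorithm: the function name `mixed_precision_compose`, the parameter `recompute_frac`, and the largest-magnitude selection heuristic are all assumptions; the paper derives its actual selection criterion from a rounding error analysis of the composition.

```python
import numpy as np

def mixed_precision_compose(f, g, x, recompute_frac=0.05):
    """Sketch of adaptive mixed-precision evaluation of f(g(x)).

    All names and heuristics here are illustrative assumptions; the
    paper's selection rule comes from rounding error analysis.
    """
    # Low-precision pass: compute every component of g(x) in float16.
    g_lo = g(x.astype(np.float16)).astype(np.float16)

    # Pick a small subset of components to recompute more accurately.
    # Heuristic (an assumption): the components of largest magnitude.
    k = max(1, int(recompute_frac * g_lo.size))
    idx = np.argsort(np.abs(g_lo.astype(np.float64)))[-k:]

    # High-precision recomputation; for simplicity this sketch evaluates
    # g fully in float64, but only the selected components are kept.
    g_hi = g(x.astype(np.float64))
    g_mixed = g_lo.astype(np.float64)
    g_mixed[idx] = g_hi[idx]

    # Apply the outer function to the mixed-precision intermediate.
    return f(g_mixed)

# Illustrative composition where a few large components dominate f:
x = np.linspace(-1.0, 1.0, 1000)
g = lambda v: np.exp(4 * v)           # components of widely varying size
f = lambda v: float(np.log(np.sum(v)))  # sensitive to the largest terms
y = mixed_precision_compose(f, g, x)
```

In a real implementation one would recompute only the selected components rather than all of g, so that the high-precision cost scales with the recomputation rate instead of the full problem size.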


Source: arXiv:2601.21623v1 (http://arxiv.org/abs/2601.21623v1)
PDF: https://arxiv.org/pdf/2601.21623v1

Submission: 1/29/2026
Subjects: Mathematics; Numerical Analysis

