Research Paper | Researchia: 202605.07018

Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer

Alexander Hsu

Abstract

Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the theoretical understanding of ICL is still developing. Whereas most existing theory has focused on linear models, we study ICL in the nonlinear regression setting. Through the interaction mechanism in attention, we explicitly construct transformer networks to realize nonlinear features, such as polynomial or spline bases, which span a wide class of functions. Based on this construction, we establish a framework to analyze end-to-end in-context nonlinear regression with the constructed features. Our theory provides finite-sample generalization error bounds in terms of context length and training set size. We numerically validate the theory on synthetic regression tasks.
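
The paper's construction realizes the nonlinear features inside attention layers themselves; as a rough illustration of the end-to-end pipeline the abstract describes, the sketch below uses an explicit polynomial feature map as a stand-in for the attention-realized features and fits the in-context examples by ridge-regularized least squares before predicting at the query. All function names, parameters, and the synthetic task here are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the paper's construction): in-context nonlinear regression
# as (1) a fixed polynomial feature map, standing in for features the paper
# realizes via attention, followed by (2) ridge regression on the context pairs.
import numpy as np

def poly_features(x, degree=3):
    """Monomial basis [1, x, x^2, ..., x^degree] for scalar inputs x."""
    return np.stack([x**k for k in range(degree + 1)], axis=-1)

def icl_predict(x_ctx, y_ctx, x_query, degree=3, ridge=1e-6):
    """Predict f(x_query) from in-context examples (x_ctx, y_ctx)."""
    Phi = poly_features(x_ctx, degree)                  # (n_ctx, degree + 1)
    A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])      # regularized normal equations
    w = np.linalg.solve(A, Phi.T @ y_ctx)
    return poly_features(x_query, degree) @ w

# Synthetic regression task: context drawn from an unknown cubic, query held out.
rng = np.random.default_rng(0)
x_ctx = rng.uniform(-1, 1, size=32)
y_ctx = 0.5 * x_ctx**3 - x_ctx + 0.1 * rng.normal(size=32)
x_query = np.array([0.3, -0.7])
print(icl_predict(x_ctx, y_ctx, x_query))
```

Longer context (larger `n_ctx`) makes the featurized least-squares fit more accurate, which mirrors, in spirit, the dependence of the paper's generalization bounds on context length.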

Submitted: May 7, 2026
Subjects: Machine Learning; Data Science


Source: arXiv:2605.05176v1 (http://arxiv.org/abs/2605.05176v1)
PDF: https://arxiv.org/pdf/2605.05176v1
Original Link: http://arxiv.org/abs/2605.05176v1

