
Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

Yulu Gan

Abstract

Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in small models such expert solutions occupy a negligible fraction of the volume of this distribution, making their discovery reliant on structured optimization methods such as gradient descent. In contrast, in large, well-pretrained models the density of task experts increases dramatically, so that diverse, task-improving specialists populate a substantial fraction of the neighborhood around the pretrained weights. Motivated by this perspective, we explore a simple, fully parallel post-training method that samples N parameter perturbations at random, selects the top K, and ensembles predictions via majority vote. Despite its simplicity, this approach is competitive with standard post-training methods such as PPO, GRPO, and ES for contemporary large-scale models.
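To make the sample-select-vote procedure concrete, here is a minimal NumPy sketch of the method the abstract describes. The Gaussian perturbation scale sigma and the eval_fn/predict_fn hooks are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sample_and_vote(base_params, eval_fn, predict_fn, inputs,
                    n_samples=100, top_k=10, sigma=0.01, seed=0):
    """Sample N random perturbations of the pretrained weights,
    keep the top K by task score, and majority-vote their predictions.
    eval_fn, predict_fn, and the noise model are illustrative choices."""
    rng = np.random.default_rng(seed)

    # 1. Draw N candidate parameter vectors around the pretrained weights.
    candidates = [base_params + sigma * rng.standard_normal(base_params.shape)
                  for _ in range(n_samples)]

    # 2. Score every candidate on the task (each score is independent,
    #    so this loop is embarrassingly parallel in practice).
    scores = np.array([eval_fn(p) for p in candidates])

    # 3. Keep the top-K scoring candidates as the expert ensemble.
    experts = [candidates[i] for i in np.argsort(scores)[-top_k:]]

    # 4. Majority vote over the experts' per-input predictions
    #    (assumes predict_fn returns non-negative integer class labels).
    preds = np.stack([predict_fn(p, inputs) for p in experts])  # shape (K, M)
    return np.array([np.bincount(preds[:, j]).argmax()
                     for j in range(preds.shape[1])])
```

Because each of the N candidates is generated and scored independently, no gradient computation is needed and the search parallelizes trivially across workers, which is what makes the method "fully parallel" in the abstract's sense.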


Source: arXiv:2603.12228v1 (http://arxiv.org/abs/2603.12228v1)
PDF: https://arxiv.org/pdf/2603.12228v1

Submission: 3/13/2026
Subjects: Artificial Intelligence (AI)
