Research Paper | Researchia:202601.26001 | Technology > Research

A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs

Dayal Singh Kalra

Abstract

Understanding the curvature evolution of the loss landscape is fundamental to analyzing the training dynamics of neural networks. The most commonly studied measure, Hessian sharpness ($\lambda_{\max}^H$) -- the largest eigenvalue of the loss Hessian -- determines local training stability and interacts with the learning rate throughout training. Despite its significance in analyzing training dynamics, directly measuring Hessian sharpness remains prohibitively expensive for Large Language Models (LLMs). We analyze critical sharpness ($\lambda_c$), a computationally efficient measure requiring fewer than 10 forward passes given the update direction $\Delta\theta$. Critically, this measure captures well-documented Hessian sharpness phenomena, including progressive sharpening and the Edge of Stability. Using this measure, we provide the first demonstration of these sharpness phenomena at scale, up to 7B parameters, spanning both pre-training and mid-training of OLMo-2 models. We further introduce relative critical sharpness ($\lambda_c^{1\to 2}$), which quantifies the curvature of one loss landscape while optimizing another, to analyze the transition from pre-training to fine-tuning and to guide data mixing strategies. Critical sharpness provides practitioners with a practical tool for diagnosing curvature dynamics and informing data composition choices at scale. More broadly, our work shows that scalable curvature measures can provide actionable insights for large-scale training.
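The abstract notes that critical sharpness needs only a handful of forward passes given an update direction $\Delta\theta$. Below is a minimal sketch of one way such a forward-pass-only estimate could work; the paper's exact protocol is not given in the abstract, so the definition assumed here -- $\lambda_c = 2/\eta_c$, where $\eta_c$ is the largest step size along $\Delta\theta$ that does not increase the loss, found by doubling then bisection -- and the function names are illustrative assumptions.

```python
import numpy as np

def critical_sharpness(loss_fn, theta, delta, eta0=1.0, n_bisect=8):
    """Estimate critical sharpness lambda_c along an update direction
    using forward passes only (a sketch; the paper's exact protocol
    may differ).

    Assumed definition: lambda_c = 2 / eta_c, where eta_c is the
    largest step size eta with loss(theta + eta * delta) <= loss(theta).
    eta_c is bracketed by doubling, then refined by bisection.
    """
    base = loss_fn(theta)
    lo, hi = 0.0, eta0
    # Bracket: double hi until the loss along delta exceeds the base loss.
    while loss_fn(theta + hi * delta) <= base:
        lo, hi = hi, 2.0 * hi
    # Bisect: shrink [lo, hi] around eta_c with a fixed forward-pass budget.
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if loss_fn(theta + mid * delta) <= base:
            lo = mid
        else:
            hi = mid
    return 2.0 / hi

# Toy check on a 1-D quadratic: loss = 0.5 * h * theta**2 has Hessian h,
# and stepping along the negative gradient recovers lambda_c close to h.
h = 4.0
theta = np.array([1.0])
grad = h * theta
print(critical_sharpness(lambda th: 0.5 * h * th[0]**2, theta, -grad))  # ≈ 4.0
```

Under the same assumptions, the relative variant $\lambda_c^{1\to 2}$ would correspond to passing a `loss_fn` evaluated on one data distribution while `delta` is the update direction produced by optimizing another.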

Submission: 1/26/2026
Subjects: Research; Technology
