Research Paper | Researchia:202602.10038

ARO: A New Lens On Matrix Optimization For Large Models

Wenbo Gong


Submitted: February 10, 2026
Subjects: AI; Artificial Intelligence

Description / Details

Matrix-based optimizers have attracted growing interest for improving LLM training efficiency, with significant progress centered on orthogonalization/whitening-based methods. While these methods yield substantial performance gains, a fundamental question arises: can we develop new paradigms beyond orthogonalization, pushing the efficiency frontier further? We present Adaptively Rotated Optimization (ARO), a new matrix optimization framework that treats gradient rotation as a first-class design principle. ARO accelerates LLM training by performing normed steepest descent in a rotated coordinate system, where the rotation is determined by a novel norm-informed policy. This perspective yields update rules that go beyond existing orthogonalization and whitening optimizers, improving sample efficiency in practice. To make comparisons reliable, we propose a rigorously controlled benchmarking protocol that reduces confounding and bias. Under this protocol, ARO consistently outperforms AdamW (by 1.3× to 1.35×) and orthogonalization methods (by 1.1× to 1.15×) in LLM pretraining at up to 8B activated parameters and up to an 8× overtrain budget, without evidence of diminishing returns. Finally, we discuss how ARO can be reformulated as a symmetry-aware optimizer grounded in rotational symmetries of residual streams, motivating advanced designs that enable computationally efficient exploitation of cross-layer/cross-module couplings.
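The phrase "normed steepest descent in a rotated coordinate system" can be made concrete with a toy sketch. The code below is an illustration only and is not the ARO algorithm: it assumes the elementwise max-norm as the base norm (so the steepest-descent direction is a sign step), it uses a hand-picked 2D rotation angle in place of the paper's norm-informed policy (which the abstract does not specify), and all function names are hypothetical. It computes the direction d = R sign(Rᵀ G) for a gradient G and an orthogonal matrix R.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def rotation_2d(theta):
    """2x2 rotation matrix; stands in for whatever rotation a policy would pick."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rotated_sign_step(grad, rot):
    """Sign-descent direction computed in the coordinates defined by rot.

    Assumption: the base norm is the elementwise max-norm, so steepest
    descent in the rotated frame is sign(R^T G); the result is mapped
    back to the original frame by multiplying with R.
    """
    rotated_grad = matmul(transpose(rot), grad)
    direction = [[float(sign(v)) for v in row] for row in rotated_grad]
    return matmul(rot, direction)


def inner(a, b):
    """Frobenius inner product of two equally shaped matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


grad = [[3.0, -1.0, 0.5], [1.0, 2.0, -4.0]]  # made-up 2x3 gradient

# With the identity rotation the step reduces to ordinary sign descent.
plain = rotated_sign_step(grad, rotation_2d(0.0))

# With a non-trivial rotation (angle chosen arbitrarily here) the step differs.
rot = rotation_2d(math.pi / 6)
rotated = rotated_sign_step(grad, rot)

# A weight update would then be W <- W - lr * direction.
```

Under these assumptions the inner product of G with the returned direction equals the sum of absolute entries of Rᵀ G, which is the property that makes it a steepest-descent direction for the max-norm in the rotated frame. How the rotation should be chosen is exactly the part the paper contributes and this sketch leaves open.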


Source: arXiv:2602.09006v1 (http://arxiv.org/abs/2602.09006v1)
PDF: https://arxiv.org/pdf/2602.09006v1
