Research Paper (Researchia:202601.29041)

GeoNorm: Unify Pre-Norm and Post-Norm with Geodesic Optimization

Chuanyang Zheng

Submitted: January 29, 2026
Subjects: Machine Learning

Description / Details

The placement of normalization layers, specifically Pre-Norm and Post-Norm, remains an open question in Transformer architecture design. In this work, we rethink these approaches through the lens of manifold optimization, interpreting the outputs of the Feed-Forward Network (FFN) and attention layers as update directions in optimization. Building on this perspective, we introduce GeoNorm, a novel method that replaces standard normalization with geodesic updates on the manifold. Furthermore, analogous to learning rate schedules, we propose a layer-wise update decay for the FFN and attention components. Comprehensive experiments demonstrate that GeoNorm consistently outperforms existing normalization methods in Transformer models. Crucially, GeoNorm can be seamlessly integrated into standard Transformer architectures, achieving performance improvements with negligible additional computational cost.
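The abstract does not state the actual update rule, so the following is only a minimal sketch of what a geodesic update can look like, under assumptions that are not taken from the paper: the manifold is taken to be a hypersphere of fixed radius, the sublayer output `u` is treated as an update direction and projected onto the tangent space, and the layer-wise decay is a hypothetical inverse-square-root schedule. The function names `geodesic_step` and `decayed_step` are illustrative only.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def project_to_sphere(x, radius):
    """Rescale x so that it lies on the sphere of the given radius."""
    n = norm(x)
    return [radius * a / n for a in x]


def geodesic_step(x, u, step, radius):
    """Move x along a great circle of the sphere in the direction of u.

    Assumption (not from the paper): x already lies on the sphere, and u
    plays the role of the attention or FFN output.
    """
    # Keep only the component of u that is tangent to the sphere at x.
    coeff = sum(a * b for a, b in zip(x, u)) / (radius * radius)
    v = [b - coeff * a for a, b in zip(x, u)]
    v_norm = norm(v)
    if v_norm < 1e-12:
        return list(x)  # no tangent direction, so nothing to move along
    # Exponential map on the sphere: rotate x toward v by angle theta.
    theta = step * v_norm / radius
    return [
        math.cos(theta) * a + math.sin(theta) * radius * b / v_norm
        for a, b in zip(x, v)
    ]


def decayed_step(base_step, layer_index):
    """Hypothetical layer-wise decay: shrink the step with depth."""
    return base_step / math.sqrt(layer_index + 1)


# Usage: one update at layer index 3 on a sphere of radius 2.
x = project_to_sphere([1.0, 2.0, 3.0, 4.0], 2.0)
u = [0.5, -1.0, 0.25, 2.0]
y = geodesic_step(x, u, decayed_step(0.1, 3), 2.0)
```

Because the step follows the great circle, `y` stays on the sphere without a separate normalization call, which is one way to read "replaces standard normalization with geodesic updates". The paper itself should be consulted for the manifold and schedule it actually uses.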


Source: arXiv:2601.22095v1, http://arxiv.org/abs/2601.22095v1
PDF: https://arxiv.org/pdf/2601.22095v1
