ExplorerMathematicsMathematics
Research PaperResearchia:202606.19027

On Second-Order Methods for Bilevel Optimization

Jiawen Bi

Abstract

Bilevel optimization is an indispensable modeling tool for modern machine learning and engineering design. However, the theory and practice for finding second order stationary points in the context of bilevel optimization still remain largely unsettled. Even for bilevel optimization with strongly convex lower-level problem, the hyperfunction it induces is in general nonconvex. Although the Cubic Regularized Newton methods (CRN) famously achieve the optimal $\mathcal{O}(\varepsilon^{-1.5})$ SOSP ...

Submitted: June 19, 2026Subjects: Mathematics; Mathematics

Description / Details

Bilevel optimization is an indispensable modeling tool for modern machine learning and engineering design. However, the theory and practice for finding second order stationary points in the context of bilevel optimization still remain largely unsettled. Even for bilevel optimization with strongly convex lower-level problem, the hyperfunction it induces is in general nonconvex. Although the Cubic Regularized Newton methods (CRN) famously achieve the optimal O(ε1.5)\mathcal{O}(\varepsilon^{-1.5}) SOSP (second-order stationary point) rate in single-level optimization, it is unclear how to control the accuracy of the hypergradient and hyper-Hessian computations in the context of applying the second-order methods to bilevel problems in order for the overall process to be efficient. In this paper, we set out to answer this question. In particular, we first formulate a double loop CRN baseline that achieves the optimal outer rate but requires repeated lower level solves. Next, we propose a single loop cubic regularized Newton algorithm that combines one lower-level gradient step with one Newton step for the hypergradient, and prove an overall deterministic O(ε1.5)\mathcal{O}(\varepsilon^{-1.5}) total oracle complexity, which is optimal. In addition, we illustrate that some intuitively simple modifications of our method may fail to hold up the convergence result. To the best of our knowledge, this is the first deterministic single loop method for unconstrained NCSC (non-convex upper-level and strongly convex lower-level) bilevel optimization setting that achieves the O(ε1.5)\mathcal{O}(\varepsilon^{-1.5}) optimal convergence rate for finding an ε\varepsilon-SOSP of the hyperfunction.


Source: arXiv:2606.20534v1 - http://arxiv.org/abs/2606.20534v1 PDF: https://arxiv.org/pdf/2606.20534v1 Original Link: http://arxiv.org/abs/2606.20534v1

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Access Paper
View Source PDF
Submission Info
Date:
Jun 19, 2026
Topic:
Mathematics
Area:
Mathematics
Comments:
0
Bookmark
On Second-Order Methods for Bilevel Optimization | Researchia