Operator Splitting, Policy Iteration, and Machine Learning for Stochastic Optimal Control
Abstract
We propose a splitting approach to solve the second-order Hamilton--Jacobi equation, reducing it to a heat step and a purely first-order step. The latter is implemented using a gradient value policy iteration algorithm, enabling efficient characteristic-based machine learning methods. We establish convergence rates for the splitting method: lower and upper error bounds for Lipschitz initial data, improved rates for semiconcave data and for a further class of data, and an upper error estimate in the periodic setting expressed in terms of the splitting step. For the first-order step, we provide a weighted error analysis that shows exponential convergence. Each iteration solves linear characteristic equations and learns the value function by minimizing a weighted value gradient loss. The approach yields stable and accurate numerical results.
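To make the splitting idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a model problem that the abstract does not specify: the one-dimensional periodic equation u_t + (1/2)|u_x|^2 = nu * u_xx on [0, 2*pi). The heat step is approximated by convolution with a periodized Gaussian kernel, and the first-order step uses a discrete Hopf-Lax formula on the grid in place of the policy iteration described above. The function names, the parameters `nu`, `T`, `n_steps`, and the order of the two sub-steps are all illustrative assumptions.

```python
import numpy as np

TWO_PI = 2.0 * np.pi


def heat_step(u, x, nu, h, period=TWO_PI, images=3):
    """Approximate u_t = nu * u_xx over time h on a periodic grid.

    Convolves u with a periodized Gaussian kernel whose rows are
    normalized to sum to one, so the step is an averaging operator.
    """
    diff = x[:, None] - x[None, :]
    kernel = np.zeros_like(diff)
    for m in range(-images, images + 1):
        # Sum a few periodic images of the heat kernel exp(-z^2 / (4 nu h)).
        kernel += np.exp(-((diff + m * period) ** 2) / (4.0 * nu * h))
    kernel /= kernel.sum(axis=1, keepdims=True)
    return kernel @ u


def hopf_lax_step(u, x, h, period=TWO_PI):
    """Approximate u_t + 0.5 * |u_x|^2 = 0 over time h.

    Discrete Hopf-Lax formula: minimize u(y) + d(x, y)^2 / (2 h) over
    grid points y, where d is the periodic distance. This is a stand-in
    for the first-order solver, chosen only for simplicity.
    """
    diff = np.abs(x[:, None] - x[None, :])
    dist = np.minimum(diff, period - diff)
    return np.min(u[None, :] + dist ** 2 / (2.0 * h), axis=1)


def split_solve(u0, x, nu, T, n_steps):
    """Alternate a heat step and a first-order step, n_steps times."""
    h = T / n_steps  # splitting step
    u = np.array(u0, dtype=float)
    for _ in range(n_steps):
        u = heat_step(u, x, nu, h)
        u = hopf_lax_step(u, x, h)
    return u


if __name__ == "__main__":
    x = np.linspace(0.0, TWO_PI, 64, endpoint=False)
    u = split_solve(np.cos(x), x, nu=0.1, T=0.5, n_steps=10)
    print(u.min(), u.max())
```

Both sub-steps in this sketch keep the solution between the minimum and maximum of the initial data, which gives a cheap sanity check. It says nothing about the convergence rates established in the paper.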
Source: arXiv:2603.12167v1 - http://arxiv.org/abs/2603.12167v1 PDF: https://arxiv.org/pdf/2603.12167v1