Planning in entropy-regularized Markov decision processes and games
Description / Details
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser exploits the smoothness that the regularization induces in the Bellman operator to achieve a problem-independent sample complexity of order $\tilde{O}(1/\epsilon^4)$ for a desired accuracy $\epsilon$. For non-regularized settings, by contrast, no known algorithm has guaranteed polynomial sample complexity in the worst case.
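To make the smoothness claim concrete, the sketch below contrasts the usual max backup with the standard entropy-regularized (log-sum-exp) backup at a single state. It is only an illustration of why regularization yields a smooth operator, not the SmoothCruiser algorithm itself. The function names, the temperature parameter `lam`, and the sample action values are assumptions introduced here and do not come from the paper.

```python
import math

def hard_backup(q):
    """Non-regularized backup: max over action values (not differentiable at ties)."""
    return max(q)

def soft_backup(q, lam):
    """Entropy-regularized backup: lam * log(sum_a exp(q_a / lam)).

    `lam` is a hypothetical name for the regularization temperature.
    The maximum is subtracted before exponentiating for numerical stability.
    """
    m = max(q)
    return m + lam * math.log(sum(math.exp((x - m) / lam) for x in q))

def soft_policy(q, lam):
    """Gradient of soft_backup with respect to q: the softmax distribution."""
    m = max(q)
    w = [math.exp((x - m) / lam) for x in q]
    total = sum(w)
    return [x / total for x in w]

if __name__ == "__main__":
    q = [1.0, 0.5, -0.2]  # made-up action values for one state
    for lam in (1.0, 0.1, 0.01):
        print(lam, hard_backup(q), soft_backup(q, lam), soft_policy(q, lam))
```

With K actions, the soft backup lies between max(q) and max(q) + lam * log(K), so it approaches the max backup as `lam` shrinks while staying differentiable everywhere for any fixed `lam` > 0.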
Source: arXiv:2604.19695v1 - http://arxiv.org/abs/2604.19695v1 PDF: https://arxiv.org/pdf/2604.19695v1
Apr 22, 2026
Data Science
Machine Learning