Research Paper Researchia:202604.22035

Planning in entropy-regularized Markov decision processes and games

Jean-Bastien Grill

Submitted: April 22, 2026
Subjects: Machine Learning; Data Science

Description / Details

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\tilde{O}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
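
To make the "smoothness of the Bellman operator" concrete, here is a minimal sketch of a single entropy-regularized backup. It assumes the usual Shannon-entropy regularizer, under which the hard max over actions becomes a log-sum-exp whose gradient is a softmax policy. This is not SmoothCruiser itself: the function names, the temperature `lam`, and the discount `gamma` are hypothetical choices made for illustration only.

```python
import math


def soft_max_value(q_values, lam):
    """Entropy-regularized value: lam * log(sum_a exp(q_a / lam)).

    `lam` is an assumed temperature parameter; as lam -> 0 this
    approaches the ordinary max over actions.
    """
    m = max(q_values)  # subtract the max for numerical stability
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))


def soft_policy(q_values, lam):
    """Gradient of soft_max_value with respect to q_values (a softmax)."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def regularized_backup(rewards, next_values, gamma, lam):
    """One regularized Bellman backup at a single state.

    rewards[a] is the reward for action a and next_values[a] is an
    estimate of the expected next-state value under a (both assumed
    to be supplied by the caller, e.g. from generative-model samples).
    """
    q_values = [r + gamma * v for r, v in zip(rewards, next_values)]
    return soft_max_value(q_values, lam)
```

Unlike the max, the log-sum-exp is differentiable everywhere, and the regularized value always lies between `max(q_values)` and `max(q_values) + lam * log(number of actions)`.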


Source: arXiv:2604.19695v1 - http://arxiv.org/abs/2604.19695v1
PDF: https://arxiv.org/pdf/2604.19695v1
Original Link: http://arxiv.org/abs/2604.19695v1

