Explorerβ€ΊData Scienceβ€ΊMachine Learning
Research PaperResearchia:202605.11067

Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs

Gugan Thoppe

Abstract

Reinforcement learning (RL) for exponential-utility optimization in discounted Markov decision processes (MDPs) lacks principled value-based algorithms. We address this gap in the fixed risk-aversion setting. Building on the Bellman-type equation for exponential utility studied in \cite{porteus1975optimality}, we derive two Q-value-style extensions and show that the associated operators are contractions in the $L_\infty$ and sup-log/Thompson metrics, respectively. We characterize their fixed poi...

Submitted: May 11, 2026Subjects: Machine Learning; Data Science

Description / Details

Reinforcement learning (RL) for exponential-utility optimization in discounted Markov decision processes (MDPs) lacks principled value-based algorithms. We address this gap in the fixed risk-aversion setting. Building on the Bellman-type equation for exponential utility studied in \cite{porteus1975optimality}, we derive two Q-value-style extensions and show that the associated operators are contractions in the L∞L_\infty and sup-log/Thompson metrics, respectively. We characterize their fixed points and prove that the induced greedy stationary policy is optimal for the exponential-utility objective among stationary policies. These structural results lead to two model-free algorithms: a two-timescale Q-learning--style algorithm, for which we establish almost-sure convergence and provide finite-time convergence rates via timescale separation, and a one-timescale algorithm governed by a sublinear power-law operator. Since the latter does not admit a global contraction in standard metrics, we prove its convergence using delicate arguments based on local Lipschitzness, monotonicity, homogeneity, and Dini derivatives, and provide a scalar finite-time analysis that highlights the challenges in obtaining convergence rates in the vector case. Our work provides a foundation for value-based RL under exponential-utility objectives.


Source: arXiv:2605.08053v1 - http://arxiv.org/abs/2605.08053v1 PDF: https://arxiv.org/pdf/2605.08053v1 Original Link: http://arxiv.org/abs/2605.08053v1

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Access Paper
View Source PDF
Submission Info
Date:
May 11, 2026
Topic:
Data Science
Area:
Machine Learning
Comments:
0
Bookmark
Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs | Researchia