Research Paper | Researchia:202602.11052

Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning

Akshay Mete


Submitted: February 11, 2026
Subjects: Machine Learning; Data Science

Description / Details

Efficient exploration remains a central challenge in reinforcement learning (RL), particularly in sparse-reward environments. We introduce Optimistic World Models (OWMs), a principled and scalable framework for optimistic exploration that brings classical reward-biased maximum likelihood estimation (RBMLE) from adaptive control into deep RL. In contrast to upper confidence bound (UCB)-style exploration methods, OWMs incorporate optimism directly into model learning by adding an optimistic dynamics loss that biases imagined transitions toward higher-reward outcomes. This loss is fully gradient-based and requires neither uncertainty estimates nor constrained optimization. Our approach is plug-and-play with existing world model frameworks: it preserves scalability and requires only minimal modifications to standard training procedures. We instantiate OWMs within two state-of-the-art world model architectures, yielding Optimistic DreamerV3 and Optimistic STORM, which demonstrate significant improvements in sample efficiency and cumulative return compared to their baseline counterparts.
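To make the idea of a reward-biased dynamics loss concrete, here is a minimal toy sketch in plain Python. It is not the paper's formulation: the abstract does not give the loss, so the 1-D linear model, the reward r(s') = s', the weight `alpha`, and the function name `fit_dynamics` are all assumptions chosen for illustration. The sketch fits a dynamics coefficient by gradient descent on a prediction error term minus `alpha` times the reward of the predicted next states, so a positive `alpha` pulls the learned model toward higher-reward predictions.

```python
# Toy illustration only: the model, reward, and loss form are assumptions,
# not the formulation used in the paper.

def fit_dynamics(states, next_states, alpha, lr=0.05, steps=500):
    """Fit a 1-D linear model s' ~ w * s by gradient descent.

    Assumed loss: mean squared prediction error minus alpha times the
    mean reward of the predicted next states, with the hypothetical
    reward r(s') = s'.
    """
    n = len(states)
    w = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error term with respect to w.
        grad_fit = sum(2.0 * (w * s - sn) * s
                       for s, sn in zip(states, next_states)) / n
        # Gradient of the reward term: d/dw mean(w * s) = mean(s).
        grad_reward = sum(states) / n
        # Descend on (fit loss - alpha * predicted reward).
        w -= lr * (grad_fit - alpha * grad_reward)
    return w


states = [1.0, 2.0, 3.0]
next_states = [0.5, 1.0, 1.5]  # generated by a true coefficient of 0.5

w_plain = fit_dynamics(states, next_states, alpha=0.0)
w_optimistic = fit_dynamics(states, next_states, alpha=0.7)
print(f"plain fit: {w_plain:.3f}, optimistic fit: {w_optimistic:.3f}")
```

On this data the unbiased fit recovers the coefficient 0.5, while `alpha=0.7` shifts it to about 0.65, so the model predicts higher-reward next states than the data alone supports. Note that this toy uses no uncertainty estimate and no constrained optimization: the optimism enters only through an extra differentiable loss term.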


Source: arXiv:2602.10044v1 (http://arxiv.org/abs/2602.10044v1)
PDF: https://arxiv.org/pdf/2602.10044v1
