
On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage

Haolin Liu

Abstract

We study offline reinforcement learning under $Q^\star$-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative $Q$-Learning (CQL; Kumar et al., 2020) but has received limited theoretical attention. Our work is inspired by the following open question: "Are $Q^\star$-realizability and Bellman completeness sufficient for sample-efficient offline RL under partial coverage?" We answer in the negative by establishing an information-theoretic lower bound. Going substantially beyond this, we introduce a general framework that characterizes the intrinsic complexity of a given $Q^\star$ function class, inspired by model-free decision-estimation coefficients (DEC) for online RL (Foster et al., 2023b; Liu et al., 2025b). This complexity recovers and improves the quantities underlying the guarantees of Chen and Jiang (2022) and Uehara et al. (2023), and extends to broader settings. Our decision-estimation decomposition can be combined with a wide range of $Q^\star$ estimation procedures, modularizing and generalizing existing approaches. Beyond the general framework, we make further contributions: By developing a novel second-order performance difference lemma, we obtain the first $\varepsilon^{-2}$ sample complexity under partial coverage for soft $Q$-learning, improving the $\varepsilon^{-4}$ bound of Uehara et al. (2023). We remove Chen and Jiang's (2022) need for additional online interaction when the value gap of $Q^\star$ is unknown. We also give the first characterization of offline learnability for general low-Bellman-rank MDPs without Bellman completeness (Jiang et al., 2017; Du et al., 2021; Jin et al., 2021), a canonical setting in online RL that remains unexplored in offline RL except for special cases. Finally, we provide the first analysis for CQL under $Q^\star$-realizability and Bellman completeness beyond the tabular case.
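For context, the assumptions named above are standard in the offline RL literature; a minimal sketch of their usual forms follows (the paper's precise statements may differ). Let $\mathcal{F}$ denote the given $Q^\star$ function class, $\mathcal{T}$ the Bellman optimality operator, $\mu$ the offline data distribution, and $d^{\pi^\star}$ the occupancy measure of an optimal policy $\pi^\star$:

$Q^\star$-realizability: $Q^\star \in \mathcal{F}$.

Bellman completeness: $\mathcal{T} f \in \mathcal{F}$ for all $f \in \mathcal{F}$, where $(\mathcal{T} f)(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\big[\max_{a'} f(s',a')\big]$.

Partial coverage (single-policy concentrability): $C^{\pi^\star} = \sup_{s,a} d^{\pi^\star}(s,a) / \mu(s,a) < \infty$, i.e., the data need only cover an optimal policy rather than all candidate policies.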

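The $\varepsilon^{-2}$ improvement above rests on a second-order performance difference lemma, which is not reproduced in the abstract. For orientation, the classical first-order lemma (Kakade and Langford, 2002) states that for any policies $\pi, \pi'$ in a discounted MDP,

$$J(\pi) - J(\pi') = \frac{1}{1-\gamma}\, \mathbb{E}_{s \sim d^{\pi}}\, \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[A^{\pi'}(s,a)\big],$$

where $d^{\pi}$ is the normalized discounted occupancy measure of $\pi$ and $A^{\pi'}$ the advantage function of $\pi'$. Roughly speaking, a second-order variant controls the performance gap by squared error terms rather than first-power ones, which is how quadratic savings in $\varepsilon$ typically arise.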
Submitted: February 13, 2026
Subjects: Statistics; Data Science



Source: arXiv:2602.12107v1 (http://arxiv.org/abs/2602.12107v1)
PDF: https://arxiv.org/pdf/2602.12107v1

