Research Paper · Researchia:202602.13023 · [Data Science > Statistics]

On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage

Haolin Liu

Abstract

We study offline reinforcement learning under $Q^\star$-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative $Q$-Learning (CQL; Kumar et al., 2020) but has received limited theoretical attention. Our work is inspired by the following open question: "Are $Q^\star$-realizability and Bellman completeness sufficient for sample-efficient offline RL under partial coverage?" We answer in the negative by establishing an information-theoretic lower bound. Going substantially beyond this, we introduce a general framework that characterizes the intrinsic complexity of a given $Q^\star$ function class, inspired by model-free decision-estimation coefficients (DEC) for online RL (Foster et al., 2023b; Liu et al., 2025b). This complexity measure recovers and improves the quantities underlying the guarantees of Chen and Jiang (2022) and Uehara et al. (2023), and extends to broader settings. Our decision-estimation decomposition can be combined with a wide range of $Q^\star$ estimation procedures, modularizing and generalizing existing approaches. Beyond the general framework, we make further contributions. By developing a novel second-order performance difference lemma, we obtain the first $\varepsilon^{-2}$ sample complexity under partial coverage for soft $Q$-learning, improving the $\varepsilon^{-4}$ bound of Uehara et al. (2023). We remove the need of Chen and Jiang (2022) for additional online interaction when the value gap of $Q^\star$ is unknown. We also give the first characterization of offline learnability for general low-Bellman-rank MDPs without Bellman completeness (Jiang et al., 2017; Du et al., 2021; Jin et al., 2021), a canonical setting in online RL that remains unexplored in offline RL except for special cases. Finally, we provide the first analysis of CQL under $Q^\star$-realizability and Bellman completeness beyond the tabular case.
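For context, the open question above combines three standard assumptions from the offline RL literature. A minimal sketch of their conventional formulations follows; the notation here is illustrative and not reproduced from the paper. Let $\mathcal{F}$ be a class of candidate $Q$-functions, $\mathcal{T}$ the Bellman optimality operator, $\mu$ the data distribution, and $d^{\pi^\star}$ the state-action occupancy measure of the optimal policy $\pi^\star$:

% Conventional definitions (illustrative notation, not from the paper).
% Bellman optimality operator:
\[
  (\mathcal{T} f)(s,a) \;=\; r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\Big[\max_{a'} f(s',a')\Big].
\]
% Q^\star-realizability: the optimal value function lies in the class.
\[
  Q^\star \in \mathcal{F}.
\]
% Bellman completeness: the class is closed under the Bellman operator.
\[
  \mathcal{T} f \in \mathcal{F} \quad \text{for all } f \in \mathcal{F}.
\]
% Partial coverage (single-policy concentrability): the dataset need
% only cover the optimal policy, not all policies.
\[
  C^\star \;=\; \sup_{s,a} \frac{d^{\pi^\star}(s,a)}{\mu(s,a)} \;<\; \infty.
\]

Under these conventional definitions, the paper's lower bound shows that even all three conditions together do not suffice for sample-efficient offline RL.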


Source: arXiv:2602.12107v1 (http://arxiv.org/abs/2602.12107v1)
PDF: https://arxiv.org/pdf/2602.12107v1

Submission: 2/13/2026
Subjects: Statistics; Data Science

