Research Paper Researchia:202603.31022

Functional Natural Policy Gradients

Aurelien Bibaut

Submitted: March 31, 2026
Subjects: Machine Learning; Data Science

Description / Details

We propose a cross-fitted debiasing device for policy learning from offline data. A key consequence of the resulting learning principle is $\sqrt N$ regret even for policy classes with complexity greater than Donsker, provided a product-of-errors nuisance remainder is $O(N^{-1/2})$. The regret bound factors into a plug-in policy error factor governed by policy-class complexity and an environment nuisance factor governed by the complexity of the environment dynamics, making explicit how one may be traded against the other.
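
As a schematic illustration of the trade-off only: the error symbols $\varepsilon_{\pi}$, $\varepsilon_{\mathrm{env}}$ and the rate exponents $a$, $b$ below are assumptions introduced here and do not appear in the abstract, and the paper's actual statement may differ in form and constants. Suppose the nuisance remainder is bounded by a product of a policy error and an environment error, each decaying polynomially in $N$:

$$
\varepsilon_{\pi}(N) = O(N^{-a}), \qquad \varepsilon_{\mathrm{env}}(N) = O(N^{-b}) \quad\Longrightarrow\quad \varepsilon_{\pi}(N)\,\varepsilon_{\mathrm{env}}(N) = O\!\left(N^{-(a+b)}\right).
$$

Under these assumed rates the product is $O(N^{-1/2})$ whenever $a + b \ge 1/2$. For instance, $a = b = 1/4$ suffices, and so does a slower policy rate $a = 1/8$ paired with a faster environment rate $b = 3/8$, which is one way to read the statement that one factor may be traded against the other.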


Source: arXiv:2603.28681v1 (http://arxiv.org/abs/2603.28681v1)
PDF: https://arxiv.org/pdf/2603.28681v1
