Research Paper | Researchia:202603.31022 [Data Science > Machine Learning]

Functional Natural Policy Gradients

Aurelien Bibaut

Abstract

We propose a cross-fitted debiasing device for policy learning from offline data. A key consequence of the resulting learning principle is $\sqrt{N}$ regret even for policy classes with complexity greater than Donsker, provided a product-of-errors nuisance remainder is $O(N^{-1/2})$. The regret bound factors into a plug-in policy error factor governed by policy-class complexity and an environment nuisance factor governed by the complexity of the environment dynamics, making explicit how one may be traded against the other.
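To make the idea of cross-fitting concrete, the sketch below shows a generic cross-fitted doubly robust (AIPW) estimate of a fixed policy's value from logged contextual-bandit data. It is not the paper's estimator or learning procedure. The function name `cross_fitted_dr_value`, the linear outcome model, the context-free propensity, and the synthetic data are all assumptions chosen for illustration. The point it illustrates is only that nuisance models are fit on one part of the data and the debiased score is evaluated on the held-out part.

```python
import numpy as np

def cross_fitted_dr_value(X, A, R, policy, n_folds=2, seed=0):
    """Cross-fitted doubly robust estimate of the value of `policy`.

    Hypothetical helper for illustration only.
    X: (n, d) contexts, A: (n,) binary logged actions, R: (n,) rewards,
    policy: function mapping contexts to integer actions in {0, 1}.
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    Z = np.column_stack([np.ones(n), X])  # design matrix with intercept
    scores = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        # Nuisance 1 (assumed linear): one least-squares outcome fit per action,
        # trained only on data outside the held-out fold.
        mu = np.empty((len(held_out), 2))
        for a in (0, 1):
            idx = train[A[train] == a]
            coef, *_ = np.linalg.lstsq(Z[idx], R[idx], rcond=None)
            mu[:, a] = Z[held_out] @ coef
        # Nuisance 2 (assumed context-free): logging propensity of action 1.
        p1 = np.clip(A[train].mean(), 0.05, 0.95)
        a_obs = A[held_out]
        prop = np.where(a_obs == 1, p1, 1.0 - p1)
        a_pi = policy(X[held_out])
        rows = np.arange(len(held_out))
        # Plug-in term plus importance-weighted residual correction.
        match = (a_obs == a_pi).astype(float)
        scores[held_out] = mu[rows, a_pi] + match / prop * (R[held_out] - mu[rows, a_obs])
    return scores.mean()

# Synthetic logged data (assumption): uniform random logging policy,
# reward is +X0 under action 1 and -X0 under action 0, plus small noise.
rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 2))
A = rng.integers(0, 2, size=n)
R = X[:, 0] * (2 * A - 1) + rng.normal(scale=0.1, size=n)

# Target policy: play action 1 when the first context coordinate is positive.
policy = lambda X: (X[:, 0] > 0).astype(int)
value = cross_fitted_dr_value(X, A, R, policy)
```

In this synthetic setup the target policy earns the absolute value of the first context coordinate in expectation, so the estimate can be compared against the mean of a half-normal variable. In doubly robust estimators of this kind the bias is governed by a product of the two nuisance errors, which is the general pattern behind a product-of-errors remainder condition.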


Source: arXiv:2603.28681v1 - http://arxiv.org/abs/2603.28681v1 PDF: https://arxiv.org/pdf/2603.28681v1

Submission: 3/31/2026
Subjects: Machine Learning; Data Science