Functional Natural Policy Gradients
Aurelien Bibaut
Abstract
We propose a cross-fitted debiasing device for policy learning from offline data. A key consequence of the resulting learning principle is regret even for policy classes with complexity greater than Donsker, provided a product-of-errors nuisance remainder is . The regret bound factors into a plug-in policy error factor governed by policy-class complexity and an environment nuisance factor governed by the complexity of the environment dynamics, making explicit how one may be traded against the other.
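To make the idea of cross-fitting concrete, the following is a minimal sketch of a generic cross-fitted doubly robust value estimate for a fixed policy in a two-action contextual bandit. It is not the method of the paper: the simulated data, the linear and logistic nuisance models, the clipping thresholds, and every function name (`simulate`, `fit_linear`, `fit_logistic`, `cross_fitted_dr_value`) are assumptions chosen for illustration. It only shows the general pattern of fitting nuisances on held-out folds so that the estimation error enters through a product of the propensity error and the outcome-model error.

```python
import numpy as np


def simulate(n, seed=0):
    """Hypothetical logged data: 2-d contexts, binary actions, noisy rewards."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    logging_prob = 1.0 / (1.0 + np.exp(-X[:, 0]))  # P(A = 1 | X)
    A = rng.binomial(1, logging_prob)
    R = X[:, 1] * (2 * A - 1) + rng.normal(scale=0.5, size=n)
    return X, A, R


def fit_linear(X, y):
    """Least-squares outcome model; returns a prediction function."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def fit_logistic(X, a, steps=25, ridge=1e-6):
    """Logistic propensity model fitted with a few Newton steps."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (p - a)
        hess = (Xb * (p * (1 - p))[:, None]).T @ Xb + ridge * np.eye(len(w))
        w -= np.linalg.solve(hess, grad)
    return lambda Z: 1.0 / (1.0 + np.exp(-(np.column_stack([np.ones(len(Z)), Z]) @ w)))


def cross_fitted_dr_value(X, A, R, policy, n_folds=5, seed=0):
    """Cross-fitted doubly robust estimate of the value of `policy`."""
    n = len(X)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    scores = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Nuisances are fitted only on data outside the evaluation fold.
        e_hat = fit_logistic(X[train], A[train])
        q_hat = {a: fit_linear(X[train & (A == a)], R[train & (A == a)])
                 for a in (0, 1)}
        Xt, At, Rt = X[test], A[test], R[test]
        pi = policy(Xt)
        e1 = np.clip(e_hat(Xt), 0.05, 0.95)  # clipping is an assumption
        prop_logged = np.where(At == 1, e1, 1 - e1)
        q0, q1 = q_hat[0](Xt), q_hat[1](Xt)
        q_pi = np.where(pi == 1, q1, q0)
        q_logged = np.where(At == 1, q1, q0)
        # Plug-in term plus an importance-weighted residual correction.
        scores[test] = q_pi + (At == pi) / prop_logged * (Rt - q_logged)
    return scores.mean(), scores.std(ddof=1) / np.sqrt(n)


X, A, R = simulate(4000)
target_policy = lambda Z: (Z[:, 1] > 0).astype(int)
value, se = cross_fitted_dr_value(X, A, R, target_policy)
print(f"estimated value {value:.3f} +/- {1.96 * se:.3f}")
```

In this simulation the target policy's true value is E|X_2| = sqrt(2/pi), so the printed interval can be compared against that number. Swapping in more flexible nuisance learners changes only the two `fit_*` helpers; the fold logic stays the same.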
Source: arXiv:2603.28681v1 (http://arxiv.org/abs/2603.28681v1)
PDF: https://arxiv.org/pdf/2603.28681v1
Submission: 3/31/2026
Subjects: Machine Learning; Data Science
Cite as: Researchia:202603.31022, https://www.researchia.net/explorer/2d6fe527-0080-4697-8f2e-0d05b6d4ad5d