How can I publish a research paper for free?

On Researchia, you can publish research papers, preprints, and science projects instantly and for free — no paywall and no submission fee. Create a free account, go to Explorer, and click "Publish Instantly" to share your work with a global audience.

Where can I find trending research papers?

Researchia Explorer aggregates the latest and most-discussed research papers across AI, Biology, Physics, Engineering, and more. New papers are added daily and ranked by community engagement.

What is a good free alternative to ResearchGate for publishing papers?

Researchia is a free, modern alternative to ResearchGate. You can publish papers instantly, connect with researchers, collaborate on projects, and access an open library of 200M+ scientific records — all without paywalls.

Research PaperResearchia:202602.13041

General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies

Jianxun Wang

Abstract

Offline RL algorithms aim to improve upon the behavior policy that produces the collected data while constraining the learned policy to be within the support of the dataset. However, practical offline datasets often contain examples with little diversity or limited exploration of the environment, and from multiple behavior policies with diverse expertise levels. Limited exploration can impair the offline RL algorithm's ability to estimate \textit{Q} or \textit{V} values, while constraining towar...

Submitted: February 13, 2026Subjects: AI; Artificial Intelligence

Description / Details

Offline RL algorithms aim to improve upon the behavior policy that produces the collected data while constraining the learned policy to be within the support of the dataset. However, practical offline datasets often contain examples with little diversity or limited exploration of the environment, and from multiple behavior policies with diverse expertise levels. Limited exploration can impair the offline RL algorithm's ability to estimate \textit{Q} or \textit{V} values, while constraining towards diverse behavior policies can be overly conservative. Such datasets call for a balance between the RL objective and behavior policy constraints. We first identify the connection between $f$ -divergence and optimization constraint on the Bellman residual through a more general Linear Programming form for RL and the convex conjugate. Following this, we introduce the general flexible function formulation for the $f$ -divergence to incorporate an adaptive constraint on algorithms' learning objectives based on the offline training dataset. Results from experiments on the MuJoCo, Fetch, and AdroitHand environments show the correctness of the proposed LP form and the potential of the flexible $f$ -divergence in improving performance for learning from a challenging dataset when applied to a compatible constrained optimization algorithm.

Source: arXiv:2602.11087v1 - http://arxiv.org/abs/2602.11087v1 PDF: https://arxiv.org/pdf/2602.11087v1 Original Link: http://arxiv.org/abs/2602.11087v1

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!