Research Paper · Researchia:202602.20065 [Artificial Intelligence > AI]

Almost Sure Convergence of Differential Temporal Difference Learning for Average Reward Markov Decision Processes

Ethan Blaser

Abstract

The average reward is a fundamental performance metric in reinforcement learning (RL) that focuses on the long-run performance of an agent. Differential temporal difference (TD) learning algorithms are a major advance for average reward RL, as they provide an efficient online method to learn the value functions associated with the average reward in both on-policy and off-policy settings. However, existing convergence guarantees require a local clock in the learning rates tied to state visit counts, which practitioners do not use and which does not extend beyond tabular settings. We address this limitation by proving the almost sure convergence of on-policy n-step differential TD for any n using standard diminishing learning rates without a local clock. We then derive three sufficient conditions under which off-policy n-step differential TD also converges without a local clock. These results strengthen the theoretical foundations of differential TD and bring its convergence analysis closer to practical implementations.
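To make the setting concrete, below is a minimal sketch of tabular one-step (n = 1) on-policy differential TD on a toy two-state Markov chain. The chain, its rewards, and the specific step-size schedule alpha_t = (t+1)^(-0.6) are illustrative assumptions, not taken from the paper; the point is that the schedule depends only on a global time index t, not on per-state visit counts (no local clock), which is the regime whose convergence the paper analyzes.

```python
# Hedged sketch of tabular one-step on-policy differential TD with a
# standard diminishing step size (global clock), on a toy example.
# The environment and hyperparameters here are assumptions for
# illustration, not the paper's experimental setup.

# Deterministic 2-state chain: state 0 -> state 1 (reward 1),
# state 1 -> state 0 (reward 0). True average reward is 0.5.
def env_step(s):
    return (1, 1.0) if s == 0 else (0, 0.0)

V = [0.0, 0.0]  # differential (relative) value estimates
rbar = 0.0      # running estimate of the average reward
eta = 1.0       # relative step size for the average-reward update
s = 0

for t in range(50_000):
    s_next, r = env_step(s)
    alpha = (t + 1) ** -0.6              # diminishing; no local clock
    delta = r - rbar + V[s_next] - V[s]  # differential TD error
    V[s] += alpha * delta
    rbar += eta * alpha * delta
    s = s_next

print(rbar)  # converges toward the true average reward, 0.5
```

Note that the state-value update and the average-reward update share the same TD error; the local-clock variants analyzed in prior work would instead use a separate step size per state based on how often that state has been visited.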


Source: arXiv:2602.16629v1 (http://arxiv.org/abs/2602.16629v1)
PDF: https://arxiv.org/pdf/2602.16629v1

Submission: 2/20/2026
Subjects: Artificial Intelligence (AI)
