Research Paper (Researchia:202604.20052)

Beyond Distribution Sharpening: The Importance of Task Rewards

Sarthak Mittal

Submitted: April 20, 2026

Subjects: AI; Artificial Intelligence

Description / Details

Frontier models have demonstrated exceptional capabilities following the integration of task-reward-based reinforcement learning (RL) into their training pipelines, enabling systems to evolve from pure reasoning models into sophisticated agents. However, debate persists regarding whether RL genuinely instills new skills within a base model or merely sharpens its existing distribution to elicit latent capabilities. To address this dichotomy, we present an explicit comparison between distribution sharpening and task-reward-based learning, using RL as a tool to implement both paradigms. Our analysis reveals the inherent limitations of distribution sharpening, demonstrating from first principles how and why its optima can be unfavorable and the approach fundamentally unstable. Furthermore, our experiments using Llama-3.2-3B-Instruct, Qwen2.5-3B-Instruct, and Qwen3-4B-Instruct-2507 on math datasets confirm that sharpening yields limited gains, whereas incorporating a task-based reward signal yields robust performance improvements and stable learning.
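The contrast drawn above can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes one common formalization of sharpening (raising the base distribution to a power `alpha` and renormalizing) and the standard KL-regularized reward optimum (base probability times `exp(reward / beta)`). The answer labels, probabilities, reward values, and function names are all hypothetical, chosen only to show how sharpening can concentrate on a wrong mode while a task reward moves mass to the correct answer.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def sharpen(base, alpha):
    """Power-sharpened distribution: q(y) proportional to p(y) ** alpha (assumed form)."""
    return normalize({y: p ** alpha for y, p in base.items()})


def reward_tilt(base, reward, beta):
    """KL-regularized reward optimum: q(y) proportional to p(y) * exp(r(y) / beta)."""
    return normalize({y: p * math.exp(reward[y] / beta) for y, p in base.items()})


# Hypothetical base model: the most likely answer ("41") is wrong,
# and the correct answer ("42") holds only 30% of the mass.
base = {"41": 0.5, "42": 0.3, "43": 0.2}
reward = {"41": 0.0, "42": 1.0, "43": 0.0}  # task reward: 1 only for the correct answer

sharpened = sharpen(base, alpha=4.0)
tilted = reward_tilt(base, reward, beta=0.2)

print("base     :", base)
print("sharpened:", {k: round(v, 3) for k, v in sharpened.items()})
print("tilted   :", {k: round(v, 3) for k, v in tilted.items()})
```

In this toy setting, sharpening lowers the probability of the correct answer below its base value because it only amplifies whatever the base model already prefers, whereas the reward-tilted distribution places most of its mass on the correct answer. Whether the paper's analysis uses these exact forms is not stated in the abstract.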

Source: arXiv:2604.16259v1 (http://arxiv.org/abs/2604.16259v1)

PDF: https://arxiv.org/pdf/2604.16259v1
