Research Paper · Researchia:202605.05001

SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection

Shikhar Shukla


Submitted: May 5, 2026
Subjects: AI; Artificial Intelligence

Description / Details

Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length γ, which determines how many tokens the draft model proposes per step. Nearly all existing systems use a fixed γ (typically 4), yet empirical evidence suggests that the optimal value varies across task types and, crucially, depends on the compression level applied to the target model. In this paper, we present SpecKV, a lightweight adaptive controller that selects γ per speculation step using signals extracted from the draft model itself. We profile speculative decoding across 4 task categories, 4 speculation lengths, and 3 compression levels (FP16, INT8, NF4), collecting 5,112 step-level records with per-step acceptance rates, draft entropy, and draft confidence. We demonstrate that the optimal γ shifts across compression regimes and that draft model confidence and entropy are strong predictors of acceptance rate (correlation ≈ 0.56). SpecKV uses a small MLP trained on these signals to maximize expected tokens per speculation step, achieving a 56.0% improvement over the fixed γ = 4 baseline with only 0.34 ms overhead per decision (<0.5% of step time). The improvement is statistically significant (p < 0.001, paired bootstrap test). We release all profiling data, trained models, and notebooks as open-source artifacts.
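The abstract describes a controller that picks γ per step to maximize expected tokens from draft-side signals. A minimal sketch of that idea in Python, assuming independent per-token acceptance with rate α and using a simple linear predictor as a stand-in for the paper's MLP (all weights, candidate values, and timing constants below are illustrative, not taken from the paper):

```python
def predict_alpha(draft_confidence, draft_entropy):
    """Toy stand-in for the paper's MLP: map draft-side signals to an
    estimated per-token acceptance rate, clamped into (0, 1).
    The weights here are illustrative, not learned."""
    score = 1.5 * draft_confidence - 0.5 * draft_entropy
    return max(0.05, min(0.95, 0.5 + 0.25 * score))

def expected_accepted(alpha, gamma):
    """Expected number of accepted draft tokens when each proposed token
    is accepted independently with probability alpha; token k survives
    only if all k tokens before and including it are accepted, so
    E = sum_{k=1..gamma} alpha^k = alpha * (1 - alpha^gamma) / (1 - alpha)."""
    return alpha * (1.0 - alpha ** gamma) / (1.0 - alpha)

def select_gamma(draft_confidence, draft_entropy,
                 candidate_gammas=(1, 2, 4, 8),
                 t_draft=1.0, t_verify=10.0):
    """Choose gamma maximizing tokens per unit time. The target emits one
    token per verification regardless, so the payoff is E[accepted] + 1;
    the step cost is gamma draft passes plus one verification pass
    (timing constants are assumed, not profiled)."""
    alpha = predict_alpha(draft_confidence, draft_entropy)
    best = max(candidate_gammas,
               key=lambda g: (expected_accepted(alpha, g) + 1.0)
                             / (g * t_draft + t_verify))
    return best, alpha

# Confident, low-entropy draft steps warrant longer speculation:
g_hi, _ = select_gamma(draft_confidence=0.9, draft_entropy=0.5)
g_lo, _ = select_gamma(draft_confidence=0.2, draft_entropy=2.0)
# g_hi > g_lo
```

The same scoring rule also makes the compression dependence concrete: a more aggressively quantized target tends to shift acceptance rates, which moves the arg-max over candidate γ values without any change to the controller itself.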


Source: arXiv:2605.02888v1 (http://arxiv.org/abs/2605.02888v1)
PDF: https://arxiv.org/pdf/2605.02888v1
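The significance result quoted above comes from a paired bootstrap test over step-level records. A minimal stdlib sketch of such a test, assuming per-step tokens-per-step measurements for the baseline and for SpecKV on the same steps (the data and resample count below are illustrative):

```python
import random

def paired_bootstrap_pvalue(baseline, treatment, n_resamples=2000, seed=0):
    """One-sided paired bootstrap: resample the per-step paired
    differences (treatment - baseline) with replacement, and report the
    fraction of resamples whose mean shows no improvement."""
    rng = random.Random(seed)
    diffs = [t - b for b, t in zip(baseline, treatment)]
    n = len(diffs)
    no_improvement = 0
    for _ in range(n_resamples):
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(resample) / n <= 0.0:
            no_improvement += 1
    return no_improvement / n_resamples

# Made-up tokens-per-step numbers for 20 hypothetical decoding steps:
base = [3.0 + 0.1 * i for i in range(20)]
treat = [b + 0.5 for b in base]   # uniformly better by construction
p = paired_bootstrap_pvalue(base, treat)  # → 0.0
```

Pairing by step matters here: it cancels per-step difficulty that both systems share, so the test is sensitive to the controller's effect rather than to variance across prompts.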

