Research Paper
Researchia:202604.29034

SpecFed: Accelerating Federated LLM Inference with Speculative Decoding and Compressed Transmission

Ce Zheng

Submitted: April 29, 2026
Subjects: Engineering; Chemical Engineering

Description / Details

Federated inference enhances LLM performance in edge computing through weighted averaging of distributed model predictions. However, autoregressive LLM inference requires frequent full-model forward passes across workers, severely limiting decoding throughput. Distributed deployment further aggravates this due to a communication bottleneck: each worker must transmit full token probability distributions per draft token, dominating end-to-end latency. To address these challenges, we introduce speculative decoding to enable parallel LLM processing and propose a top-K compressed transmission scheme with two server-side reconstruction strategies. We theoretically analyze the robustness of our method in terms of local reconstruction error, aggregation bias, and acceptance-rate bias, and derive corresponding bounds. Experiments demonstrate that our scheme achieves high generation fidelity while significantly reducing communication overhead.
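To make the pipeline described above more concrete, here is a minimal sketch in plain Python of top-K compression on a worker, server-side reconstruction, weighted averaging, and a draft-token acceptance check. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not say what the two reconstruction strategies are, so `reconstruct_renormalize` and `reconstruct_uniform_tail` are hypothetical stand-ins, and all function names are invented for this example. The acceptance check uses the standard min(1, p/q) rule from the speculative sampling literature, which the paper may or may not use unchanged.

```python
from typing import Dict, List, Tuple

# (kept token id -> probability, total probability mass of dropped tokens)
Sparse = Tuple[Dict[int, float], float]


def compress_top_k(probs: List[float], k: int) -> Sparse:
    """Worker side: keep the k largest probabilities plus the dropped mass."""
    if k < 1:
        raise ValueError("k must be at least 1")
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = {i: probs[i] for i in top}
    residual = sum(p for i, p in enumerate(probs) if i not in kept)
    return kept, residual


def reconstruct_renormalize(sparse: Sparse, vocab_size: int) -> List[float]:
    """Assumed strategy A: zero-fill dropped tokens, rescale kept mass to 1."""
    kept, _ = sparse
    total = sum(kept.values())
    return [kept.get(i, 0.0) / total for i in range(vocab_size)]


def reconstruct_uniform_tail(sparse: Sparse, vocab_size: int) -> List[float]:
    """Assumed strategy B: spread the dropped mass evenly over dropped tokens."""
    kept, residual = sparse
    n_dropped = vocab_size - len(kept)
    fill = residual / n_dropped if n_dropped else 0.0
    return [kept.get(i, fill) for i in range(vocab_size)]


def aggregate(dists: List[List[float]], weights: List[float]) -> List[float]:
    """Server side: weighted average of the reconstructed worker distributions."""
    w_sum = sum(weights)
    return [
        sum(w * d[i] for w, d in zip(weights, dists)) / w_sum
        for i in range(len(dists[0]))
    ]


def acceptance_prob(target: List[float], draft: List[float], token: int) -> float:
    """Standard speculative-decoding rule: accept with min(1, p(x) / q(x)).

    The token was sampled from the draft distribution, so draft[token] > 0.
    """
    return min(1.0, target[token] / draft[token])


if __name__ == "__main__":
    # Toy 4-token vocabulary with two workers (made-up numbers).
    worker_a = [0.5, 0.3, 0.1, 0.1]
    worker_b = [0.4, 0.4, 0.15, 0.05]
    rebuilt = [
        reconstruct_uniform_tail(compress_top_k(p, k=2), vocab_size=4)
        for p in (worker_a, worker_b)
    ]
    target = aggregate(rebuilt, weights=[1.0, 1.0])
    draft = [0.9, 0.1, 0.0, 0.0]
    print(target, acceptance_prob(target, draft, token=0))
```

In this sketch each worker sends k (token, probability) pairs plus one residual scalar instead of a full vocabulary-sized vector, which is where the communication saving would come from. The two reconstruction variants differ in how they treat the dropped tail, which is the kind of choice that would feed into the reconstruction-error and aggregation-bias terms the abstract mentions.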


Source: arXiv:2604.25777v1
Abstract page: http://arxiv.org/abs/2604.25777v1
PDF: https://arxiv.org/pdf/2604.25777v1
