Research Paper (Researchia:202604.24076)

Replay-buffer engineering for noise-robust quantum circuit optimization

Akash Kundu

Submitted: April 24, 2026
Subjects: AI; Artificial Intelligence

Description / Details

Deep reinforcement learning (RL) for quantum circuit optimization faces three fundamental bottlenecks: replay buffers that ignore the reliability of temporal-difference (TD) targets, curriculum-based architecture search that triggers a full quantum-classical evaluation at every environment step, and the routine discard of noiseless trajectories when retraining under hardware noise. We address all three by treating the replay buffer as a primary algorithmic lever for quantum optimization. We introduce ReaPER++, an annealed replay rule that transitions from TD-error-driven prioritization early in training to reliability-aware sampling as value estimates mature, achieving 4-32× gains in sample efficiency over fixed PER, ReaPER, and uniform replay while consistently discovering more compact circuits across quantum compilation and QAS benchmarks; validation on LunarLander-v3 confirms the principle is domain-agnostic. Furthermore, we eliminate the quantum-classical evaluation bottleneck in curriculum RL by introducing OptCRLQAS, which amortizes expensive evaluations over multiple architectural edits, cutting wall-clock time per episode by up to 67.5% on a 12-qubit optimization problem without degrading solution quality. Finally, we introduce a lightweight replay-buffer transfer scheme that warm-starts noisy-setting learning by reusing noiseless trajectories, without network-weight transfer or ε-greedy pretraining. This reduces steps to chemical accuracy by up to 85-90% and final energy error by up to 90% over from-scratch baselines on 6-, 8-, and 12-qubit molecular tasks. Together, these results establish that experience storage, sampling, and transfer are decisive levers for scalable, noise-robust quantum circuit optimization.
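
The abstract does not state the ReaPER++ update rule, so the following is only an illustrative sketch of what an annealed replay priority could look like, not the paper's method. It assumes a linear annealing schedule, a per-transition reliability score in [0, 1] supplied by the caller, and a multiplicative combination with the usual PER term |δ|^α. The names `anneal_weight`, `annealed_priorities`, and `sample_indices` and all default parameters are hypothetical.

```python
import random
from typing import List, Sequence


def anneal_weight(step: int, total_steps: int) -> float:
    """Assumed linear schedule: 0 = pure TD-error priority, 1 = fully reliability-aware."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(max(step / total_steps, 0.0), 1.0)


def annealed_priorities(
    td_errors: Sequence[float],
    reliabilities: Sequence[float],
    step: int,
    total_steps: int,
    alpha: float = 0.6,   # hypothetical PER exponent
    eps: float = 1e-6,    # keeps every priority strictly positive
) -> List[float]:
    """Blend TD-error priority with a reliability factor that is annealed in over training."""
    if len(td_errors) != len(reliabilities):
        raise ValueError("td_errors and reliabilities must have the same length")
    lam = anneal_weight(step, total_steps)
    priorities = []
    for delta, rel in zip(td_errors, reliabilities):
        td_term = (abs(delta) + eps) ** alpha
        rel = min(max(rel, eps), 1.0)
        # rel ** 0 == 1 early in training (plain PER); rel ** 1 late (reliability-weighted).
        priorities.append(td_term * rel ** lam)
    return priorities


def sample_indices(priorities: Sequence[float], batch_size: int, rng: random.Random) -> List[int]:
    """Draw buffer indices with probability proportional to priority (with replacement)."""
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)


if __name__ == "__main__":
    rng = random.Random(0)
    td = [1.0, 1.0, 0.1]
    rel = [1.0, 0.25, 1.0]
    for step in (0, 500, 1000):
        p = annealed_priorities(td, rel, step, total_steps=1000)
        print(step, [round(x, 3) for x in p], sample_indices(p, 4, rng))
```

Under these assumptions, two transitions with equal TD error are sampled equally often at the start of training, while the one with the less reliable target is progressively down-weighted as the schedule approaches 1.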


Source: arXiv:2604.21863v1 (http://arxiv.org/abs/2604.21863v1)
PDF: https://arxiv.org/pdf/2604.21863v1
