Research Paper
Researchia:202601.1204f432

Reward-Preserving Attacks For Robust Reinforcement Learning

Lucas Schott

Abstract

Adversarial robustness in RL is difficult because perturbations affect entire trajectories: strong attacks can break learning, while weak attacks yield little robustness, and the appropriate strength varies by state. We propose $α$-reward-preserving attacks, which adapt the strength of the adversary so that an $α$ fraction of the nominal-to-worst-case return gap remains achievable at each state. In deep RL, we use a gradient-based attack direction and learn a state-dependent magnitude $η\le η_{\...

Submitted: January 12, 2026
Subjects: Machine Learning

Description / Details

Adversarial robustness in RL is difficult because perturbations affect entire trajectories: strong attacks can break learning, while weak attacks yield little robustness, and the appropriate strength varies by state. We propose $α$-reward-preserving attacks, which adapt the strength of the adversary so that an $α$ fraction of the nominal-to-worst-case return gap remains achievable at each state. In deep RL, we use a gradient-based attack direction and learn a state-dependent magnitude $η \le η_{\mathcal B}$ selected via a critic $Q^π_α((s,a),η)$ trained off-policy over diverse radii. This adaptive tuning calibrates attack strength and, with intermediate $α$, improves robustness across radii while preserving nominal performance, outperforming fixed- and random-radius baselines.
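
One way to read the radius-selection idea is sketched below. This is an illustrative interpretation, not the paper's implementation. It assumes that the critic can be queried at a fixed state-action pair for several candidate radii, that radius 0 gives the nominal return, and that the largest candidate radius stands in for the worst case. The names `select_radius`, `toy_critic`, and `candidate_radii` are hypothetical.

```python
from typing import Callable, Sequence


def select_radius(
    q_value: Callable[[float], float],
    candidate_radii: Sequence[float],
    alpha: float,
) -> float:
    """Pick the largest radius that keeps an alpha fraction of the return gap.

    q_value(eta) is a hypothetical stand-in for a learned critic evaluated
    at one fixed (s, a) under attack radius eta.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    radii = sorted(candidate_radii)
    q_nominal = q_value(0.0)      # estimated return with no attack
    q_worst = q_value(radii[-1])  # assumption: largest radius is the worst case
    # Return level that preserves an alpha fraction of the nominal-to-worst gap.
    threshold = q_worst + alpha * (q_nominal - q_worst)
    feasible = [eta for eta in radii if q_value(eta) >= threshold]
    # Fall back to no attack if no candidate radius meets the threshold.
    return max(feasible) if feasible else 0.0


def toy_critic(eta: float) -> float:
    """Toy critic for illustration only: return decays linearly with radius."""
    return 10.0 - 8.0 * eta


if __name__ == "__main__":
    radii = [0.0, 0.25, 0.5, 0.75, 1.0]
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, select_radius(toy_critic, radii, alpha))
```

Under this reading, $α = 1$ forces the adversary back to the nominal case, $α = 0$ permits the full radius, and intermediate values interpolate between them. The gradient-based attack direction mentioned in the abstract is outside the scope of this sketch.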
