Back to Explorer
Research PaperResearchia:202602.05014[Mathematics > Mathematics]

Optimism Stabilizes Thompson Sampling for Adaptive Inference

Shunxing Yan

Abstract

Thompson sampling (TS) is widely used for stochastic multi-armed bandits, yet its inferential properties under adaptive data collection are subtle. Classical asymptotic theory for sample means can fail because arm-specific sample sizes are random and coupled with the rewards through the action-selection rule. We study this phenomenon in the KK-armed Gaussian bandit and identify \emph{optimism} as a key mechanism for restoring \emph{stability}, a sufficient condition for valid asymptotic inference requiring each arm's pull count to concentrate around a deterministic scale. First, we prove that variance-inflated TS \citep{halder2025stable} is stable for any K2K \ge 2, including the challenging regime where multiple arms are optimal. This resolves the open question raised by \citet{halder2025stable} through extending their results from the two-armed setting to the general KK-armed setting. Second, we analyze an alternative optimistic modification that keeps the posterior variance unchanged but adds an explicit mean bonus to posterior mean, and establish the same stability conclusion. In summary, suitably implemented optimism stabilizes Thompson sampling and enables asymptotically valid inference in multi-armed bandits, while incurring only a mild additional regret cost.


Source: arXiv:2602.06014v1 - http://arxiv.org/abs/2602.06014v1 PDF: https://arxiv.org/pdf/2602.06014v1 Original Article: View on arXiv

Submission:2/5/2026
Comments:0 comments
Subjects:Mathematics; Mathematics
Original Source:
View Original PDF
arXiv: This paper is hosted on arXiv, an open-access repository
Was this helpful?

Discussion (0)

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Optimism Stabilizes Thompson Sampling for Adaptive Inference | Researchia | Researchia