
A Practical Algorithm for Feature-Rich, Non-Stationary Bandit Problems

Wei Min Loh

Abstract

Contextual bandits are useful in many practical problems. We go a step further by formulating a more realistic problem that combines: (1) contextual bandits with dense arm features, (2) non-linear reward functions, and (3) a generalization of correlated bandits in which reward distributions change over time while the degree of correlation is maintained. This formulation lends itself to a wider set of applications, such as recommendation tasks. To solve this problem, we introduce conditionally coupled contextual (C3) Thompson sampling for Bernoulli bandits. It combines an improved Nadaraya-Watson estimator on an embedding space with Thompson sampling, allowing online learning without retraining. Empirically, C3 achieves 5.7% lower average cumulative regret than the next best algorithm on four OpenML tabular datasets, and a 12.4% click lift on the Microsoft News Dataset (MIND) compared to other algorithms.
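The abstract describes combining a Nadaraya-Watson estimator over an embedding space with Thompson sampling for Bernoulli rewards. The paper's actual C3 algorithm is not detailed here; the following is only an illustrative sketch of the general idea, where each arm's Beta posterior is built from kernel-weighted past outcomes of nearby contexts. All names (`KernelTS`, the Gaussian kernel, the bandwidth parameter) are hypothetical choices, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(dist, bandwidth):
    """Gaussian kernel weight for a context distance (illustrative choice)."""
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

class KernelTS:
    """Sketch: kernel-smoothed Thompson sampling for Bernoulli contextual bandits.

    Each arm keeps a history of (context embedding, binary reward) pairs.
    At decision time, a Beta posterior is formed per arm by weighting past
    rewards with a Nadaraya-Watson style kernel on the current context.
    """

    def __init__(self, n_arms, bandwidth=1.0):
        self.n_arms = n_arms
        self.bandwidth = bandwidth
        self.history = [[] for _ in range(n_arms)]

    def _posterior_params(self, arm, x):
        # Start from a Beta(1, 1) prior; add kernel-weighted pseudo-counts.
        a, b = 1.0, 1.0
        for ctx, reward in self.history[arm]:
            w = gaussian_kernel(np.linalg.norm(x - ctx), self.bandwidth)
            a += w * reward
            b += w * (1 - reward)
        return a, b

    def select(self, x):
        # Thompson sampling: draw from each arm's posterior, pick the max.
        samples = [rng.beta(*self._posterior_params(k, x))
                   for k in range(self.n_arms)]
        return int(np.argmax(samples))

    def update(self, arm, x, reward):
        # Online update: append to history, no retraining needed.
        self.history[arm].append((x, reward))
```

Because old observations are down-weighted by context distance rather than discarded, this kind of estimator can track slowly shifting reward distributions while still sharing statistical strength across similar contexts.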


Source: arXiv:2603.16755v1 - http://arxiv.org/abs/2603.16755v1
PDF: https://arxiv.org/pdf/2603.16755v1

Submission: 3/18/2026
Subjects: Machine Learning; Data Science

