Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits

Jul 24, 2023

About

Regret minimization in streaming multi-armed bandits (MABs) has been studied extensively, and recent work has shown that algorithms with o(K) memory have to incur Ω(T^{2/3}) regret, where K and T are the numbers of arms and trials. However, the previous best regret upper bound is still O(K^{1/3} T^{2/3} log^{1/3}(T)), which is achieved by the simple uniform exploration algorithm. In this paper, we close this gap and complete the picture of regret minimization in single-pass streaming MABs. We first improve the regret lower bound to Ω(K^{1/3} T^{2/3}) for algorithms with o(K) memory. We then show that the log^{1/3}(T) factor is not necessary: we design algorithms that use at most O(log^*(K))-arm memory and achieve O(K^{1/3} T^{2/3}) expected regret, based on streaming ε-best arm algorithms. We further test the empirical performance of our algorithms on simulated MAB instances, where the proposed algorithms outperform the benchmark uniform exploration algorithm by a large margin and, on occasion, reduce the regret by up to 70%.
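
As a point of reference, here is a minimal Python sketch of the uniform exploration benchmark under the single-pass, one-arm-memory constraint; the pull() oracle, function names, and simulated instance are illustrative assumptions, not the paper's code:

import math
import random

def pull(mean):
    """Hypothetical reward oracle: one Bernoulli pull from a simulated arm."""
    return 1.0 if random.random() < mean else 0.0

def uniform_exploration(arm_means, T):
    """Single pass over the K arms: pull each arm m times, retain only the
    empirically best arm seen so far (one-arm memory), then commit to it."""
    K = len(arm_means)
    # Per-arm budget behind the O(K^{1/3} T^{2/3} log^{1/3}(T)) regret bound.
    m = max(1, round((T / K) ** (2 / 3) * math.log(T) ** (1 / 3)))
    best_arm, best_mean, used, total_reward = None, -1.0, 0, 0.0
    for arm in arm_means:  # the stream of arms
        rewards = [pull(arm) for _ in range(m)]
        total_reward += sum(rewards)
        used += m
        if sum(rewards) / m > best_mean:  # keep only one arm in memory
            best_arm, best_mean = arm, sum(rewards) / m
    # Commit phase: play the stored arm for the remaining trials.
    total_reward += sum(pull(best_arm) for _ in range(max(0, T - used)))
    return best_arm, total_reward

if __name__ == "__main__":
    arms = [0.5] * 19 + [0.7]  # illustrative instance with one best arm
    arm, reward = uniform_exploration(arms, T=100_000)
    print(f"committed arm mean={arm}, total reward={reward:.0f}")

Storing only the running best arm and its empirical mean is what keeps the memory at a single arm; per the abstract, the proposed algorithms instead build on streaming ε-best arm subroutines to shave the log^{1/3}(T) factor.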
