Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits

Jul 24, 2023

About

Regret minimization in streaming multi-armed bandits (MABs) has been studied extensively, and recent work has shown that algorithms with o(K) memory have to incur Ω(T^{2/3}) regret, where K and T are the numbers of arms and trials. However, the previous best regret upper bound is still O(K^{1/3} T^{2/3} log^{1/3}(T)), which is achieved by the simple uniform exploration algorithm. In this paper, we close this gap and complete the picture of regret minimization in single-pass streaming MABs. We first improve the regret lower bound to Ω(K^{1/3} T^{2/3}) for algorithms with o(K) memory. We then show that the log^{1/3}(T) factor is not necessary: we design algorithms that use at most O(log^*(K))-arm memory and achieve O(K^{1/3} T^{2/3}) expected regret, based on streaming ε-best arm algorithms. We further test the empirical performance of our algorithms on simulated MAB instances, where the proposed algorithms outperform the benchmark uniform exploration algorithm by a large margin and, on occasion, reduce the regret by up to 70%.
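
As a point of reference, here is a minimal Python sketch of the uniform exploration benchmark under the single-pass, one-arm-memory constraint; the pull() oracle, function names, and simulated instance are illustrative assumptions, not the paper's code:

import math
import random

def pull(mean):
    """Hypothetical reward oracle: one Bernoulli pull from a simulated arm."""
    return 1.0 if random.random() < mean else 0.0

def uniform_exploration(arm_means, T):
    """Single pass over the K arms: pull each arm m times, retain only the
    empirically best arm seen so far (one-arm memory), then commit to it."""
    K = len(arm_means)
    # Per-arm budget behind the O(K^{1/3} T^{2/3} log^{1/3}(T)) regret bound.
    m = max(1, round((T / K) ** (2 / 3) * math.log(T) ** (1 / 3)))
    best_arm, best_mean, used, total_reward = None, -1.0, 0, 0.0
    for arm in arm_means:  # the stream of arms
        rewards = [pull(arm) for _ in range(m)]
        total_reward += sum(rewards)
        used += m
        if sum(rewards) / m > best_mean:  # keep only one arm in memory
            best_arm, best_mean = arm, sum(rewards) / m
    # Commit phase: play the stored arm for the remaining trials.
    total_reward += sum(pull(best_arm) for _ in range(max(0, T - used)))
    return best_arm, total_reward

if __name__ == "__main__":
    arms = [0.5] * 19 + [0.7]  # illustrative instance with one best arm
    arm, reward = uniform_exploration(arms, T=100_000)
    print(f"committed arm mean={arm}, total reward={reward:.0f}")

Storing only the running best arm and its empirical mean is what keeps the memory at a single arm; per the abstract, the proposed algorithms instead build on streaming ε-best arm subroutines to shave the log^{1/3}(T) factor.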
