Jul 24, 2023
Simple regret is a natural and parameter-free performance criterion for pure exploration in multi-armed bandits, yet it is less popular than the probability of missing the best arm or an ϵ-good arm, perhaps due to the lack of easy ways to characterize it. In this paper, we make significant progress on minimizing simple regret in both the data-rich (T > n) and data-poor (T < n) regimes, where n is the number of arms and T is the number of samples. At its heart is our improved instance-dependent analysis of the well-known Sequential Halving (SH) algorithm, in which we bound the probability of returning an arm whose mean reward is not within ϵ of the best (i.e., not ϵ-good) for any choice of ϵ > 0, even though ϵ is not an input to SH. Our bound not only leads to an optimal worst-case simple regret bound of √(n/T) up to logarithmic factors but also essentially matches the instance-dependent lower bound for returning an ϵ-good arm reported by Katz-Samuels and Jamieson (2020). For the more challenging data-poor regime, we propose Bracketing SH (BSH), which enjoys the same improvement even without sampling each arm at least once. Our empirical study shows that BSH outperforms existing methods on real-world tasks.
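For readers unfamiliar with Sequential Halving, a minimal sketch of the standard algorithm follows. This is not the paper's analysis or its BSH variant, just the basic SH loop: split the budget T over ⌈log₂ n⌉ elimination rounds, sample surviving arms equally, and discard the worse half each round. The `pull(i)` sampling interface is an assumption for illustration.

```python
import math

def sequential_halving(n_arms, budget, pull):
    """Basic Sequential Halving: return the index of the recommended arm.

    n_arms -- number of arms n
    budget -- total number of samples T
    pull   -- callable pull(i) drawing one reward from arm i (assumed interface)
    """
    arms = list(range(n_arms))
    rounds = math.ceil(math.log2(n_arms))
    for _ in range(rounds):
        # Spread this round's share of the budget evenly over surviving arms.
        t = max(1, budget // (len(arms) * rounds))
        means = {i: sum(pull(i) for _ in range(t)) / t for i in arms}
        # Keep the empirically better half (at least one arm survives).
        arms = sorted(arms, key=lambda i: means[i], reverse=True)[:max(1, len(arms) // 2)]
    return arms[0]
```

Note that ϵ never appears: SH only needs n and T, which is what makes the paper's ϵ-uniform guarantee, holding simultaneously for every ϵ > 0, notable.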