A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback

NeurIPS 2022 · Nov 28, 2022

About

We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multi-armed bandits with delayed feedback, which, in addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin [2020], simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays. Specifically, the adversarial regret guarantee is 𝒪(√(TK) + √(dT log K)), where T is the time horizon, K is the number of arms, and d is the fixed delay, whereas the stochastic regret guarantee is 𝒪(∑_{i ≠ i^*} ((log T)/Δ_i + d/Δ_i) + d K^{1/3} log K), where Δ_i are the suboptimality gaps. We also present an extension of the algorithm to the case of arbitrary delays, which is based on oracle knowledge of the maximal delay d_max and achieves 𝒪(√(TK) + √(D log K) + d_max K^{1/3} log K) regret in the adversarial regime, where D is the total delay, and 𝒪(∑_{i ≠ i^*} ((log T)/Δ_i + σ_max/Δ_i) + d_max K^{1/3} log K) regret in the stochastic regime, where σ_max is the maximal number of outstanding observations. Finally, we present a lower bound matching the regret upper bound achieved by the skipping technique of Zimmert and Seldin [2020] in the adversarial setting.
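The abstract does not spell out the algorithm's internals, so the following is only a minimal Python sketch of the setting it studies: a K-armed bandit in which the loss incurred at round t is observed d rounds later, played by a plain FTRL learner with the 1/2-Tsallis entropy regularizer in the style of Zimmert and Seldin [2020]. The function names (`tsallis_ftrl_distribution`, `run_fixed_delay_bandit`), the learning-rate schedule `eta_fn`, and the regularizer constants are illustrative assumptions; in particular, the delay-aware tuning that the paper actually contributes is not reproduced here.

```python
import numpy as np


def tsallis_ftrl_distribution(L_hat, eta, iters=100):
    """Solve min_w <w, L_hat> - (2/eta) * sum(sqrt(w)) over the probability simplex.

    The minimiser has the form w_i = 1 / (eta * (L_hat_i - x))^2 for a
    normalisation constant x < min(L_hat), which we locate by bisection.
    """
    K = len(L_hat)
    lo = L_hat.min() - np.sqrt(K) / eta  # total weight <= 1 at this point
    hi = L_hat.min() - 1e-12             # total weight blows up as x -> min(L_hat)
    for _ in range(iters):
        x = 0.5 * (lo + hi)
        total = np.sum(1.0 / (eta * (L_hat - x)) ** 2)
        lo, hi = (lo, x) if total > 1.0 else (x, hi)
    w = 1.0 / (eta * (L_hat - lo)) ** 2
    return w / w.sum()


def run_fixed_delay_bandit(losses, d, eta_fn=lambda n: 1.0 / np.sqrt(n), seed=0):
    """Play a K-armed bandit where the loss of round t is only revealed at round t + d."""
    T, K = losses.shape
    rng = np.random.default_rng(seed)
    L_hat = np.zeros(K)          # cumulative importance-weighted loss estimates
    pending = []                 # (arm, probability, loss) awaiting delayed feedback
    total_loss = 0.0
    for t in range(T):
        w = tsallis_ftrl_distribution(L_hat, eta_fn(t + 1))
        arm = rng.choice(K, p=w)
        total_loss += losses[t, arm]
        pending.append((arm, w[arm], losses[t, arm]))
        if t - d >= 0:           # feedback from round t - d arrives only now
            a, p, ell = pending[t - d]
            L_hat[a] += ell / p  # standard importance-weighted loss estimate
    return total_loss


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    T, K, d = 5000, 4, 50
    means = np.array([0.5, 0.6, 0.6, 0.7])  # arm 0 has the lowest mean loss
    losses = rng.binomial(1, means, size=(T, K)).astype(float)
    print("cumulative loss:", run_fixed_delay_bandit(losses, d))
```

The sketch is only meant to make the interaction protocol concrete: the loss of each round reaches the learner d rounds later, so at any time up to d observations are outstanding, which is the quantity that σ_max bounds in the arbitrary-delay extension.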
