July 2, 2022
We introduce a modification of follow the regularised leader and combine it with the log-determinant potential and suitable loss estimators to prove that the minimax regret for adaptive adversarial linear bandits is at most O(d √(T log(T))), where d is the dimension and T is the number of rounds. By using exponential weights, we improve this bound to O(√(dT log(kT))) when the action set has size k. These results confirm an old conjecture. We also show that follow the regularised leader with the entropic barrier and suitable loss estimators has regret against an adaptive adversary of at most O(d^2 √(T) log(T)) and can be implemented in polynomial time, which improves on the best known bound for an efficient algorithm of O(d^(7/2) √(T) log(T)) by Lee et al. (2020).
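To get a feel for how the bounds in the abstract compare, here is a minimal Python sketch that evaluates each expression with all hidden constants set to 1. This is an assumption made purely for illustration: big-O bounds say nothing about constants, so only the ratios and growth rates are meaningful. The function names and the sample values of d, T and k are hypothetical and do not come from the paper.

```python
import math

# Each function evaluates a bound from the abstract with the hidden
# constant assumed to be 1 (illustrative only).

def bound_logdet(d, T):
    """O(d * sqrt(T log T)): log-determinant potential."""
    return d * math.sqrt(T * math.log(T))

def bound_exp_weights(d, T, k):
    """O(sqrt(d T log(kT))): exponential weights, action set of size k."""
    return math.sqrt(d * T * math.log(k * T))

def bound_entropic(d, T):
    """O(d^2 * sqrt(T) * log T): entropic barrier, polynomial time."""
    return d ** 2 * math.sqrt(T) * math.log(T)

def bound_previous_efficient(d, T):
    """O(d^(7/2) * sqrt(T) * log T): previous efficient bound (Lee et al. 2020)."""
    return d ** 3.5 * math.sqrt(T) * math.log(T)

# Hypothetical problem sizes, chosen only for the comparison.
d, T, k = 10, 10 ** 6, 100

# The improvement for efficient algorithms is a factor of d^(3/2),
# independent of T.
improvement = bound_previous_efficient(d, T) / bound_entropic(d, T)

print(f"log-det potential:    {bound_logdet(d, T):.3e}")
print(f"exponential weights:  {bound_exp_weights(d, T, k):.3e}")
print(f"entropic barrier:     {bound_entropic(d, T):.3e}")
print(f"previous efficient:   {bound_previous_efficient(d, T):.3e}")
print(f"efficient-bound improvement factor: {improvement:.2f}")
```

With these sample values the exponential-weights expression is smaller than the log-determinant one. Ignoring constants, that holds whenever log(kT) is below d log(T).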
The conference has been held annually since 1988 and has become the leading conference on learning theory by maintaining a highly selective process for submissions. It is committed to high-quality articles in all theoretical aspects of machine learning and related topics.
Presentations on a similar topic, in the same category, or by the same speaker
Adam Block, …