First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

10. Prosinec 2023

Řečníci

O prezentaci

We consider the adversarial linear contextual bandit setting, whichallows for the loss functions associated with each of K arms to changeover time without restriction. Assuming the d-dimensional contexts aredrawn from a fixed known distribution, the worst-case expected regretover the course of T rounds is known to scale as Õ(√(KdT)). Under the additional assumption that the density of the contextsis log-concave, we obtain a second-order bound of order (K√(d V_T)) in terms of the cumulative second moment of thelearner's losses V_T, and a closely related first-order bound of orderÕ(K√(d L_T^*)) in terms of the cumulative loss of the bestpolicy L_T^*. Since V_T or L_T^* may be significantly smaller thanT, these improve over the worst-case regret whenever the environmentis relatively benign. Our results are obtained using a truncated versionof the continuous exponential weights algorithm over the probabilitysimplex, which we analyse by exploiting a novel connection to the linearbandit setting without contexts.

Organizátor

Uložení prezentace

Měla by být tato prezentace uložena po dobu 1000 let?

Jak ukládáme prezentace

Pro uložení prezentace do věčného trezoru hlasovalo 0 diváků, což je 0.0 %

Sdílení

Doporučená videa

Prezentace na podobné téma, kategorii nebo přednášejícího

Zajímají Vás podobná videa? Sledujte NeurIPS 2023