First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

Dec 10, 2023

Speakers

About

We consider the adversarial linear contextual bandit setting, whichallows for the loss functions associated with each of K arms to changeover time without restriction. Assuming the d-dimensional contexts aredrawn from a fixed known distribution, the worst-case expected regretover the course of T rounds is known to scale as Õ(√(KdT)). Under the additional assumption that the density of the contextsis log-concave, we obtain a second-order bound of order (K√(d V_T)) in terms of the cumulative second moment of thelearner's losses V_T, and a closely related first-order bound of orderÕ(K√(d L_T^*)) in terms of the cumulative loss of the bestpolicy L_T^*. Since V_T or L_T^* may be significantly smaller thanT, these improve over the worst-case regret whenever the environmentis relatively benign. Our results are obtained using a truncated versionof the continuous exponential weights algorithm over the probabilitysimplex, which we analyse by exploiting a novel connection to the linearbandit setting without contexts.

Organizer

Store presentation

Should this presentation be stored for 1000 years?

How do we store presentations

Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%

Sharing

Recommended Videos

Presentations on similar topic, category or speaker

Interested in talks like this? Follow NeurIPS 2023