A Parameter-Free Algorithm for Misspecified Linear Contextual Bandits

14. Duben 2021

Řečníci

O prezentaci

We investigate the misspecified linear contextual bandit (MLCB) problem, which is a generalization of the linear contextual bandit (LCB) problem. The MLCB problem is a decision-making problem in which a learner observes $d$-dimensional feature vectors, called arms, chooses an arm from them, and then obtains a reward of the chosen arm in each round. The learner aims to maximize the sum of the rewards over $T$ rounds. In contrast to the LCB problem, the rewards in the MLCB problem may not be represented by a linear function in feature vectors, but is approximated by a linear function with additive approximation parameter $\varepsilon \geq 0$. In this paper, we propose an algorithm that achieves regret bound $\tilde{O}(d\sqrt{T} + \eps\sqrt{d}T)$, where $\tilde{O}(\cdot)$ ignores polylogarithmic factors in $d$ and $T$. This is the first algorithm for the MLCB problem that works without knowing the approximation parameter $\varepsilon$ in advance. The obtained regret bound matches the regret bound by Lattimore et al. (2020) when $\varepsilon$ is known in advance. We remark that the proposed algorithm is optimal when $\varepsilon = 0$, i.e., for the case of the LCB problem.

Organizátor

Kategorie

O organizátorovi (AISTATS 2021)

The 24th International Conference on Artificial Intelligence and Statistics was held virtually from Tuesday, 13 April 2021 to Thursday, 15 April 2021.

Uložení prezentace

Měla by být tato prezentace uložena po dobu 1000 let?

Jak ukládáme prezentace

Pro uložení prezentace do věčného trezoru hlasovalo 0 diváků, což je 0.0 %

Sdílení

Doporučená videa

Prezentace na podobné téma, kategorii nebo přednášejícího

Zajímají Vás podobná videa? Sledujte AISTATS 2021