A Parameter-Free Algorithm for Misspecified Linear Contextual Bandits

Apr 14, 2021

Sprecher:innen

Über

We investigate the misspecified linear contextual bandit (MLCB) problem, which is a generalization of the linear contextual bandit (LCB) problem. The MLCB problem is a decision-making problem in which a learner observes $d$-dimensional feature vectors, called arms, chooses an arm from them, and then obtains a reward of the chosen arm in each round. The learner aims to maximize the sum of the rewards over $T$ rounds. In contrast to the LCB problem, the rewards in the MLCB problem may not be represented by a linear function in feature vectors, but is approximated by a linear function with additive approximation parameter $\varepsilon \geq 0$. In this paper, we propose an algorithm that achieves regret bound $\tilde{O}(d\sqrt{T} + \eps\sqrt{d}T)$, where $\tilde{O}(\cdot)$ ignores polylogarithmic factors in $d$ and $T$. This is the first algorithm for the MLCB problem that works without knowing the approximation parameter $\varepsilon$ in advance. The obtained regret bound matches the regret bound by Lattimore et al. (2020) when $\varepsilon$ is known in advance. We remark that the proposed algorithm is optimal when $\varepsilon = 0$, i.e., for the case of the LCB problem.

Organisator

Kategorien

Über AISTATS 2021

The 24th International Conference on Artificial Intelligence and Statistics was held virtually from Tuesday, 13 April 2021 to Thursday, 15 April 2021.

Präsentation speichern

Soll diese Präsentation für 1000 Jahre gespeichert werden?

Wie speichern wir Präsentationen?

Ewigspeicher-Fortschrittswert: 0 = 0.0%

Freigeben

Empfohlene Videos

Präsentationen, deren Thema, Kategorie oder Sprecher:in ähnlich sind

Interessiert an Vorträgen wie diesem? AISTATS 2021 folgen