A Parameter-Free Algorithm for Misspecified Linear Contextual Bandits

Apr 14, 2021

Speakers

About

We investigate the misspecified linear contextual bandit (MLCB) problem, which is a generalization of the linear contextual bandit (LCB) problem. The MLCB problem is a decision-making problem in which a learner observes $d$-dimensional feature vectors, called arms, chooses an arm from them, and then obtains a reward of the chosen arm in each round. The learner aims to maximize the sum of the rewards over $T$ rounds. In contrast to the LCB problem, the rewards in the MLCB problem may not be represented by a linear function in feature vectors, but is approximated by a linear function with additive approximation parameter $\varepsilon \geq 0$. In this paper, we propose an algorithm that achieves regret bound $\tilde{O}(d\sqrt{T} + \eps\sqrt{d}T)$, where $\tilde{O}(\cdot)$ ignores polylogarithmic factors in $d$ and $T$. This is the first algorithm for the MLCB problem that works without knowing the approximation parameter $\varepsilon$ in advance. The obtained regret bound matches the regret bound by Lattimore et al. (2020) when $\varepsilon$ is known in advance. We remark that the proposed algorithm is optimal when $\varepsilon = 0$, i.e., for the case of the LCB problem.

Organizer

Categories

About AISTATS 2021

The 24th International Conference on Artificial Intelligence and Statistics was held virtually from Tuesday, 13 April 2021 to Thursday, 15 April 2021.

Store presentation

Should this presentation be stored for 1000 years?

How do we store presentations

Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%

Sharing

Recommended Videos

Presentations on similar topic, category or speaker

Interested in talks like this? Follow AISTATS 2021