Conservative Exploration in Bandits and Reinforcement Learning

Jul 17, 2020



A major challenge in deploying machine learning algorithms for decision-making problems is the lack of guarantee for the performance of their resulting policies, especially those generated during the initial exploratory phase of these algorithms. Online decision-making algorithms, such as those in bandits and reinforcement learning (RL), learn a policy while interacting with the real system. Although these algorithms will eventually learn a good or an optimal policy, there is no guarantee for the performance of their intermediate policies, especially at the very beginning, when they perform a large amount of exploration. Thus, in order to increase their applicability, it is important to control their exploration and to make it more conservative. To address this issue, we define a notion of safety that we refer to as safety w.r.t. a baseline. In this definition, a policy considered to be safe if it performs at least as well as a baseline, which is usually the current strategy of the company. We formulate this notion of safety in bandits and RL and show how it can be integrated into these algorithms as a constraint that must be satisfied uniformly in time. We derive contextual linear bandits and RL algorithms that minimize their regret, while ensure that at any given time, their expected sum of rewards remains above a fixed percentage of the expected sum of rewards of the baseline policy. This fixed percentage depends on the amount of risk that the manager of the system is willing to take. We prove regret bounds for our algorithms and show that the cost of satisfying the constraint (conservative exploration) can be controlled. Finally, we report experimental results to validate our theoretical analysis. We conclude the talk by discussing a few other constrained bandit formulations.



About ICML 2020

The International Conference on Machine Learning (ICML) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence known as machine learning. ICML is globally renowned for presenting and publishing cutting-edge research on all aspects of machine learning used in closely related areas like artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, and robotics. ICML is one of the fastest growing artificial intelligence conferences in the world. Participants at ICML span a wide range of backgrounds, from academic and industrial researchers, to entrepreneurs and engineers, to graduate students and postdocs.

Store presentation

Should this presentation be stored for 1000 years?

How do we store presentations

Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%


Recommended Videos

Presentations on similar topic, category or speaker

Interested in talks like this? Follow ICML 2020