Dec 6, 2021
This paper addresses the problem of policy selection in domains with abundant logged data but a very restricted interaction budget. Solving this problem would enable safe evaluation and deployment of offline reinforcement learning policies in industry, robotics, and healthcare, among other domains. Several off-policy evaluation (OPE) techniques have been proposed to assess the value of policies using only logged data. However, there is still a big gap between evaluation by OPE and full online evaluation in the real environment. To reduce this gap, we introduce a novel active offline policy selection problem formulation, which relies both on logged data and on limited online interactions with the environment. We build upon Bayesian optimization to decide which policies to evaluate in a sequential manner. We rely on advances in OPE to warm-start the evaluation and combine it with online policy evaluations as evaluation progresses. To make this approach scalable to a large number of policies, we introduce a kernel function to model similarity between policies. We use several benchmark environments to show that the proposed approach improves upon state-of-the-art OPE estimates and outperforms common interaction strategies, including uniform sampling and independent multi-armed bandits.
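The abstract's recipe — a Gaussian-process surrogate over policy values, warm-started by OPE estimates and refined by a few online rollouts chosen via an acquisition rule — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the RBF kernel over hypothetical policy feature vectors stands in for the paper's policy-similarity kernel, and the UCB acquisition, `noise`, and `beta` parameters are assumptions for the sketch.

```python
import numpy as np

def rbf_kernel(X, Y, length_scale=1.0):
    # Squared-exponential kernel over policy feature vectors
    # (a stand-in for the paper's policy-similarity kernel).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def select_next_policy(features, ope_estimates, evaluated_idx, online_returns,
                       noise=0.1, beta=2.0):
    """Pick the next policy for online evaluation via a UCB rule on a GP
    whose prior mean is each policy's OPE estimate (hypothetical setup)."""
    K = rbf_kernel(features, features)
    mu = np.asarray(ope_estimates, dtype=float)  # prior mean from OPE warm start
    if evaluated_idx:
        idx = np.array(evaluated_idx)
        y = np.asarray(online_returns, dtype=float) - mu[idx]  # residuals vs. OPE prior
        K_oo = K[np.ix_(idx, idx)] + noise ** 2 * np.eye(len(idx))
        K_ao = K[:, idx]
        post_mean = mu + K_ao @ np.linalg.solve(K_oo, y)
        post_var = np.diag(K) - np.einsum(
            'ij,ji->i', K_ao, np.linalg.solve(K_oo, K_ao.T))
    else:
        post_mean, post_var = mu, np.diag(K)  # no online data yet: prior only
    ucb = post_mean + beta * np.sqrt(np.maximum(post_var, 0.0))
    return int(np.argmax(ucb))  # policy to roll out next
```

Each online return then feeds back into `evaluated_idx`/`online_returns`, so uncertainty shrinks around evaluated policies and, through the kernel, around similar ones — which is what lets the method scale beyond evaluating every policy independently.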
Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia and oral and poster presentations of refereed papers. Following the conference, there are workshops which provide a less formal setting.