Dec 6, 2022
Human teammates often form conscious and subconscious expectations of each other to interact better. Teaming success is contingent on whether such expectations can be met. Similarly, for an intelligent agent to operate alongside humans, it must consider the human's expectations of its behavior; otherwise, it risks loss of trust and degraded team performance. A key challenge is that the human's expectation may not align with the agent's optimal behavior, due, for example, to the human's partial or inaccurate understanding of the task domain. Prior work on explicable planning describes the ability of agents to respect their human teammate's expectations by trading off task performance for more expected, or "explicable", behaviors. In this paper, we introduce Explicable Policy Search (EPS) to significantly extend such an ability to a reinforcement learning (RL) setting and to handle stochastic domains with continuous state and action spaces. In contrast to traditional RL methods, however, EPS must simultaneously infer the human's hidden expectations. Such inference requires information about the human's beliefs about the domain dynamics and her reward model, but querying them directly is impractical. We demonstrate that they can be sufficiently encoded by a surrogate reward function, which can be learned from the human's feedback on the agent's behavior. The surrogate reward function is then used to reshape the agent's reward function, which is shown to be equivalent to searching for an explicable policy. We evaluate our method for EPS in a set of continuous navigation domains with synthetic human models and in an autonomous driving domain with a user study.
The results suggest that our method can generate explicable behaviors that reconcile task performance with human expectation intelligently, and has real-world relevance in human-agent teaming domains.
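The core reshaping idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the linear combination, and the trade-off weight `lam` are all assumptions introduced here for clarity.

```python
# Hedged sketch: reshape the agent's task reward with a surrogate reward
# that encodes the human's expectation of the agent's behavior.
# All names (task_reward, surrogate_reward, lam) are illustrative only.

def reshaped_reward(state, action, task_reward, surrogate_reward, lam=0.5):
    """Combine the environment's task reward with a learned surrogate
    reward; `lam` trades off task performance against explicability."""
    return task_reward(state, action) + lam * surrogate_reward(state, action)


# Usage: plug in toy reward functions for a single (state, action) pair.
task = lambda s, a: 1.0        # stand-in for the domain's task reward
surrogate = lambda s, a: 2.0   # stand-in for the learned surrogate reward
r = reshaped_reward(None, None, task, surrogate, lam=0.5)
```

Running RL on `reshaped_reward` instead of `task_reward` is, per the abstract, equivalent to searching directly for an explicable policy.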