Dec 10, 2023
Recently, we have seen a growing interest in device-control systems that can interpret human natural language instructions and execute them on a device by directly controlling its UI. We present a new dataset for mobile device control, AndroidInTheWild, which is orders of magnitude larger than current datasets. The dataset contains human-collected demonstrations of natural language instructions, UI screens, and actions. It consists of 715k episodes spanning 30k unique prompts, four versions of Android (v10–13), and 8 different device types (Pixel 2 XL to Pixel 6) with varying screen resolutions. It contains long-horizon multi-step tasks that require semantic understanding. In addition to its language complexity, this dataset poses a new challenge, as the UI elements composing an application's screen must be inferred solely from its visual appearance. Instead of simple UI element-based actions, the action space consists of precise gestures allowing for a higher degree of interactivity (e.g., horizontal scrolls to operate carousel widgets). We organize our dataset to support and encourage robustness analysis of device-control systems, i.e., how well a system performs in the presence of new task descriptions, new applications, or new platform versions. We develop two baseline agents and report performance on the dataset. The dataset is available at https://github.com/google-research/google-research/tree/master/android_in_the_wild.
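To make the gesture-based action space more concrete, here is a minimal sketch of how a touch-and-lift gesture might be represented and told apart from a tap. It is an illustration only: the `Gesture` class, the `classify` function, the normalized coordinate convention, and the `TAP_DISTANCE_THRESHOLD` value are all assumptions made for this example, not the dataset's actual schema or the authors' code.

```python
import math
from dataclasses import dataclass
from typing import Tuple

# Hypothetical threshold: gestures that travel less than this fraction
# of the screen are treated as taps (assumed value, for illustration).
TAP_DISTANCE_THRESHOLD = 0.04


@dataclass(frozen=True)
class Gesture:
    """A touch-down / lift-off pair in normalized (x, y) coordinates in [0, 1].

    This structure is an assumption for the sketch, not the dataset's format.
    """
    touch: Tuple[float, float]
    lift: Tuple[float, float]


def classify(gesture: Gesture) -> str:
    """Label a gesture as a tap, a horizontal scroll, or a vertical scroll."""
    dx = gesture.lift[0] - gesture.touch[0]
    dy = gesture.lift[1] - gesture.touch[1]
    # A very short movement is treated as a tap at the touch point.
    if math.hypot(dx, dy) <= TAP_DISTANCE_THRESHOLD:
        return "tap"
    # Otherwise the dominant axis of movement decides the scroll direction.
    if abs(dx) > abs(dy):
        return "horizontal_scroll"
    return "vertical_scroll"
```

Under these assumptions, a swipe across a carousel such as `Gesture(touch=(0.8, 0.5), lift=(0.2, 0.5))` is classified as a horizontal scroll, which an action space limited to selecting UI elements could not express directly.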