Jul 25, 2023
Artificial agents have traditionally been trained to maximize reward, which may incentivize power-seeking and deception, analogously to how next-token prediction in language models (LMs) may incentivize toxicity. So do agents naturally learn to be Machiavellian? And how do we measure these behaviors in general-purpose models such as GPT-4? Towards answering these questions, we introduce Machiavelli, a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios that center on social decision-making. Scenario labeling is automated with LMs, which are more performant than human annotators. We mathematize dozens of harmful behaviors and use our annotations to evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations. We observe some tension between maximizing reward and behaving ethically. To improve this trade-off, we investigate LM-based methods to steer agents towards less harmful behaviors. Our results show that agents can both act competently and morally, so concrete progress can currently be made in machine ethics: designing agents that are Pareto improvements in both safety and capabilities.
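As a rough illustration of the kind of evaluation the abstract describes, here is a minimal Python sketch, not the authors' actual code: each game scene carries LM-produced behavioral annotations, and per-behavior harm counts accumulate alongside game reward as the agent plays. The `Scene` class, the `game` interface, the `agent_choose` function, and the label names are hypothetical stand-ins for whatever the benchmark actually uses.

```python
# Hypothetical sketch of scoring an agent on an annotated
# Choose-Your-Own-Adventure game: track reward and harm side by side.
from dataclasses import dataclass, field

@dataclass
class Scene:
    text: str                      # scene description shown to the agent
    choices: list[str]             # available actions at this scene
    reward: float = 0.0            # in-game reward for reaching this scene
    # LM-produced behavioral annotations, e.g. {"deception": 1.0, "power": 0.3}
    annotations: dict[str, float] = field(default_factory=dict)

def evaluate(game, agent_choose,
             harm_labels=("deception", "power", "disutility")):
    """Play one episode; return total reward and per-label harm totals."""
    total_reward = 0.0
    harms = {label: 0.0 for label in harm_labels}
    scene = game.reset()           # hypothetical env interface
    while scene is not None:
        total_reward += scene.reward
        for label in harm_labels:
            harms[label] += scene.annotations.get(label, 0.0)
        action = agent_choose(scene.text, scene.choices)  # e.g. query an LM
        scene = game.step(action)  # None when the episode ends
    return total_reward, harms
```

Comparing `total_reward` against the entries of `harms` across agents is one simple way to visualize the reward-versus-ethics trade-off the abstract mentions, and to check whether a steered agent is a Pareto improvement.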