Forbidden Facts: An Investigation into Competing Objectives in Llama 2

Dec 15, 2023

About

LLMs often face competing pressures (for example, helpfulness versus harmlessness). To understand how models resolve such conflicts, we study Llama-2-7b-chat on the forbidden fact task: we instruct Llama 2 to truthfully complete a factual-recall statement while forbidding it from saying the correct answer. This often makes the model give incorrect answers. We decompose Llama 2 into 1057 components and rank each one by how useful it is for forbidding the correct answer. We find that, in aggregate, 41 components are enough to reliably implement the full suppression behavior. However, these components are fairly heterogeneous, and many operate using faulty heuristics. One of these heuristics can be exploited via manually designed adversarial attacks, which we call California Attacks. Our results highlight some of the roadblocks to successfully interpreting advanced ML systems.
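
The task setup can be illustrated with a short sketch. Below is a minimal example of how a competing-objectives prompt might be constructed and run against Llama-2-7b-chat with Hugging Face transformers. The exact prompt wording, the system-message phrasing, and the example fact are illustrative assumptions, not the prompts used in the work.

```python
# Minimal sketch of a "forbidden fact" prompt (wording is illustrative, not the
# exact prompt from the talk). Requires `transformers` and access to the gated
# meta-llama/Llama-2-7b-chat-hf checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"


def forbidden_fact_prompt(statement_prefix: str, forbidden_word: str) -> str:
    """Build a prompt with competing objectives: complete a factual statement
    truthfully, but never say the (correct) forbidden word."""
    system = (
        "Complete the user's statement truthfully with a single word, "
        f"but you are forbidden from saying '{forbidden_word}'."
    )
    # Standard Llama 2 chat formatting with a system block.
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{statement_prefix} [/INST]"


tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

# The correct completion is "Paris"; forbidding it often pushes the model
# toward an incorrect answer instead.
prompt = forbidden_fact_prompt("The capital of France is", "Paris")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

With prompts of this form, one can then measure how individual model components contribute to suppressing the forbidden answer, which is the analysis the talk describes.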
