Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin · Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection · SlidesLive

Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection

Dec 15, 2023

Speakers

Jun Yan

Vikas Yadav

Shiyang Li

About

Instruction-tuned Large Language Models (LLMs) have demonstrated remarkable abilities to modulate their responses based on human instructions. However, this modulation capacity also introduces the potential for attackers to employ fine-grained manipulation of model functionalities by planting backdoors. In this paper, we introduce Virtual Prompt Injection (VPI) as a novel backdoor attack setting tailored for instruction-tuned LLMs. In a VPI attack, the backdoored model is expected to respond as…

Organizer

NeurIPS 2023
