Next
Livestream will start soon!
Livestream has already ended.
Presentation has not been recorded yet!
  • title: Forbidden Facts: An Investigation into Competing Objectives in Llama 2
      0:00 / 0:00
      • Report Issue
      • Settings
      • Playlists
      • Bookmarks
      • Subtitles Off
      • Playback rate
      • Quality
      • Settings
      • Debug information
      • Server sl-yoda-v2-stream-007-alpha.b-cdn.net
      • Subtitles size Medium
      • Bookmarks
      • Server
      • sl-yoda-v2-stream-007-alpha.b-cdn.net
      • sl-yoda-v2-stream-007-beta.b-cdn.net
      • 1678031076.rsc.cdn77.org
      • 1932936657.rsc.cdn77.org
      • Subtitles
      • Off
      • English
      • Playback rate
      • Quality
      • Subtitles size
      • Large
      • Medium
      • Small
      • Mode
      • Video Slideshow
      • Audio Slideshow
      • Slideshow
      • Video
      My playlists
        Bookmarks
          00:00:00
            Forbidden Facts: An Investigation into Competing Objectives in Llama 2
            • Settings
            • Sync diff
            • Quality
            • Settings
            • Server
            • Quality
            • Server

            Forbidden Facts: An Investigation into Competing Objectives in Llama 2

            Dec 15, 2023

            Speakers

            TW

            Tony Wang

            Speaker · 2 followers

            MW

            Miles Wang

            Speaker · 0 followers

            KH

            Kaivu Hariharan

            Speaker · 0 followers

            About

            LLMs often face competing pressures (for example helpfulness vs. harmlessness). To understand how models resolve such conflicts, we study Llama-2-7b-chat on the forbidden fact task. Specifically, we instruct Llama 2 to truthfully complete a factual recall statement while forbidding it from saying the correct answer. This often makes the model give incorrect answers. We decompose Llama 2 into 1057 different components, and rank each one with respect to how useful it is for forbidding the correct…

            Organizer

            N2
            N2

            NeurIPS 2023

            Account · 54 followers

            Like the format? Trust SlidesLive to capture your next event!

            Professional recording and live streaming, delivered globally.

            Sharing

            Recommended Videos

            Presentations on similar topic, category or speaker

            World of Bits: An OPen-Domain Platform for Web-Based Agents
            31:39

            World of Bits: An OPen-Domain Platform for Web-Based Agents

            Tianlin Shi, …

            N2
            N2
            NeurIPS 2023 14 months ago

            Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%

            Analyzing Human Movement on a Planetary Scale
            23:41

            Analyzing Human Movement on a Planetary Scale

            Scott Delp

            N2
            N2
            NeurIPS 2023 14 months ago

            Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%

            Best Arm Identification with Fixed Budget: A Large Deviation Perspective
            04:55

            Best Arm Identification with Fixed Budget: A Large Deviation Perspective

            Po-An Wang, …

            N2
            N2
            NeurIPS 2023 14 months ago

            Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%

            An (unhelpful) guide to selecting the best ASR architecture for your under-resourced language
            16:49

            An (unhelpful) guide to selecting the best ASR architecture for your under-resourced language

            Robbie Jimerson

            N2
            N2
            NeurIPS 2023 14 months ago

            Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%

            Finding Counterfactually Optimal Action Sequences in Continuous State Spaces
            04:47

            Finding Counterfactually Optimal Action Sequences in Continuous State Spaces

            Stratis Tsirtsis, …

            N2
            N2
            NeurIPS 2023 14 months ago

            Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%

            Characterizing Out-of-Distribution Error via Optimal Transport
            05:02

            Characterizing Out-of-Distribution Error via Optimal Transport

            Yuzhe Lu, …

            N2
            N2
            NeurIPS 2023 14 months ago

            Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%

            Interested in talks like this? Follow NeurIPS 2023