            Is RLHF More Difficult than Standard RL? A Theoretical Perspective

            Dec 10, 2023

            Speakers

            Yuanhao Wang

            Qinghua Liu

            Chi Jin

            About

            Reinforcement Learning from Human Feedback (RLHF) learns from preference signals, while standard Reinforcement Learning (RL) learns directly from reward signals. Preferences arguably contain less information than rewards, which makes preference-based RL seem more difficult. This paper proves theoretically that, for a wide range of preference models, preference-based RL can be solved directly with existing algorithms and techniques for reward-based RL, at little or no extra cost. Specifica…
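
            To make the contrast between preference signals and reward signals concrete, here is a minimal sketch. It assumes a Bradley-Terry (logistic) link between rewards and preferences; that choice is an illustrative assumption, since the abstract is truncated before it names any specific preference model. The function names are hypothetical and are not taken from the paper.

```python
import math
import random


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Probability that option a is preferred over option b.

    Assumption: a Bradley-Terry (logistic) link on the reward difference.
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def sample_preference(reward_a: float, reward_b: float, rng: random.Random) -> int:
    """Return 1 if a is preferred over b in one noisy comparison, else 0.

    The learner sees only this single bit, never the rewards themselves.
    """
    return 1 if rng.random() < preference_probability(reward_a, reward_b) else 0


def estimate_reward_gap(
    reward_a: float, reward_b: float, num_queries: int, rng: random.Random
) -> float:
    """Estimate reward_a - reward_b from repeated preference queries.

    Inverts the logistic link on the empirical win rate. The win rate is
    clipped away from 0 and 1 so the logit stays finite.
    """
    wins = sum(sample_preference(reward_a, reward_b, rng) for _ in range(num_queries))
    win_rate = min(max(wins / num_queries, 1e-6), 1.0 - 1e-6)
    return math.log(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    rng = random.Random(0)
    # A reward-based learner would observe 2.0 and 1.0 directly;
    # a preference-based learner only sees comparison outcomes.
    print(estimate_reward_gap(2.0, 1.0, num_queries=20000, rng=rng))
```

            Under this assumed link, a single comparison carries one bit, but repeated comparisons recover the reward difference, which gives some intuition for why preference feedback need not be much harder to learn from than reward feedback.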

            Organizer

            NeurIPS 2023
