            Mimetic Initialization of Self-Attention Layers
            Jul 25, 2023

            Speakers

            Asher Trockman

            Zico Kolter

            About

            It is notoriously difficult to train Transformers on small datasets; typically, large pre-trained models are instead used as the starting point. We explore the weights of such pre-trained Transformers (particularly for vision) to attempt to find reasons for this discrepancy. Surprisingly, we find that simply initializing the weights of self-attention layers so that they "look" more like their pre-trained counterparts allows us to train vanilla Transformers faster and to higher final accuracies,…

            Organizer

            ICML 2023

