Asher Trockman, Zico Kolter · Mimetic Initialization of Self-Attention Layers · SlidesLive

Jul 25, 2023

Speakers

Asher Trockman

Zico Kolter

About

It is notoriously difficult to train Transformers on small datasets; typically, large pre-trained models are instead used as the starting point. We explore the weights of such pre-trained Transformers (particularly for vision) to attempt to find reasons for this discrepancy. Surprisingly, we find that simply initializing the weights of self-attention layers so that they "look" more like their pre-trained counterparts allows us to train vanilla Transformers faster and to higher final accuracies,…
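
The abstract is cut off before it describes the procedure, so the following is only a minimal sketch of one possible reading, not the speakers' confirmed method. It assumes that "looking like" pre-trained weights means the query-key product has a pronounced positive diagonal. The function name `identity_like_qk_init` and the constants `alpha` and `beta` are hypothetical, chosen for illustration.

```python
import numpy as np

def identity_like_qk_init(dim, alpha=0.7, beta=0.7, seed=0):
    """Return (w_q, w_k) whose product w_q @ w_k.T is roughly beta * I plus noise.

    Assumption: this structure is an illustrative guess at what
    "resembling pre-trained weights" could mean; alpha and beta are made up.
    """
    rng = np.random.default_rng(seed)
    # Target product: a scaled identity plus small Gaussian noise.
    noise = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(dim, dim))
    target = beta * np.eye(dim) + alpha * noise
    # Factor target = U diag(s) V^T and split the singular values evenly
    # between the two factors.
    u, s, vt = np.linalg.svd(target)
    w_q = u * np.sqrt(s)      # scales the columns of U
    w_k = vt.T * np.sqrt(s)   # scales the columns of V
    return w_q, w_k

w_q, w_k = identity_like_qk_init(64)
product = w_q @ w_k.T  # reconstructs the identity-dominated target
```

Under these assumptions, each token's query aligns most strongly with its own key at initialization, which is one concrete way a freshly initialized attention layer could be biased toward a chosen structure instead of starting from unstructured random weights.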

Organizer

ICML 2023
