Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Collier, Alexey Gritsenko, Vighnesh N Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetic, Dustin Tran, Thomas Kipf, Mario Lucic, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby · Scaling Vision Transformers to 22 Billion Parameters · SlidesLive

Scaling Vision Transformers to 22 Billion Parameters

July 24, 2023

Speakers

Mostafa Dehghani

Josip Djolonga

Basil Mustafa

About the presentation

The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modeling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al., 2022). We present a recipe for highly efficient and stable training of a 22B-parameter ViT (ViT-2…

Organizer

ICML 2023
