Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Łukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva · Sparse is Enough in Scaling Transformers · SlidesLive

Sparse is Enough in Scaling Transformers

Dec 6, 2021

Speakers

Sebastian Jaszczur

Aakanksha Chowdhery

Afroz Mohiuddin

About

Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that their use and study becomes out of reach. We address this problem by leveraging sparsity. We study sparse variants for all layers in the Transformer and propose Scaling Transformers, a family of next generation Transformer models that use sparse layers to scale efficiently and decode much faster than the standard Transformer as we scale up the model size. S…

Organizer

NeurIPS 2021

About NeurIPS 2021

Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia and oral and poster presentations of refereed papers. Following the conference, there are workshops which provide a less formal setting.
