Nov 28, 2022
Large language models have been widely adopted but require significant GPU memory for inference and finetuning. We develop methods for Int8 matrix multiplication for transformer multi-layer perceptron (MLP) and attention projection layers, which cut the required memory for inference by half while retaining full precision performance. With our method, a 16/32-bit checkpoint can be loaded, converted to Int8, and used immediately without performance degradation – no post-quantization training is required. The key challenge, which we empirically show for the first time, is that existing quantization methods perform poorly at scale due to emergent outlier feature dimensions. We find that standard quantization techniques for matrix multiplication fail beyond 1.3B parameters. To overcome this barrier, we develop vector-wise quantization, which keeps separate normalization constants for each inner product in the matrix multiplication. Additionally, we identify layer and input invariant feature dimensions in the hidden states, which heavily influence attention and disrupt quantization methods starting at 13B parameters. To scale to 13B, we develop a new mixed-precision matrix decomposition scheme, which allows scaling without performance degradation to at least 13B parameters. This result makes large transformers more accessible, for example, by enabling inference with GPT-J and T5-11B on a single free cloud GPU, GPT-NeoX-20B on a single gaming-grade GPU, and OPT-30B on a single data-center-grade GPU. We open source our software.
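The two ingredients described above (vector-wise quantization with separate normalization constants for each inner product, and a mixed-precision decomposition that routes outlier feature dimensions through a 16/32-bit matmul) can be illustrated with a minimal sketch. This is not the released kernels: the function name, the 6.0 outlier threshold, and the CPU Int32 accumulation are illustrative assumptions.

```python
import torch

def vectorwise_int8_matmul(X, W, outlier_threshold=6.0):
    """Reference sketch (CPU, not optimized kernels): vector-wise Int8 matmul
    with mixed-precision decomposition of outlier feature dimensions.

    X: (tokens, hidden) activations; W: (hidden, out) weights, floating point.
    """
    # Hidden dimensions whose activation magnitude exceeds the threshold are
    # treated as outlier features and kept in the original precision.
    outlier_cols = X.abs().amax(dim=0) > outlier_threshold
    regular_cols = ~outlier_cols

    # --- Int8 path with vector-wise quantization ---
    X_reg, W_reg = X[:, regular_cols], W[regular_cols, :]
    cx = X_reg.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)  # one constant per row of X
    cw = W_reg.abs().amax(dim=0, keepdim=True).clamp(min=1e-8)  # one constant per column of W
    X_i8 = torch.round(X_reg / cx * 127).to(torch.int8)
    W_i8 = torch.round(W_reg / cw * 127).to(torch.int8)
    # Accumulate in Int32, then rescale each output element by the outer
    # product of its row/column constants, i.e. (cx * cw) / 127^2.
    acc = X_i8.to(torch.int32) @ W_i8.to(torch.int32)
    out_int8 = acc.to(X.dtype) * (cx * cw) / (127 * 127)

    # --- 16/32-bit path for the outlier feature dimensions ---
    out_fp = X[:, outlier_cols] @ W[outlier_cols, :]

    return out_int8 + out_fp

# Tiny usage check against a full-precision matmul.
X = torch.randn(4, 64)
W = torch.randn(64, 16)
X[:, 3] *= 20.0  # inject an "outlier" feature dimension
print((vectorwise_int8_matmul(X, W) - X @ W).abs().max())
```

Because every inner product gets its own pair of scaling constants, and the few large-magnitude feature dimensions are pulled out into the higher-precision path, a single outlier can no longer force the quantization range of an entire tensor, which is the failure mode the abstract attributes to emergent outlier features at scale.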