Dec 15, 2023
Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. However, existing architectures such as Transformers scale quadratically along both these axes. We ask: are there performant architectures that can scale sub-quadratically along sequence length and model dimension? We introduce Monarch Mixer (M2), a new architecture that uses the same sub-quadratic primitive along both sequence length and model dimension: Monarch matrices, a simple class of expressive structured matrices that captures many linear transforms, achieves high hardware efficiency on GPUs, and scales sub-quadratically. As a proof of concept, we explore the performance of M2 in three domains: non-causal BERT-style language modeling, ViT-style image classification, and causal GPT-style language modeling. For non-causal BERT-style modeling, M2 matches BERT-base and BERT-large in downstream GLUE quality with up to 27% fewer parameters, and achieves up to 9.1$\times$ higher throughput at sequence length 4K. On ImageNet, M2 outperforms ViT-b by 1% in accuracy, with only half the parameters. Causal GPT-style models introduce a technical challenge: enforcing causality via masking introduces a quadratic bottleneck. To alleviate this bottleneck, we develop a novel theoretical view of Monarch matrices based on multivariate polynomial evaluation and interpolation, which lets us parameterize M2 to be causal while remaining sub-quadratic. Using this parameterization, M2 matches GPT-style Transformers at 360M parameters in pretraining perplexity on The PILE—showing for the first time that it may be possible to match Transformer quality without attention or MLPs.
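As a rough illustration of why the Monarch primitive is sub-quadratic, the NumPy sketch below applies a single Monarch matrix to a vector of length n = b*b, assuming the standard factorization of a Monarch matrix into two block-diagonal factors interleaved with a fixed reshape-and-transpose shuffle permutation. The function name monarch_matvec and the block layout are illustrative assumptions, not code from the talk or paper; the point is only that the whole product costs O(n^1.5) multiply-adds rather than the O(n^2) of a dense matmul.

import numpy as np

def monarch_matvec(L_blocks, R_blocks, x):
    """Sketch: multiply a Monarch matrix M = P L P^T R by a vector x.

    L_blocks, R_blocks: shape (b, b, b), holding the b dense b-by-b blocks
    of the block-diagonal factors L and R.  P is the fixed "reshape and
    transpose" shuffle on n = b*b indices.  Total cost is 2 * b * b^2,
    i.e. O(n^{3/2}), versus n^2 for a dense matrix-vector product.
    """
    b = R_blocks.shape[0]
    z = x.reshape(b, b)                        # chunk i of x is row i
    z = np.einsum("bij,bj->bi", R_blocks, z)   # block-diagonal factor R
    z = z.T                                    # shuffle permutation P^T
    z = np.einsum("bij,bj->bi", L_blocks, z)   # block-diagonal factor L
    return z.T.reshape(-1)                     # shuffle P, then flatten

# Sanity check against the equivalent dense matrix for a tiny size.
b, n = 4, 16
rng = np.random.default_rng(0)
L_blocks = rng.standard_normal((b, b, b))
R_blocks = rng.standard_normal((b, b, b))
x = rng.standard_normal(n)

L = np.zeros((n, n)); R = np.zeros((n, n))
for i in range(b):
    L[i*b:(i+1)*b, i*b:(i+1)*b] = L_blocks[i]
    R[i*b:(i+1)*b, i*b:(i+1)*b] = R_blocks[i]
P = np.eye(n)[np.arange(n).reshape(b, b).T.reshape(-1)]  # transpose shuffle

assert np.allclose(P @ L @ P.T @ R @ x, monarch_matvec(L_blocks, R_blocks, x))

M2 mixes along the sequence axis and the model-dimension axis with this same kind of structured multiply, which is what lets both axes scale sub-quadratically; how the causal (GPT-style) variant is parameterized via polynomial evaluation and interpolation is covered in the talk itself.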
Presentations on similar topic, category or speaker
Pei Chen, …
Haoyi Duan, …
JungWoo Chae, …