            Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture

            Dec 15, 2023

            Speakers

            Dan Y. Fu

            Simran Arora

            Jessica Grogan

            About

            Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. However, existing architectures such as Transformers scale quadratically along both these axes. We ask: are there performant architectures that can scale sub-quadratically along sequence length and model dimension? We introduce Monarch Mixer (M2), a new architecture that uses the same sub-quadratic primitive along both sequence length and model dimens…
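The abstract is cut off before it names the primitive. What follows assumes it is the Monarch matrix the architecture is named after. In one common formulation, a Monarch matrix of size N = b * b is written M = P L P R. Here L and R are block-diagonal with b dense b x b blocks, and P is the permutation that reads a length-N vector as a b x b grid, transposes it, and flattens it again. Because transposing a square grid twice restores it, P is its own inverse.

With this structure, multiplying by M takes two batched small matrix multiplies instead of one dense N x N product. That is roughly 2 * N^1.5 multiply-adds rather than N^2. The pure-Python sketch below only illustrates the structure. All function names are hypothetical, and this is not the authors' implementation, which runs the block products as GEMMs on accelerators.

```python
# Illustrative sketch only: hypothetical helper names, pure Python for clarity.
# Monarch-style product M x with M = P L P R, where N = b * b,
# L and R are block-diagonal with b dense b x b blocks, and
# P reads a length-N vector as a row-major b x b grid, transposes it, and flattens it.


def block_diag_matvec(blocks, x):
    """Multiply a block-diagonal matrix, given as b dense b x b blocks, by x."""
    b = len(blocks)
    out = []
    for k, block in enumerate(blocks):
        chunk = x[k * b:(k + 1) * b]  # the slice of x that block k acts on
        for row in block:
            out.append(sum(a * c for a, c in zip(row, chunk)))
    return out


def transpose_perm(x, b):
    """Apply P: read x as a row-major b x b grid, transpose it, flatten it again."""
    return [x[j * b + i] for i in range(b) for j in range(b)]


def monarch_matvec(L, R, x):
    """Compute P L P R x with two block-diagonal products (about 2 * N**1.5 multiplies)."""
    b = len(L)
    if len(x) != b * b:
        raise ValueError("this sketch assumes N = b * b")
    y = block_diag_matvec(R, x)  # R x
    y = transpose_perm(y, b)     # P R x
    y = block_diag_matvec(L, y)  # L P R x
    return transpose_perm(y, b)  # P L P R x


# Dense reference, used only to check the structured path (O(N^2) memory).
def _matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def _dense_block_diag(blocks):
    b = len(blocks)
    n = b * b
    D = [[0.0] * n for _ in range(n)]
    for k, block in enumerate(blocks):
        for r in range(b):
            for c in range(b):
                D[k * b + r][k * b + c] = block[r][c]
    return D


def _dense_perm(b):
    n = b * b
    P = [[0.0] * n for _ in range(n)]
    for i in range(b):
        for j in range(b):
            P[i * b + j][j * b + i] = 1.0  # (P x)[i*b + j] = x[j*b + i]
    return P


def monarch_dense(L, R):
    """Materialize M = P L P R as an explicit N x N list of lists."""
    P = _dense_perm(len(L))
    return _matmul(P, _matmul(_dense_block_diag(L), _matmul(P, _dense_block_diag(R))))


if __name__ == "__main__":
    L = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, -1.0], [2.0, 0.0]]]
    R = [[[1.0, 0.0], [1.0, 1.0]], [[2.0, 1.0], [0.0, 3.0]]]
    x = [1.0, 2.0, 3.0, 4.0]
    print(monarch_matvec(L, R, x))  # -> [21.0, -10.5, 43.0, 6.0]
```

`monarch_dense` builds the full N x N matrix only as a correctness check. The point of the structure is that `monarch_matvec` never builds it. The same two-step pattern applies whether the "vector" runs along the sequence axis or the model-dimension axis.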

            Organizer

            NeurIPS 2023
