            Scissorhands: Exploiting the Persistence of Importance Hypothesis for Cache Compression at Test Time

            Dec 10, 2023

            Speakers

            Zichang Liu

            Aditya Desai

            Fangshuo Liao

            About

            Large language models (LLMs) have sparked a new wave of exciting AI applications. Hosting these models at scale requires significant memory resources. One crucial memory bottleneck for deployment stems from the context window. It is commonly recognized that model weights are memory hungry; however, the size of the key-value embeddings stored during the generation process (the KV cache) can easily surpass the model size. The enormous size of the KV cache puts constraints on the inference batch size, wh…
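
The claim that the KV cache can exceed the model size can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the configuration (96 layers, hidden size 12288, 175 billion parameters, 16-bit values, batch size 128, sequence length 2048) is a hypothetical example that is not taken from this page, and it assumes standard multi-head attention, where each token stores one key and one value vector of the full hidden size per layer.

```python
GIB = 2**30  # bytes per gibibyte


def kv_cache_bytes(num_layers, hidden_size, seq_len, batch_size, bytes_per_value=2):
    """Bytes needed to cache keys and values for every token in the batch.

    Assumes standard multi-head attention: per layer, each token keeps one
    key vector and one value vector, each of length hidden_size.
    """
    # Factor 2 accounts for storing both the key and the value.
    return 2 * num_layers * hidden_size * seq_len * batch_size * bytes_per_value


def weight_bytes(num_params, bytes_per_value=2):
    """Bytes needed to hold the model weights at the given precision."""
    return num_params * bytes_per_value


# Hypothetical configuration (an assumption for illustration, not from the source).
cache = kv_cache_bytes(num_layers=96, hidden_size=12288, seq_len=2048, batch_size=128)
weights = weight_bytes(num_params=175_000_000_000)

print(f"KV cache: {cache / GIB:.0f} GiB")
print(f"Weights:  {weights / GIB:.0f} GiB")
print(f"Cache is {cache / weights:.1f}x the size of the weights")
```

Under these assumptions the cache grows linearly with both batch size and sequence length, while the weight footprint stays fixed, which is why the cache becomes the limiting factor for large batches.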

            Organizer

            NeurIPS 2023
