Zichang Liu, Aditya Desai, Fangshuo Liao, Weitao Wang, Victor Xie, Zhaozhuo Xu, Anastasios Kyrillidis, Anshumali Shrivastava · Scissorhands: Exploiting the Persistence of Importance Hypothesis for Cache Compression at Test Time · SlidesLive

Scissorhands: Exploiting the Persistence of Importance Hypothesis for Cache Compression at Test Time

Dec 10, 2023

Speakers

Zichang Liu

Speaker · 0 followers

Aditya Desai

Speaker · 0 followers

Fangshuo Liao

Speaker · 0 followers

About

Large language models (LLMs) have sparked a new wave of exciting AI applications. Hosting these models at scale requires significant memory resources. One crucial memory bottleneck for deployment stems from the context window. It is commonly recognized that model weights are memory hungry; however, the size of the key-value embeddings stored during the generation process (the KV cache) can easily surpass the model size. The enormous size of the KV cache puts constraints on the inference batch size, wh…
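To make the claim about KV cache size concrete, here is a minimal back-of-the-envelope sketch. It assumes standard multi-head attention, where each layer stores one key and one value vector per head for every cached token. The model configuration, batch size, and sequence length below are hypothetical values chosen for illustration, not figures from the talk.

```python
def kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, batch_size, bytes_per_elem=2):
    """Estimate KV cache size for standard multi-head attention.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to 16-bit floats.
    """
    return 2 * n_layers * n_heads * head_dim * seq_len * batch_size * bytes_per_elem


def weight_bytes(n_params, bytes_per_elem=2):
    """Estimate the memory taken by the model weights alone."""
    return n_params * bytes_per_elem


# Hypothetical configuration (assumed for illustration only).
config = {"n_layers": 32, "n_heads": 32, "head_dim": 128}
n_params = 7_000_000_000

cache = kv_cache_bytes(seq_len=2048, batch_size=64, **config)
weights = weight_bytes(n_params)

gib = 2 ** 30
print(f"KV cache: {cache / gib:.1f} GiB")
print(f"Weights:  {weights / gib:.1f} GiB")
print(f"Cache / weights ratio: {cache / weights:.2f}")
```

Under these assumed numbers the cache grows linearly with both batch size and sequence length while the weight memory stays fixed, which is why a large batch or a long context can push the cache past the model size.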

Organizer

NeurIPS 2023

Account · 646 followers

Recommended Videos

Presentations on similar topic, category or speaker

SOL: Sampling-based Optimal Linear bounding of arbitrary scalar functions

04:40

Yuriy Biktairov, …

NeurIPS 2023 16 months ago

Mindstorms in Natural Language-Based Societies of Mind

09:07

Mingchen Zhuge, …

NeurIPS 2023 16 months ago

Bayesian Metric Learning for Uncertainty Quantification in Image Retrieval

04:41

Frederik Warburg, …

NeurIPS 2023 16 months ago

A benchmark of categorical encoders for binary classification

04:24

Federico Matteucci, …

NeurIPS 2023 16 months ago

Streaming Algorithms and Lower Bounds for Estimating Correlation Clustering Cost

04:40

Vihan Shah, …

NeurIPS 2023 16 months ago

Panel Discussion

37:48

Danielle Belgrave, …

NeurIPS 2023 16 months ago