Samuel Kaufman, Phitchaya Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows · A Learned Performance Model for a Deep Learning Accelerator · SlidesLive

Category

CS


A Learned Performance Model for a Deep Learning Accelerator

April 4, 2021


About the presentation

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for a specific program. However, they are difficult to develop because contemporary processors are complex, and the recent proliferation of deep learning accelerators has increased the development burden. We demonstrate a method of learning performance models from a corpus of tensor computation graph programs for a heavily used deep learning accelerator. We train a neural network over kernel-level sub-graphs from the corpus and find that the learned model outperforms a heavily optimized analytical performance model used in the production XLA compiler on the tile-size selection task. We also contribute a new performance model for the XLA fusion autotuner that reduces tuning time on the hardware accelerator.
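To illustrate how an autotuner can use a performance model to cut down time on the accelerator, here is a minimal sketch. The flow (rank every candidate tile size with a cheap model prediction, then benchmark only a short list on hardware) is an illustrative assumption, not necessarily the procedure used in the talk. `select_tile_size`, `toy_model`, and `toy_hardware` are hypothetical names. The toy formulas merely stand in for the learned neural network and for real hardware measurements and are not XLA APIs.

```python
from typing import Callable, List, Sequence, Tuple

TileSize = Tuple[int, int]  # hypothetical (rows, cols) tile shape


def select_tile_size(
    candidates: Sequence[TileSize],
    predict_runtime: Callable[[TileSize], float],
    measure_on_hardware: Callable[[TileSize], float],
    top_k: int = 3,
) -> TileSize:
    """Rank candidates with a cost model, then benchmark only the best top_k."""
    if not candidates:
        raise ValueError("need at least one candidate tile size")
    # Cheap step: score every candidate with the model (no accelerator time used).
    ranked = sorted(candidates, key=predict_runtime)
    # Expensive step: measure only the shortlist on the accelerator.
    shortlist = ranked[: max(1, top_k)]
    return min(shortlist, key=measure_on_hardware)


# --- Toy stand-ins (hypothetical; NOT the paper's learned model or an XLA API) ---
def toy_model(tile: TileSize) -> float:
    # Pretend the model believes 128x128 tiles are fastest.
    rows, cols = tile
    return abs(rows - 128) + abs(cols - 128)


hardware_calls: List[TileSize] = []


def toy_hardware(tile: TileSize) -> float:
    # Record every "accelerator run" so we can count them.
    hardware_calls.append(tile)
    rows, cols = tile
    # The "real" optimum differs slightly from the model's belief.
    return abs(rows - 128) + abs(cols - 80)


candidates = [(r, c) for r in (32, 64, 128, 256) for c in (32, 64, 128, 256)]
best = select_tile_size(candidates, toy_model, toy_hardware, top_k=3)
print(best, len(hardware_calls))  # → (128, 64) 3
```

With 16 candidates, only 3 are measured on the (simulated) hardware. The hardware step also corrects the model's imperfect ranking: the model ranks (128, 128) first, but the measured winner is (128, 64). Raising `top_k` trades more accelerator time for more robustness to model error.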

Organizer

About the organizer (MLSys 2021)

The Conference on Machine Learning and Systems targets research at the intersection of machine learning and systems. The conference aims to elicit new connections amongst these fields, including identifying best practices and design principles for learning systems, as well as developing novel learning methods and theory tailored to practical machine learning workflows.
