A Learned Performance Model for a Deep Learning Accelerator

April 4, 2021

About the presentation

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for a specific program. However, they are difficult to develop because contemporary processors are complex, and the recent proliferation of deep learning accelerators has increased the development burden. We demonstrate a method of learning performance models from a corpus of tensor computation graph programs for a heavily used deep learning accelerator. We train a neural network over kernel-level sub-graphs from the corpus and find that the learned model outperforms a heavily optimized analytical performance model used in the production XLA compiler on the tile-size selection task. We contribute a new performance model for the XLA fusion autotuner, which reduces tuning time on the hardware accelerator.
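As a rough illustration of how a performance model can drive the tile-size selection task mentioned in the abstract, the sketch below picks the candidate tile with the lowest predicted cost. It is not the talk's neural network and does not use any XLA API: `select_tile_size`, `toy_cost`, the problem size, the memory budget, and the cost constants are all hypothetical stand-ins chosen for illustration.

```python
import math
from typing import Callable, Iterable, Tuple

TileSize = Tuple[int, int]

# Hypothetical problem and hardware parameters (assumptions, not from the talk).
PROBLEM_ROWS, PROBLEM_COLS = 1024, 1024
MEMORY_BUDGET_ELEMENTS = 128 * 128   # largest tile that fits in fast memory
PER_TILE_OVERHEAD = 10.0             # fixed cost paid once per tile
PER_ELEMENT_COST = 0.01              # cost of processing one element


def toy_cost(tile: TileSize) -> float:
    """Stand-in for a performance model: predicted cost of one tile size.

    A learned model would replace this function; the selection logic
    below stays the same.
    """
    rows, cols = tile
    if rows * cols > MEMORY_BUDGET_ELEMENTS:
        return math.inf  # tile does not fit, so treat it as infeasible
    num_tiles = math.ceil(PROBLEM_ROWS / rows) * math.ceil(PROBLEM_COLS / cols)
    return num_tiles * (PER_TILE_OVERHEAD + rows * cols * PER_ELEMENT_COST)


def select_tile_size(
    candidates: Iterable[TileSize],
    predict_cost: Callable[[TileSize], float],
) -> TileSize:
    """Return the candidate with the lowest predicted cost."""
    return min(candidates, key=predict_cost)


CANDIDATES = [(32, 32), (64, 64), (128, 128), (256, 256)]
best = select_tile_size(CANDIDATES, toy_cost)
print(best)
```

Because the model is passed in as a function, an analytical model and a learned one can be compared by swapping `predict_cost` without touching the search itself.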

Organizer

About the organizer (MLSys 2021)

The Conference on Machine Learning and Systems targets research at the intersection of machine learning and systems. The conference aims to elicit new connections amongst these fields, including identifying best practices and design principles for learning systems, as well as developing novel learning methods and theory tailored to practical machine learning workflows.
