Oral: A Learned Performance Model for Tensor Processing Units

Apr 4, 2021

About

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for a specific program. However, they are difficult to develop because contemporary processors are complex, and the recent proliferation of deep learning accelerators has increased the development burden. We demonstrate a method of learning performance models from a corpus of tensor computation graph programs for Tensor Processing Units (TPUs), a heavily used deep learning accelerator. We train a neural network over kernel-level sub-graphs from the corpus and find that the learned model outperforms a heavily optimized analytical performance model used in the production XLA compiler on the tile-size selection task. We also contribute a new performance model for the XLA fusion autotuner that reduces tuning time on the hardware accelerator.
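To make "performance model as a minimization objective" concrete for tile-size selection, here is a minimal sketch. It assumes a kernel can be summarized by its output shape and that the model maps a (kernel, tile) pair to a predicted runtime; `Kernel`, `toy_cost_model`, and `select_tile_size` are hypothetical names, and `toy_cost_model` is a hand-written formula standing in for a trained network. It is not the talk's model or XLA's API.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Tile = Tuple[int, int]


@dataclass(frozen=True)
class Kernel:
    """Hypothetical summary of a kernel-level sub-graph (output shape only)."""
    rows: int
    cols: int


def toy_cost_model(kernel: Kernel, tile: Tile) -> float:
    """Hypothetical stand-in for a learned performance model.

    Returns a predicted runtime in arbitrary units. A learned model would be
    trained on measured runtimes; this formula just charges a fixed overhead
    per tile plus a cost proportional to the padded output size.
    """
    tr, tc = tile
    # Number of tiles needed to cover the output (ceil division per axis).
    n_tiles = math.ceil(kernel.rows / tr) * math.ceil(kernel.cols / tc)
    padded_elems = n_tiles * tr * tc
    per_tile_overhead = 50.0
    return n_tiles * per_tile_overhead + padded_elems * 0.01


def select_tile_size(
    kernel: Kernel,
    candidates: Sequence[Tile],
    predict: Callable[[Kernel, Tile], float],
) -> Tile:
    """Return the candidate tile with the lowest predicted runtime."""
    if not candidates:
        raise ValueError("no candidate tile sizes")
    return min(candidates, key=lambda t: predict(kernel, t))


kernel = Kernel(rows=1000, cols=1000)
best = select_tile_size(kernel, [(128, 128), (256, 256), (200, 250)], toy_cost_model)
print(best)  # (200, 250)
```

Because the search only needs a ranking, any predictor with the same signature (analytical or learned) can be swapped in without changing `select_tile_size`. This lets an autotuner evaluate candidates without running each one on the accelerator.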

About MLSys 2021

The Conference on Machine Learning and Systems targets research at the intersection of machine learning and systems. The conference aims to elicit new connections amongst these fields, including identifying best practices and design principles for learning systems, as well as developing novel learning methods and theory tailored to practical machine learning workflows.
