Kernel and Graph Optimization for DL Model Execution

Dec 13, 2019

About

There is increasing demand to deploy diverse deep learning models on edge devices. However, fully optimizing the execution of such models on resource-constrained hardware (e.g., CPUs, DSPs, NPUs) is intrinsically challenging and often requires significant manual effort. In this talk, we introduce our Morpheus team's efforts to address these challenges.

First, we optimize the performance of DL model execution at the kernel level (e.g., a convolution operator). From a large number of possible kernel configurations (e.g., tiling, unrolling, vectorization), the fastest kernel is quickly identified by machine learning algorithms we developed, while the binary code is automatically generated by the TVM or Halide compilers.

Second, we further optimize the performance of DL model execution at the graph level (i.e., the end-to-end network). Since kernels or operators are connected as a graph in deep learning models, the compute scheduling of such graphs significantly affects end-to-end performance, especially memory I/O. We solve two problems in this context. First, for potentially complex topologies on edge devices with limited total memory, we solve the minimum memory usage problem, thus characterizing and enabling deployment of all feasible networks on a given device. Second, for any hardware that combines Tightly Coupled Memory (TCM) with more expensive external memory (e.g., DRAM), we solve the minimum external memory access problem, which optimizes hardware usage efficiency in I/O-bound conditions. For both problems we present efficient algorithms that are complete solutions and improve on heuristic methods. Finally, we discuss our future directions for optimizing deep learning model execution.
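As a rough illustration of the kernel-level search described in the abstract, the sketch below walks a small configuration space of tiling, unrolling, and vectorization knobs with a cheap cost model. All names (SPACE, measure, predict) are hypothetical, the latency function is synthetic, and the 1-nearest-neighbour model is a deliberately simple stand-in for the team's own learning algorithm and for the TVM/Halide compilation it would drive.

```python
# Toy ML-guided kernel tuning: rank a small schedule space with a cheap cost
# model and only "measure" the most promising configurations. measure() is a
# synthetic stand-in for compiling with TVM/Halide and timing on the device.
import itertools
import random

random.seed(0)

# Hypothetical knobs for a convolution kernel: tile size, unroll factor,
# and whether the innermost loop is vectorized.
SPACE = [
    {"tile": t, "unroll": u, "vectorize": v}
    for t, u, v in itertools.product([8, 16, 32, 64], [1, 2, 4], [0, 1])
]


def measure(cfg):
    """Synthetic latency in ms; a stand-in for an on-device measurement."""
    return 5.0 + 10.0 / cfg["tile"] + 0.3 * cfg["unroll"] \
        - 1.5 * cfg["vectorize"] + random.gauss(0.0, 0.05)


def predict(history, cfg):
    """1-nearest-neighbour cost model: predict latency from the most similar
    configuration measured so far (a stand-in for a learned ranking model)."""
    def dist(a, b):
        return sum((a[k] - b[k]) ** 2 for k in ("tile", "unroll", "vectorize"))
    return min(history, key=lambda rec: dist(rec[0], cfg))[1]


# Seed the model with a few random measurements, then let it choose which of
# the remaining configurations are worth the cost of a real measurement.
history = [(cfg, measure(cfg)) for cfg in random.sample(SPACE, 4)]
for _ in range(8):
    remaining = [c for c in SPACE if all(c is not rec[0] for rec in history)]
    candidate = min(remaining, key=lambda c: predict(history, c))
    history.append((candidate, measure(candidate)))

best_cfg, best_ms = min(history, key=lambda rec: rec[1])
print("best config:", best_cfg, "-> %.2f ms" % best_ms)
```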
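The minimum memory usage problem can likewise be pictured on a toy operator graph: choose a topological execution order whose peak sum of live tensor sizes is smallest. The graph, tensor sizes, and exhaustive enumeration below are illustrative assumptions only; the talk presents an efficient, complete algorithm rather than this brute-force search.

```python
# Toy minimum-peak-memory scheduling: among all topological orders of a small
# operator DAG, pick the one whose peak sum of live tensor sizes is smallest.
from itertools import permutations

# Hypothetical DAG: node -> predecessors whose output tensors it consumes.
PREDS = {"in": [], "a": ["in"], "b": ["in"], "c": ["a", "b"], "out": ["c"]}
# Hypothetical output-tensor sizes in KiB.
SIZE = {"in": 64, "a": 128, "b": 128, "c": 64, "out": 8}

CONSUMERS = {n: [m for m, ps in PREDS.items() if n in ps] for n in PREDS}


def is_topological(order):
    pos = {n: i for i, n in enumerate(order)}
    return all(pos[p] < pos[n] for n, ps in PREDS.items() for p in ps)


def peak_memory(order):
    """A tensor is live from the step that produces it until the last step
    that consumes it (outputs with no consumers stay live to the end)."""
    pos = {n: i for i, n in enumerate(order)}
    peak = 0
    for step in range(len(order)):
        live = sum(
            SIZE[n] for n in order
            if pos[n] <= step
            and (not CONSUMERS[n] or max(pos[c] for c in CONSUMERS[n]) >= step)
        )
        peak = max(peak, live)
    return peak


best = min((o for o in permutations(PREDS) if is_topological(o)), key=peak_memory)
print("best schedule:", best, "-> peak", peak_memory(best), "KiB")
```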
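Similarly, the minimum external memory access problem can be sketched as a placement question: given a fixed schedule and a small TCM, decide which intermediate tensors stay in TCM and which spill to DRAM so that external traffic is minimized while TCM capacity is never exceeded. The graph, sizes, capacity, cost model, and brute-force search below are illustrative assumptions, not the algorithm from the talk.

```python
# Toy minimum-external-memory-access problem: choose a TCM/DRAM placement for
# each tensor that minimizes DRAM traffic under a fixed schedule and capacity.
from itertools import product

SCHEDULE = ["in", "a", "b", "c", "out"]                    # fixed execution order
PREDS = {"in": [], "a": ["in"], "b": ["in"], "c": ["a", "b"], "out": ["c"]}
SIZE = {"in": 64, "a": 128, "b": 128, "c": 64, "out": 8}   # KiB
TCM_CAPACITY = 200                                         # KiB

POS = {n: i for i, n in enumerate(SCHEDULE)}
CONSUMERS = {n: [m for m, ps in PREDS.items() if n in ps] for n in PREDS}


def live_at(n, step):
    """A tensor is live from its producing step until its last consuming step."""
    last = max((POS[c] for c in CONSUMERS[n]), default=len(SCHEDULE) - 1)
    return POS[n] <= step <= last


def dram_traffic(placement):
    """External bytes moved: a DRAM-resident tensor is written once by its
    producer and read once by every consumer; TCM-resident tensors are free."""
    return sum(SIZE[n] * (1 + len(CONSUMERS[n]))
               for n in SCHEDULE if placement[n] == "DRAM")


def fits(placement):
    """TCM-resident live tensors must never exceed the TCM capacity."""
    for step in range(len(SCHEDULE)):
        used = sum(SIZE[n] for n in SCHEDULE
                   if placement[n] == "TCM" and live_at(n, step))
        if used > TCM_CAPACITY:
            return False
    return True


best = min(
    (dict(zip(SCHEDULE, choice))
     for choice in product(["TCM", "DRAM"], repeat=len(SCHEDULE))),
    key=lambda p: dram_traffic(p) if fits(p) else float("inf"),
)
print("placement:", best, "-> DRAM traffic:", dram_traffic(best), "KiB")
```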

About NIPS 2019

Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia and oral and poster presentations of refereed papers. Following the conference, there are workshops which provide a less formal setting.

