There is increasing demand to deploy diverse deep learning models on edge devices. However, fully optimizing the execution of such models on resource-constrained hardware (e.g., CPUs, DSPs, NPUs) is intrinsically challenging and often requires significant manual effort. In this talk, we introduce our Morpheus team's efforts to address these challenges.

First, we optimize the performance of DL model execution at the kernel level (e.g., a single convolution operator). From the large space of possible kernel configurations (e.g., tiling, unrolling, vectorization), the fastest kernel is quickly identified by machine learning algorithms we developed, while the binary code is automatically generated by the TVM or Halide compilers.

Second, we further optimize performance at the graph level (i.e., the end-to-end network). Because the kernels, or operators, of a deep learning model are connected as a graph, the compute schedule of that graph significantly affects end-to-end performance, especially memory I/O. We solve two problems in this context. First, for potentially complex topologies on edge devices with limited total memory, we solve the minimum-memory-usage problem, thus characterizing, and enabling deployment of, all feasible networks on a given device. Second, for any hardware that combines Tightly Coupled Memory (TCM) with more expensive external memory (e.g., DRAM), we solve the minimum-external-memory-access problem, which maximizes hardware usage efficiency under I/O-bound conditions. For both problems we present efficient algorithms that are complete solutions and that improve on heuristic methods.

Finally, we will discuss our future directions for optimizing deep learning model execution.
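To make the kernel-level search concrete, here is a minimal Python sketch of tuning a single tiling parameter for a blocked matrix multiply: candidate configurations are timed and the fastest is kept. This is an illustration only; the matrix sizes, tile candidates, and exhaustive timing loop are hypothetical stand-ins for the learned search and the TVM/Halide code generation described in the talk.

```python
import time

N = 64  # hypothetical square-matrix size
A = [[(i * j) % 7 for j in range(N)] for i in range(N)]
B = [[(i + j) % 5 for j in range(N)] for i in range(N)]

def tiled_matmul(A, B, tile):
    # Blocked (tiled) matrix multiply; `tile` is the tiling factor
    # being tuned -- one axis of a real kernel's configuration space.
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + tile, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + tile, n)):
                            Ci[j] += a * Bk[j]
    return C

# Time each candidate configuration and keep the fastest -- a brute-force
# stand-in for the ML-guided search over tiling/unrolling/vectorization.
candidates = [4, 8, 16, 32, 64]
timings = {}
for tile in candidates:
    t0 = time.perf_counter()
    tiled_matmul(A, B, tile)
    timings[tile] = time.perf_counter() - t0
best_tile = min(timings, key=timings.get)
print("fastest tile size:", best_tile)
```

In a real autotuner, a cost model trained on past measurements prunes this space instead of timing every candidate, which is what makes the search fast on large configuration spaces.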
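The graph-level minimum-memory-usage problem can be sketched in miniature: given a small operator DAG with known output sizes, different topological execution orders yield different peak memory footprints, and the goal is the order with the smallest peak. The toy graph, tensor sizes, and brute-force enumeration below are hypothetical; the talk's algorithms solve this at scale rather than by permutation search.

```python
from itertools import permutations

# Hypothetical operator DAG: node -> (output size in KB, input nodes).
graph = {
    "a": (64, []),
    "b": (32, ["a"]),
    "c": (32, ["a"]),
    "d": (16, ["b", "c"]),
}

def is_topological(order):
    # Every node's inputs must be produced before the node runs.
    seen = set()
    for n in order:
        if any(p not in seen for p in graph[n][1]):
            return False
        seen.add(n)
    return True

def peak_memory(order):
    # A tensor is live from the step that produces it until the step
    # of its last consumer; peak memory is the largest live total.
    last_use = {n: i for i, n in enumerate(order)}
    for i, n in enumerate(order):
        for p in graph[n][1]:
            last_use[p] = max(last_use[p], i)
    peak = 0
    for i in range(len(order)):
        live = sum(graph[m][0] for m in order[: i + 1] if last_use[m] >= i)
        peak = max(peak, live)
    return peak

# Brute-force over all valid schedules (fine for a 4-node toy graph).
best = min(
    (o for o in permutations(graph) if is_topological(o)),
    key=peak_memory,
)
print("best schedule:", best, "peak KB:", peak_memory(best))
```

The same liveness accounting, with external-memory transfers costed separately, underlies the TCM-versus-DRAM variant: instead of minimizing the peak of the live set, one minimizes the bytes spilled to external memory.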