Putting the “Machine” Back in Machine Learning: The Case for Hardware-ML Model Co-design

by · Dec 13, 2019 · 1,101 views ·

NIPS 2019

Machine learning (ML) applications have entered and impacted our lives unlike any other technology advance from the recent past. Indeed, almost every aspect of how we live or interact with others relies on or uses ML for applications ranging from image classification and object detection, to processing multi‐modal and heterogeneous datasets. While the holy grail for judging the quality of a ML model has largely been serving accuracy, and only recently its resource usage, neither of these metrics translate directly to energy efficiency, runtime, or mobile device battery lifetime. This talk will uncover the need for building accurate, platform‐specific power and latency models for convolutional neural networks (CNNs) and efficient hardware-aware CNN design methodologies, thus allowing machine learners and hardware designers to identify not just the best accuracy NN configuration, but also those that satisfy given hardware constraints. Our proposed modeling framework is applicable to both high‐end and mobile platforms and achieves 88.24% accuracy for latency, 88.34% for power, and 97.21% for energy prediction. Using similar predictive models, we demonstrate a novel differentiable neural architecture search (NAS) framework, dubbed Single-Path NAS, that uses one single-path over-parameterized CNN to encode all architectural decisions based on shared convolutional kernel parameters. Single-Path NAS achieves state-of-the-art top-1 ImageNet accuracy (75.62%), outperforming existing mobile NAS methods for similar latency constraints (∼80ms) and finds the final configuration up to 5,000× faster compared to prior work. Combined with our quantized CNNs (Flexible Lightweight CNNs or FLightNNs) that customize precision level in a layer-wise fashion and achieve almost iso-accuracy at 5-10x energy reduction, such a modeling, analysis, and optimization framework is poised to lead to true co-design of hardware and ML model, orders of magnitude faster than state of the art, while satisfying both accuracy and latency or energy constraints.