Apr 4, 2021
Data accesses between on- and off-chip memories account for a large fraction of overall energy consumption during inference with deep learning networks. On-chip memory compression can greatly reduce this energy cost as long as it balances the simplicity and low cost of the compression/decompression implementation with its effectiveness in data size reduction. We present Boveda, a simple and effective on-chip lossless memory compression technique for fixed-point precision networks. It reduces data widths by exploiting the value distribution that deep learning applications naturally exhibit. Boveda can increase the effective on-chip capacity, reduce off-chip traffic, and/or achieve a desired performance/energy target while using smaller on-chip memories. Boveda can be placed after any memory block in the on-chip memory hierarchy and can work with any data-parallel processing units, such as the vector-like or tensor core units of modern graphics processors, systolic arrays such as the one used in the Tensor Processing Unit, and units that process sparse tensors such as those used in the SCNN accelerator. To demonstrate the potential of Boveda, we implement it over (i) SCNN, a state-of-the-art accelerator for sparse networks, (ii) a Tensorcore-like architecture, and (iii) the TPU. Boveda reduces memory footprint by 34% for SCNN and sparse models on top of zero compression. For dense models, Boveda improves compression by 47%. We also present a prototype FPGA implementation.
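The core idea, storing each group of fixed-point values with only as many bits as its largest value needs, can be illustrated with a minimal sketch. The block size, the 4-bit width header, and the sign-magnitude width rule below are illustrative assumptions, not the exact Boveda encoding:

```python
# Minimal sketch (not the exact Boveda format): block-wise bit-width reduction
# for fixed-point tensors, exploiting values that cluster near zero.
import numpy as np

def block_width(block: np.ndarray, max_bits: int = 8) -> int:
    """Bits needed to store every value in the block (magnitude bits + sign bit)."""
    peak = int(np.max(np.abs(block.astype(np.int32)))) if block.size else 0
    return min(max(1, peak.bit_length() + 1), max_bits)

def compressed_bits(tensor: np.ndarray, block_size: int = 16,
                    header_bits: int = 4, max_bits: int = 8) -> int:
    """Total bits if each block stores a small width header plus packed values."""
    flat = tensor.ravel()
    total = 0
    for i in range(0, flat.size, block_size):
        block = flat[i:i + block_size]
        total += header_bits + block_width(block, max_bits) * block.size
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Quantized DNN weights and activations tend to concentrate around zero.
    t = np.clip(rng.normal(0, 6, 1024), -127, 127).astype(np.int8)
    print(f"raw: {t.size * 8} bits, compressed: {compressed_bits(t)} bits")
```

Because most blocks contain only small-magnitude values, their per-block width is far below 8 bits, which is where the footprint reduction comes from; decompression only needs the width header to unpack each block.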
The Conference on Machine Learning and Systems targets research at the intersection of machine learning and systems. The conference aims to elicit new connections amongst these fields, including identifying best practices and design principles for learning systems, as well as developing novel learning methods and theory tailored to practical machine learning workflows.
Presentations on similar topic, category or speaker
Chunxing Yin, …
Atli Kosson, …