Dec 6, 2021
BatchNorm has inspired an explosion of normalization layers in deep learning. While recent works show BatchNorm's success can be attributed to a multitude of beneficial properties, it is currently unclear whether these properties can also help us understand the behavior of alternative normalization techniques. More importantly, does the existence/absence of specific properties justify the success/failure of a normalization layer? To resolve these questions, we conduct a thorough analysis of nine recently proposed normalization layers for deep neural networks (DNNs). By evaluating the existence of known properties of BatchNorm in randomly initialized networks that use these alternative normalization layers, we show that a general understanding of normalization methods in deep learning can be easily developed. Further, by benchmarking the performance of these networks across different configurations of model architecture, batch size, and learning rate, we find that the existence of beneficial BatchNorm properties in a normalization layer is often highly predictive of the layer's impact on model performance. Overall, our analysis takes us a step closer to developing a unified understanding of normalization techniques in deep learning and provides a compass for systematically exploring the vast design space of DNN normalization layers.
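The kind of evaluation the abstract describes, checking whether a property known for BatchNorm also holds in randomly initialized networks that use other normalization layers, can be illustrated with a small probe. The sketch below is not the authors' code; the MLP shape, the specific layers compared, and the property probed (stable activation statistics across depth at random initialization) are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation): probe
# activation statistics at random initialization for a few normalization
# layers, as one example of a BatchNorm-like property to check.
import torch
import torch.nn as nn

def make_mlp(norm_layer, width=512, depth=20):
    """Stack Linear -> normalization -> ReLU blocks with default random init."""
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), norm_layer(width), nn.ReLU()]
    return nn.Sequential(*layers)

norms = {
    "BatchNorm": nn.BatchNorm1d,
    "LayerNorm": nn.LayerNorm,
    "NoNorm": lambda w: nn.Identity(),
}

torch.manual_seed(0)
x = torch.randn(256, 512)  # a random batch of inputs

for name, norm in norms.items():
    net = make_mlp(norm)
    h = x
    with torch.no_grad():
        for i, layer in enumerate(net):
            h = layer(h)
            # Report activation spread after every 10th block.
            if isinstance(layer, nn.ReLU) and (i + 1) % 30 == 0:
                print(f"{name:>9s} block {(i + 1) // 3:2d}: "
                      f"activation std = {h.std().item():.3f}")
```

The same pattern extends to other layers (e.g., GroupNorm or InstanceNorm) and to other properties, such as gradient norms at initialization, by swapping the entry in `norms` or the quantity printed.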