Jul 12, 2020
We propose an efficient algorithm to train a very deep and thin network with theoretic guarantee. Our method is motivated by model compression, and consists of three stages. In the first stage, we widen the deep thin network and train it until convergence. In the second stage, we use this well trained deep wide network to warm up or initialize the original deep thin network. In the last stage, we train this well initialized deep thin network until convergence. The key ingredient of our method is its second stage, in which the thin network is gradually warmed up by imitating the intermediate outputs of the wide network from bottom to top. We establish theoretical guarantee using mean field analysis. We show that our method is provably more efficient than directly training a deep thin network from scratch. We also conduct empirical evaluations on image classification and language modeling. By training with our approach, ResNet50 can outperform ResNet101 which is normally trained as in the literature, and BERT_BASE can be comparable with BERT_LARGE.
The International Conference on Machine Learning (ICML) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence known as machine learning. ICML is globally renowned for presenting and publishing cutting-edge research on all aspects of machine learning used in closely related areas like artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, and robotics. ICML is one of the fastest growing artificial intelligence conferences in the world. Participants at ICML span a wide range of backgrounds, from academic and industrial researchers, to entrepreneurs and engineers, to graduate students and postdocs.
Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%
Presentations on similar topic, category or speaker