Jul 24, 2023
We propose to minimize a generic differentiable loss function with an L_1 penalty using a redundant reparametrization and straightforward stochastic gradient descent. Our proposal directly generalizes a series of previous ideas suggesting that the L_1 penalty may be equivalent to a differentiable reparametrization with weight decay. We prove that the proposed method, spred, is an exact solver of L_1 and that the reparametrization trick is completely "benign" for a generic nonconvex function. Practically, we demonstrate the usefulness of the method in (1) training sparse neural networks for gene selection tasks, which involve finding relevant features in a very high-dimensional space, and (2) neural network compression, where previous attempts at applying the L_1 penalty have been unsuccessful. Conceptually, our result bridges the gap between sparsity in deep learning and conventional statistical learning.
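To make the reparametrization idea concrete: for any scalar w, the L_1 term lambda*|w| equals the minimum over factorizations w = u*v of (lambda/2)*(u^2 + v^2), attained at |u| = |v| = sqrt(|w|), so an L_2 penalty (weight decay) on the redundant factors plays the role of an L_1 penalty on the effective weight. The following is only a minimal PyTorch sketch of this trick on a toy sparse regression problem, not the authors' implementation; the variable names, hyperparameters, and toy data are placeholders chosen for illustration.

```python
import torch

# Sketch (assumed setup): each effective weight w is written as an
# elementwise product w = u * v, and weight decay on u and v stands in
# for lambda * ||w||_1. Trained with plain SGD.

torch.manual_seed(0)
d = 100                                  # number of features
X = torch.randn(512, d)                  # toy inputs
y = X[:, :5].sum(dim=1, keepdim=True)    # only the first 5 features matter

u = torch.randn(d, 1, requires_grad=True)
v = torch.randn(d, 1, requires_grad=True)
lmbda = 0.1                              # placeholder penalty strength
opt = torch.optim.SGD([u, v], lr=1e-2)

for _ in range(5000):
    w = u * v                            # effective (sparse) weight
    pred = X @ w
    # explicit L_2 penalty on (u, v); equivalently, SGD's weight_decay
    loss = ((pred - y) ** 2).mean() + 0.5 * lmbda * (u.pow(2).sum() + v.pow(2).sum())
    opt.zero_grad()
    loss.backward()
    opt.step()

w = (u * v).detach()
print("weights above 1e-3 in magnitude:", (w.abs() > 1e-3).sum().item())
```

In this sketch the weights of irrelevant features are driven toward zero, mimicking the sparsity that an explicit L_1 penalty would induce, while every quantity being optimized remains differentiable.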