SIGUA: Forgetting May Make Learning with Noisy Labels More Robust

Jul 12, 2020

Recent studies reveal that deep networks gradually memorize individual data points while fitting the distribution of the data. Hence, when facing corrupted labels, all existing methods inevitably suffer from degraded generalization and have to be stopped early. In this paper, we propose a versatile approach called scaled stochastic gradient ascent (S2GA) to deal with corrupted labels. S2GA builds on sample selection and goes beyond it: in every mini-batch, it applies gradient descent to good data and scaled stochastic gradient ascent to bad data rather than dropping those data, where goodness and badness are defined with respect to a base learning method. It is advantageous over early stopping, since it can continue to fit the distribution of the data while actively forgetting individual data that were memorized by mistake. We demonstrate via experiments that S2GA robustifies two representative base learning methods, and the performance boost is often significant.
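The per-mini-batch update described above (descent on good data, scaled ascent on bad data) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a small-loss base method (the lowest-loss `keep_ratio` fraction of each mini-batch counts as good data, the rest as bad), uses plain logistic regression in NumPy instead of a deep network, and the names `s2ga_step`, `train`, `keep_ratio`, and `gamma` (the ascent scale) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Clip logits so np.exp cannot overflow for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def per_sample_loss(w, X, y):
    """Logistic (cross-entropy) loss of each example; labels y are 0/1."""
    p = sigmoid(X @ w)
    eps = 1e-12
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))


def mean_grad(w, X, y):
    """Gradient of the mean logistic loss over the rows of X."""
    if len(y) == 0:
        return np.zeros_like(w)
    return X.T @ (sigmoid(X @ w) - y) / len(y)


def s2ga_step(w, X, y, lr=0.5, keep_ratio=0.8, gamma=0.01):
    """One mini-batch update: descent on 'good' data, scaled ascent on 'bad' data.

    Hypothetical base method: small-loss selection. The keep_ratio fraction of
    the batch with the smallest loss is 'good'; the remainder is 'bad'.
    gamma is an assumed scale on the ascent step.
    """
    order = np.argsort(per_sample_loss(w, X, y))
    n_good = int(round(keep_ratio * len(y)))
    good, bad = order[:n_good], order[n_good:]
    # Both gradients are evaluated at the current parameters.
    g_good = mean_grad(w, X[good], y[good])
    g_bad = mean_grad(w, X[bad], y[bad])
    # Descend on good data; ascend (scaled by gamma) on bad data instead of dropping it.
    return w - lr * g_good + lr * gamma * g_bad


def train(X, y_noisy, epochs=50, batch=32, seed=0, **kw):
    """Shuffled mini-batch training loop that applies s2ga_step to every batch."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(y_noisy))
        for start in range(0, len(idx), batch):
            b = idx[start:start + batch]
            w = s2ga_step(w, X[b], y_noisy[b], **kw)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(1000, 2))
    y_clean = (X[:, 0] - X[:, 1] > 0).astype(float)
    flip = rng.random(1000) < 0.2  # 20% symmetric label noise (synthetic)
    y_noisy = np.where(flip, 1 - y_clean, y_clean)
    w = train(X, y_noisy)
    acc = np.mean((sigmoid(X @ w) > 0.5) == y_clean)
    print(f"accuracy w.r.t. clean labels: {acc:.3f}")
```

In this sketch, setting `gamma=0` reduces the update to plain sample selection (bad data is simply dropped), and additionally setting `keep_ratio=1.0` recovers ordinary mini-batch gradient descent. A small positive `gamma` is what lets the update push the model away from fitting the examples flagged as bad.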



