Dynamic Data Fault Localization for Deep Neural Networks

Dec 7, 2023

About

Rich datasets have empowered a wide range of deep learning (DL) applications, leading to remarkable success in many fields. Alongside these benefits, however, data faults hidden in the datasets can cause DL applications to behave unpredictably and may even lead to massive financial losses and loss of life. To alleviate this problem, we propose DFauLo, a dynamic data fault localization approach that locates mislabeled and noisy data in deep learning datasets. DFauLo is inspired by conventional mutation-based code fault localization, but uses the differences between DNN mutants to amplify and identify potential data faults. Specifically, it first generates multiple mutants of the original trained DNN model, extracts features from these mutants, and maps the extracted features to a suspiciousness score indicating the probability that a given data point is a data fault. Moreover, DFauLo is the first dynamic data fault localization technique: it prioritizes suspected data based on user feedback and generalizes to data faults unseen during training. To validate DFauLo, we extensively evaluate it on 26 cases covering various fault types, data types, and model structures, as well as on three widely used benchmark datasets. The results show that DFauLo outperforms state-of-the-art techniques in almost all cases and locates hundreds of real data faults of different types in the benchmark datasets.
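The abstract describes the pipeline only at a high level (mutants, then features, then a suspiciousness score, then re-ranking from user feedback). The following Python sketch illustrates that general idea under loudly stated assumptions. The feature choice (each model's confidence in the given label), the logistic scorer, the single gradient step used as the feedback update, and the toy `make_model` classifier whose `scale` parameter stands in for a weight mutation are all hypothetical stand-ins. They are not DFauLo's actual mutation operators, features, or learning procedure.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A "model" is any callable mapping an input vector to class probabilities.
Model = Callable[[Sequence[float]], List[float]]


def extract_features(x: Sequence[float], label: int,
                     original: Model, mutants: List[Model]) -> List[float]:
    """Hypothetical features: the original model's confidence in the given
    label, followed by each mutant's confidence in that same label."""
    feats = [original(x)[label]]
    feats.extend(m(x)[label] for m in mutants)
    return feats


def suspiciousness(feats: List[float], weights: List[float], bias: float) -> float:
    """Map features to a score in (0, 1) with a logistic model (assumed)."""
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


def feedback_update(feats: List[float], is_fault: bool, weights: List[float],
                    bias: float, lr: float = 0.5) -> Tuple[List[float], float]:
    """One log-loss gradient step on a user-confirmed sample, so later
    rankings reflect the feedback (a hypothetical update rule)."""
    err = suspiciousness(feats, weights, bias) - (1.0 if is_fault else 0.0)
    new_weights = [w - lr * err * f for w, f in zip(weights, feats)]
    return new_weights, bias - lr * err


def rank(dataset, original: Model, mutants: List[Model],
         weights: List[float], bias: float) -> List[int]:
    """Return dataset indices ordered from most to least suspicious."""
    scores = [suspiciousness(extract_features(x, y, original, mutants), weights, bias)
              for x, y in dataset]
    return sorted(range(len(dataset)), key=lambda i: scores[i], reverse=True)


def make_model(scale: float) -> Model:
    """Toy binary classifier; varying `scale` stands in for a weight mutation."""
    def model(x: Sequence[float]) -> List[float]:
        p = 1.0 / (1.0 + math.exp(-scale * x[0]))
        return [1.0 - p, p]
    return model


# Toy data: the third sample (x = 3.0 labeled 0) is deliberately mislabeled.
dataset = [([2.0], 1), ([-1.0], 0), ([3.0], 0)]
original = make_model(1.0)
mutants = [make_model(0.8), make_model(1.2)]
weights, bias = [-2.0, -2.0, -2.0], 3.0  # low label confidence -> high suspicion

order = rank(dataset, original, mutants, weights, bias)
print(order)  # [2, 1, 0]

# The user confirms sample 2 is a fault; update the scorer with that feedback.
f2 = extract_features(*dataset[2], original, mutants)
before = suspiciousness(f2, weights, bias)
weights, bias = feedback_update(f2, True, weights, bias)
after = suspiciousness(f2, weights, bias)
print(after > before)  # True
```

The mislabeled point ranks first because the original model and both mutants agree that its given label is unlikely. Confirming it as a fault nudges the scorer toward that feature pattern, which is one simple way (among many) to make the ranking respond to user feedback.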

Interested in talks like this? Follow ESEC-FSE