DistXplore: Distribution-Guided Testing for Evaluating and Enhancing Deep Learning Systems

Dec 5, 2023

About

Deep learning (DL) models are trained on sampled data, and the distribution of training data can differ from that of real-world data (i.e., distribution shift), which reduces model robustness. Various testing techniques have been proposed, including distribution-unaware and distribution-aware methods. However, distribution-unaware testing lacks effectiveness because it does not explicitly consider the distribution of test cases and may generate redundant errors (within the same distribution). Distribution-aware testing techniques primarily focus on generating test cases that follow the training distribution, missing out-of-distribution data that may also be valid and should be considered in testing. In this paper, we propose a novel distribution-guided approach for generating valid test cases with diverse distributions, which can better evaluate model robustness (i.e., by generating hard-to-detect errors) and enhance model robustness (i.e., by enriching training data). Unlike existing testing techniques that optimize individual test cases, DistXplore optimizes test suites that represent specific distributions. To evaluate and enhance model robustness, we design two metrics: distribution difference, which maximizes the similarity in distribution between two different classes of data to generate hard-to-detect errors, and distribution diversity, which generates test cases with diverse distributions to enhance model robustness by enriching the training data. To evaluate the effectiveness of DistXplore in model evaluation and model enhancement, we compare DistXplore with 9 state-of-the-art baselines on 8 models across 4 datasets.
The evaluation results show that DistXplore not only detects a larger number of errors (e.g., 2x+ on average) but also identifies more hard-to-detect errors (e.g., 12.1%+ on average); furthermore, DistXplore achieves a higher improvement in empirical robustness (e.g., 5.3% more accuracy improvement than the baselines on average).
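The abstract does not specify how DistXplore quantifies the similarity between two distributions. As an illustration only, one common way to measure how close two sample sets are in distribution is the (squared) Maximum Mean Discrepancy (MMD) with a Gaussian kernel; the function names, kernel choice, and toy data below are assumptions for this sketch, not the paper's actual metric:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between the row vectors of x and y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(a, b, sigma=1.0):
    # Squared Maximum Mean Discrepancy: a kernel-based estimate of how far
    # apart two sample distributions are. Smaller values = more similar.
    return (gaussian_kernel(a, a, sigma).mean()
            + gaussian_kernel(b, b, sigma).mean()
            - 2 * gaussian_kernel(a, b, sigma).mean())

# Toy data standing in for feature representations of two classes.
rng = np.random.default_rng(0)
class_a = rng.normal(0.0, 1.0, size=(200, 2))  # samples from "class A"
class_b = rng.normal(3.0, 1.0, size=(200, 2))  # samples from "class B"
shifted = class_a + 0.1                        # near-duplicate of class A

print(mmd2(class_a, class_b))  # comparatively large: distributions clearly differ
print(mmd2(class_a, shifted))  # comparatively small: nearly the same distribution
```

Under this notion, a test suite that drives the metric between two classes toward zero would contain inputs whose distribution blurs the class boundary, which is the intuition behind generating hard-to-detect errors via a distribution-difference objective.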



Interested in talks like this? Follow ESEC-FSE