Dec 5, 2023
Deep learning (DL) models are trained on sampled data, and the distribution of training data often differs from that of real-world data (\emph{i.e.}, distribution shift), which reduces model robustness. Various testing techniques have been proposed, including distribution-unaware and distribution-aware methods. However, distribution-unaware testing is less effective because it does not explicitly consider the distribution of test cases and may generate redundant errors (within the same distribution). Distribution-aware testing techniques primarily focus on generating test cases that follow the training distribution, missing out-of-distribution data that may also be valid and should be considered in the testing process. In this paper, we propose \textit{DistXplore}, a novel distribution-guided approach for generating \textit{valid} test cases with \textit{diverse} distributions, which can better evaluate model robustness (\emph{i.e.}, by generating hard-to-detect errors) and enhance model robustness (\emph{i.e.}, by enriching training data). Unlike existing testing techniques that optimize individual test cases, \textit{DistXplore} optimizes test suites that represent specific distributions. To evaluate and enhance model robustness, we design two metrics: \textit{distribution difference}, which maximizes the similarity in distribution between two different classes of data to generate hard-to-detect errors, and \textit{distribution diversity}, which increases the distributional variety of generated test cases to enhance model robustness by enriching the training data. To evaluate the effectiveness of \textit{DistXplore} in model evaluation and model enhancement, we compare \textit{DistXplore} with 9 state-of-the-art baselines on 8 models across 4 datasets.
The evaluation results show that \textit{DistXplore} not only detects a larger number of errors (\emph{e.g.}, 2X+ on average) but also identifies more hard-to-detect errors (\emph{e.g.}, 12.1%+ on average); furthermore, \textit{DistXplore} achieves a higher improvement in empirical robustness (\emph{e.g.}, 5.3% more accuracy improvement than the baselines on average).
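The abstract does not specify how the \textit{distribution difference} metric is computed; as a minimal sketch of the underlying idea, the snippet below compares the distributions of two sample sets using maximum mean discrepancy (MMD) with an RBF kernel, which is one plausible choice of distribution-level similarity measure (the function names and the kernel choice are assumptions, not the paper's actual implementation). A test generator in this spirit would drive samples of one class toward a low MMD with another class to produce hard-to-detect errors.

```python
# Hypothetical illustration only: MMD with an RBF kernel as one possible
# distribution-difference measure; the paper's actual metric may differ.
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel values between rows of a and rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * sq)

def mmd(X, Y, gamma=1.0):
    # Biased squared-MMD estimate: small values mean the two sample
    # sets are close in distribution.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
# Two draws from the same distribution vs. a shifted distribution.
same = mmd(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
diff = mmd(rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2)))
assert same < diff  # the shifted samples show a larger distribution difference
```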