Jul 24, 2023
Explanation methods aim to interpret the behavior of machine learning models and thus build trust between users and models. However, recent work has shown that explanation methods are vulnerable to adversarial perturbations, which raises security concerns in high-stakes domains. In this paper, we investigate when we should pay attention to robust explanations and what they cost. We prove that the robustness of an explanation is determined by the robustness of the model being explained; thus, robust explanations come for free with a robust model. To obtain robust explanations for a non-robust model, composing the original model with a kernel is proved to be an effective way to produce strictly more robust explanations. Nevertheless, we argue that this also incurs a robustness-faithfulness trade-off: as an explanation becomes more robust, it may also become less faithful, a property an explanation method is desired to have. This argument holds for any model. We are the first to introduce this trade-off and to theoretically prove its existence for SmoothGrad. Our theoretical findings are verified by empirical evidence on six state-of-the-art explanation methods and four backbones.
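As a rough illustration of the kernel-composition idea mentioned in the abstract, the sketch below shows SmoothGrad-style gradient averaging over Gaussian-perturbed inputs, which corresponds to explaining the original model composed with a Gaussian kernel. The function name, noise scale, and sample count are illustrative assumptions, not taken from the paper.

```python
import torch

def smoothgrad(model, x, target_class, n_samples=50, sigma=0.15):
    """Average input gradients over Gaussian-perturbed copies of x.

    Equivalent to taking the gradient of the model convolved with a
    Gaussian kernel, i.e. explaining a smoothed (more robust) model.
    Hyperparameters here are illustrative defaults, not the paper's.
    """
    x = x.detach()
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        # Perturb the input with isotropic Gaussian noise.
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        # Score of the class we want to explain.
        score = model(noisy)[..., target_class].sum()
        grad, = torch.autograd.grad(score, noisy)
        grads += grad
    return grads / n_samples
```

Increasing sigma smooths the explanation further (more robustness to input perturbations), which is exactly where the robustness-faithfulness trade-off discussed above would show up.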