Jul 24, 2023
In a backdoor attack, an adversary adds maliciously constructed ("backdoor") examples to a training set to make the resulting model vulnerable to manipulation. Defending against such attacks (e.g., by finding and removing the backdoor examples) typically involves viewing these examples as outliers and using techniques from robust statistics to detect and remove them. In this work, we present a new perspective on this task: without structural information on the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data, and thus impossible to "detect" in a general sense. To circumvent this impossibility, we assume that a backdoor attack corresponds to the strongest feature in the training data. Under this assumption, which we make formal, we develop a new framework for detecting backdoor attacks. Our framework naturally gives rise to a detection algorithm that comes with theoretical guarantees and is effective in practice.
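To make the intuition concrete, below is a minimal, hypothetical sketch (not the authors' algorithm) of both sides of this setup: a toy backdoor attack that stamps a fixed trigger onto a subset of training examples, and a detector that operationalizes the "strongest feature" assumption by flagging examples whose projection onto the top principal direction of the centered data is extreme. The trigger pattern, its strength, and the 10% flagging quantile are all illustrative assumptions.

```python
# Illustrative sketch only: a spectral-style "strongest feature" heuristic,
# not the detection algorithm from the talk.
import numpy as np

rng = np.random.default_rng(0)

# Toy "clean" data: 500 examples with 32 features drawn from a mild Gaussian.
X_clean = rng.normal(0.0, 1.0, size=(500, 32))

# Backdoor attack: plant 50 examples carrying a fixed trigger pattern
# (a strong, consistent offset on a few coordinates).
trigger = np.zeros(32)
trigger[:4] = 5.0                     # hypothetical trigger strength
X_poison = rng.normal(0.0, 1.0, size=(50, 32)) + trigger

X = np.vstack([X_clean, X_poison])
is_poison = np.array([False] * 500 + [True] * 50)

# "Strongest feature" heuristic: center the data and take the top singular
# direction; if the backdoor dominates, poisoned points score far from the mean.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = np.abs(Xc @ Vt[0])           # |projection| onto the top direction

# Flag the most extreme examples as suspected backdoors.
threshold = np.quantile(scores, 0.90)
flagged = scores > threshold
precision = is_poison[flagged].mean()
print(f"flagged {flagged.sum()} examples, precision {precision:.2f}")
```

On this toy data the planted trigger is, by construction, the strongest feature, so the top singular direction aligns with it and the flagged set is dominated by the poisoned examples. The paper's point is precisely that such a heuristic can only be justified once this "strongest feature" assumption is made explicit.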
Presentations on a similar topic, category, or speaker
Chenlin Meng, …
Ailin Deng, …
Man Zhou, …