Stateful Detection of Black-Box Adversarial Attacks

TLDR

This paper develops a defense designed to detect the process of generating adversarial examples, and introduces query blinding, a new class of attacks designed to bypass defenses that rely on such a defense approach.

Abstract

The problem of adversarial examples, evasion attacks on machine learning classifiers, has proven extremely difficult to solve. This is true even in the black-box threat model, as is the case in many practical settings. Here, the classifier is hosted as a remote service and the adversary does not have direct access to the model parameters. This paper argues that in such settings, defenders have a larger space of actions than previously studied. Specifically, we deviate from the implicit assumption made by prior work that a defense must be a stateless function that operates on individual examples, and evaluate the space of stateful defenses. We develop a defense designed to detect the process of generating adversarial examples. By keeping a history of the past queries, a defender can try to identify when a sequence of queries appears to be for the purpose of generating an adversarial example. We then introduce query blinding, a new class of attacks designed to bypass defenses that rely on such a defense approach. We believe that expanding the study of adversarial examples from stateless classifiers to stateful systems is not only more realistic for many black-box settings, but also gives the defender a much-needed advantage in responding to the adversary.