Abstract
Carefully crafted, often imperceptible, adversarial perturbations have been
shown to cause state-of-the-art models to yield extremely inaccurate outputs,
rendering them unsuitable for safety-critical application domains. In addition,
recent work has shown that constraining the attack space to a low frequency
regime is particularly effective. Yet, it remains unclear whether this is due
to generally constraining the attack search space or specifically removing high
frequency components from consideration. By systematically controlling the
frequency components of the perturbation, evaluating against the top-placing
defense submissions in the NeurIPS 2017 competition, we empirically show that
performance improvements in both the white-box and black-box transfer settings
are yielded only when low frequency components are preserved. In fact, the
defended models based on adversarial training are roughly as vulnerable to low
frequency perturbations as undefended models, suggesting that the purported
robustness of state-of-the-art ImageNet defenses is reliant upon adversarial
perturbations being high frequency in nature. We do find that under
$\ell_\infty$ $\epsilon=16/255$, the competition distortion bound, low
frequency perturbations are indeed perceptible. This questions the use of the
$\ell_\infty$-norm, in particular, as a distortion metric, and, in turn,
suggests that explicitly considering the frequency space is promising for
learning robust models which better align with human perception.
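The abstract describes systematically restricting adversarial perturbations to low frequency components. The paper does not specify an implementation here, but one common way to realize such a constraint is to project a perturbation onto its low frequency subspace with a 2-D Fourier mask. The sketch below is a hypothetical illustration of that idea (the function name, `radius` parameter, and FFT-based masking are assumptions, not the authors' method, which may instead use a DCT basis):

```python
import numpy as np

def low_frequency_projection(delta, radius=8):
    """Keep only the Fourier coefficients of `delta` whose frequency
    index lies within `radius` of DC, zeroing the high frequencies."""
    f = np.fft.fft2(delta)
    h, w = delta.shape
    # Signed integer frequency indices along each axis.
    fy = np.fft.fftfreq(h) * h
    fx = np.fft.fftfreq(w) * w
    mask = (np.abs(fy)[:, None] <= radius) & (np.abs(fx)[None, :] <= radius)
    # The input is real, so discard the negligible imaginary residue.
    return np.real(np.fft.ifft2(f * mask))
```

Applying such a projection after each attack step would confine the search to a low frequency regime; combining it with a clip to the $\epsilon=16/255$ ball would match the competition's $\ell_\infty$ distortion bound.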
Description
[1903.00073] On the Effectiveness of Low Frequency Perturbations