Abstract

We show that there exists an inherent tension between the goal of adversarial robustness and that of standard generalization. Specifically, training robust models may not only be more resource-consuming, but also lead to a reduction of standard accuracy. We demonstrate that this trade-off between the standard accuracy of a model and its robustness to adversarial perturbations provably exists even in a fairly simple and natural setting. These findings also corroborate a similar phenomenon observed in practice. Further, we argue that this phenomenon is a consequence of robust classifiers learning fundamentally different feature representations than standard classifiers. These differences, in particular, seem to result in unexpected benefits: the representations learned by robust models tend to align better with salient data characteristics and human perception.

Description

[1805.12152] Robustness May Be at Odds with Accuracy

Links and resources

URL:
BibTeX key:
tsipras2018robustness
search on:

Comments and Reviews  
(0)

There is no review or comment yet. You can write one!

Tags


Cite this publication