I am Kshitij Chandna, a graduate of NYU, where my MS concentrated on Data Analytics and Machine Learning. I am interested in computer vision, especially adversarial machine learning. During my Computer Vision class I worked on a project dealing with adversarial images; the positive results encouraged me to submit the paper to the ECCV 2022 Workshops, where it was accepted and published.
Improving Adversarial Robustness by Penalizing Natural Accuracy

Current techniques in deep learning are still unable to train adversarially robust classifiers that perform as well as non-robust ones. In this work, we continue to study the space of loss functions and show that the choice of loss can affect robustness in highly nonintuitive ways. Specifically, we demonstrate that a surprising choice of loss function can, in fact, improve adversarial robustness against some attacks. Our loss function encourages accuracy on adversarial examples and explicitly penalizes accuracy on natural examples. This is inspired by theoretical and empirical work suggesting a fundamental tradeoff between standard accuracy and adversarial robustness. Our method, NAturally Penalized (NAP) loss, achieves 61.5% robust accuracy on CIFAR-10 with ε = 8/255 perturbations in ℓ∞ (against a PGD-60 adversary with 20 random restarts). This improves over the standard PGD defense by over 3%, as well as over other loss functions proposed in the literature. Although TRADES performs better on CIFAR-10 against AutoAttack, our approach achieves better results on CIFAR-100. Our results thus suggest that significant robustness gains are possible by revisiting training techniques, even without additional data.
Modern deep learning is now mature enough to achieve high test accuracy on
many image classification tasks. Here, a long line of research
has arrived at a certain combination of techniques that work well for image
recognition, including the architecture (ReLUs, convolutions, ResNets), the optimization
algorithm (SGD and variants, with tuned learning-rate schedules), model size,
data augmentation, regularization, normalization, batch size, and the loss function.
Many of these choices are not unique and we do not have a complete understanding
of why these choices work best in practice. For example, in standard
classification, our true objective is a small 0/1 test loss, but we usually optimize
the Cross Entropy train loss. We could instead optimize an ℓ2 train loss (or any other
surrogate loss), yet in practice optimizing Cross Entropy often performs
better.1 Similarly, very large, deep networks perform much better than smaller
ones in practice, even though these networks have more than enough capacity
to “overfit” the train set and should be performing worse by classical statistical
intuition. The optimizer is also poorly understood: in practice we
use SGD with learning rates much higher than optimization theory prescribes,
and “accelerated” methods that optimize faster, such as Adam,
sometimes generalize worse. Nevertheless, despite our incomplete theoretical
understanding, the research community has converged on a methodology which
performs very well for standard classification.
However, the field of adversarially robust classification is not as mature, and
has not yet converged on a training methodology that performs well. The goal of
adversarial robustness is to learn classifiers that are robust to small adversarial
perturbations of the input. Here, it is not clear if the various design choices
that we converged to for standard classification are still the best choices for
robust classification.
Indeed, recent advances in adversarial robustness have come through modifying
the training procedure, loss function, architecture,
data generation, activation function, pre-training, and leveraging
unlabeled data. This research area is not nearly as mature as standard
classification and there are still potentially large robustness gains from rethinking
elements of the deep learning methodology.
In this work, we focus on the choice of loss function and show that an unconventional
choice can, in fact, significantly improve adversarial
robustness. Concretely, our loss function includes two terms: one which encourages
accuracy on adversarial examples and one which explicitly penalizes accuracy on
natural examples. This is inspired by the empirical and theoretical observations
that there may be a tradeoff between standard accuracy and adversarial accuracy
for trained models. Intuitively, our loss function penalizes standard accuracy
if it is “too good to be true” – i.e., much higher than the adversarial accuracy.
This attempts to forcibly trade off standard accuracy for improved adversarial
accuracy, and in practice, it yields significant gains over existing methods.
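To make the two-term structure concrete, here is a minimal PyTorch-style sketch of a loss in this spirit: cross-entropy on adversarial examples, minus a weighted cross-entropy term on natural examples. The weight `lam` and the exact functional form are illustrative assumptions, not the precise NAP loss defined in the paper.

```python
import torch.nn.functional as F

def penalized_natural_loss(model, x_nat, x_adv, y, lam=0.1):
    """Two-term loss sketch: reward adversarial accuracy, explicitly
    penalize natural accuracy. `lam` and this exact form are illustrative
    assumptions, not the paper's precise NAP formulation."""
    loss_adv = F.cross_entropy(model(x_adv), y)  # encourage accuracy on adversarial inputs
    loss_nat = F.cross_entropy(model(x_nat), y)  # small exactly when natural accuracy is high
    # Subtracting the natural term means high clean accuracy *increases* the loss.
    return loss_adv - lam * loss_nat
```

Because the natural-example term enters with a negative sign, clean accuracy that is "too good" pushes the loss up, which is exactly the forced tradeoff described above.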
The observation that the choice of loss affects adversarial robustness is not novel
to our work, and our loss function shares components with existing methods such
as TRADES and MART. Many of these methods are motivated as
“regularizers” that encourage the network to behave similarly on adversarial and
natural inputs. Our method is fundamentally different conceptually, in that it
explicitly penalizes the classifier’s correct behavior on natural inputs. See
Section 3 for further comparison with and discussion of existing methods.
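Schematically (and only schematically: β, λ, and the divergence D are placeholders, not the exact objectives of TRADES, MART, or NAP), the two families can be contrasted as:

```latex
% Regularizer-style objective (e.g., TRADES, where D is a KL term)
% versus a penalty-style objective in the spirit of this work.
\begin{align*}
\mathcal{L}_{\text{reg}} &= \mathrm{CE}\big(f(x),\, y\big) + \beta\, D\big(f(x),\, f(x_{\mathrm{adv}})\big) \\
\mathcal{L}_{\text{pen}} &= \mathrm{CE}\big(f(x_{\mathrm{adv}}),\, y\big) - \lambda\, \mathrm{CE}\big(f(x),\, y\big)
\end{align*}
```

The first form pulls adversarial behavior toward the natural behavior; the second additionally raises the loss whenever natural accuracy is high, independently of the adversarial predictions.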
1 Certain losses are theoretically justified for simpler models, for example, as proper
scoring rules or for margin-maximizing reasons. But these justifications do not
provably hold for overparameterized models such as modern deep networks.
Our Contribution. Our main contribution is demonstrating that the impact of
the loss function on robustness is both large and under-explored. We show that an
“unnatural” loss function, which explicitly penalizes natural accuracy, can, in fact,
improve state-of-the-art adversarial robustness: achieving 61.5% robust accuracy
on CIFAR-10 with ε = 8/255 perturbations in ℓ∞, when evaluated against a
60-step PGD attacker with 20 random restarts.
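For context, this evaluation corresponds to a multi-restart ℓ∞ PGD attack. Below is a minimal sketch of such an evaluation loop; the step size `alpha` and the restart bookkeeping are assumptions for illustration, not the exact settings used in the paper.

```python
import torch
import torch.nn.functional as F

def pgd_robust_accuracy(model, x, y, eps=8/255, alpha=2/255, steps=60, restarts=20):
    """Sketch of an l_inf PGD evaluation with random restarts.
    alpha and other details are assumptions, not the paper's exact settings."""
    still_correct = torch.ones(len(x), dtype=torch.bool, device=x.device)

    for _ in range(restarts):
        # Random start inside the eps-ball, clipped to the valid image range.
        x_adv = torch.clamp(x + torch.empty_like(x).uniform_(-eps, eps), 0.0, 1.0).detach()

        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            with torch.no_grad():
                x_adv = x_adv + alpha * grad.sign()           # ascend the loss
                x_adv = torch.clamp(x_adv, x - eps, x + eps)  # project onto the eps-ball
                x_adv = torch.clamp(x_adv, 0.0, 1.0)          # keep a valid image
            x_adv = x_adv.detach()

        # A point counts as robust only if no restart ever fools the model on it.
        with torch.no_grad():
            still_correct &= (model(x_adv).argmax(dim=1) == y)

    # Robust accuracy = fraction of points that survive every restart.
    return still_correct.float().mean().item()
```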
We view our work as showing that the space of reasonable loss functions is perhaps
larger than expected and that large robustness gains can still be attained in this
space. We also present preliminary insights into what properties of our loss cause
it to perform well.
Chandna, Kshitij. "Improving Adversarial Robustness by Penalizing Natural Accuracy." In Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part I, pp. 517-533. Cham: Springer Nature Switzerland, 2023.