
Adversarial Machine Learning

Writer: Kshitij Chandna

I am Kshitij Chandna, a graduate of NYU, where my MS concentrated on Data Analytics and Machine Learning. I am interested in computer vision, especially adversarial machine learning. During my Computer Vision class, I worked on a project on adversarial images; encouraged by the positive results, I submitted the paper to the ECCV 2022 Workshops, where it was published.


Improving Adversarial Robustness by Penalizing Natural Accuracy

Current techniques in deep learning are still unable to train adversarially robust classifiers that perform as well as non-robust ones. In this work, we continue to study the space of loss functions and show that the choice of loss can affect robustness in highly nonintuitive ways. Specifically, we demonstrate that a surprising choice of loss function can, in fact, improve adversarial robustness against some attacks. Our loss function encourages accuracy on adversarial examples and explicitly penalizes accuracy on natural examples. This is inspired by theoretical and empirical works suggesting a fundamental tradeoff between standard accuracy and adversarial robustness. Our method, NAturally Penalized (NAP) loss, achieves 61.5% robust accuracy on CIFAR-10 with ε = 8/255 perturbations in ℓ∞ (against a PGD-60 adversary with 20 random restarts). This improves over the standard PGD defense by over 3%, as well as over other loss functions proposed in the literature. Although TRADES performs better on CIFAR-10 against AutoAttack, our approach gets better results on CIFAR-100. Our results thus suggest that significant robustness gains are possible by revisiting training techniques, even without additional data.

Modern deep learning is now mature enough to achieve high test accuracy on many image classification tasks. A long line of research has arrived at a certain combination of techniques that works well for image recognition, including architectures (ReLUs, convolutions, ResNets), optimization algorithm (SGD and variants, with tuned learning-rate schedules), model size, data augmentation, regularization, normalization, batch size, and loss function. Many of these choices are not unique, and we do not have a complete understanding of why they work best in practice. For example, in standard classification our true objective is a small 0/1 test loss, but we usually optimize the Cross Entropy train loss. We could instead optimize the ℓ2 train loss (or any other surrogate loss), but in practice we find that optimizing Cross Entropy often performs better.¹ Similarly, very large, deep networks perform much better than smaller ones in practice, even though these networks have more than enough capacity to “overfit” the train set and, by classical statistical intuition, should perform worse. The optimizer is also poorly understood: in practice we use SGD with learning rates much higher than optimization theory prescribes, and “accelerated” methods that optimize faster, such as Adam, sometimes generalize worse. Nevertheless, despite our incomplete theoretical understanding, the research community has converged on a methodology that performs very well for standard classification.
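
For concreteness, here is a small PyTorch-style illustration of the two surrogate losses mentioned above; the helper name and the softmax-plus-squared-error formulation are illustrative assumptions, not code from the paper.

import torch.nn.functional as F

def surrogate_loss(logits, y, kind="ce"):
    # Two surrogate losses for the same underlying 0/1 objective (illustrative helper).
    if kind == "ce":
        return F.cross_entropy(logits, y)              # the usual choice in practice
    one_hot = F.one_hot(y, num_classes=logits.size(1)).float()
    return F.mse_loss(logits.softmax(dim=1), one_hot)  # an l2-style surrogate on probabilities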

However, the field of adversarially robust classification is not as mature and has not yet converged on a training methodology that performs well. The goal of adversarial robustness is to learn classifiers that are robust to small adversarial perturbations of the input. Here, it is not clear whether the various design choices that we converged to for standard classification are still the best choices for robust classification.
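
To make the threat model concrete, the following is a minimal sketch of an ℓ∞ projected gradient descent (PGD) adversary of the kind discussed here; the helper name and the step size are assumptions for illustration, not the paper's exact attack configuration.

import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=60):
    # Craft l_inf-bounded adversarial examples with projected gradient descent.
    # x, y: a batch of natural images (pixels in [0, 1]) and their labels.
    # eps: l_inf radius; alpha: per-step size (an assumed value); steps: PGD iterations.
    delta = torch.empty_like(x).uniform_(-eps, eps)                  # random start in the eps-ball
    delta = ((x + delta).clamp(0, 1) - x).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)       # ascend the loss, project to the ball
        delta = ((x + delta).clamp(0, 1) - x).detach().requires_grad_(True)
    return (x + delta).detach()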

Indeed, recent advances in adversarial robustness have come through modifying the training procedure, loss function, architecture, data generation, activation function, pre-training, and leveraging unlabeled data. This research area is not nearly as mature as standard classification, and there are still potentially large robustness gains from rethinking elements of the deep learning methodology.

In this work, we focus on the choice of loss function and show that an unconventional choice of loss function can, in fact, significantly improve adversarial robustness. Concretely, our loss function includes two terms: one which encourages accuracy on adversarial examples and one which explicitly penalizes accuracy on natural examples. This is inspired by the empirical and theoretical observations that there may be a tradeoff between standard accuracy and adversarial accuracy for trained models. Intuitively, our loss function penalizes standard accuracy when it is “too good to be true”, i.e., much higher than the adversarial accuracy. This attempts to forcibly trade off standard accuracy for improved adversarial accuracy, and in practice it yields significant gains over existing methods.
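
As a rough sketch of such a two-term objective, the snippet below rewards accuracy on adversarial examples while adding a term that pushes against accuracy on natural examples; it is a simplified stand-in with a hypothetical weight lambda_nat, not the exact NAP loss from the paper.

import torch.nn.functional as F

def nap_style_loss(model, x_nat, x_adv, y, lambda_nat=1.0):
    # Illustrative two-term objective (not the paper's exact NAP formulation):
    # the first term rewards accuracy on adversarial examples, the second
    # explicitly penalizes accuracy (i.e., low cross-entropy) on natural examples.
    adv_term = F.cross_entropy(model(x_adv), y)
    nat_penalty = -F.cross_entropy(model(x_nat), y)   # hypothetical penalty on natural accuracy
    return adv_term + lambda_nat * nat_penalty        # lambda_nat is a made-up weight

In a formulation like this, the weight on the natural-accuracy penalty would control how aggressively standard accuracy is traded away for adversarial accuracy.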

The observation that the choice of loss affects adversarial robustness is not novel to our work, and our loss function shares components with existing methods such as TRADES and MART. Many of these methods are motivated as “regularizers” that encourage the network to behave similarly on adversarial and natural inputs. Our method is fundamentally different in that it explicitly penalizes the classifier’s correct behavior on natural inputs. See Section 3 for a further comparison and discussion of existing methods.

¹ Certain losses are theoretically justified for simpler models, for example as proper scoring rules or for margin-maximizing reasons. But these justifications do not provably hold for overparameterized models such as modern deep networks.

Our Contribution. Our main contribution is demonstrating that the impact of the loss function on robustness is both large and under-explored. We show that an “unnatural” loss function, which explicitly penalizes natural accuracy, can, in fact, improve state-of-the-art adversarial robustness: it achieves 61.5% robust accuracy on CIFAR-10 with ε = 8/255 perturbations in ℓ∞, when evaluated against a 60-step PGD attacker with 20 random restarts.
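
For reference, evaluation against a multi-restart PGD adversary can be sketched as follows, reusing the hypothetical pgd_linf helper from the earlier snippet; an input counts as robust only if the classifier stays correct on every restart. This is an illustrative harness, not the paper's evaluation code.

import torch

def robust_accuracy(model, loader, eps=8/255, steps=60, restarts=20):
    # Illustrative evaluation loop built around the pgd_linf sketch above.
    correct, total = 0, 0
    for x, y in loader:
        robust = torch.ones_like(y, dtype=torch.bool)
        for _ in range(restarts):                      # each call starts PGD from a fresh random point
            x_adv = pgd_linf(model, x, y, eps=eps, steps=steps)
            with torch.no_grad():
                robust &= (model(x_adv).argmax(dim=1) == y)
        correct += robust.sum().item()
        total += y.numel()
    return correct / total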

We view our work as showing that the space of reasonable loss functions is perhaps larger than expected and that large robustness gains can still be attained in this space. We also present preliminary insights into what properties of our loss cause it to perform well.



Chandna, Kshitij. "Improving Adversarial Robustness by Penalizing Natural Accuracy." In Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part I, pp. 517-533. Cham: Springer Nature Switzerland, 2023.

