A repository for training provably robust neural networks by optimizing convex outer bounds on the adversarial polytope. Created by Eric Wong and Zico Kolter. Link to the original arXiv paper. The method has been further extended to be fully modular, scalable, and use cascades to improve robust error. Check out our new paper on arXiv: Scaling provable adversarial defenses.
`models/` folder in the repository.

You can install this repository with `pip install convex_adversarial`.

If you wish to have the version of code that reflects the first paper, use `pip install convex_adversarial==0.2`, or clone the 0.2 release on GitHub.
The package contains the following functions:

- `robust_loss(net, epsilon, X, y, l1_proj=None, l1_type='exact', bounded_input=False, size_average=True)` computes a robust loss function for a given ReLU network `net` and l1 radius `epsilon` for examples `X` and their labels `y`. You can use this as a drop-in replacement for, say, `nn.CrossEntropyLoss()`, and it is equivalent to the objective of Equation 14 in the original paper. To use the scalable version, specify a projection dimension with `l1_proj` and set `l1_type` to `median`.
- `robust_loss_parallel` computes the same objective as `robust_loss`, but only for a single example and using
- `dual_net = DualNetwork(net, X, epsilon, l1_proj=None, l1_type='exact', bounded_input=False)` is a PyTorch module that computes the layer-wise upper and lower bounds for all activations in the network. This is useful if you are only interested in the bounds and not the robust loss, and corresponds to Algorithm 1 in the paper.
- `dual_net(c)` is the module's forward pass, which computes the lower bound on the primal problem described in the paper for a given objective vector `c`. This corresponds to computing the objective of Theorem 1 in the paper (Equation 5).

Neural networks are capable of representing highly complex functions. For example, with today's networks it is an easy task to achieve 99% accuracy on the MNIST digit recognition dataset, and we can quickly train a small network that can accurately predict that the following image is a 7.
However, the versatility of neural networks comes at a cost: these networks are highly susceptible to small perturbations, or adversarial attacks (e.g. the fast gradient sign method and projected gradient descent)! While most of us can recognize that the following image is still a 7, the same network that could correctly classify the above image instead classifies the following image as a 3.
While this is a relatively harmless example, one can easily think of situations where such adversarial perturbations can be dangerous and costly (e.g. autonomous driving).
Robust networks are networks that are trained to protect against any sort of adversarial perturbation. Specifically, for any seen training example, the network is robust if it is impossible to cause the network to incorrectly classify the example by adding a small perturbation.
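For intuition, this definition can be checked exactly in the special case of a linear binary classifier, where the worst-case perturbation inside an L-infinity ball has a closed form. The sketch below is only an illustration and does not use the package; the weights, the example, and the helper name `worst_case_margin` are made up for this purpose.

```python
def worst_case_margin(w, b, x, y, epsilon):
    """Smallest value of y * (w . x' + b) over all x' with ||x' - x||_inf <= epsilon.

    For a linear score the minimum is reached by moving every coordinate by
    epsilon against the sign of y * w_i, which costs epsilon * ||w||_1.
    """
    clean_margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return clean_margin - epsilon * sum(abs(wi) for wi in w)


# Hypothetical 2D classifier and one training example with label +1.
w, b = [1.0, -2.0], 0.5
x, y = [2.0, 0.5], 1

# The clean margin is 1.5 and ||w||_1 is 3.
small = worst_case_margin(w, b, x, y, epsilon=0.1)  # about 1.5 - 0.3
large = worst_case_margin(w, b, x, y, epsilon=0.6)  # about 1.5 - 1.8

print("robust at epsilon=0.1:", small > 0)
print("robust at epsilon=0.6:", large > 0)
```

A positive worst-case margin means no perturbation of that size can change the predicted label for this example; a non-positive one means some perturbation can.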
The short version: we use the dual of a convex relaxation of the network over the adversarial polytope to lower bound the output. This lower bound can be expressed as another deep network with the same model parameters, and optimizing this lower bound allows us to guarantee robustness of the network.
The long version: see our original paper, Provable defenses against adversarial examples via the convex outer adversarial polytope.
For our updated version which is scalable, modular, and achieves even better robust performance, see our new paper, Scaling provable adversarial defenses.
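To make the idea of certifying through a lower bound concrete, here is a sketch that uses plain interval arithmetic. This is a simpler and generally looser bound than the dual relaxation used in the papers, and it is shown only to illustrate what a guaranteed lower bound on the output buys you. The network weights and the helper names `linear_bounds` and `relu_bounds` are hypothetical.

```python
def linear_bounds(W, b, lower, upper):
    """Interval bounds of W z + b when each z_j lies in [lower_j, upper_j]."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        lo = hi = bias
        for wij, l, u in zip(row, lower, upper):
            # A non-negative weight keeps the interval orientation;
            # a negative weight swaps the two ends.
            lo += wij * (l if wij >= 0 else u)
            hi += wij * (u if wij >= 0 else l)
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to both ends of each interval."""
    return [max(l, 0.0) for l in lower], [max(u, 0.0) for u in upper]


# Hypothetical two-layer ReLU network with a scalar output (made-up weights).
W1, b1 = [[1.0, -1.0], [0.5, 1.0]], [0.0, -0.25]
W2, b2 = [[1.0, 1.0]], [0.0]

# L-infinity ball of radius epsilon around the input x.
x, epsilon = [1.0, 0.5], 0.125
lo = [xi - epsilon for xi in x]
hi = [xi + epsilon for xi in x]

lo, hi = linear_bounds(W1, b1, lo, hi)
lo, hi = relu_bounds(lo, hi)
out_lo, out_hi = linear_bounds(W2, b2, lo, hi)

print("output lies in", (out_lo[0], out_hi[0]))
print("certified positive:", out_lo[0] > 0)
```

If the lower bound on the output is positive, every input in the ball is guaranteed to produce a positive output, which is the kind of guarantee the robust training objective optimizes for.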
We illustrate the power of training robust networks in the following two scenarios: a 2D toy case for visualization, and the MNIST dataset. More experiments are in the paper.
To illustrate the difference, consider a binary classification task on 2D space, separating red dots from blue dots. Optimizing a neural network in the usual fashion gives us the following classifier on the left, and our robust method gives the classifier on the right. The squares around each example represent the adversarial region of perturbations.
For the standard classifier, a number of the examples have perturbation regions that contain both red and blue. These examples are susceptible to adversarial attacks that will flip the output of the neural network. On the other hand, the robust network has all perturbation regions fully contained in either the red or the blue region, and so this network is robust: we are guaranteed that there is no possible adversarial perturbation to flip the label of any example.
As mentioned before, it is easy to fool networks trained on the MNIST dataset when using attacks such as the fast gradient sign method (FGS) and projected gradient descent (PGD). We observe that PGD can almost always fool the MNIST trained network.
| | Base error | FGS error | PGD error | Robust error |
|---|---|---|---|---|
| Original | 1.1% | 50.0% | 81.7% | 100% |
| Robust | 1.8% | 3.9% | 4.1% | 5.8% |
On the other hand, the robust network is significantly less affected by these attacks. In fact, when optimizing the robust loss, we can additionally calculate a robust error, which gives a provable upper bound on the error caused by any adversarial perturbation. In this case, the robust network has a robust error of 5.8%, and so we are guaranteed that no adversarial attack can ever achieve an error rate larger than 5.8%. In comparison, the robust error of the standard network is 100%. More results on HAR, Fashion-MNIST, and SVHN can be found in the paper. Results for the scalable version with random projections on residual networks and on the CIFAR10 dataset can be found in our second paper.
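The relationship between the columns of the table can be sketched on a toy problem. The example below uses a made-up linear classifier and made-up data, not the package or the MNIST models, and computes a base error, an error under a fast gradient sign attack, and a certified robust error. The function names are hypothetical.

```python
def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fgs_attack(w, x, y, epsilon):
    """Fast gradient sign step: move each coordinate by epsilon in the
    direction that decreases the margin y * score(x)."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi - epsilon * y * sign(wi) for wi, xi in zip(w, x)]


def certified_margin(w, b, x, y, epsilon):
    """Lower bound on y * score over the whole L-infinity ball around x."""
    return y * score(w, b, x) - epsilon * sum(abs(wi) for wi in w)


# Made-up classifier and labelled points (labels are +1 or -1).
w, b, epsilon = [1.0, 1.0], 0.0, 0.5
data = [([2.0, 1.0], 1), ([0.25, 0.5], 1), ([-2.0, -1.0], -1),
        ([-0.5, 0.0], -1), ([0.5, -1.0], 1)]

n = len(data)
# An example counts as an error when its margin is not strictly positive.
base_err = sum(y * score(w, b, x) <= 0 for x, y in data) / n
fgs_err = sum(y * score(w, b, fgs_attack(w, x, y, epsilon)) <= 0
              for x, y in data) / n
robust_err = sum(certified_margin(w, b, x, y, epsilon) <= 0
                 for x, y in data) / n

print(base_err, fgs_err, robust_err)
```

The robust error can never be smaller than the error achieved by any concrete attack. For a linear model the single-step attack is already the worst case, so the two coincide here; for deep networks the certified bound is generally looser, as in the table above.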
The package currently has dual operators for the following constrained input spaces and layers. These are defined in `dual_inputs.py` and `dual_layers.py`.

Dual input spaces:

- `InfBall`: L-infinity ball constraint on the input
- `InfBallBounded`: L-infinity ball constraint on the input, with additional bounding box constraints (works for [0,1] box constraints)
- `InfBallProj`: L-infinity ball constraint using Cauchy random projections
- `InfBallProjBounded`: L-infinity ball constraint using Cauchy random projections, with additional bounding box constraints (works for [0,1] box constraints)
- `L2Ball`: L-2 ball constraint on the input
- `L2BallProj`: L-2 ball constraint using Normal random projections

Dual layers:

- `DualLinear`: linear, fully connected layers
- `DualConv2d`: 2d convolutional layers
- `DualReshape`: reshaping layers, e.g. flattening dimensions
- `DualReLU`: ReLU activations
- `DualReLUProj`: ReLU activations using Cauchy random projections
- `DualDense`: Dense layers, for skip connections
- `DualBatchNorm2d`: 2d batch-norm layers, assuming a fixed mean and variance
- `Identity`: Identity operator, e.g. for some ResNet skip connections

Due to the modularity of the implementation, it is easy to extend the methodology to additional dual layers. A dual input or dual layer can be implemented by filling in the following signature:

    from abc import ABCMeta, abstractmethod

    import torch.nn as nn


    class DualObject(nn.Module, metaclass=ABCMeta):
        @abstractmethod
        def __init__(self):
            """ Initialize a dual layer by initializing the variables needed to
            compute this layer's contribution to the upper and lower bounds.

            In the paper, if this object is at layer i, this is initializing `h'
            with the required cached values when nu[i]=I and nu[i]=-I.
            """
            pass

        @abstractmethod
        def apply(self, dual_layer):
            """ Advance cached variables initialized in this class by the given
            dual layer. """
            raise NotImplementedError

        @abstractmethod
        def bounds(self):
            """ Return this layer's contribution to the upper and lower bounds. In
            the paper, this is the `h' upper bound where nu is implicitly given by
            c=I and c=-I. """
            raise NotImplementedError

        @abstractmethod
        def objective(self, *nus):
            """ Return this layer's contribution to the objective, given some
            backwards pass. In the paper, this is the `h' upper bound evaluated on
            the given nu variables.

            If this is layer i, then we get as input nu[k] through nu[i].
            So non-residual layers will only need nu[-1] and nu[-2]. """
            raise NotImplementedError


    class DualLayer(DualObject):
        @abstractmethod
        def forward(self, *xs):
            """ Given previous inputs, apply the affine layer (forward pass) """
            raise NotImplementedError

        @abstractmethod
        def T(self, *xs):
            """ Given previous inputs, apply the transposed affine layer
            (backward pass) """
            raise NotImplementedError

To create sequential PyTorch modules with skip connections, we provide a generalization of the PyTorch module `nn.Sequential`. Specifically, we have a `DenseSequential` module that is identical to `nn.Sequential` but also takes in `Dense` modules. A `Dense` module consists of `m` layers and applies these `m` layers to the last `m` outputs of the network.

As an example, the following is a simple two-layer network with a single skip connection. The first layer is identical to a normal `nn.Conv2d` layer. The second layer has a skip connection from the layer with 16 filters and also a normal convolutional layer from the previous layer with 32 filters.

    # Modules are passed as positional arguments, as with nn.Sequential.
    residual_block = DenseSequential(
        Dense(nn.Conv2d(16,32,...)),
        nn.ReLU(),
        Dense(nn.Conv2d(16,32,...), None, nn.Conv2d(32,32,...))
    )
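
To make the indexing concrete, the following plain-Python sketch mimics the behaviour described above with ordinary functions in place of PyTorch layers. `ToyDense` and `toy_dense_sequential` are hypothetical stand-ins written for this illustration, not the package's classes, and summing the branch outputs is an assumption about how the skip connection is combined.

```python
class ToyDense:
    """Holds m callables (or None) and applies them to the last m outputs."""

    def __init__(self, *layers):
        self.layers = layers

    def __call__(self, *outputs):
        recent = outputs[-len(self.layers):]
        # Entries set to None contribute nothing; the rest are summed
        # (assumed here to be how the branches are merged).
        return sum(f(z) for f, z in zip(self.layers, recent) if f is not None)


def toy_dense_sequential(modules, x):
    """Run modules in order, keeping every intermediate output so that
    ToyDense modules can reach back to earlier ones."""
    outputs = [x]
    for module in modules:
        if isinstance(module, ToyDense):
            outputs.append(module(*outputs))
        else:
            outputs.append(module(outputs[-1]))
    return outputs[-1]


relu = lambda z: max(z, 0.0)
double = lambda z: 2.0 * z   # stands in for the first convolution
skip = lambda z: 10.0 * z    # stands in for the skip-connection convolution
triple = lambda z: 3.0 * z   # stands in for the second convolution

block = [
    ToyDense(double),
    relu,
    # Skip branch from the block input, nothing from the pre-ReLU output,
    # main branch from the ReLU output.
    ToyDense(skip, None, triple),
]

result = toy_dense_sequential(block, 1.5)
print(result)
```

With three entries, the last `ToyDense` looks at the last three outputs (the block input, the first layer's output, and the ReLU output), which mirrors the `Dense(..., None, ...)` pattern in the example above.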