Detecting Adversarial Examples via Neural Fingerprinting


This repository implements Neural Fingerprinting, a technique for detecting adversarial examples.
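
In essence (see the paper for the precise formulation), the technique embeds query–response pairs into the model during training: each fingerprint is an input perturbation together with an expected change in the model's output, and inputs whose observed output changes deviate too far from the expected ones are flagged. A minimal numpy sketch of that detection test — the function names and the exact distance are illustrative, not the repository's API:

```python
import numpy as np

def fingerprint_distance(f, x, dxs, dys):
    """Average squared mismatch between the observed change in the model's
    output under each fingerprint perturbation dx and the expected change dy.

    f   : model mapping an input array to a (normalized) output array
    x   : input under test
    dxs : fingerprint input perturbations, one per fingerprint
    dys : expected output changes, one per fingerprint
    """
    y = f(x)
    return float(np.mean([np.sum((f(x + dx) - y - dy) ** 2)
                          for dx, dy in zip(dxs, dys)]))

def is_adversarial(f, x, dxs, dys, tau):
    # Real inputs track the fingerprint responses closely; adversarial
    # inputs tend not to, so flag anything beyond the threshold tau.
    return fingerprint_distance(f, x, dxs, dys) > tau
```

Sweeping the threshold tau over a validation set is what produces the ROC curves reported below.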

This code accompanies the paper Detecting Adversarial Examples via Neural Fingerprinting, Sumanth Dathathri(*), Stephan Zheng(*), Richard Murray and Yisong Yue, 2018 (* = equal contribution), available at arXiv:1803.03870.

If you use this code or work, please cite:

```
@article{dathathri2018detecting,
  title  = {Detecting Adversarial Examples via Neural Fingerprinting},
  author = {Dathathri, Sumanth and Zheng, Stephan and Murray, Richard and Yue, Yisong},
  year   = {2018},
  eprint = {1803.03870},
  ee     = {}
}
```

To clone the repository, run:

```shell
git clone
cd neural-fingerprinting
```


Neural Fingerprinting achieves near-perfect detection rates on MNIST, CIFAR and MiniImageNet-20.

ROC-AUC scores

![ROC curves for detection of different attacks on CIFAR.](roc_cifar.png)
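
The ROC-AUC numbers treat the fingerprint distance of each input as its detection score. A self-contained sketch of that computation, equivalent in effect to `sklearn.metrics.roc_auc_score` (which the dependencies below provide):

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC-AUC via its rank-statistic form: the probability that a randomly
    chosen positive (adversarial) example scores higher than a randomly
    chosen negative (real) one, counting ties as 1/2."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

A score of 1.0 means the fingerprint distance perfectly separates adversarial from real inputs.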

Requirements and Installation

We have tested this codebase with the following dependencies (we cannot guarantee compatibility with other versions).

To install these dependencies, run:

```shell
# PyTorch: see the official PyTorch installation instructions for your platform
pip install torch
pip install torchvision

# TensorFlow/Keras: see the official TensorFlow installation instructions
pip install keras
pip install tensorflow-gpu

# nn_transfer
git clone
cd nn-transfer
pip install .

# scikit-learn (the bare `sklearn` PyPI alias is deprecated)
pip install scikit-learn
```

This codebase relies on third-party implementations for the adversarial attacks and for transferring the generated attacks from TensorFlow to PyTorch.
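
Since the attacks are generated on the TensorFlow side and evaluated against the PyTorch model, the adversarial examples themselves can be exchanged as plain arrays. A minimal sketch of that handoff (file layout and function names are illustrative, not the repository's format):

```python
import numpy as np

def save_adv_examples(path, adv_x, labels):
    """Write attack outputs (e.g. from the TensorFlow attack code) as
    framework-agnostic numpy arrays."""
    np.savez(path, adv_x=adv_x, labels=labels)

def load_adv_examples(path):
    """Read the saved adversarial examples back, e.g. for evaluating the
    PyTorch model; returns (inputs, labels) as numpy arrays."""
    data = np.load(path)
    return data["adv_x"], data["labels"]
```

Keeping the exchange format framework-agnostic means the only cross-framework coupling left is the weight transfer handled by nn-transfer.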


To train and evaluate models with fingerprints, use the launcher script, which contains example calls to run the code.

The launcher takes the following positional arguments:

```shell
./ dataset train attack eval grid num_dx eps epoch_for_eval
```

For instance, the following command trains a convolutional neural network for MNIST with 10 fingerprints at epsilon = 0.1 and evaluates the model after 10 epochs of training:

```shell
./ mnist train attack eval nogrid 10 0.1 10
```
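
The positional arguments above map onto the options consumed by the training, attack, and evaluation steps below. A hedged Python sketch of that mapping — the key names are illustrative, not the launcher's own variables:

```python
def parse_launcher_args(argv):
    """Map the launcher's positional arguments (in README order) to a
    dict of named options, casting the numeric ones."""
    keys = ("dataset", "train", "attack", "eval",
            "grid", "num_dx", "eps", "epoch_for_eval")
    if len(argv) != len(keys):
        raise ValueError(f"expected {len(keys)} arguments, got {len(argv)}")
    args = dict(zip(keys, argv))
    args["num_dx"] = int(args["num_dx"])              # number of fingerprint directions
    args["eps"] = float(args["eps"])                  # fingerprint perturbation magnitude
    args["epoch_for_eval"] = int(args["epoch_for_eval"])  # checkpoint epoch to evaluate
    return args
```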

Running training, attacks and evaluation

  1. To train a model with fingerprints:

```shell
mkdir -p $LOGDIR
mkdir -p $DATADIR

python $NAME/ \
--batch-size 128 \
--test-batch-size 128 \
--epochs $NUM_EPOCHS \
--lr 0.01 \
--momentum 0.9 \
--seed 0 \
--log-interval 10 \
--log-dir $LOGDIR \
--data-dir $DATADIR \
--eps=$EPS \
--num-dx=$NUMDX \
--num-class=10
```
  2. To create adversarial attacks for the model after 10 epochs of training:

```shell
python $NAME/ \
--attack "all" \
--ckpt $LOGDIR/ckpt/state_dict-ep_$EPOCH.pth \
--log-dir $ADV_EX_DIR \
--batch-size 128
```
  3. To evaluate the model:

```shell
mkdir -p $EVAL_LOGDIR

python $NAME/ \
--batch-size 128 \
--epochs 100 \
--lr 0.001 \
--momentum 0.9 \
--seed 0 \
--log-interval 10 \
--ckpt $LOGDIR/ckpt/state_dict-ep_$EPOCH.pth \
--log-dir $EVAL_LOGDIR \
--fingerprint-dir $LOGDIR \
--adv-ex-dir $ADV_EX_DIR \
--data-dir $DATADIR \
--eps=$eps \
--num-dx=$numdx \
--num-class=10
```
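
Conceptually, the `--eps` and `--num-dx` flags threaded through all three steps control the fingerprints: `num-dx` perturbation directions of magnitude `eps`, each paired with a target output change that the model is trained to reproduce. A hedged numpy sketch of that construction and the mismatch term minimized alongside the task loss — this is a simplification, not the repository's exact loss:

```python
import numpy as np

def make_fingerprints(input_dim, num_classes, num_dx, eps, seed=0):
    """Sample num_dx random input perturbations, rescaled to magnitude eps,
    each paired with a target change in the model's output."""
    rng = np.random.default_rng(seed)
    dxs = rng.standard_normal((num_dx, input_dim))
    dxs *= eps / np.linalg.norm(dxs, axis=1, keepdims=True)  # enforce ||dx|| = eps
    dys = rng.standard_normal((num_dx, num_classes))         # target output changes
    return dxs, dys

def fingerprint_loss(f, x, dxs, dys):
    """Mean squared mismatch between observed and target output changes;
    during training this is added to the classification loss so the
    model learns to answer the fingerprint queries correctly."""
    y = f(x)
    diffs = np.stack([f(x + dx) - y for dx in dxs])
    return float(np.mean((diffs - dys) ** 2))
```

At evaluation time the same quantity, computed per input, is the detection score thresholded to produce the ROC curves above.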