Pretrained image and video models for Pytorch (Work in progress)

The goal of this repo is:

Updates specific to this fork: This repo is my own personal fork of this popular model zoo for PyTorch. Since my work focuses on action recognition in videos, I plan to accumulate standard model architectures trained on the popular video datasets such as Moments in Time, Kinetics, Something-Something, etc., as well models specifically designed for action recognition. For example, you can load 3DResNet50 pretrained on Moments in Time with the following:

model_name = 'resnet3d50'
model = pretorched.__dict__[model_name](num_classes=339, pretrained='moments')
model.eval()

Not every architecture will be trained on every dataset, but I will do the best I can to include all that I have accumulated. I will try to maintain the same API where appropriate, but may decided to make modifications to specifically handle multi-frame nature of videos.

News:

Summary

Installation

  1. python3 with anaconda
  2. pytorch with/out CUDA

Install from repo

  1. git clone https://github.com/alexandonian/pretorched-x.git
  2. cd pretorched-x
  3. python setup.py install

Quick examples

import pretorched
print(pretorched.model_names)
> ['fbresnet152', 'bninception', 'resnext101_32x4d', 'resnext101_64x4d', 'inceptionv4', 'inceptionresnetv2', 'alexnet', 'densenet121', 'densenet169', 'densenet201', 'densenet161', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152', 'inceptionv3', 'squeezenet1_0', 'squeezenet1_1', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19_bn', 'vgg19', 'nasnetalarge', 'nasnetamobile', 'cafferesnet101', 'senet154',  'se_resnet50', 'se_resnet101', 'se_resnet152', 'se_resnext50_32x4d', 'se_resnext101_32x4d', 'cafferesnet101', 'polynet', 'pnasnet5large']
print(pretorched.pretrained_settings['nasnetalarge'])
> {'imagenet': {'url': 'http://pretorched-x.csail.mit.edu/models/nasnetalarge-a1897284.pth', 'input_space': 'RGB', 'input_size': [3, 331, 331], 'input_range': [0, 1], 'mean': [0.5, 0.5, 0.5], 'std': [0.5, 0.5, 0.5], 'num_classes': 1000}, 'imagenet+background': {'url': 'http://pretorched-x.csail.mit.edu/models/nasnetalarge-a1897284.pth', 'input_space': 'RGB', 'input_size': [3, 331, 331], 'input_range': [0, 1], 'mean': [0.5, 0.5, 0.5], 'std': [0.5, 0.5, 0.5], 'num_classes': 1001}}
model_name = 'nasnetalarge' # could be fbresnet152 or inceptionresnetv2
model = pretorched.__dict__[model_name](num_classes=1000, pretrained='imagenet')
model.eval()

Note: By default, models will be downloaded to your $HOME/.torch folder. You can modify this behavior using the $TORCH_MODEL_ZOO variable as follow: export TORCH_MODEL_ZOO="/local/pretorched

import torch
import pretorched.utils as utils

load_img = utils.LoadImage()

# transformations depending on the model
# rescale, center crop, normalize, and others (ex: ToBGR, ToRange255)
tf_img = utils.TransformImage(model)

path_img = 'data/cat.jpg'

input_img = load_img(path_img)
input_tensor = tf_img(input_img)         # 3x400x225 -> 3x299x299 size may differ
input_tensor = input_tensor.unsqueeze(0) # 3x299x299 -> 1x3x299x299
input = torch.autograd.Variable(input_tensor,
    requires_grad=False)

output_logits = model(input) # 1x1000
output_features = model.features(input) # 1x14x14x2048 size may differ
output_logits = model.logits(output_features) # 1x1000

Few use cases

Compute imagenet logits

$ python examples/imagenet_logits.py -h
> nasnetalarge, resnet152, inceptionresnetv2, inceptionv4, ...
$ python examples/imagenet_logits.py -a nasnetalarge --path_img data/cat.png
> 'nasnetalarge': data/cat.png' is a 'tiger cat'

Compute imagenet evaluation metrics

$ python examples/imagenet_eval.py /local/common-data/imagenet_2012/images -a nasnetalarge -b 20 -e
> * Acc@1 92.693, Acc@5 96.13

Evaluation on imagenet

Accuracy on validation set (single model)

Results were obtained using (center cropped) images of the same size than during the training process.

Model Version Acc@1 Acc@5
PNASNet-5-Large Tensorflow 82.858 96.182
PNASNet-5-Large Our porting 82.736 95.992
NASNet-A-Large Tensorflow 82.693 96.163
NASNet-A-Large Our porting 82.566 96.086
SENet154 Caffe 81.32 95.53
SENet154 Our porting 81.304 95.498
PolyNet Caffe 81.29 95.75
PolyNet Our porting 81.002 95.624
InceptionResNetV2 Tensorflow 80.4 95.3
InceptionV4 Tensorflow 80.2 95.3
SE-ResNeXt101_32x4d Our porting 80.236 95.028
SE-ResNeXt101_32x4d Caffe 80.19 95.04
InceptionResNetV2 Our porting 80.170 95.234
InceptionV4 Our porting 80.062 94.926
DualPathNet107_5k Our porting 79.746 94.684
ResNeXt101_64x4d Torch7 79.6 94.7
DualPathNet131 Our porting 79.432 94.574
DualPathNet92_5k Our porting 79.400 94.620
DualPathNet98 Our porting 79.224 94.488
SE-ResNeXt50_32x4d Our porting 79.076 94.434
SE-ResNeXt50_32x4d Caffe 79.03 94.46
Xception Keras 79.000 94.500
ResNeXt101_64x4d Our porting 78.956 94.252
Xception Our porting 78.888 94.292
ResNeXt101_32x4d Torch7 78.8 94.4
SE-ResNet152 Caffe 78.66 94.46
SE-ResNet152 Our porting 78.658 94.374
ResNet152 Pytorch 78.428 94.110
SE-ResNet101 Our porting 78.396 94.258
SE-ResNet101 Caffe 78.25 94.28
ResNeXt101_32x4d Our porting 78.188 93.886
FBResNet152 Torch7 77.84 93.84
SE-ResNet50 Caffe 77.63 93.64
SE-ResNet50 Our porting 77.636 93.752
DenseNet161 Pytorch 77.560 93.798
ResNet101 Pytorch 77.438 93.672
FBResNet152 Our porting 77.386 93.594
InceptionV3 Pytorch 77.294 93.454
DenseNet201 Pytorch 77.152 93.548
DualPathNet68b_5k Our porting 77.034 93.590
CaffeResnet101 Caffe 76.400 92.900
CaffeResnet101 Our porting 76.200 92.766
DenseNet169 Pytorch 76.026 92.992
ResNet50 Pytorch 76.002 92.980
DualPathNet68 Our porting 75.868 92.774
DenseNet121 Pytorch 74.646 92.136
VGG19_BN Pytorch 74.266 92.066
NASNet-A-Mobile Tensorflow 74.0 91.6
NASNet-A-Mobile Our porting 74.080 91.740
ResNet34 Pytorch 73.554 91.456
BNInception Our porting 73.522 91.560
VGG16_BN Pytorch 73.518 91.608
VGG19 Pytorch 72.080 90.822
VGG16 Pytorch 71.636 90.354
VGG13_BN Pytorch 71.508 90.494
VGG11_BN Pytorch 70.452 89.818
ResNet18 Pytorch 70.142 89.274
VGG13 Pytorch 69.662 89.264
VGG11 Pytorch 68.970 88.746
SqueezeNet1_1 Pytorch 58.250 80.800
SqueezeNet1_0 Pytorch 58.108 80.428
Alexnet Pytorch 56.432 79.194

Notes:

Beware, the accuracy reported here is not always representative of the transferable capacity of the network on other tasks and datasets. You must try them all! :P

Reproducing results

Please see Compute imagenet validation metrics

Documentation

Available models

NASNet*

Source: TensorFlow Slim repo

FaceBook ResNet*

Source: Torch7 repo of FaceBook

There are a bit different from the ResNet* of torchvision. ResNet152 is currently the only one available.

Caffe ResNet*

Source: Caffe repo of KaimingHe

Inception*

Source: TensorFlow Slim repo and Pytorch/Vision repo for inceptionv3

BNInception

Source: Trained with Caffe by Xiong Yuanjun

ResNeXt*

Source: ResNeXt repo of FaceBook

DualPathNetworks

Source: MXNET repo of Chen Yunpeng

The porting has been made possible by Ross Wightman in his PyTorch repo.

As you can see here DualPathNetworks allows you to try different scales. The default one in this repo is 0.875 meaning that the original input size is 256 before croping to 224.

'imagenet+5k' means that the network has been pretrained on imagenet5k before being finetuned on imagenet1k.

Xception

Source: Keras repo

The porting has been made possible by T Standley.

SENet*

Source: Caffe repo of Jie Hu

PNASNet*

Source: TensorFlow Slim repo

PolyNet

Source: Caffe repo of the CUHK Multimedia Lab

TorchVision

Source: Pytorch/Vision repo

(inceptionv3 included in Inception*)

Model API

Once a pretrained model has been loaded, you can use it that way.

Important note: All image must be loaded using PIL which scales the pixel values between 0 and 1.

model.input_size

Attribut of type list composed of 3 numbers:

Example:

model.input_space

Attribut of type str representating the color space of the image. Can be RGB or BGR.

model.input_range

Attribut of type list composed of 2 numbers:

Example:

model.mean

Attribut of type list composed of 3 numbers which are used to normalize the input image (substract "color-channel-wise").

Example:

model.std

Attribut of type list composed of 3 numbers which are used to normalize the input image (divide "color-channel-wise").

Example:

model.features

/!\ work in progress (may not be available)

Method which is used to extract the features from the image.

Example when the model is loaded using fbresnet152:

print(input_224.size())            # (1,3,224,224)
output = model.features(input_224)
print(output.size())               # (1,2048,1,1)

# print(input_448.size())          # (1,3,448,448)
output = model.features(input_448)
# print(output.size())             # (1,2048,7,7)

model.logits

/!\ work in progress (may not be available)

Method which is used to classify the features from the image.

Example when the model is loaded using fbresnet152:

output = model.features(input_224)
print(output.size())               # (1,2048, 1, 1)
output = model.logits(output)
print(output.size())               # (1,1000)

model.forward

Method used to call model.features and model.logits. It can be overwritten as desired.

Note: A good practice is to use model.__call__ as your function of choice to forward an input to your model. See the example bellow.

# Without model.__call__
output = model.forward(input_224)
print(output.size())      # (1,1000)

# With model.__call__
output = model(input_224)
print(output.size())      # (1,1000)

model.last_linear

Attribut of type nn.Linear. This module is the last one to be called during the forward pass.

Example when the model is loaded using fbresnet152:

print(input_224.size())            # (1,3,224,224)
output = model.features(input_224)
print(output.size())               # (1,2048,1,1)
output = model.logits(output)
print(output.size())               # (1,1000)

# fine tuning
dim_feats = model.last_linear.in_features # =2048
nb_classes = 4
model.last_linear = nn.Linear(dim_feats, nb_classes)
output = model(input_224)
print(output.size())               # (1,4)

# features extraction
model.last_linear = pretrained.utils.Identity()
output = model(input_224)
print(output.size())               # (1,2048)

Reproducing

Hand porting of ResNet152

th pretrainedmodels/fbresnet/resnet152_dump.lua
python pretrainedmodels/fbresnet/resnet152_load.py

Automatic porting of ResNeXt

https://github.com/clcarwin/convert_torch_to_pytorch

Hand porting of NASNet, InceptionV4 and InceptionResNetV2

https://github.com/alexandonian/tensorflow-model-zoo.torch

Acknowledgement

Thanks to the deep learning community and especially to the contributers of the pytorch ecosystem.