HBONet

Official PyTorch implementation of the HBONet architecture described in HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions (ICCV'19) by Duo Li, Aojun Zhou and Anbang Yao, evaluated on the ILSVRC2012 benchmark.

We integrate our HBO modules into the state-of-the-art MobileNetV2 backbone as a reference case. The baseline MobileNetV2 models are available in our mobilenetv2.pytorch repository.

Requirements

Dependencies

Pretrained models

The following statistics are reported on the ILSVRC2012 validation set with single center crop testing.

HBONet with a spectrum of width multipliers (Table 2)

Architecture   MFLOPs   Top-1 / Top-5 Acc. (%)
HBONet 1.0     305      73.1 / 91.0
HBONet 0.8     205      71.3 / 89.7
HBONet 0.5     96       67.0 / 86.9
HBONet 0.35    61       62.4 / 83.7
HBONet 0.25    37       57.3 / 79.8
HBONet 0.1     14       41.5 / 65.7
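
As an illustration of how the table above might be used on a resource-aware platform, the following sketch picks the most accurate width multiplier that fits a given compute budget. The rows are copied from the table; the names `HBONET_WIDTH_TABLE` and `pick_width_mult` are hypothetical helpers and not part of this repository.

```python
# Rows copied from the width-multiplier table above:
# (width multiplier, MFLOPs, top-1 accuracy %, top-5 accuracy %)
HBONET_WIDTH_TABLE = [
    (1.0, 305, 73.1, 91.0),
    (0.8, 205, 71.3, 89.7),
    (0.5, 96, 67.0, 86.9),
    (0.35, 61, 62.4, 83.7),
    (0.25, 37, 57.3, 79.8),
    (0.1, 14, 41.5, 65.7),
]


def pick_width_mult(max_mflops):
    """Return the row with the best top-1 accuracy within the budget, or None.

    Hypothetical helper for illustration only.
    """
    candidates = [row for row in HBONET_WIDTH_TABLE if row[1] <= max_mflops]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[2])


if __name__ == "__main__":
    print(pick_width_mult(100))  # (0.5, 96, 67.0, 86.9)
```

The selected multiplier can then be passed as `--width-mult` to the scripts described under Usage.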

HBONet 0.8 with a spectrum of input resolutions (Table 3)

Architecture   Input resolution   MFLOPs   Top-1 / Top-5 Acc. (%)
HBONet 0.8     224x224            205      71.3 / 89.7
HBONet 0.8     192x192            150      70.0 / 89.2
HBONet 0.8     160x160            105      68.3 / 87.8
HBONet 0.8     128x128            68       65.5 / 85.9
HBONet 0.8     96x96              39       61.4 / 83.0
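
The MFLOPs in the table above shrink roughly with the square of the input side length, which is what one would expect when most of the computation sits in convolutional layers. The sketch below checks this rule of thumb against the reported numbers. The function `scale_mflops` is a hypothetical helper, and the quadratic rule is an approximation, not the procedure used to produce the table.

```python
def scale_mflops(base_mflops, base_size, new_size):
    """Estimate MFLOPs at a new square input size, assuming quadratic scaling.

    Hypothetical helper: ignores any resolution-independent cost.
    """
    return base_mflops * (new_size / base_size) ** 2


if __name__ == "__main__":
    # Reported values for HBONet 0.8, copied from the table above.
    reported = [(192, 150), (160, 105), (128, 68), (96, 39)]
    for size, mflops in reported:
        estimate = scale_mflops(205, 224, size)
        print(f"{size}x{size}: estimated {estimate:.1f} MFLOPs, reported {mflops}")
```

The estimates land within about 2 MFLOPs of the reported values and fall slightly short at the lowest resolutions, presumably because part of the network cost does not depend on the input size.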

HBONet 0.35 with a spectrum of input resolutions (Table 4)

Architecture   Input resolution   MFLOPs   Top-1 / Top-5 Acc. (%)
HBONet 0.35    224x224            61       62.4 / 83.7
HBONet 0.35    192x192            45       60.9 / 82.6
HBONet 0.35    160x160            31       58.6 / 80.7
HBONet 0.35    128x128            21       55.2 / 78.0
HBONet 0.35    96x96              12       50.3 / 73.8

HBONet with different width multipliers and different input resolutions (Table 5)

Architecture   Input resolution   MFLOPs   Top-1 / Top-5 Acc. (%)
HBONet 0.5     224x224            98       67.7 / 87.4
HBONet 0.6     192x192            108      67.3 / 87.3

HBONet 0.25 variants with different down-sampling and up-sampling rates (Table 6)

Architecture      MFLOPs   Top-1 / Top-5 Acc. (%)
HBONet(2x) 0.25   44       58.3 / 80.6
HBONet(4x) 0.25   45       59.3 / 81.4
HBONet(8x) 0.25   45       58.2 / 80.4

Taking HBONet 1.0 as an example, the pretrained model can be imported with the following lines and then fine-tuned for other vision tasks or deployed on resource-aware platforms. (To create the variant models in Tables 5 and 6, first make the slight modifications described in the docstrings of the model file.)

import torch
from models.imagenet import hbonet

# Build HBONet 1.0 and load the pretrained ImageNet weights.
net = hbonet()
net.load_state_dict(torch.load('pretrained/hbonet_1_0.pth'))

Usage

Training

The following configuration reproduces our reported results. It is identical to the one used in mobilenetv2.pytorch, for a fair comparison.

python imagenet.py \
    -a hbonet \
    -d <path-to-ILSVRC2012-data> \
    --epochs 150 \
    --lr-decay cos \
    --lr 0.05 \
    --wd 4e-5 \
    -c <path-to-save-checkpoints> \
    --width-mult <width-multiplier> \
    --input-size <input-resolution> \
    -j <num-workers>
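
To give a feel for what `--lr-decay cos` with `--lr 0.05` and `--epochs 150` implies, here is a minimal sketch of a standard cosine annealing schedule. It assumes the learning rate decays from the base value to zero without warmup and is updated once per epoch. The actual `imagenet.py` may update per iteration or differ in other details, and `cosine_lr` is a hypothetical name.

```python
import math


def cosine_lr(base_lr, epoch, total_epochs):
    """Standard cosine annealing from base_lr down to 0 (assumed form)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))


if __name__ == "__main__":
    # Values taken from the training command above.
    for epoch in (0, 50, 75, 100, 150):
        print(f"epoch {epoch:3d}: lr = {cosine_lr(0.05, epoch, 150):.5f}")
```

Under these assumptions the learning rate starts at 0.05, passes through half that value at epoch 75, and reaches zero at epoch 150.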

Testing

python imagenet.py \
    -a hbonet \
    -d <path-to-ILSVRC2012-data> \
    --weight <pretrained-pth-file> \
    --width-mult <width-multiplier> \
    --input-size <input-resolution> \
    -e
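
The reported statistics use single center crop testing. The sketch below shows the geometry of a common way to do this: resize the shorter image side to `input_size / 0.875`, then crop a centered `input_size` square. The 0.875 ratio is an assumption based on common ImageNet practice and should be checked against `imagenet.py`. The function `center_crop_box` is a hypothetical helper.

```python
def center_crop_box(width, height, input_size, crop_ratio=0.875):
    """Return the resized image size and the centered crop box.

    Assumed convention: the shorter side is resized to
    int(input_size / crop_ratio), then a centered square is cropped.
    The box is (left, top, right, bottom) in resized-image pixels.
    """
    resize_to = int(input_size / crop_ratio)
    # Scale so the shorter side equals resize_to, keeping the aspect ratio.
    if width <= height:
        new_w, new_h = resize_to, int(resize_to * height / width)
    else:
        new_w, new_h = int(resize_to * width / height), resize_to
    left = (new_w - input_size) // 2
    top = (new_h - input_size) // 2
    return (new_w, new_h), (left, top, left + input_size, top + input_size)


if __name__ == "__main__":
    # A 500x375 image evaluated at --input-size 224.
    print(center_crop_box(500, 375, 224))  # ((341, 256), (58, 16, 282, 240))
```

With this convention, the value passed as `--input-size` determines both the resize target and the crop size.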

Citations

If you find our work useful in your research, please consider citing:

@InProceedings{Li_2019_ICCV,
author = {Li, Duo and Zhou, Aojun and Yao, Anbang},
title = {HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2019}
}