Conditional Probability Models for Deep Image Compression

[Paper] [Citation]

TensorFlow implementation of Conditional Probability Models for Deep Image Compression, published in CVPR 2018.


Notes about naming in code vs. paper


Plot Kodak

NOTE: Inference only works on CUDA-compatible GPUs.

To run inference, use the following command:

python ../ckpts MODEL_ID DATASET --save_ours

where MODEL_ID is one of

and DATASET is either the path to a directory of png files or an escaped glob (e.g., some/images/\*/\*.jpg). All images readable with PIL should be supported.

This will save outputs in ckpts/MODEL_ID\ DATASET/imgs and display the mean bpp and MS-SSIM on the console. Detailed per-image measures are written to ckpts/MODEL_ID\ DATASET/measures.csv. Note that some images may be padded.
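To illustrate the two accepted forms of DATASET, here is a minimal sketch that resolves either a directory of png files or a glob pattern to a list of image paths. The helper name `resolve_dataset` is hypothetical and this is not necessarily how the repository handles the argument; it only shows why the glob has to reach Python unexpanded, which is what the escaping on the command line achieves.

```python
import glob
import os
import tempfile


def resolve_dataset(dataset):
    """Return sorted image paths for a directory of PNGs or a glob pattern.

    Hypothetical helper, not taken from the repository.
    """
    if os.path.isdir(dataset):
        # Directory form: pick up the png files directly inside it.
        pattern = os.path.join(dataset, "*.png")
    else:
        # Glob form: the pattern arrives unexpanded because the shell
        # was prevented from expanding it (e.g. some/images/\*/\*.jpg).
        pattern = dataset
    return sorted(glob.glob(pattern))


# Small self-check on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    for rel in ("a/x.jpg", "b/y.jpg", "z.png"):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "wb").close()
    from_dir = [os.path.basename(p) for p in resolve_dataset(root)]
    from_glob = [
        os.path.basename(p)
        for p in resolve_dataset(os.path.join(root, "*", "*.jpg"))
    ]
```

If the glob were left unescaped, the shell would expand it into many separate arguments before the script ever saw it.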

Encoding to bitstream

By default, the bitrate is estimated using cross entropy. In our experiments, this estimate is very close to the real bitrate (<0.1% difference for most images). To verify this yourself, you can use

python ../ckpts MODEL_ID DATASET --save_ours --real_bpp

which will use an arithmetic encoder to write the symbols of an image to a file, count the number of bits, and then decode the bits to restore the symbols. We note that this is not optimized at all (images from the Kodak validation set take ~350s to encode and ~200s to decode). For a practical implementation, the following should be done:


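The relation between the cross-entropy estimate and the length of an actual arithmetic-coded bitstream can be illustrated with a toy example. The sketch below is not the coder used in this repository: it assumes a single fixed symbol distribution (the paper's model predicts a distribution per symbol from context), uses exact fractions instead of fixed-precision integers, and all function names are made up for illustration. It encodes symbols, counts the bits, and decodes them again.

```python
import math
from fractions import Fraction


def ideal_bits(symbols, probs):
    """Cross-entropy estimate: sum of -log2 p(symbol) over all symbols."""
    return -sum(math.log2(probs[s]) for s in symbols)


def _cumulative(probs):
    """Lower bound of each symbol's sub-interval within [0, 1)."""
    cum, total = [], Fraction(0)
    for p in probs:
        cum.append(total)
        total += p
    return cum


def encode(symbols, probs):
    """Arithmetic-encode symbols into a bit string (exact, unoptimized)."""
    cum = _cumulative(probs)
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        low += width * cum[s]
        width *= probs[s]
    # Pick n with 2**-n <= width / 2, so the grid point k / 2**n at or
    # above low is guaranteed to fall inside [low, low + width).
    n = 1
    while Fraction(1, 2 ** n) > width / 2:
        n += 1
    k = math.ceil(low * 2 ** n)
    return format(k, "0{}b".format(n))


def decode(bits, probs, n_symbols):
    """Recover n_symbols symbols from the bit string produced by encode."""
    cum = _cumulative(probs)
    value = Fraction(int(bits, 2), 2 ** len(bits))
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(n_symbols):
        for s, p in enumerate(probs):
            s_low = low + width * cum[s]
            if s_low <= value < s_low + width * p:
                out.append(s)
                low, width = s_low, width * p
                break
    return out


probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
symbols = [0, 1, 2, 0, 0, 1]
bits = encode(symbols, probs)
decoded = decode(bits, probs, len(symbols))
estimate = ideal_bits(symbols, probs)
```

In this scheme the coded length exceeds the cross-entropy estimate by at most about two bits per stream, which is negligible for streams as long as a compressed image. The decoder also needs the number of symbols, which a real format would store or derive from the image size.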
The plot above was created using

python ../ckpts 0515_1103,0515_1309,0515_1310 kodak --style mean --ids A B C --latex

For reference, the curve corresponding to our model in Fig. 1 in the paper can be reproduced with the following data:

# bpp -> MS-SSIM on Kodak
[
    (0.1265306, 0.9289356),
    (0.1530612, 0.9417454),
    (0.1795918, 0.9497924),
    (0.2061224, 0.9553684),
    (0.2326531, 0.9598574),
    (0.2591837, 0.9636625),
    (0.2857143, 0.9668663),
    (0.3122449, 0.9695684),
    (0.3387755, 0.9718446),
    (0.3653061, 0.9738012),
    (0.3918367, 0.9755308),
    (0.4183673, 0.9770696),
    (0.4448980, 0.9784622),
    (0.4714286, 0.9797252),
    (0.4979592, 0.9808753),
    (0.5244898, 0.9819255),
    (0.5510204, 0.9828875),
    (0.5775510, 0.9837722),
    (0.6040816, 0.9845877),
    (0.6306122, 0.9853407),
    (0.6571429, 0.9860362),
    (0.6836735, 0.9866768),
    (0.7102041, 0.9872690),
    (0.7367347, 0.9878184),
    (0.7632653, 0.9883268),
    (0.7897959, 0.9887977),
    (0.8163265, 0.9892346),
    (0.8428571, 0.9896379)]
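To compare another model against this curve at a specific bitrate, the points can be interpolated linearly. The following is a minimal sketch; the names `CURVE_SUBSET` and `ms_ssim_at` are hypothetical, and only the first three points from the list above are copied in to keep the example short.

```python
from bisect import bisect_right

# Subset of the (bpp, MS-SSIM) points listed above, for illustration only.
CURVE_SUBSET = [
    (0.1265306, 0.9289356),
    (0.1530612, 0.9417454),
    (0.1795918, 0.9497924),
]


def ms_ssim_at(curve, bpp):
    """Linearly interpolate MS-SSIM at the given bpp (no extrapolation)."""
    xs = [x for x, _ in curve]
    if not xs[0] <= bpp <= xs[-1]:
        raise ValueError("bpp outside the range covered by the curve")
    i = bisect_right(xs, bpp)
    if i == len(xs):
        # bpp equals the last sample point.
        return curve[-1][1]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (bpp - x0) / (x1 - x0)
```

Pass the full list instead of the subset to cover the whole bitrate range of the curve.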


If you want to train on the ImageNet dataset as described in the paper, follow the steps below (Prepare ImageNET). After doing this, you can pass --dataset_train imgnet_train --dataset_test imgnet_test to the training command below (make sure you set $RECORDS_ROOT for this, see below). Otherwise, set --dataset_train and --dataset_test to an escaped glob matching image files (e.g. some/images/\*/\*.jpg).

python ae_configs/cvpr/AE_CONFIG pc_configs/cvpr/PC_CONFIG \
        --dataset_train TRAIN_DATASET \
        --dataset_test TEST_DATASET \
        --log_dir_root LOG_DIR_ROOT

where AE_CONFIG and PC_CONFIG are one of the configs in the respective folders. The models in ckpts were obtained with the following configs:

Various other options are available, such as --restore to continue training from a previous checkpoint. See python -h.

Prepare ImageNET

The following instructions assume that you have the following tools installed:

Note that creating all records will likely take several hours. The following was tested using zsh.

1. Get ImageNET, in the proper format

You need to download ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar (a good resource is the Inception script). For the following instructions, we assume both tar files are located in a directory data.

    # in data/

    mkdir train val

    pushd train
    tar xvf ../ILSVRC2012_img_train.tar
    popd

    pushd val
    tar xvf ../ILSVRC2012_img_val.tar
    popd

This will unpack 1000 .tar containers into train/ and 50000 .JPEG images into val/. Now, we need to extract the training images from those containers. This may take a while depending on your setup.

    # in data/
    pushd train
    find . -name "n*.tar" | parallel -j64 'mkdir -vp {/.} && tar xf {} -C {/.}'
    popd
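If GNU parallel is not available, the same unpacking step can be approximated in Python. The sketch below is sequential and therefore slower; the function name `extract_class_tars` is hypothetical, and unlike the `find` command it only looks at archives directly inside the given directory.

```python
import os
import tarfile
import tempfile


def extract_class_tars(train_dir):
    """Unpack every n*.tar in train_dir into a directory named after it."""
    extracted = []
    for name in sorted(os.listdir(train_dir)):
        if not (name.startswith("n") and name.endswith(".tar")):
            continue
        # n01440764.tar -> n01440764/ (same effect as {/.} in parallel)
        target = os.path.join(train_dir, name[: -len(".tar")])
        os.makedirs(target, exist_ok=True)
        with tarfile.open(os.path.join(train_dir, name)) as archive:
            archive.extractall(target)
        extracted.append(target)
    return extracted


# Small self-check with one synthetic archive.
with tempfile.TemporaryDirectory() as train_dir:
    image = os.path.join(train_dir, "img_0.JPEG")
    open(image, "wb").close()
    with tarfile.open(os.path.join(train_dir, "n000.tar"), "w") as archive:
        archive.add(image, arcname="img_0.JPEG")
    targets = extract_class_tars(train_dir)
    unpacked = os.path.isfile(os.path.join(train_dir, "n000", "img_0.JPEG"))
    target_names = [os.path.basename(t) for t in targets]
```

As with the shell version, the original .tar containers are left in place after extraction.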

2. Downsample

We downsample each image to have 256 pixels on the shorter side, by executing the following command in data/. Again, this is very time-consuming, so if you have access to some CPU cluster, it might make sense to run it there.

    # in data/
    find . -name "*.JPEG" | parallel -j64 convert -verbose {} -resize "256x256^" {}
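The `256x256^` geometry treats 256x256 as a minimum size while preserving the aspect ratio, so the shorter side ends up at 256 pixels. The arithmetic can be sketched as follows; the function name `fill_size` is hypothetical, and the rounding may differ from ImageMagick's by a pixel in edge cases.

```python
def fill_size(width, height, short_side=256):
    """Return the (width, height) that makes the shorter side short_side."""
    scale = short_side / min(width, height)
    # Without an extra flag such as ">", smaller images are scaled up too.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 500x375 image maps to 341x256, and the resulting size could be passed to any resize routine (for instance PIL's `Image.resize`). Note that the command above writes each result back to the input path, so the originals are overwritten.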

3. Create records

Now it's time to pack the images into TF record files. We will save them in data/records/:

    # in data/
    mkdir -p records/train
    mkdir -p records/val

    pushd train
    find . -name "*.JPEG" | parallel --bar -j64 -N 1250 \
        'OUTPUT_PATH=$(printf "../records/train/train-%05d.tfrecord" {#});' \
        'python -m fjcommon tf_records mk_img_rec {} -o $OUTPUT_PATH --feature_key image/encoded'
    popd

    pushd val
    find . -name "*.JPEG" | parallel --bar -j16 -N 1250 \
        'OUTPUT_PATH=$(printf "../records/val/val-%05d.tfrecord" {#});' \
        'python -m fjcommon tf_records mk_img_rec {} -o $OUTPUT_PATH --feature_key image/encoded'
    popd
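The sharding performed by `-N 1250` and `{#}` can be sketched in a few lines: files are grouped into chunks of up to 1250, and each chunk is written to a record file numbered from 1. This is an illustrative sketch only; `plan_shards` and the placeholder file names are hypothetical, and the actual record writing is done by fjcommon in the commands above.

```python
def plan_shards(files, per_shard=1250, pattern="train-{:05d}.tfrecord"):
    """Group files into consecutive chunks and name each chunk's record."""
    shards = []
    for start in range(0, len(files), per_shard):
        index = start // per_shard + 1  # GNU parallel's {#} starts at 1
        shards.append((pattern.format(index), files[start:start + per_shard]))
    return shards


# Placeholder names standing in for the 50000 validation images.
val_files = ["val_{:05d}.JPEG".format(i) for i in range(50000)]
val_shards = plan_shards(val_files, pattern="val-{:05d}.tfrecord")
```

With 50000 validation images this yields 40 record files, val-00001.tfrecord through val-00040.tfrecord.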


Make sure the following environment variable is set before training:

    export RECORDS_ROOT=path_to_data/records
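A script that depends on this variable can fail early with a clear message when it is missing. The sketch below shows one way to do that; the helper `records_dir` is hypothetical, and how the training code actually maps imgnet_train and imgnet_test to paths under $RECORDS_ROOT is not shown here. The train and val subdirectory names follow the layout created in step 3.

```python
import os


def records_dir(split, environ=os.environ):
    """Return $RECORDS_ROOT/<split>, e.g. split='train' or split='val'."""
    root = environ.get("RECORDS_ROOT")
    if not root:
        raise RuntimeError(
            "RECORDS_ROOT is not set; "
            "run: export RECORDS_ROOT=path_to_data/records"
        )
    return os.path.join(root, split)
```

Passing the environment as an argument keeps the helper easy to check without modifying the real process environment.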


If you use this code for your research, please cite this paper:

    Author = {Mentzer, Fabian and Agustsson, Eirikur and Tschannen, Michael and Timofte, Radu and Van Gool, Luc},
    Booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    Title = {Conditional Probability Models for Deep Image Compression},
    Year = {2018}}