This library provides a unified test bench for evaluating graph neural network (GNN) models on the transductive node classification task. It offers a simple interface for running different models on several datasets with multiple train/validation/test splits. In addition, it can automatically tune the hyperparameters of all models using random search.

This framework uses Sacred as a backend for keeping track of experimental results, and all the GNN models are implemented in TensorFlow. The current version only supports training models on GPUs. This package was tested on Ubuntu 16.04 LTS with Python 3.6.6.

Table of contents

  1. Installation
  2. Running experiments
    1. General structure
    2. Configuring experiments
    3. Creating jobs
    4. Running jobs
    5. Retrieving and aggregating results
    6. Cleaning up the database
  3. GNN models
  4. Datasets
  5. Extending the framework
    1. Adding new models
    2. Adding new datasets
    3. Adding new metrics
  6. Cite


Installation

  1. Install MongoDB. The implementation was tested with MongoDB 3.6.4.

    This framework will automatically create a database called pending and databases with the names provided in the experiment configuration files. Make sure these databases do not exist or are empty before running your experiments.

  2. Install Python dependencies from the requirements.txt file. When using conda, this can be done with

    cd gnn-benchmark/
    while read -r requirement; do conda install --yes "$requirement"; done < requirements.txt

    All required packages except Sacred are available via conda. Sacred can be installed with

    pip install sacred==0.7.3

  3. Install the gnnbench package

    pip install -e .  # must be run from the repository root, i.e. in gnn-benchmark/
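Before creating jobs, it can help to verify the database requirement from step 1. The following is a minimal sketch, assuming the pymongo driver is used to talk to MongoDB. The helper name find_nonempty_databases and the database name my_experiment are hypothetical and are not part of gnnbench.

```python
def find_nonempty_databases(client, names):
    """Return those databases in `names` that exist and contain collections.

    `client` is expected to behave like pymongo.MongoClient, i.e. to offer
    list_database_names() and client[name].list_collection_names().
    """
    existing = set(client.list_database_names())
    return [
        name for name in names
        if name in existing and client[name].list_collection_names()
    ]


# Possible usage (assumes pymongo is installed and MongoDB runs locally):
#
#   from pymongo import MongoClient
#   client = MongoClient("localhost", 27017)
#   # "pending" is created by the framework; "my_experiment" is a placeholder
#   # for the database name given in your experiment config.
#   print(find_nonempty_databases(client, ["pending", "my_experiment"]))
```

An empty list means that none of the checked databases holds data yet.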

Running experiments with gnnbench

General structure

Performing experiments with gnnbench consists of four steps:

  1. Define configuration of the experiments using YAML files.

  2. Create jobs. Based on the configuration files defined in the previous step, a list of jobs to be performed is created and saved to the database. Each job is represented as a record in the MongoDB database.

  3. Spawn worker threads. Each thread retrieves one job from the database at a time and runs it.

  4. Retrieve results. The results are retrieved from the database, aggregated and stored in CSV format.
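The job lifecycle in steps 2 and 3 can be illustrated with an in-memory stand-in for the database. This is only a sketch of the pattern: the record fields (model, dataset, split, running, result) and the function names are assumptions made for illustration, not the schema that gnnbench stores in MongoDB.

```python
def create_jobs(models, datasets, num_splits):
    """Step 2: one job record per (model, dataset, split) combination."""
    return [
        {"model": m, "dataset": d, "split": s, "running": False, "result": None}
        for m in models
        for d in datasets
        for s in range(num_splits)
    ]


def claim_next_job(jobs):
    """Step 3: a worker takes one job that is neither running nor finished."""
    for job in jobs:
        if not job["running"] and job["result"] is None:
            job["running"] = True
            return job
    return None


def finish_job(job, result):
    """Store the result and release the job."""
    job["result"] = result
    job["running"] = False


# A single worker processing all jobs, one at a time.
jobs = create_jobs(["GCN"], ["mydataset"], num_splits=2)
while True:
    job = claim_next_job(jobs)
    if job is None:
        break
    finish_job(job, result=0.5)  # placeholder instead of real training
```

With a real database, claiming a job has to be atomic so that two workers never pick up the same record.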

Configuring experiments

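The framework supports two types of experiments: runs with fixed configurations and hyperparameter search. They correspond to the fixed and search operations used when creating jobs.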

Creating jobs

Use the scripts/ script to generate jobs (represented by records in the pending database) based on the YAML configuration file. The script should be called as follows:

python scripts/ -c CONFIG_FILE --op {fixed,search,status,clear,reset}

The -c CONFIG_FILE argument contains the path to the YAML file defining the experiment.

You can perform different operations by passing different options to the --op argument.
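For orientation, a command-line interface of this shape can be expressed with argparse as sketched below. This is not the script's actual source: the long option name --config-file and the help texts are assumptions, and only -c, --op and the five operation names come from the usage line above.

```python
import argparse


def build_parser():
    """Parser accepting -c CONFIG_FILE and --op with a fixed set of choices."""
    parser = argparse.ArgumentParser(description="Create or manage jobs (sketch).")
    parser.add_argument(
        "-c", "--config-file", required=True,
        help="Path to the YAML file defining the experiment.",
    )
    parser.add_argument(
        "--op", required=True,
        choices=["fixed", "search", "status", "clear", "reset"],
        help="Operation to perform.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["-c", "configs/fixed_configs.conf.yaml", "--op", "fixed"]
)
print(args.config_file, args.op)
```

Because --op is restricted with choices, any other value is rejected with a usage message.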

Running jobs

You can run jobs by spawning worker threads with scripts/. The script retrieves pending jobs (i.e. records) from the pending database and executes each of them in a subprocess. Example usage:

python scripts/ -c configs/fixed_configs.conf.yaml --gpu 0

You can run experiments on multiple GPUs in parallel by spawning multiple workers (e.g. using separate tmux sessions or panes) and passing different values for the --gpu parameter. In theory, it should be possible to run multiple workers on a single GPU, but we haven't tested that and cannot guarantee that it will work.
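As an alternative to separate tmux sessions, the workers could be started from a small Python launcher. The sketch below only builds one command line per GPU. The worker_script argument is a placeholder for the path of the worker script, and the function name is hypothetical; only the -c and --gpu options are taken from the example above.

```python
import sys


def worker_commands(worker_script, config_file, gpu_ids):
    """Build one worker command line per GPU id."""
    return [
        [sys.executable, worker_script, "-c", config_file, "--gpu", str(gpu)]
        for gpu in gpu_ids
    ]


# Possible usage: start one worker per GPU and wait for all of them.
#
#   import subprocess
#   commands = worker_commands(WORKER_SCRIPT, "configs/fixed_configs.conf.yaml", [0, 1])
#   processes = [subprocess.Popen(cmd) for cmd in commands]
#   for process in processes:
#       process.wait()
```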

Retrieving and aggregating the results

Use the scripts/ script to retrieve results from the database and aggregate them. The script takes the following command line arguments:

Example usage:

python scripts/ -c configs/fixed_configs.conf.yaml -o results/
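The aggregation step can be pictured as grouping the per-split results by model and dataset and writing summary statistics to CSV. The following is a minimal sketch using only the standard library. The record layout, the metric name accuracy and the output columns are assumptions and do not describe the files that the script actually produces.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, pstdev


def aggregate(records, metric="accuracy"):
    """Group records by (model, dataset) and summarize one metric."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record["model"], record["dataset"])].append(record[metric])
    return [
        {
            "model": model,
            "dataset": dataset,
            "mean": mean(values),
            "std": pstdev(values),  # population standard deviation
            "runs": len(values),
        }
        for (model, dataset), values in sorted(grouped.items())
    ]


def to_csv(rows):
    """Render the aggregated rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["model", "dataset", "mean", "std", "runs"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


records = [
    {"model": "GCN", "dataset": "mydataset", "accuracy": 0.5},
    {"model": "GCN", "dataset": "mydataset", "accuracy": 1.0},
]
print(to_csv(aggregate(records)))
```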

Cleaning up the database

To clean up the database, run the following commands, replacing CONFIG_FILE with the path to the YAML config of the experiment that you are running.

  1. To stop all running experiments, simply kill all the running processes.
  2. Reset the running status of all experiments to False:
    python scripts/ -c CONFIG_FILE --reset
  3. Delete all pending jobs from the database:
    python scripts/ -c CONFIG_FILE --clear
  4. Delete all finished jobs from the database:
    python scripts/ -c CONFIG_FILE --clear

GNN models

The framework contains implementations of the following models (located in the gnnbench/models/ directory)


Datasets

The following attributed graph datasets are currently included (located in the gnnbench/data/ directory)

Each graph (dataset) is represented as an N x N adjacency matrix A, an N x D attribute matrix X, and a vector of node labels y of length N. We store the datasets as npz archives. See gnnbench/data/ and Adding new datasets for information about reading and saving data in this format.
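The shape requirements on A, X and y can be checked with a few lines of NumPy. The sketch below uses small dense arrays for readability and writes them to an in-memory npz archive. The function name check_graph and the archive key names are chosen for illustration only; this is not the layout that the framework's own npz files use.

```python
import io

import numpy as np


def check_graph(adj_matrix, attr_matrix, labels):
    """Validate the N x N, N x D and length-N shapes and return (N, D)."""
    if adj_matrix.ndim != 2 or adj_matrix.shape[0] != adj_matrix.shape[1]:
        raise ValueError("adjacency matrix must be square (N x N)")
    num_nodes = adj_matrix.shape[0]
    if attr_matrix.ndim != 2 or attr_matrix.shape[0] != num_nodes:
        raise ValueError("attribute matrix must have shape N x D")
    if labels.shape != (num_nodes,):
        raise ValueError("labels must be a vector of length N")
    return num_nodes, attr_matrix.shape[1]


# Toy graph: a path over 3 nodes with 2 attributes per node.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0, 1, 0])
print(check_graph(A, X, y))  # (3, 2)

# Round trip through an npz archive held in memory.
buffer = io.BytesIO()
np.savez(buffer, adj_matrix=A, attr_matrix=X, labels=y)
buffer.seek(0)
loaded = np.load(buffer)
assert np.array_equal(loaded["adj_matrix"], A)
```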

Extending the framework

You can extend this framework by adding your own models, datasets and metric functions.

Adding new models

All models need to extend the GNNModel class defined in gnnbench/models/

Each model needs to be defined in a separate file.

In order for the framework to instantiate the model correctly, you also need to define a Sacred Ingredient for the model, which in general looks as follows:

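import tensorflow as tf
from sacred import Ingredient

MODEL_INGREDIENT = Ingredient('model')


# capture lets Sacred fill the model-specific parameters from the model config
@MODEL_INGREDIENT.capture
def build_model(graph_adj, node_features, labels, dataset_indices_placeholder,
                train_feed, trainval_feed, val_feed, test_feed,
                dropout_prob, model_specific_param1, model_specific_param2):
    # needed if the model uses dropout
    dropout = tf.placeholder(dtype=tf.float32, shape=[])
    train_feed[dropout] = dropout_prob
    # dropout is disabled (probability 0) whenever the model is evaluated
    trainval_feed[dropout] = 0.0
    val_feed[dropout] = 0.0
    test_feed[dropout] = 0.0

    # MyModel stands for your own subclass of GNNModel
    return MyModel(node_features, graph_adj, labels, dataset_indices_placeholder,
                   model_specific_param1, model_specific_param2)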

The parameters coming after test_feed can then be configured in the model config file mymodel.conf.yaml. In this config file the parameter model_name must be the same as the file name in which the model is defined (case-insensitive). Have a look at the existing implementations (e.g. GCN) for an example of this method.

To run experiments with the new model, create the YAML configuration file for the model (e.g. config/mymodel.conf.yaml). Then add this file to the list of models in the experiment config YAML file

    - ...
    - "config/mymodel.conf.yaml"

and follow the instructions for running experiments.

Adding new datasets

To add a new dataset, convert your data to the SparseGraph format and save it to an npz file:

from import SparseGraph, save_sparse_graph_to_npz

# Load the adjacency matrix A, attribute matrix X and labels vector y
# A - scipy.sparse.csr_matrix of shape [num_nodes, num_nodes]
# X - scipy.sparse.csr_matrix or np.ndarray of shape [num_nodes, num_attributes]
# y - np.ndarray of shape [num_nodes]

mydataset = SparseGraph(adj_matrix=A, attr_matrix=X, labels=y)
save_sparse_graph_to_npz('path/to/mydataset.npz', mydataset)

To run experiments on the new dataset, add the dataset to the YAML configuration of the experiment:

    - ...
    - "path/to/mydataset.npz"

Adding new metrics

To add a new metric, add a new function to the file. The function must have the following signature:

def new_metric(ground_truth, predictions):
    """Description of the new amazing metric.

    Parameters
    ----------
    ground_truth : np.ndarray, shape [num_samples]
        True labels.
    predictions : np.ndarray, shape [num_samples]
        Predicted labels.

    Returns
    -------
    score : float
        Value of metric for the given predictions.

    """

Then add the metric to the YAML config file for the experiment:

    - ...
    - "new_metric"
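As an illustration of the required signature, here is a small metric that returns the fraction of misclassified samples. The metric and its name error_rate are examples made up for this sketch and are not among the metrics shipped with the framework.

```python
import numpy as np


def error_rate(ground_truth, predictions):
    """Fraction of samples whose predicted label differs from the true label.

    Parameters
    ----------
    ground_truth : np.ndarray, shape [num_samples]
        True labels.
    predictions : np.ndarray, shape [num_samples]
        Predicted labels.

    Returns
    -------
    score : float
        Value between 0.0 (all correct) and 1.0 (all wrong).

    """
    ground_truth = np.asarray(ground_truth)
    predictions = np.asarray(predictions)
    if ground_truth.shape != predictions.shape:
        raise ValueError("ground_truth and predictions must have the same shape")
    return float(np.mean(ground_truth != predictions))
```

Such a function would then be referenced in the experiment config by its name, here "error_rate".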


Cite

Please cite our paper if you use this code or the newly introduced datasets in your own work:

  title={Pitfalls of Graph Neural Network Evaluation},
  author={Shchur, Oleksandr and Mumme, Maximilian and Bojchevski, Aleksandar and G{\"u}nnemann, Stephan},
  journal={Relational Representation Learning Workshop, NeurIPS 2018},