Multi-Task Graph Autoencoders

This is a Keras implementation of a symmetrical autoencoder architecture with parameter sharing for link prediction and semi-supervised node classification, as described in the following papers:

Tran, Phi Vu. Learning to Make Predictions on Graphs with Autoencoders. Proceedings of the 5th IEEE International Conference on Data Science and Advanced Analytics (2018). Full oral paper.

Tran, Phi Vu. Multi-Task Graph Autoencoders. NIPS 2018 Workshop on Relational Representation Learning. Short poster paper.

[Figure: schematic]

Requirements

The code is tested on Ubuntu 16.04 with the following components:

Software

Hardware

Datasets

Citation networks from Thomas Kipf and Max Welling. 2016. Semi-Supervised Classification with Graph Convolutional Networks:

Collaboration and social networks from Wang et al. 2016. Structural Deep Network Embedding:

Miscellaneous networks from Aditya Krishna Menon and Charles Elkan. 2011. Link Prediction via Matrix Factorization:

For custom graph datasets, the following are required:

For an example of how to prepare the input dataset, take a look at the load_citation_data() function in utils_gcn.py.
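The exact input format expected by the training scripts is defined by load_citation_data() in utils_gcn.py, which remains the authoritative reference. As a rough illustration only, the sketch below shows one common way to turn an undirected edge list into a symmetric adjacency matrix with NumPy. The helper name build_adjacency, the edge-list format, and the choice to drop self-loops are assumptions made for this example and are not taken from the repository.

```python
import numpy as np


def build_adjacency(edges, num_nodes):
    """Hypothetical helper (not part of this repo): build a dense,
    symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = np.zeros((num_nodes, num_nodes), dtype=np.float32)
    for u, v in edges:
        if u == v:
            continue  # assumption: self-loops are dropped in this sketch
        adj[u, v] = 1.0
        adj[v, u] = 1.0  # mirror the entry so the matrix stays symmetric
    return adj


# Toy path graph 0 - 1 - 2 - 3, with node IDs assumed to be 0-based integers
edges = [(0, 1), (1, 2), (2, 3)]
adj = build_adjacency(edges, num_nodes=4)
```

For large graphs a sparse representation would likely be preferable to a dense array; the dense form is used here only to keep the example short.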

Usage

For training and evaluation, execute the following bash commands in the same directory where the code resides:

# Set the PYTHONPATH environment variable
$ export PYTHONPATH="/path/to/this/repo:$PYTHONPATH"

# Train the autoencoder model for network reconstruction
# using only latent features learned from local graph topology.
$ python train_reconstruction.py <dataset_str> <gpu_id>

# Train the autoencoder model for link prediction using
# only latent features learned from local graph topology.
$ python train_lp.py <dataset_str> <gpu_id>

# Train the autoencoder model for link prediction using
# both latent graph features and available explicit node features.
$ python train_lp_with_feats.py <dataset_str> <gpu_id>

# Train the autoencoder model for the multi-task
# learning of both link prediction and semi-supervised
# node classification, simultaneously.
$ python train_multitask_lpnc.py <dataset_str> <gpu_id>

The argument <dataset_str> must be one of the following nine supported dataset strings: protein, metabolic, conflict, powergrid, cora, citeseer, pubmed, arxiv-grqc, blogcatalog. The argument <gpu_id> is the GPU device ID, typically 0 when only one GPU is available.
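When wrapping the training scripts in your own tooling, it can help to validate the two command-line values before launching a run. The following is a minimal sketch of such a check. The helper name parse_cli_args is hypothetical, and treating both values as required is an assumption; the scripts themselves may handle their arguments differently.

```python
SUPPORTED_DATASETS = (
    "protein", "metabolic", "conflict", "powergrid", "cora",
    "citeseer", "pubmed", "arxiv-grqc", "blogcatalog",
)


def parse_cli_args(argv):
    """Hypothetical helper (not part of this repo): validate
    <dataset_str> and <gpu_id> given as a list of two strings."""
    if len(argv) != 2:
        raise ValueError("expected exactly two arguments: <dataset_str> <gpu_id>")
    dataset_str, gpu_id = argv
    if dataset_str not in SUPPORTED_DATASETS:
        raise ValueError(
            "unsupported dataset %r; choose one of: %s"
            % (dataset_str, ", ".join(SUPPORTED_DATASETS))
        )
    gpu = int(gpu_id)  # raises ValueError if gpu_id is not an integer
    if gpu < 0:
        raise ValueError("<gpu_id> must be a non-negative integer")
    return dataset_str, gpu


# Example: the values that would follow the script name on the command line
dataset, gpu = parse_cli_args(["cora", "0"])
```

In a real wrapper the list passed in would usually be sys.argv[1:].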

Citation

If you find this work useful, please cite the following:


@inproceedings{Tran-LoNGAE:2018,
  author={Tran, Phi Vu},
  title={Learning to Make Predictions on Graphs with Autoencoders},
  booktitle={5th IEEE International Conference on Data Science and Advanced Analytics},
  year={2018}
}