# Graph Neural Networks

This is a PyTorch library to implement graph neural networks. Any questions, comments or suggestions, please e-mail Fernando Gama at fgama@seas.upenn.edu and/or Luana Ruiz at rubruiz@seas.upenn.edu. An in-depth tutorial on a source localization example can be found here.

Whenever using any part of this code, please cite the following paper

F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, "Convolutional Neural Network Architectures for Signals Supported on Graphs," IEEE Trans. Signal Process., vol. 67, no. 4, pp. 1034–1049, Feb. 2019.

We note that some specific architectures have specific paper citation to adequately acknowledge the respective contributors.

Other papers on GNNs by the authors are

E. Isufi, F. Gama, and A. Ribeiro, "EdgeNets: Edge Varying Graph Neural Networks," submitted to IEEE Trans. Pattern Analysis and Mach. Intell.

F. Gama, E. Isufi, G. Leus, and A. Ribeiro, "Graphs, Convolutions, and Neural Networks," submitted to IEEE Signal Process. Mag.

L. Ruiz, F. Gama, and A. Ribeiro, "Gated Graph Recurrent Neural Networks," submitted to IEEE Trans. Signal Process.

F. Gama, J. Bruna, and A. Ribeiro, "Stability Properties of Graph Neural Networks," submitted to IEEE Trans. Signal Process.

F. Gama, E. Tolstaya, and A. Ribeiro, "Graph Neural Networks for Decentralized Controllers," arXiv:2003.10280v1 [cs.LG], 23 March 2020.

L. Ruiz, F. Gama, A. G. Marques, and A. Ribeiro, "Invariance-Preserving Localized Activation Functions for Graph Neural Networks," IEEE Trans. Signal Process., vol. 68, no. 1, pp. 127-141, Jan. 2020.

F. Gama, J. Bruna, and A. Ribeiro, "Stability of Graph Scattering Transforms," in 33rd Conf. Neural Inform. Process. Syst. Vancouver, BC: Neural Inform. Process. Syst. Foundation, 8-14 Dec. 2019.

F. Gama, A. G. Marques, A. Ribeiro, and G. Leus, "MIMO Graph Filters for Convolutional Networks," in 19th IEEE Int. Workshop Signal Process. Advances in Wireless Commun. Kalamata, Greece: IEEE, 25-28 June 2018, pp. 1–5.

F. Gama, G. Leus, A. G. Marques, and A. Ribeiro, "Convolutional Neural Networks via Node-Varying Graph Filters," in 2018 IEEE Data Sci. Workshop. Lausanne, Switzerland: IEEE, 4-6 June 2018, pp. 220–224.

## Introduction

We consider data supported by an underlying graph with N nodes. We describe the graph in terms of an N x N matrix S that respects the sparsity of the graph. That is, the element (i,j) of matrix S can be nonzero, if and only if, i=j or (j,i) is an edge of the graph. Examples of such matrices are the adjacency matrix, the graph Laplacian, the Markov matrix, and many normalized counterparts. In general, we refer to this matrix S as the graph shift operator (GSO). This code supports extension to a tensor GSO whenever we want to assign a vector weight to each edge, instead of a scalar weight.

To describe the N-dimensional data x as supported by the graph, we assume that each element of x represents the data value at each node, i.e. the i-th element [x]i = xi represents the data value at node i. We thus refer to x as a graph signal. To effectively relate the graph signal x (which is an N-dimensional vector) to the underlying graph support, we use the GSO matrix S. In fact, the linear operation Sx represents an exchange of information with neighbors. When computing Sx, each node interacts with its one-hop neighbors and computes a weighted average of the information in these neighbors. More precisely, if we denote by Ni the set of neighbors of node i, we see that the output of the operation Sx at node i is given by

$[\mathbf{S}\mathbf{x}]_{i} = \sum_{j=1}^{N} [\mathbf{S}]_{ij} [\mathbf{x}]_{j} = \sum_{j \in \ccalN_{i}} s_{ij} x_{j}$

due to the sparsity pattern of the matrix S where the only nonzero elements are those where there is an edge (j,i) connecting the nodes. We note that the use of the GSO allows for a very simple and straightforward way of explicitly relating the information between different nodes, following the support specified by the given graph. We can extend the descriptive power of graph signals by assining an F-dimensional vector to each node, instead of a simple scalar. Each element f of this vector is refered to as feature. Then, the data can be thought of as a collection of F graph signals xf, for each f=1,...,F, where each graph signal xf represents the value of the specific feature f across all nodes. Describing the data as a collection of F graph signals, as opposed to a collection of N vectors of dimension F, allows us to exploit the GSO to easily relate the data with the underlying graph support (as discussed for the case of a scalar graph signal).

A graph neural network is an information processing architecture that regularizes the linear transform of neural networks to take into account the support of the graph. In its most general description, we assume that we have a cascade of L layers, where each layer l takes as input the previous signal, which is a graph signal described by Fl-1 features, and process it through a bank of Fl Fl-1 linear operations that exploit the graph structure S to obtain Fl output features, which are processed by an activation function σl. Namely, for layer l, the output is computed as

$\mathbf{x}_{\ell}^{f} = \sigma_{\ell} \left(\sum_{g=1}^{F_{\ell-1}} \mathbf{H}_{\ell}^{fg}(\mathbf{S}) \mathbf{x}_{\ell-1}^{g} \right)$

where the linear operators Hlfg(S) represent graph filters which are linear transforms that exploit the underlying graph structure (typically, by means of local exchanges only, and access to partial information). There are several choices of graph filters that give rise to different architectures (the most popular choice being the linear shift-invariant graph filters, which give rise to graph convolutions), many of which can be found in the ensuing library. The operation of pooling, and the extension of the activation functions to include local neighborhoods, can also be found in this library.

## Code

The library is written in Python3, drawing heavily from numpy, and with neural network models that are defined and trained within the PyTorch framework.

### Dependencies

The required packages are `os`, `numpy`, `matplotlib`, `pickle`, `datetime`, `scipy.io`, `copy`, `torch`, `scipy`, `math`, and `sklearn`. Additionally, to handle specific datasets listed below, the following are also required `hdf5storage`, `urllib`, `zipfile`, `gzip` and `shutil`; and to handle tensorboard visualization, also include `glob`, `torchvision`, `operator` and `tensorboardX`.

### Datasets

The different datasets involved graph data that are available in this library are the following ones.

1. Authorship attribution dataset, available under `datasets/authorshipData` (note that the available .rar files have to be uncompressed into the `authorshipData.mat` to be able to use that dataset with the provided code). When using this dataset, please cite

S. Segarra, M. Eisen, and A. Ribeiro, "Authorship attribution through function word adjacency networks," IEEE Trans. Signal Process., vol. 63, no. 20, pp. 5464–5478, Oct. 2015.

2. The MovieLens-100k dataset. When using this dataset, please cite

F. M. Harper and J. A. Konstan, "The MovieLens datasets: History and Context", ACM Trans. Interactive Intell. Syst., vol. 5, no. 4, pp. 19:(1-19), Jan. 2016.

3. A source localization dataset. This source localization problem generates synthetic data at execution time. This data can be generated on synthetic graphs such as the Small World graph or the Stochastic Block Model. It can also generate synthetic data, on a real Facebook graph. When using the Facebook graph, please cite

J. McAuley and J. Leskovec, "Learning to discover social circles in Ego networks," in 26th Neural Inform. Process. Syst. Stateline, TX: NeurIPS Foundation, 3-8 Dec. 2012.

4. A flocking dataset. The problem of flocking consists on controlling a robot swarm, initially flying at random, arbitrary velocities, to fly together at the same velocity while avoiding collisions with each other. The task is to do so in a distributed and decentralized manner, where each agent (each robot) can compute its control action at every time instat relying only on information obtained from communications with immediate neighbors. The dataset is synthetic in that it generates different sample trajectories with random initializations. When using this dataset, please cite F. Gama, E. Tolstaya, and A. Ribeiro, "[Graph Neural Networks for Decentralized Controllers](http://arxiv.org/abs/2003.10280)," _arXiv:2003.10280v1 [cs.LG],_ 23 March 2020. ### Libraries The libraries found here are split into two directories: `Modules/` and `Utils/`. * `Modules.architectures` contains the implementation of several standard architectures (as `nn.Module` subclasses) so that they can be readily initialized and trained. Details are provided in the [next section](#architectures). * `Modules.architecturesTime` contains the implementation of several standard architectures (as `nn.Module` subclasses) that handle time-dependent topologies, so that they can be readily initialized and trained. Details are provided in the [next section](#architectures). * `Modules.evaluation` contains functions that act as intermediaries between the model and the data in order to evaluate a trained architecture. * `Modules.loss` contains a wrapper for the loss function so that it can adapt to multiple scenarios. * `Modules.model` defines a `Model` that binds together the three basic elements to construct a machine learning model: the (neural network) architecture, the loss function and the optimizer. Additionally, it assigns a training handler and an evaluator. It assigns a name to the model and a directory where to save the trained parameters of the architecture, as well. It is the basic class that can train and evaluate a model and also offers methods to save and load parameters. * `Modules.training` contains classes that handle the training of each model, acting as an intermediary between the data and the specific architecture within the model being trained. * `Utils.dataTools` loads each of the datasets described [above](#datasets) as classes with several functionalities particular to each dataset. All the data classes do have two methods: `.getSamples` to gather the corresponding samples to training, validation or testing sets, and `.evaluate` that compute the corresponding evaluation measure. * `Utils.graphML` is the main library containing the implementation of all the possible graph neural network layers (as `nn.Module` subclasses). This library is the analogous of the `torch.nn` layer, but for graph-based operations. It contains the definition of the basic layers that need to be put together to build a graph neural network. Details are provided in the [next section](#architectures). * `Utils.graphTools` defines the `Graph` class that handles graph-structure information, and offers several other tools to handle graphs. * `Utils.miscTools` defines some miscellaneous functions. * `Utils.visualTools` contains all the relevant classes and functions to handle visualization in tensorboard. ### Architectures In what follows, we describe several ways of parameterizing the filters _Hlfg(S)_ that are implemented in this library. * ___Convolutional Graph Neural Networks (via Selection)___. The most popular graph neural network (GNN) is that one that parameterizes _Hlfg(S)_ by a linear shift-invariant graph filter, giving rise to a __graph convolution__. The `nn.Module` subclass that implements the graph filter (convolutional) layer can be found in `Utils.graphML.GraphFilter`. This layer is the basic linear layer in the Selection GNN architecture (which also adds the pointwise activation function and the zero-padding pooling operation), which is already implemented in `Modules.architectures.SelectionGNN` and shown in several examples. For more details on this graph convolutional layer or its architecture, and whenever using it, please cite the following paper F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, "[Convolutional Neural Network Architectures for Signals Supported on Graphs](http://ieeexplore.ieee.org/document/8579589)," _IEEE Trans. Signal Process._, vol. 67, no. 4, pp. 1034–1049, Feb. 2019. The `Modules.architectures.SelectionGNN` also has a flag called `coarsening` that allows for the pooling to be done in terms of graph coarsening, following the Graclus algorithm. This part of the code was mainly adapted to PyTorch from this repository. For more details on graph coarsening, and whenever using the `SelectionGNN` with graph coarsening pooling, please cite the following [paper](http://papers.nips.cc/paper/6081-convolutional-neural-networks-on-graphs-with-fast-localized-spectral-filtering.pdf). Also note that by setting the number of filter taps (`nFilterTaps`) to `2` on every layer leads to this [architecture](http://openreview.net/forum?id=SJU4ayYgl). Finally, this other [architecture](https://openreview.net/forum?id=ryGs6iA5Km) is obtained by setting the number of filter taps to `1` for each number of designed fully-connected layers, and then setting it to `2` to complete the corresponding _GIN layer_. There is one further implementation that is entirely local (i.e. it only involves operations exchanging information with one-hop neighbors). This implementation essentially replaces the last fully-connected layer by a readout layer that only operates on the features obtained at the node. The implementation is dubbed `LocalGNN` and is used in the `MovieLens` example. * ___Convolutional Graph Neural Networks (via Spectrum)___. The spectral GNN is an early implementation of the convolutional GNN in the graph frequency domain. It does not scale to large graphs due to the cost of the eigendecomposition of the GSO. The spectral filtering layer is implemented as a `nn.Module` subclass in `Utils.graphML.SpectralGF` and the corresponding architecture with these linear layers, together with pointwise nonlinearities is implemented in `Modules.architectures.SpectralGNN`. For more details on the spectral graph filtering layer or its architecture, and whenever using it, please cite J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, "[Spectral networks and deep locally connected networks on graphs](http://openreview.net/forum?id=DQNsQf-UsoDBa)," in _Int. Conf. Learning Representations 2014_. Banff, AB: Assoc. Comput. Linguistics, 14-16 Apr. 2014, pp. 1–14. * ___Convolutional Graph Neural Networks (via Aggregation)___. An alternative way to implementing a graph convolution is by means of building an aggregation sequence on each node. Instead of thinking of the graph signal as being diffused through the graph and each diffusion being weighed separately (as is the case of a GCNN via Selection), we think of the signal as being aggregated at each node, by means of successive communications with the one-hop neighbors, and each communication is being weighed by a separate filter tap. The key point is that these aggregation sequences exhibit a regular structure that simultaneously take into account the underlying graph support, since each contiguous element in the sequence represents a contiguous neighborhood. Once we have a regular sequence we can go ahead and apply a regular CNN to process its information. This idea is called an Aggregation GNN and is implemented in `Modules.architectures.AggregationGNN`, since it relies on regular convolution and pooling already defined on `torch.nn`. A more sophisticated and powerful variant of the Aggregation GNN, called the __Multi-Node Aggregation GNN__ is also available on `Modules.architectures.MultiNodeAggregationGNN`. For more details on the Aggregation GNN, and whenever using it, please cite the following paper F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, "[Convolutional Neural Network Architectures for Signals Supported on Graphs](http://ieeexplore.ieee.org/document/8579589)," _IEEE Trans. Signal Process._, vol. 67, no. 4, pp. 1034–1049, Feb. 2019. * ___Node-Variant Graph Neural Networks___. Parameterizing _Hlfg(S)_ with a node-variant graph filter (as opposed to a shift-invariant graph filter), a non-convolutional graph neural network architecture can be built. A node-variant graph filter, essentially lets each node learn its own weight for each neighborhood information. In order to allow this architecture to scale (so that the number of learnable parameters does not depend on the size of the graph), we offer a hybrid node-variant GNN approach as well. The graph filtering layer using node-variant graph filters is defined in `Utils.graphML.NodeVariantGF` and an example of an architecture using these filters for the linear operation, combined with pointwise activation functions and zero-padding pooling, is available in `Modules.architectures.NodeVariantGNN`. For more details on node-variant GNNs, and whenever using these filters or architecture, please cite the following paper E. Isufi, F. Gama, and A. Ribeiro, "[EdgeNets: Edge Varying Graph Neural Networks](http://arxiv.org/abs/2001.07620)," submitted to _IEEE Trans. Pattern Analysis and Mach. Intell._ * ___ARMA Graph Neural Networks___. A convolutional architecture that is very flexible and with enlarged descriptive power. It replaces the graph convolution with a FIR filter (i.e. the use of a polynomial of the shift operator) by an ratio of polynomials. This architecture offers a good trade-off between number of paramters and selectivity of learnable filters. The edge-variant graph filter layer can be found in `Utils.graphML.EdgeVariantGF`. An example of an architecture with ARMA graph filters as the linear layer, and pointwise activation functions and zero-padding pooling is available in `Modules.architectures.ARMAfilterGNN`. A `Local` version of this architecture is also available. For more details on ARMA GNNs, and whenever using these filters or architecture, please cite the following paper E. Isufi, F. Gama, and A. Ribeiro, "[EdgeNets: Edge Varying Graph Neural Networks](http://arxiv.org/abs/2001.07620)," submitted to _IEEE Trans. Pattern Analysis and Mach. Intell._ * ___Edge-Variant Graph Neural Networks___. The most general parameterization that we can make of a linear operation that also takes into account the underlying graph support, is to let each node weigh each of their neighbors' information differently. This is achieved by means of an edge-variant graph filter. Certainly, the edge-variant graph filter has a number of parameters that scales with the number of edges, so a hybrid approach is available. The edge-variant graph filter layer can be found in `Utils.graphML.GraphFilterARMA`. An example of an architecture with edge-variant graph filters as the linear layer, and pointwise activation functions and zero-padding pooling is available in `Modules.architectures.EdgeVariantGNN`. A `Local` version of this architecture is also available. For more details on edge-variant GNNs, and whenever using these filters or architecture, please cite the following paper E. Isufi, F. Gama, and A. Ribeiro, "[EdgeNets: Edge Varying Graph Neural Networks](http://arxiv.org/abs/2001.07620)," submitted to _IEEE Trans. Pattern Analysis and Mach. Intell._ * ___Graph Attention Networks___. A particular case of edge-variant graph filters (that predates the use of more general edge-variant filters) and that has been shown to be successful is the graph attention network (commonly known as GAT). The original implementation of GATs can be found in this repository. In this library, we offer a PyTorch adaptation of this code (which was originally written for TensorFlow). The GAT parameterizes the edge-variant graph filter by taking into account both the graph support and the data, yielding an architecture with a number of parameters that is independent of the size of the graph. The graph attentional layer can be found in `Utils.graphML.GraphAttentional`, and an example of this architecture in `Modules.architectures.GraphAttentionNetwork`. For more details on GATs, and whenever using this code, please cite the following paper P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "[Graph Attention Networks](http://openreview.net/forum?id=rJXMpikCZ)," in _6th Int. Conf. Learning Representations_. Vancouver, BC: Assoc. Comput. Linguistics, 30 Apr.-3 May 2018, pp. 1–12. * ___Local Activation Functions___. Local activation functions exploit the irregular neighborhoods that are inherent to arbitrary graphs. Instead of just applying a pointwise (node-wise) activation function, using a local activation function that carries out a nonlinear operation within a neighborhood has been shown to be effective as well. The corresponding architecture is named `LocalActivationGNN` and is available under `Modules/architectures.py`. In particular, in this code, the __median activation function__ is implemented in `Utils.graphML.MedianLocalActivation` and the __max activation function__ is implemented in `Utils.graphML.MaxLocalActivation`. For more details on local activation function, and whenever using these operational layers, please cite the following papers L. Ruiz, F. Gama, A. G. Marques, and A. Ribeiro, "[Invariance-Preserving Localized Activation Functions for Graph Neural Networks](https://ieeexplore.ieee.org/document/8911416)," _IEEE Trans. Signal Process._, vol. 68, no. 1, pp. 127-141, Jan. 2020. * ___Time-Varying Architectures___. The Selection and Aggregation GNNs have a version adapted to handling time-varying graph signals as well as time-varying shift operators, acting with a unit-delay between communication with neighbors. These architectures can be found in `architecturesTime.LocalGNN_DB` and `architecturesTime.AggregationGNN_DB`. For more details on these architectures, please see (and if use, please cite) F. Gama, E. Tolstaya, and A. Ribeiro, "[Graph Neural Networks for Decentralized Controllers](http://arxiv.org/abs/2003.10280)," _arXiv:2003.10280v1 [cs.LG],_ 23 March 2020. E. Tolstaya, F. Gama, J. Paulos, G. Pappas, V. Kumar, and A. Ribeiro, "[Learning Decentralized COntrollers for Robot Swarms with Graph Neural Networks](http://arxiv.org/abs/1903.10527)," in _Conf. Robot Learning 2019._ Osaka, Japan: Int. Found. Robotics Res., 30 Oct.-1 Nov. 2019. * ___Graph Recurrent Neural Networks___. A graph RNN approximates a time-varying graph process with a hidden Markov model, where the hidden state is learned from data. In a graph RNN all linear transforms involved are graph filters that respect the graph. This is a highly flexible architecture that exploits the graph structure as well as the time-dependencies present in data. The architecture can be found in `architecturesTime.GraphRecurrentNN_DB`. For more details please see, and when using this architecture please cite, L. Ruiz, F. Gama, and A. Ribeiro, "[Gated Graph Recurrent Neural Networks](http://arxiv.org/abs/2002.01038)," submitted to _IEEE Trans. Signal Process._ ### Examples We have included an in-depth [tutorial](tutorial.ipynb) `tutorial.ipynb` on a [Jupyter Notebook](http://jupyter.org/). We have also included other examples involving all the four datasets presented [above](#datasets), with examples of all the architectures [just](#architectures) discussed. * [Tutorial](tutorial.ipynb): `tutorial.ipynb`. The tutorial covers the basic mathematical formulation for the graph neural networks, and considers a small synthetic problem of source localization. It implements the Aggregation and Selection GNN (both zero-padding and graph coarsening). This tutorial explain, in-depth, all the elements intervening in the setup, training and evaluation of the models, that serves as skeleton for all the other examples. * [Source Localization](sourceLocGNN.py): `sourceLocGNN.py`. This example deals with the source localization problem on a 100-node, 5-community random-generated SBM graph. It can consider multiple graph and data realizations to account for randomness in data generation. Implementations of Selection and Aggregation GNNs with different node sampling criteria are presented. * [MovieLens](movieGNN.py): `movieGNN.py`. This example has the objective of predicting the rating some user would give to a movie, based on the movies it has ranked before (following the MovieLens-100k dataset). In this case we present a one- and two-layer Selection GNN with no-padding and the one- and two-layer local implementation available at `LocalGNN`. * [Authorship Attribution](authorshipGNN.py): `authorshipGNN.py`. This example addresses the problem of authorship attribution, by which a text has to be assigned to some author according to their styolmetric signature (based on the underlying word adjacency network; details here). In this case, we test different local activation functions (median, max, and pointwise). * [Flocking](flockingGNN.py): `flockingGNN.py`. This is an example of controlling a robot swarm to fly together at the same velocity while avoiding collisions. It is a synthetic dataset where time-dependent architectures can be tested. In particular, we test the use of a linear filter, a Local GNN, an Aggregation GNN and a GRNN, considering, not only samples of the form (S_t, x_t), for each t, but also delayed communications where the information observed from further away neighbors is actually delayed. ## Version * ___0.3 (May 2, 2020):___ Added the time-dependent architectures that handle (graph, graph signal) batch data as well as delayed communications. These architectures can be found in `architecturesTime.py`. A new synthetic dataset has also been added, namely, that used in the Flocking problem. Made the `Model` class to be the central handler of all the machine learning model. Training multiple models has been dropped in favor of training through the method offered in the `Model` class. Trainers and evaluators had to been added to be effective intermediaries between the architectures and the data, especially in problems that are not classification ones (i.e. regression -interpolation- in the movie recommendation setting, and imitation learning in the flocking problem). This should give flexibility to carry over these architectures to new problems, as well as make prototyping easier since training and evaluating has been greatly simplified. Minor modifications and eventual bug fixes have been made here and there. * ___0.2 (Dec 16, 2019):___ Added new architecture: `LocalActivationGNN` and `LocalGNN`. Added new loss module to handle the logic that gives flexibility to the loss function. Moved the ordering from external to the architecture, to internal to it. Added two new methods: `.splitForward()` and `.changeGSO()` to separate the output from the graph layers and the MLP, and to change the GSO from training to test time, respectively. Class `Model` does not keep track of the order anymore. Got rid of `MATLAB(R)` support. Better memory management (do not move the entire dataset to memory, only the batch). Created methods to normalize dat aand change data type. Deleted the 20News dataset which is not supported anymore. Added the method `.expandDims()` to the `data` for increased flexibility. Changed the evaluate method so that it is always a decreasing function. Totally revamped the `MovieLens` class. Corrected a bug on the `computeNeighborhood()` function (thanks to Bianca Iancu, A (dot) Iancu-1 (at) student (dot) tudelft (dot) nl and Gabriele Mazzola, G (dot) Mazzola (at) student (dot) tudelft (dot) nl for spotting it). Corrected bugs on device handling of local activation functions. Updated tutorial. * ___0.1 (Jul 12, 2019):___ First released (beta) version of this graph neural network library. Includes the basic convolutional graph neural networks (selection -zero-padding and graph coarsening-, spectral, aggregation), and some non-convolutional graph neural networks as well (node-variant, edge-variant and graph attention networks). It also inlcudes local activation functions (max and median). In terms of examples, it considers the source localization problem (both in the tutorial and in a separate example), the movie recommendation problem, the authorship attribution problem and the text categorization problems. In terms of structure, it sets the basis for data handling and training of multiple models.