Text Categorization

This repository contains the source code and helper files for my undergraduate thesis, "Graph Convolutional Neural Networks for Text Categorization", written under the supervision of Prof. Xavier Bresson at Nanyang Technological University, Singapore.

This repository implements three benchmark models and three deep learning models for text classification, spread across five scripts:

  1. baseline.py: Linear SVC & Multinomial Naive Bayes
  2. mlp.py: Multilayer Perceptron
  3. cnn_fchollet.py: F. Chollet CNN (based on this 2016 blog post)
  4. cnn_ykim.py: Y. Kim CNN (based on Y. Kim, 2014)
  5. graph_cnn.py: Graph CNN (based on M. Defferrard et al., 2017)

The above models were tested on three datasets: the Rotten Tomatoes Sentence Polarity Dataset, 20 Newsgroups, and RCV1. The code used to preprocess the datasets can be found here, and the performance of the models on these datasets can be found here.
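
To give a rough idea of what the simplest benchmark in the list does, here is a minimal, dependency-free sketch of Multinomial Naive Bayes with Laplace smoothing. It is not the code from baseline.py. The function names (`train_multinomial_nb`, `predict`), the whitespace tokenizer, and the toy sentences are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized text.

    Hypothetical helper for illustration; alpha is the Laplace smoothing term.
    """
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"vocab": vocab, "log_prior": {}, "log_likelihood": {}}
    total_docs = len(docs)
    for label, n_docs in class_doc_counts.items():
        # Class prior: fraction of training documents with this label.
        model["log_prior"][label] = math.log(n_docs / total_docs)
        # Smoothed per-class word probabilities over the shared vocabulary.
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return model


def predict(model, text):
    """Return the label with the highest log-posterior; unseen words are skipped."""
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    scores = {
        label: prior + sum(model["log_likelihood"][label][t] for t in tokens)
        for label, prior in model["log_prior"].items()
    }
    return max(scores, key=scores.get)


# Toy sentences invented for this sketch (not taken from any dataset above).
train_docs = [
    "a moving and beautiful film",
    "wonderful acting and a great story",
    "a dull and boring plot",
    "terrible acting and a boring film",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = train_multinomial_nb(train_docs, train_labels)
prediction_pos = predict(model, "a great and beautiful story")
prediction_neg = predict(model, "boring and terrible")
```

On real data, a library implementation with proper tokenization and TF-IDF or count features would be the more practical choice; the sketch only shows the counting and smoothing logic behind the model.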