SemEval2019/Task4 Team Bertha-von-Suttner submission

This is the code for SemEval 2019 Task 4, Hyperpartisan News Detection, submitted by team Bertha von Suttner.

All team members belong to the GATE team of the University of Sheffield Natural Language Processing group.

The model created with this code was the winning entry; see the public leaderboard (sort by the accuracy column, descending): https://www.tira.io/task/hyperpartisan-news-detection/dataset/pan19-hyperpartisan-news-detection-by-article-test-dataset-2018-12-07/

An article on the GATE blog briefly describes the approach taken.

If you wish to see the code as it was prepared for the SemEval 2019 task, then refer to the semval-2019 tag in the git repo.

Preparation / Requirements

Once spaCy is installed, you also need to install its en_core_web_sm model, like this:

python -m spacy download en_core_web_sm

Once NLTK is installed, you also need to install its stopwords data:

python -m nltk.downloader stopwords
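
To confirm that both downloads succeeded before going further, a small check along the following lines may help. This is only a sketch and not part of the repository: the function names `missing_modules` and `stopwords_available` are made up for illustration. It assumes the spaCy model was installed as an importable package named `en_core_web_sm` (which is what `spacy download` normally does) and that the NLTK stopwords live under the resource path `corpora/stopwords`.

```python
import importlib.util


def missing_modules(names=("spacy", "en_core_web_sm", "nltk")):
    """Return the subset of `names` that cannot be found as importable modules."""
    # find_spec only looks the module up; it does not import it.
    return [name for name in names if importlib.util.find_spec(name) is None]


def stopwords_available():
    """Return True if NLTK and its stopwords corpus are both installed."""
    if importlib.util.find_spec("nltk") is None:
        return False
    import nltk

    try:
        # Raises LookupError when the resource has not been downloaded.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        return False
    return True


if __name__ == "__main__":
    print("missing modules:", missing_modules())
    print("NLTK stopwords available:", stopwords_available())
```

If the first line prints an empty list and the second prints `True`, the two commands above have done their job; otherwise rerun the corresponding download command.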

Preparation steps:

Training

Run the following steps

Application