UKPLab/coling2018_fake-news-challenge

Repository of the COLING 2018 paper: A Retrospective Analysis of the Fake News Challenge Stance Detection Task

BibTeX:

@inproceedings{tubiblio105434,
    month = {June},
    year = {2018},
    booktitle = {Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)},
    title = {A Retrospective Analysis of the Fake News Challenge Stance-Detection Task},
    author = {Andreas Hanselowski and Avinesh P.V.S. and Benjamin Schiller and Felix Caspelherr and Debanjan Chaudhuri and Christian M. Meyer and Iryna Gurevych},
    url = {http://tubiblio.ulb.tu-darmstadt.de/105434/}
}

Introduction

The repository was originally developed as part of the Fake News Challenge Stage 1 (FNC-1, http://www.fakenewschallenge.org/) by team Athene: Andreas Hanselowski, Avinesh PVS, Benjamin Schiller and Felix Caspelherr. In this project, we worked in close collaboration with Debanjan Chaudhuri.

Prof. Dr. Iryna Gurevych, AIPHES-Ubiquitous Knowledge Processing (UKP) Lab, TU-Darmstadt, Germany

It was further developed and enhanced by Felix Caspelherr in the scope of his master's thesis. The code was additionally modified and extended for the submission to the 27th International Conference on Computational Linguistics (COLING 2018): "A Retrospective Analysis of the Fake News Challenge Stance Detection Task".

Requirements

Software dependencies
```
python >= 3.4 (tested with 3.4)
```

Installation

Install the required Python packages:
```
python3.4 -m pip install -r requirements.txt --upgrade
python3.4 -m pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0rc0-cp34-cp34m-linux_x86_64.whl
```
Alternatively, you can set up an Anaconda environment based on "anaconda_env_FNC_challenge.yml" by executing the following command in the fnc folder:
```
conda env create -f anaconda_env_FNC_challenge.yml
```
(Note: If you use a higher CUDA version, you might have to use a newer version of tensorflow.)

Parts of the Natural Language Toolkit (NLTK) might need to be installed manually.

python3.4 -c "import nltk; nltk.download('stopwords'); nltk.download('punkt'); nltk.download('wordnet'); nltk.download('cmudict');"

Copy the Word2Vec file GoogleNews-vectors-negative300.bin.gz into the folder
```
[project_name]/data/embeddings/google_news/ 
```
(folders have to be created)

Download the Paraphrase Database: Lexical XL Paraphrases 1.0 and extract it to the ppdb folder.
```
gunzip -c ppdb-1.0-xl-lexical.gz > [project_name]/data/ppdb/ppdb-1.0-xl-lexical
```
(folders have to be created)
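
Since both data folders have to be created by hand, a small helper can do it in one go. The following is only a sketch: the function name `prepare_data_dirs` and the `project_root` argument are illustrative and not part of the repository; the two relative paths are the ones named above.

```python
import os

# Sub-folders named in the installation steps, relative to the project root.
DATA_DIRS = ("data/embeddings/google_news", "data/ppdb")


def prepare_data_dirs(project_root):
    """Create the data folders if they are missing and return their paths.

    `project_root` is a placeholder for wherever the repository was cloned.
    """
    paths = []
    for rel in DATA_DIRS:
        path = os.path.join(project_root, *rel.split("/"))
        # exist_ok=True makes the call safe to repeat.
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `prepare_data_dirs("[project_name]")` with the real project path creates both folders; the embedding and PPDB files still have to be copied into them manually.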

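To use the Stanford parser, a server instance has to be running in parallel. Download Stanford CoreNLP, extract it anywhere, and execute the following commands (the server has to be started from inside the extracted folder):

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip
cd stanford-corenlp-full-2016-10-31
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9020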
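
Before starting a long feature-extraction run, it can help to verify that the server is actually reachable. The sketch below only checks that something accepts TCP connections on the port; it does not confirm that the listener is CoreNLP. The function name and the default host `localhost` are assumptions; port 9020 is the one used in the command above.

```python
import socket


def corenlp_server_is_up(host="localhost", port=9020, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError if the connection is refused or times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `corenlp_server_is_up()` returns False, the server is probably not running or is listening on a different port.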

In order to reproduce the results of the experiments reported in our COLING paper "A Retrospective Analysis of the Fake News Challenge Stance Detection Task", please modify fnc/settings.py to match the desired experiment. In its current state, the model "voting_mlps_hard" will be trained and tested on the FNC corpus. All combinations of models and features used are listed in the settings file in _featurelist.
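
The name "voting_mlps_hard" suggests an ensemble of MLPs combined by hard voting. As a rough illustration only, and assuming that hard voting here means a per-sample majority vote over the labels predicted by each model, the idea can be sketched as follows. The function `hard_vote` is hypothetical and is not taken from the repository.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Majority vote per sample over the label lists of several models.

    `predictions_per_model` is a list with one label list per model; all
    lists must have the same length. Ties are not handled specially.
    """
    voted = []
    for labels in zip(*predictions_per_model):
        # Pick the label predicted by the largest number of models.
        winner, _count = Counter(labels).most_common(1)[0]
        voted.append(winner)
    return voted


# Three hypothetical models predicting stances for three samples.
model_a = ["agree", "unrelated", "discuss"]
model_b = ["agree", "discuss", "discuss"]
model_c = ["disagree", "unrelated", "agree"]
print(hard_vote([model_a, model_b, model_c]))
```

With an odd number of models and a clear majority, the result does not depend on how ties are broken.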

Additional notes

Setup tested on Anaconda3 (TensorFlow 0.9 GPU version). Be sure that the CUDA library is set up correctly.

conda create -n env_python3.4 python=3.4 anaconda
source activate env_python3.4
python3.4 -m pip install -r requirements.txt --upgrade
python3.4 -m pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0rc0-cp34-cp34m-linux_x86_64.whl

To Run

To run, execute the following steps:

1. Make sure that an instance of the Stanford CoreNLP server is running (see Installation above).
2. Run "python pipeline.py -p ftrain".
   --> The features will be created and saved in the corresponding folder fnc/data/fnc-1/features/[selected corpus]/
   --> The model will be saved in fnc/data/fnc-1/mlp_models/
3. Note down the name of the trained model and update settings.py to match it.
4. Run "python pipeline.py -p ftest" to obtain the results (FNC score, F1 scores and confusion matrix).
   --> The result scores will be appended to fnc/fnc_results.txt
   --> The labeled test file will be saved to fnc/data/fnc-1/fnc_results/
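
For readers unfamiliar with the FNC score reported by the test run, the following sketch shows the weighting scheme as we understand it from the official FNC-1 scorer: 0.25 for getting the related/unrelated decision right, plus 0.75 for the exact stance of a related pair. The function names are illustrative, and the implementation in this repository may differ in detail.

```python
RELATED = {"agree", "disagree", "discuss"}


def fnc_score(gold, predicted):
    """Weighted FNC-1 style score for two equally long label lists."""
    score = 0.0
    for g, p in zip(gold, predicted):
        if g == p:
            score += 0.25  # exact match
            if g != "unrelated":
                score += 0.50  # bonus for the correct related stance
        if g in RELATED and p in RELATED:
            score += 0.25  # both labels fall into the related group
    return score


def relative_fnc_score(gold, predicted):
    """Score divided by the best achievable score for the gold labels."""
    best = fnc_score(gold, gold)
    return fnc_score(gold, predicted) / best if best else 0.0
```

A correctly classified unrelated pair therefore earns 0.25, a correctly classified related pair earns 1.0, and a related pair assigned the wrong related stance earns 0.25.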

Note: The "stanford features" and topic features may take several hours to be computed.

For more details

python pipeline.py --help         

    e.g.: python pipeline.py -p crossv holdout ftrain ftest

    * crossv: runs 10-fold cross validation on train / validation set and prints the results
    * holdout: trains classifier on train and validation set, tests it on holdout set and prints the results
    * ftrain: trains classifier on train/validation/holdout set and saves it to fnc/data/fnc-1/mlp_models/
    * ftest: predicts stances of unlabeled test set based on the model
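
To illustrate what the crossv mode does conceptually, here is a minimal sketch of a 10-fold split over sample indices. It uses plain contiguous folds; this is an assumption for illustration, and the pipeline may build its folds differently (for example, grouped by article). The helper name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop
```

Each sample appears in exactly one validation fold, and the classifier would be trained on the remaining nine folds in each round.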

System description

A more detailed description of the system, including the features that have been used, can be found in the document system_description_athene.pdf.