Experiments Codes for Bi-directional Block Self-attention

News: A time- and memory-efficient self-attention mechanism named Fast-DiSA has been proposed. It is as fast as multi-head self-attention while using the multi-dim and positional-mask techniques. The code is released here

Re-Implementation

Cite this paper using BibTeX:

@inproceedings{shen2018biblosan,
Author = {Shen, Tao and Zhou, Tianyi and Long, Guodong and Jiang, Jing and Zhang, Chengqi},
Booktitle = {International Conference on Learning Representations (ICLR)},
Title = {Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling},
Year = {2018}
}

Overall Requirements

This repo includes the following parts:

Usage of The Universal Interface for Context Fusion and Sentence Encoding

The code is stored in the context_fusion directory of this repo; to use it, simply import the functions from the context_fusion package:

from context_fusion.interface import context_fusion_layers, sentence_encoding_models

These two functions share similar parameter definitions. The available methods are:

| Method Str | Explanation | Context Fusion | Sentence Encoding |
| --- | --- | --- | --- |
| `cnn_kim` | CNN from Yoon Kim (sentence encoding only) | F | T |
| `no_ct` | No context | F | T |
| `lstm` | Bi-LSTM | T | T |
| `gru` | Bi-GRU | T | T |
| `sru` | Bi-SRU (Simple Recurrent Unit) | T | T |
| `multi_cnn` | Multi-window CNN with context info added | T | T |
| `hrchy_cnn` | Multi-layer CNN with resConnect + GLU | T | T |
| `multi_head` | Multi-head attention with attention dropout and positional encoding | T | T |
| `disa` | Directional Self-Attention (DiSA) | T | T |
| `block` | Bi-directional Block Self-Attention (Bi-BloSA) | T | T |
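To make the `block` method concrete, below is a minimal, dependency-free Python sketch of the block self-attention idea: the sequence is split into blocks, self-attention runs inside each block (local context), and block-level summaries attend to each other (long-range context). This is an illustration only, with simplifications of ours (mean-pooled block summaries, plain concatenation of local and global context, no masks or gates); it is not the repo's TensorFlow implementation, and all names here are invented.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(queries, keys, values):
    """Plain scaled dot-product attention over short lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

def block_self_attention(seq, block_len):
    """Split the sequence into blocks, run attention inside each block,
    then let mean-pooled block summaries attend to each other.
    Each token's output is its local result concatenated with its
    block's global context vector."""
    blocks = [seq[i:i + block_len] for i in range(0, len(seq), block_len)]
    intra = [attend(b, b, b) for b in blocks]            # local context
    summaries = [[sum(col) / len(b) for col in zip(*b)]  # mean-pool per block
                 for b in intra]
    inter = attend(summaries, summaries, summaries)      # long-range context
    out = []
    for b_out, ctx in zip(intra, inter):
        for tok in b_out:
            out.append(tok + ctx)                        # concat local + global
    return out
```

Because attention is computed only within each block and among a much shorter list of block summaries, the quadratic attention cost applies to short spans rather than the whole sequence, which is the source of the memory savings the paper reports.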

Experiments Codes for Paper

Project Directories:

  1. Directory exp_SNLI --- Python project for the Stanford Natural Language Inference dataset
  2. Directory exp_SICK --- Python project for the Sentences Involving Compositional Knowledge dataset
  3. Directory exp_SQuAD_sim --- Python project for the simplified Stanford Question Answering dataset
  4. Directory exp_SST --- Python project for the fine-grained and binary Stanford Sentiment Treebank dataset
  5. Directory exp_TREC --- Python project for the TREC question-type classification dataset
  6. Directory exp_SC --- Python project for three sentence classification benchmarks: Customer Review, MPQA, and SUBJ

Shared Python Parameters to Run the Experiments Codes

Here, we introduce the shared parameters that appear in all these benchmark projects:

Programming Framework for all Experiments Codes

We first demonstrate the file directory tree of all these projects:

ROOT
--dataset[d]
----glove[d]
----$task_dataset_name$[d]
--src[d]
----model[d]
------template.py[f]
------context_fusion.py[f]
----nn_utils[d]
----utils[d]
------file.py[f]
------nlp.py[f]
------record_log.py[f]
------time_counter.py[f]
----dataset.py[f]
----evaluator.py[f]
----graph_handler.py[f]
----perform_recorder.py[f]
--result[d]
----processed_data[d]
----model[d]
------$model_specific_dir$[d]
--------ckpt[d]
--------log_files[d]
--------summary[d]
--------answer[d]
--configs.py[f]
--$task$_main.py[f]
--$task$_log_analysis.py[f]

Note: The result dir will appear after the first run.

We elaborate on every file [f] and directory [d] as follows:

./configs.py: performs parameter parsing and the definition and declaration of global variables: parameter definitions and default values; name definitions (for train/dev/test data, model, processed data, ckpt, etc.); and directory definitions (data, result, $model_specific_dir$, etc.) with generation of the corresponding paths.
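As a rough sketch of what such a configs.py typically does, the snippet below parses parameters and then derives names and directory paths from them. The flag names, defaults, and derived names here are hypothetical illustrations of ours, not the repo's actual parameters.

```python
import argparse
import os

def build_config(argv=None):
    """Parse parameters, then derive model names and result paths from them.
    All flag names below are illustrative, not the repo's actual flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--network_type', type=str, default='block',
                        help='context fusion method, e.g. block / disa / lstm')
    parser.add_argument('--dropout', type=float, default=0.7)
    parser.add_argument('--model_dir_suffix', type=str, default='')
    cfg = parser.parse_args(argv)

    # name definitions derived from the parameters
    cfg.model_name = 'model_%s%s' % (cfg.network_type, cfg.model_dir_suffix)

    # directory definitions and corresponding path generation,
    # mirroring the result/ tree shown above
    cfg.result_dir = 'result'
    cfg.processed_data_dir = os.path.join(cfg.result_dir, 'processed_data')
    cfg.model_specific_dir = os.path.join(cfg.result_dir, 'model', cfg.model_name)
    cfg.ckpt_dir = os.path.join(cfg.model_specific_dir, 'ckpt')
    cfg.log_dir = os.path.join(cfg.model_specific_dir, 'log_files')
    return cfg
```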

./$task$_main.py: the main entry Python script to run the project.

./$task$_log_analysis.py: provides a function to analyze the log file of the training process.
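A toy sketch of what such a log-analysis helper might look like: scan the training log for dev-accuracy lines and report the best step. The log-line format used here is invented for illustration and does not match the repo's actual log format.

```python
import re

def max_dev_accuracy(log_text):
    """Return (best_step, best_accuracy) found in a training log.
    Assumes hypothetical lines like 'step 200 dev accuracy: 0.85'."""
    pattern = re.compile(r'step (\d+).*dev accuracy: ([0-9.]+)')
    best_step, best_acc = None, -1.0
    for line in log_text.splitlines():
        m = pattern.search(line)
        if m and float(m.group(2)) > best_acc:
            best_step, best_acc = int(m.group(1)), float(m.group(2))
    return best_step, best_acc
```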

./dataset/: the directory containing the datasets for the current project.

./src/: the directory containing the Python source scripts.

./result/: the directory where all results are placed.

Usage of the Experiments Projects

Python Package Requirements

Running a Project

git clone https://github.com/code4review/BiBloSA
cd $project_dir$

Then, following the parameter introduction and programming framework above, refer to the README.md in $project_dir$ for data preparation, data processing, and network training.

Note that because this repo includes many projects, some mistakes may inevitably have been introduced while organizing them into this repo. If you encounter bugs or errors when running the code, please feel free to report them by opening an issue; I will reply as soon as possible.

TODO

Acknowledgments