BootEA

Source code and datasets for IJCAI-2018 paper "Bootstrapping Entity Alignment with Knowledge Graph Embedding".

Dataset

We use two datasets, DBP15K and DWY100K. DBP15K can be found here, while DWY100K is described below.

Id files of DWY100K

Folder "dataset/DWY100K/" contains the id files of DWY100K.

The subfolder "mapping/0_3" contains the id files used by BootEA and MTransE, while the subfolder "sharing/0_3" is for JAPE and IPTransE. The two datasets use 30% of the reference entity alignment as seeds. The id files in "sharing/0_3" are generated following the idea of parameter sharing, which lets the two aligned entities of each seed pair share the same id; the id files in "mapping/0_3" do not share ids.
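The difference between the two id schemes can be illustrated with a minimal sketch. This is not the script used to generate the id files; the function name `assign_ids`, its parameters, and the toy entity names are hypothetical and chosen only for illustration.

```python
def assign_ids(entities_1, entities_2, seed_pairs, share_seed_ids):
    """Assign integer ids to the entities of two knowledge graphs.

    Hypothetical helper: with share_seed_ids=True, the two entities of each
    seed pair receive the same id (the "sharing" idea); with False, every
    entity receives its own id (the "mapping" idea).
    """
    ids_1, ids_2 = {}, {}
    next_id = 0
    # Entities of the first graph always get fresh, consecutive ids.
    for ent in entities_1:
        ids_1[ent] = next_id
        next_id += 1
    # Map each seed entity of graph 2 to its counterpart in graph 1.
    seed_of = {e2: e1 for e1, e2 in seed_pairs} if share_seed_ids else {}
    for ent in entities_2:
        if ent in seed_of:
            # Sharing: reuse the id of the aligned entity in graph 1.
            ids_2[ent] = ids_1[seed_of[ent]]
        else:
            ids_2[ent] = next_id
            next_id += 1
    return ids_1, ids_2


# Toy example with one seed pair (a1, b1).
kg1 = ["a1", "a2", "a3"]
kg2 = ["b1", "b2", "b3"]
seeds = [("a1", "b1")]

shared_1, shared_2 = assign_ids(kg1, kg2, seeds, share_seed_ids=True)
mapped_1, mapped_2 = assign_ids(kg1, kg2, seeds, share_seed_ids=False)
```

With sharing, "a1" and "b1" end up with the same id, so a model sees them as a single entity. With mapping, they keep distinct ids and the seed alignment has to be supplied to the model separately.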

The subfolder "mapping/0_3" includes the following files:

The subfolder "sharing/0_3" includes the following additional files:

Raw data of DWY100K

File "dataset/DWY100K_raw_data.zip" is the raw data of DWY100K, where each entity, relation or attribute is represented by a URI. Each dataset has the following files:

Code

Folder "code" contains all the code of BootEA, in which:

Dependencies

If you fail to install Graph-tool, we suggest setting "self.heuristic = False" in param.py, which lets BootEA run with igraph instead of Graph-tool. If you also have trouble installing igraph, you can use NetworkX by modifying lines 186-189 of train_bp.py, replacing "mwgm_graph_tool" and "mwgm_igraph" with "mwgm_networkx". Note that igraph and NetworkX are much slower than Graph-tool!
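To see which of the three libraries is available before editing the code, a small check like the following may help. It is only a sketch and not part of BootEA: the helper names `is_installed` and `pick_matching_backend` are hypothetical, and pairing each function name with the import names `graph_tool`, `igraph` and `networkx` is an assumption based on the function names mentioned above.

```python
import importlib.util

# Preference order from the note above: Graph-tool first, then igraph,
# then NetworkX. Each entry pairs a Python import name with the matching
# function name mentioned in this README (the pairing is an assumption).
BACKENDS = [
    ("graph_tool", "mwgm_graph_tool"),
    ("igraph", "mwgm_igraph"),
    ("networkx", "mwgm_networkx"),
]


def is_installed(module_name):
    """Return True if the top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_matching_backend(installed=is_installed):
    """Return the name of the first usable matching function, or None."""
    for module_name, function_name in BACKENDS:
        if installed(module_name):
            return function_name
    return None


if __name__ == "__main__":
    print("Suggested matching function:", pick_matching_backend())
```

The `installed` argument can be replaced with a custom predicate, which makes it possible to try the selection logic without any of the libraries installed.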

If you have any difficulties or questions about running the code or reproducing the experimental results, please email zqsun.nju@gmail.com and whu@nju.edu.cn.

Citation

If you use this model or code, please cite it as follows:

@inproceedings{BootEA,
  author    = {Zequn Sun and Wei Hu and Qingheng Zhang and Yuzhong Qu},
  title     = {Bootstrapping Entity Alignment with Knowledge Graph Embedding},
  booktitle = {IJCAI},
  pages     = {4396--4402},
  year      = {2018}
}

Links

The following links point to some recent work that uses our datasets: