Smatch is an evaluation tool for
AMR (Abstract Meaning Representation). It
computes the Smatch score (defined below) of two AMR graphs in terms
of their matching triples (edges) by finding a variable (node) mapping
that maximizes the count, `M`, of matching triples, then:

- `M` is the number of matching triples
- `T` is the total number of triples in the first AMR
- `G` is the total number of triples in the second AMR
- Precision is defined as `P = M/T`
- Recall is defined as `R = M/G`
- The Smatch score is the F-score: `F = 2 * (P*R)/(P+R)`

For more information, see Cai and Knight, 2013.
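
As a quick illustration of the definitions above, here is a minimal
sketch that computes the score from the three counts; the counts in
the example are made up:

```python
def smatch_score(m, t, g):
    """Compute precision, recall, and Smatch F-score from triple counts.

    m: matching triples under the best variable mapping
    t: total triples in the first (test) AMR
    g: total triples in the second (gold) AMR
    """
    precision = m / t
    recall = m / g
    # Guard against the degenerate case of no matching triples.
    if m == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * (precision * recall) / (precision + recall)

# Hypothetical counts: 7 matching triples, 10 triples in the test AMR,
# 9 triples in the gold AMR.
print(smatch_score(7, 10, 9))  # (0.7, 0.7777..., 0.7368...)
```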

This Smatch implementation is tested for Python 3.5 or higher. It is
released on PyPI so you can install it with `pip`:

`$ pip install smatch`

You can also clone this repository and run the `smatch.py` script
directly as it does not need to be installed to be used.
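
If you want to use Smatch from Python rather than the command line,
the module exposes the helpers that the script itself uses. These are
internal functions of `smatch.py`, not a documented stable API, so the
names and signatures below are an assumption to verify against your
installed version:

```python
import smatch

# Two single-line AMRs; the sentences are illustrative only.
test_amr = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))"
gold_amr = "(g / go-02 :ARG0 (b / boy))"

# get_amr_match() is assumed to return the counts M, T, and G defined
# above; compute_f() turns them into precision, recall, and F-score.
m, t, g = smatch.get_amr_match(test_amr, gold_amr)
precision, recall, f_score = smatch.compute_f(m, t, g)
print(f"P={precision:.3f} R={recall:.3f} F={f_score:.3f}")
```

Note that the search for the best mapping is a randomized restart
procedure, so scores can vary slightly between runs on larger graphs.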

To use the script, run it with at least the `-f` option, which takes
two filename arguments:

`$ smatch.py -f test.amr gold.amr`

Note that the order of these arguments does not matter for the Smatch
score as the F-score is symmetric, but swapping the arguments will
swap the precision and recall. The files contain AMRs separated by a
blank line, with comment lines starting with `#` (see
`test_input1.txt` for an example).
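
For instance, a file with two AMRs might look like this; the sentences
and metadata here are illustrative, not copied from `test_input1.txt`
(the `::id`/`::snt` lines are a common AMR corpus convention, and any
line starting with `#` is treated as a comment):

```
# ::id 1
# ::snt The boy wants to go.
(w / want-01
      :ARG0 (b / boy)
      :ARG1 (g / go-02
            :ARG0 b))

# ::id 2
# ::snt The dog barked.
(b / bark-01
      :ARG0 (d / dog))
```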

For other options, try `smatch.py --help`.

If you use Smatch in your work, please cite:

```
@inproceedings{cai-knight-2013-smatch,
title = "{S}match: an Evaluation Metric for Semantic Feature Structures",
author = "Cai, Shu and Knight, Kevin",
booktitle = "Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
month = aug,
year = "2013",
address = "Sofia, Bulgaria",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P13-2131",
pages = "748--752",
}
```

You can help make your research reproducible by including the following information in your writing:

- The software version (e.g., repository URL and version number)
- The number of restarts (`-r`) used, even if unchanged from the default
- The order of the arguments to `-f` (if reporting precision and recall)
- Any other options or preprocessing steps
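
For example, a fully specified run might look like this (the filenames
are placeholders; check `smatch.py --help` for your version's default
restart count):

`$ smatch.py -r 4 -f system.amr gold.amr`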

The code was mostly developed during 2012 and 2013, and has undergone many fixes and updates. Note that the versions distributed for SemEval-2016 were numbered 2.0–2.0.2, but these predate this repository and the 1.0 series on PyPI. For more details, see the Changelog.

Here are some notable forks of Smatch:

- didzis/pSMATCH adds parallelization for speed
- isi-nlp/smatch adds an ILP solver for getting optimal variable mappings
- cfmrp/mtool packages the version of Smatch used for the MRP workshop at CoNLL 2019

And here are other evaluation metrics for AMR:

- mdtux89/amr-evaluation offers a set of metrics based on Smatch for fine-grained evaluation
- freesunshine0316/sembleu is inspired by BLEU and puts more weight on "content" than graph-structure similarity
- rafaelanchieta/sema weights error types differently and does not consider which node is the graph's top