TagRec

Towards A Standardized Tag Recommender Benchmarking Framework

Description

The aim of TagRec (please cite) is to provide the community with a simple to use, generic tag-recommender framework written in Java to evaluate novel tag-recommender algorithms with a set of well-known std. IR metrics such as nDCG, MAP, MRR, Precision ([email protected]), Recall ([email protected]), F1-score ([email protected]), Diversity (D), Serendipity (S), User Coverage (UC) and folksonomy datasets such as BibSonomy, CiteULike, LastFM, Flickr, MovieLens or Delicious and to benchmark the developed approaches against state-of-the-art tag-recommender algorithms such as MP, MP_r, MP_u, MP_u,r, CF, APR, FR, GIRP, GIRPTM, etc.

Furthermore, it contains algorithms to process datasets (e.g., p-core pruning, leave-one-out or 80/20 splitting, LDA topic creation and create input files for other recommender frameworks).

The software already contains four novel tag-recommender approaches based on cognitive science theory. The first one (3Layers) (Seitlinger et al, 2013) uses topic information and is based on the ALCOVE/MINERVA2 theories (Krutschke, 1992; Hintzman, 1984). The second one (BLL+C) (Kowald et al., 2014b) uses time information is based on the ACT-R theory (Anderson et al., 2004). The third one (3LT) (Kowald et al., 2015b) is a combination of the former two approaches and integrates the time component on the level of tags and topics. Finally, the fourth one (BLLac+MPr) extends the BLL+C algorithm with semantic correlations (Kowald et al., 2015a).

Apart from this, TagRec also contains algorithms for the personalized recommendation of resources / items in social tagging systems. In this respect TagRec includes a novel algorithm called CIRTT (Lacic et al., 2014) that integrates tag and time information using the BLL-equation coming from the ACT-R theory (Anderson et al, 2004). Furthermore, it contains another novel item-recommender called SUSTAIN+CFu (Seitlinger et al., 2015) that improves user-based CF via integrating the addentional focus of users via the SUSTAIN model (Love et al., 2004).

TagRec was also utilized for the recommendation of hashtags in Twitter (Kowald et al., 2017). Thus, it contains an initial set of algorithms for this task as well. For this, TagRec also contains a connection to the Apache Solr search engine framework. Finally, we also utilized TagRec for predicting and recommendation music preferences. Here, all tag recommendation algorithms can also be used for predicting music artists and genres.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details. You should have received a copy of the GNU Affero General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Please cite the paper if you use this software in one of your publications.

Download

The source-code can be directly checked-out through this repository. It contains an Eclipse project to edit and build it and an already deployed .jar file for direct execution. Furthermore, the folder structure that is provided in the repository is needed, where csv is the input directory and metrics is the output directory in the data folder. Both of these directories contain subdirectories for the different datasets:

How-to-use

The tagrec .jar uses three parameters:

First the algorithm,

Tag-Recommender:

Resource-Recommender:

Hashtag-Recommender (Kowald et al., 2017):

Data-Processing:

, second the dataset(-directory):

and third the filename (without file extension)

Example: java -jar tagrec.jar bll_c bib bib_sample

Input format

The input-files have to be placed in the corresponding subdirectory and are in csv-format (file extension: .txt) with 5 columns (quotation marks are mandatory):

Example: "0";"13830";"986470059";"deri,web2.0,tutorial,www,conference";""

There are three files needed:

Example: bib_sample_train.txt, bib_sample_test.txt, bib_sample.txt (combination of train and test file)

Output format

The output-file is generated in the corresponding subdirectory and is in csv-format with the following columns:

for k = 1 to 10 (or 20) - each line is one k

Example: 0,5212146123336273;0,16408544726301685;0,22663857529082376 ...

Citation

Kowald, D., Kopeinik, S., & Lex, E. (2017). The TagRec Framework as a Toolkit for the Development of Tag-Based Recommender Systems. In Adjunct Publication of the 25th Conference on User Modeling, Adapation and Personalization (UMAP'2017). ACM.

Bibtex: @inproceedings{kowaldumap2017, author = {Kowald, Dominik and Kopeinik, Simone and Lex, Elisabeth}, title = {The TagRec Framework As a Toolkit for the Development of Tag-Based Recommender Systems}, booktitle = {Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization}, series = {UMAP '17}, year = {2017}, isbn = {978-1-4503-5067-9}, location = {Bratislava, Slovakia}, pages = {23--28}, numpages = {6}, url = {http://doi.acm.org/10.1145/3099023.3099069}, doi = {10.1145/3099023.3099069}, acmid = {3099069}, publisher = {ACM}, address = {New York, NY, USA}, keywords = {hashtag recommendation, recommendation evaluation, recommender framework, recommender systems, tag recommendation} }

Publications

References

Main contact and contributor

Contacts and contributors