MYRA

MYRA is a collection of Ant Colony Optimization (ACO) algorithms for classification and regression tasks in data mining. It includes popular rule induction and decision tree induction algorithms. The algorithms are ready to be used from the command line or can be easily called from your own Java code. They are implemented using a modular architecture, so they can be easily extended to incorporate different procedures and/or use different parameter values.

This repository contains a complete rewrite of the code (by the same author) from the MYRA project hosted at SourceForge. Computational time has been significantly improved: tasks that used to take minutes now finish in seconds, although it was not possible to maintain backward compatibility. You will find that the overall architecture is very similar, but most of the data structures have changed.

While this repository is a fresh start, the versioning is maintained — version 4.x is the new version of the refactored code. If you are interested in the hierarchical multi-label algorithms (3.x versions), check the sourceforge repository. These algorithms will eventually be refactored into this repository.

Latest Release

Algorithms

The following algorithms are implemented:

Ant-Miner
Main class: myra.classification.rule.impl.AntMiner

The first rule induction ACO classification algorithm. Ant-Miner uses a sequential covering strategy combined with an ACO search to create a list of rules. Ant-Miner only supports categorical attributes; continuous attributes need to be discretised in a pre-processing step.

cAnt-Miner
Main class: myra.classification.rule.impl.cAntMiner

An extension of Ant-Miner to cope with continuous attributes. It works essentially in the same way as Ant-Miner, but continuous attributes undergo a dynamic discretisation process when selected for inclusion in a rule.

cAnt-MinerPB
Main class: myra.classification.rule.impl.cAntMinerPB

cAnt-MinerPB incorporates a new strategy to discover a list of classification rules, which guides the search performed by the ACO algorithm using the quality of a candidate list of rules, instead of a single rule. The main motivation is to avoid the problem of rule interaction derived from the order in which the rules are discovered — i.e., the outcome of a rule affects the rules that can be discovered subsequently since the search space is modified due to the removal of examples covered by previous rules.

Unordered cAnt-MinerPB
Main class: myra.classification.rule.impl.UcAntMinerPB

An extension of cAnt-MinerPB to discover unordered rules, improving the interpretability of individual rules. In an ordered list of rules, the effect (meaning) of a rule depends on all previous rules in the list, since a rule is only used if all previous rules do not cover the example. On the other hand, in an unordered set of rules, an example is shown to all rules and, depending on the conflict resolution strategy, a single rule is used to make a prediction.

Ant-Tree-Miner
Main class: myra.classification.tree.AntTreeMiner

A decision tree induction algorithm that uses an ACO procedure to create decision trees. Trees are created in a top-down fashion, similar to the C4.5 strategy, but instead of using a greedy procedure based on information gain, decision nodes are selected using an ACO procedure.

Ant-Miner-Reg
Main class: myra.regression.rule.impl.AntMinerReg

The first rule induction ACO regression algorithm. Ant-Miner-Reg uses a sequential covering strategy combined with an ACO search to create a list of regression rules.

Running the algorithms

All algorithms can be run from the command line:

java -cp myra-<version>.jar <main class> -f <arff training file>

where <version> is the MYRA version number (e.g., 4.5), <main class> is the main class name of the algorithm and <arff training file> is the path to the ARFF file to be used as training data. The minimum requirement to run an algorithm is a training file. If no training file is specified, a list of options is printed:

[febo@uok myra]$ java -cp myra-4.5.jar myra.classification.rule.impl.cAntMinerPB

Usage: cAntMinerPB -f <arff_training_file> [-t <arff_test_file>] [options]

The minimum required parameter is a training file to build the model from. If a 
test file is specified, the model will be tested at the end of training. The 
results are presented in a confusion matrix. 

The following options are available:

  -c <size>             specify the size of the colony (default 5) 

  -d <method>           specify the discretisation [c45 | mdl] (default mdl) 

  -e <factor>           set the MAX-MIN evaporation factor (default 0.9) 

  -g                    enables the dynamic heuristic computation 

  -h <method>           specify the heuristic method [gain | none] (default 
                        gain) 

  -i <number>           set the maximum number of iterations (default 500) 

  -l <function>         specify the rule list quality function [accuracy | 
                        pessimistic] (default pessimistic) 

  -m <number>           set the minimum number of covered examples per rule 
                        (default 10) 

  -p <method>           specify the rule pruner method [backtrack | greedy] 
                        (default backtrack) 

  -r <function>         specify the rule quality function [laplace | sen_spe] 
                        (default sen_spe) 

  -s <seed>             Random seed value (default current time) 

  -u <percentage>       set the percentage of allowed uncovered examples 
                        (default 0.01) 

  -x <iterations>       set the number of iterations for convergence test 
                        (default 40) 

  --parallel <cores>    enable parallel execution in multiple cores; if no cores 
                        are specified, use all available cores 

Using command-line options you can tweak the parameters of an algorithm. Note that when running the algorithm in parallel (--parallel option), there is no guarantee that it will have the same behaviour even if the same seed value is used (-s option), since thread allocation is not controlled by the code.
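For example, the options documented above can be combined in a single run. The file names and version number below are placeholders; substitute your own training/test ARFF files and jar version:

```shell
# Run cAnt-MinerPB with a colony of 10 ants, 1000 iterations and a
# fixed random seed for reproducibility; the model is evaluated on
# a separate test file at the end of training.
java -cp myra-4.5.jar myra.classification.rule.impl.cAntMinerPB \
    -f train.arff \
    -t test.arff \
    -c 10 \
    -i 1000 \
    -s 42
```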

Citation Policy

If you publish material based on algorithms from MYRA, please include a reference to the paper describing the algorithm. All papers are available online.

If you would also like to make a reference to the MYRA repository, please include a citation to:

@INPROCEEDINGS{otero:myra,
    author    = "F.E.B. Otero",
    title     = "{MYRA}: {A} {J}ava {A}nt {C}olony {O}ptimization {F}ramework for {C}lassification {A}lgorithms",
    booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '17 Companion)},
    publisher = {ACM Press},
    pages     = {1247--1254},
    year      = {2017}
}
Ant-Miner
@ARTICLE{Parpinelli02datamining,
    author  = {R.S. Parpinelli and H.S. Lopes and A.A. Freitas},
    title   = {Data Mining with an Ant Colony Optimization Algorithm},
    journal = {IEEE Transactions on Evolutionary Computation},
    year    = {2002},
    volume  = {6},
    number  = {4},
    pages   = {321--332}
}
cAnt-Miner
cAnt-MinerPB
Unordered cAnt-MinerPB
Ant-Tree-Miner
Ant-Miner-Reg