This is the "Bayesian network Learning Improved Project" (blip), an open-source Java package that offers a wide range of structure learning algorithms. It is developed by Mauro Scanagatta and distributed under the LGPL-3 by IDSIA.
It focuses on score-based learning, mainly with the BIC and BDeu score functions, and allows the user to learn BNs from datasets containing thousands of variables. It provides state-of-the-art algorithms for the following tasks: parent set identification (BIC*), general structure optimization (WINASOBS-ENT), bounded-treewidth structure optimization (KMAX) and structure learning on incomplete data sets (SEM-KMAX).
An R binding is also available at https://github.com/mauro-idsia/r.blip.
This package implements the algorithms detailed in the following papers:
The process of learning a bounded-treewidth BN is explained using the "child" network as an example.
The format for the initial dataset has to be the same as the file "child-5000.dat", namely a space-separated file containing:
* First line: list of variable names, separated by spaces;
* Second line: list of variable cardinalities, separated by spaces;
* Following lines: list of values taken by the variables in each datapoint, separated by spaces.
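The layout above can be produced from in-memory data with a few lines of Python. This is only a sketch, not part of blip: the function name `write_dat` and the variable names `A`, `B`, `C` are hypothetical, and it assumes that each value is an integer state index between 0 and the variable's cardinality minus one, which is worth checking against "child-5000.dat" before relying on it.

```python
import io


def write_dat(rows, names, cards, out):
    """Write a dataset as: names line, cardinalities line, one line per datapoint."""
    if len(names) != len(cards):
        raise ValueError("need exactly one cardinality per variable")
    out.write(" ".join(names) + "\n")
    out.write(" ".join(str(c) for c in cards) + "\n")
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row length does not match the number of variables")
        for value, card in zip(row, cards):
            # Assumption: values are integer state indices in 0..card-1.
            if not 0 <= value < card:
                raise ValueError(f"value {value} outside 0..{card - 1}")
        out.write(" ".join(str(v) for v in row) + "\n")


# Hypothetical toy dataset: three variables, two datapoints.
buf = io.StringIO()
write_dat([[0, 1, 2], [1, 0, 0]], ["A", "B", "C"], [2, 2, 3], buf)
print(buf.getvalue(), end="")
```

Passing an open file instead of the `io.StringIO` buffer writes the same content to disk.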
The first step is to build the parent set score cache. The state-of-the-art approach is to use BIC* (for the BIC score):
java -jar blip.jar scorer.is -d data/child-5000.dat -j data/child-5000.jkl -t 10 -b 0
With the parent set score cache built, the next step is to learn the structure. The state-of-the-art approach is to use WINASOBS (Windows operator applied to ASOBS) with ENT (entropy-based) ordering:
java -jar blip.jar solver.winasobs.adv -smp ent -d data/child-5000.dat -j data/child-5000.jkl -r data/child.wa.res -t 10 -b 0
Given the parent set score cache, it is also possible to learn a structure under a bounded-treewidth constraint. The state-of-the-art approach is to use k-max:
java -jar blip.jar solver.kmax -w 4 -j data/child-5000.jkl -r data/child-5000.kmax.res -t 10 -b 0
To learn a structure from data containing missing values, the state-of-the-art approach is to use SEM-kMAX:
java -jar blip.jar imputation.sem -d data/child-5000-missing.dat -o data/child-5000-imputed.dat -r data/child.res -t 1 -tmp data/tmp -w 6 -b 0
The format of the ".res" file is as follows: each line gives the parent set assigned to one variable, together with its score.
For example, the line "4: -2797.39 (10,17,18)" indicates that the variable with index 4 in the dataset is assigned the variables with indices 10, 17 and 18 as parents. This parent set has score -2797.39 (by default the score function is the BIC).
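A ".res" file can be read back with a small parser. The sketch below is not part of blip: the names `parse_res_line` and `edges` are hypothetical, and it assumes every line has the shape of the example, "index: score (p1,p2,...)". Treating a missing or empty parenthesised list as "no parents" is also an assumption, as is the plain decimal form of the score.

```python
import re

# Assumption: lines look like "<index>: <score> (<p1>,<p2>,...)".
# A missing or empty parent list is read as "no parents".
_LINE = re.compile(
    r"^\s*(\d+)\s*:\s*(-?\d+(?:\.\d+)?)\s*(?:\(([\d,\s]*)\))?\s*$"
)


def parse_res_line(line):
    """Return (variable index, score, list of parent indices) for one line."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognised .res line: {line!r}")
    variable = int(match.group(1))
    score = float(match.group(2))
    raw_parents = match.group(3) or ""
    parents = [int(p) for p in raw_parents.split(",") if p.strip()]
    return variable, score, parents


def edges(lines):
    """Yield (parent, child) index pairs for every non-blank line."""
    for line in lines:
        if line.strip():
            variable, _score, parents = parse_res_line(line)
            for parent in parents:
                yield (parent, variable)


print(parse_res_line("4: -2797.39 (10,17,18)"))
```

The `edges` helper turns the per-variable parent sets into directed arcs, which is the form most graph libraries expect.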
Using the structure found, the parameters can be learned with:
java -jar blip.jar parle -d data/child-5000.dat -r data/child-5000.kmax.res -n data/child-5000.kmax.uai
The final output will be a full Bayesian network in UAI format.
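The bounded-treewidth workflow (score cache, k-max, parameter learning) can be chained from a script. The following is a minimal sketch that only reassembles the three commands shown above with their flags and values copied verbatim. The function names `kmax_commands` and `run_all` are hypothetical, and the location of `blip.jar` in the working directory is an assumption.

```python
import subprocess


def kmax_commands(stem, jar="blip.jar"):
    """Build the three command lines for a dataset stored at '<stem>.dat'."""
    dat, jkl = f"{stem}.dat", f"{stem}.jkl"
    res, uai = f"{stem}.kmax.res", f"{stem}.kmax.uai"
    base = ["java", "-jar", jar]  # assumption: the jar is in the working directory
    return [
        # 1. Parent set score cache.
        base + ["scorer.is", "-d", dat, "-j", jkl, "-t", "10", "-b", "0"],
        # 2. Bounded-treewidth structure learning with k-max.
        base + ["solver.kmax", "-w", "4", "-j", jkl, "-r", res, "-t", "10", "-b", "0"],
        # 3. Parameter learning, producing the UAI file.
        base + ["parle", "-d", dat, "-r", res, "-n", uai],
    ]


def run_all(commands):
    """Run each step in order, stopping at the first one that fails."""
    for command in commands:
        subprocess.run(command, check=True)


# Print the commands rather than running them, so they can be inspected first.
for command in kmax_commands("data/child-5000"):
    print(" ".join(command))
```

Calling `run_all(kmax_commands("data/child-5000"))` would execute the steps; `check=True` makes a failing step raise an exception instead of letting later steps run on missing files.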