Welcome to GrimoireELK Build StatusCoverage Status

GrimoireELK is the component that interacts with the ElasticSearch database. Its goal is two-fold, first it aims at offering a convenient way to store the data coming from Perceval, second it processes and enriches the data in a format that can be consumed by Kibiter.

The Perceval data is stored in ElasticSearch indexes as raw documents (one per item extracted by Perceval). Those raw documents, which will be referred to as "raw data" in this documentation, include all information coming from the original data source which grants the platform to perform multiple analysis without the need of downloading the same data over and over again. Once raw data is retrieved, a new phase starts where data is enriched according to the data source from where it was collected and stored in ElasticSearch indexes. The enrichment removes information not needed by Kibiter and includes additional information which is not directly available within the raw data. For instance, pair programming information for Git data, time to solve (i.e., close or merge) issues and pull requests for GitHub data, and identities and organization information coming from SortingHat . The enriched data is stored as JSON documents, which embed information linked to the corresponding raw documents to ease debugging and guarantee traceability.

Raw data

Each raw document stored in an ElasticSearch index contains a set of common first level fields, regardless of the data source:

Enriched data

Each enriched index includes one or more types of documents, which are summarized below.

Fields

Each enriched document contains a set of fields, they can be (i) common to all data sources (e.g., metadata fields, time field), (ii) specific to the data source, (iii) related to contributor’s profile information (i.e., identity fields) or (iv) to the project listed in the Mordred projects.json (i.e., project fields).

Metadata fields

Identity fields

Project fields

Time field:

Demography fields:

Extra fields:

Data source specific fields

Details of the fields of each data source is available in the Schema folder.

Running tests

Tests are located in the folder tests. In order to run them, you need to have in your machine:

Then you need to:

The full battery of tests can be executed with run_tests.py. However, it is also possible to execute a sub-set of tests by running the single test files (test_* files in the tests folder)

The tests can be run in combination with the Python package coverage. The steps below show how to do it:

$ pip3 install coveralls
$ cd <path-to-ELK>/tests
$ python3 -m coverage run run_tests.py --source=grimoire_elk 

pycharm-config-run_tests

Coverage will generate a file .coverage in the tests folder, which can be inspected with the following command:

cd <path-to-ELK>/tests
python3 -m coverage report -m

pycharm-config_report

The output will be similar to the following one:

Name                                                                                                                Stmts   Miss  Cover   Missing
--------------------------------------------------------------------------------------------------------------------------------------------------
.../ELK/grimoire_elk/__init__.py                                                                                       4      0   100%
.../ELK/grimoire_elk/_version.py                                                                                       1      0   100%