:warning: This project (Siren "Join") is superseded by the new Siren "FEDERATE" plugin (AKA Vanguard).

Siren Federate performs fully distributed Elasticsearch joins that scale with the number of machines. It can also join across multiple backends, making JDBC datasources appear as if they were Elasticsearch indexes.

Siren Federate is available for Elasticsearch 5.x, and soon for 6.x.

For more information and downloads see http://siren.io

(Superseded) The SIREn Join Plugin for Elasticsearch 2.x

This plugin extends Elasticsearch with new search actions and a filter query parser that make it possible to perform a "Filter Join" between two sets of documents (in the same index or in different indexes).

The Filter Join is a (left) semi-join between two sets of documents based on a common attribute, where the result contains only the attributes of one of the joined sets. The join filters one document set based on a second document set, hence its name. It is equivalent to the EXISTS() operator in SQL.
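
The semi-join semantics can be sketched in a few lines of Python. This is an in-memory illustration only and says nothing about how the plugin executes the join; the function name `filter_join` and the sample documents are hypothetical.

```python
def filter_join(left_docs, right_docs, left_field, right_field):
    """Left semi-join: keep the left documents whose left_field value
    occurs as a right_field value in at least one right document."""
    # Collect the join keys found in the right-hand document set.
    keys = {doc[right_field] for doc in right_docs if right_field in doc}
    # Return left documents only; no right-hand attribute is copied over.
    return [doc for doc in left_docs if doc.get(left_field) in keys]


# Hypothetical sample data, named after the fields used later in this README.
index1 = [
    {"title": "a", "foreign_key": 1},
    {"title": "b", "foreign_key": 2},
    {"title": "c", "foreign_key": 3},
]
index2 = [{"id": 1, "tag": "aaa"}, {"id": 3, "tag": "aaa"}]

result = filter_join(index1, index2, "foreign_key", "id")
```

Here `result` holds the documents titled "a" and "c", and none of them carries the `tag` attribute from the second set, which is what distinguishes a semi-join from a full join.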

Compatibility

The following table shows the compatibility between releases of Elasticsearch and the SIREn Join plugin:

Elasticsearch    SIREn Join
2.4.5            2.4.5
2.4.4            2.4.4
2.4.3            2.4.3
2.4.2            2.4.2-1
2.4.1            2.4.1-1
2.3.5            2.3.5-1
2.3.4            2.3.4-1
2.3.3            2.3.3-1
2.2.0            2.2.0-1
2.1.2            2.1.2
2.1.1            2.1.1
1.7.x            1.0

Installing the Plugin

Online Download

You can use the following command to download the plugin from the online repository:

$ bin/plugin install solutions.siren/siren-join/2.4.4

Offline Download

Manual

Alternatively, you can assemble it via Maven (you must build it as a non-root user):

$ git clone git@github.com:sirensolutions/siren-join.git
$ cd siren-join
$ mvn package

This creates a single Zip file that can be installed using the Elasticsearch plugin command:

$ bin/plugin install file:/PATH-TO-SIRENJOIN-PROJECT/target/releases/siren-join-2.4.4.zip

Interacting with the Plugin

You can now start Elasticsearch and check that the plugin is loaded:

$ bin/elasticsearch
...
[2013-09-04 17:33:27,443][INFO ][node    ] [Andrew Chord] initializing ...
[2013-09-04 17:33:27,455][INFO ][plugins ] [Andrew Chord] loaded [siren-join], sites []
...

To uninstall the plugin:

$ bin/plugin remove siren-join

Usage

Coordinate Search API

This plugin introduces two new search actions: _coordinate_search, which replaces the _search action, and _coordinate_msearch, which replaces the _msearch action. Both actions are wrappers around the original Elasticsearch actions and therefore support the same API. You must use these actions with the filterjoin filter, because the original Elasticsearch actions do not support it.

Parameters

Example

In this example, we join the documents of index1 with the documents of index2. The query first filters the documents of index2 with type type using the query { "terms" : { "tag" : [ "aaa" ] } }. It then retrieves the ids of the matching documents from the field id, which is specified by the parameter path. The list of ids is then used as a filter and applied to the field foreign_key of the documents from index1.

    {
      "bool" : {
        "filter" : {
          "filterjoin" : {
            "foreign_key" : {
              "indices" : ["index2"],
              "types" : ["type"],
              "path" : "id",
              "query" : {
                "terms" : {
                  "tag" : [ "aaa" ]
                }
              }
            }
          }
        }
      }
    }
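
As a rough sketch, the filter above could be sent from Python using only the standard library. Several details are assumptions rather than facts from this README: the node address `http://localhost:9200`, the URL shape `/index1/_coordinate_search` (chosen by analogy with `_search`), the use of POST, the top-level `query` wrapper of a regular search body, and the helper name `build_coordinate_search_request`. The snippet only builds the request, so it runs without a cluster.

```python
import json
import urllib.request


def build_coordinate_search_request(base_url, index, body):
    """Build (but do not send) a request to the _coordinate_search action.

    Assumption: the action is reachable at /<index>/_coordinate_search,
    mirroring the /<index>/_search endpoint it replaces.
    """
    url = "{}/{}/_coordinate_search".format(base_url.rstrip("/"), index)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The filterjoin from the example, wrapped in a "query" key (assumed wrapper).
body = {
    "query": {
        "bool": {
            "filter": {
                "filterjoin": {
                    "foreign_key": {
                        "indices": ["index2"],
                        "types": ["type"],
                        "path": "id",
                        "query": {"terms": {"tag": ["aaa"]}},
                    }
                }
            }
        }
    }
}

request = build_coordinate_search_request("http://localhost:9200", "index1", body)
# To execute it against a running node with the plugin installed:
#   response = json.load(urllib.request.urlopen(request))
```

The request targets index1 because that is the index whose documents are filtered; index2 appears only inside the filterjoin definition.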

Response Format

The response returned by the coordinate search API is identical to the response returned by Elasticsearch's search API, but it is augmented with additional information about the execution of the relational query planning. This information is stored in the field named coordinate_search at the root of the response. The example below shows the parameters that the object contains:

    {
      "coordinate_search": {
        "actions": [
          {
            "relations": {
              "from": {
                "indices": ["index2"],
                "types": ["type"],
                "field": "id"
              },
              "to": {
                "indices": null,
                "types": null,
                "field": "foreign_key"
              }
            },
            "size": 2,
            "size_in_bytes": 20,
            "is_pruned": false,
            "cache_hit": false,
            "terms_encoding" : "long",
            "took": 313
          }
        ]
      },
    ...
    }
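
A client could read this extra block as in the following sketch. The helper name `summarize_actions` and the output key names are hypothetical, and the sample dictionary simply mirrors the example response above with its elided part left out; the code reads the fields without interpreting their units.

```python
def summarize_actions(response):
    """Return one flat dict per entry of coordinate_search.actions."""
    summaries = []
    # A plain search response has no coordinate_search field, so default to empty.
    for action in response.get("coordinate_search", {}).get("actions", []):
        relations = action["relations"]
        summaries.append({
            "from_indices": relations["from"]["indices"],
            "from_field": relations["from"]["field"],
            "to_field": relations["to"]["field"],
            "size": action["size"],
            "size_in_bytes": action["size_in_bytes"],
            "is_pruned": action["is_pruned"],
            "cache_hit": action["cache_hit"],
            "took": action["took"],
        })
    return summaries


# Sample mirroring the example response (JSON null becomes None in Python).
sample_response = {
    "coordinate_search": {
        "actions": [
            {
                "relations": {
                    "from": {"indices": ["index2"], "types": ["type"], "field": "id"},
                    "to": {"indices": None, "types": None, "field": "foreign_key"},
                },
                "size": 2,
                "size_in_bytes": 20,
                "is_pruned": False,
                "cache_hit": False,
                "terms_encoding": "long",
                "took": 313,
            }
        ]
    }
}

summaries = summarize_actions(sample_response)
```

Because the rest of the response keeps the regular search format, existing code that reads hits can stay unchanged and only needs this extra step when the join details matter.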

Performance Considerations

Acknowledgement

Part of this plugin is inspired by and based on pull request 3278, submitted by Matt Weber to the Elasticsearch project.


Copyright (c) 2016, SIREn Solutions. All Rights Reserved.