Spark OrientDB Connector

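This library allows you to read OrientDB classes as Spark RDDs, write Spark RDDs back to OrientDB classes, and load OrientDB graphs as Spark GraphX graphs, leveraging a co-located hybrid Spark/OrientDB cluster.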

How to get started

Prerequisites

Download source code and build connector

Clone the connector repository with the following git commands:

    git clone https://github.com/metreta/spark-orientdb-connector.git
    cd spark-orientdb-connector
    git checkout tags/<tag_version>

where <tag_version> is the tag of the connector version compatible with your Spark and OrientDB clusters, as given in the version list.

Then build the connector with sbt:

    sbt package

Create an sbt project and add dependencies

Create a basic Spark sbt project, copy the connector jar you just built into the project's lib folder (sbt's default directory for unmanaged jars), and add the following dependencies to your build.sbt:

    libraryDependencies ++= Seq(
      "com.orientechnologies" % "orientdb-core" % "2.1.0",
      "com.orientechnologies" % "orientdb-client" % "2.1.0",
      "org.apache.spark" % "spark-core_2.11" % "1.4.0",
      "org.apache.spark" % "spark-graphx_2.11" % "1.4.0",
      "com.tinkerpop.blueprints" % "blueprints-core" % "2.6.0",
      "com.orientechnologies" % "orientdb-graphdb" % "2.1.0",
      "com.orientechnologies" % "orientdb-distributed" % "2.1.0"
    )

Don't forget to choose the appropriate library versions as listed in the version list.
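The Spark artifacts above are built for Scala 2.11 (the _2.11 suffix), so the project's Scala version has to match them. A minimal sketch of the remaining build.sbt settings follows; the project name and the exact 2.11 patch release are assumptions, and the unmanagedBase line only restates sbt's default so that the lib folder is explicit.

```scala
// build.sbt (sketch): project name and exact Scala patch version are assumptions
name := "spark-orientdb-demo"

version := "0.1"

// Must be a 2.11.x release to match the spark-core_2.11 and spark-graphx_2.11 artifacts
scalaVersion := "2.11.7"

// sbt already reads unmanaged jars from lib/ by default; stated explicitly here
unmanagedBase := baseDirectory.value / "lib"
```

Keep these settings in the same build.sbt as the libraryDependencies block; the blank lines between settings keep the file valid for older sbt 0.13 releases as well.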

Set up example data in OrientDB

Define a vertex class Person and an edge class Friendship in your OrientDB database:

    CREATE CLASS Person EXTENDS V
    CREATE PROPERTY Person.name string
    CREATE PROPERTY Person.surname string
    CREATE CLASS Friendship EXTENDS E

Insert some data to create a graph:

    CREATE VERTEX Person SET name = 'John', surname = 'Doe'
    CREATE VERTEX Person SET name = 'Mary', surname = 'Smith'
    CREATE VERTEX Person SET name = 'Frank', surname = 'White'
    CREATE VERTEX Person SET name = 'Lois', surname = 'Parker'

    CREATE EDGE Friendship FROM (SELECT FROM Person WHERE name = 'John' AND surname = 'Doe') TO (SELECT FROM Person WHERE name = 'Mary' AND surname = 'Smith')
    CREATE EDGE Friendship FROM (SELECT FROM Person WHERE name = 'John' AND surname = 'Doe') TO (SELECT FROM Person WHERE name = 'Frank' AND surname = 'White')
    CREATE EDGE Friendship FROM (SELECT FROM Person WHERE name = 'Frank' AND surname = 'White') TO (SELECT FROM Person WHERE name = 'Lois' AND surname = 'Parker')

Write an empty Spark app class

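    package com.mycompany.sparkorientdbdemo

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    import com.metreta.spark.orientdb.connector._

    object Demo {

      // Spark recommends an explicit main() instead of extending scala.App,
      // whose delayed field initialization may not work correctly with Spark closures
      def main(args: Array[String]): Unit = {
        // Spark app code
      }

    }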

Configure Spark context

Create a Spark configuration object and set the OrientDB connector-specific parameters:

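    val conf = new SparkConf()
        // Run Spark locally, using all available cores
        .setMaster("local[*]")
        .setAppName("OrientDBConnectorTest")
        .set("spark.orientdb.clustermode", "remote")
        // Placeholder addresses: replace them with your OrientDB nodes
        .set("spark.orientdb.connection.nodes", "x.x.x.x, y.y.y.y, z.z.z.z")
        .set("spark.orientdb.protocol", "remote")
        // Name of the OrientDB database to connect to
        .set("spark.orientdb.dbname", "connector-test")
        // 2424 is OrientDB's default binary protocol port
        .set("spark.orientdb.port", "2424")
        // Credentials: replace them with your own
        .set("spark.orientdb.user", "root")
        .set("spark.orientdb.password", "pAzzw0rd")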

Now create a SparkContext:

    val sc = new SparkContext(conf)
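Putting the pieces together, a minimal sketch of the complete application could look like the following. It reuses the settings above (node addresses and credentials are still placeholders), keeps the configuration in a buildConf helper, a name introduced here only for illustration, and stops the SparkContext in a finally block so resources are released even if a job fails.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.metreta.spark.orientdb.connector._

object Demo {

  // Hypothetical helper: the same connector settings shown above, kept in one place
  def buildConf(): SparkConf = new SparkConf()
    .setMaster("local[*]")
    .setAppName("OrientDBConnectorTest")
    .set("spark.orientdb.clustermode", "remote")
    .set("spark.orientdb.connection.nodes", "x.x.x.x, y.y.y.y, z.z.z.z")
    .set("spark.orientdb.protocol", "remote")
    .set("spark.orientdb.dbname", "connector-test")
    .set("spark.orientdb.port", "2424")
    .set("spark.orientdb.user", "root")
    .set("spark.orientdb.password", "pAzzw0rd")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    try {
      // Connector calls such as sc.orientQuery("Person") go here
    } finally {
      // Always release Spark resources, even when a job throws
      sc.stop()
    }
  }
}
```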

Read and write data

Following OrientDB's multi-model approach, the connector lets you read OrientDB data into Spark and write it back to OrientDB in two ways: as documents, through RDDs of OrientDB class entries, and as graphs, through GraphX.

Use the function orientQuery to get an RDD containing all the entries of an OrientDB class:

    val rddPeople = sc.orientQuery("Person")

rddPeople is an OrientClassRDD that contains every entry of the Person class as an OrientDocument object.
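Because OrientClassRDD is an RDD, the standard RDD actions apply to it. The sketch below, built around a hypothetical preview helper, counts the entries and brings a few of them back to the driver; how each entry prints depends on OrientDocument's toString, which is an assumption here.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import com.metreta.spark.orientdb.connector._

object PeoplePreview {

  // Hypothetical helper: total number of entries plus the first n of them.
  // count() and take() are standard RDD actions and trigger the actual read.
  def preview[T](rdd: RDD[T], n: Int): (Long, Array[T]) =
    (rdd.count(), rdd.take(n))

  def run(sc: SparkContext): Unit = {
    val rddPeople = sc.orientQuery("Person")
    val (total, firstEntries) = preview(rddPeople, 2)
    println(s"Person entries: $total")
    // Output format depends on how OrientDocument implements toString
    firstEntries.foreach(println)
  }
}
```

Each action recomputes the RDD from its source unless it is cached, so calling rddPeople.cache() first avoids reading the class twice.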

The saveToOrient function writes an RDD of case class instances to an OrientDB class:

    rddMyPeople.saveToOrient("Person")
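The snippet above does not show where rddMyPeople comes from. One way to build it is to parallelize a local collection of case class instances, as in the sketch below. It assumes the connector maps case class fields to OrientDB properties by name, so the fields mirror the name and surname properties of the Person class; the two records are hypothetical.

```scala
import org.apache.spark.SparkContext
import com.metreta.spark.orientdb.connector._

object SavePeople {

  // Field names chosen to match the Person properties created earlier
  // (assumption: the connector maps fields to properties by name)
  case class Person(name: String, surname: String)

  // Hypothetical records, not part of the example data set
  val newPeople: Seq[Person] = Seq(Person("Alice", "Brown"), Person("Bob", "Green"))

  def save(sc: SparkContext): Unit = {
    // Distribute the local collection as an RDD, then write it to the Person class
    val rddMyPeople = sc.parallelize(newPeople)
    rddMyPeople.saveToOrient("Person")
  }
}
```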

To get a GraphX graph object from an OrientDB database use the orientGraph function:

    val graphPeople = sc.orientGraph()
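orientGraph returns a GraphX graph, so the standard GraphX operators work on it. The sketch below uses a hypothetical summarize helper that reports the number of vertices, the number of edges, and the highest out-degree. With only the example data loaded, one would expect 4 vertices, 3 edges and a maximum out-degree of 2 (John), assuming the connector maps each OrientDB vertex and edge one-to-one.

```scala
import scala.reflect.ClassTag
import org.apache.spark.SparkContext
import org.apache.spark.graphx.Graph
import com.metreta.spark.orientdb.connector._

object GraphSummary {

  // Hypothetical helper: (vertex count, edge count, highest out-degree).
  // The ClassTag bounds enable GraphX's implicit operators such as outDegrees.
  def summarize[VD: ClassTag, ED: ClassTag](g: Graph[VD, ED]): (Long, Long, Int) = {
    val vertexCount = g.vertices.count()
    val edgeCount = g.edges.count()
    // outDegrees only lists vertices with at least one outgoing edge;
    // folding from 0 yields 0 for a graph without edges
    val maxOutDegree = g.outDegrees.map(_._2).fold(0)((a, b) => math.max(a, b))
    (vertexCount, edgeCount, maxOutDegree)
  }

  def run(sc: SparkContext): Unit = {
    val graphPeople = sc.orientGraph()
    val (v, e, maxOut) = summarize(graphPeople)
    println(s"vertices=$v edges=$e maxOutDegree=$maxOut")
  }
}
```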

License

Copyright 2015, Metreta Information Technology s.r.l.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.