Spark Salesforce Library

A library for connecting Spark with Salesforce and Salesforce Wave.

Requirements

This library requires Spark 2.x.

For Spark 1.x support, see the spark1.x branch.

Linking

You can link against this library in your program in the following ways:

Maven Dependency

<dependency>
    <groupId>com.springml</groupId>
    <artifactId>spark-salesforce_2.11</artifactId>
    <version>1.1.3</version>
</dependency>

SBT Dependency

libraryDependencies += "com.springml" % "spark-salesforce_2.11" % "1.1.3"

Using with Spark shell

This package can be added to Spark using the --packages command line option. For example, to include it when starting the spark shell:

$ bin/spark-shell --packages com.springml:spark-salesforce_2.11:1.1.3

Features

Options

Options only supported for fetching Salesforce Objects.

Scala API

// Writing Dataset
// Using spark-csv package to load dataframes
val df = spark.
                read.
                format("com.databricks.spark.csv").
                option("header", "true").
                load("your_csv_location")
df.
   write.
   format("com.springml.spark.salesforce").
   option("username", "your_salesforce_username").
   option("password", "your_salesforce_password_with_security_token"). //<salesforce login password><security token>
   option("datasetName", "your_dataset_name").
   save()

// Reading Dataset
val saql = "q = load \"<dataset_id>/<dataset_version_id>\"; q = foreach q generate  'Name' as 'Name',  'Email' as 'Email';"
val sfWaveDF = spark.
                read.
                format("com.springml.spark.salesforce").
                option("username", "your_salesforce_username").
                option("password", "your_salesforce_password_with_security_token"). //<salesforce login password><security token>
                option("saql", saql).
                option("inferSchema", "true").
                load()

// Reading Salesforce Object
val soql = "select id, name, amount from opportunity"
val sfDF = spark.
                read.
                format("com.springml.spark.salesforce").
                option("username", "your_salesforce_username").
                option("password", "your_salesforce_password_with_security_token"). //<salesforce login password><security token>
                option("soql", soql).
                option("version", "37.0").
                load()

// Update Salesforce Object
// The CSV should contain an Id column followed by the other fields to be updated
// Sample - 
// Id,Description
// 003B00000067Rnx,Superman
// 003B00000067Rnw,SpiderMan
val df = spark.
                read.
                format("com.databricks.spark.csv").
                option("header", "true").
                load("your_csv_location")
df.
   write.
   format("com.springml.spark.salesforce").
   option("username", "your_salesforce_username").
   option("password", "your_salesforce_password_with_security_token"). //<salesforce login password><security token>
   option("sfObject", "Contact").
   save()

Java API

// Spark 2.x has no DataFrame class in the Java API; use Dataset<Row>
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Writing Dataset
Dataset<Row> df = spark
          .read()
          .format("com.databricks.spark.csv")
          .option("header", "true")
          .load("your_csv_location");
df.write()
          .format("com.springml.spark.salesforce")
          .option("username", "your_salesforce_username")
          .option("password", "your_salesforce_password_with_security_token") //<salesforce login password><security token>
          .option("datasetName", "your_dataset_name")
          .save();

// Reading Dataset
String saql = "q = load \"<dataset_id>/<dataset_version_id>\"; q = foreach q generate  'Name' as 'Name',  'Email' as 'Email';";
Dataset<Row> sfWaveDF = spark
          .read()
          .format("com.springml.spark.salesforce")
          .option("username", "your_salesforce_username")
          .option("password", "your_salesforce_password_with_security_token") //<salesforce login password><security token>
          .option("saql", saql)
          .option("inferSchema", "true")
          .load();

// Reading Salesforce Object
String soql = "select id, name, amount from opportunity";
Dataset<Row> sfDF = spark
          .read()
          .format("com.springml.spark.salesforce")
          .option("username", "your_salesforce_username")
          .option("password", "your_salesforce_password_with_security_token") //<salesforce login password><security token>
          .option("soql", soql)
          .option("version", "37.0")
          .load();

// Update Salesforce Object
// The CSV should contain an Id column followed by the other fields to be updated
// Sample - 
// Id,Description
// 003B00000067Rnx,Superman
// 003B00000067Rnw,SpiderMan
Dataset<Row> updatesDF = spark
          .read()
          .format("com.databricks.spark.csv")
          .option("header", "true")
          .load("your_csv_location");
updatesDF.write()
          .format("com.springml.spark.salesforce")
          .option("username", "your_salesforce_username")
          .option("password", "your_salesforce_password_with_security_token") //<salesforce login password><security token>
          .option("sfObject", "Contact")
          .save();
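
The SAQL string in the "Reading Dataset" example has a fixed shape: a load statement followed by a foreach projection. As a minimal sketch, it could be assembled from a dataset ID, a version ID and a list of field names. The class SaqlSketch and its method are hypothetical and not part of this library, and the sketch assumes the inputs are trusted identifiers that need no escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of spark-salesforce.
class SaqlSketch {

    // Builds: q = load "<id>/<version>"; q = foreach q generate 'A' as 'A', 'B' as 'B';
    // Inputs are assumed to be trusted identifiers; no escaping is performed.
    static String projectionQuery(String datasetId, String datasetVersionId, List<String> fields) {
        String projection = fields.stream()
                .map(f -> "'" + f + "' as '" + f + "'")
                .collect(Collectors.joining(", "));
        return "q = load \"" + datasetId + "/" + datasetVersionId + "\"; "
                + "q = foreach q generate " + projection + ";";
    }

    public static void main(String[] args) {
        // Placeholder IDs; substitute the real dataset and version IDs.
        System.out.println(projectionQuery("<dataset_id>", "<dataset_version_id>",
                Arrays.asList("Name", "Email")));
    }
}
```

The resulting string can be passed to the "saql" option in the same way as the hand-written query above.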

R API

# Writing Dataset
df <- read.df("your_csv_location", source = "com.databricks.spark.csv", inferSchema = "true")
write.df(df, path="", source='com.springml.spark.salesforce', mode="append", datasetName="your_dataset_name", username="your_salesforce_username", password="your_salesforce_password_with_security_token") #<salesforce login password><security token>

# Reading Dataset
saql <- "q = load \"<dataset_id>/<dataset_version_id>\"; q = foreach q generate  'Name' as 'Name',  'Email' as 'Email';"
sfWaveDF <- read.df(source="com.springml.spark.salesforce", username="your_salesforce_username", password="your_salesforce_password_with_security_token", saql=saql) #<salesforce login password><security token>

# Reading Salesforce Object
soql <- "select id, name, amount from opportunity"
sfDF <- read.df(source="com.springml.spark.salesforce", username="your_salesforce_username", password="your_salesforce_password_with_security_token", soql=soql) #<salesforce login password><security token>

# Update Salesforce Object
# The CSV should contain an Id column followed by the other fields to be updated
# Sample - 
# Id,Description
# 003B00000067Rnx,Superman
# 003B00000067Rnw,SpiderMan
df <- read.df("your_csv_location", source = "com.databricks.spark.csv", header = "true")
write.df(df, path="", source='com.springml.spark.salesforce', mode="append", sfObject="Contact", username="your_salesforce_username", password="your_salesforce_password_with_security_token") #<salesforce login password><security token>

Metadata Configuration

This library constructs [Salesforce Wave Dataset Metadata](https://resources.docs.salesforce.com/sfdc/pdf/bi_dev_guide_ext_data_format.pdf) from the metadata configuration bundled in its resources. You can change this default behaviour by modifying the predefined datatypes or by adding new ones. For example, you can change the scale of the float datatype to 5.

The metadata configuration has to be provided in JSON format via the "metadataConfig" option. The structure of the JSON is:


{
  "<df_data_type>": {
    "wave_type": "<wave_data_type>",
    "precision": "<precision>",
    "scale": "<scale>",
    "format": "<format>",
    "defaultValue": "<defaultValue>"
  }
}

More details on Salesforce Wave Metadata can be found [here](https://resources.docs.salesforce.com/sfdc/pdf/bi_dev_guide_ext_data_format.pdf).

Sample JSON


{
  "float": {
    "wave_type": "Numeric",
    "precision": "10",
    "scale": "2",
    "format": "##0.00",
    "defaultValue": "0.00"
  }
}
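
As a minimal sketch, the scale-5 override for the float datatype mentioned earlier could be assembled as a JSON string in plain Java before it is passed to the "metadataConfig" option. The class MetadataConfigSketch and its methods are hypothetical and not part of this library. The precision of 10 is taken from the sample JSON, while the format and defaultValue shown for scale 5 are assumptions adapted from that sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of spark-salesforce: it only assembles the
// JSON text in the structure described for the "metadataConfig" option.
class MetadataConfigSketch {

    // Builds {"<dfType>":{"key":"value",...}} from string-valued attributes.
    // Values are assumed to contain no characters that need JSON escaping.
    static String build(String dfType, Map<String, String> attributes) {
        String body = attributes.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return "{\"" + dfType + "\":{" + body + "}}";
    }

    // Override for the float datatype with the scale raised to 5.
    static String floatWithScaleFive() {
        Map<String, String> attrs = new LinkedHashMap<>(); // keeps insertion order
        attrs.put("wave_type", "Numeric");
        attrs.put("precision", "10");
        attrs.put("scale", "5");
        attrs.put("format", "##0.00000");      // assumed format for five decimals
        attrs.put("defaultValue", "0.00000");  // assumed default value
        return build("float", attrs);
    }

    public static void main(String[] args) {
        System.out.println(floatWithScaleFive());
    }
}
```

The returned string would then be supplied as the value of the "metadataConfig" option when writing the dataset.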

Sample to provide metadata config

This sample changes the format of the timestamp datatype.

// The default format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z';
// this sample changes it to yyyy/MM/dd'T'HH:mm:ss
val modifiedTimestampConfig = """{"timestamp":{"wave_type":"Date","format":"yyyy/MM/dd'T'HH:mm:ss"}}"""
// Using spark-csv package to load dataframes
val df = spark.read.format("com.databricks.spark.csv").
                          option("header", "true").
                          load("your_csv_location")
df.
   write.
    format("com.springml.spark.salesforce").
    option("username", "your_salesforce_username").
    option("password", "your_salesforce_password_with_security_token").
    option("datasetName", "your_dataset_name").
    option("metadataConfig", modifiedTimestampConfig).
    save()

Using this package in Databricks

Create Spark Salesforce Package Library

Upload Databricks table into Salesforce Wave

Short Demo Video

Spark Salesforce Package Demo

Note

Salesforce Wave requires at least one "Text" field, so make sure the DataFrame has at least one string-type column.
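
One way to guard against this before writing is to inspect the column types up front. The sketch below is a plain-Java illustration with a hypothetical class name. It takes a map of column name to Spark type name as a stand-in for what Dataset.dtypes() reports in a real job (for example "StringType"), and does not use the library itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pre-flight check, not part of spark-salesforce.
class TextFieldCheckSketch {

    // Returns true when at least one column has the Spark type name "StringType".
    static boolean hasStringColumn(Map<String, String> columnTypes) {
        return columnTypes.values().stream().anyMatch("StringType"::equals);
    }

    public static void main(String[] args) {
        // Stand-in for the (name, type) pairs of a real DataFrame.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("Id", "StringType");
        columns.put("Amount", "DoubleType");

        if (!hasStringColumn(columns)) {
            throw new IllegalStateException("Add at least one string column before writing to Wave");
        }
        System.out.println("At least one string column is present");
    }
}
```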

Building From Source

This library is built with SBT, which is automatically downloaded by the included shell script. To build a JAR file simply run sbt/sbt package from the project root. The build configuration includes support for both Scala 2.10 and 2.11.