Contextless serving implementation of Spark ML.

To serve small ML pipelines there is no need to create a `SparkContext` or use cluster-related features. In this project we provide our own implementations of Spark ML `Transformer`s; some of them call context-independent Spark methods. Instead of using `DataFrame`s, we implemented a simple `LocalData` class to get rid of the `SparkContext`. All `Transformer`s are rewritten to accept `LocalData`.
```scala
scalaVersion := "2.11.8"

// The artifact name depends on the Spark version you used for model training:

// Spark 2.0.x
libraryDependencies ++= Seq(
  "io.hydrosphere" %% "spark-ml-serving-2_0" % "0.3.0",
  "org.apache.spark" %% "spark-mllib" % "2.0.2"
)

// Spark 2.1.x
libraryDependencies ++= Seq(
  "io.hydrosphere" %% "spark-ml-serving-2_1" % "0.3.0",
  "org.apache.spark" %% "spark-mllib" % "2.1.2"
)

// Spark 2.2.x
libraryDependencies ++= Seq(
  "io.hydrosphere" %% "spark-ml-serving-2_2" % "0.3.0",
  "org.apache.spark" %% "spark-mllib" % "2.2.0"
)
```
```scala
import io.hydrosphere.spark_ml_serving._
import LocalPipelineModel._

// ....

// Load the model
val model = LocalPipelineModel.load("PATH_TO_MODEL")

// Prepare the input data
val columns = List(LocalDataColumn("text", Seq("Hello!")))
val localData = LocalData(columns)

// Transformed result
val result = model.transform(localData)
```
More examples of different ML models are in [tests](/src/test/scala/io/hydrosphere/spark_ml_serving/LocalModelSpec.scala).