This library is based on the implementation of artificial neural networks in Spark ML. In addition to the multilayer perceptron, it contains new Spark deep learning features that have not yet been merged into Spark ML: currently, Stacked Autoencoder and tensor data flow. Highlights of the library:
Clone and compile:
git clone https://github.com/avulanov/scalable-deeplearning.git
cd scalable-deeplearning
sbt assembly (or mvn assembly)
The jar library will be available in the `target` folder. `assembly` includes the optimized numerical processing library netlib-java. Optionally, one can build `package`.
Scaladl uses the netlib-java library for optimized numerical processing with native BLAS. All netlib-java classes are included in `scaladl.jar`. The latter has to be on the classpath before Spark's own libraries, because Spark bundles only a subset of netlib. To do this, set `spark.driver.userClassPathFirst` to `true` in `spark-defaults.conf`.
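For example, the relevant fragment of `spark-defaults.conf` (the executor-side setting is a related Spark option; whether it is needed depends on how the jar is shipped to executors):

```
spark.driver.userClassPathFirst    true
spark.executor.userClassPathFirst  true
```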
If native BLAS libraries are not available at runtime, or scaladl is not first on the classpath, you will see the warning `WARN BLAS: Failed to load implementation from:` and a reference or pure JVM implementation will be used instead. A native BLAS library such as OpenBLAS (`libopenblas.so` or `.dll`) or ATLAS (`libatlas.so`) should be in the library path of all nodes that run Spark. Netlib-java requires the library to be named `libblas.so.3`, so one has to create a symlink. The same applies on Windows with `libblas3.dll`. Below are the setup details for different platforms. With proper configuration, you will see the message `INFO JniLoader: successfully loaded ...netlib-native_system-...`.
Install the native BLAS library (depending on your distribution):
yum install openblas <OR> apt-get install openblas <OR> download and compile OpenBLAS
Create a symlink to the native BLAS library within its folder `/your/blas`:
ln -s libopenblas.so libblas.so.3
Add it to your library path. Make sure no other folder in your path contains `libblas.so.3`:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/your/blas
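The symlink step can be rehearsed in a scratch directory first; the directory and the empty `libopenblas.so` below are placeholders standing in for the real OpenBLAS build:

```shell
# Rehearse the symlink setup in a scratch directory; /tmp/your-blas and the
# empty libopenblas.so file are placeholders for the real library location.
mkdir -p /tmp/your-blas
cd /tmp/your-blas
touch libopenblas.so                  # stand-in for the real OpenBLAS binary
ln -sf libopenblas.so libblas.so.3    # the name netlib-java looks for
readlink libblas.so.3                 # prints: libopenblas.so
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/tmp/your-blas
```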
Copy the following DLLs from the MinGW distribution and from OpenBLAS to the folder `blas`. Make sure they are all either 64-bit or all 32-bit. Add that folder to your `PATH` variable:
libquadmath-0.dll // MINGW
libgcc_s_seh-1.dll // MINGW
libgfortran-3.dll // MINGW
libopenblas.dll // OpenBLAS binary
liblapack3.dll // copy of libopenblas.dll
libblas3.dll // copy of libopenblas.dll
Scaladl provides working examples of MNIST classification and pre-training with a stacked autoencoder. Examples are in the `scaladl.examples` package. They can be run via spark-submit:
./spark-submit --class scaladl.examples.MnistClassification --master spark://master:7077 /path/to/scaladl.jar /path/to/mnist-libsvm
Start Spark with this library:
./spark-shell --jars scaladl.jar
Or use it as an external dependency for your application.
MNIST classification
import org.apache.spark.ml.scaladl.MultilayerPerceptronClassifier
val train = spark.read.format("libsvm").option("numFeatures", 784).load("mnist.scale").persist()
val test = spark.read.format("libsvm").option("numFeatures", 784).load("mnist.scale.t").persist()
train.count() // materialize the lazily persisted data in memory
test.count()
val trainer = new MultilayerPerceptronClassifier().setLayers(Array(784, 32, 10)).setMaxIter(100)
val model = trainer.fit(train)
val result = model.transform(test)
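The transformed DataFrame carries a `prediction` column next to `label`; with Spark available, accuracy comes from Spark ML's `MulticlassClassificationEvaluator` with metric `"accuracy"`. The metric itself is just the fraction of matching labels, sketched here Spark-free (the toy arrays stand in for the extracted columns):

```scala
// Spark-free sketch of the "accuracy" metric computed over the "label" and
// "prediction" columns of `result`.
object Accuracy {
  def accuracy(labels: Array[Double], predictions: Array[Double]): Double = {
    require(labels.length == predictions.length, "column lengths must match")
    labels.zip(predictions).count { case (l, p) => l == p }.toDouble / labels.length
  }

  def main(args: Array[String]): Unit = {
    // Toy columns standing in for the extracted DataFrame columns
    println(accuracy(Array(0.0, 1.0, 2.0, 1.0), Array(0.0, 1.0, 1.0, 1.0))) // 0.75
  }
}
```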
Pre-training
import org.apache.spark.ml.scaladl.{MultilayerPerceptronClassifier, StackedAutoencoder}
val train = spark.read.format("libsvm").option("numFeatures", 784).load(mnistTrain).persist()
train.count()
val stackedAutoencoder = new StackedAutoencoder().setLayers(Array(784, 32))
.setInputCol("features")
.setOutputCol("output")
.setDataIn01Interval(true)
.setBuildDecoder(false)
val saModel = stackedAutoencoder.fit(train)
val autoWeights = saModel.encoderWeights
val trainer = new MultilayerPerceptronClassifier().setLayers(Array(784, 32, 10)).setMaxIter(1)
val initialWeights = trainer.fit(train).weights
// Overlay the pre-trained encoder weights onto the first layer of the MLP weights
System.arraycopy(autoWeights.toArray, 0, initialWeights.toArray, 0, autoWeights.toArray.length)
trainer.setInitialWeights(initialWeights).setMaxIter(10)
val model = trainer.fit(train)
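The weight handoff above relies on the MLP storing its first-layer weights at the head of its flat weight vector, so `System.arraycopy` overwrites exactly the encoder-sized prefix while the output layer keeps its initialization. A plain-array sketch of that overlay (no Spark required):

```scala
object WeightOverlay {
  // Copy pre-trained encoder weights into the head of a flat MLP weight vector;
  // the tail (output-layer weights) keeps its original initialization.
  def overlay(autoWeights: Array[Double], initialWeights: Array[Double]): Array[Double] = {
    require(autoWeights.length <= initialWeights.length)
    val result = initialWeights.clone()
    System.arraycopy(autoWeights, 0, result, 0, autoWeights.length)
    result
  }

  def main(args: Array[String]): Unit = {
    val encoder = Array(1.0, 2.0, 3.0)           // pre-trained first-layer weights
    val mlp     = Array(0.1, 0.1, 0.1, 9.0, 8.0) // randomly initialized MLP weights
    println(overlay(encoder, mlp).mkString(","))  // 1.0,2.0,3.0,9.0,8.0
  }
}
```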
Contributions are welcome, in particular in the following areas: