Spark on Angel (SONA) arms Spark with a powerful parameter server, which enables Spark to train very large models.
Similar to Spark MLlib, Spark on Angel is a standalone machine learning library built on Spark (it does not rely on Spark MLlib; see Figure 1). In previous versions, SONA was based on the RDD APIs and covered only the model training step. In Angel 3.0, we introduce various new features to SONA:
Figure 1: SONA is another machine learning & graph library on Spark Core
Figure 2 shows the runtime architecture of SONA.
Figure 2: Architecture of SONA
AngelClient runs on the Spark driver. It is used to start the Angel parameter server and to create, load, initialize, and save the matrices of the model.
PSClient/PSAgent runs on the Spark executors. Algorithms pull parameters and push gradients through PSAgent.
Task
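The pull/push interaction between executors and the parameter server can be illustrated with a small, self-contained sketch. It does not use the Angel or SONA APIs: `ToyParameterServer`, `worker_step`, and `train` are hypothetical names invented for this illustration, the "server" is a plain in-memory list, and the "executors" run one after another in a single process rather than in parallel.

```python
import random


class ToyParameterServer:
    """Hypothetical in-memory stand-in for a parameter server (not the Angel API)."""

    def __init__(self, dim, lr):
        self.weights = [0.0] * dim  # the model "matrix" (here a single dense row)
        self.lr = lr

    def pull(self, indices):
        # Return only the parameters a worker asks for.
        return {i: self.weights[i] for i in indices}

    def push(self, grads):
        # Apply the pushed gradients with a plain SGD update.
        for i, g in grads.items():
            self.weights[i] -= self.lr * g


def worker_step(ps, batch):
    """One executor-side step: pull parameters, compute a gradient, push it back."""
    indices = sorted({i for features, _ in batch for i in features})
    w = ps.pull(indices)
    grads = {i: 0.0 for i in indices}
    for features, label in batch:
        pred = sum(w[i] * x for i, x in features.items())
        err = pred - label  # gradient of 0.5 * (pred - label)^2 w.r.t. pred
        for i, x in features.items():
            grads[i] += err * x / len(batch)
    ps.push(grads)


def train(num_workers=2, epochs=200, seed=0):
    rng = random.Random(seed)
    true_w = [2.0, -1.0, 0.5]
    data = []
    for _ in range(40):
        features = {i: rng.uniform(-1, 1) for i in range(3)}
        label = sum(true_w[i] * x for i, x in features.items())
        data.append((features, label))
    # Each "executor" owns one partition of the data, as an RDD partition would.
    partitions = [data[k::num_workers] for k in range(num_workers)]
    ps = ToyParameterServer(dim=3, lr=0.5)
    for _ in range(epochs):
        for part in partitions:  # sequential here; parallel in a real cluster
            worker_step(ps, part)
    return ps.weights, true_w


weights, true_w = train()
print([round(w, 3) for w in weights])
```

The point of the split is that workers never hold the full model: each step touches only the indices that appear in its own batch, which is what lets the model grow beyond the memory of a single executor.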
Compared to the previous version, a variety of new algorithms have been added to SONA, such as Deep & Cross Network (DCN) and Attentional Factorization Machines (AFM). As can be seen from Figure 3, there are significant differences between the algorithms on SONA and those on Spark: algorithms on SONA are mainly designed for recommendation and graph embedding, while algorithms on Spark tend to be more general-purpose.
Figure 3: Algorithm comparison of Spark and Angel
As a result, SONA can serve as a supplement to Spark.
Figure 4: Programming example of SONA
Figure 4 provides an example of running a distributed machine learning algorithm on SONA, including the following steps:
SONA supports three runtime modes: YARN, K8s, and Local. Local mode makes debugging easy. For details, see the SONA quick start.
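As a rough illustration of how the three modes differ at submission time, the sketch below maps each mode name to the standard Spark master URL form. It covers only the generic Spark side and assumes that SONA jobs are submitted like ordinary Spark applications; any Angel-specific settings are not shown. The helper name `spark_master_for` and the example API server address are hypothetical.

```python
def spark_master_for(mode, k8s_api_server=None):
    """Map a runtime mode name to a standard Spark master URL (illustrative helper)."""
    mode = mode.lower()
    if mode == "yarn":
        return "yarn"
    if mode == "local":
        # Run everything in one JVM, using all available cores; handy for debugging.
        return "local[*]"
    if mode == "k8s":
        if not k8s_api_server:
            raise ValueError("K8s mode needs the Kubernetes API server address")
        return "k8s://" + k8s_api_server
    raise ValueError(f"unknown runtime mode: {mode!r}")


for m in ("YARN", "Local"):
    print(m, "->", spark_master_for(m))
# Placeholder address, for illustration only.
print("K8s ->", spark_master_for("K8s", "https://example-host:6443"))
```

The returned string is what would be passed to `spark-submit --master`.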