Konduit Serving is a serving system and framework focused on deploying machine learning pipelines to production. The core abstraction is an idea called a "pipeline step". An individual step is meant to perform a task as part of using a machine learning model in a deployment scenario. These steps generally include:
For instance, if you want to run arbitrary Python code for pre-processing purposes,
you can use a
PythonStep. To perform inference on a (mix of) TensorFlow,
Keras, Deeplearning4j (DL4J) or PMML models, use
Konduit Serving also contains functionality for other pre-processing tasks, such as DataVec transform processes or image transforms.
Konduit Serving was built with the goal of providing proper low-level interoperability with native math libraries such as TensorFlow and our very own DL4J's core math library libnd4j.
Combining JavaCPP's low-level access to C-like APIs from Java with Java's robust server-side application development (Vert.x on top of Netty) gives better access to fast math code in production while minimizing the surface area of native code, which tends to bring more security flaws (mainly in server-side networked applications). This allows us to do things like zero-copy memory access of NumPy arrays or Arrow records, consumed straight from the server without copy or serialization overhead.
When dealing with deep learning, we can handle proper inference on the GPU (batching large workloads).
Extending that to the Python SDK, we know when to take a raw Arrow record and return it as a pandas DataFrame!
We also strive to provide a Python-first SDK that makes it easy to integrate Konduit Serving into a Python-first workflow.
Optionally, for the Java community, a Vert.x-based model server and pipeline development framework allow a thin abstraction that is embeddable in a Java microservice.
We want to expose modern standards for monitoring everything from your GPU to your inference time. Visualization can happen with applications such as Grafana or anything that integrates with the Prometheus standard for visualizing data.
Finally, we aim to provide integrations with more enterprise platforms typically seen outside the big data space.
See the python subdirectory for our Python SDK.
Upon startup, the server loads a
config.yaml file specified by
the user. If the user specifies a YAML file, it is converted to a
that is then loaded by Vert.x.
This gets loaded into an InferenceConfiguration which contains a list of pipeline steps. How each step is configured depends on its implementation.
A small list (but not all!) of possible implementations can be found here.
An individual agent is a Java process that gets managed by a KonduitServingMain.
Outside of the pipeline components, the main configuration is a ServingConfig which contains information such as the expected port to start the server on, and the host to listen on (default localhost).
If you want your model server to listen on the public internet, please use
Port configuration varies relative to your type of packaging. For example, in Docker, it may not matter because the port is already mapped by Docker.
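As a rough illustration of the layout described above (a serving section with host and port, plus an ordered list of pipeline steps), the sketch below builds such a structure and serializes it to JSON. Every key name in it (`servingConfig`, `listenHost`, `httpPort`, `steps`, `type`) and the port value are assumptions made for illustration; they are not taken from the Konduit Serving schema, so check the project for the real field names.

```python
import json


def build_config(port, host="localhost", steps=None):
    """Return a dict with a serving section and an ordered list of steps.

    All key names are hypothetical and only mirror the prose above.
    The host defaults to localhost, as the text describes.
    """
    return {
        "servingConfig": {"listenHost": host, "httpPort": port},
        "steps": list(steps or []),
    }


# Hypothetical port and step types, chosen only for the example.
config = build_config(port=9000, steps=[{"type": "PYTHON"}, {"type": "MODEL"}])
config_json = json.dumps(config, indent=2, sort_keys=True)
print(config_json)
```

Keeping the steps in a list preserves the order in which the pipeline is meant to run them.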
From there, your pipeline may run into memory or warm-up issues.
When dealing with either, there are generally a few considerations:
Warm-up time for Python scripts (sometimes your Python script may require warming up the interpreter). In short, depending on what your Python script does when running the Python server, you may want to send a warm-up request to your application before it handles normal traffic.
Python path: When using the Python step runner, an additional Anaconda distribution may be required for custom Python script execution. An end-to-end example can be found in the docker directory.
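The warm-up consideration above can be sketched with the standard library alone. This is a minimal sketch, not part of the Konduit Serving SDK: `post_json` and `warm_up` are hypothetical helper names, and the URL and payload you pass in depend entirely on how your own server and pipeline are configured.

```python
import json
import time
import urllib.request


def post_json(url, payload, timeout=30.0):
    """POST a JSON payload and return the raw response body as bytes."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def warm_up(send, payload, attempts=3):
    """Call send(payload) several times and return each latency in seconds.

    The first call is usually the slowest; later calls approximate
    steady-state latency.
    """
    latencies = []
    for _ in range(attempts):
        started = time.perf_counter()
        send(payload)
        latencies.append(time.perf_counter() - started)
    return latencies
```

In use, `send` would wrap `post_json` with the address of your running server, for example `lambda p: post_json(server_url, p)`, where `server_url` is whatever host, port and path your deployment exposes.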
For monitoring, your server has a built-in
/metrics endpoint
that can be polled by Prometheus or by anything else that can parse the Prometheus format.
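To make "anything that can parse the Prometheus format" concrete, here is a minimal sketch of reading the Prometheus text exposition format with plain Python. The metric names and the label in `SAMPLE` are invented for the example and are not the names Konduit Serving actually exports; a production setup would normally use Prometheus itself or an established client library.

```python
def parse_metrics(text):
    """Map each sample (metric name plus label set) to its float value."""
    samples = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        closing_brace = line.rfind("}")
        if closing_brace != -1:
            # Labelled sample: everything up to the last "}" is the key.
            key = line[: closing_brace + 1]
            fields = line[closing_brace + 1:].split()
        else:
            key, *fields = line.split()
        # fields[0] is the value; fields[1], if present, is a timestamp.
        samples[key] = float(fields[0])
    return samples


# Hypothetical metrics, written in the Prometheus text format.
SAMPLE = """\
# HELP example_requests_total Hypothetical request counter.
# TYPE example_requests_total counter
example_requests_total{path="/predict"} 42
example_inference_seconds 0.25 1700000000000
"""

metrics = parse_metrics(SAMPLE)
print(metrics)
```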
A PID file automatically gets written upon startup. Overload the location with
Logging is done via logback. Depending on your application, you may want to override how the logging works.
This can be done by overriding the default
Configurations can be downloaded from the internet! Vert.x supports different ways of configuring different configuration providers. HTTP (without auth) and file are supported by default. For more on this, please see the official Vert.x docs and bundle your custom configuration provider within the built uberjar. If your favorite configuration provider isn't supported, please file an issue.
Timeouts: Sometimes work execution may take longer. If this is the case,
please consider looking at the
Other Vert.x arguments: Due to this being a Vert.x application at its core, other Vert.x JVM arguments will also work. We specify a few that are important for our specific application (such as file upload directories for binary files) in the KonduitServingMain but allow Vert.x arguments for startup as well.
For your specific application, consider using the built-in monitoring capabilities for both CPU and GPU memory to identify what your ideal Konduit Serving configuration should look like under load.
The core intended workflow is:
Configure a server, setting up:
OutputTypes of variables in your pipeline;
OutputFormat of inputs and outputs for the Konduit Serving instance;
ServingConfiguration containing things like host and port information; and
PipelineSteps that represent what steps a deployed pipeline should perform.
Configure a client to connect to the server.
In order to build Konduit Serving, you need to configure:
-D is a JVM argument and -P is a Maven profile. Below we specify the requirements for each configuration.
Konduit Serving can run on a wide variety of chips including:
Konduit Serving supports Linux, macOS and Windows. Android and iOS (via gluon) are untested but should work (please let us know if you would like to try setting this up!).
Packaging Konduit Serving for a particular operating system typically will depend on the target system's supported chips. For example, we can target Linux with ARM or Intel architecture.
JavaCPP's platform classifier will also work depending only on the targeted chip.
For these concerns, we introduced the
to the build. This is a thin abstraction over JavaCPP's packaging
to handle targeting the right platform automatically.
To further thin out other binaries that may be included (such as OpenCV),
we may use
-Djavacpp.platform directly. This approach is mainly tested
with Intel chips right now. For other chips, please file an issue.
These arguments are as follows:
Specifying this can optimize the JAR size quite a bit, otherwise you end up with extra operating system-specific binaries in the jar. Initial feedback via GitHub Issues is much appreciated!
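As a small sketch of how the platform value might be chosen, the helper below maps the current operating system and CPU architecture to a JavaCPP-style platform string and assembles a Maven command from it. The mapping covers only a few common combinations and is an assumption on my part; not every combination is necessarily published for every native library. `javacpp_platform` and `build_command` are hypothetical helper names, and the flags simply mirror the example build command given elsewhere in this document.

```python
import platform

# Partial, illustrative mapping to JavaCPP-style platform names.
_OS_NAMES = {"Linux": "linux", "Darwin": "macosx", "Windows": "windows"}
_ARCH_NAMES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def javacpp_platform(system=None, machine=None):
    """Return a string such as 'linux-x86_64' for the given or current host."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    try:
        return f"{_OS_NAMES[system]}-{_ARCH_NAMES[machine]}"
    except KeyError:
        raise ValueError(f"unsupported platform: {system}/{machine}") from None


def build_command(chip, profiles, target=None):
    """Assemble the Maven wrapper invocation as an argument list."""
    target = target or javacpp_platform()
    command = ["./mvnw"]
    command += [f"-P{name}" for name in profiles]
    command += [f"-Dchip={chip}", f"-Djavacpp.platform={target}"]
    command += ["clean", "install", "-Dmaven.test.skip=true"]
    return command


print(" ".join(build_command("cpu", ["python", "pmml", "uberjar"],
                             target=javacpp_platform("Windows", "AMD64"))))
```

Returning the command as a list keeps it ready for `subprocess.run` without shell quoting concerns.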
Konduit Serving packaging works by including all of the dependencies needed by the profiles and modules selected for the package. The output size of the binary depends on a few core variables:
Many of the packaging options depend on the konduit-serving-distro-bom or Konduit Serving bill of materials module. This module contains all of the module inclusion behavior and all of the various dependencies that end up in the output.
All of the modules rely on building an uberjar and then packaging it in a platform-specific way.
javacpp.platform JVM argument
The modules included are relative to the Maven profiles. Modules are described below:
For now, there are no hosted packages beyond what is available through pip. Hosted repositories for the packaging formats listed above will be published later.
In order to configure Konduit Serving for your platform, use a Maven-based build profile.
An example running on CPU:
./mvnw -Ppython -Ppmml -Dchip=cpu -Djavacpp.platform=windows-x86_64 -Puberjar clean install -Dmaven.test.skip=true
This will automatically download and set up a Konduit Serving uberjar file (see the uberjar sub-directory)
containing all dependencies needed to run the platform. The output will be in the target directory of the packaging mechanism you specify (Docker, TAR, and so on). For example, to build an uberjar, use the
-Puberjar profile, and the output will be found in
Konduit Serving supports customization in two ways:
Custom pipeline steps are generally recommended for performance reasons, but depending on scale, a Python step may be sufficient.
Running multiple versions of a Konduit Serving instance under an orchestration system, with load balancing and similar features, will rely heavily on Vert.x functionality. Konduit Serving is fairly small in scope right now.
Depending on what the user is looking to do, we could support some built-in patterns in the future (for example load-balanced Konduit Serving).
Vert.x allows for different patterns that could be implemented in either Vert.x itself or in Kubernetes.
Cluster management is also possible using one of several cluster node managers allowing a concept of node membership. Communication with multiple nodes or processes happens over the Vert.x event bus. Examples can be found here for how to send messages between instances.
A recommended architecture for fault tolerance is to have an API gateway + load balancer setup with multiple versions of the same pipeline on a named endpoint. That named endpoint would represent a load balanced pipeline instance where one of many Konduit Serving instances may be served.
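The named-endpoint idea can be sketched as a simple round-robin selector. This is only an illustration of the pattern, not something Konduit Serving or Vert.x provides in this form: the class name, endpoint name and instance addresses are all hypothetical, and a real gateway would also need health checks to skip failed instances.

```python
import itertools


class RoundRobinEndpoint:
    """Rotate requests for one named endpoint across several instances."""

    def __init__(self, name, instances):
        instances = list(instances)
        if not instances:
            raise ValueError("at least one instance is required")
        self.name = name
        self._cycle = itertools.cycle(instances)

    def next_instance(self):
        """Return the address of the instance that should serve the next request."""
        return next(self._cycle)


endpoint = RoundRobinEndpoint(
    "example-pipeline",  # hypothetical endpoint name
    ["10.0.0.1:9000", "10.0.0.2:9000", "10.0.0.3:9000"],  # hypothetical addresses
)
picked = [endpoint.next_instance() for _ in range(4)]
print(picked)
```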
In a proper cluster, you would address each instance (an InferenceVerticle in this case representing a worker)
For configuration, we recommend versioning all of your assets that are needed alongside
config.json in something like a bundle where you can download each versioned asset
with its associated configuration and model and start the associated instances from that.
Reference KonduitServingMain for an example of the single node use case.
We will add clustering support based on these ideas at a later date. Please file an issue if you have specific questions in trying to get a cluster set up.
Every module in this repo is licensed under the terms of the Apache License 2.0, save for
konduit-serving-pmml, which is licensed under the AGPL to comply with the JPMML license.