Zipkin Storage: Kafka [EXPERIMENTAL]


Kafka-based storage for Zipkin.

                    +----------------------------*zipkin*----------------------------------------------
                    |                                     [ dependency-storage ]--->( dependencies      )
                    |                                                  ^        +-->( autocomplete-tags )
( collected-spans )-|->[ partitioning ]   [ aggregation ]    [ trace-storage ]--+-->( traces            )
  via http, kafka,  |       |                    ^    |         ^      |        +-->( service-names     )
  amq, grpc, etc.   +-------|--------------------|----|---------|------|-------------------------------
                            |                    |    |         |      |
----------------------------|--------------------|----|---------|------|-------------------------------
                            +-->( spans )--------+----+---------|      |
                                                      |         |      |
*kafka*                                               +->( traces )    |
 topics                                               |                |
                                                      +->( dependencies )

-------------------------------------------------------------------------------------------------------

Spans collected via different transports are partitioned by traceId and stored in a partitioned spans Kafka topic. Partitioned spans are then aggregated into traces, and traces into dependency links; both results are also emitted into Kafka topics. These three topics are used as sources for local stores (Kafka Streams state stores) that support the Zipkin query and search APIs.
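The partitioning step can be pictured with a small Java sketch. It assumes a minimal, hypothetical `Span` class (the project itself works with Zipkin's span model) and uses `String.hashCode()` in place of Kafka's default murmur2 key hashing. The only point it illustrates is that keying by traceId sends every span of a trace to the same partition, so a single stream task sees the whole trace when aggregating.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "partitioning" step: every span is keyed by its traceId.
class TracePartitioning {

  // Minimal span model for illustration only; not zipkin2.Span.
  static final class Span {
    final String traceId;
    final String id;
    final String serviceName;

    Span(String traceId, String id, String serviceName) {
      this.traceId = traceId;
      this.id = id;
      this.serviceName = serviceName;
    }
  }

  // Picks a partition from the traceId key. Kafka's default partitioner hashes the
  // serialized key with murmur2; String.hashCode() stands in here because the only
  // property that matters is that equal keys always land on the same partition.
  static int partitionFor(String traceId, int numPartitions) {
    return Math.floorMod(traceId.hashCode(), numPartitions);
  }

  // Groups the spans read from one partition by traceId, as a downstream aggregation would.
  static Map<String, List<Span>> groupByTrace(List<Span> spans) {
    Map<String, List<Span>> traces = new LinkedHashMap<>();
    for (Span span : spans) {
      traces.computeIfAbsent(span.traceId, k -> new ArrayList<>()).add(span);
    }
    return traces;
  }

  public static void main(String[] args) {
    List<Span> spans = Arrays.asList(
        new Span("t1", "a", "frontend"),
        new Span("t2", "b", "frontend"),
        new Span("t1", "c", "backend"));
    int partitions = 3;
    for (Span span : spans) {
      // Both spans of trace t1 print the same partition number.
      System.out.println("trace " + span.traceId + " span " + span.id
          + " -> partition " + partitionFor(span.traceId, partitions));
    }
    System.out.println(groupByTrace(spans).keySet()); // [t1, t2]
  }
}
```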

Design

Configuration

Use-cases

Replacement for batch-oriented Zipkin dependencies

A limitation of the zipkin-dependencies module is that it must be scheduled to run at a defined frequency. Because execution is batch-oriented, dependency values stay out of date until processing runs again.

Kafka-based storage enables aggregating dependencies as spans are received, allowing a (near-)real-time calculation of dependency metrics.
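As a rough illustration of aggregating dependencies as spans are received, the sketch below derives caller/callee counts from one completed trace and folds them into running totals. It is a simplified, hypothetical model (a span's parent service is taken as the caller of the span's own service); Zipkin's actual dependency linking also interprets span kinds and errors, so this is not the module's algorithm.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical dependency linking. Not Zipkin's DependencyLinker,
// which also interprets span kinds (CLIENT, SERVER, ...) and error tags.
class DependencyLinking {

  static final class Span {
    final String id;
    final String parentId; // null for the root span
    final String serviceName;

    Span(String id, String parentId, String serviceName) {
      this.id = id;
      this.parentId = parentId;
      this.serviceName = serviceName;
    }
  }

  // Counts calls between services in one trace: each span whose parent belongs
  // to a different service counts as one call "parentService->childService".
  static Map<String, Long> link(List<Span> trace) {
    Map<String, Span> byId = new HashMap<>();
    for (Span span : trace) byId.put(span.id, span);
    Map<String, Long> links = new TreeMap<>();
    for (Span child : trace) {
      if (child.parentId == null) continue;
      Span parent = byId.get(child.parentId);
      if (parent == null || parent.serviceName.equals(child.serviceName)) continue;
      links.merge(parent.serviceName + "->" + child.serviceName, 1L, Long::sum);
    }
    return links;
  }

  // Adds one trace's links to the running totals, so counts are updated each time
  // a trace is aggregated instead of waiting for a scheduled batch job.
  static void accumulate(Map<String, Long> totals, Map<String, Long> traceLinks) {
    traceLinks.forEach((edge, calls) -> totals.merge(edge, calls, Long::sum));
  }

  public static void main(String[] args) {
    List<Span> trace = Arrays.asList(
        new Span("1", null, "frontend"),
        new Span("2", "1", "backend"),
        new Span("3", "2", "backend"), // same service as parent: not a dependency
        new Span("4", "2", "db"));
    Map<String, Long> totals = new TreeMap<>();
    accumulate(totals, link(trace)); // first trace arrives
    accumulate(totals, link(trace)); // an identical second trace arrives
    System.out.println(totals); // {backend->db=2, frontend->backend=2}
  }
}
```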

For this use case, the other components can be disabled. A prepared profile enables only the aggregation and search of dependency graphs.

This profile can be enabled by adding the Java option: -Dspring.profiles.active=kafka-only-dependencies

The Docker image includes an environment variable to set the profile:

MODULE_OPTS="-Dloader.path=lib -Dspring.profiles.active=kafka-only-dependencies"

To try it out, a ready-to-use Docker Compose configuration is available.

If an existing Kafka collector is already forwarding traces into another storage, zipkin-storage-kafka can use a different Kafka consumer group id to consume the same traces in parallel. If Kafka transport is not available, you can instead forward spans from another Zipkin server to zipkin-storage-kafka.

Building

To build the project you will need Java 8+.

make build

To run the tests:

make test

To build a Docker image:

make docker-build

Run locally

To run locally, first download the Zipkin binaries:

make get-zipkin

By default, Zipkin expects a Kafka broker to be running on localhost:19092.

Then run Zipkin locally:

make run-local

To validate the storage, first make sure the Kafka topics are created so that the Kafka Streams instances can be initialized properly, then run the test:

make kafka-topics
make zipkin-test

This will open a browser to check that a trace has been registered.

It then sends another trace one minute (the trace timeout) plus one second later, to trigger aggregation so the dependency graph can be visualized.
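Why an extra trace is needed to trigger aggregation can be sketched as follows, assuming a trace is closed by an inactivity timeout measured in event time (the way Kafka Streams windowed aggregations advance on stream time) rather than by a wall-clock timer. `TraceTimeoutSketch` and its 60-second timeout are hypothetical stand-ins, not the project's classes or configuration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of trace completion by inactivity timeout, driven by event time.
class TraceTimeoutSketch {

  private final long timeoutMs;
  private long streamTimeMs = Long.MIN_VALUE; // highest span timestamp seen so far
  private final Map<String, Long> lastSeenMs = new LinkedHashMap<>();

  TraceTimeoutSketch(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  // Records a span and returns the traceIds that became complete: those whose
  // newest span is more than timeoutMs older than the current stream time.
  List<String> onSpan(String traceId, long timestampMs) {
    streamTimeMs = Math.max(streamTimeMs, timestampMs);
    lastSeenMs.merge(traceId, timestampMs, Long::max);
    List<String> completed = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = lastSeenMs.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (streamTimeMs - entry.getValue() > timeoutMs) {
        completed.add(entry.getKey()); // would be emitted for dependency linking
        it.remove();
      }
    }
    return completed;
  }

  public static void main(String[] args) {
    TraceTimeoutSketch agg = new TraceTimeoutSketch(60_000);
    System.out.println(agg.onSpan("t1", 0));      // [] : t1 still open
    System.out.println(agg.onSpan("t1", 500));    // []
    // Nothing closes t1 by itself; only a later record that moves stream time
    // past t1's last span + timeout causes it to be emitted.
    System.out.println(agg.onSpan("t2", 61_501)); // [t1]
  }
}
```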

Run with Docker

If you have Docker available, run:

make run-docker

This builds the Docker image and starts Docker Compose.

To test it, run:

make zipkin-test-single
# or
make zipkin-test-distributed

[Figures: traces, dependencies]

Examples

Acknowledgments

This project is inspired by Adrian Cole's VoltDB storage https://github.com/adriancole/zipkin-voltdb

Kafka Streams images are created with https://zz85.github.io/kafka-streams-viz/