This process is implemented as an Apache Spark job. This job parses all traces in the current day in UTC time. This means you should schedule it to run just prior to midnight UTC.
All Zipkin Storage Components are supported, including Cassandra, MySQL and Elasticsearch.
STORAGE_TYPE=cassandra3: requires Cassandra 3.11.3+; tested against the latest patch of 3.11
STORAGE_TYPE=cassandra: requires Cassandra 2.2+; tested against the latest patch of 3.11
STORAGE_TYPE=mysql: requires MySQL 5.6+; tested against MySQL 5.6
STORAGE_TYPE=elasticsearch: requires Elasticsearch 5+; tested against last minor release of 6.x and 7.x
Due to SPARK-26134, Zipkin Dependencies currently requires Java 1.8 or 9 to run.
The quickest way to get started is to fetch the latest released job as a self-contained jar. For example:
$ curl -sSL https://zipkin.io/quickstart.sh | bash -s io.zipkin.dependencies:zipkin-dependencies:LATEST zipkin-dependencies.jar
$ STORAGE_TYPE=cassandra3 java -jar zipkin-dependencies.jar
You can also start Zipkin Dependencies via Docker.
$ docker run --env STORAGE_TYPE=cassandra3 --env CASSANDRA_CONTACT_POINTS=host1,host2 openzipkin/zipkin-dependencies
By default, this job parses all traces since midnight UTC. You can parse traces for a different day by passing the date as an argument in YYYY-MM-dd format, like 2016-07-16.
# ex to run the job to process yesterday's traces on OS/X
$ STORAGE_TYPE=cassandra3 java -jar zipkin-dependencies.jar `date -uv-1d +%F`
# or on Linux
$ STORAGE_TYPE=cassandra3 java -jar zipkin-dependencies.jar `date -u -d '1 day ago' +%F`
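The two commands above differ only in how `date` is asked for yesterday. A minimal sketch of a helper that tries the Linux (GNU) syntax first and falls back to the OS/X (BSD) one is shown below. The function name `yesterday_utc` is made up for illustration and is not part of zipkin-dependencies.

```shell
# Print yesterday's date in UTC as YYYY-MM-DD.
# Tries GNU date syntax first, then falls back to BSD date syntax.
yesterday_utc() {
  date -u -d '1 day ago' +%F 2>/dev/null || date -u -v-1d +%F
}

# Show the day that would be passed to the job.
echo "would process traces for: $(yesterday_utc)"

# The job itself would then be started like this (not executed here):
#   STORAGE_TYPE=cassandra3 java -jar zipkin-dependencies.jar "$(yesterday_utc)"
```

This keeps a single wrapper script usable on both platforms, at the cost of hiding the error if neither syntax is supported.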
zipkin-dependencies applies configuration parameters through environment variables.
The following variables are common to all storage layers:
* `SPARK_MASTER`: Spark master to submit the job to; Defaults to `local[*]`
* `ZIPKIN_LOG_LEVEL`: Log level for zipkin-related status; Defaults to INFO (use DEBUG for details)
* `SPARK_CONF`: Additional Spark configuration, given as comma-separated entries in properties format, such as `spark.executor.heartbeatInterval=600000,spark.network.timeout=600000`
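Since `SPARK_CONF` is a single comma-separated string, it can be easier to maintain the properties one per line and join them. The sketch below does that with a hypothetical helper, `join_spark_conf`, which is not part of zipkin-dependencies. The two properties are the ones from the example above.

```shell
# Join key=value arguments with commas, the format SPARK_CONF expects.
# The function body runs in a subshell so the IFS change does not leak.
join_spark_conf() (
  IFS=,
  printf '%s\n' "$*"
)

SPARK_CONF=$(join_spark_conf \
  "spark.executor.heartbeatInterval=600000" \
  "spark.network.timeout=600000")
export SPARK_CONF

echo "$SPARK_CONF"
```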
Cassandra is used when `STORAGE_TYPE=cassandra` or `STORAGE_TYPE=cassandra3`.
* `cassandra` is compatible with Zipkin's Legacy Cassandra storage component.
* `cassandra3` is compatible with Zipkin's Cassandra v3 storage component.
Here are the variables that apply:
* `CASSANDRA_KEYSPACE`: The keyspace to use. Defaults to "zipkin".
* `CASSANDRA_CONTACT_POINTS`: Comma separated list of hosts / ip addresses part of Cassandra cluster. Defaults to localhost
* `CASSANDRA_LOCAL_DC`: The local DC to connect to (other nodes will be ignored)
* `CASSANDRA_USERNAME` and `CASSANDRA_PASSWORD`: Cassandra authentication. Will throw an exception on startup if authentication fails
* `CASSANDRA_USE_SSL`: Requires `javax.net.ssl.trustStore` and `javax.net.ssl.trustStorePassword`, defaults to false.
* `STRICT_TRACE_ID`: When false, dependency linking only looks at 64 bits of a trace ID, defaults to true.
$ STORAGE_TYPE=cassandra3 CASSANDRA_USERNAME=user CASSANDRA_PASSWORD=pass java -jar zipkin-dependencies.jar
MySQL is used when `STORAGE_TYPE=mysql`. The schema is compatible with Zipkin's MySQL storage component.
* `MYSQL_DB`: The database to use. Defaults to "zipkin".
* `MYSQL_USER` and `MYSQL_PASS`: MySQL authentication, which defaults to empty string.
* `MYSQL_HOST`: Defaults to localhost
* `MYSQL_TCP_PORT`: Defaults to 3306
* `MYSQL_USE_SSL`: Requires `javax.net.ssl.trustStore` and `javax.net.ssl.trustStorePassword`, defaults to false.
$ STORAGE_TYPE=mysql MYSQL_USER=root java -jar zipkin-dependencies.jar
Elasticsearch is used when `STORAGE_TYPE=elasticsearch`. The schema is compatible with Zipkin's Elasticsearch storage component.
* `ES_INDEX`: The index prefix to use when generating daily index names. Defaults to zipkin.
* `ES_DATE_SEPARATOR`: The separator used when generating dates in index names. Defaults to '-', so the queried index looks like zipkin-yyyy-MM-dd. It could, for example, be changed to '.' to give zipkin-yyyy.MM.dd
* `ES_HOSTS`: A comma separated list of elasticsearch hosts advertising http. Defaults to localhost. Add port section if not listening on port 9200. Only one of these hosts needs to be available to fetch the remaining nodes in the cluster. It is recommended to set this to all the master nodes of the cluster. Use url format for SSL. For example, "https://yourhost:8888"
* `ES_NODES_WAN_ONLY`: Set to true to only use the values set in ES_HOSTS, for example if your elasticsearch cluster is in Docker. Defaults to false
* `ES_USERNAME` and `ES_PASSWORD`: Elasticsearch basic authentication. Use when X-Pack security (formerly Shield) is in place. By default no username or password is provided to elasticsearch.
$ STORAGE_TYPE=elasticsearch ES_HOSTS=host1,host2 java -jar zipkin-dependencies.jar
# To override the http port, add it to the host string
$ STORAGE_TYPE=elasticsearch ES_HOSTS=host1:9201 java -jar zipkin-dependencies.jar
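To see how `ES_INDEX` and `ES_DATE_SEPARATOR` combine, the sketch below rebuilds the simplified index name described in the variable list for a given day. The helper `index_date` is hypothetical and only illustrates the separator substitution. The index names your Zipkin version actually uses may contain additional segments, so treat the printed name as an approximation.

```shell
# Rewrite a YYYY-MM-DD day using a custom separator (defaults to '-').
# Runs in a subshell so the temporary variables do not leak.
index_date() (
  sep=${2:--}
  y=${1%%-*}; rest=${1#*-}; m=${rest%%-*}; d=${rest#*-}
  printf '%s%s%s%s%s\n' "$y" "$sep" "$m" "$sep" "$d"
)

# Fall back to the documented defaults when the variables are unset.
ES_INDEX=${ES_INDEX:-zipkin}
ES_DATE_SEPARATOR=${ES_DATE_SEPARATOR:--}

# Print the simplified daily index name for 2016-07-16.
echo "${ES_INDEX}-$(index_date 2016-07-16 "$ES_DATE_SEPARATOR")"
```

With `ES_DATE_SEPARATOR=.` the date portion becomes `2016.07.16`, matching the `zipkin-yyyy.MM.dd` pattern mentioned above.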
When using an https endpoint in `ES_HOSTS`, you can use the standard `javax.net.ssl` system properties, such as `javax.net.ssl.trustStore` and `javax.net.ssl.trustStorePassword`, to customize the certificates used for the connection.
To build the job from source and run it against a local Cassandra in Spark's standalone mode:
# Build the spark jobs
$ ./mvnw -DskipTests clean install
$ STORAGE_TYPE=cassandra java -jar ./main/target/zipkin-dependencies*.jar
The jar file produced by this build can also run against Spark directly. Before anything else, make sure you are running the same version of Spark as this project is built against.
You can use the following command to display what this project is built against:
$ SPARK_VERSION=$(./mvnw help:evaluate -Dexpression=spark.version -q -DforceStdout)
$ echo $SPARK_VERSION
2.4.0
Once you've verified your setup is on the correct version, set the `SPARK_MASTER` variable. For example, if you are connecting to Spark running on the same host:
$ STORAGE_TYPE=cassandra3 SPARK_MASTER=spark://$HOSTNAME:7077 java -jar zipkin-dependencies.jar
Note that the Zipkin team focuses on tracing, not Spark support. If you have Spark cluster related troubleshooting questions, please use their support tools.
When troubleshooting, always set `ZIPKIN_LOG_LEVEL=DEBUG`, as this output is important when figuring out why a trace didn't result in a link.
If you set `SPARK_MASTER` to something besides local, remember that log output also ends up in `stderr` of the workers.
By default, this job uses the value of the system property `java.io.tmpdir` as the location to store temporary data. If you're getting `java.io.IOException: No space left on device` while processing large sets of trace data, you can specify a different location that has enough space available by overriding that system property, for example with `-Djava.io.tmpdir=/other/location`.
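Before pointing the job at another temporary location, it can help to confirm that the candidate directory really has more room. The sketch below reports free space using POSIX `df`. The helper name `free_kb` and the `SPARK_TMP` variable are made up for illustration, and `/tmp` is only a placeholder default.

```shell
# Print the free space, in kilobytes, of the filesystem holding a directory.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical candidate location; replace with a directory on a larger volume.
SPARK_TMP=${SPARK_TMP:-/tmp}
echo "free space in $SPARK_TMP: $(free_kb "$SPARK_TMP") KB"

# The job could then be started with that location (not executed here):
#   STORAGE_TYPE=cassandra3 java -Djava.io.tmpdir="$SPARK_TMP" -jar zipkin-dependencies.jar
```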
All artifacts publish to the group ID "io.zipkin.dependencies". We use a common release version for all components.
Snapshots are uploaded to JFrog after commits to master.