Singer

High-performance, reliable and extensible logging agent

Singer is a high-performance logging agent for uploading logs to Kafka. It can also be extended to write to other message transports or storage systems.

Singer runs as a standalone process on the service boxes. It monitors the log directories by listening to file system events, and uploads data as soon as it detects new data. Singer guarantees at-least-once delivery of log messages.

Key Features:

Detailed design

Please see docs/DESIGN.md for details on Singer's design.

Build

Get Singer code

git clone [git-repo-url] singer
cd singer

Build Singer binary

mvn clean package -pl singer -am -DskipTests

Because the JDK has no native support for file system event monitoring on macOS, some tests that run fine on Linux may fail intermittently on macOS. Please use the -DskipTests flag if you want to build Singer on macOS.

Build thrift-logger client library

mvn clean package -pl thrift-logger -am

Testing

Singer has a set of unit tests that can be run through mvn test package -pl singer -am.

An end-to-end integration test can be run through:

mvn clean package -pl singer -am 
singer/src/main/scripts/run_singer_tests.sh

Quick Start

The tutorial directory contains a demo that shows how to run Singer. Please see tutorial/README.md for details.

Usage

Use Singer client library to log data to local disk

Singer uses file inode + offset as the watermark position to track its progress, and writes the watermark info to disk after it writes a batch of messages to Kafka. It resumes from the last watermark position after restarting. Because of this, Singer requires that a log stream be a sequence of append-only log files, and that log rotation be done by file renaming.
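The watermark scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Singer's actual implementation: the function names, the JSON watermark file, and the `send` callback (a stand-in for a Kafka producer) are all hypothetical. The one property taken from the text is the ordering: the watermark is saved only after a batch has been sent, so a crash between the two steps causes the batch to be sent again on restart (at-least-once delivery, duplicates possible).

```python
import json
import os


def load_watermark(path):
    """Return the saved watermark dict, or None if nothing was saved yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None


def save_watermark(path, inode, offset):
    """Persist the watermark atomically: write a temp file, then rename it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"inode": inode, "offset": offset}, f)
    os.replace(tmp, path)


def upload_new_data(log_path, watermark_path, send, batch_size=2):
    """Send lines appended since the last watermark, in batches.

    `send` is a hypothetical callback standing in for a Kafka producer.
    """
    inode = os.stat(log_path).st_ino
    mark = load_watermark(watermark_path)
    # Resume only if the watermark refers to this very file (same inode).
    offset = mark["offset"] if mark and mark["inode"] == inode else 0
    with open(log_path, "rb") as f:
        f.seek(offset)
        batch = []
        while True:
            line = f.readline()
            if line:
                batch.append(line)
            # Flush when the batch is full or the end of the file is reached.
            if batch and (len(batch) == batch_size or not line):
                send(batch)  # step 1: deliver the batch
                save_watermark(watermark_path, inode, f.tell())  # step 2
                batch = []
            if not line:
                break
```

Calling `upload_new_data` a second time after more lines have been appended sends only the new lines, because reading resumes from the saved offset.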

Singer does not handle log streams that use file copy and truncation for log rotation, because Singer cannot use file inode + offset to uniquely identify log messages when a log file is copied and truncated.

For example, before rotation:

 ls -li 
   1001    service.log      # service.log with inode 1001

After rotation:

 ls -li 

   1001   service.log.2018-11-30   # service.log.2018-11-30 with inode 1001 (was renamed from the old service.log)
   1002   service.log              # (this was newly generated service.log)
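The difference between the two rotation styles can be checked directly with `os.stat`. The following Python sketch is illustrative only; the helper names are hypothetical and it does not use any Singer code. It shows that rotation by rename keeps the old data under its original inode, while copy and truncate leaves the live file on the same inode with its offsets reset, so inode + offset no longer identifies a message uniquely.

```python
import os
import shutil
import tempfile


def inode(path):
    """Return the inode number of the file at `path`."""
    return os.stat(path).st_ino


def rotate_by_rename(log_path, rotated_path):
    """Rename the live file, then create a fresh file under the old name."""
    os.rename(log_path, rotated_path)
    open(log_path, "w").close()


def rotate_by_copy_truncate(log_path, rotated_path):
    """Copy the live file aside, then truncate the original in place."""
    shutil.copyfile(log_path, rotated_path)
    open(log_path, "w").close()  # truncates; the inode stays the same


def demo(rotate):
    """Rotate a small log file and report what happened to the inodes."""
    workdir = tempfile.mkdtemp()
    log = os.path.join(workdir, "service.log")
    rotated = os.path.join(workdir, "service.log.2018-11-30")
    with open(log, "w") as f:
        f.write("message 1\nmessage 2\n")
    before = inode(log)
    rotate(log, rotated)
    return {
        "old_data_keeps_inode": inode(rotated) == before,
        "live_file_keeps_inode": inode(log) == before,
    }


if __name__ == "__main__":
    print("rename:       ", demo(rotate_by_rename))
    print("copy-truncate:", demo(rotate_by_copy_truncate))
```

With rename, the old data keeps its inode and the new service.log gets a different one, matching the listing above. With copy and truncate, the opposite holds: a saved watermark for the old inode would now point into a truncated file that is being refilled with unrelated messages.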

For data logged in plaintext format, you can configure Singer to upload those logs directly. Singer also supports high-throughput logging using the thrift format. You can write data to local disk using the thrift-logger library that Singer provides. Currently Singer has thrift_logger libraries in Python, Java, Go, and C++.

Samples on using thrift_logger libraries:

Configure Singer to upload data from local disk to Kafka

Singer uploads data based on configuration settings. Singer configuration is composed of two parts: 1) singer.properties, which configures global Singer settings, e.g. size of thread pools, daily restart settings, heartbeat settings, etc.; 2) log stream configuration: for each set of log streams, Singer needs one log stream configuration that defines the settings for those streams.

Please see tutorial/etc/singer for sample Singer configurations. docs/configuration_samples/sample_kubernetes has an example of a Singer configuration for Kubernetes.

Run Singer

java -server  -cp $singer_home:$singer_home/lib/*:$singer_home/singer-$version.jar  \
     -Dlog4j.configuration=log4j.prod.properties -Dsinger.config.dir=$config_dir \
     com.pinterest.singer.SingerMain

Package Singer as a Debian package

tar xzvf singer-${VERSION}-bin.tar.gz --directory $SINGER_DIR
cd $BUILD_DIR

fpm -s dir -t deb -n singer -v $VERSION --deb-upstart ../singer.upstart  \
    --deb-default ../singer.default -- .

Singer Metrics

Singer exposes metrics using the Twitter Ostrich framework. Singer stats can be checked using the following command. Here 2047 is the Ostrich port defined by the singer.ostrichPort configuration setting.

curl -s localhost:2047/stats.txt

License

Singer is distributed under the Apache License, Version 2.0.