Pronounced Spark Engine
Nginx + ZeroMQ = Awesome!
Sparkngin is a high-performance persistent message stream engine built on Nginx. Sparkngin can function as a logging, event, or message streaming solution. When used with the Neverwinter Data Platform, Sparkngin can stream data to data repositories like Hive, HBase, Storm, and HDFS.
The core problem is how to accept data from RPC calls to an HTTP/REST or ZeroMQ endpoint and deliver the messages, events, or logs to an end system like Kafka, Kinesis, Storm, or HDFS. We want to do this in a horizontally scalable, highly available, high-performance way that allows for reliable delivery of messages.
Out of the box, Sparkngin includes:
Table of Contents
See the [NeverwinterDP Guide to Contributing](https://github.com/DemandCube/NeverwinterDP#how-to-contribute)
There is an open question of how to implement guaranteed delivery of logs to Kafka.
Application sending messages -> Sparkngin [ Nginx -> ZeroMQ (publisher) -> ZeroMQ (subscriber) -> Kafka client (called the "producer") ] -> Kafka -> Kafka client (called the "consumer") -> some application
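To make the middle hop concrete, here is a minimal sketch of the ZeroMQ-subscriber-to-Kafka-producer leg. This is not Sparkngin's actual implementation (that lives inside Nginx); it assumes Sparkngin publishes raw event bytes on a ZeroMQ PUB socket at tcp://localhost:7000, a Kafka broker at localhost:9092, a topic named "test", and the pyzmq and kafka-python libraries.

```python
import zmq
from kafka import KafkaProducer

context = zmq.Context()
subscriber = context.socket(zmq.SUB)
subscriber.connect("tcp://localhost:7000")       # Sparkngin's ZeroMQ publisher (assumed address)
subscriber.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to every message

# acks="all" waits for all in-sync replicas to confirm each write; together
# with retries, it is one building block for guaranteed delivery to Kafka.
producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         acks="all", retries=5)

while True:
    event = subscriber.recv()           # raw event bytes from Sparkngin
    producer.send("test", value=event)  # hand off to the Kafka "producer" leg
```

On the far side, a standard Kafka consumer (the "consumer" leg above) reads the topic and feeds the end application.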
The purpose is to provide standard event data that allows for systematic monitoring, analytics, retries, and time-based partition notifications (i.e., send a message when all data from hour 1 has been sent).
Example Nginx Modules
Sparkngin is meant to solve the shortcomings of realtime event streaming through a RESTful endpoint. Using Nginx's built-in logging and other connection mechanisms is hard to configure and has limitations.
Sparkngin is built on top of two main projects: Nginx, the world's second most popular web server, and ZeroMQ, a high-performance networking library. Both provide a very solid core for realtime event streaming. If you have questions about why Nginx, click the link. Some of the companies that use it are Facebook, Pinterest, Airbnb, Netflix, Hulu, and WordPress, among others. Here is a summary of some Nginx benefits and features.
The concept is that the timestamp is the system timestamp of the receiving machine, while the submitted timestamp (stimestamp) is an optional timestamp you may supply in the request. For example, if you have data that you want to submit an hour later, you might want to organize it around the submitted timestamp rather than the timestamp recorded by the system.
The whole system consists of two parts:
e.g. `/log?level=info&type=http&stimestamp=134545634345&ver=1.0&topic=test&env=testdata&ip=1.1.1.1`
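As an illustration of a client call, here is a sketch that submits the example event above. It assumes the module is exposed under /sparkngin on localhost (as in the sample configuration further below), that the requests library is available, and that stimestamp is in milliseconds; all three are assumptions.

```python
import time
import requests

params = {
    "level": "info",
    "type": "http",
    # optional submitted timestamp: an event generated an hour ago
    # (millisecond units are an assumption here)
    "stimestamp": int((time.time() - 3600) * 1000),
    "ver": "1.0",
    "topic": "test",
    "env": "testdata",
    "ip": "1.1.1.1",
}
resp = requests.get("http://localhost/sparkngin/log", params=params)
print(resp.status_code)
```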
**Syntax:** `sparkngin_listen port`
**Default:** `7000`
**Context:** `http`

Sets the listen port of the ZeroMQ publisher.
**Syntax:** `sparkngin_buf_size size`
**Default:** `4M`
**Context:** `http`

Sets the log data cache buffer size.
**Syntax:** `sparkngin_gzip on|off`
**Default:** `off`
**Context:** `http`

Switches gzip compression on or off.
**Syntax:** `sparkngin_format (json|plain) ['delimiter']`
**Default:** `plain ' '`
**Context:** `http`

Sets the output format: `json` or `plain`, with an optional field delimiter.
**Syntax:** `sparkngin_root_loc`
**Context:** `location`

Sets the Sparkngin root location. For example, with the configuration below, /sn/imok, /sn/stat, and /sn/log become accessible:

`location /sn { sparkngin_root_loc ; }`
**Syntax:** `sparkngin_fields fields list`
**Default:** `%version% %ip% %time_stamp% %level% %topic% %user-agent% %referrer% %cookie%`
**Context:** `http`

Sets the output fields. The available fields are `%version%`, `%ip%`, `%time_stamp%`, `%level%`, `%topic%`, `%user-agent%`, `%referrer%`, and `%cookie%`.
```nginx
worker_processes 2;

events {
    worker_connections 1024;
}

http {
    sparkngin_listen 7000;
    sparkngin_gzip on;
    sparkngin_format json;
    sparkngin_fields %version% %ip% %time_stamp% %level% %topic% %user-agent% %referrer% %cookie%;

    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 65;

    server {
        listen 80;
        server_name localhost;

        location / {
            root html;
            index index.html index.htm;
        }

        location /sparkngin {
            sparkngin_root_loc ;
        }
    }
}
```
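Under this configuration the root location is /sparkngin, so the module's endpoints are /sparkngin/imok, /sparkngin/stat, and /sparkngin/log. A quick smoke test, assuming the server is running on localhost and the requests library is installed:

```python
import requests

base = "http://localhost/sparkngin"
print(requests.get(base + "/imok").text)  # liveness check
print(requests.get(base + "/stat").text)  # runtime statistics
requests.get(base + "/log", params={"level": "info", "topic": "test"})
```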
```sh
git remote add upstream git@github.com:DemandCube/Sparkngin.git
git fetch upstream
git checkout master
git merge upstream/master
```