Scribengin

Pronounced Scribe Engine

Scribengin is a highly available (HA), high-performance event and log transport that registers data under defined schemas in a variety of end systems. Scribengin lets you run multiple flows of data from a source to a sink. It tolerates the failure of individual nodes and fully recovers in the event of a complete system failure.

Reads data from sources:

Writes data to sinks:

Additional:

This is part of NeverwinterDP, the Data Pipeline for Hadoop.

Running

To get your VM up and running:

git clone https://github.com/DemandCube/Scribengin.git
cd Scribengin/vagrant
vagrant up

For more information on how it all works, see [The DevSetup Guide](https://github.com/DemandCube/Scribengin/blob/master/DevSetup.md).

Community

Contributing

See the [NeverwinterDP Guide to Contributing](https://github.com/DemandCube/NeverwinterDP#how-to-contribute).

The Problem

The core problem is how to have a distributed application write data reliably and at scale to multiple destination data systems. This requires the ability to do data mapping and partitioning, with optional filtering, into each destination system.
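To make mapping, filtering, and partitioning concrete, here is a minimal shell sketch using `awk`. It is not Scribengin's implementation: the function name `partition_logs`, the tab-separated input layout (timestamp, level, user_id, message), the `part-N.tsv` output files, and partitioning by `user_id` modulo the partition count are all assumptions made for illustration.

```shell
# Hypothetical sketch: filter, map and partition tab-separated log records.
# Assumed input schema:  timestamp<TAB>level<TAB>user_id<TAB>message
# Assumed output schema: user_id<TAB>timestamp<TAB>message, one file per partition.
# Assumes partitions > 0 and that messages contain no tabs.
partition_logs() {
  local input="$1" outdir="$2" partitions="$3"
  mkdir -p "$outdir" || return 1
  awk -F '\t' -v dir="$outdir" -v n="$partitions" '
    $2 == "DEBUG" { next }        # filtering: drop debug records
    $3 !~ /^[0-9]+$/ { next }     # skip records without a numeric key
    {
      part = $3 % n               # partitioning: key modulo partition count
      # mapping: reorder fields into the destination schema
      printf "%s\t%s\t%s\n", $3, $1, $4 > (dir "/part-" part ".tsv")
    }
  ' "$input"
}
```

For example, `partition_logs app.log out 4` would drop DEBUG records and spread the rest across up to four files, `out/part-0.tsv` through `out/part-3.tsv`. Partitioning by key keeps every record for one user in the same partition, so per-key ordering is preserved.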

Status

We are currently reorganizing the code for Scribengin V2 to make it more modular and better organized.

Definitions

Yarn

See the [NeverwinterDP Guide to Yarn](https://github.com/DemandCube/NeverwinterDP#Yarn).

Potential Implementation Strategies

POC

An open question is how to implement guaranteed delivery of logs to end systems.
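One common way to get guaranteed (at-least-once) delivery is to commit a read position only after the sink has accepted the data. The shell sketch below illustrates that idea with plain files; the function `deliver_batch`, the line-count checkpoint format, and the file-based source and sink are hypothetical and do not describe how Scribengin does it. It assumes one non-empty record per line.

```shell
# Hypothetical sketch of at-least-once delivery with a checkpoint file.
# The checkpoint holds how many source lines have been delivered so far.
# It advances only AFTER the sink write succeeds, so a crash between the
# two steps re-sends the last batch: duplicates are possible, loss is not.
deliver_batch() {
  local source="$1" sink="$2" checkpoint="$3" batch_size="$4"
  local offset=0
  [ -f "$checkpoint" ] && offset=$(cat "$checkpoint")

  # Read up to batch_size lines after the last committed offset.
  local batch
  batch=$(tail -n +"$((offset + 1))" "$source" | head -n "$batch_size")
  [ -z "$batch" ] && return 0   # nothing new to deliver

  # 1. Write to the sink; return without committing if the write fails.
  printf '%s\n' "$batch" >> "$sink" || return 1

  # 2. Commit the new offset atomically: write a temp file, then rename it.
  local count
  count=$(printf '%s\n' "$batch" | wc -l)
  printf '%s\n' "$((offset + count))" > "$checkpoint.tmp" || return 1
  mv "$checkpoint.tmp" "$checkpoint"
}
```

Calling `deliver_batch app.log delivered.log app.ckpt 100` repeatedly, including after a restart, resumes from the last committed offset. The sketch does not flush the sink to disk before committing; a real implementation would need the sink write to be durable first.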

Architecture

Scribengin Fully Distributed Mode in Yarn
Scribengin Fully Distributed Mode Standalone
Scribengin Pseudo Distributed Mode
Scribengin Standalone Mode

Milestones

Contributors

Related Project

Research

Yarn Documentation

Keep your fork updated

GitHub Fork a Repo Help

git remote add upstream git@github.com:DemandCube/Scribengin.git
git fetch upstream
git checkout master
git merge upstream/master
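Run a second time, these commands fail at `git remote add` because the `upstream` remote already exists. A small wrapper, sketched here as a hypothetical `sync_fork` function, adds the remote only when it is missing, so it can be re-run safely. The branch defaults to `master`, as in the commands above.

```shell
# Hypothetical helper: bring a fork's branch up to date with upstream.
# Safe to re-run: the "upstream" remote is added only if it is missing.
sync_fork() {
  local upstream_url="$1" branch="${2:-master}"
  if ! git remote | grep -qx upstream; then
    git remote add upstream "$upstream_url" || return 1
  fi
  git fetch upstream || return 1
  git checkout "$branch" || return 1
  # --no-edit accepts the default message if a merge commit is needed.
  git merge --no-edit "upstream/$branch"
}
```

For example, from inside your clone: `sync_fork git@github.com:DemandCube/Scribengin.git`.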