Syncer: sync & manipulate data from MySQL/MongoDB to Elasticsearch/MySQL/Http/Kafka Endpoint

Features

Consistency

Aims


If you change the id of an event, it always means you are doing a join like I do, which

Updated Asynchronously

Query requests to the business database are delayed as little as possible.

Input -- DataSource

The readConcern option allows you to control the consistency and isolation properties of the data read from replica sets and replica set shards. Through the effective use of write concerns and read concerns, you can adjust the level of consistency and availability guarantees as appropriate, such as waiting for stronger consistency guarantees, or loosening consistency requirements to provide higher availability. MongoDB drivers updated for MongoDB 3.2 or later support specifying read concern.


After data items come out of the Input module, they are converted to SyncData(s) -- the abstraction of a single data change. In other words, a single binlog item may contain multiple row changes and be converted to multiple SyncData instances.
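The one-to-many conversion described above can be sketched as follows. This is a minimal illustration, not Syncer's actual code: `BinlogItem`, `SyncDataSketch`, and `InputSketch` are hypothetical stand-ins, and the field names are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a single data change; not Syncer's real SyncData class. */
final class SyncDataSketch {
    final String table;
    final String eventType;            // e.g. "WRITE", "UPDATE", "DELETE" (assumed labels)
    final Map<String, Object> fields;  // column name -> value for exactly one row

    SyncDataSketch(String table, String eventType, Map<String, Object> fields) {
        this.table = table;
        this.eventType = eventType;
        // Defensive copy so later changes to the source row do not leak in.
        this.fields = Collections.unmodifiableMap(new LinkedHashMap<>(fields));
    }
}

/** Hypothetical stand-in for one binlog item that carries several changed rows. */
final class BinlogItem {
    final String table;
    final String eventType;
    final List<Map<String, Object>> rows;

    BinlogItem(String table, String eventType, List<Map<String, Object>> rows) {
        this.table = table;
        this.eventType = eventType;
        this.rows = rows;
    }
}

final class InputSketch {
    /** Fans one binlog item out into one SyncDataSketch per changed row. */
    static List<SyncDataSketch> toSyncData(BinlogItem item) {
        List<SyncDataSketch> out = new ArrayList<>(item.rows.size());
        for (Map<String, Object> row : item.rows) {
            out.add(new SyncDataSketch(item.table, item.eventType, row));
        }
        return out;
    }

    /** Small helper that builds a two-column row for the demo below. */
    static Map<String, Object> row(long id, String name) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("name", name);
        return r;
    }

    public static void main(String[] args) {
        BinlogItem item = new BinlogItem("user", "WRITE",
                Arrays.asList(row(1L, "a"), row(2L, "b")));
        List<SyncDataSketch> changes = toSyncData(item);
        System.out.println(changes.size()); // prints 2: one entry per row
    }
}
```

Keeping each change as its own object means the later Filter and Output stages can work row by row, without knowing how the rows were grouped in the original binlog item.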

Filter -- Operation

Manipulate SyncData via (for more details, see input part of Consumer Pipeline Config):

Output -- DataSink

[1]: Be careful with this feature; it may affect your performance.

Misc

Limitation

Notice

Use Syncer

Preparation

Producer Data Source Config


input:
  masters:
    - connection:
        address: ${HOST_ADDRESS}
        port: 3306
        user: xxx
        password: yyy

    - connection:
        address: ${HOST_ADDRESS}
        port: 27018
      type: mongo

Consumer Pipeline Config

Input

Filter

The following part is implemented with Spring EL, i.e. you can use any syntax that Spring EL supports, even if it is not listed here.

Output

In All

Full and usable samples can be found under test/config/

Syncer Config

port: 12345
ack:
  flushPeriod: 100
input:
  input-meta:
    last-run-metadata-dir: /data/syncer/input/last_position/

filter:
  worker: 3
  filter-meta:
    src: /data/syncer/filter/src

output:
  worker: 2
  batch:
    worker: 2
  output-meta:
    failure-log-dir: /data/syncer/output/failure/

Run

git clone https://github.com/zzt93/syncer
cd syncer/ && mvn package
# /path/to/config/: producer.yml, consumer.yml, password-file
# use `-XX:+UseParallelOldGC` if you have less memory and lower input pressure
# use `-XX:+UseG1GC` if you have at least 4g memory and event input rate larger than 2*10^4/s
java -server -XX:+UseG1GC -jar ./syncer-core/target/syncer-core-1.0-SNAPSHOT.jar [--debug] [--port=40000] [--config=/absolute/path/to/syncerConfig.yml] --producerConfig=/absolute/path/to/producer.yml --consumerConfig=/absolute/path/to/consumer1.yml,/absolute/path/to/consumer2.yml

Test

Dependency

Integration Test

Test data:

Pressure Test

Used In Production

TODO

See Issue 1


Implementation

Input Module

Problem & Notice

Output Module

Json Mapper


Config File Upgrade Guide

From 1.1 to 1.2

How to ?

If you have any problems using Syncer or find bugs in it, open an issue. I will handle it as soon as I can.

FAQ