An Apache NiFi processor to encode and decode data using Google Protocol Buffers schemas.
.desc) from disk
.proto schema file, from disk or directly embedded in a property
.proto files at processor level (as a processor property) or directly in a flowfile property
A pre-packaged version of NiFi with the processor installed is available on Docker Hub. To run it, just type:
docker run -p 8080:8080 whiver/nifi-protobuf:latest
Note that the -p option publishes port 8080, which NiFi uses, to the host, so that you can access the UI directly via
Grab the latest release directly from the releases page and copy the .nar file into the Apache NiFi
Clone this project and build the processor .nar file using Maven:
mvn compile
mvn nifi-nar:nar
Then simply copy the generated .nar file into the Apache NiFi
The project also includes a Dockerfile to easily build a Docker image of the project. In fact, you just need to run:
and everything should be fine! :)
See the installation section to learn how to integrate this processor into Apache NiFi. This project adds two new processors to NiFi:
ProtobufDecoder, which decodes a Protobuf-encoded payload into different kinds of structured formats;
ProtobufEncoder, which encodes a payload in a structured format using a Protobuf schema.
In both processors, you have to specify a schema file to use for data encoding/decoding. You can do so either processor-wide (meaning that every incoming flowfile will be processed using the same schema) or per flowfile. In both cases, this is done by writing the absolute path of the schema file in the protobuf.schemaPath property of the flowfile or of the processor. Note that if the property is set in the flowfile, it overrides the one from the processor.
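The override rule described above can be sketched as follows. This is only an illustration in Python, not the processor's actual code: the helper resolve_schema_path and the example paths are hypothetical, and only the protobuf.schemaPath property name comes from this README.

```python
from typing import Mapping, Optional

# Property name taken from this README.
SCHEMA_PATH_KEY = "protobuf.schemaPath"


def resolve_schema_path(
    flowfile_attributes: Mapping[str, str],
    processor_properties: Mapping[str, str],
) -> Optional[str]:
    """Hypothetical helper: pick the schema path to use for one flowfile.

    The flowfile value wins when it is set; otherwise the processor-wide
    value is used. Returns None when neither side defines a path.
    """
    flowfile_value = flowfile_attributes.get(SCHEMA_PATH_KEY)
    if flowfile_value:
        return flowfile_value
    return processor_properties.get(SCHEMA_PATH_KEY) or None


# Example paths are made up for illustration.
processor = {SCHEMA_PATH_KEY: "/schemas/default.desc"}
print(resolve_schema_path({}, processor))
print(resolve_schema_path({SCHEMA_PATH_KEY: "/schemas/other.desc"}, processor))
```

The first call falls back to the processor-wide path, and the second one uses the path carried by the flowfile.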
For performance reasons, I strongly recommend using a compiled .desc file whenever possible. This file can be obtained by compiling the .proto file with Google's protoc compiler.
However, if you cannot compile your .proto file, you can set it directly as the schema file and set the protobuf.compileSchema property of the processor to tell it to compile the schema dynamically.
Important: The processor allows you to import only one schema file, so you need to package all your dependencies into one file. To do so, compile your main .proto file using the --include_imports option of the protoc compiler. If you are using a raw .proto file, you need to bundle all imports inside the file.
Note: if you don't have a compiled .desc file yet, you should take a look at protoc, the Protobuf compiler from Google.
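As a rough sketch of how such a self-contained .desc file could be produced, the snippet below builds and runs a protoc command line from Python. It assumes protoc is installed and on the PATH; the function names and the schema.proto / schema.desc file names are hypothetical. The --include_imports, --descriptor_set_out and --proto_path flags are standard protoc options.

```python
import subprocess
from typing import List


def build_protoc_command(proto_file: str, desc_file: str, proto_dir: str = ".") -> List[str]:
    """Hypothetical helper: build the protoc arguments for a bundled descriptor."""
    return [
        "protoc",
        f"--proto_path={proto_dir}",          # where protoc looks for imports
        "--include_imports",                  # bundle all imported schemas
        f"--descriptor_set_out={desc_file}",  # write the compiled descriptor set
        proto_file,
    ]


def compile_schema(proto_file: str, desc_file: str, proto_dir: str = ".") -> None:
    """Run protoc; raises CalledProcessError if the compilation fails."""
    subprocess.run(build_protoc_command(proto_file, desc_file, proto_dir), check=True)


# Show the command that would be executed for made-up file names.
print(" ".join(build_protoc_command("schema.proto", "schema.desc")))
```

Calling compile_schema("schema.proto", "schema.desc") would then write a descriptor file that can be referenced from protobuf.schemaPath.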
For now, the only structured format the processors can handle is JSON. More formats should become available in the future (XML and flowfile properties are expected).
By design, this processor cannot use precompiled code to handle messages (otherwise you would have already generated it and wouldn't be here). So this processor uses the runtime part of the Protobuf library, which dynamically parses the files,
given a compiled schema (.desc file).
For convenience, the processor also allows you to provide a raw .proto file. To be used, it must be compiled, so this is what the processor does before anything else. To avoid unnecessary recompilation, the resulting file is cached. If you specified the schema in the processor configuration (and not in the flowfile properties), it is reused directly for each operation, which even avoids reading the schema from disk.
So, if you can, specify the schema at the processor level to get the best performance.
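The benefit of caching can be illustrated with a minimal sketch. This is not the processor's implementation, only a generic compile-once cache written in Python: the SchemaCache class, the fake_compile stand-in and the example path are all hypothetical.

```python
from typing import Callable, Dict, List


class SchemaCache:
    """Hypothetical cache: compile each schema path at most once."""

    def __init__(self, compile_fn: Callable[[str], bytes]) -> None:
        self._compile_fn = compile_fn
        self._compiled: Dict[str, bytes] = {}

    def get(self, schema_path: str) -> bytes:
        # Compile only on the first request for a given path.
        if schema_path not in self._compiled:
            self._compiled[schema_path] = self._compile_fn(schema_path)
        return self._compiled[schema_path]


calls: List[str] = []


def fake_compile(path: str) -> bytes:
    """Stand-in for a real (slow) schema compilation step."""
    calls.append(path)
    return f"compiled:{path}".encode()


cache = SchemaCache(fake_compile)
cache.get("/schemas/a.proto")
cache.get("/schemas/a.proto")  # served from the cache, no second compilation
print(len(calls))
```

With a processor-level schema, every flowfile asks for the same path, so the expensive step runs only once.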
This project is Free as in Freedom, so feel free to contribute by posting bug reports or pull requests!
This project is licensed under the MIT license. The terms of this license can be found in the LICENSE file.