Spark Structured Streaming Kafka Source for Kafka 0.8.
This library is designed as a Spark Structured Streaming Kafka source. Its aim is to provide equivalent functionality for users who still run Kafka 0.8/0.9.
The main differences compared to the Kafka 0.10 source are:
- This source uses the SimpleConsumer API rather than the new Consumer API.

Like other sources in the Spark ecosystem, the simplest way to use it is to add it as a dependency when submitting your application:
spark-submit \
  --master local[*] \
  --packages com.hortonworks.spark:spark-kafka-0-8-sql_2.11:1.0 \
  yourApp
...
Spark will automatically search Maven Central and the local Maven repository and add the dependencies to the Spark runtime. Alternatively, you could run mvn install to publish this library to your local Maven repository and then use --packages, which also searches the local Maven repository.
Using KafkaSource is the same as using any other Structured Streaming source already supported in Spark:
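// Assumes `spark` is an existing SparkSession (as in spark-shell).
import org.apache.spark.sql.streaming.ProcessingTime
import spark.implicits._

val topic = "my-topic" // placeholder: the topic to subscribe to

val reader = spark
  .readStream
  .format("kafka")
  // Placeholder broker address; replace with your Kafka broker(s).
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("startingoffset", "smallest")
  .option("topics", topic)

val query = reader.load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .trigger(ProcessingTime(2000L)) // start a micro-batch every 2 seconds
  .start()

query.awaitTermination()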
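The reader above supplies broker addresses through kafka.bootstrap.servers, and the end of this README names kafka.metadata.broker as the alternative option. As a minimal sketch, an application could check its option map before calling load() so that a missing broker setting fails fast. The BrokerOptions object, its resolve method, and the choice to prefer kafka.bootstrap.servers when both are present are hypothetical and not part of this library; the option names are copied from this README.

```scala
// Hypothetical pre-flight check (not this library's code): make sure at least
// one broker option is set before building the streaming source.
object BrokerOptions {
  // Option names as written in this README; the lookup order is an assumption.
  val BrokerKeys: Seq[String] = Seq("kafka.bootstrap.servers", "kafka.metadata.broker")

  // Returns the first broker option that is present and non-blank,
  // or an error message naming the options that were expected.
  def resolve(options: Map[String, String]): Either[String, (String, String)] =
    BrokerKeys.collectFirst {
      case k if options.get(k).exists(_.trim.nonEmpty) => k -> options(k)
    }.toRight(s"Either ${BrokerKeys.mkString(" or ")} must be specified")
}
```

Returning an Either keeps the check free of exceptions, so the caller decides whether to log, retry, or abort.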
This Structured Streaming Kafka 0.8 source is built with Maven; you can build it with:
mvn clean package
Because of incompatible changes in the Structured Streaming component, this Kafka 0.8 source only works with Spark 2.0.2 and later, including the master branch.
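If an application may be deployed on several clusters, a guard like the following sketch can reject older Spark releases up front. It assumes 2.0.2 is the minimum supported version and that version strings look like "2.0.2" or "2.1.0-SNAPSHOT"; SparkVersionCheck is a hypothetical helper, not part of this library.

```scala
// Hypothetical helper (not part of this library): checks that a Spark
// version string is at least 2.0.2, the assumed minimum.
object SparkVersionCheck {
  private val minimum = Seq(2, 0, 2)

  // Keeps the leading "major.minor.patch" part (dropping suffixes such as
  // "-SNAPSHOT") and pads missing components with 0.
  def parse(version: String): Seq[Int] = {
    val core = version.takeWhile(c => c.isDigit || c == '.')
    core.split('.').filter(_.nonEmpty).map(_.toInt).toSeq.padTo(3, 0).take(3)
  }

  // Compares component by component; the first differing component decides.
  def isSupported(version: String): Boolean =
    parse(version).zip(minimum).find { case (a, b) => a != b } match {
      case Some((a, b)) => a > b
      case None         => true // exactly the minimum version
    }
}
```

In an application it could be called as SparkVersionCheck.isSupported(spark.version), since SparkSession exposes the running version through its version method.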
The schema of the Kafka 0.8 source is fixed and cannot be changed, unlike most other sources in Spark:
StructType(Seq(
StructField("key", BinaryType),
StructField("value", BinaryType),
StructField("topic", StringType),
StructField("partition", IntegerType),
StructField("offset", LongType)))
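Because key and value arrive as BinaryType, queries usually cast them, as the CAST(key AS STRING) expressions in the usage example do. The sketch below models one row of this schema as a plain Scala case class and decodes a binary column the way Spark casts binary to string, reading the bytes as UTF-8. KafkaRow and KafkaRowDecoding are illustrative names, not types exported by this library, and treating a missing key as null is an assumption.

```scala
import java.nio.charset.StandardCharsets

// Illustrative plain-Scala model of one row of the fixed schema above;
// the source itself produces Spark Rows, not this case class.
final case class KafkaRow(
    key: Array[Byte],   // BinaryType; assumed null when a message has no key
    value: Array[Byte], // BinaryType
    topic: String,      // StringType
    partition: Int,     // IntegerType
    offset: Long        // LongType
)

object KafkaRowDecoding {
  // Mirrors CAST(x AS STRING) on a binary column: the bytes are decoded as
  // UTF-8, and a null input stays empty (represented here as None).
  def asString(bytes: Array[Byte]): Option[String] =
    Option(bytes).map(b => new String(b, StandardCharsets.UTF_8))
}
```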
Either kafka.bootstrap.servers or kafka.metadata.broker must be specified when creating the source.

Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0