it.agilelab.bigdata.wasp.consumers.spark.plugins.kafka
Creates a streaming DataFrame from a Kafka streaming source.
The returned DataFrame contains a column named "kafkaMetadata" with message metadata, plus the message contents either as a single column named "value" or as multiple columns named after the value's fields, depending on the topic data type.
The "kafkaMetadata" column contains the following fields:
- key: bytes
- headers: array of {headerKey: string, headerValue: bytes}
- topic: string
- partition: int
- offset: long
- timestamp: timestamp
- timestampType: int
The message contents column(s) depend on the topic data type:
- the "avro" and "json" topic data types output the columns specified by their respective schemas
- the "plaintext" and "bytes" topic data types output a single "value" column containing the contents as string or bytes, respectively
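To make the shape of the "kafkaMetadata" struct concrete, here is a minimal sketch that models it with plain Scala case classes. The class and field names (`KafkaMetadata`, `KafkaHeader`) are hypothetical illustrations mirroring the field list above; they are not types exposed by the plugin itself, which surfaces this data as a Spark struct column.

```scala
// Hypothetical case classes mirroring the "kafkaMetadata" struct described
// above. Field names and types follow the documentation; these are NOT
// actual WASP API types (the plugin exposes a Spark struct column instead).
final case class KafkaHeader(headerKey: String, headerValue: Array[Byte])

final case class KafkaMetadata(
  key: Array[Byte],              // message key as raw bytes
  headers: Seq[KafkaHeader],     // Kafka record headers
  topic: String,                 // source topic name
  partition: Int,                // partition the message was read from
  offset: Long,                  // offset within the partition
  timestamp: java.sql.Timestamp, // message timestamp
  timestampType: Int             // 0 = NoTimestampType, 1 = CreateTime, 2 = LogAppendTime
)

object KafkaMetadataExample extends App {
  val meta = KafkaMetadata(
    key = "user-42".getBytes("UTF-8"),
    headers = Seq(KafkaHeader("trace-id", "abc".getBytes("UTF-8"))),
    topic = "events",
    partition = 3,
    offset = 1027L,
    timestamp = java.sql.Timestamp.valueOf("2024-01-01 00:00:00"),
    timestampType = 1
  )
  // Prints a compact "topic/partition@offset" identifier for the message.
  println(s"${meta.topic}/${meta.partition}@${meta.offset}")
}
```

In a streaming query, the same fields would be reached by selecting nested columns of the struct (e.g. `kafkaMetadata.topic`, `kafkaMetadata.offset`) rather than by instantiating these classes.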