A ZIO-Spark specific function to describe a continuous stream with checkpointing.
Scala Example, using ZIO duration ops:
df.writeStream.continuouslyWithCheckpointEvery(5.seconds)
Sets the output of the streaming query to be processed using the provided writer object.
Sets the output of the streaming query to be processed using the provided writer object. See org.apache.spark.sql.ForeachWriter for more details on the lifecycle and semantics.
2.0.0
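As a sketch, a minimal ForeachWriter that prints each row could be passed to the writer (assuming a streaming DataFrame `df` is in scope):

```scala
import org.apache.spark.sql.{ForeachWriter, Row}

// Minimal writer: accept every partition, print each row, no cleanup needed.
val printWriter = new ForeachWriter[Row] {
  def open(partitionId: Long, epochId: Long): Boolean = true
  def process(row: Row): Unit = println(row)
  def close(errorOrNull: Throwable): Unit = ()
}

df.writeStream.foreach(printWriter)
```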
Change the sink (output data source) of the stream.
2.0.0
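For example, using a standard Spark sink name (assuming a streaming DataFrame `df` is in scope):

```scala
// Write to the console sink for debugging; "parquet", "kafka", etc. are also valid.
df.writeStream.format("console")
```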
A ZIO-Spark specific function to run the streaming job only once.
Adds an option to the DataStreamWriter. Overloaded to accept String, Boolean, Long, and Double values.
Adds multiple options to the DataStreamWriter.
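A sketch of both forms, using illustrative paths (assuming a streaming DataFrame `df` is in scope):

```scala
// Single option:
df.writeStream.option("checkpointLocation", "/tmp/checkpoints")

// Multiple options at once:
df.writeStream.options(
  Map(
    "checkpointLocation" -> "/tmp/checkpoints",
    "path"               -> "/tmp/output"
  )
)
```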
Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.

- OutputMode.Append(): only the new rows in the streaming DataFrame/Dataset will be written to the sink.
- OutputMode.Complete(): all the rows in the streaming DataFrame/Dataset will be written to the sink every time there are some updates.
- OutputMode.Update(): only the rows that were updated in the streaming DataFrame/Dataset will be written to the sink every time there are some updates. If the query doesn't contain aggregations, it will be equivalent to OutputMode.Append() mode.
2.0.0
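For example, with Spark's OutputMode enum (assuming a streaming DataFrame `df` is in scope; the mode can also be passed as a string such as "complete"):

```scala
import org.apache.spark.sql.streaming.OutputMode

df.writeStream.outputMode(OutputMode.Complete())
```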
Partitions the output by the given columns on the file system. If specified, the output is laid out on the file system similar to Hive's partitioning scheme. As an example, when we partition a dataset by year and then month, the directory layout would look like: year=2016/month=01/, year=2016/month=02/
Partitioning is one of the most widely used techniques to optimize physical data layout. It provides a coarse-grained index for skipping unnecessary data reads when queries have predicates on the partitioned columns. In order for partitioning to work well, the number of distinct values in each column should typically be less than tens of thousands.
2.0.0
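A sketch, assuming the streaming DataFrame `df` has year and month columns:

```scala
// Produces directories such as year=2016/month=01/ under the output path.
df.writeStream.partitionBy("year", "month")
```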
Specifies the name of the StreamingQuery that can be started with start(). This name must be unique among all the currently active queries in the associated SQLContext.
2.0.0
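For example, with an illustrative query name (assuming a streaming DataFrame `df` is in scope):

```scala
df.writeStream.queryName("activity_counts")
```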
Generate the stream as a stoppable blocking task handled by ZIO.
Starts the execution of the streaming query, which will continually output results to the given path as new data arrives. The returned StreamingQuery object can be used to interact with the stream.

Throws a TimeoutException if the following conditions are met:
- spark.sql.streaming.stopActiveRunOnRestart is enabled
- the active run cannot be stopped within the timeout controlled by spark.sql.streaming.stopTimeout
2.0.0
Starts the execution of the streaming query, which will continually output results to the given path as new data arrives. The returned StreamingQuery object can be used to interact with the stream.
2.0.0
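A sketch of the plain Spark form, using illustrative paths (assuming a streaming DataFrame `df` is in scope; in ZIO-Spark the equivalent call is wrapped in a ZIO effect):

```scala
val query = df.writeStream
  .format("parquet")
  .option("checkpointLocation", "/tmp/checkpoints")
  .start("/tmp/output")

// Block until the query terminates (or fails).
query.awaitTermination()
```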
Generate a stream with only the available current input. Generally used for testing purposes.
Set the trigger for the stream query. The default value is ProcessingTime(0) and it will run the query as fast as possible.
Scala Example:
df.writeStream.trigger(ProcessingTime("10 seconds"))

import scala.concurrent.duration._
df.writeStream.trigger(ProcessingTime(10.seconds))
2.0.0
A ZIO-Spark specific function to describe a micro-batch stream.
Scala Example, using ZIO duration ops:
df.writeStream.triggerEvery(5.seconds)