Package

org.apache.spark.sql.execution

streaming

package streaming

Type Members

  1. case class CompositeOffset(offsets: Seq[Option[Offset]]) extends Offset with Product with Serializable

    An ordered collection of offsets, used to track the progress of processing data from one or more Sources that are present in a streaming query. This is similar to a simplified, single-instance vector clock that must progress linearly forward.
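
    The vector-clock analogy can be illustrated with a small standalone model. This is a sketch only: PositionOffset, CompositeSketch and isAtOrAfter are hypothetical names, and the comparison rules are assumptions rather than the logic Spark uses.

```scala
// Hypothetical, simplified model; this is not the Spark implementation.
final case class PositionOffset(value: Long)

final case class CompositeSketch(offsets: Seq[Option[PositionOffset]]) {
  /** Returns true if `this` is at or ahead of `other` for every source. */
  def isAtOrAfter(other: CompositeSketch): Boolean = {
    require(offsets.size == other.offsets.size, "offset vectors must have the same length")
    offsets.zip(other.offsets).forall {
      case (Some(a), Some(b)) => a.value >= b.value
      case (Some(_), None)    => true  // a source that has started is ahead of one that has not
      case (None, None)       => true  // neither side has seen data from this source
      case (None, Some(_))    => false // progress may never move backwards
    }
  }
}
```

    Each position in the sequence belongs to one source, so two composite offsets are only comparable when they track the same number of sources.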

  2. class ContinuousQueryListenerBus extends SparkListener with ListenerBus[ContinuousQueryListener, Event]

    A bus to forward events to ContinuousQueryListeners. It wraps each received ContinuousQueryListener.Event as a WrappedContinuousQueryListenerEvent and sends it to the Spark listener bus. It also registers itself with the Spark listener bus so that it can receive WrappedContinuousQueryListenerEvents, unwrap them into ContinuousQueryListener.Events, and dispatch them to the registered ContinuousQueryListeners.

  3. class FileStreamSink extends Sink with Logging

    A sink that writes out results to Parquet files. Each batch is written out to a unique directory. After all of the files in a batch have been successfully written, the list of file paths is appended to the log atomically. In the case of partial failures, some duplicate data may be present in the target directory, but only one copy of each file will be present in the log.

  4. class FileStreamSinkLog extends HDFSMetadataLog[Seq[SinkFileStatus]]

    A special log for FileStreamSink. It writes one log file for each batch. The first line of the log file is the version number, followed by multiple JSON lines. Each line is the JSON representation of a SinkFileStatus.

    Because reading from many small files is usually slow, FileStreamSinkLog compacts the log files into one large file every "spark.sql.sink.file.log.compactLen" batches. A compaction reads all old log files and merges them with the new batch. It also drops the entries for files that have been deleted (marked by SinkFileStatus.action). When a reader calls allFiles to list all files, the method returns only the visible files and omits the deleted ones.
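
    The compaction rule can be sketched as follows. FileEntry is a hypothetical stand-in that keeps only two of the SinkFileStatus fields, and both the merge logic and the choice of which batches trigger a compaction are assumptions made for illustration.

```scala
// Hypothetical stand-in for SinkFileStatus, keeping only the fields needed here.
final case class FileEntry(path: String, action: String)

object CompactionSketch {
  val Add = "add"
  val Delete = "delete"

  /** Merges entries (oldest first) and keeps only files that were never marked as deleted. */
  def compact(entries: Seq[FileEntry]): Seq[FileEntry] = {
    val deleted = entries.filter(_.action == Delete).map(_.path).toSet
    entries.filter(e => e.action == Add && !deleted.contains(e.path))
  }

  /** Assumption: a batch triggers compaction when (batchId + 1) is a multiple of the interval. */
  def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
    (batchId + 1) % compactInterval == 0
}
```

    Under these assumptions, with an interval of 10 the batches with ids 9, 19, 29 and so on would each produce a compacted file covering everything written so far.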

  5. class FileStreamSinkWriter extends Serializable with Logging

    Writes data given to a FileStreamSink to the given basePath in the given fileFormat, partitioned by the given partitionColumnNames. This writer always appends data to the directory if it already has data.

  6. class FileStreamSource extends Source with Logging

    A very simple source that reads text files from the given directory as they appear.

    TODO Clean up the metadata files periodically

  7. class HDFSMetadataLog[T] extends MetadataLog[T] with Logging

    A MetadataLog implementation based on HDFS. HDFSMetadataLog uses the specified path as the metadata storage.

    When writing a new batch, HDFSMetadataLog first writes to a temporary file and then renames it to the final batch file. If the rename step fails, multiple writers must be competing for the same batch: exactly one of them succeeds and the others fail.

    Note: HDFSMetadataLog doesn't support S3-like file systems, because they don't guarantee that listing a directory always shows the latest files.
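
    The write-then-rename pattern can be sketched on a local file system with java.nio. This is an approximation under stated assumptions: RenameCommitSketch, commit and the temporary file naming are hypothetical, and the local move is only a stand-in for an HDFS rename, so it does not give the same guarantees across processes.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{FileAlreadyExistsException, Files, Path}
import java.util.UUID

object RenameCommitSketch {
  /**
   * Writes `content` for `batchId` under `dir`. Returns true if this writer
   * committed the batch, false if the batch file already existed.
   */
  def commit(dir: Path, batchId: Long, content: String): Boolean = {
    Files.createDirectories(dir)
    // Write to a uniquely named temporary file first (naming scheme is an assumption).
    val tmp = dir.resolve(s".$batchId.${UUID.randomUUID()}.tmp")
    Files.write(tmp, content.getBytes(StandardCharsets.UTF_8))
    try {
      // Without REPLACE_EXISTING, move fails if the target already exists.
      Files.move(tmp, dir.resolve(batchId.toString))
      true
    } catch {
      case _: FileAlreadyExistsException =>
        Files.deleteIfExists(tmp) // the losing writer cleans up its temporary file
        false
    }
  }
}
```

    A second call for the same batch id returns false and leaves the first writer's content untouched, which mirrors the "only one writer succeeds" behaviour described above.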

  8. class IncrementalExecution extends QueryExecution

    A variant of QueryExecution that allows the given LogicalPlan to be executed incrementally, possibly preserving state between executions.

  9. case class LongOffset(offset: Long) extends Offset with Product with Serializable

    A simple offset for sources that produce a single linear stream of data.

  10. case class MemoryPlan(sink: MemorySink, output: Seq[Attribute]) extends LeafNode with Product with Serializable

    Used to query the data that has been written into a MemorySink.

  11. class MemorySink extends Sink with Logging

    A sink that stores the results in memory. This Sink is primarily intended for use in unit tests and does not provide durability.

  12. case class MemoryStream[A](id: Int, sqlContext: SQLContext)(implicit evidence$2: Encoder[A]) extends Source with Logging with Product with Serializable

    A Source that produces values stored in memory as they are added by the user. This Source is primarily intended for use in unit tests, as it can only replay data while the object is still available.

  13. trait MetadataLog[T] extends AnyRef

    A general MetadataLog that supports the following features:

    • Allow the user to store a metadata object for each batch.
    • Allow the user to query the latest batch id.
    • Allow the user to query the metadata object of a specified batch id.
    • Allow the user to query metadata objects in a range of batch ids.
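
    The four features listed above can be modelled with a small in-memory class. This is a sketch under assumptions: InMemoryMetadataLog and its method names are hypothetical and are not taken from the MetadataLog trait itself.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical in-memory model; method names are illustrative assumptions.
final class InMemoryMetadataLog[T] {
  private var entries = TreeMap.empty[Long, T]

  /** Stores metadata for a batch; returns false if the batch was already stored. */
  def add(batchId: Long, metadata: T): Boolean = synchronized {
    if (entries.contains(batchId)) false
    else { entries = entries.updated(batchId, metadata); true }
  }

  /** Returns the metadata of the given batch, if any. */
  def get(batchId: Long): Option[T] = synchronized { entries.get(batchId) }

  /** Returns the highest batch id together with its metadata, if any batch exists. */
  def getLatest: Option[(Long, T)] = synchronized { entries.lastOption }

  /** Returns entries with start <= batchId <= end, in ascending batch order. */
  def getRange(start: Long, end: Long): Seq[(Long, T)] = synchronized {
    entries.iterator.filter { case (id, _) => id >= start && id <= end }.toList
  }
}
```

    Keeping the entries in a sorted map makes both the "latest batch" and the range query straightforward, at the cost of holding everything in memory.
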
  14. class MetadataLogFileCatalog extends PartitioningAwareFileCatalog

    A FileCatalog that generates the list of files to process by reading them from the metadata log files generated by the FileStreamSink.

  15. trait Offset extends Serializable

    An offset is a monotonically increasing metric used to track progress in the computation of a stream. An Offset must be comparable, and the result of compareTo must be consistent with equals and hashCode.
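
    One way to satisfy the consistency requirement is to base the ordering on exactly the fields that equality uses. The sketch below assumes a hypothetical CounterOffset type that does not extend Spark's Offset trait.

```scala
// Hypothetical offset type; it does not extend Spark's Offset trait.
final case class CounterOffset(value: Long) extends Ordered[CounterOffset] {
  // The ordering uses the same single field as the generated equals/hashCode,
  // so compare(...) == 0 holds exactly when the two offsets are equal.
  override def compare(that: CounterOffset): Int =
    java.lang.Long.compare(value, that.value)
}
```

    Because Ordered extends java.lang.Comparable, compareTo is derived from compare, and the case class supplies equals and hashCode over the same field.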

  16. case class OperatorStateId(checkpointLocation: String, operatorId: Long, batchId: Long) extends Product with Serializable

    Used to identify the state store for a given operator.

  17. case class ProcessingTimeExecutor(processingTime: ProcessingTime, clock: Clock = new SystemClock()) extends TriggerExecutor with Logging with Product with Serializable

    A trigger executor that runs a batch every intervalMs milliseconds.
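
    A fixed-interval trigger needs to decide when the next batch is due. The sketch below shows one plausible way to do that by aligning batches to multiples of the interval; TriggerSketch and nextBatchTime are hypothetical names, and the alignment rule is an assumption rather than documented behaviour.

```scala
object TriggerSketch {
  /** Returns the first multiple of intervalMs that is strictly after nowMs. */
  def nextBatchTime(nowMs: Long, intervalMs: Long): Long = {
    require(intervalMs > 0, "intervalMs must be positive")
    require(nowMs >= 0, "nowMs must be non-negative")
    (nowMs / intervalMs + 1) * intervalMs
  }
}
```

    With a 100 ms interval, a batch that finishes at 250 ms would be followed by one scheduled for 300 ms, so a slow batch delays the next run without shifting the schedule.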

  18. trait Sink extends AnyRef

    An interface for systems that can collect the results of a streaming query. In order to preserve exactly-once semantics, a sink must be idempotent in the face of multiple attempts to add the same batch.
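
    Idempotence can be achieved by remembering the latest committed batch and ignoring replays. The following is a minimal in-memory sketch; IdempotentSinkSketch and its members are hypothetical and do not implement the Sink trait.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical in-memory sink; it does not implement Spark's Sink trait.
final class IdempotentSinkSketch[T] {
  private val rows = ArrayBuffer.empty[T]
  private var latestBatchId: Long = -1L

  /** Adds a batch once; returns false if this batch (or a later one) was already committed. */
  def addBatch(batchId: Long, data: Seq[T]): Boolean = synchronized {
    if (batchId <= latestBatchId) false
    else { rows ++= data; latestBatchId = batchId; true }
  }

  /** Returns a snapshot of everything committed so far. */
  def allData: Seq[T] = synchronized { rows.toList }
}
```

    If a failure causes batch 0 to be submitted twice, the second attempt is a no-op, so downstream readers never see duplicated rows.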

  19. case class SinkFileStatus(path: String, size: Long, isDir: Boolean, modificationTime: Long, blockReplication: Int, blockSize: Long, action: String) extends Product with Serializable

    The status of a file written by FileStreamSink. A file is visible only if it appears in the sink log and its action is not "delete".

    • path: the file path.
    • size: the file size.
    • isDir: whether this file is a directory.
    • modificationTime: the file last modification time.
    • blockReplication: the block replication.
    • blockSize: the block size.
    • action: the file action. Must be either "add" or "delete".

  20. trait Source extends AnyRef

    A source of continually arriving data for a streaming query. A Source must have a monotonically increasing notion of progress that can be represented as an Offset. Spark will regularly query each Source to see if any more data is available.
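
    The polling contract can be illustrated with a list-backed model in which the offset is simply the index of the last record. ListSourceSketch and its methods are hypothetical, and plain Long values stand in for Offset.

```scala
// Hypothetical list-backed source; Long indexes stand in for Spark's Offset.
final class ListSourceSketch[T] {
  private var data = Vector.empty[T]

  /** Appends new records, as a user or an upstream system would. */
  def addData(items: T*): Unit = synchronized { data = data ++ items }

  /** Highest available offset (index of the last record), or None if no data has arrived. */
  def getOffset: Option[Long] = synchronized {
    if (data.isEmpty) None else Some(data.size - 1L)
  }

  /** Records after `start` (exclusive) up to and including `end`. */
  def getBatch(start: Option[Long], end: Long): Seq[T] = synchronized {
    val from = start.map(_ + 1).getOrElse(0L).toInt
    data.slice(from, end.toInt + 1)
  }
}
```

    A caller polls getOffset, compares it with the last offset it processed, and requests only the records in between, which is why the offset must never decrease.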

  21. case class StateStoreRestoreExec(keyExpressions: Seq[Attribute], stateId: Option[OperatorStateId], child: SparkPlan) extends SparkPlan with UnaryExecNode with StatefulOperator with Product with Serializable

    For each input tuple, the key is calculated and the value from the StateStore is added to the stream (in addition to the input tuple) if present.

  22. case class StateStoreSaveExec(keyExpressions: Seq[Attribute], stateId: Option[OperatorStateId], child: SparkPlan) extends SparkPlan with UnaryExecNode with StatefulOperator with Product with Serializable

    For each input tuple, the key is calculated and the tuple is put into the StateStore.
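
    The restore and save steps of the two state store operators can be modelled with an in-memory map. This is a sketch only: KeyedStateSketch is a hypothetical class, the order in which restored values are emitted is an assumption, and the real operators work on a versioned StateStore rather than a plain map.

```scala
import scala.collection.mutable

// Hypothetical keyed state backed by a plain map instead of a StateStore.
final class KeyedStateSketch[K, V] {
  private val store = mutable.Map.empty[K, V]

  /** Restore: emits each input row, followed by the stored row for its key if one exists. */
  def restore(rows: Seq[V], key: V => K): Seq[V] =
    rows.flatMap(row => row +: store.get(key(row)).toSeq)

  /** Save: puts each row into the store under its key and passes the rows through. */
  def save(rows: Seq[V], key: V => K): Seq[V] = {
    rows.foreach(row => store.put(key(row), row))
    rows
  }
}
```

    Running restore before an aggregation and save after it lets each execution combine new input with the result saved by the previous one.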

  23. trait StatefulOperator extends SparkPlan

    An operator that saves or restores state from the StateStore. The OperatorStateId should be filled in by prepareForExecution in IncrementalExecution.

  24. class StreamExecution extends ContinuousQuery with Logging

    Manages the execution of a streaming Spark SQL query that is occurring in a separate thread. Unlike a standard query, a streaming query executes repeatedly each time new data arrives at any Source present in the query plan. Whenever new data arrives, a QueryExecution is created and the results are committed transactionally to the given Sink.
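
    One iteration of such a loop can be sketched as a pure function over offsets. Everything here is a hypothetical model: MicroBatchSketch and runOnce are invented names, sources are identified by strings, offsets are plain Long values, and the commit callback stands in for running a query and writing to the sink.

```scala
object MicroBatchSketch {
  /**
   * Runs one trigger. `committed` and `available` map each source name to an offset.
   * If any source has new data, `commit` is called with the batch id and, per source,
   * the (exclusive start, inclusive end) range to process.
   * Returns the updated committed offsets and the next batch id.
   */
  def runOnce(
      committed: Map[String, Long],
      available: Map[String, Long],
      batchId: Long,
      commit: (Long, Map[String, (Option[Long], Long)]) => Unit): (Map[String, Long], Long) = {
    // A source has new data if it was never committed or its available offset moved forward.
    val newData = available.collect {
      case (source, end) if committed.get(source).forall(_ < end) =>
        (source, (committed.get(source), end))
    }
    if (newData.isEmpty) (committed, batchId) // nothing arrived: no batch, same batch id
    else {
      commit(batchId, newData)
      (committed ++ newData.map { case (source, (_, end)) => (source, end) }, batchId + 1)
    }
  }
}
```

    Calling runOnce repeatedly with fresh `available` offsets gives the behaviour described above: a batch runs only when at least one source has advanced, and each batch gets the next sequential id.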

  25. class StreamProgress extends Map[Source, Offset]

    A helper class that looks like a Map[Source, Offset].

  26. case class StreamingExecutionRelation(source: Source, output: Seq[Attribute]) extends LeafNode with Product with Serializable

    Used to link a streaming Source of data into an org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.

  27. case class StreamingRelation(dataSource: DataSource, sourceName: String, output: Seq[Attribute]) extends LeafNode with Product with Serializable

    Used to link a streaming DataSource into an org.apache.spark.sql.catalyst.plans.logical.LogicalPlan. This is only used for creating a streaming org.apache.spark.sql.DataFrame from org.apache.spark.sql.DataFrameReader. It should be used to create a Source and be converted to a StreamingExecutionRelation when it is passed to StreamExecution to run a query.

  28. trait TriggerExecutor extends AnyRef

Value Members

  1. object CompositeOffset extends Serializable

  2. object FileStreamSink

  3. object FileStreamSinkLog

  4. object HDFSMetadataLog

  5. object MemoryStream extends Serializable

  6. object SinkFileStatus extends Serializable

  7. object StreamingExecutionRelation extends Serializable

  8. object StreamingRelation extends Serializable

  9. package state
