class FileStreamSource extends SupportsAdmissionControl with SupportsTriggerAvailableNow with Source with Logging

A very simple source that reads files from the given directory as they appear.

Linear Supertypes
Logging, Source, SupportsTriggerAvailableNow, SupportsAdmissionControl, SparkDataStream, AnyRef, Any

Instance Constructors

  1. new FileStreamSource(sparkSession: SparkSession, path: String, fileFormatClassName: String, schema: StructType, partitionColumns: Seq[String], metadataPath: String, options: Map[String, String])

Value Members

  1. def commit(end: Offset): Unit

    Informs the source that Spark has completed processing all data for offsets less than or equal to end and will only request offsets greater than end in the future.

    Definition Classes
    FileStreamSource → Source
  2. def commit(end: connector.read.streaming.Offset): Unit
    Definition Classes
    Source → SparkDataStream
  3. def currentLogOffset: Long

    Returns the latest offset in the FileStreamSourceLog.

  4. def deserializeOffset(json: String): connector.read.streaming.Offset
    Definition Classes
    Source → SparkDataStream
  5. def getBatch(start: Option[Offset], end: Offset): DataFrame

    Returns the data that is between the offsets (start, end].

    Definition Classes
    FileStreamSource → Source
  6. def getDefaultReadLimit(): ReadLimit
    Definition Classes
    FileStreamSource → SupportsAdmissionControl
  7. def getOffset: Option[Offset]

    Returns the maximum available offset for this source. Returns None if this source has never received any data.

    Definition Classes
    FileStreamSource → Source
  8. def initialOffset(): connector.read.streaming.Offset
    Definition Classes
    Source → SparkDataStream
  9. def latestOffset(startOffset: connector.read.streaming.Offset, limit: ReadLimit): connector.read.streaming.Offset
    Definition Classes
    FileStreamSource → SupportsAdmissionControl
  10. def prepareForTriggerAvailableNow(): Unit
    Definition Classes
    FileStreamSource → SupportsTriggerAvailableNow
  11. def reportLatestOffset(): connector.read.streaming.Offset
    Definition Classes
    SupportsAdmissionControl
  12. val schema: StructType

    Returns the schema of the data from this source

    Definition Classes
    FileStreamSource → Source
  13. val seenFiles: SeenFilesMap

    A mapping from each file that has been processed to the timestamp at which it was last modified.

  14. def stop(): Unit
    Definition Classes
    FileStreamSource → SparkDataStream
  15. def toString(): String
    Definition Classes
    FileStreamSource → AnyRef → Any
  16. def withBatchingLocked[T](func: => T): T

    For test only. Runs func while holding the internal lock, so that while func is running the current offset does not change and no new batch is emitted.
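
The offset contract described for getOffset, getBatch and commit, and the by-name locking pattern of withBatchingLocked, can be illustrated with a small toy model in plain Scala. This is only a sketch under stated assumptions: ToySource, add and lastCommitted are hypothetical names invented for illustration, offsets are modelled as plain Long values, and none of this reflects how FileStreamSource is actually implemented.

```scala
// Toy model only: ToySource and all of its members are hypothetical,
// not Spark APIs. Offsets are plain Longs starting at 0.
final class ToySource {
  private val lock = new Object
  // entries(i) is the record that arrived at offset i.
  private var entries = Vector.empty[String]
  private var committed: Option[Long] = None

  // Simulates new data arriving at the source.
  def add(record: String): Unit = lock.synchronized {
    entries = entries :+ record
  }

  // None until the first record arrives, then the largest offset seen.
  def getOffset: Option[Long] = lock.synchronized {
    if (entries.isEmpty) None else Some(entries.size - 1L)
  }

  // Records in the half-open range (start, end]:
  // start is excluded, end is included. A start of None means "from the beginning".
  def getBatch(start: Option[Long], end: Long): Seq[String] = lock.synchronized {
    val from = start.map(_ + 1).getOrElse(0L).toInt
    entries.slice(from, end.toInt + 1)
  }

  // Records that everything at or before `end` has been processed.
  def commit(end: Long): Unit = lock.synchronized {
    committed = Some(end)
  }

  def lastCommitted: Option[Long] = lock.synchronized(committed)

  // Runs func while holding the same lock as the methods above, so no
  // other thread can add data or change offsets until func returns.
  // JVM monitors are reentrant, so func may call the methods above.
  def withBatchingLocked[T](func: => T): T = lock.synchronized {
    func
  }
}

object ToySourceDemo {
  def main(args: Array[String]): Unit = {
    val source = new ToySource
    println(source.getOffset)               // no data yet
    Seq("a", "b", "c").foreach(source.add)
    println(source.getBatch(None, 1L))      // offsets 0 and 1
    println(source.getBatch(Some(1L), 2L))  // offset 2 only
    source.commit(2L)
    // Both reads happen under one lock acquisition, so they are consistent.
    println(source.withBatchingLocked((source.getOffset, source.lastCommitted)))
  }
}
```

Passing func by name (func: => T) defers its evaluation until the lock is held, which is what lets a test observe a stable offset for the duration of the call.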