org.apache.spark.sql.execution.streaming

FileStreamSinkLog

Related Docs: object FileStreamSinkLog | package streaming

class FileStreamSinkLog extends HDFSMetadataLog[Seq[SinkFileStatus]]

A special log for FileStreamSink. It writes one log file per batch. The first line of each log file is the version number; each subsequent line is a SinkFileStatus serialized as JSON.

Because reading many small files is usually slow, FileStreamSinkLog compacts the log every "spark.sql.streaming.fileSink.log.compactInterval" batches into a single large file. During a compaction it reads all older log files and merges them with the new batch, dropping entries for files that have been deleted (as marked by SinkFileStatus.action). The allFiles method then returns only the visible files, omitting the deleted ones.
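
For illustration, a single (non-compacted) batch log file might look like the following; the "v1" version header and the SinkFileStatus field names are assumptions based on the serialization just described, and the concrete values are hypothetical:

    v1
    {"path":"/output/part-00000","size":1024,"isDir":false,"modificationTime":1470000000000,"blockReplication":1,"blockSize":134217728,"action":"add"}
    {"path":"/output/part-00001","size":2048,"isDir":false,"modificationTime":1470000000000,"blockReplication":1,"blockSize":134217728,"action":"delete"}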

Linear Supertypes
HDFSMetadataLog[Seq[SinkFileStatus]], Logging, MetadataLog[Seq[SinkFileStatus]], AnyRef, Any

Instance Constructors

  1. new FileStreamSinkLog(sparkSession: SparkSession, path: String)

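    A minimal construction sketch, assuming a local SparkSession. The metadata path is illustrative (FileStreamSink conventionally keeps this log under a _spark_metadata subdirectory of the output path), and in practice FileStreamSink manages the directory itself:

      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.execution.streaming.FileStreamSinkLog

      val spark = SparkSession.builder().master("local[*]").appName("sink-log-demo").getOrCreate()
      val sinkLog = new FileStreamSinkLog(spark, "/tmp/stream-output/_spark_metadata")
      val visible = sinkLog.allFiles()  // only files not marked as deleted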

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def add(batchId: Long, logs: Seq[SinkFileStatus]): Boolean

    Store the metadata for the specified batchId and return true if successful. If the batchId's metadata has already been stored, this method will return false.

    Note that this method must be called on an org.apache.spark.util.UninterruptibleThread so that interrupts can be disabled while writing the batch file. This is needed because Hadoop's "Shell.runCommand" before 2.5.0 (HADOOP-10622) can deadlock if the thread running it is interrupted. In our case, writeBatch creates a file using the HDFS API and calls "Shell.runCommand" to set the file permissions, so it can deadlock if the stream execution thread is stopped by an interrupt. Hence, we make sure this method is called on an UninterruptibleThread, which allows us to disable interrupts here. Also see SPARK-14131. A usage sketch follows this entry.

    Definition Classes
    FileStreamSinkLog → HDFSMetadataLog → MetadataLog
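
    A minimal usage sketch, assuming the sinkLog instance from the constructor example and a hypothetical SinkFileStatus value whose field set mirrors the JSON lines shown at the top (treat both as assumptions):

      import org.apache.spark.util.UninterruptibleThread

      val status = SinkFileStatus(
        path = "/output/part-00000",       // hypothetical output file
        size = 1024L,
        isDir = false,
        modificationTime = 1470000000000L,
        blockReplication = 1,
        blockSize = 134217728L,
        action = "add")
      // add must run on an UninterruptibleThread (see the note above).
      val writer = new UninterruptibleThread("sink-log-writer") {
        override def run(): Unit = {
          val committed = sinkLog.add(0L, Seq(status))  // false if batch 0 was already stored
        }
      }
      writer.start()
      writer.join()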
  5. def allFiles(): Array[SinkFileStatus]

    Returns all files except the deleted ones.

  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def batchIdToPath(batchId: Long): Path

    Definition Classes
    FileStreamSinkLog → HDFSMetadataLog
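
    A sketch of the naming scheme this override implies; the ".compact" suffix for compaction batches is an assumption inferred from the compaction behavior described at the top:

      sinkLog.batchIdToPath(2L)  // e.g. <metadataPath>/2 for a regular batch
      sinkLog.batchIdToPath(9L)  // e.g. <metadataPath>/9.compact if batch 9 is a compaction batch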
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def deserialize(bytes: Array[Byte]): Seq[SinkFileStatus]

    Definition Classes
    FileStreamSinkLog → HDFSMetadataLog
  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. val fileManager: FileManager

    Attributes
    protected
    Definition Classes
    HDFSMetadataLog
  13. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. def get(startId: Option[Long], endId: Option[Long]): Array[(Long, Seq[SinkFileStatus])]

    Return metadata for batches between startId (inclusive) and endId (inclusive). If startId is None, return all batches up to endId (inclusive). See the sketch after this entry.

    Definition Classes
    HDFSMetadataLog → MetadataLog
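
    Hedged examples of range retrieval against the illustrative sinkLog instance (batch ids are arbitrary):

      val middle = sinkLog.get(Some(2L), Some(5L))  // batches 2..5 inclusive
      val upTo5  = sinkLog.get(None, Some(5L))      // all batches up to and including 5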
  15. def get(batchId: Long): Option[Seq[SinkFileStatus]]

    Return the metadata for the specified batchId if it's stored. Otherwise, return None.

    Definition Classes
    HDFSMetadataLog → MetadataLog
  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. def getLatest(): Option[(Long, Seq[SinkFileStatus])]

    Return the latest batch id and its metadata, if they exist. A usage sketch follows this entry.

    Definition Classes
    HDFSMetadataLog → MetadataLog
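
    A small sketch using getLatest with the same illustrative sinkLog:

      sinkLog.getLatest() match {
        case Some((batchId, files)) => println(s"latest batch $batchId committed ${files.size} files")
        case None                   => println("no batches committed yet")
      }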
  18. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  19. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  20. def isBatchFile(path: Path): Boolean

    Definition Classes
    FileStreamSinkLog → HDFSMetadataLog
  21. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  22. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  23. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  24. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  25. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  26. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  29. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  30. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  31. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  32. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  33. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  34. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  35. val metadataPath: Path

    Definition Classes
    HDFSMetadataLog
  36. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  37. final def notify(): Unit

    Definition Classes
    AnyRef
  38. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  39. def pathToBatchId(path: Path): Long

    Definition Classes
    FileStreamSinkLog → HDFSMetadataLog
  40. def serialize(logData: Seq[SinkFileStatus]): Array[Byte]

    Definition Classes
    FileStreamSinkLog → HDFSMetadataLog
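
    A round-trip sketch of serialize and deserialize, assuming the byte layout matches the log file format described at the top (version header plus one JSON line per status); status is the hypothetical value from the add example:

      val bytes: Array[Byte] = sinkLog.serialize(Seq(status))
      val restored: Seq[SinkFileStatus] = sinkLog.deserialize(bytes)
      assert(restored == Seq(status))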
  41. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  42. def toString(): String

    Definition Classes
    AnyRef → Any
  43. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  44. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  45. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
