org.apache.spark.sql.execution.streaming
Store the metadata for the specified batchId and return true if successful. If the batchId's
metadata has already been stored, this method will return false.
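The true/false contract described above can be illustrated with a minimal sketch. `InMemoryLog` is a hypothetical in-memory stand-in introduced only for illustration; it is not Spark's implementation and writes no files.

```scala
import scala.collection.mutable

object AddSketch {
  // Hypothetical in-memory stand-in for a metadata log (not Spark's class).
  final class InMemoryLog[T] {
    private val batches = mutable.Map.empty[Long, T]

    // Returns true if the metadata was stored, false if batchId was already present.
    def add(batchId: Long, metadata: T): Boolean = synchronized {
      if (batches.contains(batchId)) {
        false
      } else {
        batches.put(batchId, metadata)
        true
      }
    }

    // Returns the stored metadata for batchId, if any.
    def get(batchId: Long): Option[T] = synchronized { batches.get(batchId) }
  }
}
```

In this sketch a second `add` for the same batchId leaves the first metadata untouched, which matches the "already been stored" case described above.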
Note that this method must be called on an org.apache.spark.util.UninterruptibleThread
so that interrupts can be disabled while writing the batch file. This is because there is a
potential deadlock in Hadoop "Shell.runCommand" before 2.5.0 (HADOOP-10622): if the thread
running "Shell.runCommand" is interrupted, it can deadlock. In our case, writeBatch creates
a file using the HDFS API and calls "Shell.runCommand" to set the file permissions, so it can
deadlock if the stream execution thread is stopped by an interrupt. Hence, we make sure that
this method is called on an UninterruptibleThread, which allows us to disable interrupts here.
Also see SPARK-14131.
Returns all files except the deleted ones.
Return metadata for batches between startId (inclusive) and endId (inclusive). If startId
is None, just return all batches before endId (inclusive).
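A minimal sketch of the inclusive range semantics, assuming the stored batches are available as an in-memory map. The helper name `range` and its signature are hypothetical and chosen only for illustration.

```scala
object RangeSketch {
  // Hypothetical helper: `batches` maps batchId -> metadata.
  // Both bounds are inclusive; a startId of None means "no lower bound".
  def range[T](batches: Map[Long, T], startId: Option[Long], endId: Long): Seq[(Long, T)] =
    batches.toSeq
      .filter { case (id, _) => id <= endId && startId.forall(_ <= id) }
      .sortBy(_._1) // return batches in ascending batchId order
}
```

Modelling the optional lower bound with `Option.forall` keeps the None case from needing a separate branch.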
Return the metadata for the specified batchId if it's stored. Otherwise, return None.
Return the latest batch ID and its metadata if they exist.
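The two lookups can be sketched over an in-memory map as follows. The object and function names are hypothetical, and the sketch assumes "latest" means the largest stored batch ID.

```scala
object LatestSketch {
  // Hypothetical lookup: None when batchId has no stored metadata.
  def get[T](batches: Map[Long, T], batchId: Long): Option[T] =
    batches.get(batchId)

  // Hypothetical lookup: the entry with the largest batchId, or None for an empty log.
  def getLatest[T](batches: Map[Long, T]): Option[(Long, T)] =
    if (batches.isEmpty) None else Some(batches.maxBy(_._1))
}
```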
A special log for FileStreamSink. It writes one log file for each batch. The first line of the log file is the version number, followed by one or more JSON lines. Each JSON line is the JSON representation of a SinkFileStatus.
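The layout described above (a version line followed by one JSON line per entry) might look like the following sketch. `FileEntry` is a hypothetical, reduced stand-in for SinkFileStatus, its fields are assumptions, and the version string `"v1"` is likewise assumed rather than taken from the text. The JSON is assembled by hand to keep the example free of library dependencies.

```scala
object FormatSketch {
  // Hypothetical, reduced stand-in for SinkFileStatus; the real class defines its own fields.
  final case class FileEntry(path: String, size: Long, action: String)

  // Assumed version marker; the text above does not give the actual string.
  val Version = "v1"

  // Minimal JSON string quoting: escapes backslashes and double quotes only.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // One entry becomes one JSON line.
  def toJsonLine(e: FileEntry): String =
    s"""{"path":${quote(e.path)},"size":${e.size},"action":${quote(e.action)}}"""

  // First line is the version, followed by one JSON line per entry.
  def serialize(entries: Seq[FileEntry]): String =
    (Version +: entries.map(toJsonLine)).mkString("\n")
}
```

A line-oriented layout like this lets a reader check the version before parsing any entry.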
Because reading from many small files is usually slow, FileStreamSinkLog compacts the log files into one large file every "spark.sql.sink.file.log.compactLen" batches. When doing a compaction, it reads all old log files and merges them with the new batch. During the compaction, it also drops the entries for files that have been deleted (marked by SinkFileStatus.action). When the reader uses allFiles to list all files, this method returns only the visible files (it drops the deleted files).
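The compaction step can be sketched as below. `FileEntry` and the action values `"add"` and `"delete"` are hypothetical stand-ins, and the rule used to decide which batch triggers a compaction is an assumption made for illustration, not something stated in the text above.

```scala
object CompactionSketch {
  // Hypothetical, reduced stand-in for SinkFileStatus.
  final case class FileEntry(path: String, action: String)

  // Assumed action markers.
  val Add = "add"
  val Delete = "delete"

  // Assumption: with interval n, batches n-1, 2n-1, ... are the compaction batches.
  def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
    (batchId + 1) % compactInterval == 0

  // Merge the earlier entries with the new batch, then drop every path marked deleted
  // (both the delete marker itself and any earlier entry for the same path).
  def compact(oldEntries: Seq[FileEntry], newBatch: Seq[FileEntry]): Seq[FileEntry] = {
    val all = oldEntries ++ newBatch
    val deleted = all.filter(_.action == Delete).map(_.path).toSet
    all.filter(e => !deleted.contains(e.path))
  }

  // Visible files only: the same filtering applied to everything read from the log.
  def allFiles(entries: Seq[FileEntry]): Seq[FileEntry] = compact(entries, Seq.empty)
}
```

Dropping deleted paths during compaction keeps the compacted file from growing with entries that no reader would see.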