Store the metadata for the specified batchId and return true if successful.
If the batchId's metadata has already been stored, this method will return false.
Note that this method must be called on an org.apache.spark.util.UninterruptibleThread
so that interrupts can be disabled while writing the batch file. This is because there is a
potential deadlock in Hadoop "Shell.runCommand" before 2.5.0 (HADOOP-10622). If the thread
running "Shell.runCommand" is interrupted, then the thread can get deadlocked. In our
case, writeBatch creates a file using the HDFS API and calls "Shell.runCommand" to set the
file permissions, so it can get deadlocked if the stream execution thread is stopped by an
interrupt. Hence, we make sure that this method is called on an UninterruptibleThread, which
allows us to disable interrupts here. Also see SPARK-14131.
Return metadata for batches between startId (inclusive) and endId (inclusive). If startId
is None, return all batches up to and including endId.
Return the metadata for the specified batchId if it's stored. Otherwise, return None.
Return the latest batch ID and its metadata, if any exist.
Removes all log entries earlier than thresholdBatchId (exclusive).
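The contract described by the methods above can be illustrated with a small in-memory stand-in. This is only a sketch: the class name InMemoryMetadataLog is hypothetical, the method names and signatures (add, get, getLatest, purge) are assumed from the descriptions rather than copied from the real trait, and nothing here touches HDFS.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical in-memory illustration of the documented semantics.
// It is not the HDFS-backed implementation and persists nothing.
final class InMemoryMetadataLog[T] {
  // Sorted by batch ID so that "latest" is simply the last entry.
  private var entries = TreeMap.empty[Long, T]

  // Returns true if stored; false if this batchId already has metadata.
  def add(batchId: Long, metadata: T): Boolean = synchronized {
    if (entries.contains(batchId)) false
    else { entries = entries.updated(batchId, metadata); true }
  }

  // Metadata for one batch, or None if it is not stored.
  def get(batchId: Long): Option[T] = synchronized { entries.get(batchId) }

  // Both bounds are inclusive; a startId of None means "from the earliest batch".
  def get(startId: Option[Long], endId: Long): Seq[(Long, T)] = synchronized {
    entries.iterator
      .filter { case (id, _) => startId.forall(id >= _) && id <= endId }
      .toVector
  }

  // Latest batch ID and its metadata, or None if the log is empty.
  def getLatest(): Option[(Long, T)] = synchronized { entries.lastOption }

  // Drops every entry with an ID below thresholdBatchId; the threshold itself is kept.
  def purge(thresholdBatchId: Long): Unit = synchronized {
    entries = entries.filter { case (id, _) => id >= thresholdBatchId }
  }
}
```

With this sketch, adding batch 0 twice returns true and then false, and purge(2) leaves batch 2 readable while removing batches 0 and 1.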
A MetadataLog implementation based on HDFS. HDFSMetadataLog uses the specified path
as the metadata storage.
When writing a new batch, HDFSMetadataLog first writes to a temp file and then renames it to the final batch file. If the rename step fails, there must be multiple writers: only one of them will succeed and the others will fail.
Note: HDFSMetadataLog doesn't support S3-like file systems as they don't guarantee listing files in a directory always shows the latest files.
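The write-to-temp-then-rename step can be sketched on a local file system with java.nio.file. This is an assumption-laden stand-in, not the HDFS code path: the object name TempThenRename, the writeBatch signature, and the temp file naming are all hypothetical, and the rename guarantees of a local file system are not the same as those of HDFS.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{FileAlreadyExistsException, Files, Path}

// Hypothetical local-filesystem sketch of "write a temp file, then rename".
object TempThenRename {
  // Returns true if this call created the batch file,
  // false if a file for batchId was already present.
  def writeBatch(dir: Path, batchId: Long, payload: String): Boolean = {
    val target = dir.resolve(batchId.toString)
    if (Files.exists(target)) {
      false
    } else {
      // Write the full payload to a temp file in the same directory first,
      // so readers never observe a partially written batch file.
      val temp = Files.createTempFile(dir, s".$batchId.", ".tmp")
      try {
        Files.write(temp, payload.getBytes(StandardCharsets.UTF_8))
        // Without REPLACE_EXISTING, the move fails if the target already
        // exists, which is how a losing concurrent writer finds out.
        Files.move(temp, target)
        true
      } catch {
        case _: FileAlreadyExistsException => false
      } finally {
        // Clean up the temp file if the move did not consume it.
        Files.deleteIfExists(temp)
      }
    }
  }
}
```

Calling writeBatch twice for the same batchId leaves the first payload in place and returns false the second time, mirroring the "only one writer succeeds" behavior described above.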