class AsyncCommitLog extends CommitLog
Implementation of CommitLog to perform asynchronous writes to storage
Linear Supertypes
- CommitLog
- HDFSMetadataLog
- Logging
- MetadataLog
- AnyRef
- Any
Instance Constructors
- new AsyncCommitLog(sparkSession: SparkSession, path: String, executorService: ThreadPoolExecutor)
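The constructor takes a caller-supplied ThreadPoolExecutor. The sketch below shows one way such an executor could be built with the JDK alone. The choice of a single worker thread (so queued writes run one at a time, in submission order) is an assumption for illustration, not something this page states, and the names `ExecutorSketch` and `singleWriter` are hypothetical.

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

object ExecutorSketch {
  // Assumption: one worker thread and an unbounded FIFO queue, so tasks
  // execute sequentially in the order they were submitted.
  def singleWriter(): ThreadPoolExecutor =
    new ThreadPoolExecutor(
      1,                                   // core pool size
      1,                                   // maximum pool size
      0L, TimeUnit.MILLISECONDS,           // keep-alive for idle extra threads (none here)
      new LinkedBlockingQueue[Runnable]()  // pending tasks wait here
    )
}

// Hypothetical usage, assuming a SparkSession `spark` and a String `checkpointPath` exist:
// val commitLog = new AsyncCommitLog(spark, checkpointPath, ExecutorSketch.singleWriter())
```

The executor should be shut down by whoever created it once the log is no longer in use.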
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def add(batchId: Long, metadata: CommitMetadata): Boolean
Store the metadata for the specified batchId and return true if successful. If the batchId's metadata has already been stored, this method will return false.
- Definition Classes
- HDFSMetadataLog → MetadataLog
- def addAsync(batchId: Long, metadata: CommitMetadata): CompletableFuture[Long]
Writes a new batch to the commit log asynchronously
- batchId
id of batch to write
- metadata
metadata of batch to write
- returns
a CompletableFuture that contains the batch id. The future is completed when the async write of the batch is completed. The future may also be completed exceptionally to indicate a write error.
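A caller of addAsync has to handle both outcomes of the returned future. The following is a minimal sketch of that handling using only the JDK. `fakeAddAsync` is a hypothetical stand-in that simulates the write on a thread pool; it is not the Spark implementation.

```scala
import java.io.IOException
import java.util.concurrent.{CompletableFuture, ExecutionException, ExecutorService}

object AddAsyncSketch {
  // Hypothetical stand-in for addAsync: the future completes with the batch id
  // when the simulated write succeeds, or exceptionally when it fails.
  def fakeAddAsync(batchId: Long, failWrite: Boolean, pool: ExecutorService): CompletableFuture[Long] = {
    val future = new CompletableFuture[Long]()
    pool.execute(new Runnable {
      override def run(): Unit = {
        if (failWrite) {
          future.completeExceptionally(new IOException(s"write failed for batch $batchId"))
        } else {
          future.complete(batchId)
        }
        ()
      }
    })
    future
  }

  // Blocks until the future finishes and reports either the batch id or the write error.
  def awaitResult(future: CompletableFuture[Long]): Either[Throwable, Long] =
    try Right(future.get())
    catch { case e: ExecutionException => Left(e.getCause) }
}
```

`future.get()` wraps a failure in an ExecutionException, so the sketch unwraps `getCause` to surface the original write error.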
- def addInMemory(batchId: Long, metadata: CommitMetadata): Boolean
Adds a batch to the commit log in memory only, without persisting it to durable storage. This method is used when we don't want to persist the commit log entry for every micro-batch to durable storage.
- batchId
id of batch to write
- metadata
metadata of batch to write
- returns
true if the operation is successful, otherwise false.
- def addNewBatchByStream(batchId: Long)(fn: (OutputStream) => Unit): Boolean
Store the metadata for the specified batchId and return true if successful. This method fills the content of the metadata by executing the given function. If the function throws an exception, writing will be automatically cancelled and this method will propagate the exception.
If the batchId's metadata has already been stored, this method will return false.
Writing the metadata is done by writing a batch to a temp file and then renaming it to the batch file.
There may be multiple HDFSMetadataLog instances using the same metadata path. Although this is not valid usage, we still need to prevent it from destroying the files.
- Definition Classes
- HDFSMetadataLog
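The write-to-temp-then-rename pattern described for addNewBatchByStream can be sketched on a local file system with java.nio. This illustrates the technique only. The real class goes through a CheckpointFileManager, and the names `TempThenRenameSketch` and `addBatch` as well as the temp-file naming are assumptions.

```scala
import java.io.OutputStream
import java.nio.file.{FileAlreadyExistsException, Files, Path}

object TempThenRenameSketch {
  // Returns true if the batch file was created, false if it already existed.
  // Exceptions thrown by `fn` propagate, and the temp file is cleaned up.
  def addBatch(dir: Path, batchId: Long)(fn: OutputStream => Unit): Boolean = {
    val target = dir.resolve(batchId.toString)
    if (Files.exists(target)) {
      false
    } else {
      val temp = Files.createTempFile(dir, s".$batchId.", ".tmp")
      try {
        val out = Files.newOutputStream(temp)
        try fn(out) finally out.close()
        try {
          // Without REPLACE_EXISTING, move fails if another writer created the target first.
          Files.move(temp, target)
          true
        } catch {
          case _: FileAlreadyExistsException => false
        }
      } finally {
        // No-op after a successful move; removes the leftover temp file otherwise.
        Files.deleteIfExists(temp)
      }
    }
  }
}
```

Because readers only ever see the final file name, they never observe a half-written batch, and a second writer on the same path cannot overwrite an existing batch file.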
- def applyFnToBatchByStream[RET](batchId: Long, skipExistingCheck: Boolean = false)(fn: (InputStream) => RET): RET
Apply the provided function to each entry in the specified batch metadata log.
Unlike get, which materializes all entries into memory, this method streams the file via READ-AND-PROCESS. This helps avoid memory issues with huge metadata log files.
NOTE: This no longer fails early on corruption. The caller should handle the exception properly and make sure the logic is not affected by failing in the middle.
- Definition Classes
- HDFSMetadataLog
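The READ-AND-PROCESS idea behind applyFnToBatchByStream can be illustrated with a small loan-pattern helper: the caller's function receives an open InputStream and consumes it incrementally, and the helper closes the stream afterwards. `withBatchStream` and `countLines` are hypothetical names, and the stream source is passed in explicitly rather than resolved from a batch id.

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

object StreamingReadSketch {
  // Opens the stream, lends it to `fn`, and always closes it afterwards.
  // Exceptions from `fn` propagate to the caller, who must handle them.
  def withBatchStream[RET](open: () => InputStream)(fn: InputStream => RET): RET = {
    val in = open()
    try fn(in) finally in.close()
  }

  // Example consumer: counts lines one at a time without holding them in memory.
  def countLines(in: InputStream): Long = {
    val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
    var n = 0L
    while (reader.readLine() != null) n += 1
    n
  }
}
```

Since the function may fail partway through the stream, any side effects it performs per entry should tolerate a partial run.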
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- val batchCache: Map[Long, CommitMetadata]
Cache the latest two batches. StreamExecution usually just accesses the latest two batches when committing offsets, so this cache saves some file system operations.
- Attributes
- protected[sql]
- Definition Classes
- HDFSMetadataLog
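A two-entry cache of this kind can be sketched with a java.util.LinkedHashMap that evicts its eldest entry once it grows past two. This is an illustration under the assumption that batches are inserted in increasing id order; the value type is simplified to String and the object name is hypothetical.

```scala
import java.util.{Collections, LinkedHashMap => JLinkedHashMap, Map => JMap}

object LatestTwoCacheSketch {
  // Insertion-ordered map that drops the oldest entry whenever a third one is added.
  // Wrapped in synchronizedMap so it can be shared between threads.
  def newCache(): JMap[Long, String] =
    Collections.synchronizedMap(new JLinkedHashMap[Long, String](2) {
      override def removeEldestEntry(eldest: JMap.Entry[Long, String]): Boolean =
        size() > 2
    })
}
```

With batches added as 1, 2, 3, the cache ends up holding only 2 and 3, which are the entries a commit of the next batch is most likely to read.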
- val batchFilesFilter: PathFilter
A PathFilter to filter only batch files.
- Attributes
- protected
- Definition Classes
- HDFSMetadataLog
- def batchIdToPath(batchId: Long): Path
- Attributes
- protected
- Definition Classes
- HDFSMetadataLog
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def deserialize(in: InputStream): CommitMetadata
Read and deserialize the metadata from input stream. If this method is overridden in a subclass, the overriding method should not close the given input stream, as it will be closed in the caller.
- Attributes
- protected
- Definition Classes
- CommitLog → HDFSMetadataLog
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- val fileManager: CheckpointFileManager
- Attributes
- protected
- Definition Classes
- HDFSMetadataLog
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def get(startId: Option[Long], endId: Option[Long]): Array[(Long, CommitMetadata)]
Return metadata for batches between startId (inclusive) and endId (inclusive). If startId is None, just return all batches before endId (inclusive).
- Definition Classes
- HDFSMetadataLog → MetadataLog
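The inclusive bounds of the range variant of get can be made concrete with a small selection function over batch ids. This is a sketch, not the library code: treating a missing endId as "no upper bound" is an assumption, since the text above only describes the case where startId is None.

```scala
object RangeSketch {
  // Keeps ids with startId <= id <= endId; a bound that is None is not applied.
  def select(batchIds: Seq[Long], startId: Option[Long], endId: Option[Long]): Seq[Long] =
    batchIds.sorted.filter { id =>
      startId.forall(id >= _) && endId.forall(id <= _)
    }
}
```

For ids 1 through 4, `select(ids, Some(2L), Some(3L))` keeps 2 and 3, while `select(ids, None, Some(2L))` keeps 1 and 2.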
- def get(batchId: Long): Option[CommitMetadata]
Return the metadata for the specified batchId if it's stored. Otherwise, return None.
- Definition Classes
- HDFSMetadataLog → MetadataLog
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getLatest(): Option[(Long, CommitMetadata)]
Return the latest batch id and its metadata, if any exists.
- Definition Classes
- HDFSMetadataLog → MetadataLog
- def getLatestBatchId(): Option[Long]
Return the latest batch id without reading the file.
- Definition Classes
- HDFSMetadataLog
- def getOrderedBatchFiles(): Array[FileStatus]
Get an array of FileStatus referencing batch files. The array is sorted from the most recent batch file to the oldest.
- Definition Classes
- HDFSMetadataLog
- def getPrevBatchFromStorage(batchId: Long): Option[Long]
Get the id of the previous batch from storage
- batchId
id of the batch whose previous batch id is requested
- Definition Classes
- HDFSMetadataLog
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def isBatchFile(path: Path): Boolean
- Attributes
- protected
- Definition Classes
- HDFSMetadataLog
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def listBatches: Array[Long]
List the available batches on the file system.
- Attributes
- protected
- Definition Classes
- HDFSMetadataLog
- def listBatchesOnDisk: Array[Long]
List the batches persisted to storage
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- val metadataCacheEnabled: Boolean
- Attributes
- protected
- Definition Classes
- HDFSMetadataLog
- val metadataPath: Path
- Definition Classes
- HDFSMetadataLog
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def pathToBatchId(path: Path): Long
- Attributes
- protected
- Definition Classes
- HDFSMetadataLog
- def purge(thresholdBatchId: Long): Unit
Purge entries in the commit log up to thresholdBatchId.
- Definition Classes
- AsyncCommitLog → HDFSMetadataLog → MetadataLog
- def purgeAfter(thresholdBatchId: Long): Unit
Removes all log entries later than thresholdBatchId (exclusive).
- Definition Classes
- HDFSMetadataLog
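The exclusive threshold of purgeAfter means the entry at thresholdBatchId itself is retained and only strictly later entries are removed. A minimal sketch of that split over plain batch ids (the object and method names are hypothetical):

```scala
object PurgeAfterSketch {
  // Returns (kept, removed): ids greater than the threshold are removed,
  // the threshold id and everything before it are kept.
  def split(batchIds: Seq[Long], thresholdBatchId: Long): (Seq[Long], Seq[Long]) =
    batchIds.partition(_ <= thresholdBatchId)
}
```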
- def serialize(metadata: CommitMetadata, out: OutputStream): Unit
Serialize the metadata and write to the output stream. If this method is overridden in a subclass, the overriding method should not close the given output stream, as it will be closed in the caller.
- Attributes
- protected
- Definition Classes
- CommitLog → HDFSMetadataLog
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def write(batchMetadataFile: Path, fn: (OutputStream) => Unit): Unit
- Attributes
- protected
- Definition Classes
- HDFSMetadataLog
- val writtenToDurableStorage: ConcurrentLinkedDeque[Long]