org.apache.spark.sql.execution.streaming
FileStreamSourceLog
Companion object FileStreamSourceLog
class FileStreamSourceLog extends CompactibleFileStreamLog[FileEntry]
Linear Supertypes: CompactibleFileStreamLog, HDFSMetadataLog, Logging, MetadataLog, AnyRef, Any
Instance Constructors
- new FileStreamSourceLog(metadataLogVersion: Int, sparkSession: SparkSession, path: String)
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def add(batchId: Long, logs: Array[FileEntry]): Boolean
  Store the metadata for the specified batchId and return true if successful. If the batchId's metadata has already been stored, this method returns false.
  - Definition Classes: FileStreamSourceLog → CompactibleFileStreamLog → HDFSMetadataLog → MetadataLog
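The `add` contract above (store a batch once, and report whether this call stored it) can be illustrated with a minimal in-memory sketch. `InMemoryBatchLog` is a hypothetical stand-in written for this example, not a Spark class. The real log persists each batch as a file under `metadataPath`.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for the add/get contract of a metadata log.
// It is not Spark's FileStreamSourceLog and does not touch any file system.
final class InMemoryBatchLog[T] {
  private val batches = mutable.LinkedHashMap.empty[Long, Array[T]]

  // Returns true only if this call stored the batch; adding the same batchId
  // again leaves the stored metadata untouched and returns false.
  def add(batchId: Long, logs: Array[T]): Boolean = synchronized {
    if (batches.contains(batchId)) false
    else {
      batches.put(batchId, logs)
      true
    }
  }

  // None when no metadata has been stored for batchId.
  def get(batchId: Long): Option[Array[T]] = synchronized(batches.get(batchId))
}

val sourceLog = new InMemoryBatchLog[String]
val firstAdd = sourceLog.add(0L, Array("/in/a.json"))  // stores batch 0
val secondAdd = sourceLog.add(0L, Array("/in/b.json")) // batch 0 already exists
```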
- def addNewBatchByStream(batchId: Long)(fn: (OutputStream) ⇒ Unit): Boolean
  Store the metadata for the specified batchId and return true if successful. This method fills in the metadata content by executing the given function. If the function throws an exception, writing is automatically cancelled and the exception is propagated to the caller. If the batchId's metadata has already been stored, this method returns false.
  The metadata is written by first writing the batch to a temporary file and then renaming that file to the batch file.
  Multiple HDFSMetadataLog instances may use the same metadata path. Although this is not valid usage, the files must still be protected from being destroyed.
  - Definition Classes: HDFSMetadataLog
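The temp-file-then-rename strategy described above can be sketched against the local file system with `java.nio.file`. This is an illustration under stated assumptions, not Spark's implementation. `TempThenRenameSketch` is hypothetical, it writes into a plain local directory, and it relies on `Files.move` without `REPLACE_EXISTING` to avoid replacing an existing batch file.

```scala
import java.io.OutputStream
import java.nio.file.{FileAlreadyExistsException, Files, Path}

// Hypothetical local-file-system sketch of "write a temp file, then rename it".
// Spark's HDFSMetadataLog goes through its CheckpointFileManager instead.
object TempThenRenameSketch {
  def addNewBatchByStream(dir: Path, batchId: Long)(fn: OutputStream => Unit): Boolean = {
    val target = dir.resolve(batchId.toString)
    if (Files.exists(target)) return false // metadata already stored
    val tmp = Files.createTempFile(dir, s".$batchId-", ".tmp")
    try {
      val out = Files.newOutputStream(tmp)
      try fn(out) finally out.close() // an exception from fn propagates
      // Without REPLACE_EXISTING, move refuses to replace an existing target.
      // The JDK may perform the existence check and the rename as separate
      // steps, so this is weaker than a true atomic create-if-absent.
      Files.move(tmp, target)
      true
    } catch {
      case _: FileAlreadyExistsException => false // another writer got there first
    } finally {
      Files.deleteIfExists(tmp) // cancels a failed write; no-op after a successful move
    }
  }
}

val logDir = Files.createTempDirectory("metadata-log-sketch")
val stored = TempThenRenameSketch.addNewBatchByStream(logDir, 0L)(_.write("v1".getBytes("UTF-8")))
val duplicate = TempThenRenameSketch.addNewBatchByStream(logDir, 0L)(_.write("v2".getBytes("UTF-8")))
```

Because the existence check and the rename may be separate steps, two concurrent writers could still race; a file manager that offers an atomic create-if-absent rename would close that gap.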
- def allFiles(): Array[FileEntry]
  Returns all files except the deleted ones.
  - Definition Classes: CompactibleFileStreamLog
- def applyFnToBatchByStream[RET](batchId: Long)(fn: (InputStream) ⇒ RET): RET
  Apply the provided function to the input stream of the specified batch's metadata log file.
  Unlike get, which materializes all entries in memory, this method processes the file as it is read (READ-AND-PROCESS). This avoids memory problems with huge metadata log files.
  NOTE: This no longer fails early on corruption. The caller should handle exceptions properly and make sure its logic is not affected by a failure in the middle of processing.
  - Definition Classes: HDFSMetadataLog
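The READ-AND-PROCESS idea can be shown with a consumer function that walks the stream line by line instead of building an array. The sketch assumes the line-oriented layout that `CompactibleFileStreamLog` writes: a version header such as `v1`, then one serialized entry per line. `StreamingBatchReader.countEntries` is a hypothetical helper, and the input is an in-memory stream rather than a real log file.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

// Hypothetical consumer for an applyFnToBatchByStream-style API.
object StreamingBatchReader {
  // Counts entries one line at a time instead of materializing an array,
  // so memory use does not grow with the size of the batch file.
  def countEntries(in: InputStream): Long = {
    // The reader is not closed: the caller owns (and closes) the stream.
    val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
    val header = reader.readLine()
    require(header != null && header.startsWith("v"), s"Unexpected header: $header")
    var count = 0L
    var line = reader.readLine()
    while (line != null) {
      if (line.nonEmpty) count += 1
      line = reader.readLine()
    }
    count
  }
}

val bytes = "v1\n{\"path\":\"a\"}\n{\"path\":\"b\"}".getBytes(StandardCharsets.UTF_8)
val entryCount = StreamingBatchReader.countEntries(new ByteArrayInputStream(bytes))
```

Against a real log, a function of this shape would be passed as the `fn` argument of `applyFnToBatchByStream(batchId)`, following the signature listed above. Because processing is incremental, a corrupt line can surface as an exception after part of the file has already been consumed.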
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- val batchFilesFilter: PathFilter
  A PathFilter to filter only batch files.
  - Attributes: protected
  - Definition Classes: HDFSMetadataLog
- def batchIdToPath(batchId: Long): Path
  - Definition Classes: CompactibleFileStreamLog → HDFSMetadataLog
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- final lazy val compactInterval: Int
  - Attributes: protected
  - Definition Classes: CompactibleFileStreamLog
- val defaultCompactInterval: Int
  - Attributes: protected
  - Definition Classes: FileStreamSourceLog → CompactibleFileStreamLog
- def deserialize(in: InputStream): Array[FileEntry]
  Read and deserialize the metadata from the input stream. If this method is overridden in a subclass, the overriding method should not close the given input stream, as it will be closed by the caller.
  - Definition Classes: CompactibleFileStreamLog → HDFSMetadataLog
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- val fileCleanupDelayMs: Long
  If the old files were deleted immediately after compaction, there would be a race condition in S3: other processes may see that the old files have been deleted but still not see the compaction file when using "list". allFiles handles this by looking up the next compaction file directly. However, a livelock may occur if compaction happens too frequently: one process keeps deleting old files while another keeps retrying. Setting a reasonable cleanup delay can prevent this.
  - Attributes: protected
  - Definition Classes: FileStreamSourceLog → CompactibleFileStreamLog
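The cleanup delay can be pictured as an age check applied before deleting files made obsolete by a compaction. The sketch below is hypothetical: `LogFile`, `DelayedCleanup.deletable`, and the rule "older than the compaction batch and last modified before `now - delay`" are illustrative assumptions, not Spark's exact deletion logic.

```scala
// Hypothetical record for a metadata log file: its batch id and mtime.
final case class LogFile(batchId: Long, modificationTimeMs: Long)

object DelayedCleanup {
  // Assumed rule: a file superseded by the compaction batch is deleted only
  // after it has been untouched for longer than cleanupDelayMs, so slower
  // readers have time to see the new compaction file first.
  def deletable(
      files: Seq[LogFile],
      compactionBatchId: Long,
      nowMs: Long,
      cleanupDelayMs: Long): Seq[LogFile] = {
    val expiredBefore = nowMs - cleanupDelayMs
    files.filter(f => f.batchId < compactionBatchId && f.modificationTimeMs < expiredBefore)
  }
}

val files = Seq(LogFile(7L, 1000L), LogFile(8L, 9000L), LogFile(9L, 9500L))
val toDelete =
  DelayedCleanup.deletable(files, compactionBatchId = 9L, nowMs = 10000L, cleanupDelayMs = 5000L)
```

Here batch 8 survives even though the compaction at batch 9 supersedes it, because it was modified too recently; it becomes deletable on a later pass.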
- val fileManager: CheckpointFileManager
  - Attributes: protected
  - Definition Classes: HDFSMetadataLog
- def filterInBatch(batchId: Long)(predicate: (FileEntry) ⇒ Boolean): Option[Array[FileEntry]]
  Apply a filter to all entries in the specified batch.
  - Definition Classes: CompactibleFileStreamLog
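A minimal in-memory sketch of `filterInBatch` semantics follows. `FileEntry` here is a hypothetical local case class that only mirrors the general shape of Spark's entry type (path, timestamp, batch id), and returning `None` for a missing batch is an assumption read from the `Option` result type. `Seq` replaces `Array` so that results compare by value.

```scala
// Hypothetical stand-in for the entry type; not Spark's FileEntry.
final case class FileEntry(path: String, timestamp: Long, batchId: Long)

object BatchFilterSketch {
  // Assumed semantics: None when the batch does not exist,
  // otherwise only the entries that satisfy the predicate.
  def filterInBatch(batches: Map[Long, Seq[FileEntry]], batchId: Long)(
      predicate: FileEntry => Boolean): Option[Seq[FileEntry]] =
    batches.get(batchId).map(_.filter(predicate))
}

val batches = Map(
  0L -> Seq(FileEntry("/in/a.json", 100L, 0L), FileEntry("/in/b.csv", 200L, 0L)))
val jsonOnly = BatchFilterSketch.filterInBatch(batches, 0L)(_.path.endsWith(".json"))
val missing = BatchFilterSketch.filterInBatch(batches, 1L)(_ => true)
```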
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- def foreachInBatch(batchId: Long)(fn: (FileEntry) ⇒ Unit): Unit
  Apply a function to all entries in the specified batch. This method throws FileNotFoundException if the metadata log file does not exist.
  NOTE: This does not fail early on corruption. The caller should handle exceptions properly and make sure its logic is not affected by a failure in the middle of processing.
  - Definition Classes: CompactibleFileStreamLog
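For contrast with `filterInBatch`, the sketch below reports a missing batch by throwing `FileNotFoundException`, as documented for `foreachInBatch`. The map-backed storage, the local `FileEntry` case class, and `BatchForeachSketch` are hypothetical. The sketch does not model mid-file corruption; with a real log, side effects from `fn` (such as the counter below) may already have happened when such a failure surfaces.

```scala
import java.io.FileNotFoundException

// Hypothetical stand-in for the entry type; not Spark's FileEntry.
final case class FileEntry(path: String, timestamp: Long, batchId: Long)

object BatchForeachSketch {
  // A missing batch is reported by throwing rather than by returning None.
  def foreachInBatch(batches: Map[Long, Seq[FileEntry]], batchId: Long)(
      fn: FileEntry => Unit): Unit = {
    val entries = batches.getOrElse(
      batchId,
      throw new FileNotFoundException(s"No metadata log file for batch $batchId"))
    entries.foreach(fn)
  }
}

val batches = Map(
  0L -> Seq(FileEntry("/in/a.json", 100L, 0L), FileEntry("/in/b.json", 200L, 0L)))
var totalSeen = 0
BatchForeachSketch.foreachInBatch(batches, 0L)(_ => totalSeen += 1)
val missingThrows =
  try { BatchForeachSketch.foreachInBatch(batches, 5L)(_ => ()); false }
  catch { case _: FileNotFoundException => true }
```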
- def get(startId: Option[Long], endId: Option[Long]): Array[(Long, Array[FileEntry])]
  Return metadata for batches between startId (inclusive) and endId (inclusive). If startId is None, return all batches up to endId (inclusive).
  - Definition Classes: FileStreamSourceLog → HDFSMetadataLog → MetadataLog
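The inclusive range rule for `get(startId, endId)` can be expressed as a filter over batch ids. In this hypothetical sketch a `None` bound is treated as unbounded on that side; the documentation above only states this for `startId`, so the `endId` behavior is an assumption.

```scala
object BatchRangeSketch {
  // Batches with startId <= id <= endId, sorted by id.
  // Assumption: a None bound means "no limit" on that side.
  def get[T](batches: Map[Long, T], startId: Option[Long], endId: Option[Long]): Seq[(Long, T)] =
    batches.toSeq
      .filter { case (id, _) => startId.forall(id >= _) && endId.forall(id <= _) }
      .sortBy(_._1)
}

val batches = Map(0L -> "a", 1L -> "b", 2L -> "c", 3L -> "d")
val middle = BatchRangeSketch.get(batches, Some(1L), Some(2L))
val upToOne = BatchRangeSketch.get(batches, None, Some(1L))
```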
- def get(batchId: Long): Option[Array[FileEntry]]
  Return the metadata for the specified batchId if it is stored; otherwise, return None.
  - Definition Classes: HDFSMetadataLog → MetadataLog
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def getLatest(): Option[(Long, Array[FileEntry])]
  Return the latest batch ID and its metadata, if they exist.
  - Definition Classes: HDFSMetadataLog → MetadataLog
- def getLatestBatchId(): Option[Long]
  Return the latest batch ID without reading the file. This method only checks for the existence of files, avoiding the cost of reading and deserializing the log file.
  - Definition Classes: HDFSMetadataLog
- def getOrderedBatchFiles(): Array[FileStatus]
  Get an array of FileStatus referencing batch files, sorted from the most recent batch file to the oldest.
  - Definition Classes: HDFSMetadataLog
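`getLatestBatchId` and `getOrderedBatchFiles` both work from file listings rather than file contents. The hypothetical sketch below derives the same kind of information from a list of names: it skips names that are not numeric batch ids (such as a leftover temp file) and orders the rest from most recent to oldest. Compaction-file suffixes are deliberately ignored here for brevity.

```scala
import scala.util.Try

// Hypothetical sketch: batch order and latest batch id from file names only,
// without opening any file. Names are assumed to be plain numeric batch ids.
object BatchOrderSketch {
  private def parseBatchId(name: String): Option[Long] = Try(name.toLong).toOption

  // Most recent batch first, matching the documented order.
  def orderedBatchIds(fileNames: Seq[String]): Seq[Long] =
    fileNames.flatMap(name => parseBatchId(name).toList).sorted(Ordering[Long].reverse)

  def latestBatchId(fileNames: Seq[String]): Option[Long] =
    orderedBatchIds(fileNames).headOption
}

val names = Seq("8", "10", "9", ".10.tmp")
```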
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
  - Attributes: protected
  - Definition Classes: Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def isBatchFile(path: Path): Boolean
  - Definition Classes: CompactibleFileStreamLog → HDFSMetadataLog
- val isDeletingExpiredLog: Boolean
  - Attributes: protected
  - Definition Classes: FileStreamSourceLog → CompactibleFileStreamLog
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- def isTraceEnabled(): Boolean
  - Attributes: protected
  - Definition Classes: Logging
- def log: Logger
  - Attributes: protected
  - Definition Classes: Logging
- def logDebug(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logDebug(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logError(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logError(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logInfo(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logInfo(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logName: String
  - Attributes: protected
  - Definition Classes: Logging
- def logTrace(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logTrace(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logWarning(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logWarning(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- val metadataPath: Path
  - Definition Classes: HDFSMetadataLog
- val minBatchesToRetain: Int
  - Attributes: protected
  - Definition Classes: CompactibleFileStreamLog
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- def pathToBatchId(path: Path): Long
  - Definition Classes: CompactibleFileStreamLog → HDFSMetadataLog
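The entries above list `batchIdToPath`, `pathToBatchId`, and `isBatchFile` without descriptions. As a hedged illustration, the sketch below assumes the naming convention of Spark's `CompactibleFileStreamLog` companion object: a batch is a compaction batch when `(batchId + 1) % compactInterval == 0`, compaction batches carry a `.compact` suffix, and any name that parses as a batch id counts as a batch file. It works on plain name strings instead of Hadoop `Path`s, and the details should be checked against the Spark version in use.

```scala
import scala.util.Try

// Hypothetical sketch of compaction-aware batch file naming (assumed convention).
object CompactNamingSketch {
  val CompactSuffix = ".compact"

  // Batch ids are 0-based, so with an interval of 10 the batches
  // 9, 19, 29, ... are compaction batches.
  def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
    (batchId + 1) % compactInterval == 0

  // Counterpart of batchIdToPath, on file names only.
  def batchIdToName(batchId: Long, compactInterval: Int): String =
    if (isCompactionBatch(batchId, compactInterval)) s"$batchId$CompactSuffix"
    else batchId.toString

  // Counterpart of pathToBatchId: strip the optional suffix and parse.
  def nameToBatchId(name: String): Long = name.stripSuffix(CompactSuffix).toLong

  // Counterpart of isBatchFile: a name is a batch file if it parses.
  def isBatchFile(name: String): Boolean = Try(nameToBatchId(name)).isSuccess
}
```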
- def purge(thresholdBatchId: Long): Unit
  CompactibleFileStreamLog maintains its logs by itself, and manual purging might break its internal state, specifically when the latest compaction batch is purged.
  To keep this simple, the method always throws UnsupportedOperationException regardless of the given parameter, and lets CompactibleFileStreamLog handle purging by itself.
  - Definition Classes: CompactibleFileStreamLog → HDFSMetadataLog → MetadataLog
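The pattern described for `purge` (an inherited operation that a self-managing subclass deliberately disables) can be sketched with a small hypothetical hierarchy. `SimpleMetadataLog` and `SelfCompactingLog` are illustrative names, and the exception message is made up for this example.

```scala
// Hypothetical minimal hierarchy, not Spark's MetadataLog.
trait SimpleMetadataLog {
  def purge(thresholdBatchId: Long): Unit
}

class SelfCompactingLog extends SimpleMetadataLog {
  // Fails regardless of the argument, so callers cannot remove a batch
  // (for example, the latest compaction batch) that later reads depend on.
  override def purge(thresholdBatchId: Long): Unit =
    throw new UnsupportedOperationException("purging is managed internally by this log")
}

val rejected =
  try { new SelfCompactingLog().purge(0L); false }
  catch { case _: UnsupportedOperationException => true }
```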
- def purgeAfter(thresholdBatchId: Long): Unit
  Removes all log entries later than thresholdBatchId (exclusive).
  - Definition Classes: HDFSMetadataLog
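The boundary in `purgeAfter` is easy to misread: entries with ids greater than `thresholdBatchId` are removed, and `thresholdBatchId` itself is kept. A hypothetical map-based sketch makes that explicit.

```scala
object PurgeAfterSketch {
  // Keeps batches up to and including thresholdBatchId; drops later ones.
  def purgeAfter[T](batches: Map[Long, T], thresholdBatchId: Long): Map[Long, T] =
    batches.filter { case (id, _) => id <= thresholdBatchId }
}

val kept = PurgeAfterSketch.purgeAfter(Map(1L -> "a", 2L -> "b", 3L -> "c"), 2L)
```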
- def restore(): Array[FileEntry]
- def serialize(logData: Array[FileEntry], out: OutputStream): Unit
  Serialize the metadata and write it to the output stream. If this method is overridden in a subclass, the overriding method should not close the given output stream, as it will be closed by the caller.
  - Definition Classes: CompactibleFileStreamLog → HDFSMetadataLog
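`serialize` and `deserialize` must leave stream ownership with the caller. The hypothetical codec below writes a version header followed by one entry per line, and neither method closes the stream it receives. Plain strings stand in for the JSON-encoded entries, and the strict version check is a simplification.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, ByteArrayOutputStream, InputStream, InputStreamReader, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical "version header, then one entry per line" codec.
object LineCodecSketch {
  val Version = 1

  // Writes to the caller's stream but does not close it.
  def serialize(entries: Seq[String], out: OutputStream): Unit = {
    out.write(s"v$Version".getBytes(UTF_8))
    entries.foreach(e => out.write(("\n" + e).getBytes(UTF_8)))
  }

  // Reads from the caller's stream but does not close it.
  def deserialize(in: InputStream): Seq[String] = {
    val reader = new BufferedReader(new InputStreamReader(in, UTF_8))
    val header = reader.readLine()
    require(header == s"v$Version", s"Unsupported log version: $header")
    Iterator.continually(reader.readLine()).takeWhile(_ != null).filter(_.nonEmpty).toList
  }
}

val buffer = new ByteArrayOutputStream()
LineCodecSketch.serialize(Seq("a.json", "b.json"), buffer)
val decoded = LineCodecSketch.deserialize(new ByteArrayInputStream(buffer.toByteArray))
```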
- def shouldRetain(log: FileEntry, currentTime: Long): Boolean
  Determine whether the log entry should be retained.
  The default implementation retains all log entries. Implementations should override this method to change the behavior.
  - Definition Classes: CompactibleFileStreamLog
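The `shouldRetain` hook can be illustrated with a hypothetical hierarchy: the base class keeps everything, and a subclass narrows retention by age. `RetainAllLog`, `TtlLog`, `Entry`, and the assumption that the hook is consulted when entries are carried into a compacted batch are all illustrative, not Spark API.

```scala
// Hypothetical entry type for this sketch.
final case class Entry(path: String, timestamp: Long)

class RetainAllLog {
  // Default: keep every entry.
  def shouldRetain(log: Entry, currentTime: Long): Boolean = true

  // Assumption: the hook is applied when entries are carried into a compacted batch.
  def compact(entries: Seq[Entry], currentTime: Long): Seq[Entry] =
    entries.filter(shouldRetain(_, currentTime))
}

class TtlLog(ttlMs: Long) extends RetainAllLog {
  // Keep only entries no older than ttlMs.
  override def shouldRetain(log: Entry, currentTime: Long): Boolean =
    currentTime - log.timestamp <= ttlMs
}

val entries = Seq(Entry("old", 1000L), Entry("new", 9000L))
val all = new RetainAllLog().compact(entries, currentTime = 10000L)
val recent = new TtlLog(ttlMs = 5000L).compact(entries, currentTime = 10000L)
```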
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toString(): String
  - Definition Classes: AnyRef → Any
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()