Class FileMergingSnapshotManagerBase
- java.lang.Object
-
- org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManagerBase
-
- All Implemented Interfaces:
Closeable, AutoCloseable, FileMergingSnapshotManager
- Direct Known Subclasses:
AcrossCheckpointFileMergingSnapshotManager, WithinCheckpointFileMergingSnapshotManager
public abstract class FileMergingSnapshotManagerBase extends Object implements FileMergingSnapshotManager
Base implementation of FileMergingSnapshotManager.
-
-
Nested Class Summary
Nested Classes (modifier and type, class, description):
protected static class FileMergingSnapshotManagerBase.DirectoryHandleWithReferenceTrack - This class wraps DirectoryStreamStateHandle with a reference count held by ongoing checkpoints.
- Nested classes/interfaces inherited from interface org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManager
FileMergingSnapshotManager.SpaceStat, FileMergingSnapshotManager.SubtaskKey
-
-
Field Summary
Fields (modifier and type, field, description):
protected org.apache.flink.core.fs.Path checkpointDir
protected PhysicalFilePool.Type filePoolType - Type of physical file pool.
protected org.apache.flink.core.fs.FileSystem fs - The FileSystem that this manager works on.
protected Executor ioExecutor - The executor for I/O operations in this manager.
protected Object lock - Guard for initFileSystem(org.apache.flink.core.fs.FileSystem, org.apache.flink.core.fs.Path, org.apache.flink.core.fs.Path, org.apache.flink.core.fs.Path, int), restoreStateHandles(long, org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManager.SubtaskKey, java.util.stream.Stream<org.apache.flink.runtime.state.filemerging.SegmentFileStateHandle>) and uploadedStates.
protected org.apache.flink.core.fs.Path managedExclusiveStateDir - The private state files are merged across subtasks, so there is only one directory for merged files within one TM per job.
protected FileMergingSnapshotManagerBase.DirectoryHandleWithReferenceTrack managedExclusiveStateDirHandle - The DirectoryStreamStateHandle, with its ongoing-checkpoint reference count, for the private state directory; one for each TaskManager and job.
protected long maxPhysicalFileSize - Max size for a physical file.
protected float maxSpaceAmplification
protected FileMergingMetricGroup metricGroup - The metric group for file merging snapshot manager.
protected PhysicalFile.PhysicalFileDeleter physicalFileDeleter
protected org.apache.flink.core.fs.Path sharedStateDir
protected boolean shouldSyncAfterClosingLogicalFile - File-system dependent value.
protected FileMergingSnapshotManager.SpaceStat spaceStat - The current space statistic, updated on file creation/deletion.
protected org.apache.flink.core.fs.Path taskOwnedStateDir
protected TreeMap<Long,Set<LogicalFile>> uploadedStates
protected int writeBufferSize - The buffer size for writing files to the file system.
-
Constructor Summary
Constructors (constructor, description):
FileMergingSnapshotManagerBase(String id, long maxFileSize, PhysicalFilePool.Type filePoolType, float maxSpaceAmplification, Executor ioExecutor, org.apache.flink.metrics.MetricGroup parentMetricGroup)
-
Method Summary
Methods (modifier and type, method, description):
void close()
boolean couldReusePreviousStateHandle(StreamStateHandle stateHandle) - Check whether previous state handles could be reused further, considering the space amplification.
FileMergingCheckpointStateOutputStream createCheckpointStateOutputStream(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId, CheckpointedStateScope scope) - Create a new FileMergingCheckpointStateOutputStream.
protected LogicalFile createLogicalFile(PhysicalFile physicalFile, long startOffset, long length, FileMergingSnapshotManager.SubtaskKey subtaskKey) - Create a logical file on a physical file.
protected PhysicalFile createPhysicalFile(FileMergingSnapshotManager.SubtaskKey subtaskKey, CheckpointedStateScope scope) - Create a physical file in the right location (managed directory), which is determined by the scope of this checkpoint and the current subtask.
protected PhysicalFilePool createPhysicalPool() - Create the physical file pool according to filePoolType.
protected void deletePhysicalFile(org.apache.flink.core.fs.Path filePath, long size) - Delete a physical file by the given file path.
protected void discardCheckpoint(long checkpointId) - The callback triggered when all subtasks have discarded a checkpoint (aborted or subsumed).
void discardSingleLogicalFile(LogicalFile logicalFile, long checkpointId)
protected org.apache.flink.core.fs.Path generatePhysicalFilePath(org.apache.flink.core.fs.Path dirPath) - Generate a file path for a physical file.
String getId()
LogicalFile getLogicalFile(LogicalFile.LogicalFileId fileId)
org.apache.flink.core.fs.Path getManagedDir(FileMergingSnapshotManager.SubtaskKey subtaskKey, CheckpointedStateScope scope) - Get the managed directory of the file-merging snapshot manager, created in FileMergingSnapshotManager.initFileSystem(org.apache.flink.core.fs.FileSystem, org.apache.flink.core.fs.Path, org.apache.flink.core.fs.Path, org.apache.flink.core.fs.Path, int) or FileMergingSnapshotManager.registerSubtaskForSharedStates(org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManager.SubtaskKey).
DirectoryStreamStateHandle getManagedDirStateHandle(FileMergingSnapshotManager.SubtaskKey subtaskKey, CheckpointedStateScope scope) - Get the DirectoryStreamStateHandle of the managed directory, created in FileMergingSnapshotManager.initFileSystem(org.apache.flink.core.fs.FileSystem, org.apache.flink.core.fs.Path, org.apache.flink.core.fs.Path, org.apache.flink.core.fs.Path, int) or FileMergingSnapshotManager.registerSubtaskForSharedStates(org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManager.SubtaskKey).
protected abstract PhysicalFile getOrCreatePhysicalFileForCheckpoint(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId, CheckpointedStateScope scope) - Get a reused physical file or create one.
void initFileSystem(org.apache.flink.core.fs.FileSystem fileSystem, org.apache.flink.core.fs.Path checkpointBaseDir, org.apache.flink.core.fs.Path sharedStateDir, org.apache.flink.core.fs.Path taskOwnedStateDir, int writeBufferSize) - Initialize the file system, recording the checkpoint path the manager should work with.
void notifyCheckpointAborted(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId) - This method is called as a notification once a distributed checkpoint has been aborted.
void notifyCheckpointComplete(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId) - Notifies the manager that the checkpoint with the given checkpointId completed and was committed.
void notifyCheckpointStart(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId) - SubtaskCheckpointCoordinatorImpl uses this method to let the file merging manager know that an ongoing checkpoint may reference the managed dirs.
void notifyCheckpointSubsumed(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId) - This method is called as a notification once a distributed checkpoint has been subsumed.
void registerSubtaskForSharedStates(FileMergingSnapshotManager.SubtaskKey subtaskKey) - Register a subtask and create the managed directory for shared states.
void restoreStateHandles(long checkpointId, FileMergingSnapshotManager.SubtaskKey subtaskKey, Stream<SegmentFileStateHandle> stateHandles) - Restore and re-register the SegmentFileStateHandles into FileMergingSnapshotManager.
protected abstract void returnPhysicalFileForNextReuse(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId, PhysicalFile physicalFile) - Try to return an existing physical file to the manager for next reuse.
void reusePreviousStateHandle(long checkpointId, Collection<? extends StreamStateHandle> stateHandles) - A callback method which is called when previous state handles are reused by following checkpoint(s).
void unregisterSubtask(FileMergingSnapshotManager.SubtaskKey subtaskKey) - Unregister a subtask.
-
-
-
Field Detail
-
ioExecutor
protected final Executor ioExecutor
The executor for I/O operations in this manager.
-
lock
protected final Object lock
Guard for initFileSystem(org.apache.flink.core.fs.FileSystem, org.apache.flink.core.fs.Path, org.apache.flink.core.fs.Path, org.apache.flink.core.fs.Path, int), restoreStateHandles(long, org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManager.SubtaskKey, java.util.stream.Stream<org.apache.flink.runtime.state.filemerging.SegmentFileStateHandle>) and uploadedStates.
-
uploadedStates
protected TreeMap<Long,Set<LogicalFile>> uploadedStates
-
fs
protected org.apache.flink.core.fs.FileSystem fs
The FileSystem that this manager works on.
-
checkpointDir
protected org.apache.flink.core.fs.Path checkpointDir
-
sharedStateDir
protected org.apache.flink.core.fs.Path sharedStateDir
-
taskOwnedStateDir
protected org.apache.flink.core.fs.Path taskOwnedStateDir
-
writeBufferSize
protected int writeBufferSize
The buffer size for writing files to the file system.
-
shouldSyncAfterClosingLogicalFile
protected boolean shouldSyncAfterClosingLogicalFile
File-system dependent value. Marks whether the file system this manager runs on needs a sync for visibility. If true, a file sync is performed after writing each segment.
-
maxPhysicalFileSize
protected long maxPhysicalFileSize
Max size for a physical file.
-
filePoolType
protected PhysicalFilePool.Type filePoolType
Type of physical file pool.
-
maxSpaceAmplification
protected final float maxSpaceAmplification
-
physicalFileDeleter
protected PhysicalFile.PhysicalFileDeleter physicalFileDeleter
-
managedExclusiveStateDir
protected org.apache.flink.core.fs.Path managedExclusiveStateDir
The private state files are merged across subtasks, so there is only one directory for merged files within one TM per job.
-
managedExclusiveStateDirHandle
protected FileMergingSnapshotManagerBase.DirectoryHandleWithReferenceTrack managedExclusiveStateDirHandle
The DirectoryStreamStateHandle, with its ongoing-checkpoint reference count, for the private state directory; one for each TaskManager and job.
-
spaceStat
protected FileMergingSnapshotManager.SpaceStat spaceStat
The current space statistic, updated on file creation/deletion.
-
metricGroup
protected FileMergingMetricGroup metricGroup
The metric group for file merging snapshot manager.
-
-
Constructor Detail
-
FileMergingSnapshotManagerBase
public FileMergingSnapshotManagerBase(String id, long maxFileSize, PhysicalFilePool.Type filePoolType, float maxSpaceAmplification, Executor ioExecutor, org.apache.flink.metrics.MetricGroup parentMetricGroup)
-
-
Method Detail
-
initFileSystem
public void initFileSystem(org.apache.flink.core.fs.FileSystem fileSystem, org.apache.flink.core.fs.Path checkpointBaseDir, org.apache.flink.core.fs.Path sharedStateDir, org.apache.flink.core.fs.Path taskOwnedStateDir, int writeBufferSize) throws IllegalArgumentException
Description copied from interface: FileMergingSnapshotManager
Initialize the file system, recording the checkpoint path the manager should work with.
The layout of the checkpoint directory:
/user-defined-checkpoint-dir
    /{job-id} (checkpointBaseDir)
        |
        + --shared/
            |
            + --subtask-1/
                + -- merged shared state files
            + --subtask-2/
                + -- merged shared state files
        + --taskowned/
            + -- merged private state files
        + --chk-1/
        + --chk-2/
        + --chk-3/
The directories are initialized in this method rather than in the constructor because the FileMergingSnapshotManager itself belongs to the TaskStateManager, which is initialized when receiving a task, while the base directories for checkpoints are created by FsCheckpointStorageAccess when the state backend initializes per subtask. After the checkpoint directories are initialized, the managed subdirectories are initialized here.
Note: This method may be called several times. The implementation should ensure idempotency and throw IllegalArgumentException when any of the paths in the parameters changes across calls.
- Specified by:
initFileSystem in interface FileMergingSnapshotManager
- Parameters:
fileSystem - The filesystem to write to.
checkpointBaseDir - The base directory for checkpoints.
sharedStateDir - The directory for shared checkpoint data.
taskOwnedStateDir - The name of the directory for state not owned/released by the master, but by the TaskManagers.
writeBufferSize - The buffer size for writing files to the file system.
- Throws:
IllegalArgumentException - thrown if these three paths are not deterministic across calls.
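The idempotency requirement noted for initFileSystem can be illustrated with a minimal sketch. It is not the Flink implementation: the class IdempotentInitSketch, its String-based paths, and the initCount counter are hypothetical stand-ins chosen so the example runs without Flink on the classpath.

```java
import java.util.Objects;

/** Hypothetical, simplified model of an idempotent init; not the Flink class. */
final class IdempotentInitSketch {
    private final Object lock = new Object();
    private String checkpointBaseDir;
    private String sharedStateDir;
    private String taskOwnedStateDir;
    private int initCount; // number of times the directories were actually set up

    void initFileSystem(String checkpointBaseDir, String sharedStateDir, String taskOwnedStateDir) {
        Objects.requireNonNull(checkpointBaseDir);
        Objects.requireNonNull(sharedStateDir);
        Objects.requireNonNull(taskOwnedStateDir);
        synchronized (lock) {
            if (this.checkpointBaseDir != null) {
                // Repeated call: accepted only if all three paths are unchanged.
                if (!this.checkpointBaseDir.equals(checkpointBaseDir)
                        || !this.sharedStateDir.equals(sharedStateDir)
                        || !this.taskOwnedStateDir.equals(taskOwnedStateDir)) {
                    throw new IllegalArgumentException("checkpoint paths changed across calls");
                }
                return; // nothing to do, already initialized with the same paths
            }
            this.checkpointBaseDir = checkpointBaseDir;
            this.sharedStateDir = sharedStateDir;
            this.taskOwnedStateDir = taskOwnedStateDir;
            initCount++;
        }
    }

    int initCount() {
        synchronized (lock) {
            return initCount;
        }
    }

    public static void main(String[] args) {
        IdempotentInitSketch manager = new IdempotentInitSketch();
        manager.initFileSystem("/cp/job", "/cp/job/shared", "/cp/job/taskowned");
        manager.initFileSystem("/cp/job", "/cp/job/shared", "/cp/job/taskowned");
        System.out.println("initialized " + manager.initCount() + " time(s)");
    }
}
```

Guarding both the comparison and the assignment with one lock mirrors the role the lock field is documented to play for initFileSystem, although the real locking scope may differ.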
-
registerSubtaskForSharedStates
public void registerSubtaskForSharedStates(FileMergingSnapshotManager.SubtaskKey subtaskKey)
Description copied from interface: FileMergingSnapshotManager
Register a subtask and create the managed directory for shared states.
- Specified by:
registerSubtaskForSharedStates in interface FileMergingSnapshotManager
- Parameters:
subtaskKey - the subtask key identifying a subtask.
- See Also:
for layout information.
-
unregisterSubtask
public void unregisterSubtask(FileMergingSnapshotManager.SubtaskKey subtaskKey)
Description copied from interface: FileMergingSnapshotManager
Unregister a subtask.
- Specified by:
unregisterSubtask in interface FileMergingSnapshotManager
- Parameters:
subtaskKey - the subtask key identifying a subtask.
-
createLogicalFile
protected LogicalFile createLogicalFile(@Nonnull PhysicalFile physicalFile, long startOffset, long length, @Nonnull FileMergingSnapshotManager.SubtaskKey subtaskKey)
Create a logical file on a physical file.
- Parameters:
physicalFile - the underlying physical file.
startOffset - the offset in the physical file that the logical file starts from.
length - the length of the logical file.
subtaskKey - the id of the subtask that the logical file belongs to.
- Returns:
the created logical file.
-
createPhysicalFile
@Nonnull protected PhysicalFile createPhysicalFile(FileMergingSnapshotManager.SubtaskKey subtaskKey, CheckpointedStateScope scope) throws IOException
Create a physical file in the right location (managed directory), which is determined by the scope of this checkpoint and the current subtask.
- Parameters:
subtaskKey - the FileMergingSnapshotManager.SubtaskKey of the current subtask.
scope - the scope of the checkpoint.
- Returns:
the created physical file.
- Throws:
IOException - if anything goes wrong with the file system.
-
createCheckpointStateOutputStream
public FileMergingCheckpointStateOutputStream createCheckpointStateOutputStream(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId, CheckpointedStateScope scope)
Description copied from interface: FileMergingSnapshotManager
Create a new FileMergingCheckpointStateOutputStream. According to the file merging strategy, the streams returned by multiple calls to this function may share the same underlying physical file, and each stream writes to a segment of the physical file.
- Specified by:
createCheckpointStateOutputStream in interface FileMergingSnapshotManager
- Parameters:
subtaskKey - The subtask key identifying the subtask.
checkpointId - ID of the checkpoint.
scope - The state's scope, whether it is exclusive or shared.
- Returns:
An output stream that writes state for the given checkpoint.
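The idea that several streams may share one physical file, each writing its own segment, can be sketched with an in-memory model. Everything here is hypothetical: SegmentWriteSketch, PhysicalFileModel, and Segment are illustration-only names, and an append-only byte buffer stands in for a real file on the checkpoint file system.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical in-memory model of logical segments inside one physical file. */
final class SegmentWriteSketch {

    /** A logical file: a (startOffset, length) window into the physical file. */
    static final class Segment {
        final long startOffset;
        final long length;

        Segment(long startOffset, long length) {
            this.startOffset = startOffset;
            this.length = length;
        }
    }

    /** Stand-in for a physical file: an append-only byte buffer. */
    static final class PhysicalFileModel {
        private final ByteArrayOutputStream data = new ByteArrayOutputStream();

        /** Appends one writer's bytes and returns the segment they occupy. */
        Segment append(byte[] bytes) {
            long start = data.size();
            data.write(bytes, 0, bytes.length);
            return new Segment(start, bytes.length);
        }

        /** Reads back exactly the bytes covered by the given segment. */
        byte[] read(Segment segment) {
            int from = (int) segment.startOffset;
            int to = (int) (segment.startOffset + segment.length);
            return Arrays.copyOfRange(data.toByteArray(), from, to);
        }

        long size() {
            return data.size();
        }
    }

    public static void main(String[] args) {
        PhysicalFileModel file = new PhysicalFileModel();
        Segment first = file.append("state-of-operator-a".getBytes(StandardCharsets.UTF_8));
        Segment second = file.append("state-of-operator-b".getBytes(StandardCharsets.UTF_8));
        System.out.println("second segment starts at offset " + second.startOffset);
        System.out.println(new String(file.read(first), StandardCharsets.UTF_8));
    }
}
```

The offset and length pair is the same information createLogicalFile takes as startOffset and length, which is why a segment handle is enough to read one writer's state back out of a merged file.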
-
generatePhysicalFilePath
protected org.apache.flink.core.fs.Path generatePhysicalFilePath(org.apache.flink.core.fs.Path dirPath)
Generate a file path for a physical file.
- Parameters:
dirPath - the parent directory path for the physical file.
- Returns:
the generated file path for a physical file.
-
deletePhysicalFile
protected final void deletePhysicalFile(org.apache.flink.core.fs.Path filePath, long size)
Delete a physical file by the given file path. The deletion is performed on the I/O executor.
- Parameters:
filePath - the given file path to delete.
-
createPhysicalPool
protected final PhysicalFilePool createPhysicalPool()
Create the physical file pool according to filePoolType.
- Returns:
the physical file pool.
-
getOrCreatePhysicalFileForCheckpoint
@Nonnull protected abstract PhysicalFile getOrCreatePhysicalFileForCheckpoint(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId, CheckpointedStateScope scope) throws IOException
Get a reused physical file or create one. This will be called in the checkpoint output stream creation logic.
Basic logic of file reusing: whenever a physical file is needed, this method is called with the necessary information for acquiring a file. The file will not be reused until it is written and returned to the reuse pool by calling returnPhysicalFileForNextReuse(org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManager.SubtaskKey, long, org.apache.flink.runtime.checkpoint.filemerging.PhysicalFile).
- Parameters:
subtaskKey - the subtask key for the caller.
checkpointId - the checkpoint id.
scope - checkpoint scope.
- Returns:
the requested physical file.
- Throws:
IOException - thrown if anything goes wrong with the file system.
-
returnPhysicalFileForNextReuse
protected abstract void returnPhysicalFileForNextReuse(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId, PhysicalFile physicalFile) throws IOException
Try to return an existing physical file to the manager for next reuse. If this physical file is no longer needed (for reusing), it will be closed.
For the basic logic of file reusing, see getOrCreatePhysicalFileForCheckpoint(org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManager.SubtaskKey, long, org.apache.flink.runtime.state.CheckpointedStateScope).
- Parameters:
subtaskKey - the subtask key for the caller.
checkpointId - in which checkpoint this physical file is requested.
physicalFile - the physical file being returned.
- Throws:
IOException - thrown if anything goes wrong with the file system.
- See Also:
getOrCreatePhysicalFileForCheckpoint(SubtaskKey, long, CheckpointedStateScope)
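The get-then-return cycle described for getOrCreatePhysicalFileForCheckpoint and returnPhysicalFileForNextReuse can be sketched as a small pool. This is an assumption-laden illustration, not either subclass's strategy: PhysicalFileReuseSketch and FileModel are hypothetical, and the rule "close the file once it reaches the maximum size" is only one plausible reading of "no longer needed (for reusing)".

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical reuse pool; the closing rule is an assumption for illustration. */
final class PhysicalFileReuseSketch {

    /** Stand-in for a physical file: an id, a size, and a closed flag. */
    static final class FileModel {
        final int id;
        long size;
        boolean closed;

        FileModel(int id) {
            this.id = id;
        }
    }

    private final long maxPhysicalFileSize;
    private final Deque<FileModel> reusable = new ArrayDeque<>();
    private int nextId;

    PhysicalFileReuseSketch(long maxPhysicalFileSize) {
        this.maxPhysicalFileSize = maxPhysicalFileSize;
    }

    /** Hands out a pooled file if one exists, otherwise creates a new one. */
    synchronized FileModel getOrCreate() {
        FileModel pooled = reusable.poll();
        return pooled != null ? pooled : new FileModel(nextId++);
    }

    /** Returns a written file; it is pooled again only while below the size limit. */
    synchronized void returnForNextReuse(FileModel file) {
        if (file.size < maxPhysicalFileSize) {
            reusable.offer(file);
        } else {
            file.closed = true; // too large to accept further segments
        }
    }

    public static void main(String[] args) {
        PhysicalFileReuseSketch pool = new PhysicalFileReuseSketch(10);
        FileModel file = pool.getOrCreate();
        file.size += 4; // pretend a segment of 4 bytes was written
        pool.returnForNextReuse(file);
        System.out.println("reused same file: " + (pool.getOrCreate() == file));
    }
}
```

Because a file leaves the pool while it is being written, two writers never hold the same file at once, which matches the statement that a file is not reused until it has been written and returned.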
-
discardCheckpoint
protected void discardCheckpoint(long checkpointId) throws IOException
The callback triggered when all subtasks have discarded a checkpoint (aborted or subsumed).
- Parameters:
checkpointId - the discarded checkpoint id.
- Throws:
IOException - if anything goes wrong with the file system.
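One way to picture "triggered when all subtasks have discarded a checkpoint" is a per-checkpoint set of subtasks that have reported an abort or subsumption. The sketch below is hypothetical: DiscardTrackingSketch, its String subtask names, and the recorded list of discarded ids are assumptions, and the real bookkeeping in this class may be organized differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical tracker that fires a discard once every registered subtask reported. */
final class DiscardTrackingSketch {
    private final Set<String> registeredSubtasks = new HashSet<>();
    private final Map<Long, Set<String>> reportedBy = new HashMap<>();
    private final Set<Long> discarded = new LinkedHashSet<>();

    synchronized void register(String subtask) {
        registeredSubtasks.add(subtask);
    }

    /** Called when one subtask learns that the checkpoint was aborted or subsumed. */
    synchronized void notifyDiscarded(String subtask, long checkpointId) {
        if (discarded.contains(checkpointId)) {
            return; // already handled, ignore late or duplicate notifications
        }
        Set<String> seen = reportedBy.computeIfAbsent(checkpointId, id -> new HashSet<>());
        seen.add(subtask);
        if (seen.containsAll(registeredSubtasks)) {
            reportedBy.remove(checkpointId);
            discarded.add(checkpointId); // stand-in for invoking discardCheckpoint(checkpointId)
        }
    }

    synchronized Set<Long> discardedCheckpoints() {
        return new LinkedHashSet<>(discarded);
    }

    public static void main(String[] args) {
        DiscardTrackingSketch tracker = new DiscardTrackingSketch();
        tracker.register("subtask-1");
        tracker.register("subtask-2");
        tracker.notifyDiscarded("subtask-1", 7L);
        tracker.notifyDiscarded("subtask-2", 7L);
        System.out.println("discarded checkpoints: " + tracker.discardedCheckpoints());
    }
}
```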
-
notifyCheckpointStart
public void notifyCheckpointStart(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId)
SubtaskCheckpointCoordinatorImpl uses this method to let the file merging manager know that an ongoing checkpoint may reference the managed dirs.
- Specified by:
notifyCheckpointStart in interface FileMergingSnapshotManager
- Parameters:
subtaskKey - the subtask key identifying the subtask.
checkpointId - The ID of the checkpoint that has been started.
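The notion of a managed directory being referenced by ongoing checkpoints can be sketched as a set of checkpoint ids attached to a directory. This is a rough illustration of the idea behind DirectoryHandleWithReferenceTrack, not its actual API: DirectoryReferenceSketch and all of its method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical directory wrapper that tracks which ongoing checkpoints reference it. */
final class DirectoryReferenceSketch {
    private final String directory;
    private final Set<Long> ongoingCheckpoints = new HashSet<>();

    DirectoryReferenceSketch(String directory) {
        this.directory = directory;
    }

    /** A started checkpoint may reference the directory, so it is counted once. */
    synchronized void onCheckpointStart(long checkpointId) {
        ongoingCheckpoints.add(checkpointId);
    }

    /** A completed, aborted, or subsumed checkpoint no longer counts as ongoing. */
    synchronized void onCheckpointFinished(long checkpointId) {
        ongoingCheckpoints.remove(checkpointId);
    }

    synchronized int referenceCount() {
        return ongoingCheckpoints.size();
    }

    /** The directory is safe to clean up only when no ongoing checkpoint references it. */
    synchronized boolean isUnreferenced() {
        return ongoingCheckpoints.isEmpty();
    }

    String directory() {
        return directory;
    }

    public static void main(String[] args) {
        DirectoryReferenceSketch handle = new DirectoryReferenceSketch("/cp/job/taskowned");
        handle.onCheckpointStart(1L);
        handle.onCheckpointStart(2L);
        handle.onCheckpointFinished(1L);
        System.out.println(handle.directory() + " references: " + handle.referenceCount());
    }
}
```

Using a set of ids rather than a plain counter keeps a repeated start notification for the same checkpoint from inflating the count; whether the real class needs that protection is not stated in this page.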
-
notifyCheckpointComplete
public void notifyCheckpointComplete(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId) throws Exception
Description copied from interface: FileMergingSnapshotManager
Notifies the manager that the checkpoint with the given checkpointId completed and was committed.
- Specified by:
notifyCheckpointComplete in interface FileMergingSnapshotManager
- Parameters:
subtaskKey - the subtask key identifying the subtask.
checkpointId - The ID of the checkpoint that has been completed.
- Throws:
Exception - thrown if anything goes wrong with the listener.
-
notifyCheckpointAborted
public void notifyCheckpointAborted(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId) throws Exception
Description copied from interface: FileMergingSnapshotManager
This method is called as a notification once a distributed checkpoint has been aborted.
- Specified by:
notifyCheckpointAborted in interface FileMergingSnapshotManager
- Parameters:
subtaskKey - the subtask key identifying the subtask.
checkpointId - The ID of the checkpoint that has been aborted.
- Throws:
Exception - thrown if anything goes wrong with the listener.
-
notifyCheckpointSubsumed
public void notifyCheckpointSubsumed(FileMergingSnapshotManager.SubtaskKey subtaskKey, long checkpointId) throws Exception
Description copied from interface: FileMergingSnapshotManager
This method is called as a notification once a distributed checkpoint has been subsumed.
- Specified by:
notifyCheckpointSubsumed in interface FileMergingSnapshotManager
- Parameters:
subtaskKey - the subtask key identifying the subtask.
checkpointId - The ID of the checkpoint that has been subsumed.
- Throws:
Exception - thrown if anything goes wrong with the listener.
-
reusePreviousStateHandle
public void reusePreviousStateHandle(long checkpointId, Collection<? extends StreamStateHandle> stateHandles)
Description copied from interface: FileMergingSnapshotManager
A callback method which is called when previous state handles are reused by following checkpoint(s).
- Specified by:
reusePreviousStateHandle in interface FileMergingSnapshotManager
- Parameters:
checkpointId - the checkpoint that reuses the handles.
stateHandles - the handles to be reused.
-
couldReusePreviousStateHandle
public boolean couldReusePreviousStateHandle(StreamStateHandle stateHandle)
Description copied from interface: FileMergingSnapshotManager
Check whether previous state handles could be reused further, considering the space amplification.
- Specified by:
couldReusePreviousStateHandle in interface FileMergingSnapshotManager
- Parameters:
stateHandle - the handle to be reused.
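A space amplification check of this kind can be pictured as a ratio test. The sketch below assumes that amplification means physical bytes on disk divided by logical bytes still referenced, compared against maxSpaceAmplification; this page does not define the term, so both the formula and the SpaceAmplificationSketch helper are assumptions.

```java
/** Hypothetical ratio check; the amplification formula is an assumption. */
final class SpaceAmplificationSketch {

    /**
     * Returns true when keeping the existing files stays within the allowed
     * amplification, i.e. physicalBytes / logicalBytes <= maxSpaceAmplification.
     */
    static boolean couldReuse(long physicalBytes, long logicalBytes, float maxSpaceAmplification) {
        if (logicalBytes <= 0) {
            return false; // nothing live is referenced, so reuse would only retain dead space
        }
        double amplification = (double) physicalBytes / (double) logicalBytes;
        return amplification <= maxSpaceAmplification;
    }

    public static void main(String[] args) {
        // 150 bytes on disk for 100 live bytes, with an allowed factor of 2.
        System.out.println(couldReuse(150, 100, 2.0f));
        // 300 bytes on disk for 100 live bytes exceeds the allowed factor of 2.
        System.out.println(couldReuse(300, 100, 2.0f));
    }
}
```

When the check fails, the caller would presumably re-upload the state into a fresh file instead of reusing the old handle, so that mostly dead physical files can eventually be deleted.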
-
discardSingleLogicalFile
public void discardSingleLogicalFile(LogicalFile logicalFile, long checkpointId) throws IOException
- Throws:
IOException
-
getManagedDir
public org.apache.flink.core.fs.Path getManagedDir(FileMergingSnapshotManager.SubtaskKey subtaskKey, CheckpointedStateScope scope)
Description copied from interface: FileMergingSnapshotManager
Get the managed directory of the file-merging snapshot manager, created in FileMergingSnapshotManager.initFileSystem(org.apache.flink.core.fs.FileSystem, org.apache.flink.core.fs.Path, org.apache.flink.core.fs.Path, org.apache.flink.core.fs.Path, int) or FileMergingSnapshotManager.registerSubtaskForSharedStates(org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManager.SubtaskKey).
- Specified by:
getManagedDir in interface FileMergingSnapshotManager
- Parameters:
subtaskKey - the subtask key identifying the subtask.
scope - the checkpoint scope.
- Returns:
the managed directory for one subtask in the specified checkpoint scope.
-
getManagedDirStateHandle
public DirectoryStreamStateHandle getManagedDirStateHandle(FileMergingSnapshotManager.SubtaskKey subtaskKey, CheckpointedStateScope scope)
Description copied from interface: FileMergingSnapshotManager
Get the DirectoryStreamStateHandle of the managed directory, created in FileMergingSnapshotManager.initFileSystem(org.apache.flink.core.fs.FileSystem, org.apache.flink.core.fs.Path, org.apache.flink.core.fs.Path, org.apache.flink.core.fs.Path, int) or FileMergingSnapshotManager.registerSubtaskForSharedStates(org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManager.SubtaskKey).
- Specified by:
getManagedDirStateHandle in interface FileMergingSnapshotManager
- Parameters:
subtaskKey - the subtask key identifying the subtask.
scope - the checkpoint scope.
- Returns:
the DirectoryStreamStateHandle for one subtask in the specified checkpoint scope.
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
getId
@VisibleForTesting public String getId()
-
restoreStateHandles
public void restoreStateHandles(long checkpointId, FileMergingSnapshotManager.SubtaskKey subtaskKey, Stream<SegmentFileStateHandle> stateHandles)
Description copied from interface: FileMergingSnapshotManager
Restore and re-register the SegmentFileStateHandles into FileMergingSnapshotManager.
- Specified by:
restoreStateHandles in interface FileMergingSnapshotManager
- Parameters:
checkpointId - the restored checkpoint id.
subtaskKey - the subtask key identifying the subtask.
stateHandles - the restored segment file handles.
-
getLogicalFile
@VisibleForTesting public LogicalFile getLogicalFile(LogicalFile.LogicalFileId fileId)
-
-