Class PermanentBlobCache

  • All Implemented Interfaces:
    Closeable, AutoCloseable, JobPermanentBlobService, PermanentBlobService

    public class PermanentBlobCache
    extends AbstractBlobCache
    implements JobPermanentBlobService
    Provides a cache for permanent BLOB files, including per-job reference counting and a staged cleanup.

    When BLOBs are requested via getFile(JobID, PermanentBlobKey), the cache first attempts to serve the file from its local cache. Only if the local cache does not contain the desired BLOB does it try to download the file from a distributed HA file system (if available) or from the BLOB server.

    If the files for a job are no longer needed, they enter a staged (i.e. deferred) cleanup. Files may therefore still be accessible upon recovery and do not need to be downloaded again.
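
    The reference counting and deferred cleanup described above can be pictured with a small standalone model. This is only a sketch under stated assumptions, not the class's actual implementation: the class name JobRefCounterSketch, the String job IDs, the cleanupDelayMillis parameter, the cleanup and isTracked methods, and the explicit clock arguments are all hypothetical and chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of per-job reference counting with a deferred cleanup. */
class JobRefCounterSketch {
    /** Per-job state: number of holders and the time after which files may be removed. */
    private static final class RefCount {
        int references = 0;
        long keepUntil = -1; // -1 means no cleanup is scheduled
    }

    private final Map<String, RefCount> jobRefCounters = new HashMap<>();
    private final long cleanupDelayMillis; // assumed grace period before removal

    JobRefCounterSketch(long cleanupDelayMillis) {
        this.cleanupDelayMillis = cleanupDelayMillis;
    }

    /** Adds one holder for the job and cancels any pending cleanup. */
    synchronized void registerJob(String jobId) {
        RefCount ref = jobRefCounters.computeIfAbsent(jobId, id -> new RefCount());
        ref.references++;
        ref.keepUntil = -1;
    }

    /** Removes one holder; the last release only schedules the cleanup. */
    synchronized void releaseJob(String jobId, long nowMillis) {
        RefCount ref = jobRefCounters.get(jobId);
        if (ref == null || ref.references == 0) {
            return; // nothing to release
        }
        ref.references--;
        if (ref.references == 0) {
            ref.keepUntil = nowMillis + cleanupDelayMillis;
        }
    }

    synchronized int getNumberOfReferenceHolders(String jobId) {
        RefCount ref = jobRefCounters.get(jobId);
        return ref == null ? 0 : ref.references;
    }

    /** Drops jobs whose grace period has expired; returns how many were dropped. */
    synchronized int cleanup(long nowMillis) {
        int before = jobRefCounters.size();
        jobRefCounters.values().removeIf(
                ref -> ref.references == 0 && ref.keepUntil >= 0 && nowMillis >= ref.keepUntil);
        return before - jobRefCounters.size();
    }

    synchronized boolean isTracked(String jobId) {
        return jobRefCounters.containsKey(jobId);
    }
}
```

    In this model a job that is released and then registered again before the grace period ends keeps its entry, which mirrors the idea that files remain usable upon recovery.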

    • Constructor Detail

      • PermanentBlobCache

        public PermanentBlobCache(org.apache.flink.configuration.Configuration blobClientConfig,
                                  org.apache.flink.util.Reference<File> storageDir,
                                  BlobView blobView,
                                  @Nullable
                                  InetSocketAddress serverAddress)
                           throws IOException
        Instantiates a new cache for permanent BLOBs which are also available in an HA store.
        Parameters:
        blobClientConfig - global configuration
        storageDir - storage directory for the cached blobs
        blobView - (distributed) HA blob store file system to retrieve files from first
        serverAddress - address of the BlobServer to fetch files from, or null if none is available yet
        Throws:
        IOException - thrown if the (local or distributed) file storage cannot be created or is not usable
    • Method Detail

      • releaseJob

        public void releaseJob(org.apache.flink.api.common.JobID jobId)
        Unregisters use of job-related BLOBs and allows them to be released.
        Specified by:
        releaseJob in interface JobPermanentBlobService
        Parameters:
        jobId - ID of the job this blob belongs to
        See Also:
        registerJob(JobID)
      • getNumberOfReferenceHolders

        public int getNumberOfReferenceHolders(org.apache.flink.api.common.JobID jobId)
      • getFile

        public File getFile(org.apache.flink.api.common.JobID jobId,
                            PermanentBlobKey key)
                     throws IOException
        Returns the path to a local copy of the file associated with the provided job ID and blob key.

        The method first attempts to serve the BLOB from local storage. If the BLOB is not there, it tries to download it from the HA store or directly from the BlobServer.

        Specified by:
        getFile in interface PermanentBlobService
        Parameters:
        jobId - ID of the job this blob belongs to
        key - blob key associated with the requested file
        Returns:
        The path to the file.
        Throws:
        FileNotFoundException - if the BLOB does not exist
        IOException - if any other error occurs when retrieving the file
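
        The lookup order described for getFile (local storage first, then the HA store, then the BlobServer) can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation: BlobLookupSketch, the Source interface, and the String keys are hypothetical stand-ins, and both remote locations are modeled as plain callbacks.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical model of a local-first BLOB lookup with two remote fallbacks. */
class BlobLookupSketch {
    /** Assumed remote source: returns the BLOB bytes, or empty if it does not hold them. */
    interface Source {
        Optional<byte[]> fetch(String key) throws IOException;
    }

    private final Path localDir;
    private final Source haStore;    // may be null if no HA store is configured
    private final Source blobServer; // may be null if no server address is known yet

    BlobLookupSketch(Path localDir, Source haStore, Source blobServer) {
        this.localDir = localDir;
        this.haStore = haStore;
        this.blobServer = blobServer;
    }

    Path getFile(String key) throws IOException {
        Path local = localDir.resolve(key);
        if (Files.exists(local)) {
            return local; // 1. served from the local cache, no download needed
        }
        // 2. try the HA store, then 3. the BLOB server, in that order
        for (Source source : new Source[] {haStore, blobServer}) {
            if (source == null) {
                continue;
            }
            Optional<byte[]> data = source.fetch(key);
            if (data.isPresent()) {
                Files.write(local, data.get()); // keep a local copy for later requests
                return local;
            }
        }
        throw new FileNotFoundException("BLOB not found: " + key);
    }
}
```

        A second request for the same key is answered from the local directory in this sketch, so neither remote source is contacted again.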
      • readFile

        public byte[] readFile(org.apache.flink.api.common.JobID jobId,
                               PermanentBlobKey blobKey)
                        throws IOException
        Returns the content of the file for the BLOB with the provided job ID and blob key.

        The method will first attempt to serve the BLOB from the local cache. If the BLOB is not in the cache, the method will try to download it from the HA store, or directly from the BlobServer.

        Compared to getFile, readFile ensures that the file is read in full while holding the same write lock under which it is accessed. This avoids the scenario in which a path is returned while other threads concurrently delete the file.

        Specified by:
        readFile in interface PermanentBlobService
        Parameters:
        jobId - ID of the job this blob belongs to
        blobKey - BLOB key associated with the requested file
        Returns:
        The content of the BLOB.
        Throws:
        FileNotFoundException - if the BLOB does not exist
        IOException - if any other error occurs when retrieving the file
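
        The difference between returning a path and reading the content under the lock can be illustrated with a small model. This is a sketch under stated assumptions rather than the real code: LockedReadSketch, its delete method, and the String keys are hypothetical, and a single ReentrantReadWriteLock write lock stands in for the cache's locking.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model contrasting a path lookup with a read performed under the lock. */
class LockedReadSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Path storageDir;

    LockedReadSketch(Path storageDir) {
        this.storageDir = storageDir;
    }

    /** Returns only the path; the lock is released before the caller opens the file. */
    Path getFile(String key) throws IOException {
        lock.writeLock().lock();
        try {
            Path file = storageDir.resolve(key);
            if (!Files.exists(file)) {
                throw new FileNotFoundException(key);
            }
            return file;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Locates the file and reads it in full while still holding the lock. */
    byte[] readFile(String key) throws IOException {
        lock.writeLock().lock();
        try {
            // The write lock is reentrant, so the nested getFile call does not block.
            return Files.readAllBytes(getFile(key));
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Cleanup takes the same lock, so it cannot interleave with readFile. */
    void delete(String key) throws IOException {
        lock.writeLock().lock();
        try {
            Files.deleteIfExists(storageDir.resolve(key));
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

        In this model a path obtained from getFile can point to a file that delete has since removed, whereas readFile either returns the full content or fails with FileNotFoundException.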
      • getStorageLocation

        @VisibleForTesting
        public File getStorageLocation(org.apache.flink.api.common.JobID jobId,
                                       BlobKey key)
                                throws IOException
        Returns a file handle to the file associated with the given blob key on the blob server.
        Parameters:
        jobId - ID of the job this blob belongs to (or null if job-unrelated)
        key - identifying the file
        Returns:
        file handle to the file
        Throws:
        IOException - if creating the directory fails
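
        A minimal sketch of how such a file handle could be resolved, including the directory creation that may fail with an IOException. The directory and file names used here ("job_", "no_job", "blob_"), the class StorageLocationSketch, and the String arguments are assumptions made for illustration and are not taken from this page.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

/** Hypothetical model of resolving a BLOB's storage location below a storage directory. */
class StorageLocationSketch {
    static File getStorageLocation(File storageDir, String jobId, String key) throws IOException {
        // Job-unrelated BLOBs (jobId == null) share one directory in this sketch.
        String jobDir = (jobId == null) ? "no_job" : "job_" + jobId;
        File file = new File(new File(storageDir, jobDir), "blob_" + key);
        // Only the parent directory is created; the BLOB file itself is not.
        Files.createDirectories(file.getParentFile().toPath());
        return file;
    }
}
```

        The returned handle only names the location: in this sketch the file may not exist yet, and the caller decides whether to read from or write to it.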