Class AbstractHaServices
java.lang.Object
    org.apache.flink.runtime.highavailability.AbstractHaServices
All Implemented Interfaces:
AutoCloseable, GloballyCleanableResource, ClientHighAvailabilityServices, HighAvailabilityServices
Direct Known Subclasses:
ZooKeeperLeaderElectionHaServices
public abstract class AbstractHaServices extends Object implements HighAvailabilityServices
Abstract high availability services based on a distributed system (e.g. ZooKeeper, Kubernetes). It helps with creating all the leader election/retrieval services and with the cleanup. Subclasses must return a proper leader name from their implementations of getLeaderPathForResourceManager(), getLeaderPathForDispatcher(), getLeaderPathForJobManager(org.apache.flink.api.common.JobID), and getLeaderPathForRestServer(). The returned leader name is the ConfigMap name in Kubernetes and the child path in ZooKeeper.
close() and cleanupAllData() should be implemented to destroy the resources.
The abstract class is also responsible for determining which component service should be reused. For example, jobResultStore is created once and could be reused many times.
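The division of labour described above is a template-method pattern: subclasses supply leader names, and the base class turns them into services. The following minimal sketch models that idea in plain Java without any Flink dependency. SketchHaServices, SketchRetrievalService, SketchZooKeeperHaServices and the path strings are hypothetical names chosen for illustration; they are not Flink's API, and the real class additionally handles leader election, stores and cleanup.

```java
// Hypothetical stand-in for Flink's LeaderRetrievalService: it only records
// the leader name (ConfigMap name or ZooKeeper child path) it would watch.
record SketchRetrievalService(String leaderName) {}

// Simplified model of the pattern described above, not Flink's actual class:
// the base class builds every per-component retriever from a leader name
// that the subclass supplies through the getLeaderPathFor... hooks.
abstract class SketchHaServices {
    protected abstract String getLeaderPathForResourceManager();

    protected abstract String getLeaderPathForDispatcher();

    protected abstract String getLeaderPathForRestServer();

    protected abstract SketchRetrievalService createLeaderRetrievalService(String leaderName);

    // Concrete getters: combine the naming hook with the factory hook.
    public SketchRetrievalService getResourceManagerLeaderRetriever() {
        return createLeaderRetrievalService(getLeaderPathForResourceManager());
    }

    public SketchRetrievalService getDispatcherLeaderRetriever() {
        return createLeaderRetrievalService(getLeaderPathForDispatcher());
    }

    public SketchRetrievalService getClusterRestEndpointLeaderRetriever() {
        return createLeaderRetrievalService(getLeaderPathForRestServer());
    }
}

// Hypothetical ZooKeeper-flavoured subclass; the path strings are illustrative.
class SketchZooKeeperHaServices extends SketchHaServices {
    @Override
    protected String getLeaderPathForResourceManager() {
        return "/resource_manager";
    }

    @Override
    protected String getLeaderPathForDispatcher() {
        return "/dispatcher";
    }

    @Override
    protected String getLeaderPathForRestServer() {
        return "/rest_server";
    }

    @Override
    protected SketchRetrievalService createLeaderRetrievalService(String leaderName) {
        return new SketchRetrievalService(leaderName);
    }
}

class SketchHaServicesDemo {
    public static void main(String[] args) {
        SketchHaServices services = new SketchZooKeeperHaServices();
        System.out.println(services.getDispatcherLeaderRetriever().leaderName()); // prints /dispatcher
    }
}
```

A Kubernetes-flavoured subclass would differ only in the names it returns and in the retrieval service it creates; the concrete getters stay untouched.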
Field Summary
| Modifier and Type | Field | Description |
|---|---|---|
| protected org.apache.flink.configuration.Configuration | configuration | The runtime configuration. |
| protected Executor | ioExecutor | The executor to run external IO operations on. |
| protected org.slf4j.Logger | logger | |
Fields inherited from interface org.apache.flink.runtime.highavailability.HighAvailabilityServices
DEFAULT_JOB_ID, DEFAULT_LEADER_ID
Constructor Summary
| Modifier | Constructor | Description |
|---|---|---|
| protected | AbstractHaServices(org.apache.flink.configuration.Configuration config, LeaderElectionDriverFactory driverFactory, Executor ioExecutor, BlobStoreService blobStoreService, JobResultStore jobResultStore) | |
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| void | cleanupAllData() | Deletes all data stored by high availability services in external stores. |
| void | close() | Closes the high availability services, releasing all resources. |
| BlobStore | createBlobStore() | Creates the BLOB store in which BLOBs are stored in a highly-available fashion. |
| protected abstract CheckpointRecoveryFactory | createCheckpointRecoveryFactory() | Creates the checkpoint recovery factory for the job manager. |
| protected abstract ExecutionPlanStore | createExecutionPlanStore() | Creates the submitted execution plan store for the job manager. |
| protected abstract LeaderRetrievalService | createLeaderRetrievalService(String leaderName) | Creates a leader retrieval service with the specified leaderName. |
| CheckpointRecoveryFactory | getCheckpointRecoveryFactory() | Gets the checkpoint recovery factory for the job manager. |
| LeaderElection | getClusterRestEndpointLeaderElection() | Gets the LeaderElection for the cluster's rest endpoint. |
| LeaderRetrievalService | getClusterRestEndpointLeaderRetriever() | Gets the leader retriever for the cluster's rest endpoint. |
| LeaderElection | getDispatcherLeaderElection() | Gets the LeaderElection for the cluster's dispatcher. |
| LeaderRetrievalService | getDispatcherLeaderRetriever() | Gets the leader retriever for the dispatcher. |
| ExecutionPlanStore | getExecutionPlanStore() | Gets the submitted execution plan store for the job manager. |
| LeaderElection | getJobManagerLeaderElection(org.apache.flink.api.common.JobID jobID) | Gets the LeaderElection for the job with the given JobID. |
| LeaderRetrievalService | getJobManagerLeaderRetriever(org.apache.flink.api.common.JobID jobID) | Gets the leader retriever for the JobMaster which is responsible for the given job. |
| LeaderRetrievalService | getJobManagerLeaderRetriever(org.apache.flink.api.common.JobID jobID, String defaultJobManagerAddress) | Gets the leader retriever for the JobMaster which is responsible for the given job. |
| JobResultStore | getJobResultStore() | Gets the store that holds information about the state of finished jobs. |
| protected abstract String | getLeaderPathForDispatcher() | Gets the leader path for the Dispatcher. |
| protected abstract String | getLeaderPathForJobManager(org.apache.flink.api.common.JobID jobID) | Gets the leader path for a specific JobManager. |
| protected abstract String | getLeaderPathForResourceManager() | Gets the leader path for the ResourceManager. |
| protected abstract String | getLeaderPathForRestServer() | Gets the leader path for the RestServer. |
| LeaderElection | getResourceManagerLeaderElection() | Gets the LeaderElection for the cluster's resource manager. |
| LeaderRetrievalService | getResourceManagerLeaderRetriever() | Gets the leader retriever for the cluster's resource manager. |
| CompletableFuture<Void> | globalCleanupAsync(org.apache.flink.api.common.JobID jobID, Executor executor) | globalCleanupAsync is expected to be called from the main thread. |
| protected abstract void | internalCleanup() | Cleans up the metadata in the distributed system (e.g. ZooKeeper, Kubernetes ConfigMap). |
| protected abstract void | internalCleanupJobData(org.apache.flink.api.common.JobID jobID) | Cleans up the metadata in the distributed system (e.g. ZooKeeper, Kubernetes ConfigMap) for the specified job. |
| protected abstract void | internalClose() | Closes the components which are used for external operations (e.g. ZooKeeper client, Kubernetes client). |
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.flink.runtime.highavailability.HighAvailabilityServices
closeWithOptionalClean, getWebMonitorLeaderElection, getWebMonitorLeaderRetriever
Field Detail
-
logger
protected final org.slf4j.Logger logger
-
ioExecutor
protected final Executor ioExecutor
The executor to run external IO operations on.
-
configuration
protected final org.apache.flink.configuration.Configuration configuration
The runtime configuration.
Constructor Detail
-
AbstractHaServices
protected AbstractHaServices(org.apache.flink.configuration.Configuration config, LeaderElectionDriverFactory driverFactory, Executor ioExecutor, BlobStoreService blobStoreService, JobResultStore jobResultStore)
Method Detail
-
getResourceManagerLeaderRetriever
public LeaderRetrievalService getResourceManagerLeaderRetriever()
Description copied from interface: HighAvailabilityServices
Gets the leader retriever for the cluster's resource manager.
Specified by:
getResourceManagerLeaderRetriever in interface HighAvailabilityServices
-
getDispatcherLeaderRetriever
public LeaderRetrievalService getDispatcherLeaderRetriever()
Description copied from interface: HighAvailabilityServices
Gets the leader retriever for the dispatcher. This leader retrieval service is not always accessible.
Specified by:
getDispatcherLeaderRetriever in interface HighAvailabilityServices
-
getJobManagerLeaderRetriever
public LeaderRetrievalService getJobManagerLeaderRetriever(org.apache.flink.api.common.JobID jobID)
Description copied from interface: HighAvailabilityServices
Gets the leader retriever for the JobMaster which is responsible for the given job.
Specified by:
getJobManagerLeaderRetriever in interface HighAvailabilityServices
Parameters:
jobID - The identifier of the job.
Returns:
Leader retrieval service to retrieve the job manager for the given job.
-
getJobManagerLeaderRetriever
public LeaderRetrievalService getJobManagerLeaderRetriever(org.apache.flink.api.common.JobID jobID, String defaultJobManagerAddress)
Description copied from interface: HighAvailabilityServices
Gets the leader retriever for the JobMaster which is responsible for the given job.
Specified by:
getJobManagerLeaderRetriever in interface HighAvailabilityServices
Parameters:
jobID - The identifier of the job.
defaultJobManagerAddress - JobManager address which will be returned by a static leader retrieval service.
Returns:
Leader retrieval service to retrieve the job manager for the given job.
-
getClusterRestEndpointLeaderRetriever
public LeaderRetrievalService getClusterRestEndpointLeaderRetriever()
Description copied from interface: ClientHighAvailabilityServices
Gets the leader retriever for the cluster's rest endpoint.
Specified by:
getClusterRestEndpointLeaderRetriever in interface ClientHighAvailabilityServices
Specified by:
getClusterRestEndpointLeaderRetriever in interface HighAvailabilityServices
Returns:
The leader retriever for the cluster's rest endpoint.
-
getResourceManagerLeaderElection
public LeaderElection getResourceManagerLeaderElection()
Description copied from interface: HighAvailabilityServices
Gets the LeaderElection for the cluster's resource manager.
Specified by:
getResourceManagerLeaderElection in interface HighAvailabilityServices
-
getDispatcherLeaderElection
public LeaderElection getDispatcherLeaderElection()
Description copied from interface: HighAvailabilityServices
Gets the LeaderElection for the cluster's dispatcher.
Specified by:
getDispatcherLeaderElection in interface HighAvailabilityServices
-
getJobManagerLeaderElection
public LeaderElection getJobManagerLeaderElection(org.apache.flink.api.common.JobID jobID)
Description copied from interface: HighAvailabilityServices
Gets the LeaderElection for the job with the given JobID.
Specified by:
getJobManagerLeaderElection in interface HighAvailabilityServices
-
getClusterRestEndpointLeaderElection
public LeaderElection getClusterRestEndpointLeaderElection()
Description copied from interface: HighAvailabilityServices
Gets the LeaderElection for the cluster's rest endpoint.
Specified by:
getClusterRestEndpointLeaderElection in interface HighAvailabilityServices
-
getCheckpointRecoveryFactory
public CheckpointRecoveryFactory getCheckpointRecoveryFactory() throws Exception
Description copied from interface: HighAvailabilityServices
Gets the checkpoint recovery factory for the job manager.
Specified by:
getCheckpointRecoveryFactory in interface HighAvailabilityServices
Returns:
Checkpoint recovery factory
Throws:
Exception
-
getExecutionPlanStore
public ExecutionPlanStore getExecutionPlanStore() throws Exception
Description copied from interface: HighAvailabilityServices
Gets the submitted execution plan store for the job manager.
Specified by:
getExecutionPlanStore in interface HighAvailabilityServices
Returns:
Submitted execution plan store
Throws:
Exception - if the submitted execution plan store could not be created
-
getJobResultStore
public JobResultStore getJobResultStore() throws Exception
Description copied from interface: HighAvailabilityServices
Gets the store that holds information about the state of finished jobs.
Specified by:
getJobResultStore in interface HighAvailabilityServices
Returns:
Store of finished job results
Throws:
Exception - if the job result store could not be created
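The class description says that jobResultStore is created once and reused. The sketch below contrasts a component injected once through the constructor (as the jobResultStore constructor parameter suggests) with one obtained from an abstract factory method on every call. That the execution plan store is rebuilt on each call is an assumption made for illustration, and all Sketch* and Counting* types are hypothetical stand-ins, not Flink classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for JobResultStore and ExecutionPlanStore.
final class SketchJobResultStore {}

final class SketchExecutionPlanStore {}

abstract class SketchReuseHaServices {
    // Created by the caller once and handed in, mirroring the jobResultStore
    // constructor parameter of AbstractHaServices.
    private final SketchJobResultStore jobResultStore;

    protected SketchReuseHaServices(SketchJobResultStore jobResultStore) {
        this.jobResultStore = jobResultStore;
    }

    // Reused component: every caller receives the same instance.
    public SketchJobResultStore getJobResultStore() {
        return jobResultStore;
    }

    // Delegated component (assumption): the subclass factory runs on each call.
    public SketchExecutionPlanStore getExecutionPlanStore() throws Exception {
        return createExecutionPlanStore();
    }

    protected abstract SketchExecutionPlanStore createExecutionPlanStore() throws Exception;
}

// Counts factory invocations so the difference between the two is observable.
class CountingHaServices extends SketchReuseHaServices {
    final AtomicInteger planStoresCreated = new AtomicInteger();

    CountingHaServices() {
        super(new SketchJobResultStore());
    }

    @Override
    protected SketchExecutionPlanStore createExecutionPlanStore() {
        planStoresCreated.incrementAndGet();
        return new SketchExecutionPlanStore();
    }

    public static void main(String[] args) throws Exception {
        CountingHaServices services = new CountingHaServices();
        System.out.println(services.getJobResultStore() == services.getJobResultStore()); // prints true
        services.getExecutionPlanStore();
        services.getExecutionPlanStore();
        System.out.println(services.planStoresCreated.get()); // prints 2
    }
}
```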
-
createBlobStore
public BlobStore createBlobStore()
Description copied from interface: HighAvailabilityServices
Creates the BLOB store in which BLOBs are stored in a highly-available fashion.
Specified by:
createBlobStore in interface HighAvailabilityServices
Returns:
Blob store
-
close
public void close() throws Exception
Description copied from interface: HighAvailabilityServices
Closes the high availability services, releasing all resources.
This method does not delete or clean up any data stored in external stores (file systems, ZooKeeper, etc.). Another instance of the high availability services will be able to recover the job.
If an exception occurs while closing services, this method will attempt to continue closing the other services and report exceptions only after all services have been attempted to be closed.
Specified by:
close in interface AutoCloseable
Specified by:
close in interface HighAvailabilityServices
Throws:
Exception - Thrown if an exception occurred while closing these services.
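The close() contract above (keep closing, report afterwards) maps naturally onto Java's suppressed-exception mechanism. This is a minimal sketch under the assumption that each component can be treated as an AutoCloseable; SketchCloser and the component labels are hypothetical, and Flink's own implementation may collect the exceptions differently.

```java
import java.util.ArrayList;
import java.util.List;

final class SketchCloser {
    private SketchCloser() {}

    // Closes every component even if earlier ones fail; the first failure is
    // rethrown and later failures are attached to it as suppressed exceptions.
    static void closeAll(List<? extends AutoCloseable> components) throws Exception {
        Exception first = null;
        for (AutoCloseable component : components) {
            try {
                component.close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) {
        List<String> closed = new ArrayList<>();
        // Hypothetical components standing in for the BLOB store service,
        // the leader election service and internalClose().
        List<AutoCloseable> components = List.of(
                () -> closed.add("blobStoreService"),
                () -> { throw new IllegalStateException("leader election close failed"); },
                () -> closed.add("internalClose"));
        try {
            closeAll(components);
        } catch (Exception e) {
            System.out.println("close failed: " + e.getMessage());
        }
        System.out.println(closed); // prints [blobStoreService, internalClose]
    }
}
```

Rethrowing the first failure and attaching the rest with addSuppressed keeps every error visible in a single stack trace.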
-
cleanupAllData
public void cleanupAllData() throws Exception
Description copied from interface: HighAvailabilityServices
Deletes all data stored by high availability services in external stores.
After this method has been called, any job or session that was managed by these high availability services will be unrecoverable.
If an exception occurs during cleanup, this method will attempt to continue the cleanup and report exceptions only after all cleanup steps have been attempted.
Specified by:
cleanupAllData in interface HighAvailabilityServices
Throws:
Exception - if an error occurred while cleaning up the data stored by these services.
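The same "attempt everything, then report" shape applies to cleanupAllData(). The sketch below assumes two steps: internalCleanup() for the metadata in the distributed system, and a hypothetical cleanupBlobStore() hook for the BLOB data. The hook name and the step order are assumptions for illustration, not taken from Flink.

```java
import java.util.ArrayList;
import java.util.List;

abstract class SketchCleanupHaServices {
    // Distributed-system metadata, as in internalCleanup() on this page.
    protected abstract void internalCleanup() throws Exception;

    // Hypothetical hook standing in for deleting the BLOB store contents.
    protected abstract void cleanupBlobStore() throws Exception;

    public void cleanupAllData() throws Exception {
        Exception failure = null;
        try {
            internalCleanup();
        } catch (Exception e) {
            failure = e; // remember it, but keep cleaning up
        }
        try {
            cleanupBlobStore();
        } catch (Exception e) {
            if (failure == null) {
                failure = e;
            } else {
                failure.addSuppressed(e);
            }
        }
        if (failure != null) {
            throw failure; // reported only after every step was attempted
        }
    }

    public static void main(String[] args) {
        List<String> steps = new ArrayList<>();
        SketchCleanupHaServices services = new SketchCleanupHaServices() {
            @Override
            protected void internalCleanup() throws Exception {
                steps.add("metadata");
                throw new Exception("metadata cleanup failed");
            }

            @Override
            protected void cleanupBlobStore() {
                steps.add("blobs");
            }
        };
        try {
            services.cleanupAllData();
        } catch (Exception e) {
            System.out.println(e.getMessage()); // prints metadata cleanup failed
        }
        System.out.println(steps); // prints [metadata, blobs]
    }
}
```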
-
globalCleanupAsync
public CompletableFuture<Void> globalCleanupAsync(org.apache.flink.api.common.JobID jobID, Executor executor)
Description copied from interface: GloballyCleanableResource
globalCleanupAsync is expected to be called from the main thread. Heavy IO tasks should be outsourced into the passed cleanupExecutor (the executor parameter in this signature). Thread-safety must be ensured.
Specified by:
globalCleanupAsync in interface GloballyCleanableResource
Specified by:
globalCleanupAsync in interface HighAvailabilityServices
Parameters:
jobID - The JobID of the job for which the local data should be cleaned up.
executor - The fallback executor for IO-heavy operations.
Returns:
The cleanup result future.
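The globalCleanupAsync contract (called from the main thread, heavy IO moved onto the passed executor) can be sketched with a CompletableFuture. In this hypothetical model a String replaces JobID so the example runs without Flink, and the abstract internalCleanupJobData hook mirrors the method of the same name on this page; SketchAsyncCleanupHaServices is not a Flink class.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

abstract class SketchAsyncCleanupHaServices {
    // Hypothetical signature: a String stands in for Flink's JobID.
    protected abstract void internalCleanupJobData(String jobId) throws Exception;

    // The calling (main) thread only schedules the work; the IO-heavy part
    // runs on the given executor and its outcome is reported via the future.
    public CompletableFuture<Void> globalCleanupAsync(String jobId, Executor executor) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        executor.execute(() -> {
            try {
                internalCleanupJobData(jobId);
                result.complete(null);
            } catch (Throwable t) {
                result.completeExceptionally(t);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        SketchAsyncCleanupHaServices services = new SketchAsyncCleanupHaServices() {
            @Override
            protected void internalCleanupJobData(String jobId) throws Exception {
                if (jobId.isEmpty()) {
                    throw new Exception("unknown job");
                }
            }
        };
        services.globalCleanupAsync("job-1", ioExecutor).join(); // completes normally
        try {
            services.globalCleanupAsync("", ioExecutor).join();
        } catch (CompletionException e) {
            System.out.println(e.getCause().getMessage()); // prints unknown job
        }
        ioExecutor.shutdown();
    }
}
```

The future is completed by hand rather than through CompletableFuture.runAsync because runAsync takes a Runnable, which cannot throw the checked Exception that the cleanup hook declares.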
-
createLeaderRetrievalService
protected abstract LeaderRetrievalService createLeaderRetrievalService(String leaderName)
Creates a leader retrieval service with the specified leaderName.
Parameters:
leaderName - ConfigMap name in Kubernetes or child node path in ZooKeeper.
Returns:
A LeaderRetrievalService backed by ZooKeeper or Kubernetes.
-
createCheckpointRecoveryFactory
protected abstract CheckpointRecoveryFactory createCheckpointRecoveryFactory() throws Exception
Creates the checkpoint recovery factory for the job manager.
Returns:
Checkpoint recovery factory
Throws:
Exception
-
createExecutionPlanStore
protected abstract ExecutionPlanStore createExecutionPlanStore() throws Exception
Creates the submitted execution plan store for the job manager.
Returns:
Submitted execution plan store
Throws:
Exception - if the submitted execution plan store could not be created
-
internalClose
protected abstract void internalClose() throws Exception
Closes the components which are used for external operations (e.g. ZooKeeper client, Kubernetes client).
Throws:
Exception - if the close operation failed
-
internalCleanup
protected abstract void internalCleanup() throws Exception
Cleans up the metadata in the distributed system (e.g. ZooKeeper, Kubernetes ConfigMap).
If an exception occurs during internal cleanup, the cleanup continues in cleanupAllData() and exceptions are reported only after all cleanup steps have been attempted.
Throws:
Exception - if the cleanup operation on external storage fails.
-
internalCleanupJobData
protected abstract void internalCleanupJobData(org.apache.flink.api.common.JobID jobID) throws Exception
Cleans up the metadata in the distributed system (e.g. ZooKeeper, Kubernetes ConfigMap) for the specified job. Method implementations need to be thread-safe.
Parameters:
jobID - The identifier of the job to clean up.
Throws:
Exception - if the cleanup operation on external storage fails.
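The thread-safety requirement on internalCleanupJobData matters because cleanups for different jobs, or retries for the same job, may overlap. A minimal sketch, assuming per-job metadata can be modelled as an in-memory map: SketchJobMetadataStore and its leader node strings are hypothetical, and a real implementation would talk to ZooKeeper or the Kubernetes API instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for per-job metadata kept in ZooKeeper
// nodes or Kubernetes ConfigMaps, keyed by job id.
final class SketchJobMetadataStore {
    private final Map<String, String> leaderNodesByJob = new ConcurrentHashMap<>();

    void register(String jobId, String leaderNode) {
        leaderNodesByJob.put(jobId, leaderNode);
    }

    // Safe to call from many threads at once: ConcurrentHashMap.remove is
    // atomic, and removing an absent job is a no-op, so retries are harmless.
    void internalCleanupJobData(String jobId) {
        leaderNodesByJob.remove(jobId);
    }

    int size() {
        return leaderNodesByJob.size();
    }

    public static void main(String[] args) throws InterruptedException {
        SketchJobMetadataStore store = new SketchJobMetadataStore();
        for (int i = 0; i < 100; i++) {
            store.register("job-" + i, "/leader/job-" + i);
        }
        List<Thread> cleaners = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            String jobId = "job-" + i;
            Thread cleaner = new Thread(() -> store.internalCleanupJobData(jobId));
            cleaners.add(cleaner);
            cleaner.start();
        }
        for (Thread cleaner : cleaners) {
            cleaner.join();
        }
        System.out.println(store.size()); // prints 0
    }
}
```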
-
getLeaderPathForResourceManager
protected abstract String getLeaderPathForResourceManager()
Gets the leader path for the ResourceManager.
Returns:
The ResourceManager leader name. It is the ConfigMap name in Kubernetes or the child node path in ZooKeeper.
-
getLeaderPathForDispatcher
protected abstract String getLeaderPathForDispatcher()
Gets the leader path for the Dispatcher.
Returns:
The Dispatcher leader name. It is the ConfigMap name in Kubernetes or the child node path in ZooKeeper.
-
getLeaderPathForJobManager
protected abstract String getLeaderPathForJobManager(org.apache.flink.api.common.JobID jobID)
Gets the leader path for a specific JobManager.
Parameters:
jobID - The job ID.
Returns:
The JobManager leader name for the specified job ID. It is the ConfigMap name in Kubernetes or the child node path in ZooKeeper.
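Because the returned name doubles as a ZooKeeper child path or a Kubernetes ConfigMap name, it has to be unique per job and, for Kubernetes, a valid object name (ConfigMap names must be DNS subdomain names). The sketch below shows two hypothetical naming schemes; the layouts, the -leader suffix and the use of UUID in place of JobID are assumptions, and the DNS-1123 check is simplified (for example, it does not reject consecutive dots).

```java
import java.util.UUID;
import java.util.regex.Pattern;

final class SketchJobLeaderNames {
    // Simplified check: Kubernetes ConfigMap names must be DNS-1123 subdomains
    // (lowercase alphanumerics, '-' and '.', alphanumeric at both ends, <= 253 chars).
    private static final Pattern DNS_1123_SUBDOMAIN =
            Pattern.compile("[a-z0-9]([-a-z0-9.]*[a-z0-9])?");

    private SketchJobLeaderNames() {}

    // Hypothetical hex form of a job id; a UUID stands in for Flink's JobID.
    static String hex(UUID jobId) {
        return jobId.toString().replace("-", "");
    }

    // ZooKeeper flavour: a child path, one node per job (illustrative layout).
    static String zooKeeperPath(UUID jobId) {
        return "/" + hex(jobId) + "/job_manager";
    }

    // Kubernetes flavour: a ConfigMap name embedding the cluster id and job id.
    static String configMapName(String clusterId, UUID jobId) {
        String name = clusterId + "-" + hex(jobId) + "-leader";
        if (name.length() > 253 || !DNS_1123_SUBDOMAIN.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a valid ConfigMap name: " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        UUID jobId = UUID.randomUUID();
        System.out.println(zooKeeperPath(jobId));
        System.out.println(configMapName("my-flink-cluster", jobId));
    }
}
```

Validating the name eagerly turns an invalid cluster id into an immediate, descriptive error instead of a later rejection from the Kubernetes API server.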
-
getLeaderPathForRestServer
protected abstract String getLeaderPathForRestServer()
Gets the leader path for the RestServer.
Returns:
The RestServer leader name. It is the ConfigMap name in Kubernetes or the child node path in ZooKeeper.