Class JobGraph

  • All Implemented Interfaces:
    Serializable, ExecutionPlan

    public class JobGraph
    extends Object
    implements ExecutionPlan
    The JobGraph represents a Flink dataflow program, at the low level that the JobManager accepts. All programs from higher level APIs are transformed into JobGraphs.

    The JobGraph is a graph of vertices and intermediate results that are connected together to form a DAG. Note that iterations (feedback edges) are currently not encoded inside the JobGraph but inside certain special vertices that establish the feedback channel amongst themselves.

    The JobGraph defines the job-wide configuration settings, while each vertex and intermediate result defines the characteristics of the concrete operation and intermediate data.
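    The vertex/intermediate-result structure can be pictured with a hypothetical miniature model. MiniVertex and MiniResult are illustrative names, not Flink classes: a vertex produces results, and a downstream vertex consuming one of those results forms one edge of the DAG. The sketch does not enforce acyclicity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature model, NOT Flink's JobVertex / intermediate result classes.
final class MiniVertex {
    final String name;
    final List<MiniResult> produced = new ArrayList<>(); // results this vertex writes
    final List<MiniResult> consumed = new ArrayList<>(); // results this vertex reads

    MiniVertex(String name) {
        this.name = name;
    }

    /** Creates a new intermediate result produced by this vertex. */
    MiniResult produce() {
        MiniResult result = new MiniResult(this);
        produced.add(result);
        return result;
    }

    /** Registers this vertex as a consumer of an upstream result (one DAG edge). */
    void consume(MiniResult result) {
        consumed.add(result);
        result.consumers.add(this);
    }
}

// An intermediate result links exactly one producer to any number of consumers.
final class MiniResult {
    final MiniVertex producer;
    final List<MiniVertex> consumers = new ArrayList<>();

    MiniResult(MiniVertex producer) {
        this.producer = producer;
    }
}
```

    Wiring a source, map, sink pipeline then reads: map.consume(source.produce()); sink.consume(map.produce());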

    See Also:
    Serialized Form
    • Constructor Detail

      • JobGraph

        public JobGraph(String jobName)
        Constructs a new job graph with the given name and a random job ID. A default ExecutionConfig is set in serialized form; use setExecutionConfig(ExecutionConfig) to replace it.
        Parameters:
        jobName - The name of the job.
      • JobGraph

        public JobGraph(@Nullable
                        org.apache.flink.api.common.JobID jobId,
                        String jobName)
        Constructs a new job graph with the given job ID (or a random ID, if null is passed) and the given name. A default ExecutionConfig is set in serialized form; use setExecutionConfig(ExecutionConfig) to replace it.
        Parameters:
        jobId - The ID of the job. A random ID is generated if null is passed.
        jobName - The name of the job.
      • JobGraph

        public JobGraph(@Nullable
                        org.apache.flink.api.common.JobID jobId,
                        String jobName,
                        JobVertex... vertices)
        Constructs a new job graph with the given job ID (or a random ID, if null is passed), the given name, and the given job vertices. A default ExecutionConfig is set in serialized form; use setExecutionConfig(ExecutionConfig) to replace it.
        Parameters:
        jobId - The ID of the job. A random ID is generated if null is passed.
        jobName - The name of the job.
        vertices - The vertices to add to the graph.
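        The null handling described for the jobId parameter can be sketched as a constructor chain. SketchJobGraph is hypothetical, java.util.UUID stands in for Flink's JobID, and vertices are reduced to names; this is an illustration, not Flink's source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-in for the constructor chain; UUID replaces Flink's JobID.
final class SketchJobGraph {
    final UUID jobId;
    final String jobName;
    final List<String> vertexNames = new ArrayList<>();

    SketchJobGraph(String jobName) {
        this(null, jobName); // no ID given: take the random-ID path
    }

    SketchJobGraph(UUID jobId, String jobName) {
        this(jobId, jobName, new String[0]); // no vertices yet
    }

    SketchJobGraph(UUID jobId, String jobName, String... vertexNames) {
        // A null ID is replaced by a freshly generated random one.
        this.jobId = jobId != null ? jobId : UUID.randomUUID();
        this.jobName = jobName;
        this.vertexNames.addAll(Arrays.asList(vertexNames));
    }
}
```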
    • Method Detail

      • getJobID

        public org.apache.flink.api.common.JobID getJobID()
        Returns the ID of the job.
        Specified by:
        getJobID in interface ExecutionPlan
        Returns:
        the ID of the job
      • setJobID

        public void setJobID(org.apache.flink.api.common.JobID jobID)
        Sets the ID of the job.
      • getName

        public String getName()
        Returns the name assigned to the job graph.
        Specified by:
        getName in interface ExecutionPlan
        Returns:
        the name assigned to the job graph
      • isPartialResourceConfigured

        public boolean isPartialResourceConfigured()
        Description copied from interface: ExecutionPlan
        Checks if partial resource configuration is specified.
        Specified by:
        isPartialResourceConfigured in interface ExecutionPlan
        Returns:
        true if partial resource configuration is set; false otherwise
      • isEmpty

        public boolean isEmpty()
        Description copied from interface: ExecutionPlan
        Checks if the execution plan is empty.
        Specified by:
        isEmpty in interface ExecutionPlan
        Returns:
        true if the plan is empty; false otherwise
      • setJobConfiguration

        public void setJobConfiguration(org.apache.flink.configuration.Configuration jobConfiguration)
      • getJobConfiguration

        public org.apache.flink.configuration.Configuration getJobConfiguration()
        Returns the configuration object for this job. Job-wide parameters should be set in that configuration object.
        Specified by:
        getJobConfiguration in interface ExecutionPlan
        Returns:
        The configuration object for this job.
      • getSerializedExecutionConfig

        public org.apache.flink.util.SerializedValue<org.apache.flink.api.common.ExecutionConfig> getSerializedExecutionConfig()
        Returns the ExecutionConfig in serialized form.
        Specified by:
        getSerializedExecutionConfig in interface ExecutionPlan
        Returns:
        the serialized ExecutionConfig
      • setJobType

        public void setJobType(JobType type)
      • setDynamic

        public void setDynamic(boolean dynamic)
      • isDynamic

        public boolean isDynamic()
        Description copied from interface: ExecutionPlan
        Checks if the execution plan is dynamic.
        Specified by:
        isDynamic in interface ExecutionPlan
        Returns:
        true if the execution plan is dynamic; false otherwise
      • enableApproximateLocalRecovery

        public void enableApproximateLocalRecovery(boolean enabled)
      • isApproximateLocalRecoveryEnabled

        public boolean isApproximateLocalRecoveryEnabled()
      • setExecutionConfig

        public void setExecutionConfig(org.apache.flink.api.common.ExecutionConfig executionConfig)
                                throws IOException
        Sets the execution config. This method eagerly serializes the ExecutionConfig for future RPC transport, so later modifications of the referenced ExecutionConfig object do not affect this serialized copy.
        Parameters:
        executionConfig - The ExecutionConfig to be serialized.
        Throws:
        IOException - Thrown if the serialization of the ExecutionConfig fails
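        The effect of eager serialization can be shown with plain Java serialization. DemoConfig, SerializedSnapshot and SnapshotDemo are hypothetical stand-ins for ExecutionConfig and Flink's SerializedValue; the sketch only demonstrates that the stored bytes are a copy taken at call time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical config class standing in for ExecutionConfig.
final class DemoConfig implements Serializable {
    private static final long serialVersionUID = 1L;
    int parallelism = 1;
}

// Stores a value as bytes at construction time (not Flink's SerializedValue).
final class SerializedSnapshot<T extends Serializable> {
    private final byte[] bytes;

    SerializedSnapshot(T value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value); // captures the state as of this call
        }
        this.bytes = buffer.toByteArray();
    }

    @SuppressWarnings("unchecked")
    T deserialize() throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) in.readObject();
        }
    }
}

final class SnapshotDemo {
    /** Snapshots a config, mutates the original, and returns the parallelism seen in the copy. */
    static int parallelismAfterMutation() {
        try {
            DemoConfig config = new DemoConfig();
            config.parallelism = 4;
            SerializedSnapshot<DemoConfig> snapshot = new SerializedSnapshot<>(config);
            config.parallelism = 8; // later change: not visible in the snapshot
            return snapshot.deserialize().parallelism;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

        SnapshotDemo.parallelismAfterMutation() yields the value captured at snapshot time, not the later one.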
      • setSerializedExecutionConfig

        public void setSerializedExecutionConfig(org.apache.flink.util.SerializedValue<org.apache.flink.api.common.ExecutionConfig> serializedExecutionConfig)
      • addVertex

        public void addVertex(JobVertex vertex)
        Adds a new task vertex to the job graph if it is not already included.
        Parameters:
        vertex - the new task vertex to be added
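        The "add if not already included" behavior can be sketched with a map keyed by vertex ID. VertexRegistry is hypothetical, string IDs stand in for JobVertexID, and the sketch does not reproduce how Flink reacts when a different vertex reuses an existing ID.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry illustrating idempotent insertion; not Flink's implementation.
final class VertexRegistry {
    // LinkedHashMap keeps insertion order; keys are vertex IDs.
    private final Map<String, String> verticesById = new LinkedHashMap<>();

    /** Adds the vertex unless its ID is already registered; returns true if it was added. */
    boolean addVertex(String id, String name) {
        return verticesById.putIfAbsent(id, name) == null;
    }

    int getNumberOfVertices() {
        return verticesById.size();
    }
}
```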
      • getVertices

        public Iterable<JobVertex> getVertices()
        Returns an Iterable over all vertices registered with the job graph.
        Returns:
        an Iterable over all vertices registered with the job graph
      • getVerticesAsArray

        public JobVertex[] getVerticesAsArray()
        Returns an array of all job vertices that are registered with the job graph. The order in which the vertices appear in the array is not defined.
        Returns:
        an array of all job vertices that are registered with the job graph
      • getNumberOfVertices

        public int getNumberOfVertices()
        Returns the number of all vertices.
        Returns:
        The number of all vertices.
      • getCoLocationGroups

        public Set<CoLocationGroup> getCoLocationGroups()
        Returns all CoLocationGroup instances associated with this JobGraph.
        Returns:
        The associated CoLocationGroup instances.
      • setSnapshotSettings

        public void setSnapshotSettings(JobCheckpointingSettings settings)
        Sets the settings for asynchronous snapshots. A value of null means that snapshotting is not enabled.
        Parameters:
        settings - The snapshot settings
      • findVertexByID

        public JobVertex findVertexByID(JobVertexID id)
        Searches for a vertex with a matching ID and returns it.
        Parameters:
        id - the ID of the vertex to search for
        Returns:
        the vertex with the matching ID or null if no vertex with such ID could be found
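        Because findVertexByID returns null for an unknown ID, callers must check the result; one option is Optional.ofNullable(jobGraph.findVertexByID(id)). The following self-contained sketch uses a hypothetical VertexLookup with string IDs in place of JobGraph and JobVertexID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup that, like findVertexByID, returns null when the ID is unknown.
final class VertexLookup {
    private final Map<String, String> verticesById = new HashMap<>();

    void add(String id, String vertexName) {
        verticesById.put(id, vertexName);
    }

    /** Returns the vertex name, or null if no vertex has this ID. */
    String findVertexByID(String id) {
        return verticesById.get(id);
    }

    /** Caller-side wrapper that turns the null result into an empty Optional. */
    Optional<String> tryFindVertex(String id) {
        return Optional.ofNullable(findVertexByID(id));
    }
}
```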
      • setClasspaths

        public void setClasspaths(List<URL> paths)
        Sets the classpaths required to run the job on a task manager.
        Parameters:
        paths - paths of the directories/JAR files required to run the job on a task manager
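        setClasspaths expects java.net.URL values. A minimal sketch, assuming the entries are local files or directories: the hypothetical helper ClasspathUrls.toUrls converts each path with java.nio.file.Path.toUri(), and the resulting list could then be passed to setClasspaths.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for building a classpath as a List<URL>.
final class ClasspathUrls {
    /** Converts local file-system paths into absolute file: URLs. */
    static List<URL> toUrls(String... paths) {
        List<URL> urls = new ArrayList<>();
        for (String p : paths) {
            try {
                // Path.toUri() always yields an absolute URI, resolving relative paths.
                urls.add(Paths.get(p).toUri().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Invalid classpath entry: " + p, e);
            }
        }
        return urls;
    }
}
```

        For a directory that exists, the default file system appends a trailing slash to the URI; URLClassLoader relies on that slash to treat the entry as a directory rather than a JAR file.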
      • getClasspaths

        public List<URL> getClasspaths()
        Description copied from interface: ExecutionPlan
        Gets the classpath required for the job.
        Specified by:
        getClasspaths in interface ExecutionPlan
        Returns:
        a list of classpath URLs
      • getMaximumParallelism

        public int getMaximumParallelism()
        Gets the maximum parallelism of all operations in this job graph.
        Specified by:
        getMaximumParallelism in interface ExecutionPlan
        Returns:
        The maximum parallelism of this job graph
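        Assuming the value is the maximum of the per-vertex parallelism values (rather than the configured max parallelism that bounds rescaling), the computation reduces to a simple fold. MaxParallelism is hypothetical, and returning -1 for an empty graph is an assumed sentinel.

```java
import java.util.List;

// Hypothetical sketch: the largest parallelism among all vertices.
final class MaxParallelism {
    /** Returns the maximum of the given values, or -1 (assumed sentinel) if there are none. */
    static int of(List<Integer> vertexParallelisms) {
        int max = -1;
        for (int p : vertexParallelisms) {
            max = Math.max(max, p);
        }
        return max;
    }
}
```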
      • getVerticesSortedTopologicallyFromSources

        public List<JobVertex> getVerticesSortedTopologicallyFromSources()
                                                                  throws org.apache.flink.api.common.InvalidProgramException
        Throws:
        org.apache.flink.api.common.InvalidProgramException
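        This method carries no description. Its name suggests an ordering in which every vertex comes after all of its inputs, starting from the sources, and the declared InvalidProgramException suggests the graph is rejected when no such order exists, for example because of a cycle. Under that assumption, Kahn's algorithm is a standard way to compute such an order; TopoSort is a hypothetical sketch over vertex names, with IllegalStateException standing in for InvalidProgramException.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Kahn's algorithm over vertex names; not Flink's implementation.
final class TopoSort {
    /**
     * @param edges maps each vertex name to the names of its downstream vertices
     * @return all vertices, each placed after every vertex that feeds into it
     * @throws IllegalStateException if the graph has a cycle (stand-in for InvalidProgramException)
     */
    static List<String> fromSources(Map<String, List<String>> edges) {
        // Count the inputs of every vertex, including targets that are not keys in the map.
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String vertex : edges.keySet()) {
            inDegree.putIfAbsent(vertex, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String target : targets) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }
        // Sources (vertices without inputs) are ready first.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : inDegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String vertex = ready.poll();
            order.add(vertex);
            // Emitting a vertex satisfies one input of each downstream vertex.
            for (String target : edges.getOrDefault(vertex, Collections.<String>emptyList())) {
                if (inDegree.merge(target, -1, Integer::sum) == 0) {
                    ready.add(target);
                }
            }
        }
        // Any vertex never emitted lies on, or downstream of, a cycle.
        if (order.size() != inDegree.size()) {
            throw new IllegalStateException("The graph has a cycle; no topological order exists");
        }
        return order;
    }
}
```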
      • addJar

        public void addJar(org.apache.flink.core.fs.Path jar)
        Adds the path of a JAR file required to run the job on a task manager.
        Parameters:
        jar - path of the JAR file required to run the job on a task manager
      • getUserJars

        public List<org.apache.flink.core.fs.Path> getUserJars()
        Gets the list of assigned user jar paths.
        Specified by:
        getUserJars in interface ExecutionPlan
        Returns:
        The list of assigned user jar paths
      • addUserArtifact

        public void addUserArtifact(String name,
                                    org.apache.flink.api.common.cache.DistributedCache.DistributedCacheEntry file)
        Adds a custom file required to run the job on a task manager.
        Parameters:
        name - a name under which this artifact will be accessible through DistributedCache
        file - the DistributedCacheEntry describing the custom file required to run the job on a task manager
      • getUserArtifacts

        public Map<String, org.apache.flink.api.common.cache.DistributedCache.DistributedCacheEntry> getUserArtifacts()
        Gets the user artifacts registered with this job, keyed by the name under which each artifact is accessible through DistributedCache.
        Specified by:
        getUserArtifacts in interface ExecutionPlan
        Returns:
        a map from artifact name to the corresponding DistributedCacheEntry
      • addUserJarBlobKey

        public void addUserJarBlobKey(PermanentBlobKey key)
        Adds the BLOB referenced by the key to the JobGraph's dependencies.
        Specified by:
        addUserJarBlobKey in interface ExecutionPlan
        Parameters:
        key - the BLOB key of a JAR file required to run the job on a task manager
      • hasUsercodeJarFiles

        public boolean hasUsercodeJarFiles()
        Checks whether the JobGraph has user code JAR files attached.
        Returns:
        True, if the JobGraph has user code JAR files attached, false otherwise.
      • getUserJarBlobKeys

        public List<PermanentBlobKey> getUserJarBlobKeys()
        Returns a list of BLOB keys referring to the JAR files required to run this job.
        Specified by:
        getUserJarBlobKeys in interface ExecutionPlan
        Returns:
        list of BLOB keys referring to the JAR files required to run this job
      • setUserArtifactBlobKey

        public void setUserArtifactBlobKey(String entryName,
                                           PermanentBlobKey blobKey)
                                    throws IOException
        Description copied from interface: ExecutionPlan
        Sets a user artifact blob key for a specified user artifact.
        Specified by:
        setUserArtifactBlobKey in interface ExecutionPlan
        Parameters:
        entryName - the name of the user artifact
        blobKey - the blob key corresponding to the user artifact
        Throws:
        IOException - if an error occurs during the operation
      • setUserArtifactRemotePath

        public void setUserArtifactRemotePath(String entryName,
                                              String remotePath)
      • setJobStatusHooks

        public void setJobStatusHooks(List<org.apache.flink.core.execution.JobStatusHook> hooks)
      • getJobStatusHooks

        public List<org.apache.flink.core.execution.JobStatusHook> getJobStatusHooks()
      • setInitialClientHeartbeatTimeout

        public void setInitialClientHeartbeatTimeout(long initialClientHeartbeatTimeout)
      • getInitialClientHeartbeatTimeout

        public long getInitialClientHeartbeatTimeout()
        Description copied from interface: ExecutionPlan
        Gets the initial client heartbeat timeout.
        Specified by:
        getInitialClientHeartbeatTimeout in interface ExecutionPlan
        Returns:
        the timeout duration in milliseconds
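        The getter reports the timeout in milliseconds. Assuming setInitialClientHeartbeatTimeout takes the same unit, a java.time.Duration can be converted before the call; HeartbeatTimeouts is a hypothetical helper.

```java
import java.time.Duration;

// Hypothetical helper; assumes the setter expects milliseconds, as the getter reports.
final class HeartbeatTimeouts {
    /** Converts a Duration into milliseconds, rejecting negative values. */
    static long toMillis(Duration timeout) {
        if (timeout.isNegative()) {
            throw new IllegalArgumentException("timeout must not be negative: " + timeout);
        }
        return timeout.toMillis();
    }
}
```

        Under that assumption, a call would look like jobGraph.setInitialClientHeartbeatTimeout(HeartbeatTimeouts.toMillis(Duration.ofSeconds(30))).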