@Beta public interface SparkClientContext extends SchedulableProgramContext, RuntimeContext, DatasetContext, ClientLocalizationContext, Transactional, ServiceDiscoverer, PluginContext, WorkflowInfoProvider, SecureStore, MessagingContext, LineageRecorder, MetadataReader, MetadataWriter
A context for a Spark program to interact with CDAP. This context object will be provided to the Spark program in the ProgramLifecycle.initialize(T) call.
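As a rough sketch of where the context arrives, assuming the io.cdap.cdap.api package locations and an illustrative class name (a real program would also implement the CDAP Spark program API; only the context calls below come from this interface):

```java
import io.cdap.cdap.api.ProgramLifecycle;
import io.cdap.cdap.api.spark.SparkClientContext;

// Illustrative program class; the context is handed in before the Spark job is submitted.
public class PageRankSpark implements ProgramLifecycle<SparkClientContext> {

  @Override
  public void initialize(SparkClientContext context) throws Exception {
    // All per-run configuration of the Spark execution happens here.
    long startTime = context.getLogicalStartTime();
    System.out.println("Logical start time: " + startTime);
  }

  @Override
  public void destroy() {
    // Called after the Spark job finishes.
  }
}
```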
Modifier and Type | Method and Description |
---|---|
long | getLogicalStartTime() Returns the logical start time of this Spark job. |
Metrics | getMetrics() Returns a Metrics which can be used to emit custom metrics. |
SparkSpecification | getSpecification() |
ProgramState | getState() Returns the state of the Spark program. |
void | setDriverResources(Resources resources) Sets the resources requirement for the Spark driver process. |
void | setExecutorResources(Resources resources) Sets the resources, such as memory and virtual cores, to use for each executor process for the Spark program. |
void | setPySparkScript(String script, Iterable<URI> additionalPythonFiles) Sets a python script to run using PySpark. |
void | setPySparkScript(String script, URI... additionalPythonFiles) Sets a python script to run using PySpark. |
void | setPySparkScript(URI scriptLocation, Iterable<URI> additionalPythonFiles) Sets a location that points to a python script to run using PySpark. |
void | setPySparkScript(URI scriptLocation, URI... additionalPythonFiles) Sets a location that points to a python script to run using PySpark. |
<T> void | setSparkConf(T sparkConf) Sets a SparkConf to be used for the Spark execution. |
Methods inherited from interface SchedulableProgramContext: getTriggeringScheduleInfo
Methods inherited from interface RuntimeContext: getAdmin, getApplicationSpecification, getClusterName, getDataTracer, getNamespace, getRunId, getRuntimeArguments
Methods inherited from interface DatasetContext: discardDataset, getDataset, getDataset, getDataset, getDataset, releaseDataset
Methods inherited from interface ClientLocalizationContext: localize, localize
Methods inherited from interface Transactional: execute, execute
Methods inherited from interface ServiceDiscoverer: getServiceURL, getServiceURL, getServiceURL, openConnection
Methods inherited from interface PluginContext: getPluginProperties, getPluginProperties, loadPluginClass, newPluginInstance, newPluginInstance
Methods inherited from interface FeatureFlagsProvider: isFeatureEnabled
Methods inherited from interface WorkflowInfoProvider: getWorkflowInfo, getWorkflowToken
Methods inherited from interface SecureStore: get, getData, getMetadata, list
Methods inherited from interface MessagingContext: getDirectMessagePublisher, getMessageFetcher, getMessagePublisher
Methods inherited from interface LineageRecorder: flushLineage, record
Methods inherited from interface MetadataReader: getMetadata, getMetadata
Methods inherited from interface MetadataWriter: addProperties, addTags, addTags, removeMetadata, removeProperties, removeProperties, removeTags, removeTags
SparkSpecification getSpecification()

Returns: The specification used to configure this Spark job instance.

long getLogicalStartTime()

Returns the logical start time of this Spark job.
void setDriverResources(Resources resources)

Sets the resources requirement for the Spark driver process.

Parameters: resources - Resources that the driver should use

void setExecutorResources(Resources resources)

Sets the resources, such as memory and virtual cores, to use for each executor process for the Spark program.

Parameters: resources - Resources that each executor should use
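A minimal sketch of tuning resources from initialize(), assuming the Resources(memoryMB, virtualCores) constructor and an illustrative class name:

```java
import io.cdap.cdap.api.ProgramLifecycle;
import io.cdap.cdap.api.Resources;
import io.cdap.cdap.api.spark.SparkClientContext;

// Illustrative class; only setDriverResources/setExecutorResources come from this interface.
public class ResourceTuningSpark implements ProgramLifecycle<SparkClientContext> {

  @Override
  public void initialize(SparkClientContext context) throws Exception {
    // Assumed Resources(memoryMB, virtualCores) constructor: 1 GB / 1 vcore for the driver.
    context.setDriverResources(new Resources(1024, 1));
    // 2 GB / 2 vcores for each executor of this Spark program.
    context.setExecutorResources(new Resources(2048, 2));
  }

  @Override
  public void destroy() {
  }
}
```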
<T> void setSparkConf(T sparkConf)

Sets a SparkConf to be used for the Spark execution. Only configurations set inside the ProgramLifecycle.initialize(T) call will affect the Spark execution.

Type Parameters: T - the SparkConf type
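A minimal sketch of supplying an org.apache.spark.SparkConf through setSparkConf(T); the class name and the specific Spark properties are illustrative:

```java
import io.cdap.cdap.api.ProgramLifecycle;
import io.cdap.cdap.api.spark.SparkClientContext;
import org.apache.spark.SparkConf;

// Illustrative class; the SparkConf entries are ordinary Spark properties.
public class ConfiguredSpark implements ProgramLifecycle<SparkClientContext> {

  @Override
  public void initialize(SparkClientContext context) throws Exception {
    SparkConf conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.sql.shuffle.partitions", "64");
    // Only configurations set inside initialize() affect the Spark execution.
    context.setSparkConf(conf);
  }

  @Override
  public void destroy() {
  }
}
```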
void setPySparkScript(String script, URI... additionalPythonFiles)

Sets a python script to run using PySpark.

See Also: setPySparkScript(String, Iterable)
void setPySparkScript(String script, Iterable<URI> additionalPythonFiles)

Sets a python script to run using PySpark. If this is set, it takes precedence over the main class set via SparkConfigurer.setMainClassName(String), and the python script is the one that always gets executed.

Parameters:
script - the python script to run using PySpark
additionalPythonFiles - a list of additional python files to be included in the PYTHONPATH. Each can be local by having a file scheme or remote, for example, by having an hdfs or http scheme. If the URI has no scheme, it will be defaulted based on the execution environment, which would be a local file in the local sandbox, and remote if running in distributed mode. Note that in distributed mode, using the file scheme means the file has to be present on every node of the cluster, since the Spark program can be submitted from any node.
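A minimal sketch of running an inline PySpark script; the class name, script body, and file locations are hypothetical:

```java
import io.cdap.cdap.api.ProgramLifecycle;
import io.cdap.cdap.api.spark.SparkClientContext;
import java.net.URI;
import java.util.Arrays;

// Illustrative class; the Python source and the extra module URIs are made up for the example.
public class PySparkCountProgram implements ProgramLifecycle<SparkClientContext> {

  @Override
  public void initialize(SparkClientContext context) throws Exception {
    String script =
        "from pyspark import SparkContext\n"
      + "sc = SparkContext()\n"
      + "print(sc.parallelize(range(100)).count())\n";

    // Extra python files for the PYTHONPATH; a URI without a scheme is defaulted
    // according to the execution environment, as described above.
    context.setPySparkScript(script,
        Arrays.asList(URI.create("hdfs:///libs/helpers.py"),
                      URI.create("file:///opt/shared/utils.py")));
  }

  @Override
  public void destroy() {
  }
}
```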
void setPySparkScript(URI scriptLocation, URI... additionalPythonFiles)

Sets a location that points to a python script to run using PySpark.

See Also: setPySparkScript(URI, Iterable)
void setPySparkScript(URI scriptLocation, Iterable<URI> additionalPythonFiles)

Sets a location that points to a python script to run using PySpark. If this is set, it takes precedence over the main class set via SparkConfigurer.setMainClassName(String), and the python script is the one that always gets executed.

Parameters:
scriptLocation - location of the python script. It can be local by having a file scheme or remote, for example, by having an hdfs or http scheme. If the URI has no scheme, it will be defaulted based on the execution environment, which would be a local file in the local sandbox, and remote if running in distributed mode. Note that in distributed mode, using the file scheme means the file has to be present on every node of the cluster, since the Spark program can be submitted from any node.
additionalPythonFiles - a list of additional python files to be included in the PYTHONPATH. Each location can be local or remote, with the same definition as the scriptLocation parameter.
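Similarly, a sketch that points at a remotely stored script instead of passing it inline; the class name and locations are hypothetical:

```java
import io.cdap.cdap.api.ProgramLifecycle;
import io.cdap.cdap.api.spark.SparkClientContext;
import java.net.URI;
import java.util.Collections;

// Illustrative class; the HDFS paths are placeholders.
public class RemotePySparkProgram implements ProgramLifecycle<SparkClientContext> {

  @Override
  public void initialize(SparkClientContext context) throws Exception {
    // The script lives on HDFS, so it is reachable no matter which node submits the job.
    URI scriptLocation = URI.create("hdfs:///jobs/wordcount.py");
    context.setPySparkScript(scriptLocation,
        Collections.singletonList(URI.create("hdfs:///jobs/wordcount_utils.py")));
  }

  @Override
  public void destroy() {
  }
}
```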
ProgramState getState()

Returns the state of the Spark program.
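A sketch of checking the state after the run, assuming ProgramState exposes a getStatus() accessor and that the context saved in initialize() remains usable in destroy():

```java
import io.cdap.cdap.api.ProgramLifecycle;
import io.cdap.cdap.api.ProgramStatus;
import io.cdap.cdap.api.spark.SparkClientContext;

// Illustrative class; getStatus() and ProgramStatus.COMPLETED are assumed accessors/values.
public class StateAwareSpark implements ProgramLifecycle<SparkClientContext> {

  private SparkClientContext context;

  @Override
  public void initialize(SparkClientContext context) throws Exception {
    // Keep a reference so the state can be inspected once the job has finished.
    this.context = context;
  }

  @Override
  public void destroy() {
    if (context.getState().getStatus() == ProgramStatus.COMPLETED) {
      System.out.println("Spark run completed successfully");
    }
  }
}
```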
Copyright © 2024 Cask Data, Inc. Licensed under the Apache License, Version 2.0.