Packages

  • package root
    Definition Classes
    root
  • package org
    Definition Classes
    root
  • package apache
    Definition Classes
    org
  • package spark

    Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.

    In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions.
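
    A minimal sketch of these entry points (using a local master purely for illustration):

      import org.apache.spark.{SparkConf, SparkContext}

      // SparkContext is the main entry point to Spark.
      val sc = new SparkContext(new SparkConf().setAppName("example").setMaster("local[*]"))

      val pairs = sc.parallelize(Seq((1, 2), (1, 3), (2, 4)))  // RDD[(Int, Int)]

      // groupByKey and join come from PairRDDFunctions, made available
      // on RDD[(Int, Int)] through the implicit conversions.
      val grouped = pairs.groupByKey()
      val joined  = pairs.join(grouped)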

    Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.

    Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.

    Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower-level interfaces. These are subject to change or removal in minor releases.

    Definition Classes
    apache
  • package api
    Definition Classes
    spark
  • package broadcast

    Spark's broadcast variables, used to broadcast immutable datasets to all nodes.
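
    A minimal sketch of their use (sc is an existing SparkContext):

      // Ship the read-only map to each executor once, rather than with
      // every task that uses it.
      val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))
      val mapped = sc.parallelize(Seq("a", "b")).map(k => lookup.value(k))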

    Definition Classes
    spark
  • package deploy
    Definition Classes
    spark
  • package executor

    Executor components used with various cluster managers. See org.apache.spark.executor.Executor.

    Definition Classes
    spark
  • package input
    Definition Classes
    spark
  • package internal
    Definition Classes
    spark
  • package io

    IO codecs used for compression. See org.apache.spark.io.CompressionCodec.

    Definition Classes
    spark
  • package mapred
    Definition Classes
    spark
  • package memory

    This package implements Spark's memory management system. This system consists of two main components, a JVM-wide memory manager and a per-task manager:

    • org.apache.spark.memory.MemoryManager manages Spark's overall memory usage within a JVM. This component implements the policies for dividing the available memory across tasks and for allocating memory between storage (memory used for caching and data transfer) and execution (memory used by computations, such as shuffles, joins, sorts, and aggregations).
    • org.apache.spark.memory.TaskMemoryManager manages the memory allocated by individual tasks. Tasks interact with TaskMemoryManager and never directly interact with the JVM-wide MemoryManager.

    Internally, each of these components has additional abstractions for memory bookkeeping:

    • org.apache.spark.memory.MemoryConsumers are clients of the TaskMemoryManager and correspond to individual operators and data structures within a task. The TaskMemoryManager receives memory allocation requests from MemoryConsumers and issues callbacks to consumers in order to trigger spilling when running low on memory.
    • org.apache.spark.memory.MemoryPools are a bookkeeping abstraction used by the MemoryManager to track the division of memory between storage and execution.

    Diagrammatically:

                                                           +---------------------------+
    +-------------+                                        |       MemoryManager       |
    | MemConsumer |----+                                   |                           |
    +-------------+    |    +-------------------+          |  +---------------------+  |
                       +--->| TaskMemoryManager |----+     |  |OnHeapStorageMemPool |  |
    +-------------+    |    +-------------------+    |     |  +---------------------+  |
    | MemConsumer |----+                             |     |                           |
    +-------------+         +-------------------+    |     |  +---------------------+  |
                            | TaskMemoryManager |----+     |  |OffHeapStorageMemPool|  |
                            +-------------------+    |     |  +---------------------+  |
                                                     +---->|                           |
                                     *               |     |  +---------------------+  |
                                     *               |     |  |OnHeapExecMemPool    |  |
    +-------------+                  *               |     |  +---------------------+  |
    | MemConsumer |----+                             |     |                           |
    +-------------+    |    +-------------------+    |     |  +---------------------+  |
                       +--->| TaskMemoryManager |----+     |  |OffHeapExecMemPool   |  |
                            +-------------------+          |  +---------------------+  |
                                                           |                           |
                                                           +---------------------------+

    There is one implementation of org.apache.spark.memory.MemoryManager:

    • org.apache.spark.memory.UnifiedMemoryManager enforces soft boundaries between storage and execution memory, allowing requests for memory in one region to be fulfilled by borrowing memory from the other.
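
    As an illustration of these relationships, a minimal sketch of a custom MemoryConsumer; note that these are internal APIs, and the constructor arguments and spill signature below are assumptions for illustration only:

      import org.apache.spark.memory.{MemoryConsumer, MemoryMode, TaskMemoryManager}

      // Sketch only: real consumers are Spark-internal operators such as
      // sorters and hash maps.
      class SketchConsumer(tmm: TaskMemoryManager)
          extends MemoryConsumer(tmm, tmm.pageSizeBytes(), MemoryMode.ON_HEAP) {

        // Callback issued by the TaskMemoryManager when memory runs low;
        // a real consumer would spill buffered data to disk and free it.
        override def spill(size: Long, trigger: MemoryConsumer): Long = 0L

        def reserve(numBytes: Long): Unit = {
          // Request execution memory; this may trigger spill() on this
          // or other consumers in the same task.
          val granted = acquireMemory(numBytes)
          // ... use the memory, then release it:
          freeMemory(granted)
        }
      }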
    Definition Classes
    spark
  • package metrics
    Definition Classes
    spark
  • package network
    Definition Classes
    spark
  • package partial

    Support for approximate results. This provides a convenient API and implementation for approximate calculations.

    Definition Classes
    spark
    See also

    org.apache.spark.rdd.RDD.countApprox
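
    For example (a sketch; sc is an existing SparkContext):

      // Ask for a count within 500 ms; the PartialResult's initial value
      // is a BoundedDouble carrying a confidence interval.
      val rdd = sc.parallelize(1 to 1000000)
      val approx = rdd.countApprox(timeout = 500L, confidence = 0.90)
      val b = approx.initialValue
      println(s"count ~ ${b.mean}, ${b.confidence} interval [${b.low}, ${b.high}]")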

  • package rdd

    Provides several RDD implementations. See org.apache.spark.rdd.RDD.

    Definition Classes
    spark
  • package resource
    Definition Classes
    spark
  • ExecutorResourceRequest
  • ExecutorResourceRequests
  • ResourceDiscoveryScriptPlugin
  • ResourceID
  • ResourceInformation
  • ResourceProfile
  • ResourceProfileBuilder
  • ResourceRequest
  • TaskResourceRequest
  • TaskResourceRequests
  • package scheduler

    Spark's scheduling components. This includes the org.apache.spark.scheduler.DAGScheduler and lower level org.apache.spark.scheduler.TaskScheduler.

    Definition Classes
    spark
  • package security
    Definition Classes
    spark
  • package serializer

    Pluggable serializers for RDD and shuffle data.

    Definition Classes
    spark
    See also

    org.apache.spark.serializer.Serializer
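
    For instance, a sketch of plugging in the Kryo serializer through configuration (the Point class is illustrative):

      import org.apache.spark.SparkConf

      case class Point(x: Double, y: Double)

      val conf = new SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        // Registering classes up front avoids writing full class names
        // alongside each serialized object.
        .registerKryoClasses(Array(classOf[Point]))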

  • package shuffle
    Definition Classes
    spark
  • package status
    Definition Classes
    spark
  • package storage
    Definition Classes
    spark
  • package unsafe
    Definition Classes
    spark
  • package util

    Spark utilities.

    Definition Classes
    spark
org.apache.spark

package resource

Type Members

  1. class ExecutorResourceRequest extends Serializable

    An Executor resource request. This is used in conjunction with the ResourceProfile to programmatically specify the resources needed for an RDD that will be applied at the stage level.

    This is used to specify what the resource requirements are for an Executor and how Spark can find out specific details about those resources. Not all the parameters are required for every resource type. Resources like GPUs are supported and have the same limitations as the global spark configs spark.executor.resource.gpu.*. The amount, discoveryScript, and vendor parameters for resources are the same parameters a user would specify through the configs: spark.executor.resource.{resourceName}.{amount, discoveryScript, vendor}.

    For instance, suppose a user wants to allocate an Executor with GPU resources on YARN. The user has to specify the resource name (gpu) and the amount (the number of GPUs per Executor), and would specify a discovery script so that when the Executor starts up it can discover which GPU addresses are available for it to use, because YARN doesn't tell Spark that. The vendor parameter would not be used, because it is specific to Kubernetes.

    See the configuration and cluster specific docs for more details.

    Use ExecutorResourceRequests class as a convenience API.
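
    A sketch of the YARN GPU example above using the convenience API (the discovery script path is hypothetical):

      import org.apache.spark.resource.ExecutorResourceRequests

      val execReqs = new ExecutorResourceRequests()
        .cores(4)
        .memory("4g")
        // Name, amount, and discovery script are set; vendor is omitted
        // because it is Kubernetes-specific.
        .resource("gpu", amount = 2, discoveryScript = "/opt/getGpus.sh")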

    Annotations
    @Evolving() @Since( "3.1.0" )
  2. class ExecutorResourceRequests extends Serializable

    A set of Executor resource requests. This is used in conjunction with the ResourceProfile to programmatically specify the resources needed for an RDD that will be applied at the stage level.

    Annotations
    @Evolving() @Since( "3.1.0" )
  3. class ResourceDiscoveryScriptPlugin extends ResourceDiscoveryPlugin with Logging

    The default plugin that is loaded into a Spark application to control how custom resources are discovered. This executes the discovery script specified by the user, parses the JSON output, and constructs ResourceInformation objects from it. If the user specifies custom plugins, this is the last one to be executed, and it throws if the resource isn't discovered.

    Annotations
    @DeveloperApi()
    Since

    3.0.0

  4. class ResourceID extends AnyRef

    Resource identifier.

    Annotations
    @DeveloperApi()
    Since

    3.0.0

  5. class ResourceInformation extends Serializable

    Class to hold information about a type of resource. A resource could be a GPU, FPGA, etc. The array of addresses is resource specific, and it is up to the user to interpret the addresses.

    One example is GPUs, where the addresses would be the indices of the GPUs.
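
    For instance, a task can look up the addresses assigned to it through the TaskContext (a sketch; rdd is an existing RDD):

      import org.apache.spark.TaskContext

      val withGpus = rdd.mapPartitions { iter =>
        // resources() returns a Map[String, ResourceInformation].
        val gpuAddrs = TaskContext.get().resources()("gpu").addresses
        iter  // e.g. pin this partition's work to the GPUs at gpuAddrs
      }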

    Annotations
    @Evolving()
    Since

    3.0.0

  6. class ResourceProfile extends Serializable with Logging

    Resource profile to associate with an RDD. A ResourceProfile allows the user to specify executor and task requirements for an RDD that will get applied during a stage. This allows the user to change the resource requirements between stages. A ResourceProfile is meant to be immutable, so a user cannot change it after building it; users should use ResourceProfileBuilder to build it.

    Annotations
    @Evolving() @Since( "3.1.0" )
  7. class ResourceProfileBuilder extends AnyRef

    Resource profile builder to build a ResourceProfile to associate with an RDD. A ResourceProfile allows the user to specify executor and task resource requirements for an RDD that will get applied during a stage. This allows the user to change the resource requirements between stages.
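
    A sketch of building a profile and applying it to an RDD (the discovery script path is hypothetical):

      import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

      val execReqs = new ExecutorResourceRequests()
        .cores(4)
        .resource("gpu", amount = 2, discoveryScript = "/opt/getGpus.sh")
      val taskReqs = new TaskResourceRequests().cpus(1).resource("gpu", 1)

      val profile = new ResourceProfileBuilder()
        .require(execReqs)
        .require(taskReqs)
        .build

      rdd.withResources(profile)  // applied to the stages that compute rdd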

    Annotations
    @Evolving() @Since( "3.1.0" )
  8. class ResourceRequest extends AnyRef

    Class that represents a resource request.

    This class is used when discovering resources (via the discovery script), and by the context as it parses the configuration for a ResourceID.

    Annotations
    @DeveloperApi()
    Since

    3.0.0

  9. class TaskResourceRequest extends Serializable

    A task resource request. This is used in conjunction with the ResourceProfile to programmatically specify the resources needed for an RDD that will be applied at the stage level.

    Use TaskResourceRequests class as a convenience API.

    Annotations
    @Evolving() @Since( "3.1.0" )
  10. class TaskResourceRequests extends Serializable

    A set of task resource requests. This is used in conjunction with the ResourceProfile to programmatically specify the resources needed for an RDD that will be applied at the stage level.

    Annotations
    @Evolving() @Since( "3.1.0" )

Value Members

  1. object ResourceProfile extends Logging with Serializable
