Spark Project Core 3.1.1-hadoop-2.7 API

package spark
Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.
In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions.
Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.
Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.
Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower-level interfaces. These are subject to change or removal in minor releases.


package memory

This package implements Spark's memory management system. This system consists of two main components, a JVM-wide memory manager and a per-task manager:

org.apache.spark.memory.MemoryManager manages Spark's overall memory usage within a JVM. This component implements the policies for dividing the available memory across tasks and for allocating memory between storage (memory used for caching and data transfer) and execution (memory used by computations, such as shuffles, joins, sorts, and aggregations).
org.apache.spark.memory.TaskMemoryManager manages the memory allocated by individual tasks. Tasks interact with TaskMemoryManager and never directly interact with the JVM-wide MemoryManager.

Internally, each of these components has additional abstractions for memory bookkeeping:

org.apache.spark.memory.MemoryConsumers are clients of the TaskMemoryManager and correspond to individual operators and data structures within a task. The TaskMemoryManager receives memory allocation requests from MemoryConsumers and issues callbacks to consumers in order to trigger spilling when running low on memory.
org.apache.spark.memory.MemoryPools are a bookkeeping abstraction used by the MemoryManager to track the division of memory between storage and execution.

Diagrammatically:

                                                       +---------------------------+
+-------------+                                        |       MemoryManager       |
| MemConsumer |----+                                   |                           |
+-------------+    |    +-------------------+          |  +---------------------+  |
                   +--->| TaskMemoryManager |----+     |  |OnHeapStorageMemPool |  |
+-------------+    |    +-------------------+    |     |  +---------------------+  |
| MemConsumer |----+                             |     |                           |
+-------------+         +-------------------+    |     |  +---------------------+  |
                        | TaskMemoryManager |----+     |  |OffHeapStorageMemPool|  |
                        +-------------------+    |     |  +---------------------+  |
                                                 +---->|                           |
                                 *               |     |  +---------------------+  |
                                 *               |     |  |OnHeapExecMemPool    |  |
+-------------+                  *               |     |  +---------------------+  |
| MemConsumer |----+                             |     |                           |
+-------------+    |    +-------------------+    |     |  +---------------------+  |
                   +--->| TaskMemoryManager |----+     |  |OffHeapExecMemPool   |  |
                        +-------------------+          |  +---------------------+  |
                                                       |                           |
                                                       +---------------------------+

There is one implementation of org.apache.spark.memory.MemoryManager:

org.apache.spark.memory.UnifiedMemoryManager enforces soft boundaries between storage and execution memory, allowing requests for memory in one region to be fulfilled by borrowing memory from the other.
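
The soft boundary described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's implementation: the class `SoftBoundarySketch` and its methods are hypothetical names, it models only the borrowing of free memory between the two regions, and it leaves out anything else a real memory manager would do (such as reclaiming memory that is already in use).

```java
// Toy model of a soft boundary between storage and execution memory.
// All names here are hypothetical; this is not Spark's UnifiedMemoryManager.
final class SoftBoundarySketch {
    private long storagePoolSize;   // bytes currently assigned to the storage region
    private long executionPoolSize; // bytes currently assigned to the execution region
    private long storageUsed;
    private long executionUsed;

    SoftBoundarySketch(long maxMemory, long initialStorageRegion) {
        this.storagePoolSize = initialStorageRegion;
        this.executionPoolSize = maxMemory - initialStorageRegion;
    }

    // Grants up to `bytes` of execution memory, borrowing free storage memory if needed.
    long acquireExecution(long bytes) {
        long freeExecution = executionPoolSize - executionUsed;
        if (bytes > freeExecution) {
            long freeStorage = storagePoolSize - storageUsed;
            long borrowed = Math.min(bytes - freeExecution, freeStorage);
            storagePoolSize -= borrowed;   // the boundary moves toward storage
            executionPoolSize += borrowed;
        }
        long granted = Math.min(bytes, executionPoolSize - executionUsed);
        executionUsed += granted;
        return granted;
    }

    // Grants storage memory only if the whole request fits, borrowing free execution memory if needed.
    boolean acquireStorage(long bytes) {
        long freeStorage = storagePoolSize - storageUsed;
        if (bytes > freeStorage) {
            long freeExecution = executionPoolSize - executionUsed;
            long borrowed = Math.min(bytes - freeStorage, freeExecution);
            executionPoolSize -= borrowed; // the boundary moves toward execution
            storagePoolSize += borrowed;
        }
        if (bytes > storagePoolSize - storageUsed) {
            return false;
        }
        storageUsed += bytes;
        return true;
    }

    public static void main(String[] args) {
        SoftBoundarySketch memory = new SoftBoundarySketch(100, 50);
        System.out.println(memory.acquireExecution(70)); // 70: 20 bytes borrowed from storage
        System.out.println(memory.acquireStorage(40));   // false: only 30 bytes remain for storage
        System.out.println(memory.acquireStorage(30));   // true
    }
}
```

In this model neither region has a fixed size; a request in one region succeeds as long as the other region has unused bytes to lend.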

Definition Classes: spark

MemoryConsumer
MemoryMode
SparkOutOfMemoryError
TaskMemoryManager
TooLargePageException
UnifiedMemoryManager

org.apache.spark.memory

TaskMemoryManager

class TaskMemoryManager extends AnyRef

Manages the memory allocated by an individual task.

Most of the complexity in this class deals with encoding of off-heap addresses into 64-bit longs. In off-heap mode, memory can be directly addressed with 64-bit longs. In on-heap mode, memory is addressed by the combination of a base Object reference and a 64-bit offset within that object. This is a problem when we want to store pointers to data structures inside of other structures, such as record pointers inside hashmaps or sorting buffers. Even if we decided to use 128 bits to address memory, we can't just store the address of the base object since it's not guaranteed to remain stable as the heap gets reorganized due to GC.

Instead, we use the following approach to encode record pointers in 64-bit longs: for off-heap mode, just store the raw address, and for on-heap mode use the upper 13 bits of the address to store a "page number" and the lower 51 bits to store an offset within this page. These page numbers are used to index into a "page table" array inside of the MemoryManager in order to retrieve the base object.

This allows us to address 8192 pages. In on-heap mode, the maximum page size is limited by the maximum size of a long[] array, allowing us to address 8192 * (2^31 - 1) * 8 bytes, which is approximately 140 terabytes of memory.
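
The 13-bit / 51-bit split described above can be sketched with plain bit arithmetic. This is a minimal illustration, assuming only the layout stated in the text; the class `PageAddressSketch`, its constants, and its method names are hypothetical and are not part of the Spark API. It also reproduces the address-space arithmetic from the paragraph above.

```java
// Sketch of packing a 13-bit page number and a 51-bit offset into one 64-bit long.
// Names are hypothetical; only the bit layout comes from the description above.
final class PageAddressSketch {
    static final int PAGE_NUMBER_BITS = 13;
    static final int OFFSET_BITS = 64 - PAGE_NUMBER_BITS;          // 51
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;       // lower 51 bits set
    static final int MAX_PAGES = 1 << PAGE_NUMBER_BITS;            // 8192

    // Upper 13 bits hold the page number, lower 51 bits hold the offset in the page.
    static long encode(int pageNumber, long offsetInPage) {
        if (pageNumber < 0 || pageNumber >= MAX_PAGES) {
            throw new IllegalArgumentException("page number out of range: " + pageNumber);
        }
        if (offsetInPage < 0 || offsetInPage > OFFSET_MASK) {
            throw new IllegalArgumentException("offset out of range: " + offsetInPage);
        }
        return (((long) pageNumber) << OFFSET_BITS) | offsetInPage;
    }

    // Unsigned shift, because page numbers >= 4096 set the sign bit of the long.
    static int decodePageNumber(long address) {
        return (int) (address >>> OFFSET_BITS);
    }

    static long decodeOffset(long address) {
        return address & OFFSET_MASK;
    }

    // 8192 pages, each at most a long[] of (2^31 - 1) elements of 8 bytes.
    static long maxAddressableBytes() {
        return MAX_PAGES * ((1L << 31) - 1) * 8L;
    }

    public static void main(String[] args) {
        long address = encode(8191, 4096L);
        System.out.println(decodePageNumber(address)); // 8191
        System.out.println(decodeOffset(address));     // 4096
        System.out.println(maxAddressableBytes());     // 140737488289792, about 140 TB
    }
}
```

The page number recovered by `decodePageNumber` would then serve as the index into the page table to find the base object, while the offset is applied relative to that object.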

Linear Supertypes

AnyRef, Any

Instance Constructors

new TaskMemoryManager(memoryManager: MemoryManager, taskAttemptId: Long)

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def acquireExecutionMemory(required: Long, consumer: MemoryConsumer): Long
Acquire N bytes of memory for a consumer. If there is not enough memory, it will call spill() on consumers to release more memory.
returns
number of bytes successfully granted (<= N).
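
The acquire-then-spill behaviour described for acquireExecutionMemory can be pictured with a toy model. This is a sketch under stated assumptions rather than the actual algorithm: `SpillSketch` and `ToyConsumer` are hypothetical names, the order in which consumers are asked to spill is arbitrary here, and the real class may choose spill candidates differently. The one property it is meant to show is that the granted amount never exceeds the request.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: when free memory runs short, other consumers are asked to spill.
// All names are hypothetical; this is not Spark's TaskMemoryManager.
final class SpillSketch {
    static final class ToyConsumer {
        long used; // bytes currently held by this consumer

        // Releases up to `bytesNeeded` of this consumer's memory and reports how much was released.
        long spill(long bytesNeeded) {
            long released = Math.min(bytesNeeded, used);
            used -= released;
            return released;
        }
    }

    private long free;
    private final List<ToyConsumer> consumers = new ArrayList<>();

    SpillSketch(long capacity) {
        this.free = capacity;
    }

    // Returns the number of bytes granted, which is always <= required.
    long acquire(long required, ToyConsumer requester) {
        if (!consumers.contains(requester)) {
            consumers.add(requester);
        }
        long granted = Math.min(required, free);
        free -= granted;
        if (granted < required) {
            for (ToyConsumer other : consumers) {
                if (other == requester || other.used == 0) {
                    continue;
                }
                // Spilled bytes are handed straight to the requester.
                granted += other.spill(required - granted);
                if (granted == required) {
                    break;
                }
            }
        }
        requester.used += granted;
        return granted;
    }

    public static void main(String[] args) {
        SpillSketch manager = new SpillSketch(100);
        ToyConsumer a = new ToyConsumer();
        ToyConsumer b = new ToyConsumer();
        System.out.println(manager.acquire(80, a));  // 80
        System.out.println(manager.acquire(50, b));  // 50: 20 free bytes plus 30 spilled by a
        System.out.println(manager.acquire(200, b)); // 50: only a's remaining 50 bytes are available
    }
}
```

Because a request can be granted only partially, callers in this model have to check the returned value instead of assuming the full amount was acquired.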
def allocatePage(size: Long, consumer: MemoryConsumer): MemoryBlock
Allocate a block of memory that will be tracked in the MemoryManager's page table; this is intended for allocating large blocks of Tungsten memory that will be shared between operators.
Returns null if there was not enough memory to allocate the page. May return a page that contains fewer bytes than requested, so callers should verify the size of returned pages.

Exceptions thrown
final def asInstanceOf[T0]: T0

Definition Classes
Any
def cleanUpAllAllocatedMemory(): Long
Clean up all allocated memory and pages. Returns the number of bytes freed. A non-zero return value can be used to detect memory leaks.
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def encodePageNumberAndOffset(page: MemoryBlock, offsetInPage: Long): Long
Given a memory page and offset within that page, encode this address into a 64-bit long. This address will remain valid as long as the corresponding page has not been freed.
page
a data page allocated by TaskMemoryManager#allocatePage.
offsetInPage
an offset in this page which incorporates the base offset. In other words, this should be the value that you would pass as the base offset into an UNSAFE call (e.g. page.baseOffset() + something).
returns
an encoded page address.
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def freePage(page: MemoryBlock, consumer: MemoryConsumer): Unit
Free a block of memory allocated via TaskMemoryManager#allocatePage.
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
def getMemoryConsumptionForThisTask(): Long
Returns the memory consumption, in bytes, for the current task.
def getOffsetInPage(pagePlusOffsetAddress: Long): Long
Get the offset associated with an address encoded by TaskMemoryManager#encodePageNumberAndOffset(MemoryBlock, long).
def getPage(pagePlusOffsetAddress: Long): AnyRef
Get the page associated with an address encoded by TaskMemoryManager#encodePageNumberAndOffset(MemoryBlock, long).
def getTungstenMemoryMode(): MemoryMode
Returns Tungsten memory mode
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
def pageSizeBytes(): Long
Return the page size in bytes.
def releaseExecutionMemory(size: Long, consumer: MemoryConsumer): Unit
Release N bytes of execution memory for a MemoryConsumer.
def showMemoryUsage(): Unit
Dump the memory usage of all consumers.
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
