Packages

  • package spark

    Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.

    In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions.

    Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.

    Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.

    Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower-level interfaces. These are subject to change or removal in minor releases.

    Definition Classes
    apache
  • package memory

    This package implements Spark's memory management system. This system consists of two main components, a JVM-wide memory manager and a per-task manager:

    • org.apache.spark.memory.MemoryManager manages Spark's overall memory usage within a JVM. This component implements the policies for dividing the available memory across tasks and for allocating memory between storage (memory used for caching and data transfer) and execution (memory used by computations, such as shuffles, joins, sorts, and aggregations).
    • org.apache.spark.memory.TaskMemoryManager manages the memory allocated by individual tasks. Tasks interact with TaskMemoryManager and never directly interact with the JVM-wide MemoryManager.

    Internally, each of these components has additional abstractions for memory bookkeeping:

    • org.apache.spark.memory.MemoryConsumers are clients of the TaskMemoryManager and correspond to individual operators and data structures within a task. The TaskMemoryManager receives memory allocation requests from MemoryConsumers and issues callbacks to consumers in order to trigger spilling when running low on memory.
    • org.apache.spark.memory.MemoryPools are a bookkeeping abstraction used by the MemoryManager to track the division of memory between storage and execution.

    Diagrammatically:

                                                           +---------------------------+
    +-------------+                                        |       MemoryManager       |
    | MemConsumer |----+                                   |                           |
    +-------------+    |    +-------------------+          |  +---------------------+  |
                       +--->| TaskMemoryManager |----+     |  |OnHeapStorageMemPool |  |
    +-------------+    |    +-------------------+    |     |  +---------------------+  |
    | MemConsumer |----+                             |     |                           |
    +-------------+         +-------------------+    |     |  +---------------------+  |
                            | TaskMemoryManager |----+     |  |OffHeapStorageMemPool|  |
                            +-------------------+    |     |  +---------------------+  |
                                                     +---->|                           |
                                     *               |     |  +---------------------+  |
                                     *               |     |  |OnHeapExecMemPool    |  |
    +-------------+                  *               |     |  +---------------------+  |
    | MemConsumer |----+                             |     |                           |
    +-------------+    |    +-------------------+    |     |  +---------------------+  |
                       +--->| TaskMemoryManager |----+     |  |OffHeapExecMemPool   |  |
                            +-------------------+          |  +---------------------+  |
                                                           |                           |
                                                           +---------------------------+

    There is one implementation of org.apache.spark.memory.MemoryManager:

    • org.apache.spark.memory.UnifiedMemoryManager enforces soft boundaries between storage and execution memory, allowing requests for memory in one region to be fulfilled by borrowing memory from the other.
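
    The idea of a soft boundary can be illustrated with a toy model. The sketch below is not Spark's UnifiedMemoryManager: the class SoftBoundarySketch and its methods are hypothetical, and it deliberately omits eviction, per-task fairness, and off-heap pools. It only shows that a request in one region may be satisfied from free space nominally belonging to the other.

```java
// Toy model of a soft storage/execution boundary (hypothetical, not Spark code).
final class SoftBoundarySketch {
    private final long storageRegion;    // nominal size of the storage region
    private final long executionRegion;  // nominal size of the execution region
    private long storageUsed;
    private long executionUsed;

    SoftBoundarySketch(long storageRegion, long executionRegion) {
        this.storageRegion = storageRegion;
        this.executionRegion = executionRegion;
    }

    // Free bytes across both regions; the boundary does not cap either side.
    private long totalFree() {
        return storageRegion + executionRegion - storageUsed - executionUsed;
    }

    // Grants up to `bytes` for execution, borrowing from storage if needed.
    long acquireExecution(long bytes) {
        long granted = Math.min(bytes, totalFree());
        executionUsed += granted;
        return granted;
    }

    // Grants up to `bytes` for storage, borrowing from execution if needed.
    long acquireStorage(long bytes) {
        long granted = Math.min(bytes, totalFree());
        storageUsed += granted;
        return granted;
    }

    // Bytes that execution currently holds beyond its nominal region.
    long executionBorrowed() {
        return Math.max(0L, executionUsed - executionRegion);
    }
}
```

    With two regions of 50 bytes each, an execution request for 70 bytes is granted in full in this model, with 20 bytes counted as borrowed from storage; a later storage request for 40 bytes receives only the 30 bytes that remain.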
    Definition Classes
    spark
  • MemoryConsumer
  • MemoryMode
  • SparkOutOfMemoryError
  • TaskMemoryManager
  • TooLargePageException
  • UnifiedMemoryManager

org.apache.spark.memory

TaskMemoryManager

class TaskMemoryManager extends AnyRef

Manages the memory allocated by an individual task.

Most of the complexity in this class deals with encoding of off-heap addresses into 64-bit longs. In off-heap mode, memory can be directly addressed with 64-bit longs. In on-heap mode, memory is addressed by the combination of a base Object reference and a 64-bit offset within that object. This is a problem when we want to store pointers to data structures inside of other structures, such as record pointers inside hashmaps or sorting buffers. Even if we decided to use 128 bits to address memory, we can't just store the address of the base object since it's not guaranteed to remain stable as the heap gets reorganized due to GC.

Instead, we use the following approach to encode record pointers in 64-bit longs: for off-heap mode, just store the raw address, and for on-heap mode use the upper 13 bits of the address to store a "page number" and the lower 51 bits to store an offset within this page. These page numbers are used to index into a "page table" array inside of the MemoryManager in order to retrieve the base object.

This allows us to address 8192 pages. In on-heap mode, the maximum page size is limited by the maximum size of a long[] array, allowing us to address 8192 * (2^31 - 1) * 8 bytes, which is approximately 140 terabytes of memory.
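
The bit layout and the capacity arithmetic above can be checked with a small standalone sketch. It assumes only what the text states (13 bits of page number, 51 bits of offset, pages backed by long[] arrays). The class PagePointerSketch and its method names are hypothetical, and it works on a plain page number rather than on a MemoryBlock, so it is not the signature of the real encodePageNumberAndOffset.

```java
// Standalone sketch of the 13-bit page number / 51-bit offset layout (hypothetical names).
final class PagePointerSketch {
    static final int PAGE_NUMBER_BITS = 13;
    static final int OFFSET_BITS = 64 - PAGE_NUMBER_BITS;      // 51
    static final int PAGE_TABLE_SIZE = 1 << PAGE_NUMBER_BITS;  // 8192
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;   // lower 51 bits set

    // Packs a page number into the upper 13 bits and an offset into the lower 51 bits.
    static long encode(int pageNumber, long offsetInPage) {
        if (pageNumber < 0 || pageNumber >= PAGE_TABLE_SIZE) {
            throw new IllegalArgumentException("page number out of range: " + pageNumber);
        }
        if ((offsetInPage & ~OFFSET_MASK) != 0) {
            throw new IllegalArgumentException("offset does not fit in 51 bits: " + offsetInPage);
        }
        return ((long) pageNumber << OFFSET_BITS) | offsetInPage;
    }

    // Unsigned shift, so page numbers that set the sign bit still decode correctly.
    static int decodePageNumber(long encoded) {
        return (int) (encoded >>> OFFSET_BITS);
    }

    static long decodeOffset(long encoded) {
        return encoded & OFFSET_MASK;
    }

    // 8192 pages * (2^31 - 1) array elements * 8 bytes per long.
    static long maxOnHeapBytes() {
        return (long) PAGE_TABLE_SIZE * Integer.MAX_VALUE * 8L;
    }
}
```

In this sketch, maxOnHeapBytes() evaluates to 140,737,488,289,792 bytes, which matches the "approximately 140 terabytes" figure when a terabyte is taken as 10^12 bytes.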

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new TaskMemoryManager(memoryManager: MemoryManager, taskAttemptId: Long)

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def acquireExecutionMemory(required: Long, consumer: MemoryConsumer): Long

    Acquire N bytes of memory for a consumer. If there is not enough memory, it will call spill() of consumers to release more memory.

    returns

    number of bytes successfully granted (<= N).
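
    The contract described here (spill other consumers when memory is short, then grant at most N bytes) can be illustrated with a toy model. This is a sketch under simplifying assumptions, not Spark's algorithm: SpillSketch and its nested Consumer class are hypothetical, a spill releases everything a consumer holds, and the requesting consumer is never asked to spill itself.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "acquire, spilling other consumers if needed" (hypothetical, not Spark code).
final class SpillSketch {
    static final class Consumer {
        long held;  // bytes currently granted to this consumer

        // Toy spill: pretend the data went to disk and give back every held byte.
        long spill() {
            long freed = held;
            held = 0;
            return freed;
        }
    }

    private final long capacity;
    private long used;
    private final List<Consumer> consumers = new ArrayList<>();

    SpillSketch(long capacity) {
        this.capacity = capacity;
    }

    // Returns the number of bytes granted, which may be less than `required`.
    long acquire(long required, Consumer requester) {
        if (!consumers.contains(requester)) {
            consumers.add(requester);
        }
        // Ask other consumers to spill until the request fits or none are left.
        for (Consumer c : consumers) {
            if (capacity - used >= required) {
                break;
            }
            if (c != requester) {
                used -= c.spill();
            }
        }
        long granted = Math.min(required, capacity - used);
        used += granted;
        requester.held += granted;
        return granted;
    }
}
```

    Because the grant can be smaller than the request, callers in this model must work with the returned value rather than assume they received N bytes.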

  5. def allocatePage(size: Long, consumer: MemoryConsumer): MemoryBlock

    Allocate a block of memory that will be tracked in the MemoryManager's page table; this is intended for allocating large blocks of Tungsten memory that will be shared between operators.

    Returns null if there was not enough memory to allocate the page. May return a page that contains fewer bytes than requested, so callers should verify the size of returned pages.

    Exceptions thrown
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def cleanUpAllAllocatedMemory(): Long

    Clean up all allocated memory and pages. Returns the number of bytes freed. A non-zero return value can be used to detect memory leaks.
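
    The leak check works because a well-behaved task has already released everything by the time cleanup runs, so any bytes freed at cleanup were never released by their owner. A minimal sketch of that bookkeeping follows; LeakCheckSketch is a hypothetical stand-in that tracks bytes per consumer name and does not model pages or Spark's real data structures.

```java
import java.util.HashMap;
import java.util.Map;

// Toy bookkeeping that reports unreleased bytes at cleanup (hypothetical, not Spark code).
final class LeakCheckSketch {
    private final Map<String, Long> heldByConsumer = new HashMap<>();

    void acquire(String consumer, long bytes) {
        heldByConsumer.merge(consumer, bytes, Long::sum);
    }

    void release(String consumer, long bytes) {
        // Drop the entry once the consumer holds nothing.
        heldByConsumer.computeIfPresent(consumer, (name, held) ->
            held - bytes <= 0 ? null : Long.valueOf(held - bytes));
    }

    // Frees whatever is still held and returns the byte count; non-zero means a leak.
    long cleanUpAll() {
        long freed = 0;
        for (long held : heldByConsumer.values()) {
            freed += held;
        }
        heldByConsumer.clear();
        return freed;
    }
}
```

    A caller could log a warning or fail a test whenever cleanUpAll() returns a value other than zero.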

  8. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  9. def encodePageNumberAndOffset(page: MemoryBlock, offsetInPage: Long): Long

    Given a memory page and offset within that page, encode this address into a 64-bit long. This address will remain valid as long as the corresponding page has not been freed.

    page

    a data page allocated by TaskMemoryManager#allocatePage.

    offsetInPage

    an offset in this page which incorporates the base offset. In other words, this should be the value that you would pass as the base offset into an UNSAFE call (e.g. page.baseOffset() + something).

    returns

    an encoded page address.

  10. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. def freePage(page: MemoryBlock, consumer: MemoryConsumer): Unit

    Free a block of memory allocated via TaskMemoryManager#allocatePage.

  14. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  15. def getMemoryConsumptionForThisTask(): Long

    Returns the memory consumption, in bytes, for the current task.

  16. def getOffsetInPage(pagePlusOffsetAddress: Long): Long

    Get the offset associated with an address encoded by TaskMemoryManager#encodePageNumberAndOffset(MemoryBlock, long).

  17. def getPage(pagePlusOffsetAddress: Long): AnyRef

    Get the page associated with an address encoded by TaskMemoryManager#encodePageNumberAndOffset(MemoryBlock, long).

  18. def getTungstenMemoryMode(): MemoryMode

    Returns the Tungsten memory mode.

  19. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  20. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  21. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  22. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  23. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  24. def pageSizeBytes(): Long

    Return the page size in bytes.

  25. def releaseExecutionMemory(size: Long, consumer: MemoryConsumer): Unit

    Release N bytes of execution memory for a MemoryConsumer.

  26. def showMemoryUsage(): Unit

    Dump the memory usage of all consumers.

  27. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  28. def toString(): String
    Definition Classes
    AnyRef → Any
  29. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  30. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  31. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
