Packages

package root

Definition Classes: root

package org

Definition Classes: root

package apache

Definition Classes: org

package spark

Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.

In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions.
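The implicit-conversion mechanism mentioned above can be illustrated without Spark. The following is a minimal sketch only: MiniRDD, PairOps, and toPairOps are hypothetical names invented for this example and are not part of Spark's API. It shows how an extra operation becomes available only when the element type is a key-value pair.

```scala
import scala.language.implicitConversions

object ImplicitPairOps {
  // Hypothetical stand-in for an RDD: a thin wrapper around a local Seq.
  final class MiniRDD[T](val items: Seq[T])

  // Extra operations that only make sense when elements are key-value pairs.
  final class PairOps[K, V](self: MiniRDD[(K, V)]) {
    // Combine all values that share a key using the supplied function.
    def reduceByKey(f: (V, V) => V): Map[K, V] =
      self.items.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
  }

  // The implicit conversion: the compiler applies it only to MiniRDD[(K, V)].
  implicit def toPairOps[K, V](rdd: MiniRDD[(K, V)]): PairOps[K, V] = new PairOps(rdd)

  def main(args: Array[String]): Unit = {
    val pairs = new MiniRDD(Seq(("a", 1), ("b", 2), ("a", 3)))
    // reduceByKey is not defined on MiniRDD; the conversion supplies it.
    val sums = pairs.reduceByKey(_ + _)
    assert(sums == Map("a" -> 4, "b" -> 2))
    // new MiniRDD(Seq(1, 2, 3)).reduceByKey(_ + _) would not compile,
    // because no conversion exists for a non-pair element type.
  }
}
```

Calling reduceByKey on a MiniRDD of plain Ints is rejected at compile time, which mirrors the "right type" restriction described above.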

Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.

Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.

Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower level interfaces. These are subject to change or removal in minor releases.

Definition Classes: apache

package api

Definition Classes: spark

package broadcast

Spark's broadcast variables, used to broadcast immutable datasets to all nodes.

Definition Classes: spark

package deploy

Definition Classes: spark

package executor

Executor components used with various cluster managers. See org.apache.spark.executor.Executor.

Definition Classes: spark

package input

Definition Classes: spark

package internal

Definition Classes: spark

package io

IO codecs used for compression. See org.apache.spark.io.CompressionCodec.

Definition Classes: spark

CompressionCodec

LZ4CompressionCodec

LZFCompressionCodec

NioBufferedFileInputStream

ReadAheadInputStream

SnappyCompressionCodec

ZStdCompressionCodec

package mapred

Definition Classes: spark

package memory

This package implements Spark's memory management system. This system consists of two main components, a JVM-wide memory manager and a per-task manager:

org.apache.spark.memory.MemoryManager manages Spark's overall memory usage within a JVM. This component implements the policies for dividing the available memory across tasks and for allocating memory between storage (memory used for caching and data transfer) and execution (memory used by computations, such as shuffles, joins, sorts, and aggregations).
org.apache.spark.memory.TaskMemoryManager manages the memory allocated by individual tasks. Tasks interact with TaskMemoryManager and never directly interact with the JVM-wide MemoryManager.

Internally, each of these components has additional abstractions for memory bookkeeping:

org.apache.spark.memory.MemoryConsumers are clients of the TaskMemoryManager and correspond to individual operators and data structures within a task. The TaskMemoryManager receives memory allocation requests from MemoryConsumers and issues callbacks to consumers in order to trigger spilling when running low on memory.
org.apache.spark.memory.MemoryPools are a bookkeeping abstraction used by the MemoryManager to track the division of memory between storage and execution.

Diagrammatically:

                                                       +---------------------------+
+-------------+                                        |       MemoryManager       |
| MemConsumer |----+                                   |                           |
+-------------+    |    +-------------------+          |  +---------------------+  |
                   +--->| TaskMemoryManager |----+     |  |OnHeapStorageMemPool |  |
+-------------+    |    +-------------------+    |     |  +---------------------+  |
| MemConsumer |----+                             |     |                           |
+-------------+         +-------------------+    |     |  +---------------------+  |
                        | TaskMemoryManager |----+     |  |OffHeapStorageMemPool|  |
                        +-------------------+    |     |  +---------------------+  |
                                                 +---->|                           |
                                 *               |     |  +---------------------+  |
                                 *               |     |  |OnHeapExecMemPool    |  |
+-------------+                  *               |     |  +---------------------+  |
| MemConsumer |----+                             |     |                           |
+-------------+    |    +-------------------+    |     |  +---------------------+  |
                   +--->| TaskMemoryManager |----+     |  |OffHeapExecMemPool   |  |
                        +-------------------+          |  +---------------------+  |
                                                       |                           |
                                                       +---------------------------+
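The request-and-spill interaction between consumers and the per-task manager described above can be sketched as a toy model. Everything here is an assumption made for illustration: Consumer, TaskManager, and their methods are hypothetical and far simpler than Spark's MemoryConsumer and TaskMemoryManager. When a request does not fit, the manager calls back into other consumers and asks them to spill.

```scala
import scala.collection.mutable

object SpillSketch {
  // Hypothetical consumer: counts how often it was asked to spill.
  final class Consumer(val name: String) {
    var spillCount = 0
    // A real consumer would write its data to disk here.
    def spill(): Unit = spillCount += 1
  }

  // Hypothetical per-task manager with a fixed capacity in bytes.
  final class TaskManager(capacity: Long) {
    private val held = mutable.LinkedHashMap.empty[Consumer, Long]

    def used: Long = held.values.sum

    def acquire(requester: Consumer, bytes: Long): Boolean = {
      // Snapshot the other consumers before mutating the map.
      val others = held.keys.filter(_ ne requester).toList.iterator
      // Ask other consumers to spill until the request fits or none remain.
      while (capacity - used < bytes && others.hasNext) {
        val victim = others.next()
        victim.spill()
        held.remove(victim) // spilled memory is returned to the manager
      }
      if (capacity - used >= bytes) {
        held(requester) = held.getOrElse(requester, 0L) + bytes
        true
      } else false
    }
  }

  def main(args: Array[String]): Unit = {
    val manager = new TaskManager(100L)
    val sorter = new Consumer("sorter")
    val hashMap = new Consumer("hashMap")
    assert(manager.acquire(sorter, 80L))
    // Only 20 bytes are free, so the sorter is asked to spill first.
    assert(manager.acquire(hashMap, 50L))
    assert(sorter.spillCount == 1 && manager.used == 50L)
  }
}
```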

There is one implementation of org.apache.spark.memory.MemoryManager:

org.apache.spark.memory.UnifiedMemoryManager enforces soft boundaries between storage and execution memory, allowing requests for memory in one region to be fulfilled by borrowing memory from the other.
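The idea of a soft boundary can be sketched with two pools whose sizes shift on demand. This is a simplified illustration under stated assumptions, not the UnifiedMemoryManager implementation: Pool and SoftBoundary are hypothetical names, and only borrowing of free memory is modelled (eviction of already-used memory is left out).

```scala
object SoftBoundarySketch {
  // Hypothetical bookkeeping pool: a size that can change and a used amount.
  final class Pool(var size: Long) {
    var used = 0L
    def free: Long = size - used
  }

  // Hypothetical manager with one storage pool and one execution pool.
  final class SoftBoundary(storageSize: Long, executionSize: Long) {
    val storage = new Pool(storageSize)
    val execution = new Pool(executionSize)

    private def acquire(target: Pool, other: Pool, bytes: Long): Boolean = {
      val shortfall = bytes - target.free
      if (shortfall > other.free) {
        false // even after borrowing all free memory, the request would not fit
      } else {
        if (shortfall > 0) {
          // Move the boundary: shrink the other pool, grow the target pool.
          other.size -= shortfall
          target.size += shortfall
        }
        target.used += bytes
        true
      }
    }

    def acquireStorage(bytes: Long): Boolean = acquire(storage, execution, bytes)
    def acquireExecution(bytes: Long): Boolean = acquire(execution, storage, bytes)
  }

  def main(args: Array[String]): Unit = {
    val manager = new SoftBoundary(storageSize = 100L, executionSize = 100L)
    // Storage needs 150 bytes, so it borrows 50 free bytes from execution.
    assert(manager.acquireStorage(150L))
    assert(manager.storage.size == 150L && manager.execution.size == 50L)
    // Execution now has only 50 bytes left and storage has nothing free to lend.
    assert(!manager.acquireExecution(60L))
    assert(manager.acquireExecution(50L))
  }
}
```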

Definition Classes: spark

package metrics

Definition Classes: spark

package network

Definition Classes: spark

package partial

Support for approximate results. This package provides a convenient API, together with its implementation, for approximate computation.

Definition Classes: spark
See also: org.apache.spark.rdd.RDD.countApprox

package rdd

Provides several RDD implementations. See org.apache.spark.rdd.RDD.

Definition Classes: spark

package resource

Definition Classes: spark

package scheduler

Spark's scheduling components. This includes the org.apache.spark.scheduler.DAGScheduler and the lower-level org.apache.spark.scheduler.TaskScheduler.

Definition Classes: spark

package security

Definition Classes: spark

package serializer

Pluggable serializers for RDD and shuffle data.

Definition Classes: spark
See also: org.apache.spark.serializer.Serializer

package shuffle

Definition Classes: spark

package status

Definition Classes: spark

package storage

Definition Classes: spark

package unsafe

Definition Classes: spark

package util

Spark utilities.

Definition Classes: spark

package io

IO codecs used for compression. See org.apache.spark.io.CompressionCodec.

Linear Supertypes

AnyRef, Any

Type Members

trait CompressionCodec extends AnyRef
:: DeveloperApi :: CompressionCodec allows choosing which compression implementation is used in block storage.

Annotations
@DeveloperApi()
Note
The wire protocol for a codec is not guaranteed to be compatible across versions of Spark. This is intended for use as an internal compression utility within a single Spark application.
class LZ4CompressionCodec extends CompressionCodec
:: DeveloperApi :: LZ4 implementation of org.apache.spark.io.CompressionCodec. Block size can be configured by spark.io.compression.lz4.blockSize.

Annotations
@DeveloperApi()
Note
The wire protocol for this codec is not guaranteed to be compatible across versions of Spark. This is intended for use as an internal compression utility within a single Spark application.
class LZFCompressionCodec extends CompressionCodec
:: DeveloperApi :: LZF implementation of org.apache.spark.io.CompressionCodec.

Annotations
@DeveloperApi()
Note
The wire protocol for this codec is not guaranteed to be compatible across versions of Spark. This is intended for use as an internal compression utility within a single Spark application.
final class NioBufferedFileInputStream extends InputStream
InputStream implementation which uses a direct buffer to read a file, avoiding the extra copy of data between Java and native memory that happens when using java.io.BufferedInputStream. The JDK does not provide this out of the box: sun.nio.ch.ChannelInputStream supports reading a file using NIO, but does not support buffering.
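The direct-buffer technique can be sketched with plain java.nio calls. This is an illustration of the general approach, not the NioBufferedFileInputStream source; DirectBufferRead and readAll are hypothetical names chosen for the example.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{Files, Path, StandardOpenOption}

object DirectBufferRead {
  // Read a whole file through a direct (off-heap) buffer of the given size.
  def readAll(path: Path, bufferSize: Int): Array[Byte] = {
    val channel = FileChannel.open(path, StandardOpenOption.READ)
    try {
      // A direct buffer lets the channel fill native memory directly.
      val buffer = ByteBuffer.allocateDirect(bufferSize)
      val result = new ByteArrayOutputStream()
      while (channel.read(buffer) != -1) {
        buffer.flip() // switch from filling to draining
        val chunk = new Array[Byte](buffer.remaining())
        buffer.get(chunk)
        result.write(chunk, 0, chunk.length)
        buffer.clear() // make the buffer ready for the next fill
      }
      result.toByteArray
    } finally channel.close()
  }

  def main(args: Array[String]): Unit = {
    val path = Files.createTempFile("direct-buffer", ".bin")
    try {
      val data = Array.tabulate[Byte](10000)(i => (i % 127).toByte)
      Files.write(path, data)
      assert(readAll(path, 4096).sameElements(data))
    } finally Files.delete(path)
  }
}
```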
class ReadAheadInputStream extends InputStream
InputStream implementation which asynchronously reads ahead from the underlying input stream once a specified amount of data has been read from the current buffer. It maintains two buffers: an active buffer and a read-ahead buffer. The active buffer contains the data returned when a read() call is issued. The read-ahead buffer is filled asynchronously from the underlying input stream; once the active buffer is exhausted, the two buffers are flipped so that reading can continue from the read-ahead buffer without blocking on disk I/O.
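The two-buffer scheme can be sketched as follows. This is a simplified model rather than the ReadAheadInputStream source: TwoBufferReader is a hypothetical name, the next read-ahead is started when the buffers are flipped rather than at a configurable threshold, and a Scala Future stands in for the background reader.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ReadAheadSketch {
  // Hypothetical reader that keeps one buffer being consumed and one being filled.
  final class TwoBufferReader(underlying: InputStream, bufferSize: Int)(implicit ec: ExecutionContext) {
    private var active = new Array[Byte](bufferSize)
    private var activeLen = 0
    private var pos = 0
    private var readAhead = new Array[Byte](bufferSize)
    // Start filling the read-ahead buffer immediately, in the background.
    private var pending: Future[Int] = fill(readAhead)

    private def fill(buf: Array[Byte]): Future[Int] =
      Future(underlying.read(buf, 0, buf.length))

    // Returns the next byte as 0..255, or -1 at end of stream.
    def read(): Int = {
      if (pos >= activeLen) {
        // Active buffer exhausted: wait for the read-ahead to finish.
        val n = Await.result(pending, Duration.Inf)
        if (n > 0) {
          // Flip the buffers and start the next background read.
          val previous = active
          active = readAhead
          readAhead = previous
          activeLen = n
          pos = 0
          pending = fill(readAhead)
        }
      }
      if (pos < activeLen) {
        val b = active(pos) & 0xff
        pos += 1
        b
      } else -1
    }
  }

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    val data = Array.tabulate[Byte](10)(_.toByte)
    val reader = new TwoBufferReader(new ByteArrayInputStream(data), bufferSize = 4)
    val result = Iterator.continually(reader.read()).takeWhile(_ != -1).map(_.toByte).toArray
    assert(result.sameElements(data))
  }
}
```

Because the background read only ever writes into the buffer that is not being consumed, the caller blocks only when the read-ahead has not yet completed.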
class SnappyCompressionCodec extends CompressionCodec
:: DeveloperApi :: Snappy implementation of org.apache.spark.io.CompressionCodec. Block size can be configured by spark.io.compression.snappy.blockSize.

Annotations
@DeveloperApi()
Note
The wire protocol for this codec is not guaranteed to be compatible across versions of Spark. This is intended for use as an internal compression utility within a single Spark application.
class ZStdCompressionCodec extends CompressionCodec
:: DeveloperApi :: ZStandard implementation of org.apache.spark.io.CompressionCodec. For more details, see http://facebook.github.io/zstd/

Annotations
@DeveloperApi()
Note
The wire protocol for this codec is not guaranteed to be compatible across versions of Spark. This is intended for use as an internal compression utility within a single Spark application.
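As a usage illustration for the codecs listed above, the following sketch compresses and decompresses a byte array with LZ4CompressionCodec inside a single JVM. It assumes spark-core is on the classpath; Lz4RoundTrip and roundTrip are hypothetical names, and "64k" is an arbitrary example value for the block size setting mentioned in the LZ4 entry.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets

import org.apache.spark.SparkConf
import org.apache.spark.io.LZ4CompressionCodec

object Lz4RoundTrip {
  def roundTrip(payload: Array[Byte]): Array[Byte] = {
    // Block size key from the LZ4CompressionCodec entry; the value is an example only.
    val conf = new SparkConf(false).set("spark.io.compression.lz4.blockSize", "64k")
    val codec = new LZ4CompressionCodec(conf)

    // Compress into an in-memory buffer.
    val compressed = new ByteArrayOutputStream()
    val out = codec.compressedOutputStream(compressed)
    try out.write(payload) finally out.close()

    // Decompress with the same codec, in the same application.
    val in = codec.compressedInputStream(new ByteArrayInputStream(compressed.toByteArray))
    try {
      val restored = new ByteArrayOutputStream()
      val chunk = new Array[Byte](4096)
      var n = in.read(chunk)
      while (n != -1) {
        restored.write(chunk, 0, n)
        n = in.read(chunk)
      }
      restored.toByteArray
    } finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val text = "spark " * 1000
    val restored = roundTrip(text.getBytes(StandardCharsets.UTF_8))
    assert(new String(restored, StandardCharsets.UTF_8) == text)
  }
}
```

Both directions run in the same application here, in line with the notes above that the wire protocol is not guaranteed to be compatible across versions of Spark.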
