Packages

  • package root
    Definition Classes
    root
  • package org
    Definition Classes
    root
  • package apache
    Definition Classes
    org
  • package spark

    Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.

    In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions.

    Java programmers should refer to the org.apache.spark.api.java package for Spark programming APIs in Java.

    Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.

    Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower-level interfaces. These are subject to change or removal in minor releases.

    Definition Classes
    apache
  • package api
    Definition Classes
    spark
  • package broadcast

    Spark's broadcast variables, used to broadcast immutable datasets to all nodes.

    Definition Classes
    spark
  • package deploy
    Definition Classes
    spark
  • package executor

    Executor components used with various cluster managers. See org.apache.spark.executor.Executor.

    Definition Classes
    spark
  • package input
    Definition Classes
    spark
  • package internal
    Definition Classes
    spark
  • package io

    IO codecs used for compression. See org.apache.spark.io.CompressionCodec.

    Definition Classes
    spark
  • package mapred
    Definition Classes
    spark
  • package memory

    This package implements Spark's memory management system. This system consists of two main components, a JVM-wide memory manager and a per-task manager:

    • org.apache.spark.memory.MemoryManager manages Spark's overall memory usage within a JVM. This component implements the policies for dividing the available memory across tasks and for allocating memory between storage (memory used for caching and data transfer) and execution (memory used by computations, such as shuffles, joins, sorts, and aggregations).
    • org.apache.spark.memory.TaskMemoryManager manages the memory allocated by individual tasks. Tasks interact with TaskMemoryManager and never directly interact with the JVM-wide MemoryManager.

    Internally, each of these components has additional abstractions for memory bookkeeping:

    • org.apache.spark.memory.MemoryConsumers are clients of the TaskMemoryManager and correspond to individual operators and data structures within a task. The TaskMemoryManager receives memory allocation requests from MemoryConsumers and issues callbacks to consumers in order to trigger spilling when running low on memory.
    • org.apache.spark.memory.MemoryPools are a bookkeeping abstraction used by the MemoryManager to track the division of memory between storage and execution.

    Diagrammatically:

                                                           +---------------------------+
    +-------------+                                        |       MemoryManager       |
    | MemConsumer |----+                                   |                           |
    +-------------+    |    +-------------------+          |  +---------------------+  |
                       +--->| TaskMemoryManager |----+     |  |OnHeapStorageMemPool |  |
    +-------------+    |    +-------------------+    |     |  +---------------------+  |
    | MemConsumer |----+                             |     |                           |
    +-------------+         +-------------------+    |     |  +---------------------+  |
                            | TaskMemoryManager |----+     |  |OffHeapStorageMemPool|  |
                            +-------------------+    |     |  +---------------------+  |
                                                     +---->|                           |
                                     *               |     |  +---------------------+  |
                                     *               |     |  |OnHeapExecMemPool    |  |
    +-------------+                  *               |     |  +---------------------+  |
    | MemConsumer |----+                             |     |                           |
    +-------------+    |    +-------------------+    |     |  +---------------------+  |
                       +--->| TaskMemoryManager |----+     |  |OffHeapExecMemPool   |  |
                            +-------------------+          |  +---------------------+  |
                                                           |                           |
                                                           +---------------------------+

    There is one implementation of org.apache.spark.memory.MemoryManager:

    • org.apache.spark.memory.UnifiedMemoryManager enforces soft boundaries between storage and execution memory, allowing requests for memory in one region to be fulfilled by borrowing memory from the other.
    Definition Classes
    spark
  • package metrics
    Definition Classes
    spark
  • package network
    Definition Classes
    spark
  • package partial

    Support for approximate results. This package provides a convenient API, along with its implementation, for approximate calculation.

    Definition Classes
    spark
    See also

    org.apache.spark.rdd.RDD.countApprox

  • package rdd

    Provides several RDD implementations. See org.apache.spark.rdd.RDD.

    Definition Classes
    spark
  • package resource
    Definition Classes
    spark
  • package scheduler

    Spark's scheduling components. This includes the org.apache.spark.scheduler.DAGScheduler and the lower-level org.apache.spark.scheduler.TaskScheduler.

    Definition Classes
    spark
  • package security
    Definition Classes
    spark
  • package serializer

    Pluggable serializers for RDD and shuffle data.

    Definition Classes
    spark
    See also

    org.apache.spark.serializer.Serializer

  • package shuffle
    Definition Classes
    spark
  • package status
    Definition Classes
    spark
  • package storage
    Definition Classes
    spark
  • package unsafe
    Definition Classes
    spark
  • package util

    Spark utilities.

    Definition Classes
    spark
  • package collection
  • package random

    Utilities for random number generation.

  • AccumulatorV2
  • ChildFirstURLClassLoader
  • CollectionAccumulator
  • DoubleAccumulator
  • EnumUtil
  • LongAccumulator
  • MutablePair
  • MutableURLClassLoader
  • ParentClassLoader
  • SerializableConfiguration
  • SizeEstimator
  • StatCounter
  • TaskCompletionListener
  • TaskFailureListener

package util

Spark utilities.

Linear Supertypes
AnyRef, Any

Type Members

  1. abstract class AccumulatorV2[IN, OUT] extends Serializable

    The base class for accumulators, that can accumulate inputs of type IN, and produce output of type OUT.

    OUT should be a type that can be read atomically (e.g., Int, Long), or thread-safely (e.g., synchronized collections) because it will be read from other threads.

  2. class ChildFirstURLClassLoader extends MutableURLClassLoader

    A mutable class loader that gives preference to its own URLs over the parent class loader when loading classes and resources.

  3. class CollectionAccumulator[T] extends AccumulatorV2[T, List[T]]

    An accumulator for collecting a list of elements.

    Since

    2.0.0

  4. class DoubleAccumulator extends AccumulatorV2[Double, Double]

    An accumulator for computing sum, count, and averages for double-precision floating-point numbers.

    Since

    2.0.0

  5. class EnumUtil extends AnyRef
  6. class LongAccumulator extends AccumulatorV2[Long, Long]

    An accumulator for computing sum, count, and average of 64-bit integers.

    Since

    2.0.0

  7. case class MutablePair[T1, T2](_1: T1, _2: T2) extends Product2[T1, T2] with Product with Serializable

    :: DeveloperApi :: A tuple of 2 elements. This can be used as an alternative to Scala's Tuple2 when we want to minimize object allocation.

    _1

    Element 1 of this MutablePair

    _2

    Element 2 of this MutablePair

    Annotations
    @DeveloperApi()
  8. class MutableURLClassLoader extends URLClassLoader

    URL class loader that exposes the addURL method in URLClassLoader.

  9. class ParentClassLoader extends ClassLoader

    A class loader which makes some protected methods in ClassLoader accessible.

  10. class SerializableConfiguration extends Serializable

    Hadoop configuration but serializable. Use value to access the Hadoop configuration.

    Annotations
    @DeveloperApi() @Unstable()
  11. class StatCounter extends Serializable

    A class for tracking the statistics of a set of numbers (count, mean and variance) in a numerically robust way. Includes support for merging two StatCounters. Based on Welford and Chan's algorithms for running variance.

  12. trait TaskCompletionListener extends EventListener

    :: DeveloperApi ::

    Listener providing a callback function to invoke when a task's execution completes.

    Annotations
    @DeveloperApi()
  13. trait TaskFailureListener extends EventListener

    :: DeveloperApi ::

    Listener providing a callback function to invoke when a task's execution encounters an error. Operations defined here must be idempotent, as onTaskFailure can be called multiple times.

    Annotations
    @DeveloperApi()
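
The AccumulatorV2 entry above describes a base class that accumulates inputs of type IN and produces an output of type OUT. As a rough illustration, the sketch below subclasses it to track the largest Long seen. The class name MaxAccumulator, the demo object, and the use of Long.MinValue as the "zero" sentinel are assumptions made for this example, not part of Spark. The sketch assumes spark-core is on the classpath.

```scala
import org.apache.spark.util.AccumulatorV2

// Hypothetical example class (not part of Spark): keeps the maximum Long added so far.
// OUT is Long, which fits the guidance that OUT should be readable atomically.
class MaxAccumulator extends AccumulatorV2[Long, Long] {
  // Long.MinValue doubles as the "nothing added yet" sentinel (an assumption of this
  // sketch; adding Long.MinValue itself is indistinguishable from the zero state).
  private var _max: Long = Long.MinValue

  override def isZero: Boolean = _max == Long.MinValue

  override def copy(): MaxAccumulator = {
    val acc = new MaxAccumulator
    acc._max = _max
    acc
  }

  override def reset(): Unit = _max = Long.MinValue

  override def add(v: Long): Unit = _max = math.max(_max, v)

  // Combine a partial result from another accumulator of the same kind into this one.
  override def merge(other: AccumulatorV2[Long, Long]): Unit = other match {
    case o: MaxAccumulator => _max = math.max(_max, o._max)
    case _ =>
      throw new UnsupportedOperationException(
        s"Cannot merge ${getClass.getName} with ${other.getClass.getName}")
  }

  override def value: Long = _max
}

object MaxAccumulatorDemo {
  def main(args: Array[String]): Unit = {
    // Exercised locally, without a SparkContext, to show add and merge.
    val a = new MaxAccumulator
    a.add(3L)
    a.add(7L)
    val b = new MaxAccumulator
    b.add(5L)
    a.merge(b)
    println(a.value)
  }
}
```

In a running application, an accumulator like this would normally be registered through SparkContext.register before tasks update it; the local demo skips that step to stay small.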

Value Members

  1. object SizeEstimator extends Logging

    :: DeveloperApi :: Estimates the sizes of Java objects (number of bytes of memory they occupy), for use in memory-aware caches.

    Based on the following JavaWorld article: http://www.javaworld.com/javaworld/javaqa/2003-12/02-qa-1226-sizeof.html

    Annotations
    @DeveloperApi()
  2. object StatCounter extends Serializable
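
The StatCounter description above mentions Welford and Chan's algorithms for running variance and support for merging two counters. The sketch below shows the general shape of those two techniques in plain Scala, with no Spark dependency. RunningStats and its members are hypothetical names for illustration; this is not Spark's StatCounter source, and its variance here is the population variance by assumption.

```scala
// Hypothetical stand-in for illustration only; not Spark's StatCounter.
// n = count, mean = running mean, m2 = sum of squared deviations from the mean.
final case class RunningStats(n: Long = 0L, mean: Double = 0.0, m2: Double = 0.0) {

  // Welford-style update: fold a single value into the running statistics.
  def add(x: Double): RunningStats = {
    val n1 = n + 1
    val delta = x - mean
    val mean1 = mean + delta / n1
    RunningStats(n1, mean1, m2 + delta * (x - mean1))
  }

  // Chan-style merge: combine two independently computed partial results.
  def merge(other: RunningStats): RunningStats =
    if (other.n == 0) this
    else if (n == 0) other
    else {
      val total = n + other.n
      val delta = other.mean - mean
      RunningStats(
        total,
        mean + delta * other.n / total,
        m2 + other.m2 + delta * delta * n * other.n / total)
    }

  // Population variance (divides by n); NaN when no values have been added.
  def variance: Double = if (n == 0) Double.NaN else m2 / n
}

object RunningStatsDemo {
  def main(args: Array[String]): Unit = {
    // Two partitions summarized separately, then merged.
    val left = Seq(1.0, 2.0).foldLeft(RunningStats())((acc, x) => acc.add(x))
    val right = Seq(3.0, 4.0).foldLeft(RunningStats())((acc, x) => acc.add(x))
    val all = left.merge(right)
    println(s"count=${all.n} mean=${all.mean} variance=${all.variance}")
    // prints "count=4 mean=2.5 variance=1.25"
  }
}
```

Merging partial results this way is what lets per-partition statistics be combined without revisiting the underlying values.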
