Package com.spotify.scio.extra.sparkey

package sparkey

Main package for Sparkey side input APIs. Import everything with:

import com.spotify.scio.extra.sparkey._

To save an SCollection[(String, String)] to a Sparkey file:

val s = sc.parallelize(Seq("a" -> "one", "b" -> "two"))

// temporary location
val s1: SCollection[SparkeyUri] = s.asSparkey

// specific location
val s2: SCollection[SparkeyUri] = s.asSparkey("gs://<bucket>/<path>/<sparkey-prefix>")

The resulting SCollection[SparkeyUri] can be converted to a side input:

val s: SCollection[SparkeyUri] = sc.parallelize(Seq("a" -> "one", "b" -> "two")).asSparkey
val side: SideInput[SparkeyReader] = s.asSparkeySideInput

These two steps can be combined with syntactic sugar:

val side: SideInput[SparkeyReader] = sc
  .parallelize(Seq("a" -> "one", "b" -> "two"))
  .asSparkeySideInput

An existing Sparkey file can also be converted to a side input directly:

sc.sparkeySideInput("gs://<bucket>/<path>/<sparkey-prefix>")

SparkeyReader can be used like a lookup table in a side input operation:

val main: SCollection[String] = sc.parallelize(Seq("a", "b", "c"))
val side: SideInput[SparkeyReader] = sc
  .parallelize(Seq("a" -> "one", "b" -> "two"))
  .asSparkeySideInput

main.withSideInputs(side)
  .map { (x, s) =>
    s(side).getOrElse(x, "unknown")
  }
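
Since RichStringSparkeyReader mimics a Map[String, String], the lookup logic can be prototyped on a plain Map before it is wired into a pipeline. The following is a minimal sketch under that assumption: the object name LookupSketch and the in-memory Map are stand-ins for illustration, not part of Scio. It contrasts the getOrElse default used above with dropping unmatched keys.

```scala
// Stand-in for the side input: a plain Map with the same contents
// as the Sparkey file built in the example above (assumption for illustration).
object LookupSketch {
  val lookup: Map[String, String] = Map("a" -> "one", "b" -> "two")
  val main: Seq[String] = Seq("a", "b", "c")

  // Same logic as the map step above: fall back to a default for missing keys.
  val withDefault: Seq[String] =
    main.map(x => lookup.getOrElse(x, "unknown"))

  // Alternative: drop elements whose key is missing instead of defaulting.
  val matchedOnly: Seq[(String, String)] =
    main.flatMap(x => lookup.get(x).map(x -> _))
}
```

Whether to default or to drop depends on the downstream consumer: a default keeps the output aligned one-to-one with the main input, while dropping avoids propagating placeholder values.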

Type Members

  1. implicit class RichStringSparkeyReader extends Map[String, String]

    Enhanced version of SparkeyReader that mimics a Map.

  2. implicit class SparkeyPairSCollection[K, V] extends AnyRef

    Enhanced version of SCollection with Sparkey methods.

  3. implicit final class SparkeySCollection extends AnyVal

    Enhanced version of SCollection with Sparkey methods.

  4. implicit final class SparkeyScioContext extends AnyVal

    Enhanced version of ScioContext with Sparkey methods.

  5. trait SparkeyUri extends Serializable

    Represents the base URI for a Sparkey index and log file, either on the local or a remote file system. For remote file systems, basePath should be in the form 'scheme://<bucket>/<path>/<sparkey-prefix>'. For local files, it should be in the form '/<path>/<sparkey-prefix>'. Note that basePath must not be a folder or GCS bucket, because it is a base path that stands for two files: <sparkey-prefix>.spi and <sparkey-prefix>.spl.

  6. sealed trait SparkeyWritable[K, V] extends Serializable
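
The base-path rule described for SparkeyUri can be illustrated with a small standalone sketch. SparkeyPathSketch, SparkeyFiles and filesFor are hypothetical names introduced here and are not part of Scio; the bucket name in the usage note is likewise made up. The sketch only shows how one base path maps to the .spi index file and the .spl log file, and uses a trailing-slash check as a rough proxy for "folder or bucket".

```scala
// Hypothetical helper, not part of Scio: derives the two file names
// that a Sparkey base path stands for.
object SparkeyPathSketch {
  final case class SparkeyFiles(index: String, log: String)

  def filesFor(basePath: String): SparkeyFiles = {
    // A trailing slash suggests a folder or bucket, which is not a valid base path.
    require(!basePath.endsWith("/"), s"basePath must not be a folder or bucket: $basePath")
    SparkeyFiles(index = basePath + ".spi", log = basePath + ".spl")
  }
}
```

For example, SparkeyPathSketch.filesFor("gs://my-bucket/data/lookup") yields the index path gs://my-bucket/data/lookup.spi and the log path gs://my-bucket/data/lookup.spl, while a path ending in "/" is rejected.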

Value Members

  1. implicit val ByteArraySparkeyWritable: SparkeyWritable[Array[Byte], Array[Byte]]

  2. implicit val stringSparkeyWritable: SparkeyWritable[String, String]