Package

com.spotify.scio.extra

annoy

package annoy

Main package for Annoy side input APIs. Import everything from it:

import com.spotify.scio.extra.annoy._

Two metrics are available, Angular and Euclidean.
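
As a rough illustration of how the two metrics differ, the sketch below computes both distances in plain Scala, with no Scio or Annoy dependency. It assumes the angular distance defined in the Annoy project documentation, sqrt(2 * (1 - cos(u, v))). The object and function names are hypothetical and not part of this package.

```scala
object AnnoyMetricsSketch {
  // Euclidean (L2) distance between two vectors of equal length.
  def euclidean(u: Array[Float], v: Array[Float]): Double = {
    require(u.length == v.length, "vectors must have the same dimension")
    math.sqrt(u.zip(v).map { case (a, b) => val d = (a - b).toDouble; d * d }.sum)
  }

  // Cosine similarity; assumes neither vector is all zeros.
  def cosine(u: Array[Float], v: Array[Float]): Double = {
    require(u.length == v.length, "vectors must have the same dimension")
    val dot = u.zip(v).map { case (a, b) => a.toDouble * b }.sum
    val normU = math.sqrt(u.map(a => a.toDouble * a).sum)
    val normV = math.sqrt(v.map(b => b.toDouble * b).sum)
    dot / (normU * normV)
  }

  // Angular distance (assumed Annoy definition): sqrt(2 * (1 - cos(u, v))).
  // The max(0.0, ...) guards against tiny negative values from rounding.
  def angular(u: Array[Float], v: Array[Float]): Double =
    math.sqrt(math.max(0.0, 2.0 * (1.0 - cosine(u, v))))
}
```

Under this definition, two vectors pointing in the same direction, such as (1, 2) and (2, 4), have an angular distance of approximately zero even though their Euclidean distance is not. Angular is therefore the more natural choice when only direction matters.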

To save an SCollection[(Int, Array[Float])] to an Annoy file, start from a collection of (item id, vector) pairs:

val s = sc.parallelize(Seq(1 -> Array(1.2f, 3.4f), 2 -> Array(2.2f, 1.2f)))

Save to a temporary location:

val s1 = s.asAnnoy(Angular, 2, 10) // dimension = 2 to match the vectors above

Save to a specific location:

val s1 = s.asAnnoy(Angular, 2, 10, "gs:///") // dimension = 2 to match the vectors above

SCollection[AnnoyUri] can be converted into a side input:

val s = sc.parallelize(Seq(1 -> Array(1.2f, 3.4f), 2 -> Array(2.2f, 1.2f)))
// asAnnoy yields an SCollection[AnnoyUri], which is then converted to a side input
val side = s.asAnnoy(metric, dimension, numTrees).asAnnoySideInput

There's syntactic sugar for saving an SCollection and converting it to a side input:

val s = sc
  .parallelize(Seq(1 -> Array(1.2f, 3.4f), 2 -> Array(2.2f, 1.2f)))
  .asAnnoySideInput(metric, dimension, numTrees)

An existing Annoy file can be converted to a side input directly:

sc.annoySideInput(metric, dimension, numTrees, "gs:///")

AnnoyReader provides nearest neighbor lookups by vector as well as item lookups:

val r = new scala.util.Random
val data = (0 until 1000).map(x => (x, Array.fill(dimension)(r.nextFloat())))
val main = sc.parallelize(data)
val side = main.asAnnoySideInput(metric, dimension, numTrees)

main.keys.withSideInput(side)
  .map { (i, s) =>
    val annoyReader = s(side)

    // get vector by item id, allocating a new Array[Float] each time
    val v1 = annoyReader.getItemVector(i)

    // get vector by item id, copy vector into pre-allocated Array[Float]
    val v2 = Array.fill(dimension)(-1.0f)
    annoyReader.getItemVector(i, v2)

    // get 10 nearest neighbors by vector
    val results = annoyReader.getNearest(v2, 10)
  }
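
Because AnnoyReader lookups are approximate, an exact linear scan can serve as a baseline when spot-checking results on a small sample. The following is a minimal sketch in plain Scala, independent of Scio and Annoy. The object name, the use of Euclidean distance, and the return shape (item ids, nearest first) are assumptions made for illustration.

```scala
object BruteForceNearest {
  // Squared Euclidean distance; sufficient for ranking neighbors.
  private def sqDist(u: Array[Float], v: Array[Float]): Double =
    u.zip(v).map { case (a, b) => val d = (a - b).toDouble; d * d }.sum

  // Returns the ids of the n items closest to `query`, nearest first.
  // Scans every item, so the cost is linear in the number of items.
  def nearest(items: Seq[(Int, Array[Float])], query: Array[Float], n: Int): Seq[Int] =
    items.sortBy { case (_, vec) => sqDist(vec, query) }.take(n).map(_._1)
}
```

Comparing this exact answer against the approximate one for a handful of query vectors gives a rough sense of recall for a given number of trees.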

Type Members

  1. sealed trait AnnoyMetric extends AnyRef

  2. implicit class AnnoyPairSCollection extends AnyRef

  3. class AnnoyReader extends AnyRef

    AnnoyReader class for approximate nearest neighbor lookups. Supports vector lookup by item as well as nearest neighbor lookup by vector.

  4. implicit final class AnnoySCollection extends AnyVal

    Enhanced version of SCollection with Annoy methods.

  5. implicit final class AnnoyScioContext extends AnyVal

    Enhanced version of ScioContext with Annoy methods.

  6. trait AnnoyUri extends Serializable

    Represents the base URI for an Annoy tree, either on the local or a remote file system.
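
To illustrate the local versus remote distinction, the sketch below classifies a path by its URI scheme using only java.net.URI. It is a guess at the kind of check involved, not the library's actual logic. The helper name and the rule that a missing scheme or the "file" scheme means local are assumptions.

```scala
import java.net.URI

object UriKindSketch {
  // Hypothetical helper: a path with no scheme, or with the "file" scheme,
  // is treated as local; any other scheme is treated as remote.
  def isLocal(path: String): Boolean = {
    val scheme = new URI(path).getScheme
    scheme == null || scheme == "file"
  }
}
```

With this rule, a path such as /tmp/annoy is classified as local, while a path carrying a gs scheme is classified as remote.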

Value Members

  1. object Angular extends AnnoyMetric with Product with Serializable

  2. object Euclidean extends AnnoyMetric with Product with Serializable
