org.zouzias.spark.lucenerdd

LuceneRDD

class LuceneRDD[T] extends RDD[T] with Logging with LuceneRDDConfigurable

Spark RDD with Lucene query capabilities (term, prefix, fuzzy and phrase queries)

T

Linear Supertypes
LuceneRDDConfigurable, Configurable, RDD[T], Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new LuceneRDD(partitionsRDD: RDD[AbstractLuceneRDDPartition[T]])(implicit arg0: ClassTag[T])

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. def ++(other: RDD[T]): RDD[T]

    Definition Classes
    RDD
  5. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  6. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  7. val DefaultFacetNum: Int

    Attributes
    protected
    Definition Classes
    LuceneRDDConfigurable
  8. val DefaultTopK: Int

    Default value for topK queries

    Attributes
    protected
    Definition Classes
    LuceneRDDConfigurable
  9. val MaxDefaultTopKValue: Int

    Attributes
    protected
    Definition Classes
    LuceneRDDConfigurable
  10. def aggregate[U](zeroValue: U)(seqOp: (U, T) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): U

    Definition Classes
    RDD
  11. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  12. def cache(): LuceneRDD.this.type

    Definition Classes
    LuceneRDD → RDD
  13. def cartesian[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(T, U)]

    Definition Classes
    RDD
  14. def checkpoint(): Unit

    Definition Classes
    RDD
  15. def clearDependencies(): Unit

    Attributes
    protected
    Definition Classes
    RDD
  16. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  17. def close(): Unit

  18. def coalesce(numPartitions: Int, shuffle: Boolean)(implicit ord: Ordering[T]): RDD[T]

    Definition Classes
    RDD
  19. def collect[U](f: PartialFunction[T, U])(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  20. def collect(): Array[T]

    Definition Classes
    RDD
  21. def compute(part: Partition, context: TaskContext): Iterator[T]

    RDD compute method.

    Definition Classes
    LuceneRDD → RDD
  22. val config: Config

    Definition Classes
    Configurable
  23. def context: SparkContext

    Definition Classes
    RDD
  24. def count(): Long

    Definition Classes
    LuceneRDD → RDD
  25. def countApprox(timeout: Long, confidence: Double): PartialResult[BoundedDouble]

    Definition Classes
    RDD
  26. def countApproxDistinct(relativeSD: Double): Long

    Definition Classes
    RDD
  27. def countApproxDistinct(p: Int, sp: Int): Long

    Definition Classes
    RDD
  28. def countByValue()(implicit ord: Ordering[T]): Map[T, Long]

    Definition Classes
    RDD
  29. def countByValueApprox(timeout: Long, confidence: Double)(implicit ord: Ordering[T]): PartialResult[Map[T, BoundedDouble]]

    Definition Classes
    RDD
  30. final def dependencies: Seq[Dependency[_]]

    Definition Classes
    RDD
  31. def distinct(): RDD[T]

    Definition Classes
    RDD
  32. def distinct(numPartitions: Int)(implicit ord: Ordering[T]): RDD[T]

    Definition Classes
    RDD
  33. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  34. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  35. def exists(elem: T): Boolean

  36. def exists(doc: Map[String, String]): Boolean

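    Lucene generic query: checks whether a document matching doc exists

    doc

    Document given as a map from field name to value

    returns

    true if a matching document exists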

  37. def fields(): Set[String]

    Returns all document fields

    returns

    Set of field names

  38. def filter(pred: (T) ⇒ Boolean): LuceneRDD[T]

    Definition Classes
    LuceneRDD → RDD
  39. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  40. def first(): T

    Definition Classes
    RDD
  41. def firstParent[U](implicit arg0: ClassTag[U]): RDD[U]

    Attributes
    protected[org.apache.spark]
    Definition Classes
    RDD
  42. def flatMap[U](f: (T) ⇒ TraversableOnce[U])(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  43. def fold(zeroValue: T)(op: (T, T) ⇒ T): T

    Definition Classes
    RDD
  44. def foreach(f: (T) ⇒ Unit): Unit

    Definition Classes
    RDD
  45. def foreachPartition(f: (Iterator[T]) ⇒ Unit): Unit

    Definition Classes
    RDD
  46. def fuzzyQuery(fieldName: String, query: String, maxEdits: Int, topK: Int = DefaultTopK): LuceneRDDResponse

    Lucene fuzzy query

    fieldName

    Name of field

    query

    Query text

    maxEdits

    Fuzziness: maximum edit distance between the query and a matching term

    topK

    Number of documents to return

    returns

    LuceneRDDResponse with at most topK matching documents
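
    To make maxEdits concrete, here is a plain-Scala model (hypothetical object FuzzyModel, not LuceneRDD code) that keeps the indexed terms whose Levenshtein distance to the query is at most maxEdits. It is only an approximation: Lucene's FuzzyQuery supports at most 2 edits and by default counts a transposition of adjacent characters as one edit, whereas this model counts it as two.

```scala
object FuzzyModel {
  // Levenshtein distance: insertions, deletions and substitutions each cost one edit.
  def levenshtein(a: String, b: String): Int = {
    // prev(j): distance between the prefix of `a` processed so far and the first j chars of `b`
    var prev = Array.tabulate(b.length + 1)(j => j)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(curr(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(b.length)
  }

  // Indexed terms within maxEdits of the query term, i.e. the terms a fuzzy query would accept.
  def fuzzyMatches(terms: Seq[String], query: String, maxEdits: Int): Seq[String] =
    terms.filter(t => levenshtein(t, query) <= maxEdits)

  def main(args: Array[String]): Unit =
    println(fuzzyMatches(Seq("fear", "feat", "fire", "water", "death"), "fear", maxEdits = 1))
}
```

    With maxEdits = 1 the example keeps fear and feat; fire is rejected because it is more than one edit away from fear.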

  47. def getCheckpointFile: Option[String]

    Definition Classes
    RDD
  48. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  49. def getDependencies: Seq[Dependency[_]]

    Attributes
    protected
    Definition Classes
    RDD
  50. final def getNumPartitions: Int

    Definition Classes
    RDD
    Annotations
    @Since( "1.6.0" )
  51. def getPartitions: Array[Partition]

    Attributes
    protected
    Definition Classes
    LuceneRDD → RDD
  52. def getPreferredLocations(s: Partition): Seq[String]

    Attributes
    protected
    Definition Classes
    LuceneRDD → RDD
  53. def getStorageLevel: StorageLevel

    Definition Classes
    RDD
  54. def glom(): RDD[Array[T]]

    Definition Classes
    RDD
  55. def groupBy[K](f: (T) ⇒ K, p: Partitioner)(implicit kt: ClassTag[K], ord: Ordering[K]): RDD[(K, Iterable[T])]

    Definition Classes
    RDD
  56. def groupBy[K](f: (T) ⇒ K, numPartitions: Int)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])]

    Definition Classes
    RDD
  57. def groupBy[K](f: (T) ⇒ K)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])]

    Definition Classes
    RDD
  58. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  59. val id: Int

    Definition Classes
    RDD
  60. def intersection(other: RDD[T], numPartitions: Int): RDD[T]

    Definition Classes
    RDD
  61. def intersection(other: RDD[T], partitioner: Partitioner)(implicit ord: Ordering[T]): RDD[T]

    Definition Classes
    RDD
  62. def intersection(other: RDD[T]): RDD[T]

    Definition Classes
    RDD
  63. def isCheckpointed: Boolean

    Definition Classes
    RDD
  64. def isEmpty(): Boolean

    Definition Classes
    RDD
  65. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  66. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  67. final def iterator(split: Partition, context: TaskContext): Iterator[T]

    Definition Classes
    RDD
  68. def keyBy[K](f: (T) ⇒ K): RDD[(K, T)]

    Definition Classes
    RDD
  69. def link[T1](other: RDD[T1], searchQueryGen: (T1) ⇒ String, topK: Int = DefaultTopK)(implicit arg0: ClassTag[T1]): RDD[(T1, List[SparkScoreDoc])]

    Entity linkage via Lucene query over all elements of an RDD.

    T1

    Type of the elements of other

    other

    RDD to be linked

    searchQueryGen

    Function that generates a search query for each element of other

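    topK

    Number of documents to return for each element of other

    returns

    An RDD of Tuple2 that contains each element of other in the first position and its linked Lucene documents in the second position

    Note: Currently the query strings of the other RDD are collected to the driver and broadcast to the workers.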
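
    The shape of the result can be illustrated with a small in-memory model of link. Everything in it is hypothetical: Hit stands in for SparkScoreDoc, index is a local collection of strings instead of a LuceneRDD, and scoring is a toy count of shared tokens rather than Lucene relevance scoring. What it does mirror is the contract described above: searchQueryGen turns each element of other into a query string, and every element is returned together with at most topK hits, best first.

```scala
object LinkModel {
  // Hypothetical stand-in for SparkScoreDoc: document id plus a score.
  final case class Hit(docId: Int, score: Int)

  // Simplified analysis: lower-case and split on whitespace.
  def tokens(s: String): Set[String] = s.toLowerCase.split("\\s+").filter(_.nonEmpty).toSet

  // Toy scoring: number of distinct query tokens that also occur in the document.
  def search(index: IndexedSeq[String], query: String, topK: Int): List[Hit] = {
    val q = tokens(query)
    index.indices.toList
      .map(id => Hit(id, tokens(index(id)).count(q.contains)))
      .filter(_.score > 0)
      .sortBy(h => (-h.score, h.docId))
      .take(topK)
  }

  // link-style entity linkage: one query per element of `other`, paired with its top-K hits.
  def link[T1](index: IndexedSeq[String], other: Seq[T1], searchQueryGen: T1 => String,
               topK: Int): Seq[(T1, List[Hit])] = {
    // In LuceneRDD the generated query strings are collected to the driver and broadcast (see the note above).
    val queries = other.map(searchQueryGen)
    other.zip(queries).map { case (elem, query) => (elem, search(index, query, topK)) }
  }

  def main(args: Array[String]): Unit = {
    val index = Vector("john smith", "jane doe", "john doe")
    link(index, Seq(("John", "Doe")), (p: (String, String)) => s"${p._1} ${p._2}", topK = 2).foreach(println)
  }
}
```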

  70. def linkByQuery[T1](other: RDD[T1], searchQueryGen: (T1) ⇒ Query, topK: Int = DefaultTopK)(implicit arg0: ClassTag[T1]): RDD[(T1, List[SparkScoreDoc])]

    Entity linkage via Lucene query over all elements of an RDD.

    T1

    Type of the elements of other

    other

    RDD to be linked

    searchQueryGen

    Function that generates a Lucene Query object for each element of other

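    topK

    Number of documents to return for each element of other

    returns

    An RDD of Tuple2 that contains each element of other in the first position and its linked Lucene documents in the second position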
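
    linkByQuery differs from link in that searchQueryGen builds a query object instead of a query string. The model below illustrates that difference with a hypothetical query ADT (TermQ, PrefixQ and AllOf standing in for Lucene's TermQuery, PrefixQuery and a BooleanQuery with only required clauses) and returns document indices instead of SparkScoreDoc values. One practical benefit of building queries programmatically is that values taken from the data never pass through a query parser, so they need no escaping.

```scala
object QueryObjectModel {
  // Hypothetical query ADT; not the Lucene Query API.
  sealed trait Q
  final case class TermQ(field: String, term: String) extends Q
  final case class PrefixQ(field: String, prefix: String) extends Q
  final case class AllOf(clauses: List[Q]) extends Q // conjunction of required clauses

  type Doc = Map[String, String]

  // Evaluates a query object directly against a document; no string parsing involved.
  def matches(q: Q, doc: Doc): Boolean = q match {
    case TermQ(f, t)   => doc.get(f).contains(t)
    case PrefixQ(f, p) => doc.get(f).exists(_.startsWith(p))
    case AllOf(cs)     => cs.forall(matches(_, doc))
  }

  // linkByQuery-style linkage: each element of `other` yields a query object and its matching doc indices.
  def linkByQuery[T1](index: Seq[Doc], other: Seq[T1], searchQueryGen: T1 => Q, topK: Int): Seq[(T1, List[Int])] =
    other.map { elem =>
      val query = searchQueryGen(elem) // built once per element
      (elem, index.indices.filter(i => matches(query, index(i))).take(topK).toList)
    }

  def main(args: Array[String]): Unit = {
    val index = Seq(
      Map("first" -> "john", "last" -> "doe"),
      Map("first" -> "jane", "last" -> "doe"),
      Map("first" -> "johnny", "last" -> "smith"))
    val gen = (p: (String, String)) => AllOf(List(PrefixQ("first", p._1), TermQ("last", p._2)))
    linkByQuery(index, Seq(("jo", "doe")), gen, topK = 10).foreach(println)
  }
}
```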

  71. def linkDataFrame(other: DataFrame, searchQueryGen: (Row) ⇒ String, topK: Int = DefaultTopK): RDD[(Row, List[SparkScoreDoc])]

    Entity linkage via Lucene query over all rows of a DataFrame.

    other

    DataFrame to be linked

    searchQueryGen

    Function that generates a search query for each row of other

    topK

    Number of documents to return for each row of other

    returns

    An RDD of Tuple2 that contains each row of other in the first position and its linked Lucene documents in the second position

  72. def localCheckpoint(): LuceneRDD.this.type

    Definition Classes
    RDD
  73. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  74. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  75. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  76. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  77. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  78. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  79. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  80. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  81. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  82. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  83. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  84. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  85. def map[U](f: (T) ⇒ U)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  86. def mapPartitions[U](f: (Iterator[T]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  87. def mapPartitionsWithIndex[U](f: (Int, Iterator[T]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  88. def max()(implicit ord: Ordering[T]): T

    Definition Classes
    RDD
  89. def min()(implicit ord: Ordering[T]): T

    Definition Classes
    RDD
  90. var name: String

    Definition Classes
    RDD
  91. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  92. final def notify(): Unit

    Definition Classes
    AnyRef
  93. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  94. def parent[U](j: Int)(implicit arg0: ClassTag[U]): RDD[U]

    Attributes
    protected[org.apache.spark]
    Definition Classes
    RDD
  95. def partitionMapper(f: (AbstractLuceneRDDPartition[T]) ⇒ LuceneRDDResponsePartition, k: Int): LuceneRDDResponse

    Applies f to every partition and combines the partition results into a single LuceneRDDResponse

    f

    Function to apply on each partition / distributed index

    k

    Number of documents to return

    returns

    LuceneRDDResponse with at most k documents

    Attributes
    protected
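
    This page does not say how the per-partition results are combined. A common approach, sketched below in plain Scala with hypothetical names (TopKMerge, Hit), is for each partition to return its local top k and for the partial lists to be merged into a global top k. The merge is exact because any document in the global top k must also rank in the top k of its own partition.

```scala
object TopKMerge {
  // A scored hit; `partition` records which distributed index produced it.
  final case class Hit(partition: Int, docId: Int, score: Double)

  // Higher score first; ties broken by (partition, docId) so the order is deterministic.
  private def better(a: Hit, b: Hit): Boolean =
    if (a.score != b.score) a.score > b.score
    else if (a.partition != b.partition) a.partition < b.partition
    else a.docId < b.docId

  // Per-partition step (the role of f): keep only the k best local hits.
  def localTopK(hits: Seq[Hit], k: Int): Seq[Hit] = hits.sortWith(better).take(k)

  // Merge step: the global top k is contained in the union of the local top-k lists.
  def merge(partials: Seq[Seq[Hit]], k: Int): Seq[Hit] = partials.flatten.sortWith(better).take(k)

  def partitionMapper(partitions: Seq[Seq[Hit]], k: Int): Seq[Hit] =
    merge(partitions.map(localTopK(_, k)), k)

  def main(args: Array[String]): Unit = {
    val p0 = Seq(Hit(0, 0, 0.5), Hit(0, 1, 0.9))
    val p1 = Seq(Hit(1, 0, 0.7), Hit(1, 1, 0.1))
    println(partitionMapper(Seq(p0, p1), k = 2))
  }
}
```

    Shipping only k hits per partition keeps the amount of data sent to the merge step proportional to k times the number of partitions rather than to the index size.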
  96. val partitioner: Option[Partitioner]

    Definition Classes
    RDD
  97. final def partitions: Array[Partition]

    Definition Classes
    RDD
  98. val partitionsRDD: RDD[AbstractLuceneRDDPartition[T]]

    Attributes
    protected
  99. def persist(newLevel: StorageLevel): LuceneRDD.this.type

    Definition Classes
    LuceneRDD → RDD
  100. def persist(): LuceneRDD.this.type

    Definition Classes
    RDD
  101. def phraseQuery(fieldName: String, query: String, topK: Int = DefaultTopK): LuceneRDDResponse

    Lucene phrase query

    fieldName

    Name of field

    query

    Query text

    topK

    Number of documents to return

    returns

    LuceneRDDResponse with at most topK matching documents
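
    As a rough model of phrase matching, the sketch below (hypothetical object PhraseModel) accepts a field value only if the analyzed query terms occur consecutively and in the same order, which corresponds to a Lucene PhraseQuery with slop 0. The lower-case/split analysis is a simplification of a real Lucene analyzer, and whether phraseQuery here allows any slop is not stated on this page.

```scala
object PhraseModel {
  // Simplified analysis: lower-case and split on runs of non-alphanumeric characters.
  def terms(text: String): Seq[String] =
    text.toLowerCase.split("[^\\p{Alnum}]+").toSeq.filter(_.nonEmpty)

  // Exact phrase match (slop 0): the query terms must appear consecutively and in order.
  def containsPhrase(fieldValue: String, phrase: String): Boolean = {
    val p = terms(phrase)
    p.nonEmpty && terms(fieldValue).containsSlice(p)
  }

  def main(args: Array[String]): Unit = {
    val docs = Seq("The house is on fire", "fire in the house", "a house on the hill")
    println(docs.filter(containsPhrase(_, "house on")))
  }
}
```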

  102. def pipe(command: Seq[String], env: Map[String, String], printPipeContext: ((String) ⇒ Unit) ⇒ Unit, printRDDElement: (T, (String) ⇒ Unit) ⇒ Unit, separateWorkingDir: Boolean): RDD[String]

    Definition Classes
    RDD
  103. def pipe(command: String, env: Map[String, String]): RDD[String]

    Definition Classes
    RDD
  104. def pipe(command: String): RDD[String]

    Definition Classes
    RDD
  105. final def preferredLocations(split: Partition): Seq[String]

    Definition Classes
    RDD
  106. def prefixQuery(fieldName: String, query: String, topK: Int = DefaultTopK): LuceneRDDResponse

    Lucene prefix query

    fieldName

    Name of field

    query

    Prefix query text

    topK

    Number of documents to return

    returns

    LuceneRDDResponse with at most topK matching documents
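
    The sketch below (hypothetical object PrefixModel) models prefix matching: a document matches if some term of the given field starts with the query text. Like Lucene's PrefixQuery, it compares the prefix verbatim against the indexed (here: lower-cased) terms without analyzing it; whether LuceneRDD normalizes the query text before building the query is not stated on this page.

```scala
object PrefixModel {
  // Simplified analysis applied at index time: lower-case and split on non-alphanumerics.
  def terms(text: String): Seq[String] =
    text.toLowerCase.split("[^\\p{Alnum}]+").toSeq.filter(_.nonEmpty)

  // Indices of documents whose field has at least one term starting with `prefix`.
  // The prefix itself is not analyzed, so "Fir" will not match the indexed term "fire".
  def prefixMatches(docs: Seq[Map[String, String]], field: String, prefix: String): Seq[Int] =
    docs.zipWithIndex.collect {
      case (doc, i) if doc.get(field).exists(v => terms(v).exists(_.startsWith(prefix))) => i
    }

  def main(args: Array[String]): Unit = {
    val docs = Seq(Map("name" -> "Fire house"), Map("name" -> "water"), Map("name" -> "firefly"))
    println(prefixMatches(docs, "name", "fir"))
  }
}
```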

  107. def query(searchString: String, topK: Int = DefaultTopK): LuceneRDDResponse

    Generic query using Lucene's query parser

    searchString

    Query string in Lucene query parser syntax

    topK

    Number of documents to return

    returns

    LuceneRDDResponse with at most topK matching documents

  108. def randomSplit(weights: Array[Double], seed: Long): Array[RDD[T]]

    Definition Classes
    RDD
  109. def reduce(f: (T, T) ⇒ T): T

    Definition Classes
    RDD
  110. def repartition(numPartitions: Int)(implicit ord: Ordering[T]): RDD[T]

    Definition Classes
    RDD
  111. def sample(withReplacement: Boolean, fraction: Double, seed: Long): RDD[T]

    Definition Classes
    RDD
  112. def saveAsObjectFile(path: String): Unit

    Definition Classes
    RDD
  113. def saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]): Unit

    Definition Classes
    RDD
  114. def saveAsTextFile(path: String): Unit

    Definition Classes
    RDD
  115. def setName(_name: String): LuceneRDD.this.type

    Sets the name of the RDD (defaults to "LuceneRDD")

    Definition Classes
    LuceneRDD → RDD
  116. def sortBy[K](f: (T) ⇒ K, ascending: Boolean, numPartitions: Int)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]

    Definition Classes
    RDD
  117. def sparkContext: SparkContext

    Definition Classes
    RDD
  118. def subtract(other: RDD[T], p: Partitioner)(implicit ord: Ordering[T]): RDD[T]

    Definition Classes
    RDD
  119. def subtract(other: RDD[T], numPartitions: Int): RDD[T]

    Definition Classes
    RDD
  120. def subtract(other: RDD[T]): RDD[T]

    Definition Classes
    RDD
  121. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  122. def take(num: Int): Array[T]

    Definition Classes
    RDD
  123. def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T]

    Definition Classes
    RDD
  124. def takeSample(withReplacement: Boolean, num: Int, seed: Long): Array[T]

    Definition Classes
    RDD
  125. def termQuery(fieldName: String, query: String, topK: Int = DefaultTopK): LuceneRDDResponse

    Lucene term query

    fieldName

    Name of field

    query

    Term to search on

    topK

    Number of documents to return

    returns

    LuceneRDDResponse with at most topK matching documents
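
    A plain-Scala model of a term query with a topK cut-off (hypothetical object TermModel): a document matches only if the field contains the exact term, and matches are ranked by how often the term occurs. Raw term frequency is a deliberate simplification of Lucene's BM25 scoring, and, as with Lucene's TermQuery, the query term itself is not analyzed.

```scala
object TermModel {
  // Simplified analysis applied at index time: lower-case and split on non-alphanumerics.
  def terms(text: String): Seq[String] =
    text.toLowerCase.split("[^\\p{Alnum}]+").toSeq.filter(_.nonEmpty)

  // Exact term match scored by term frequency; returns at most topK (docIndex, score) pairs, best first.
  def termQuery(docs: Seq[Map[String, String]], field: String, term: String, topK: Int): Seq[(Int, Int)] =
    docs.zipWithIndex
      .map { case (doc, i) => (i, doc.get(field).map(v => terms(v).count(_ == term)).getOrElse(0)) }
      .filter { case (_, score) => score > 0 }
      .sortBy { case (i, score) => (-score, i) }
      .take(topK)

  def main(args: Array[String]): Unit = {
    val docs = Seq(Map("t" -> "fire and fire"), Map("t" -> "water"), Map("t" -> "Fire"))
    println(termQuery(docs, "t", "fire", topK = 10))
  }
}
```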

  126. def toDebugString: String

    Definition Classes
    RDD
  127. def toJavaRDD(): JavaRDD[T]

    Definition Classes
    RDD
  128. def toLocalIterator: Iterator[T]

    Definition Classes
    RDD
  129. def toString(): String

    Definition Classes
    RDD → AnyRef → Any
  130. def top(num: Int)(implicit ord: Ordering[T]): Array[T]

    Definition Classes
    RDD
  131. def treeAggregate[U](zeroValue: U)(seqOp: (U, T) ⇒ U, combOp: (U, U) ⇒ U, depth: Int)(implicit arg0: ClassTag[U]): U

    Definition Classes
    RDD
  132. def treeReduce(f: (T, T) ⇒ T, depth: Int): T

    Definition Classes
    RDD
  133. def union(other: RDD[T]): RDD[T]

    Definition Classes
    RDD
  134. def unpersist(blocking: Boolean = true): LuceneRDD.this.type

    Definition Classes
    LuceneRDD → RDD
  135. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  136. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  137. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  138. def zip[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(T, U)]

    Definition Classes
    RDD
  139. def zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D])(f: (Iterator[T], Iterator[B], Iterator[C], Iterator[D]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  140. def zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D], preservesPartitioning: Boolean)(f: (Iterator[T], Iterator[B], Iterator[C], Iterator[D]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  141. def zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C])(f: (Iterator[T], Iterator[B], Iterator[C]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  142. def zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C], preservesPartitioning: Boolean)(f: (Iterator[T], Iterator[B], Iterator[C]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  143. def zipPartitions[B, V](rdd2: RDD[B])(f: (Iterator[T], Iterator[B]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  144. def zipPartitions[B, V](rdd2: RDD[B], preservesPartitioning: Boolean)(f: (Iterator[T], Iterator[B]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  145. def zipWithIndex(): RDD[(T, Long)]

    Definition Classes
    RDD
  146. def zipWithUniqueId(): RDD[(T, Long)]

    Definition Classes
    RDD

Deprecated Value Members

  1. def filterWith[A](constructA: (Int) ⇒ A)(p: (T, A) ⇒ Boolean): RDD[T]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use mapPartitionsWithIndex and filter

  2. def flatMapWith[A, U](constructA: (Int) ⇒ A, preservesPartitioning: Boolean)(f: (T, A) ⇒ Seq[U])(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use mapPartitionsWithIndex and flatMap

  3. def foreachWith[A](constructA: (Int) ⇒ A)(f: (T, A) ⇒ Unit): Unit

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use mapPartitionsWithIndex and foreach

  4. def mapPartitionsWithContext[U](f: (TaskContext, Iterator[T]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
    Annotations
    @DeveloperApi() @deprecated
    Deprecated

    (Since version 1.2.0) use TaskContext.get

  5. def mapPartitionsWithSplit[U](f: (Int, Iterator[T]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 0.7.0) use mapPartitionsWithIndex

  6. def mapWith[A, U](constructA: (Int) ⇒ A, preservesPartitioning: Boolean)(f: (T, A) ⇒ U)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use mapPartitionsWithIndex

  7. def toArray(): Array[T]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use collect
