org.zouzias.spark.lucenerdd

LuceneRDD

object LuceneRDD extends Versionable with AnalyzerConfigurable with SimilarityConfigurable

Linear Supertypes
SimilarityConfigurable, AnalyzerConfigurable, Logging, Configurable, Serializable, Serializable, Versionable, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. lazy val Config: Config

    Definition Classes
    Configurable
  5. val IndexAnalyzerConfigName: Option[String]

    Attributes
    protected
    Definition Classes
    AnalyzerConfigurable
  6. val LuceneSimilarityConfigValue: Option[String]

    Attributes
    protected
    Definition Classes
    SimilarityConfigurable
  7. val QueryAnalyzerConfigName: Option[String]

    Attributes
    protected
    Definition Classes
    AnalyzerConfigurable
  8. def apply(dataFrame: DataFrame): LuceneRDD[Row]

    Constructor with the default index and query analyzers and the default Lucene similarity

    dataFrame

    Input DataFrame
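
    A minimal spark-shell style sketch of the single-argument overload, assuming a local SparkSession and the LuceneRDD artifact on the classpath; the `people` DataFrame and its columns are illustrative only.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.zouzias.spark.lucenerdd.LuceneRDD

    // Local session for illustration; in spark-shell `spark` already exists
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("LuceneRDD from DataFrame")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: each Row becomes one Lucene document
    val people = Seq(("Alice", "Zurich"), ("Bob", "Athens")).toDF("name", "city")

    // Default index analyzer, query analyzer and Lucene similarity
    val luceneRDD = LuceneRDD(people)
    println(s"Indexed ${luceneRDD.count()} rows")
    ```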

  9. def apply(dataFrame: DataFrame, indexAnalyzer: String, queryAnalyzer: String, similarity: String): LuceneRDD[Row]

  10. def apply(dataFrame: DataFrame, indexAnalyzer: String, queryAnalyzer: String, similarity: String, indexAnalyzerPerField: Map[String, String], queryAnalyzerPerField: Map[String, String]): LuceneRDD[Row]

    Instantiate a LuceneRDD from a DataFrame

    dataFrame

    Spark DataFrame
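
    A sketch of the six-argument DataFrame overload with every choice made explicit. The analyzer names `"en"` and `"standard"` and the similarity name `"bm25"` are assumptions about accepted values (the protected helpers `getOrElseEn` and `getOrElseClassic` suggest English analysis and classic similarity as fallbacks), and keying the per-field maps by DataFrame column name is also an assumption.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.zouzias.spark.lucenerdd.LuceneRDD

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("LuceneRDD with explicit analyzers")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data
    val people = Seq(("Alice", "Zurich"), ("Bob", "Athens")).toDF("name", "city")

    // Assumed per-field override: analyze the "city" column with the standard analyzer
    val perField = Map("city" -> "standard")

    val luceneRDD = LuceneRDD(
      people,
      "en",     // index analyzer name (assumed value)
      "en",     // query analyzer name (assumed value)
      "bm25",   // similarity name (assumed value)
      perField, // index-time analyzer per field
      perField) // query-time analyzer per field
    ```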

  11. def apply[T](elems: Iterable[T])(implicit arg0: ClassTag[T], sc: SparkContext, conv: (T) ⇒ Document): LuceneRDD[T]

  12. def apply[T](elems: RDD[T])(implicit arg0: ClassTag[T], conv: (T) ⇒ Document): LuceneRDD[T]

  13. def apply[T](elems: Iterable[T], indexAnalyzer: String, queryAnalyzer: String, similarity: String, indexAnalyzerPerField: Map[String, String], queryAnalyzerPerField: Map[String, String])(implicit arg0: ClassTag[T], sc: SparkContext, conv: (T) ⇒ Document): LuceneRDD[T]

    Instantiate a LuceneRDD from an Iterable

    T

    Input type

    elems

    Elements to index

    indexAnalyzer

    Index analyzer name

    queryAnalyzer

    Query analyzer name

    similarity

    Lucene scoring similarity, e.g., BM25 or TF-IDF

    indexAnalyzerPerField

    Lucene Analyzer per field (indexing time), default empty

    queryAnalyzerPerField

    Lucene Analyzer per field (query time), default empty

    sc

    Spark Context
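
    The Iterable overloads distribute a local collection and need two implicits: the SparkContext and a `T ⇒ Document` conversion. This sketch supplies both explicitly; the field name `"text"`, the sample strings, and the analyzer/similarity names `"en"` and `"classic"` are assumptions for illustration (the library may also ship its own implicit conversions, which are not relied on here).

    ```scala
    import org.apache.lucene.document.{Document, Field, TextField}
    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SparkSession
    import org.zouzias.spark.lucenerdd.LuceneRDD

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("LuceneRDD from Iterable")
      .getOrCreate()

    // Implicit SparkContext required by the Iterable overloads
    implicit val sc: SparkContext = spark.sparkContext

    // Hypothetical conversion: store each string in a single analyzed field "text"
    implicit val stringToDocument: String => Document = { s =>
      val doc = new Document()
      doc.add(new TextField("text", s, Field.Store.YES))
      doc
    }

    val phrases = Seq("Hello world", "Lucene on Spark", "Hello Lucene")

    val luceneRDD = LuceneRDD(
      phrases,
      "en",                      // index analyzer name (assumed value)
      "en",                      // query analyzer name (assumed value)
      "classic",                 // TF-IDF style similarity (assumed value)
      Map.empty[String, String], // no per-field index analyzers
      Map.empty[String, String]) // no per-field query analyzers
    ```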

  14. def apply[T](elems: RDD[T], indexAnalyzer: String, queryAnalyzer: String, similarity: String, indexAnalyzerPerField: Map[String, String], queryAnalyzerPerField: Map[String, String])(implicit arg0: ClassTag[T], conv: (T) ⇒ Document): LuceneRDD[T]

    Instantiate a LuceneRDD given an RDD[T]

    T

    Generic type

    elems

    RDD of type T

    indexAnalyzer

    Index analyzer name

    queryAnalyzer

    Query analyzer name

    similarity

    Lucene scoring similarity, e.g., BM25 or TF-IDF

    indexAnalyzerPerField

    Lucene Analyzer per field (indexing time), default empty

    queryAnalyzerPerField

    Lucene Analyzer per field (query time), default empty
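
    When the data is already an RDD[T], only the ClassTag and the `T ⇒ Document` conversion are implicit. A sketch assuming the LuceneRDD artifact on the classpath; the `"text"` field, the `stringToDocument` conversion, and the names `"standard"` and `"bm25"` are hypothetical or assumed values.

    ```scala
    import org.apache.lucene.document.{Document, Field, TextField}
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.SparkSession
    import org.zouzias.spark.lucenerdd.LuceneRDD

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("LuceneRDD from RDD")
      .getOrCreate()

    // Hypothetical conversion: one analyzed "text" field per element
    implicit val stringToDocument: String => Document = { s =>
      val doc = new Document()
      doc.add(new TextField("text", s, Field.Store.YES))
      doc
    }

    val cities: RDD[String] = spark.sparkContext.parallelize(Seq("Zurich", "Athens", "Geneva"))

    val luceneRDD = LuceneRDD(
      cities,
      "standard",                // index analyzer name (assumed value)
      "standard",                // query analyzer name (assumed value)
      "bm25",                    // similarity name (assumed value)
      Map.empty[String, String], // no per-field index analyzers
      Map.empty[String, String]) // no per-field query analyzers
    ```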

  15. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  16. def blockDedup(entities: DataFrame, rowToQuery: (Row) ⇒ Query, blockingColumns: Array[String], topK: Int = 3, luceneRDDParams: LuceneRDDParams = LuceneRDDParams()): RDD[(Row, Array[Row])]

    Deduplication via blocking

    entities

    Entities DataFrame to deduplicate

    rowToQuery

    Function that maps Row to Lucene Query

    blockingColumns

    Columns on which exact match is required

    topK

    Number of top-K query results

    luceneRDDParams

    Parameters for index-time and query-time analysis
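
    Blocking restricts comparisons to rows that agree exactly on the blocking columns. This plain-Scala sketch (no Spark or Lucene) only illustrates that idea: `Person`, the Jaccard token score, and `BlockingDedupSketch` are hypothetical, the sketch excludes each row from its own candidate list, and the library itself ranks candidates by running the Lucene query produced by `rowToQuery`.

    ```scala
    // Hypothetical record standing in for a DataFrame Row
    final case class Person(name: String, city: String)

    object BlockingDedupSketch {
      private def tokens(s: String): Set[String] =
        s.toLowerCase.split("\\s+").filter(_.nonEmpty).toSet

      // Toy similarity: Jaccard overlap of lower-cased name tokens
      def score(a: Person, b: Person): Double = {
        val (ta, tb) = (tokens(a.name), tokens(b.name))
        val union = (ta union tb).size
        if (union == 0) 0.0 else (ta intersect tb).size.toDouble / union
      }

      // For every row, up to topK candidates from the same block, best first
      def blockDedup(rows: Seq[Person], blockKey: Person => String, topK: Int): Seq[(Person, Seq[Person])] = {
        val blocks = rows.groupBy(blockKey) // exact match on the blocking key
        rows.map { row =>
          val candidates = blocks(blockKey(row)).filterNot(_ eq row) // skip the row itself
          row -> candidates.sortBy(c => -score(row, c)).take(topK)
        }
      }
    }

    val rows = Seq(
      Person("John Smith", "Zurich"),
      Person("Jon Smith", "Zurich"),
      Person("John Smith", "Athens"))

    BlockingDedupSketch.blockDedup(rows, _.city, topK = 3).foreach { case (row, dups) =>
      println(s"${row.name} (${row.city}) -> ${dups.map(_.name).mkString(", ")}")
    }
    ```

    With the library, the corresponding call would look like `LuceneRDD.blockDedup(entities, rowToQuery, Array("city"), topK = 3)`.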

  17. def blockEntityLinkage(queries: DataFrame, entities: DataFrame, rowToQuery: (Row) ⇒ Query, queryPartColumns: Array[String], entityPartColumns: Array[String], topK: Int = 3, luceneRDDParams: LuceneRDDParams = LuceneRDDParams()): RDD[(Row, Array[Row])]

    Entity linkage between two DataFrames by blocking / filtering on one or more columns.

    queries

    Queries / entities to be linked with the entities parameter

    entities

    DataFrame of entities to be linked with the queries parameter

    rowToQuery

    Function[Row, Query] that converts Row to a Lucene Query

    queryPartColumns

    List of query columns for HashPartitioner

    entityPartColumns

    List of entity columns for HashPartitioner

    topK

    Number of linked results

    luceneRDDParams

    Parameters for index and query time analysis

    returns

    Top-k linked results as an RDD of Tuple2, where _1 is the query Row and _2 is the array of its top-k linked entity Rows.
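
    Here `queryPartColumns` and `entityPartColumns` act as the blocking key on each side. This plain-Scala sketch (no Spark or Lucene) illustrates only that blocking step: `Rec`, the shared-token score, and `BlockLinkageSketch` are hypothetical, whereas the library co-partitions the two DataFrames with a HashPartitioner and ranks each block's entities with the Lucene query from `rowToQuery`.

    ```scala
    // Hypothetical record standing in for a Row of either DataFrame
    final case class Rec(id: Int, name: String, country: String)

    object BlockLinkageSketch {
      private def tokens(s: String): Set[String] =
        s.toLowerCase.split("\\s+").filter(_.nonEmpty).toSet

      // Toy score: number of name tokens shared by query and entity
      def score(q: Rec, e: Rec): Int = (tokens(q.name) intersect tokens(e.name)).size

      // Link each query only against entities whose blocking key matches
      def link(
          queries: Seq[Rec],
          entities: Seq[Rec],
          queryKey: Rec => String,
          entityKey: Rec => String,
          topK: Int): Seq[(Rec, Seq[Rec])] = {
        val entityBlocks = entities.groupBy(entityKey)
        queries.map { q =>
          val block = entityBlocks.getOrElse(queryKey(q), Seq.empty[Rec])
          q -> block.sortBy(e => -score(q, e)).take(topK)
        }
      }
    }

    val queries = Seq(Rec(1, "Acme Corp", "CH"), Rec(2, "Globex", "GR"))
    val entities = Seq(
      Rec(10, "Acme Corporation", "CH"),
      Rec(11, "Acme Corp", "CH"),
      Rec(12, "Acme Corp", "DE"))

    val linked = BlockLinkageSketch.link(queries, entities, _.country, _.country, topK = 1)
    linked.foreach { case (q, hits) => println(s"${q.id} -> ${hits.map(_.id).mkString(", ")}") }
    ```

    A query whose block has no entities is still returned, with an empty list of links; entity 12 is never considered for query 1 because their countries differ.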

  18. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  20. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  21. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  22. def getAnalyzer(analyzerName: Option[String]): Analyzer

    Attributes
    protected
    Definition Classes
    AnalyzerConfigurable
  23. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  24. def getOrElseClassic(): String

    Attributes
    protected
    Definition Classes
    SimilarityConfigurable
  25. def getOrElseEn(analyzerName: Option[String]): String

    Get the configured analyzer name, or fall back to English

    Attributes
    protected
    Definition Classes
    AnalyzerConfigurable
  26. def getSimilarity(similarityName: Option[String]): Similarity

    Attributes
    protected
    Definition Classes
    SimilarityConfigurable
  27. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  28. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  29. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  30. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  31. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  32. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  33. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  34. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  35. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  36. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  37. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  38. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  39. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  40. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  41. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  42. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  43. final def notify(): Unit

    Definition Classes
    AnyRef
  44. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  45. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  46. def toString(): String

    Definition Classes
    AnyRef → Any
  47. def version(): Map[String, Any]

    Return project information, e.g., version number, build time, etc.

    Definition Classes
    Versionable
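
    A short usage sketch, assuming the LuceneRDD artifact is on the classpath. No SparkContext is needed because `version()` is defined on the companion object. The exact keys of the returned map are not documented here, so the sketch simply prints whatever entries are present.

    ```scala
    import org.zouzias.spark.lucenerdd.LuceneRDD

    // Build metadata shipped with the library; the keys depend on its build
    val info: Map[String, Any] = LuceneRDD.version()
    info.toSeq.sortBy(_._1).foreach { case (key, value) => println(s"$key: $value") }
    ```
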
  48. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  49. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  50. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
