kr.ac.kaist.ir.deep.wordvec

PrepareCorpus

object PrepareCorpus extends Logging

Train Word2Vec and save the model.

Linear Supertypes
Logging, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. def getArgument(args: Array[String], key: String, default: String): String

    Reads the value of a command-line argument.

    args

    Argument array

    key

    Argument key

    default

    Default value of this argument

    returns

    Value of this key, or default if the key is not given.
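    The page does not say how arguments are laid out in args. The sketch below assumes each key is immediately followed by its value (for example "-out model"). The object name ArgSketch and that convention are hypothetical, not the library's actual parsing rule.

```scala
object ArgSketch {
  // Hypothetical convention: each key is immediately followed by its value,
  // e.g. Array("-in", "corpus.txt", "-out", "model").
  def getArgument(args: Array[String], key: String, default: String): String = {
    val idx = args.indexOf(key)
    // Return the token after the key when both exist; otherwise use the default.
    if (idx >= 0 && idx + 1 < args.length) args(idx + 1) else default
  }
}
```

    A key with no following value, or one that is missing entirely, falls back to default.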

  10. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  11. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  12. def infrequentWords(words: RDD[String], threshold: Int): HashSet[String]

    Collects frequent words, i.e. words whose count is >= threshold.

    words

    Sequence of words.

    threshold

    Minimum count for a word to be considered frequent.

    returns

    HashSet of frequent words.
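    The counting step described above can be sketched on a local collection. This is a minimal sketch, assuming a plain Seq[String] stands in for the RDD[String] and scala.collection.immutable.HashSet stands in for whichever HashSet the original uses; the name frequentWords is hypothetical. It keeps words with count >= threshold, as the description states.

```scala
import scala.collection.immutable.HashSet

object CountSketch {
  // Local stand-in for the RDD version: counts each distinct word and keeps
  // those whose count is >= threshold, following the description above.
  def frequentWords(words: Seq[String], threshold: Int): HashSet[String] = {
    // Count occurrences of each distinct word.
    val counts: Map[String, Int] = words.groupBy(identity).map { case (w, ws) => w -> ws.size }
    // Keep only words that meet the threshold.
    HashSet(counts.collect { case (w, c) if c >= threshold => w }.toSeq: _*)
  }
}
```

    On a Spark RDD the same count would typically be expressed with map and reduceByKey before filtering and collecting to the driver.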

  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  15. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  16. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  17. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  18. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  19. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  20. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  21. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  22. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  23. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  24. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  25. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  26. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def main(args: Array[String]): Unit

    Main entry point.

    args

    Command-line arguments

  28. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  29. def normalizedTokens(input: RDD[_ <: Seq[String]], infreqSet: Broadcast[HashSet[String]]): RDD[String]

    Converts each tokenized sentence into a sentence string, applying the appropriate conversion to infrequent words (those with a count of at most threshold - 1).

    input

    Tokenized input sentences

    infreqSet

    Broadcast set of less frequent words

    returns

    Converted sentences
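    The page does not specify what the "appropriate conversion" is. The sketch below assumes infrequent words are replaced by a placeholder token and each sentence is joined with spaces. The placeholder "<unk>", the name normalizedSentences, and the use of a Seq and a plain Set in place of the RDD and Broadcast are all hypothetical choices.

```scala
object NormalizeSketch {
  // Hypothetical placeholder for words found in the infrequent-word set.
  val Unknown = "<unk>"

  // Seq stands in for RDD, and a plain Set for the Broadcast value.
  def normalizedSentences(input: Seq[Seq[String]], infreqSet: Set[String]): Seq[String] =
    input.map { tokens =>
      // Replace infrequent tokens, then join the sentence back into one string.
      tokens.map(t => if (infreqSet.contains(t)) Unknown else t).mkString(" ")
    }
}
```

    Broadcasting the set, as the real signature does, lets every Spark executor consult it without shipping it with each task.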

  30. final def notify(): Unit

    Definition Classes
    AnyRef
  31. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  32. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  33. def toString(): String

    Definition Classes
    AnyRef → Any
  34. def tokenize(lines: RDD[String], bcFilter: Broadcast[_ <: WordFilter]): RDD[WrappedArray[String]]

    Converts input lines into tokens using the Stanford NLP toolkit.

    lines

    Input lines

    bcFilter

    Broadcast word filter

    returns

    Tokenized and normalized lines.
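    The tokenize-then-normalize flow can be illustrated without Stanford NLP. This sketch swaps in a plain whitespace split for the Stanford tokenizer, and SimpleWordFilter is an invented stand-in for the library's WordFilter, whose real interface is not shown on this page; both are assumptions for illustration only.

```scala
object TokenizeSketch {
  // Hypothetical stand-in for WordFilter: normalizes a single token.
  trait SimpleWordFilter {
    def normalize(token: String): String
  }

  // Example filter that lower-cases every token.
  object LowerCaseFilter extends SimpleWordFilter {
    def normalize(token: String): String = token.toLowerCase
  }

  // Whitespace splitting replaces the Stanford NLP tokenizer in this sketch.
  def tokenize(lines: Seq[String], wordFilter: SimpleWordFilter): Seq[Seq[String]] =
    lines.map { line =>
      line.split("\\s+").toSeq.filter(_.nonEmpty).map(wordFilter.normalize)
    }
}
```

    A real tokenizer also separates punctuation and handles clitics, which a whitespace split does not.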

  35. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
