Object

io.github.edouardfouche.preprocess

Preprocess

object Preprocess extends Preprocessing

Encapsulates a few preprocessing steps: opening a CSV or ARFF file and computing the rank index structures.

Linear Supertypes
Preprocessing, AnyRef, Any

Type Members

  1. type PreprocessedData = Array[T]

    Definition Classes
    Preprocessing
  2. abstract type T

    Definition Classes
    Preprocessing

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  10. def getColumnNames(path: String, header: Int = 1, separator: String = ","): Array[String]

    Get the column names of a data set. Assumes the names are placed on the first line and separated by a comma.

    path

    Path of the file in the system.

    header

    Number of lines to discard (header), by default 1.

    separator

    Separator used, by default a comma.

    returns

    An array of strings, where each string is a column name. Names are in the original order.

    Note

    This is a quick-and-dirty implementation.

  11. def getColumnNamesMap(path: String, header: Int = 1, separator: String = ","): Map[Int, String]

    Get the column names of a data set as a map from the position index (integer) to the corresponding name (string).

    path

    Path of the file in the system.

    header

    Number of lines to discard (header), by default 1.

    separator

    Separator used, by default a comma.

    returns

    A map from each column's position index to its name.

    Note

    This is a quick-and-dirty implementation.
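
The index-to-name mapping described above can be sketched as follows. This is only an illustration, not the library's code: the object `ColumnNamesSketch`, the helper `columnNamesMap`, and the choice of zero-based positions are all assumptions, and the sketch works on a header line already read from the file.

```scala
import java.util.regex.Pattern

object ColumnNamesSketch {
  // Hypothetical helper: split a header line on the separator and pair
  // each column name with its (assumed zero-based) position.
  def columnNamesMap(headerLine: String, separator: String = ","): Map[Int, String] =
    headerLine
      .split(Pattern.quote(separator)) // quote so that "|" or "." are taken literally
      .map(_.trim)
      .zipWithIndex
      .map { case (name, idx) => idx -> name }
      .toMap
}
```

For example, `ColumnNamesSketch.columnNamesMap("id,x,y,class")` maps position 0 to `"id"` and position 3 to `"class"`.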

  12. def getLabels(path: String, header: Int = 1, separator: String = ",", excludeIndex: Boolean = false): Array[Boolean]

    Get the last column of a data file, assuming that it is the class column and that it is numerical (binary).

    path

    Path of the file in the system.

    header

    Number of lines to discard (header), by default 1.

    separator

    Separator used, by default a comma.

    excludeIndex

    Whether to exclude an index (the first column) or not.

    returns

    The "class" column, as an Array of Boolean.

    Note

    This is quick and dirty: the file is opened normally with the class column kept, and only the last column is retained.

  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. def ksRank(input: Array[Array[Double]], parallelize: Int = 0): Array[Array[(Int, Float)]]

    Return the rank index structure (as in HiCS).

    Note that, in the case of ties, the numbers might differ from those of other implementations.

    input

    A 2-D Array of Double (data set).

    returns

    A 2-D Array of pairs, where the first element is the original index and the second is its value (not actually used for the KSP test).

  16. def ksRankSimple(input: Array[Array[Double]], parallelize: Int = 0): Array[Array[Int]]

    Return the rank index structure (as in HiCS).

    Note that, in the case of ties, the numbers might differ from those of other implementations.

    input

    A 2-D Array of Double (data set, column-oriented).

    returns

    A 2-D Array of Int, where each element is the original index in the unsorted data set.
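
To make the idea of a rank index concrete, here is a minimal sketch, assuming the structure is simply, for each column, the list of row indices ordered by ascending value. `RankIndexSketch` and `rankIndex` are hypothetical names, and resolving ties by original position (stable sort) is an assumption; as noted above, other implementations may order ties differently.

```scala
object RankIndexSketch {
  // For each column, return the original row indices sorted by ascending value.
  // sortWith is stable, so tied values keep their original relative order.
  def rankIndex(columns: Array[Array[Double]]): Array[Array[Int]] =
    columns.map { col =>
      col.indices.sortWith((a, b) => col(a) < col(b)).toArray
    }
}
```

With a single column `Array(3.0, 1.0, 2.0)`, the smallest value sits at row 1, the next at row 2, and the largest at row 0, so the sketch yields `Array(1, 2, 0)` for that column.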

  17. def mwRank(input: Array[Array[Double]], parallelize: Int): Array[Array[(Int, Float)]]

    Return the rank index structure for MWP, with adjusted ranks but no correction for ties.

    input

    A 2-D Array of Double (data set, column-oriented).

    returns

    A 2-D Array of pairs, where the first element is the original index and the second is its rank.
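
One common reading of "adjusted ranks" is that tied values share the average of the positions they occupy in sorted order. The sketch below illustrates that reading for a single column. It is an assumption, not the library's implementation: `AdjustedRankSketch`, `adjustedRanks`, and the zero-based rank convention are hypothetical.

```scala
object AdjustedRankSketch {
  // Returns (originalIndex, rank) pairs in ascending order of value.
  // Assumption: ranks are zero-based and tied values share the mean position.
  def adjustedRanks(col: Array[Double]): Array[(Int, Float)] = {
    val order = col.indices.sortWith((a, b) => col(a) < col(b)).toArray
    val ranks = new Array[Float](order.length)
    var start = 0
    while (start < order.length) {
      // Extend the run [start, end] over all values equal to the one at start.
      var end = start
      while (end + 1 < order.length && col(order(end + 1)) == col(order(start))) end += 1
      val shared = (start + end) / 2.0f // mean of the positions in the run
      var k = start
      while (k <= end) { ranks(k) = shared; k += 1 }
      start = end + 1
    }
    order.zip(ranks)
  }
}
```

For the column `Array(10.0, 20.0, 10.0, 30.0)`, rows 0 and 2 are tied at positions 0 and 1 and both receive rank 0.5, while rows 1 and 3 receive ranks 2.0 and 3.0.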

  18. def mwRankCorrectionCumulative(input: Array[Array[Double]], parallelize: Int): Array[Array[(Int, Float, Double)]]

    Return the rank index structure for MWP, with adjusted ranks and correction for ties.

    input

    A 2-D Array of Double (data set, column-oriented).

    returns

    A 2-D Array of triples, where the first element is the original index, the second is its rank, and the last is a cumulative correction for ties.
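
The source does not spell out how the cumulative correction is computed. As a hedged illustration only, the sketch below assumes the textbook Mann-Whitney tie term, t^3 - t for each group of t tied values, accumulated along the sorted order. `TieCorrectionSketch`, `ranksWithCorrection`, the zero-based ranks, and the choice of term are all assumptions and may differ from what the library does.

```scala
object TieCorrectionSketch {
  // Returns (originalIndex, rank, cumulativeCorrection) in ascending order of value.
  // Assumption: each tie group of size t adds t^3 - t to a running total.
  def ranksWithCorrection(col: Array[Double]): Array[(Int, Float, Double)] = {
    val order = col.indices.sortWith((a, b) => col(a) < col(b)).toArray
    val out = new Array[(Int, Float, Double)](order.length)
    var cumulative = 0.0
    var start = 0
    while (start < order.length) {
      // Find the end of the run of values equal to the one at start.
      var end = start
      while (end + 1 < order.length && col(order(end + 1)) == col(order(start))) end += 1
      val t = (end - start + 1).toDouble
      cumulative += t * t * t - t       // zero when the value is unique
      val shared = (start + end) / 2.0f // tied values share the mean position
      var k = start
      while (k <= end) { out(k) = (order(k), shared, cumulative); k += 1 }
      start = end + 1
    }
    out
  }
}
```

Storing the running total next to each rank means that, under this assumption, the correction for any contiguous slice of the sorted data can be approximated by subtracting two stored values instead of rescanning the slice.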

  19. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  20. final def notify(): Unit

    Definition Classes
    AnyRef
  21. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  22. def open(path: String, header: Int = 1, separator: String = ",", excludeIndex: Boolean = false, dropClass: Boolean = true, sample1000: Boolean = false): Array[Array[Double]]

    Helper function that dispatches to openArff if an ARFF file is given, and to openCSV otherwise.

    returns

    A data set (row-oriented).
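
The opening functions are documented as returning row-oriented data, whereas several of the ranking functions are documented as taking column-oriented input. Converting between the two is a plain transpose, as in this small sketch (`OrientationSketch` and `toColumns` are hypothetical names, not part of the library):

```scala
object OrientationSketch {
  // Turn a row-oriented data set (one inner array per row) into a
  // column-oriented one (one inner array per column).
  def toColumns(rows: Array[Array[Double]]): Array[Array[Double]] =
    if (rows.isEmpty) Array.empty[Array[Double]] else rows.transpose
}
```

The sketch assumes all rows have the same length, which `transpose` requires.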

  23. def openArff(path: String, dropClass: Boolean = true, max1000: Boolean = false): Array[Array[Double]]

    Open an ARFF file as a 2-D Array of Double.

    path

    Path to the file in the current filesystem.

    dropClass

    Whether to drop the "class" column if there is one.

    max1000

    Cap the opened data to 1000 rows. If the original data has more rows, sample 1000 without replacement.

    returns

    A 2-D Array of Double containing the values of each numerical column (row-oriented).

    Note

    This method is inspired by the work of Fabian Keller.
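
The `max1000` behaviour (cap at 1000 rows, sampling without replacement) can be illustrated with a shuffle followed by a truncation. This is a sketch under assumptions: `SampleCapSketch`, `capRows`, and the `rng` parameter are hypothetical, and the sketch does not preserve the original row order of the sampled rows, which the library may or may not do.

```scala
import scala.util.Random

object SampleCapSketch {
  // Keep the data as-is when it has at most `cap` rows; otherwise draw
  // `cap` distinct rows by shuffling and taking a prefix.
  def capRows[A](rows: Vector[A], cap: Int = 1000, rng: Random = new Random()): Vector[A] =
    if (rows.length <= cap) rows else rng.shuffle(rows).take(cap)
}
```

Because each row appears exactly once in the shuffled sequence, taking a prefix can never select the same row twice, which is what "without replacement" requires.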

  24. def openCSV(path: String, header: Int = 1, separator: String = ",", excludeIndex: Boolean = false, dropClass: Boolean = true, max1000: Boolean = false): Array[Array[Double]]

    Open a CSV file at a specified path. Currently handles only numerical values.

    path

    Path of the file in the system.

    header

    Number of lines to discard (header), by default 1.

    separator

    Separator used, by default a comma.

    excludeIndex

    Whether to exclude an index (the first column) or not.

    dropClass

    Whether to drop the "class" column if there is one (assumes it is the last one).

    max1000

    Cap the opened data to 1000 rows. If the original data has more rows, sample 1000 without replacement.

    returns

    A 2-D Array of Double containing the values from the CSV (row-oriented).
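
How the `header`, `separator`, `excludeIndex` and `dropClass` parameters interact can be sketched on in-memory lines. The code below is an illustration, not the library's implementation: `CsvSketch` and `parseLines` are hypothetical, the sketch always treats the last column as the class when `dropClass` is set, and it omits the `max1000` sampling.

```scala
import java.util.regex.Pattern

object CsvSketch {
  // Parse numeric CSV lines into a row-oriented Array[Array[Double]].
  def parseLines(
      lines: Seq[String],
      header: Int = 1,
      separator: String = ",",
      excludeIndex: Boolean = false,
      dropClass: Boolean = true
  ): Array[Array[Double]] =
    lines
      .drop(header)            // discard the header line(s)
      .filter(_.trim.nonEmpty) // skip blank lines
      .map { line =>
        val cells = line.split(Pattern.quote(separator)).map(_.trim)
        val from = if (excludeIndex) 1 else 0                         // skip the index column
        val until = if (dropClass) cells.length - 1 else cells.length // skip the class column
        cells.slice(from, until).map(_.toDouble) // throws NumberFormatException on non-numeric cells
      }
      .toArray
}
```

For instance, given the lines `"id,x,y,class"`, `"0,1.5,2.5,1"` and `"1,3.0,4.0,0"` with `excludeIndex = true`, the sketch keeps only the `x` and `y` values of each row.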

  25. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  26. def toString(): String

    Definition Classes
    AnyRef → Any
  27. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
