Get the column names of a data set. Assumes the names are on the first line and separated by commas.
Path of the file in the system.
Number of lines to discard (header), by default 1.
An array of strings, where each string is a column name. Names are in the original order.
Note: this is a quick-and-dirty approach. The file is opened normally, keeping the class, and only the last column is kept.
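As a rough illustration of the behavior described above, here is a minimal Python sketch. It is not the library's implementation: the function name `get_column_names` is hypothetical, and the sketch assumes the names are on the first line and separated by commas, as stated in the description.

```python
def get_column_names(path):
    """Return the column names found on the first line of a comma-separated file."""
    with open(path, encoding="utf-8") as f:
        first_line = f.readline().rstrip("\r\n")
    # Names keep their original order; surrounding whitespace is stripped.
    return [name.strip() for name in first_line.split(",")]
```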
Get the column names of a data set as a map that pairs each position index (integer) with the corresponding name (string).
Path of the file in the system.
Number of lines to discard (header), by default 1.
A map pairing each position index (integer) with its column name (string).
Note: this is a quick-and-dirty approach. The file is opened normally, keeping the class, and only the last column is kept.
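A minimal Python sketch of the map variant follows. The function name `get_column_names_map` is hypothetical, and the direction of the mapping (position index as key, name as value) is an assumption; the description does not state which side is the key.

```python
def get_column_names_map(path):
    """Return a dict pairing each 0-based position index with its column name."""
    with open(path, encoding="utf-8") as f:
        first_line = f.readline().rstrip("\r\n")
    # Assumption: index -> name; swap key and value if the opposite is needed.
    return {i: name.strip() for i, name in enumerate(first_line.split(","))}
```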
Get the last column of a data file, assuming it is the class and that it is numerical (possibly binary).
Path of the file in the system.
Number of lines to discard (header), by default 1.
Whether to exclude an index (the first column) or not.
The "class" column, as an Array of Double.
Note: this is a quick-and-dirty approach. The file is opened normally, keeping the class, and only the last column is kept.
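The idea can be sketched in Python as follows. This is an illustration only: the name `get_last_column` is hypothetical, a comma separator is assumed, and the exclude-index flag is left out because dropping the first column does not change which column is last.

```python
def get_last_column(path, header=1):
    """Return the last column of a comma-separated file as a list of floats."""
    labels = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f):
            # Skip the header lines and any blank line.
            if line_number < header or not line.strip():
                continue
            # Assumption: the class is the last field and is numerical.
            labels.append(float(line.rstrip("\r\n").split(",")[-1]))
    return labels
```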
Return the rank index structure (as in HiCS).
Note that the numbers might be different in the case of ties, in comparison with other implementations.
A 2-D Array of Double (data set).
A 2-D Array of pairs (2-tuples), where the first element is the original index and the second is its value (the value is not actually used by the KSP test).
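To make the returned structure concrete, here is a small Python sketch. It is not the library's code: the name `rank_index_with_values` is hypothetical, and the input is assumed to be column-oriented (one inner list per column). Ties are resolved by original position here, which may differ from other implementations.

```python
def rank_index_with_values(columns):
    """For each column, return (original_index, value) pairs sorted by value."""
    return [
        # sorted() is stable, so tied values keep their original relative order.
        sorted(enumerate(col), key=lambda pair: pair[1])
        for col in columns
    ]
```

For a single column `[3.0, 1.0, 2.0]`, the sketch yields `[(1, 1.0), (2, 2.0), (0, 3.0)]`: the smallest value was originally at index 1, the next at index 2, and the largest at index 0.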
Return the rank index structure (as in HiCS).
Note that the numbers might be different in the case of ties, in comparison with other implementations.
A 2-D Array of Double (data set, column-oriented).
A 2-D Array of Int, where the element is the original index in the unsorted data set
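A possible Python sketch of this integer-only variant is shown below. The name `rank_index` is hypothetical and the code is an illustration under the stated column-oriented assumption, not the library's implementation.

```python
def rank_index(columns):
    """For each column, return the original indices ordered by increasing value."""
    return [
        # Equivalent to an "argsort": position k holds the original index
        # of the k-th smallest value. Ties keep their original order.
        sorted(range(len(col)), key=col.__getitem__)
        for col in columns
    ]
```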
Return the rank index structure for MWP, with adjusted ranks but no correction for ties.
A 2-D Array of Double (data set, column-oriented).
A 2-D Array of pairs (2-tuples), where the first element is the original index and the second is its rank.
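The description does not define "adjusted ranks". The Python sketch below assumes it means that tied values share the average of the 0-based positions they occupy; both this interpretation and the name `adjusted_rank_index` are assumptions made for illustration.

```python
def adjusted_rank_index(columns):
    """For each column, return (original_index, rank) pairs sorted by value.

    Assumption: tied values share the mean of the 0-based positions they span.
    """
    result = []
    for col in columns:
        order = sorted(range(len(col)), key=col.__getitem__)
        ranked = []
        start = 0
        while start < len(order):
            # Extend `end` to cover the whole run of equal values.
            end = start
            while end + 1 < len(order) and col[order[end + 1]] == col[order[start]]:
                end += 1
            shared_rank = (start + end) / 2.0
            ranked.extend((order[k], shared_rank) for k in range(start, end + 1))
            start = end + 1
        result.append(ranked)
    return result
```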
Return the rank index structure for MWP, with adjusted ranks AND correction for ties.
A 2-D Array of Double (data set, column-oriented).
A 2-D Array of triples (3-tuples), where the first element is the original index, the second is its rank, and the last is a cumulative correction for ties.
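The exact form of the cumulative correction is not given in the description. The Python sketch below assumes the usual Mann-Whitney tie term, t^3 - t for a group of t tied values, accumulated along the sorted order, and average 0-based ranks for ties. These choices and the name `corrected_rank_index` are assumptions, not the library's definition.

```python
def corrected_rank_index(columns):
    """For each column, return (original_index, rank, cumulative_correction)."""
    result = []
    for col in columns:
        order = sorted(range(len(col)), key=col.__getitem__)
        ranked = []
        cumulative = 0.0
        start = 0
        while start < len(order):
            # Find the run of equal values starting at `start`.
            end = start
            while end + 1 < len(order) and col[order[end + 1]] == col[order[start]]:
                end += 1
            t = end - start + 1
            # Assumed tie term: t^3 - t (zero when the value is unique).
            cumulative += t ** 3 - t
            shared_rank = (start + end) / 2.0
            ranked.extend(
                (order[k], shared_rank, cumulative) for k in range(start, end + 1)
            )
            start = end + 1
        result.append(ranked)
    return result
```

Storing the correction cumulatively means the tie term of any slice of the sorted order can be obtained by subtracting two entries, provided the slice does not cut through a tie group.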
Helper function that redirects to openArff if an ARFF file is given, and to openCSV otherwise.
A data set (row oriented)
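The dispatch can be sketched in Python as below. How the library detects an ARFF file is not stated; the sketch assumes a check on the `.arff` file extension. The name `open_data` is hypothetical, and the two openers are passed in as arguments only to keep the example self-contained.

```python
def open_data(path, open_arff, open_csv):
    """Call open_arff for .arff files and open_csv for everything else."""
    # Assumption: the file type is decided from the extension alone.
    opener = open_arff if path.lower().endswith(".arff") else open_csv
    return opener(path)
```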
Open an Arff file as a 2-D Array of Double
Path to the file in the current filesystem
Whether to drop the "class" column if there is one
Cap the opened data to 1000 rows. If the original data has more rows, sample 1000 of them without replacement.
A 2-D Array of Double containing the values of each numerical column (row-oriented).
This method is inspired by the work of Fabian Keller.
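A simplified Python reader along these lines is sketched below. It is an illustration, not the library's method: the name `open_arff`, its parameter names and defaults, and the `seed` argument are hypothetical. It assumes dense ARFF data, attribute names without spaces, no missing values, and a class attribute literally named "class".

```python
import random

def open_arff(path, drop_class=False, max1000=False, seed=None):
    """Read the numerical attributes of a dense ARFF file as row-oriented floats."""
    names, numeric, rows = [], [], []
    in_data = False
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("%"):  # blank line or ARFF comment
                continue
            if not in_data:
                lowered = line.lower()
                if lowered.startswith("@attribute"):
                    parts = line.split(None, 2)  # "@attribute", name, type
                    names.append(parts[1])
                    kind = parts[2].strip().lower()
                    numeric.append(kind in ("numeric", "real", "integer"))
                elif lowered.startswith("@data"):
                    in_data = True
                continue
            rows.append(line.split(","))
    # Keep numerical attributes only, optionally without the class.
    keep = [i for i, is_num in enumerate(numeric) if is_num]
    if drop_class:
        keep = [i for i in keep if names[i].lower() != "class"]
    data = [[float(row[i]) for i in keep] for row in rows]
    if max1000 and len(data) > 1000:
        # random.sample draws without replacement.
        data = random.Random(seed).sample(data, 1000)
    return data
```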
Open a CSV file at a specified path. Currently, only handles numerical values.
Path of the file in the system.
Number of lines to discard (header), by default 1.
Separator used, by default a comma.
Whether to exclude an index (the first column) or not.
Whether to drop the "class" column if there is one. (assumes it is the last one)
Cap the opened data to 1000 rows. If the original data has more rows, sample 1000 of them without replacement.
A 2-D Array of Double containing the values from the csv. (row-oriented)
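The parameters above can be illustrated with the following Python sketch. It is a sketch under stated assumptions rather than the library's code: the name `open_csv`, the parameter names, the defaults of the boolean flags, and the `seed` argument are hypothetical. Only the header default (1) and the separator default (comma) come from the description.

```python
import random

def open_csv(path, header=1, separator=",", exclude_index=False,
             drop_class=False, max1000=False, seed=None):
    """Read a numerical CSV file as a row-oriented list of lists of floats."""
    data = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f):
            # Discard the header lines and any blank line.
            if line_number < header or not line.strip():
                continue
            fields = line.rstrip("\r\n").split(separator)
            if exclude_index:
                fields = fields[1:]   # drop the first column (index)
            if drop_class:
                fields = fields[:-1]  # drop the last column (assumed class)
            data.append([float(x) for x in fields])
    if max1000 and len(data) > 1000:
        # random.sample draws without replacement.
        data = random.Random(seed).sample(data, 1000)
    return data
```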
Encapsulate a few preprocessing steps (open a CSV file, compute the rank index structure).
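Since the file openers return row-oriented data while the rank index functions take column-oriented data, the glue between the two steps presumably involves a transposition. The Python sketch below shows that step on already-loaded rows; the name `preprocess` and the transposition itself are assumptions, not taken from the description.

```python
def preprocess(rows):
    """Transpose row-oriented data and return the per-column rank index."""
    # zip(*rows) turns a list of rows into a sequence of columns.
    columns = [list(col) for col in zip(*rows)]
    # For each column, the original indices ordered by increasing value.
    return [sorted(range(len(col)), key=col.__getitem__) for col in columns]
```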