Function used to prepare the dataset for modeling, e.g. to balance the data or drop rows based on their labels.
The first column must be the label, stored as a double.
Training set and test set.
Fraction of the data to reserve for the test set. Default is 0.1.
Seed for data splitting
Function used to create the training set and test set.
(dataTrain, dataTest)
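As a rough illustration of how a reserved test fraction and a seed can produce a reproducible (dataTrain, dataTest) pair, here is a minimal Python sketch. It is not the library's implementation: the function name `split_train_test`, the parameter names, and the default seed of 42 are hypothetical, and rows are assumed to be plain tuples whose first element is the label as a float.

```python
import random


def split_train_test(rows, reserve_test_fraction=0.1, seed=42):
    """Split rows into (data_train, data_test).

    Hypothetical helper: each row goes to the test set with probability
    reserve_test_fraction, so the test set size is approximate, not exact.
    """
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    data_train, data_test = [], []
    for row in rows:
        if rng.random() < reserve_test_fraction:
            data_test.append(row)
        else:
            data_train.append(row)
    return data_train, data_test


# Rows are (label, feature) tuples; the label comes first, as a float.
rows = [(float(i % 3), i) for i in range(1000)]
data_train, data_test = split_train_test(rows, reserve_test_fraction=0.1, seed=7)
```

Calling the function again with the same seed returns the same split, which is the point of exposing the seed as a parameter.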
Creates an instance that makes a holdout set and prepares the data for multiclass modeling. The instance splits the data into a training set and a test set, filtering out any labels that do not meet the minimum fraction cutoff or do not fall within the top N labels specified.
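The label filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `labels_to_keep`, `filter_rows`, `min_label_fraction`, and `max_label_categories`, and their default values, are hypothetical and are not taken from the library. A label is kept only if it is among the N most frequent labels and its share of the data is at least the minimum fraction.

```python
from collections import Counter


def labels_to_keep(labels, min_label_fraction=0.0, max_label_categories=100):
    """Return the set of labels that pass both cutoffs (hypothetical helper)."""
    counts = Counter(labels)
    total = len(labels)
    # Take the N most frequent labels first, then apply the fraction cutoff.
    top = [label for label, _ in counts.most_common(max_label_categories)]
    return {label for label in top if counts[label] / total >= min_label_fraction}


def filter_rows(rows, keep):
    """Drop rows whose label (first element) is not in the keep set."""
    return [row for row in rows if row[0] in keep]


# Labels 0.0, 1.0 and 2.0 make up 60%, 30% and 10% of the rows.
rows = [(0.0, "a")] * 6 + [(1.0, "b")] * 3 + [(2.0, "c")] * 1
keep = labels_to_keep([row[0] for row in rows],
                      min_label_fraction=0.2, max_label_categories=2)
filtered = filter_rows(rows, keep)
```

With these inputs the label 2.0 is dropped, since it falls outside the top two labels and below the 20% cutoff.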