NN: Activation
Options: Activation.activation --Tanh --TanhWithDropout --Rectifier --RectifierWithDropout --Maxout --MaxoutWithDropout --ExpRectifier --ExpRectifierWithDropout
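For concreteness, a minimal sketch of setting the activation through H2O's Python API; the dataset is H2O's public iris demo file, and every other value is illustrative rather than prescriptive:

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()  # start or connect to a local H2O cluster
# Public demo dataset used throughout the H2O documentation.
train = h2o.import_file(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_wheader.csv")

# The *WithDropout activations add dropout on the hidden layers;
# hidden_dropout_ratios then sets the per-layer dropout rate.
dl = H2ODeepLearningEstimator(activation="RectifierWithDropout",
                              hidden=[64, 64],
                              hidden_dropout_ratios=[0.2, 0.2],
                              epochs=10)
dl.train(x=train.columns[:-1], y="class", training_frame=train)
```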
NN: Adaptive Rate
Options: adaptiveRate --true (ADADELTA adaptive learning rate; the default) --false (manually tuned rate with annealing; see the rate-annealing sketch below)
XGB: Backend
Options: Backend.backend --auto --gpu --cpu
XGB: Booster
Options: Booster.booster --gbtree --gblinear --dart
XGB: DART Normalize Type
Options: DartNormalizeType.dartNormalizeType --tree --forest
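Backend, booster, and the DART normalization surface in H2O's Python API as the backend, booster, and normalize_type parameters of H2OXGBoostEstimator; a hedged sketch with illustrative values (training call omitted; see the activation example above for the full pattern):

```python
from h2o.estimators.xgboost import H2OXGBoostEstimator

# DART randomly drops trees during training; normalize_type controls how
# the dropped trees are weighted ("tree" or "forest").
xgb = H2OXGBoostEstimator(backend="cpu",       # or "gpu" / "auto"
                          booster="dart",      # or "gbtree" / "gblinear"
                          normalize_type="tree",
                          ntrees=50)
# xgb.train(x=..., y=..., training_frame=...)  # frames as in the first sketch
```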
GLM: Family
Options: Family.family
--Binary Classification: binomial
--Multi-class Classification: multinomial
--Regression: gaussian, quasibinomial, ordinal, poisson, gamma, tweedie
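The family is chosen to match the response column; a minimal sketch (predictor and response names are placeholders):

```python
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

# binomial for a two-class response; switch to "multinomial", "gaussian",
# "poisson", etc. to match the response type.
glm = H2OGeneralizedLinearEstimator(family="binomial")
# glm.train(x=predictors, y="label", training_frame=train)  # assumed names
```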
SVM: Gradient
Options: --Hinge --LeastSquares --Logistic
XGB: Grow Policy
Options: GrowPolicy.growPolicy --depthwise --lossguide
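In the Python API this is the grow_policy parameter; a one-line sketch (max_leaves is just an illustrative companion setting):

```python
from h2o.estimators.xgboost import H2OXGBoostEstimator

# "lossguide" grows the leaves with the largest loss reduction first
# (LightGBM-style); "depthwise" grows the tree level by level.
xgb = H2OXGBoostEstimator(grow_policy="lossguide", max_leaves=64)
```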
GLRM: Initialization
Options: GlrmInitialization.initialization --PlusPlus --Random --SVD --Power
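A sketch of selecting the GLRM initialization in Python; k is illustrative, and GLRM is unsupervised, so train takes only a frame:

```python
from h2o.estimators.glrm import H2OGeneralizedLowRankEstimator

# SVD initialization is deterministic; PlusPlus seeds the factorization
# with k-means++ centers.
glrm = H2OGeneralizedLowRankEstimator(k=3, init="SVD")
# glrm.train(training_frame=train)  # `train` = an assumed H2OFrame
```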
KMM: Initialization
Options: KMeansInitialization.initialization --Random --PlusPlus --Furthest --User
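The analogous sketch for k-means (k and the frame name are illustrative):

```python
from h2o.estimators.kmeans import H2OKMeansEstimator

# PlusPlus spreads the initial centroids out (k-means++); Furthest picks
# each new centroid as far as possible from the ones chosen so far.
km = H2OKMeansEstimator(k=5, init="PlusPlus")
# km.train(training_frame=train)  # unsupervised: no response column
```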
NN: Initialization
Options: InitialWeightDistribution.initialWeightDistribution
--UniformAdaptive: optimized initialization based on the size of the network
--Uniform: zero mean with a parameterized interval (-initialWeightScale, initialWeightScale)
--Normal: zero mean with a parameterized standard deviation N(0, initialWeightScale^2)
NN: Loss
Options: Loss.loss
--Automatic
--Classification: Quadratic, ModifiedHuber, CrossEntropy
--Regression: Absolute, Quadratic, Huber, Quantile
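The initialization and loss entries combine naturally in one estimator; a sketch assuming a regression response (all values illustrative):

```python
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(
    # Normal: weights drawn from N(0, initial_weight_scale^2).
    initial_weight_distribution="Normal",
    initial_weight_scale=0.01,
    # Huber is robust to outliers; valid for regression responses only.
    loss="Huber",
    hidden=[32, 32])
```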
GLRM: Loss
Options: GlrmLoss.glrmLoss
--Numeric features: Quadratic, Absolute, Huber, Poisson, Periodic
--Binary features: Logistic, Hinge
--Multinomial features: Categorical, Ordinal
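In the Python API the numeric and the categorical losses are set separately, via loss and multi_loss; a minimal sketch:

```python
from h2o.estimators.glrm import H2OGeneralizedLowRankEstimator

# `loss` applies to numeric columns, `multi_loss` to categorical ones.
glrm = H2OGeneralizedLowRankEstimator(k=4,
                                      loss="Huber",
                                      multi_loss="Categorical")
```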
W2V: NormModel
Options: NormModel.normModel --HSM (hierarchical softmax)
NN: Learning Rate Annealing
Only active if adaptiveRate is disabled. The annealing rate is the inverse of the number of training samples it takes to halve the learning rate.
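A sketch of the manual-rate setup this entry describes; the parameter values are illustrative:

```python
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

# Annealing only applies once the adaptive rate (ADADELTA) is off.
# Effective rate = rate / (1 + rate_annealing * samples_seen), so with
# rate_annealing=1e-6 the learning rate halves after ~1e6 samples.
dl = H2ODeepLearningEstimator(adaptive_rate=False,
                              rate=0.01,
                              rate_annealing=1e-6)
```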
GLRM: Regularizer on X and Y
Options: GlrmRegularizer.glrmRegularizer
--None
--Quadratic
--L2
--L1
--NonNegative
--OneSparse
--UnitOneSparse
--Simplex
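X and Y take independent regularizers, with strengths gamma_x and gamma_y; a hedged sketch with illustrative values:

```python
from h2o.estimators.glrm import H2OGeneralizedLowRankEstimator

# L1 sparsifies the row embeddings (X); NonNegative constrains the
# archetypes (Y) to nonnegative values.
glrm = H2OGeneralizedLowRankEstimator(k=4,
                                      regularization_x="L1",
                                      regularization_y="NonNegative",
                                      gamma_x=0.1, gamma_y=0.1)
```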
NN: Score Validation Sampling Method
Options: ClassSamplingMethod.classSamplingMethod --Uniform --Stratified (illustrated in the Deep Learning sketch after the training-samples entry below)
GLM: Solver
Options: Solver.solver
--IRLSM: Iteratively Reweighted Least Squares Method
--L_BFGS: Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm
--COORDINATE_DESCENT: Coordinate Descent
--COORDINATE_DESCENT_NAIVE: Coordinate Descent Naive
--AUTO: Sets the solver based on the given data and parameters (default)
--GRADIENT_DESCENT_LH: Gradient Descent Likelihood (available for Ordinal family only; default for Ordinal family)
--GRADIENT_DESCENT_SQERR: Gradient Descent Squared Error (available for Ordinal family only)
Guidelines:
--L_BFGS works much better for L2-only multinomial models and when there are too many active predictors.
--You must use IRLSM if you want p-values.
--IRLSM and COORDINATE_DESCENT share the same path (i.e., they both compute the same gram matrix); they just solve it differently.
--Use COORDINATE_DESCENT if you have fewer than 5000 predictors and an L1 penalty.
--COORDINATE_DESCENT performs better when lambda_search is enabled; with bounds, it also tends to reach higher accuracy.
--Use GRADIENT_DESCENT_LH or GRADIENT_DESCENT_SQERR when family=ordinal. With GRADIENT_DESCENT_LH, the model parameters are adjusted by maximizing the likelihood; with GRADIENT_DESCENT_SQERR, they are adjusted by minimizing the squared error.
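Two sketches of the guidelines above: IRLSM with regularization disabled for p-values, and coordinate descent paired with lambda search (all values illustrative):

```python
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

# p-values require IRLSM with no regularization (lambda = 0).
glm_pvals = H2OGeneralizedLinearEstimator(family="binomial",
                                          solver="IRLSM",
                                          lambda_=0,
                                          compute_p_values=True)

# COORDINATE_DESCENT pairs well with lambda_search and an L1 penalty.
glm_l1 = H2OGeneralizedLinearEstimator(family="binomial",
                                       solver="COORDINATE_DESCENT",
                                       lambda_search=True)
```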
Stopping Metric
Options: {AUTO, deviance, logloss, MSE, RMSE, MAE, RMSLE, AUC, lift_top_group, misclassification, mean_per_class_error, custom, r2}
mean_per_class_error corresponds to average recall: it equals 1 - the mean per-class recall.
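These stopping parameters are shared across H2O's supervised estimators; a GBM sketch with illustrative values:

```python
from h2o.estimators.gbm import H2OGradientBoostingEstimator

# Stop when the moving average of mean_per_class_error over 3 scoring
# rounds fails to improve by at least 0.001.
gbm = H2OGradientBoostingEstimator(stopping_metric="mean_per_class_error",
                                   stopping_rounds=3,
                                   stopping_tolerance=1e-3,
                                   ntrees=500)
```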
NN: Number of Training Samples per Iteration (if using N nodes, each node gets 1/N of the samples)
Options:
-- 0: one epoch per iteration
-- -1: the maximum amount of data per iteration (if **replicate training data** is enabled, N epochs are trained per iteration on N nodes; otherwise one epoch)
-- -2: automatic mode (auto-tuning)
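A sketch combining this entry with the score-validation sampling entry above (values illustrative):

```python
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(
    # -2 lets H2O auto-tune the samples per iteration for the cluster.
    train_samples_per_iteration=-2,
    # Stratified sampling of the validation frame during scoring, per the
    # score-validation-sampling entry above.
    score_validation_sampling="Stratified",
    score_validation_samples=10000)
```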
XGB: Tree Method
Options: TreeMethod.treeMethod --auto --exact --approx --hist
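In the Python API this is the tree_method parameter; a one-line sketch:

```python
from h2o.estimators.xgboost import H2OXGBoostEstimator

# "hist" buckets features into histograms for faster training on large
# data; "exact" enumerates every split candidate.
xgb = H2OXGBoostEstimator(tree_method="hist", ntrees=100)
```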
SVM: Updater
Options: --L1 --L2 --Simple
W2V: WordModel
Options: WordModel.wordModel --SkipGram
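A sketch of the two word2vec options together; `tokens` is a hypothetical H2OFrame of tokenized text:

```python
from h2o.estimators.word2vec import H2OWord2vecEstimator

# H2O's word2vec trains skip-gram embeddings with hierarchical softmax.
w2v = H2OWord2vecEstimator(word_model="SkipGram",
                           norm_model="HSM",
                           vec_size=100,
                           epochs=5)
# w2v.train(training_frame=tokens)  # `tokens` = assumed tokenized frame
```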
Finally, assign values to those hyperparameters; the grid-search sketch below shows one way to wire the option values together.
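A closing sketch of feeding the option values into a grid search; the frame and column names are placeholders:

```python
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
from h2o.grid import H2OGridSearch

# Candidate values drawn from the option lists above.
hyper_params = {
    "activation": ["Rectifier", "Tanh", "MaxoutWithDropout"],
    "loss": ["Automatic", "CrossEntropy"],
    "hidden": [[32, 32], [64, 64]],
}
grid = H2OGridSearch(model=H2ODeepLearningEstimator(epochs=10),
                     hyper_params=hyper_params)
# grid.train(x=predictors, y="label", training_frame=train)  # assumed names
# grid.get_grid(sort_by="logloss")  # ranked models after training
```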