Packages

  • package root

    Smile (Statistical Machine Intelligence and Learning Engine) is a fast and comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala. With advanced data structures and algorithms, Smile delivers state-of-the-art performance.

    Smile covers every aspect of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, efficient nearest neighbor search, etc.

    Definition Classes
    root
  • package smile
    Definition Classes
    root
  • package data

    Data manipulation functions.

    Definition Classes
    smile
  • package math

    Mathematical and statistical functions.

    Definition Classes
    smile
  • package plot
    Definition Classes
    smile
  • package regression

    Regression analysis includes any techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables. Therefore, the estimation target is a function of the independent variables, called the regression function. Regression analysis is widely used for prediction and forecasting.

    Definition Classes
    smile
  • package util

    Utility functions.

    Definition Classes
    smile
  • package validation

    Model validation.

    Definition Classes
    smile
  • bootstrap
  • cv
  • loocv

package validation

Model validation.

Linear Supertypes
AnyRef, Any

Value Members

  1. def accuracy(truth: Array[Int], prediction: Array[Int]): Double

    The accuracy is the proportion of true results (both true positives and true negatives) in the population.
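
    For example, a minimal sketch with made-up label arrays (the data is hypothetical, for illustration only):

      import smile.validation._

      val truth      = Array(0, 1, 1, 0, 1)
      val prediction = Array(0, 1, 0, 0, 1)
      // 4 of the 5 labels match, so the accuracy is 0.8
      val acc = accuracy(truth, prediction)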

  2. def adjustedRandIndex(y1: Array[Int], y2: Array[Int]): Double

    The adjusted Rand index assumes the generalized hypergeometric distribution as the model of randomness. It has a maximum value of 1, and its expected value is 0 for random clusterings. A larger adjusted Rand index means a higher agreement between two partitions. The adjusted Rand index is recommended for measuring agreement even when the partitions compared have different numbers of clusters.

  3. def auc(truth: Array[Int], probability: Array[Double]): Double

    The area under the curve (AUC). When using normalized units, the area under the curve is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').
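
    A small sketch with made-up data: each correctly ranked positive/negative pair contributes to the AUC, and each mis-ranked pair reduces it.

      import smile.validation._

      val truth       = Array(0, 0, 1, 1)
      val probability = Array(0.1, 0.4, 0.35, 0.8)
      // 3 of the 4 positive/negative pairs are ranked correctly, so AUC = 0.75
      val area = auc(truth, probability)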

  4. def confusion(truth: Array[Int], prediction: Array[Int]): ConfusionMatrix

    Computes the confusion matrix.
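
    A minimal sketch (made-up labels); the returned ConfusionMatrix can be printed to inspect the table of counts:

      import smile.validation._

      val truth      = Array(0, 0, 1, 1, 1)
      val prediction = Array(0, 1, 1, 1, 0)
      // tabulates how often each true class is predicted as each class
      val table = confusion(truth, prediction)
      println(table)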

  5. def f1(truth: Array[Int], prediction: Array[Int]): Double

    The F-score (or F-measure) considers both the precision and the recall of the test to compute the score. The precision p is the number of correct positive results divided by the number of all positive results, and the recall r is the number of correct positive results divided by the number of positive results that should have been returned.

    The traditional or balanced F-score (F1 score) is the harmonic mean of precision and recall, where an F1 score reaches its best value at 1 and worst at 0.
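
    For example, a sketch (made-up labels) showing the relationship to precision and recall, which are defined later in this list:

      import smile.validation._

      val truth      = Array(1, 1, 1, 0, 0, 1, 0, 1)
      val prediction = Array(1, 0, 1, 0, 1, 1, 0, 1)
      val p = precision(truth, prediction)  // TP / (TP + FP) = 4 / 5 = 0.8
      val r = recall(truth, prediction)     // TP / (TP + FN) = 4 / 5 = 0.8
      val f = f1(truth, prediction)         // harmonic mean of p and r = 0.8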

  6. def fallout(truth: Array[Int], prediction: Array[Int]): Double

    Fall-out, false alarm rate, or false positive rate (FPR). Fall-out corresponds to the Type I error rate and is closely related to specificity: fall-out = 1 - specificity.

  7. def fdr(truth: Array[Int], prediction: Array[Int]): Double

    The false discovery rate (FDR) is the ratio of false positives to combined true and false positives, i.e. 1 - precision.
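
    A small sketch (made-up labels) contrasting fall-out and FDR, which normalize the false positives by different denominators:

      import smile.validation._

      val truth      = Array(0, 0, 0, 0, 1, 1)
      val prediction = Array(1, 0, 0, 0, 1, 1)
      val fpr = fallout(truth, prediction)  // FP / (FP + TN) = 1 / 4 = 0.25
      val q   = fdr(truth, prediction)      // FP / (FP + TP) = 1 / 3 ≈ 0.333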

  8. def mad(truth: Array[Double], prediction: Array[Double]): Double

    Mean absolute deviation error.

  9. def mcc(truth: Array[Int], prediction: Array[Int]): Double

    MCC is a correlation coefficient between prediction and actual values. It is considered a balanced measure for binary classification, even on unbalanced data sets. It varies between -1 and +1: +1 indicates perfect agreement between ground truth and prediction, -1 indicates perfect disagreement, and 0 means the model is no better than random prediction.
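
    For reference, MCC is computed from the confusion matrix counts; a minimal sketch (made-up labels) of the extreme disagreement case:

      import smile.validation._

      // MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
      val truth      = Array(0, 0, 1, 1)
      val prediction = Array(1, 1, 0, 0)
      val m = mcc(truth, prediction)  // perfectly inverted predictions give -1.0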

  10. def mse(truth: Array[Double], prediction: Array[Double]): Double

    Mean squared error.

  11. def nmi(y1: Array[Int], y2: Array[Int]): Double

    Normalized mutual information, normalized by max(H(y1), H(y2)), between two clusterings.

  12. def precision(truth: Array[Int], prediction: Array[Int]): Double

    The precision or positive predictive value (PPV) is the ratio of true positives to combined true and false positives. Note that it is different from sensitivity.

  13. def randIndex(y1: Array[Int], y2: Array[Int]): Double

    Rand index is defined as the number of pairs of objects that are either in the same group or in different groups in both partitions, divided by the total number of pairs of objects. The Rand index lies between 0 and 1; when two partitions agree perfectly, it achieves the maximum value 1. A problem with the Rand index is that its expected value between two random partitions is not a constant. This problem is corrected by the adjusted Rand index.
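
    A minimal sketch (made-up cluster labels) showing that randIndex, together with adjustedRandIndex and nmi defined earlier in this list, ignores the particular label values and compares only the induced partitions:

      import smile.validation._

      val y1 = Array(0, 0, 1, 1, 2, 2)
      val y2 = Array(1, 1, 0, 0, 2, 2)  // the same partition with permuted labels
      val ri  = randIndex(y1, y2)          // 1.0
      val ari = adjustedRandIndex(y1, y2)  // 1.0
      val mi  = nmi(y1, y2)                // 1.0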

  14. def recall(truth: Array[Int], prediction: Array[Int]): Double

    In the information retrieval area, sensitivity is called recall.

  15. def rmse(truth: Array[Double], prediction: Array[Double]): Double

    Root mean squared error.

  16. def rss(truth: Array[Double], prediction: Array[Double]): Double

    Residual sum of squares.
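
    A small worked example (made-up values) relating rss to the error measures mad, mse, and rmse defined earlier in this list:

      import smile.validation._

      val truth      = Array(1.0, 2.0, 3.0, 4.0)
      val prediction = Array(1.1, 1.9, 3.2, 3.8)
      // residuals: -0.1, 0.1, -0.2, 0.2
      val r1 = rss(truth, prediction)   // sum of squared residuals ≈ 0.10
      val r2 = mse(truth, prediction)   // rss / n ≈ 0.025
      val r3 = rmse(truth, prediction)  // sqrt(mse) ≈ 0.158
      val r4 = mad(truth, prediction)   // mean |residual| = 0.15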

  17. def sensitivity(truth: Array[Int], prediction: Array[Int]): Double

    Sensitivity or true positive rate (TPR), also called hit rate or recall, is a statistical measure of the performance of a binary classification test. Sensitivity is the proportion of actual positives that are correctly identified as such.

  18. def specificity(truth: Array[Int], prediction: Array[Int]): Double

    Specificity or true negative rate (TNR) is a statistical measure of the performance of a binary classification test. Specificity measures the proportion of negatives that are correctly identified.
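
    A minimal sketch (made-up labels) of the two complementary rates, together with their relation to fall-out:

      import smile.validation._

      val truth      = Array(1, 1, 1, 1, 0, 0, 0, 0)
      val prediction = Array(1, 1, 1, 0, 0, 0, 1, 0)
      val tpr = sensitivity(truth, prediction)  // 3 of 4 positives identified = 0.75
      val tnr = specificity(truth, prediction)  // 3 of 4 negatives identified = 0.75
      val fpr = fallout(truth, prediction)      // 1 - specificity = 0.25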

  19. def test[C <: DataFrameClassifier](formula: Formula, train: DataFrame, test: DataFrame)(trainer: (Formula, DataFrame) => C): C

    Test a generic classifier. The accuracy will be measured and printed out on standard output.

    train: training data.
    test: test data.
    trainer: a code block to return a classifier trained on the given data.
    returns: the trained classifier.
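
    For example, a hedged sketch: the file paths and the "class" column are hypothetical, and randomForest stands in for any (Formula, DataFrame) => DataFrameClassifier code block.

      import smile.classification._
      import smile.data.formula.Formula
      import smile.validation._

      // hypothetical ARFF files with a "class" label column
      val trainData = smile.read.arff("data/train.arff")
      val testData  = smile.read.arff("data/test.arff")

      // the trainer block fits a model on the training data; test
      // prints the accuracy on testData and returns the fitted model
      val model = test(Formula.lhs("class"), trainData, testData) { (formula, data) =>
        randomForest(formula, data)
      }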

  20. def test[T, C <: Classifier[T]](x: Array[T], y: Array[Int], testx: Array[T], testy: Array[Int])(trainer: (Array[T], Array[Int]) => C): C

    Test a generic classifier. The accuracy will be measured and printed out on standard output.

    T: the type of training and test data.
    x: training data.
    y: training labels.
    testx: test data.
    testy: test data labels.
    trainer: a code block to return a classifier trained on the given data.
    returns: the trained classifier.
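
    For example, a minimal sketch with tiny made-up arrays; knn is just one possible trainer, and any (Array[T], Array[Int]) => Classifier[T] code block works.

      import smile.classification._
      import smile.validation._

      // toy two-class data, for illustration only
      val x     = Array(Array(0.0, 0.0), Array(0.1, 0.2), Array(0.9, 1.0), Array(1.0, 0.8))
      val y     = Array(0, 0, 1, 1)
      val testx = Array(Array(0.05, 0.1), Array(0.95, 0.9))
      val testy = Array(0, 1)

      // the accuracy on (testx, testy) is printed; the fitted classifier is returned
      val model = test(x, y, testx, testy) { (x, y) =>
        knn(x, y, 3)  // a 3-nearest-neighbor classifier
      }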

  21. def test2[C <: DataFrameClassifier](formula: Formula, train: DataFrame, test: DataFrame)(trainer: (Formula, DataFrame) => C): C

    Test a binary classifier. The accuracy, sensitivity, specificity, precision, F-1 score, F-2 score, and F-0.5 score will be measured and printed out on standard output.

    train: training data.
    test: test data.
    trainer: a code block to return a classifier trained on the given data.
    returns: the trained classifier.

  22. def test2[T, C <: Classifier[T]](x: Array[T], y: Array[Int], testx: Array[T], testy: Array[Int])(trainer: (Array[T], Array[Int]) => C): C

    Test a binary classifier. The accuracy, sensitivity, specificity, precision, F-1 score, F-2 score, and F-0.5 score will be measured and printed out on standard output.

    T: the type of training and test data.
    x: training data.
    y: training labels.
    testx: test data.
    testy: test data labels.
    trainer: a code block to return a binary classifier trained on the given data.
    returns: the trained classifier.

  23. def test2soft[C <: SoftClassifier[Tuple]](formula: Formula, train: DataFrame, test: DataFrame)(trainer: (Formula, DataFrame) => C): C

    Test a binary soft classifier. The accuracy, sensitivity, specificity, precision, F-1 score, F-2 score, F-0.5 score, and AUC will be measured and printed out on standard output.

    train: training data.
    test: test data.
    trainer: a code block to return a binary classifier trained on the given data.
    returns: the trained classifier.

  24. def test2soft[T, C <: SoftClassifier[T]](x: Array[T], y: Array[Int], testx: Array[T], testy: Array[Int])(trainer: (Array[T], Array[Int]) => C): C

    Test a binary soft classifier. The accuracy, sensitivity, specificity, precision, F-1 score, F-2 score, F-0.5 score, and AUC will be measured and printed out on standard output.

    T: the type of training and test data.
    x: training data.
    y: training labels.
    testx: test data.
    testy: test data labels.
    trainer: a code block to return a binary classifier trained on the given data.
    returns: the trained classifier.
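
    For example, a minimal sketch with made-up binary data; logit (logistic regression) is assumed here as one possible soft classifier whose posterior probabilities allow the AUC to be computed.

      import smile.classification._
      import smile.validation._

      // toy binary data, for illustration only
      val x     = Array(Array(0.0), Array(0.2), Array(0.8), Array(1.0))
      val y     = Array(0, 0, 1, 1)
      val testx = Array(Array(0.1), Array(0.9))
      val testy = Array(0, 1)

      // prints accuracy, sensitivity, specificity, precision, F-scores, and AUC
      val model = test2soft(x, y, testx, testy) { (x, y) =>
        logit(x, y)
      }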

  25. object bootstrap
  26. object cv
  27. object loocv
