Package

com.databricks.labs.automl.exploration

tools


package tools


Type Members

  1. case class CorrelationTestResult(covariance: Double, pearsonCoefficient: Double, spearmanCoefficient: Double, kendallsTauCoefficient: Double) extends Product with Serializable

  2. case class DistributionTestPayload(testName: String, distribution: RealDistribution) extends Product with Serializable

  3. case class DistributionTestResult(bestDistributionFit: String, bestDistributionPValue: Double, bestDistributionDStatistic: Double, allTests: Array[DistributionValidationData]) extends Product with Serializable

  4. case class DistributionValidationData(test: String, pValue: Double, dStatistic: Double) extends Product with Serializable

  5. case class KSTestResult(ksTestPvalue: Double, ksTestDStatistic: Double, ksTestEquivalency: Char) extends Product with Serializable

  6. class OneDimStats extends AnyRef


    Package for testing a one-dimensional data set with standard data explanation metrics


    Attributes tested: Mean, Geometric Mean, Variance, Semi Variance, Standard Deviation, Skew, Kurtosis

    The return type also provides String classifications of the distribution, based on thresholds for Skew and Kurtosis.

    Since

    0.7.0

    Note

    Kurtosis types:
      Mesokurtic - kurtosis around zero
      Leptokurtic - positive excess kurtosis (long heavy tails)
      Platykurtic - negative excess kurtosis (short thin tails)

    Skewness types:
      Symmetrical -> normal
      Asymmetrical Positive skewness -> right tailed
      Asymmetrical Negative skewness -> left tailed

  7. case class OneDimStatsData(mean: Double, geomMean: Double, variance: Double, semiVariance: Double, stddev: Double, skew: Double, kurtosis: Double, kurtosisType: String, skewType: String, summaryStats: SummaryStats, distributionData: DistributionTestResult) extends Product with Serializable

  8. case class PCACEigenResult(column: String, PCA1EigenVector: Double, PCA2EigenVector: Double) extends Product with Serializable

  9. class PCAReducer extends SparkSessionWrapper


    API wrapper that runs a 2-component PCA so that a data set's feature relationships can be readily visualized. Provides DataFrame export types for both the raw data with PC1 and PC2 values and the eigenvector values.

  10. case class PCAReducerResult(data: DataFrame, explainedVariances: Array[Double], pcMatrix: Array[PCACEigenResult], pcEigenDataFrame: DataFrame) extends Product with Serializable

  11. case class PairedSeq(left: SummaryStatistics, right: SummaryStatistics) extends Product with Serializable

  12. case class PairedTestResult(correlationTestData: CorrelationTestResult, tTestData: TTestData, kolmogorovSmirnovData: KSTestResult) extends Product with Serializable

  13. class PairedTesting extends AnyRef

  14. case class PolynomialRegressorResult(order: Int, function: PolynomialFunction, residualSumSquares: Double, sumSquareError: Double, totalSumSquares: Double, mse: Double, rmse: Double, r2: Double) extends Product with Serializable

  15. case class RegressionBarData(xBar: Double, yBar: Double, xyBar: Double) extends Product with Serializable

  16. case class RegressionCoefficients(slope: Double, intercept: Double, t1: Double, t2: Double, t3: Double) extends Product with Serializable

  17. case class RegressionInternal(sumX: Double, sumY: Double, sumSqX: Double, sumSqY: Double, sumProduct: Double) extends Product with Serializable

  18. case class RegressionResidualData(ssr: Double, rss: Double) extends Product with Serializable

  19. trait ShapiroBase extends AnyRef

  20. case class ShapiroInternalData(w: Double, z: Double, probability: Double, normalcyTest: Boolean, normalcy: String) extends Product with Serializable

  21. case class ShapiroScoreData(w: Double, z: Double, probability: Double) extends Product with Serializable

  22. case class SimpleRegressorResult(slope: Double, slopeStdErr: Double, slopeConfidenceInterval: Double, intercept: Double, interceptStdErr: Double, rSquared: Double, significance: Double, mse: Double, rmse: Double, sumSquares: Double, totalSumSquares: Double, sumSquareError: Double, pairLength: Long, pearsonR: Double, crossProductSum: Double) extends Product with Serializable

  23. case class SummaryStats(count: Long, min: Double, max: Double, sum: Double, mean: Double, geometricMean: Double, variance: Double, popVariance: Double, secondMoment: Double, sumOfSquares: Double, stdDeviation: Double, sumOfLogs: Double) extends Product with Serializable

  24. case class TTestData(alpha: Double, tStat: Double, tTestSignificance: Boolean, tTestPValue: Double, equivalencyJudgement: Char) extends Product with Serializable

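
The skew and kurtosis classifications described for OneDimStats can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the object name `MomentClassifier`, the use of population moments, and the tolerance of 0.5 are all assumptions made for illustration, since the page does not state the thresholds the library applies. Only the label names are taken from the note on OneDimStats.

```scala
// Illustrative sketch only; names and thresholds are hypothetical, not the library's.
object MomentClassifier {

  final case class Moments(skew: Double, excessKurtosis: Double)

  // Population skewness and excess kurtosis computed from central moments.
  def moments(data: Seq[Double]): Moments = {
    require(data.length > 1, "need at least two values")
    val n = data.length.toDouble
    val mean = data.sum / n
    def central(p: Int): Double = data.map(x => math.pow(x - mean, p)).sum / n
    val m2 = central(2)
    require(m2 > 0.0, "variance must be positive")
    Moments(central(3) / math.pow(m2, 1.5), central(4) / (m2 * m2) - 3.0)
  }

  // `tol` is an assumed threshold for "around zero".
  def skewType(skew: Double, tol: Double = 0.5): String =
    if (math.abs(skew) <= tol) "Symmetrical"
    else if (skew > 0) "Asymmetrical Positive"
    else "Asymmetrical Negative"

  def kurtosisType(excessKurtosis: Double, tol: Double = 0.5): String =
    if (math.abs(excessKurtosis) <= tol) "Mesokurtic"
    else if (excessKurtosis > 0) "Leptokurtic"
    else "Platykurtic"
}
```

A caller would compute `MomentClassifier.moments(values)` once and pass the two fields to `skewType` and `kurtosisType`, which mirrors how OneDimStatsData carries both the numeric values and their String classifications.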

Value Members

  1. object AnovaTest

  2. object OneDimStats


    Companion Object

  3. object PCAReducer extends Serializable

  4. object PairedTesting


    Companion Object for Paired Testing

  5. object PolynomialRegressor

  6. object ShapiroWilk extends ShapiroBase


    Shapiro-Wilk test for normality.


    Note

    The algorithm is restricted to a maximum of 5000 elements.

  7. object SimpleRegressor

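
Because the ShapiroWilk note limits the test to 5000 elements, a caller with a larger column may want to reduce it before testing. The sketch below shows one possible approach, a seeded random subsample without replacement. The object `ShapiroInputGuard` and its `capSample` method are hypothetical helpers and are not part of this package; how the library itself reacts to oversized input is not stated on this page.

```scala
import scala.util.Random

// Hypothetical helper, not part of the library.
object ShapiroInputGuard {

  // Limit taken from the ShapiroWilk note.
  val MaxShapiroElements: Int = 5000

  // Returns the data unchanged when it fits within the limit; otherwise
  // returns a random sample, drawn without replacement, of exactly `limit` values.
  def capSample(data: Array[Double],
                limit: Int = MaxShapiroElements,
                seed: Long = 42L): Array[Double] = {
    require(limit > 0, "limit must be positive")
    if (data.length <= limit) data
    else new Random(seed).shuffle(data.toVector).take(limit).toArray
  }
}
```

Fixing the seed keeps the subsample reproducible between runs, which matters when a normality decision feeds later pipeline steps.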
