Class PCA


  • public class PCA
    extends Object
    • Constructor Summary

      Constructors 
      Constructor Description
      PCA(INDArray dataset)
      Create a PCA instance with calculated data: covariance, mean, eigenvectors, and eigenvalues.
    • Constructor Detail

      • PCA

        public PCA(INDArray dataset)
        Create a PCA instance with calculated data: covariance, mean, eigenvectors, and eigenvalues.
        Parameters:
        dataset - The data set of feature records: each row is a data record and each column is a feature, so every record has the same number of features.
    • Method Detail

      • reducedBasis

        public INDArray reducedBasis(double variance)
        Returns a reduced basis set that covers a given fraction of the variance of the data.
        Parameters:
        variance - The desired fraction of the variance (0 to 1); the fraction covered by the returned basis is always greater than this value.
        Returns:
        The basis vectors as columns, of size N rows by ndims columns, where N is the number of features and ndims is less than or equal to N.
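The selection rule above can be sketched in plain Java. This is a minimal illustration, not ND4J's implementation: the class ReducedBasisSketch and its methods are hypothetical, and it assumes the eigenvalues are sorted in decreasing order, with column k of the eigenvector matrix belonging to eigenvalue k. Components are added until their cumulative share of the total variance is strictly greater than the requested fraction.

```java
/**
 * Hypothetical plain-Java sketch (not ND4J code) of choosing a reduced basis
 * by variance fraction. Assumes eigenvalues are sorted in decreasing order and
 * that column k of eigenvectors belongs to eigenvalues[k].
 */
final class ReducedBasisSketch {

    /** Number of leading components whose share of the total variance first exceeds variance. */
    static int dimsForVariance(double[] eigenvalues, double variance) {
        double total = 0.0;
        for (double lambda : eigenvalues) {
            total += lambda;
        }
        double covered = 0.0;
        int ndims = 0;
        for (double lambda : eigenvalues) {
            ndims++;
            covered += lambda;
            if (covered / total > variance) {
                break; // strictly greater than the requested fraction
            }
        }
        return ndims; // falls back to all N components if the target is never exceeded
    }

    /** Copies the first ndims eigenvector columns into an N x ndims basis. */
    static double[][] reducedBasis(double[][] eigenvectors, double[] eigenvalues, double variance) {
        int n = eigenvectors.length;
        int ndims = dimsForVariance(eigenvalues, variance);
        double[][] basis = new double[n][ndims];
        for (int row = 0; row < n; row++) {
            System.arraycopy(eigenvectors[row], 0, basis[row], 0, ndims);
        }
        return basis;
    }
}
```

With eigenvalues 6, 3 and 1, for example, a target of 0.6 keeps two components: the first alone covers exactly 0.6 of the variance, which is not greater than the target.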
      • convertToComponents

        public INDArray convertToComponents(INDArray data)
        Takes a set of data records, one per row, with the same number of features as the data used to construct this object, and returns the data in the coordinates of the basis set, measured about the mean.
        Parameters:
        data - Data with the same features as those used to construct the PCA object
        Returns:
        The records in terms of the principal component vectors; components that are not needed can be set to zero.
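A minimal plain-Java sketch of this change of coordinates, assuming the mean and an N x N matrix whose columns are the principal component vectors are already available as double arrays. ComponentsSketch is a hypothetical name; ND4J performs the equivalent operation on INDArray objects, and this is not its code.

```java
/**
 * Hypothetical plain-Java sketch (not ND4J code): express records in
 * principal-component coordinates about the mean. Column k of eigenvectors
 * is principal component k.
 */
final class ComponentsSketch {

    static double[][] convertToComponents(double[][] data, double[] mean, double[][] eigenvectors) {
        int n = mean.length;
        double[][] components = new double[data.length][n];
        for (int r = 0; r < data.length; r++) {
            for (int k = 0; k < n; k++) {
                double sum = 0.0;
                for (int j = 0; j < n; j++) {
                    // (record - mean) dotted with eigenvector k
                    sum += (data[r][j] - mean[j]) * eigenvectors[j][k];
                }
                components[r][k] = sum;
            }
        }
        return components;
    }
}
```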
      • convertBackToFeatures

        public INDArray convertBackToFeatures(INDArray data)
        Takes data that has been transformed to principal components about the mean and transforms it back into the original feature set. Make sure to fill in zeroes in the columns of any dropped components!
        Parameters:
        data - Data with the same features as those used to construct the PCA object, expressed as principal components
        Returns:
        The records in terms of the original features
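The inverse transform, and the effect of filling a dropped component with zero, can be sketched the same way in plain Java. This assumes orthonormal eigenvector columns, so that multiplying by the transpose undoes the projection; RoundTripSketch is a hypothetical class, not part of ND4J.

```java
/**
 * Hypothetical plain-Java sketch (not ND4J code) of the round trip
 * features -> components -> features. Assumes the columns of eigenvectors
 * are orthonormal, so the inverse projection is multiplication by the transpose.
 */
final class RoundTripSketch {

    /** components[k] = (record - mean) . eigenvector_k */
    static double[][] convertToComponents(double[][] data, double[] mean, double[][] eigenvectors) {
        int n = mean.length;
        double[][] components = new double[data.length][n];
        for (int r = 0; r < data.length; r++) {
            for (int k = 0; k < n; k++) {
                double sum = 0.0;
                for (int j = 0; j < n; j++) {
                    sum += (data[r][j] - mean[j]) * eigenvectors[j][k];
                }
                components[r][k] = sum;
            }
        }
        return components;
    }

    /** features[j] = mean[j] + sum over k of components[k] * eigenvectors[j][k] */
    static double[][] convertBackToFeatures(double[][] components, double[] mean, double[][] eigenvectors) {
        int n = mean.length;
        double[][] features = new double[components.length][n];
        for (int r = 0; r < components.length; r++) {
            for (int j = 0; j < n; j++) {
                double sum = mean[j];
                for (int k = 0; k < n; k++) {
                    sum += components[r][k] * eigenvectors[j][k];
                }
                features[r][j] = sum;
            }
        }
        return features;
    }
}
```

Zeroing a component before converting back, as the description advises, gives the projection of the record onto the remaining components rather than the record itself.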
      • estimateVariance

        public double estimateVariance(INDArray data,
                                       int ndims)
        Estimates the variance of a single record when only a reduced number of dimensions is kept.
        Parameters:
        data - A single record with the same N features as the constructing data set
        ndims - The number of dimensions to include in calculation
        Returns:
        The fraction (0 to 1) of the total variance covered by the ndims basis set.
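One reading of this estimate, sketched in plain Java: project the record's deviation from the mean onto each component and report the share of its squared length that falls in the first ndims components. That interpretation, the hypothetical class RecordVarianceSketch, and the requirement of orthonormal eigenvector columns sorted by decreasing eigenvalue are assumptions of this sketch, not statements about ND4J's code.

```java
/**
 * Hypothetical plain-Java sketch (not ND4J code): fraction of one record's
 * squared deviation from the mean that lies in the first ndims components.
 * Assumes orthonormal eigenvector columns sorted by decreasing eigenvalue.
 * Returns NaN when the record equals the mean (0 / 0).
 */
final class RecordVarianceSketch {

    static double estimateVariance(double[] point, double[] mean, double[][] eigenvectors, int ndims) {
        int n = mean.length;
        double captured = 0.0;
        double total = 0.0;
        for (int k = 0; k < n; k++) {
            // Coordinate of (point - mean) along component k.
            double c = 0.0;
            for (int j = 0; j < n; j++) {
                c += (point[j] - mean[j]) * eigenvectors[j][k];
            }
            total += c * c;
            if (k < ndims) {
                captured += c * c;
            }
        }
        return captured / total;
    }
}
```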
      • generateGaussianSamples

        public INDArray generateGaussianSamples(long count)
        Generates count random samples with the same mean, variance, and eigenvectors/eigenvalues as the data set used to initialize the PCA object, each with the same number of features N.
        Parameters:
        count - The number of samples to generate
        Returns:
        A matrix of size count rows by N columns
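Sampling with a prescribed mean and eigen-decomposition can be sketched as x = mean + (sum over k of sqrt(lambda_k) * z_k * v_k) with independent standard normal z_k, which yields the covariance V diag(lambda) V^T. The sketch assumes orthonormal eigenvector columns v_k and non-negative eigenvalues lambda_k; GaussianSamplesSketch is a hypothetical class, and this is not presented as ND4J's implementation.

```java
import java.util.Random;

/**
 * Hypothetical plain-Java sketch (not ND4J code) of Gaussian sampling from a
 * mean, orthonormal eigenvector columns and non-negative eigenvalues:
 * x = mean + sum over k of sqrt(lambda_k) * z_k * v_k, with z_k ~ N(0, 1).
 */
final class GaussianSamplesSketch {

    static double[][] generateGaussianSamples(long count, double[] mean, double[][] eigenvectors,
                                              double[] eigenvalues, Random rng) {
        int n = mean.length;
        double[][] samples = new double[Math.toIntExact(count)][n];
        for (double[] sample : samples) {
            System.arraycopy(mean, 0, sample, 0, n);
            for (int k = 0; k < n; k++) {
                // Standard normal draw scaled by the standard deviation along component k.
                double z = rng.nextGaussian() * Math.sqrt(eigenvalues[k]);
                for (int j = 0; j < n; j++) {
                    sample[j] += z * eigenvectors[j][k];
                }
            }
        }
        return samples;
    }
}
```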
      • pca

        public static INDArray pca(INDArray A,
                                   int nDims,
                                   boolean normalize)
        Calculates the PCA projection of a matrix onto a given number of reduced features. The result is the projection of A onto nDims principal components. To use the PCA, treat A as the original feature set and project it onto the reduced set of features. The original data can then be approximately reconstructed (losing information, but recovering the original dimensionality) with the factor matrix returned by pca_factor:

         INDArray factor = PCA.pca_factor(A, nDims, normalize); // note: A is changed by this call
         INDArray Areduced = A.mmul(factor);                     // rows x nDims
         INDArray Aoriginal = Areduced.mmul(factor.transpose()); // approximates A, same number of columns

        Parameters:
        A - the array of features; rows are records (observations) and columns are features. This array is modified by the call.
        nDims - the number of components on which to project the features
        normalize - whether to normalize (adjust each feature to have zero mean)
        Returns:
        the projection of A onto nDims principal components
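The reconstruction in the example above rests on a short identity. Assuming the columns of the factor F returned by pca_factor are orthonormal (an assumption this page does not state), with A the input after optional centering:

```latex
A_{\text{reduced}} = A F, \qquad
A_{\text{original}} \approx A_{\text{reduced}} F^{\mathsf{T}} = A F F^{\mathsf{T}},
\qquad F \in \mathbb{R}^{N \times n_{\text{Dims}}}, \quad F^{\mathsf{T}} F = I_{n_{\text{Dims}}}
```

Because F F^T is then the orthogonal projector onto the span of the retained components, each reconstructed row is the closest point in that span to the corresponding row of A; when nDims = N, F F^T = I and the reconstruction is exact.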
      • pca_factor

        public static INDArray pca_factor(INDArray A,
                                          int nDims,
                                          boolean normalize)
        Calculates the PCA factors of a matrix for a given number of reduced features and returns the factors by which observations are scaled. The result is a factor matrix that reduces (normalized) feature sets to nDims features.
        Parameters:
        A - the array of features; rows are records (observations) and columns are features. This array is modified by the call.
        nDims - the number of components on which to project the features
        normalize - whether to normalize (adjust each feature to have zero mean)
        Returns:
        the factor matrix used to reduce a feature set
        See Also:
        pca(org.nd4j.linalg.api.ndarray.INDArray,int,boolean)
      • pca

        public static INDArray pca(INDArray A,
                                   double variance,
                                   boolean normalize)
        Calculates the PCA-reduced form of a matrix for a given fraction of variance. A larger variance (e.g. 0.99) results in a higher-dimensional feature set. The returned matrix is a projection of A onto its principal components.
        Parameters:
        A - the array of features; rows are records (observations) and columns are features. This array is modified by the call.
        variance - the fraction of the variance to preserve, from 0 to 1
        normalize - whether to normalize (set features to have zero mean)
        Returns:
        the matrix representing a reduced feature set
        See Also:
        pca(org.nd4j.linalg.api.ndarray.INDArray,int,boolean)
      • pca_factor

        public static INDArray pca_factor(INDArray A,
                                          double variance,
                                          boolean normalize)
        Calculates the PCA vectors of a matrix for a given fraction of variance. A larger variance (e.g. 0.99) results in a higher-dimensional feature set. To use the returned factor, multiply the feature(s) by it to obtain a reduced-dimension INDArray:

         INDArray Areduced = A.mmul(factor);

        The array Areduced is a projection of A onto the principal components.
        Parameters:
        A - the array of features; rows are records (observations) and columns are features. This array is modified by the call.
        variance - the fraction of the variance to preserve, from 0 to 1
        normalize - whether to normalize (set features to have zero mean)
        Returns:
        the matrix to multiply a feature set by to obtain a reduced feature set
        See Also:
        pca(org.nd4j.linalg.api.ndarray.INDArray,double,boolean)
      • pca2

        public static INDArray pca2(INDArray in,
                                    double variance)
        This method performs a dimensionality reduction, including principal components that cover a fraction of the total variance of the system. It does all calculations about the mean.
        Parameters:
        in - A matrix of data points as rows, whose columns are a fixed number N of features
        variance - The desired fraction of the total variance required
        Returns:
        The reduced basis set
      • covarianceMatrix

        public static INDArray[] covarianceMatrix(INDArray in)
        Returns the covariance matrix of a data set of many records, each with N features. It also returns the average values, which matter because in this version all modes are centered about the mean. Each element of the matrix is the average of dx_i * dx_j (the form used in the procedure), which equals the average of x_i * x_j minus (average x_i) * (average x_j).
        Parameters:
        in - A matrix with one vector of fixed length N (N features) on each row
        Returns:
        INDArray[2]: element 0 is the N x N covariance matrix and element 1 holds the average values.
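The "average dx_i * dx_j" form can be sketched in plain Java. This sketch divides by the number of records, as the word "average" suggests; that divisor and the hypothetical class CovarianceSketch are assumptions rather than a description of ND4J's code. Like the method above, it returns the covariance as element 0 and the averages (as a 1 x N row) as element 1.

```java
/**
 * Hypothetical plain-Java sketch (not ND4J code): covariance computed as the
 * average of dx_i * dx_j, where dx is a record's deviation from the mean.
 */
final class CovarianceSketch {

    /** Returns {covariance (N x N), mean (1 x N)} for records stored as rows. */
    static double[][][] covarianceMatrix(double[][] in) {
        int rows = in.length;
        int n = in[0].length;

        // Average value of each feature (column).
        double[] mean = new double[n];
        for (double[] row : in) {
            for (int j = 0; j < n; j++) {
                mean[j] += row[j] / rows;
            }
        }

        // Average the outer product dx * dx^T over all records.
        double[][] cov = new double[n][n];
        double[] dx = new double[n];
        for (double[] row : in) {
            for (int j = 0; j < n; j++) {
                dx[j] = row[j] - mean[j];
            }
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    cov[i][j] += dx[i] * dx[j] / rows;
                }
            }
        }
        return new double[][][] {cov, {mean}};
    }
}
```

For the records (1, 2), (3, 6) and (5, 10) it gives averages 3 and 6 and an off-diagonal covariance of 16/3, which matches the alternative form: the average of x_0 * x_1 is 70/3, minus 3 * 6.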
      • principalComponents

        public static INDArray[] principalComponents(INDArray cov)
        Calculates the principal component vectors and their eigenvalues (lambda) for the covariance matrix. The result includes two things: the eigenvectors (modes) as result[0] and the eigenvalues (lambda) as result[1].
        Parameters:
        cov - The covariance matrix (calculated with the covarianceMatrix(in) method)
        Returns:
        Array INDArray[2] "result". The principal component vectors in decreasing flexibility are the columns of element 0 and the eigenvalues are element 1.
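A standalone way to obtain the eigenvectors and eigenvalues of a symmetric covariance matrix is the cyclic Jacobi eigenvalue method, sketched below in plain Java. The Jacobi method is swapped in for illustration only; ND4J does not necessarily use it, and the hypothetical class JacobiEigenSketch is not part of its API. The sketch sorts by decreasing eigenvalue and returns the eigenvectors as columns of element 0 and the eigenvalues in element 1, mirroring the layout described above.

```java
import java.util.Arrays;

/**
 * Hypothetical plain-Java sketch (not ND4J code): cyclic Jacobi eigenvalue
 * method for a symmetric matrix. Returns {eigenvectors (columns), eigenvalues (1 x N)},
 * sorted by decreasing eigenvalue.
 */
final class JacobiEigenSketch {

    static double[][][] principalComponents(double[][] cov) {
        int n = cov.length;
        double[][] a = new double[n][];
        double norm = 0.0;
        for (int i = 0; i < n; i++) {
            a[i] = cov[i].clone(); // work on a copy; cov is left untouched
            for (double x : a[i]) {
                norm += x * x;
            }
        }
        double[][] v = new double[n][n];
        for (int i = 0; i < n; i++) {
            v[i][i] = 1.0; // accumulated rotations start from the identity
        }

        for (int sweep = 0; sweep < 100; sweep++) {
            double off = 0.0;
            for (int p = 0; p < n; p++) {
                for (int q = p + 1; q < n; q++) {
                    off += a[p][q] * a[p][q];
                }
            }
            if (off <= 1e-30 * norm) {
                break; // off-diagonal part is negligible: a is numerically diagonal
            }
            for (int p = 0; p < n; p++) {
                for (int q = p + 1; q < n; q++) {
                    if (a[p][q] == 0.0) {
                        continue;
                    }
                    // Rotation (c, s) chosen so that the rotated a[p][q] becomes zero.
                    double theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
                    double t = theta == 0.0 ? 1.0
                            : Math.signum(theta) / (Math.abs(theta) + Math.sqrt(theta * theta + 1.0));
                    double c = 1.0 / Math.sqrt(t * t + 1.0);
                    double s = t * c;
                    for (int k = 0; k < n; k++) { // a <- a * J (columns p and q)
                        double akp = a[k][p];
                        double akq = a[k][q];
                        a[k][p] = c * akp - s * akq;
                        a[k][q] = s * akp + c * akq;
                    }
                    for (int k = 0; k < n; k++) { // a <- J^T * a (rows p and q)
                        double apk = a[p][k];
                        double aqk = a[q][k];
                        a[p][k] = c * apk - s * aqk;
                        a[q][k] = s * apk + c * aqk;
                    }
                    for (int k = 0; k < n; k++) { // v <- v * J
                        double vkp = v[k][p];
                        double vkq = v[k][q];
                        v[k][p] = c * vkp - s * vkq;
                        v[k][q] = s * vkp + c * vkq;
                    }
                }
            }
        }

        // Sort components by decreasing eigenvalue (the diagonal of a).
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (left, right) -> Double.compare(a[right][right], a[left][left]));
        double[][] vectors = new double[n][n];
        double[] values = new double[n];
        for (int col = 0; col < n; col++) {
            int src = order[col];
            values[col] = a[src][src];
            for (int row = 0; row < n; row++) {
                vectors[row][col] = v[row][src];
            }
        }
        return new double[][][] {vectors, {values}};
    }
}
```

Each sweep costs O(N^3), which is fine for the small feature counts of a sketch; a LAPACK-style symmetric eigensolver is the usual choice for larger N.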
      • getCovarianceMatrix

        public INDArray getCovarianceMatrix()
      • getEigenvectors

        public INDArray getEigenvectors()
      • getEigenvalues

        public INDArray getEigenvalues()