Class PCA
java.lang.Object
    org.nd4j.linalg.dimensionalityreduction.PCA
public class PCA extends Object
-
-
Method Summary
INDArray convertBackToFeatures(INDArray data)
    Take the data that has been transformed to the principal components about the mean and transform it back into the original feature set.
INDArray convertToComponents(INDArray data)
    Takes a set of data on each row, with the same number of features as the constructing data, and returns the data in the coordinates of the basis set about the mean.
static INDArray[] covarianceMatrix(INDArray in)
    Returns the covariance matrix of a data set of many records, each with N features.
double estimateVariance(INDArray data, int ndims)
    Estimate the variance of a single record with a reduced number of dimensions.
INDArray generateGaussianSamples(long count)
    Generates a set of count random samples with the same variance, mean, and eigenvectors/eigenvalues as the data set used to initialize the PCA object, with the same number of features N.
INDArray getCovarianceMatrix()
INDArray getEigenvalues()
INDArray getEigenvectors()
INDArray getMean()
static INDArray pca(INDArray A, double variance, boolean normalize)
    Calculates pca reduced value of a matrix, for a given variance.
static INDArray pca(INDArray A, int nDims, boolean normalize)
    Calculates pca vectors of a matrix for a given number of reduced features and returns the reduced feature set. The return is a projection of A onto the principal nDims components. To use the PCA: assume A is the original feature set, then project A onto a reduced set of features.
static INDArray pca_factor(INDArray A, double variance, boolean normalize)
    Calculates pca vectors of a matrix, for a given variance.
static INDArray pca_factor(INDArray A, int nDims, boolean normalize)
    Calculates pca factors of a matrix for a given number of reduced features and returns the factors to scale observations. The return is a factor matrix to reduce (normalized) feature sets.
static INDArray pca2(INDArray in, double variance)
    This method performs a dimensionality reduction, including principal components that cover a fraction of the total variance of the system.
static INDArray[] principalComponents(INDArray cov)
    Calculates the principal component vectors and their eigenvalues (lambda) for the covariance matrix.
INDArray reducedBasis(double variance)
    Return a reduced basis set that covers a certain fraction of the variance of the data.
-
-
-
Constructor Detail
-
PCA
public PCA(INDArray dataset)
Create a PCA instance with calculated data: covariance, mean, eigenvectors, and eigenvalues.
Parameters:
dataset - The set of data (records) of features. Each row is a data record and each column is a feature; every data record has the same number of features.
-
-
Method Detail
-
reducedBasis
public INDArray reducedBasis(double variance)
Return a reduced basis set that covers a certain fraction of the variance of the data.
Parameters:
variance - The desired fractional variance (0 to 1). The fraction covered by the returned basis will always be greater than this value.
Returns:
The basis vectors as columns, size N rows by ndims columns, where ndims is less than or equal to N.
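The choice of ndims can be illustrated with a small plain-Java sketch. It is not the ND4J implementation: it assumes the textbook convention that each eigenvalue is the variance along its component and that the eigenvalues are sorted in decreasing order. The class and method names (ReducedBasisSketch, componentsFor) are hypothetical.

```java
// Plain-Java illustration only; it does not call ND4J.
class ReducedBasisSketch {

    /**
     * Smallest number of leading components whose cumulative share of the
     * total variance is strictly greater than the requested fraction.
     * Assumption: eigenvalues are variances sorted in decreasing order.
     */
    static int componentsFor(double[] eigenvalues, double fraction) {
        double total = 0.0;
        for (double v : eigenvalues) {
            total += v;
        }
        double running = 0.0;
        for (int k = 0; k < eigenvalues.length; k++) {
            running += eigenvalues[k];
            if (running / total > fraction) {
                return k + 1;
            }
        }
        // The requested fraction cannot be exceeded, so keep every component.
        return eigenvalues.length;
    }

    public static void main(String[] args) {
        double[] eigenvalues = {4.0, 3.0, 2.0, 1.0};
        System.out.println(componentsFor(eigenvalues, 0.65));
        System.out.println(componentsFor(eigenvalues, 0.95));
    }
}
```

The strict comparison mirrors the statement that the covered fraction is always greater than the requested value.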
-
convertToComponents
public INDArray convertToComponents(INDArray data)
Takes a set of data on each row, with the same number of features as the constructing data, and returns the data in the coordinates of the basis set about the mean.
Parameters:
data - Data with the same features used to construct the PCA object
Returns:
The records in terms of the principal component vectors; you can set unused ones to zero.
-
convertBackToFeatures
public INDArray convertBackToFeatures(INDArray data)
Take the data that has been transformed to the principal components about the mean and transform it back into the original feature set. Make sure to fill in zeroes in columns where components were dropped!
Parameters:
data - Data of the same features used to construct the PCA object, but expressed as the components
Returns:
The records in terms of the original features
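The round trip between features and components, including the effect of zeroing a dropped component, can be sketched in plain Java. This is an illustration under stated assumptions, not the library code: the basis is assumed to be orthonormal with the basis vectors as columns, and the names (ComponentConversionSketch, toComponents, toFeatures) are hypothetical.

```java
import java.util.Arrays;

// Plain-Java illustration only; it does not call ND4J.
class ComponentConversionSketch {

    /** components[k] = sum over j of (x[j] - mean[j]) * basis[j][k]. */
    static double[] toComponents(double[] x, double[] mean, double[][] basis) {
        int n = basis.length;
        int k = basis[0].length;
        double[] components = new double[k];
        for (int col = 0; col < k; col++) {
            for (int j = 0; j < n; j++) {
                components[col] += (x[j] - mean[j]) * basis[j][col];
            }
        }
        return components;
    }

    /** x[j] = mean[j] + sum over k of components[k] * basis[j][k]. */
    static double[] toFeatures(double[] components, double[] mean, double[][] basis) {
        double[] x = mean.clone();
        for (int j = 0; j < basis.length; j++) {
            for (int col = 0; col < components.length; col++) {
                x[j] += components[col] * basis[j][col];
            }
        }
        return x;
    }

    public static void main(String[] args) {
        double s = Math.sqrt(0.5);
        double[][] basis = {{s, -s}, {s, s}}; // columns are orthonormal vectors
        double[] mean = {1.0, 1.0};
        double[] record = {3.0, 2.0};

        double[] components = toComponents(record, mean, basis);
        System.out.println(Arrays.toString(toFeatures(components, mean, basis)));

        // Dropping a component means leaving a zero in its column.
        components[1] = 0.0;
        System.out.println(Arrays.toString(toFeatures(components, mean, basis)));
    }
}
```

With all components kept, the record is recovered up to rounding; with the second component zeroed, the result is the record's projection onto the first basis vector, shifted back by the mean.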
-
estimateVariance
public double estimateVariance(INDArray data, int ndims)
Estimate the variance of a single record with a reduced number of dimensions.
Parameters:
data - A single record with the same N features as the constructing data set
ndims - The number of dimensions to include in the calculation
Returns:
The fraction (0 to 1) of the total variance covered by the ndims basis set.
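One plausible reading of this fraction, sketched in plain Java: once a record is expressed as components about the mean, the covered fraction is the sum of squares of the first ndims components divided by the sum of squares of all components. This is an assumption about the intended quantity, not a statement of how ND4J computes it; the names (VarianceFractionSketch, fractionCovered) are hypothetical.

```java
// Plain-Java illustration only; it does not call ND4J.
class VarianceFractionSketch {

    /**
     * Share of a record's squared deviation from the mean that is captured
     * by its first ndims components. The input is assumed to be the record
     * already expressed in principal-component coordinates.
     * Returns NaN when every component is zero (record equals the mean).
     */
    static double fractionCovered(double[] components, int ndims) {
        double kept = 0.0;
        double total = 0.0;
        for (int k = 0; k < components.length; k++) {
            double squared = components[k] * components[k];
            total += squared;
            if (k < ndims) {
                kept += squared;
            }
        }
        return kept / total;
    }

    public static void main(String[] args) {
        double[] components = {3.0, 4.0};
        System.out.println(fractionCovered(components, 1));
        System.out.println(fractionCovered(components, 2));
    }
}
```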
-
generateGaussianSamples
public INDArray generateGaussianSamples(long count)
Generates a set of count random samples with the same variance, mean, and eigenvectors/eigenvalues as the data set used to initialize the PCA object, with the same number of features N.
Parameters:
count - The number of samples to generate
Returns:
A matrix of size count rows by N columns
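As a rough illustration of the idea, the following plain-Java sketch draws samples by scaling independent standard normal values by the standard deviation along each eigenvector and adding the mean. It assumes the textbook convention that eigenvalues are variances and that eigenvectors are stored as columns; ND4J's internal scaling may differ. The names (GaussianSamplesSketch, sample) are hypothetical, and an int count is used here only because Java arrays are int-indexed.

```java
import java.util.Random;

// Plain-Java illustration only; it does not call ND4J.
class GaussianSamplesSketch {

    /**
     * Returns count rows, each equal to the mean plus a random combination
     * of the eigenvectors. The weight on eigenvector k is a standard normal
     * draw multiplied by sqrt(variances[k]).
     */
    static double[][] sample(int count, double[] mean, double[][] eigenvectors,
                             double[] variances, Random rng) {
        int n = mean.length;
        double[][] samples = new double[count][n];
        for (int r = 0; r < count; r++) {
            System.arraycopy(mean, 0, samples[r], 0, n);
            for (int k = 0; k < variances.length; k++) {
                double weight = rng.nextGaussian() * Math.sqrt(variances[k]);
                for (int j = 0; j < n; j++) {
                    samples[r][j] += weight * eigenvectors[j][k];
                }
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        double[] mean = {1.0, 2.0};
        double[][] eigenvectors = {{1.0, 0.0}, {0.0, 1.0}};
        double[] variances = {4.0, 0.25};
        double[][] samples = sample(5, mean, eigenvectors, variances, new Random(42L));
        System.out.println(samples.length + " rows by " + samples[0].length + " columns");
    }
}
```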
-
pca
public static INDArray pca(INDArray A, int nDims, boolean normalize)
Calculates pca vectors of a matrix for a given number of reduced features and returns the reduced feature set. The return is a projection of A onto the principal nDims components. To use the PCA: assume A is the original feature set, then project A onto a reduced set of features. It is possible to reconstruct the original data (losing information, but having the same dimensionality):
    INDArray Areduced = A.mmul( factor ) ;
    INDArray Aoriginal = Areduced.mmul( factor.transpose() ) ;
Parameters:
A - the array of features, rows are results, columns are features - will be changed
nDims - the number of components on which to project the features
normalize - whether to normalize (adjust each feature to have zero mean)
Returns:
the reduced parameters of A
-
pca_factor
public static INDArray pca_factor(INDArray A, int nDims, boolean normalize)
Calculates pca factors of a matrix for a given number of reduced features and returns the factors to scale observations. The return is a factor matrix to reduce (normalized) feature sets.
Parameters:
A - the array of features, rows are results, columns are features - will be changed
nDims - the number of components on which to project the features
normalize - whether to normalize (adjust each feature to have zero mean)
Returns:
the factor matrix used to reduce a (normalized) feature set
See Also:
pca(org.nd4j.linalg.api.ndarray.INDArray,int,boolean)
-
pca
public static INDArray pca(INDArray A, double variance, boolean normalize)
Calculates pca reduced value of a matrix, for a given variance. A larger variance (99%) will result in a higher order feature set. The returned matrix is a projection of A onto principal components.
Parameters:
A - the array of features, rows are results, columns are features - will be changed
variance - the amount of variance to preserve, as a float between 0 and 1
normalize - whether to normalize (set features to have zero mean)
Returns:
the matrix representing a reduced feature set
See Also:
pca(org.nd4j.linalg.api.ndarray.INDArray,int,boolean)
-
pca_factor
public static INDArray pca_factor(INDArray A, double variance, boolean normalize)
Calculates pca vectors of a matrix, for a given variance. A larger variance (99%) will result in a higher order feature set. To use the returned factor, multiply the feature(s) by the factor to get a reduced dimension:
    INDArray Areduced = A.mmul( factor ) ;
The array Areduced is a projection of A onto principal components.
Parameters:
A - the array of features, rows are results, columns are features - will be changed
variance - the amount of variance to preserve, as a float between 0 and 1
normalize - whether to normalize (set features to have zero mean)
Returns:
the matrix to multiply a feature by to get a reduced feature set
See Also:
pca(org.nd4j.linalg.api.ndarray.INDArray,double,boolean)
-
pca2
public static INDArray pca2(INDArray in, double variance)
This method performs a dimensionality reduction, including principal components that cover a fraction of the total variance of the system. It does all calculations about the mean.
Parameters:
in - A matrix of data points as rows, where the columns are features and every row has the same fixed number N of features
variance - The desired fraction of the total variance required
Returns:
The reduced basis set
-
covarianceMatrix
public static INDArray[] covarianceMatrix(INDArray in)
Returns the covariance matrix of a data set of many records, each with N features. It also returns the average values, which are usually important because, in this version, all modes are centered around the mean. The elements of the matrix are expressed as average(dx_i * dx_j), the form used in the procedure, or equivalently as average(x_i * x_j) - average(x_i) * average(x_j), where dx_i is the deviation of feature i from its average.
Parameters:
in - A matrix of vectors of fixed length N (N features) on each row
Returns:
INDArray[2]: the N x N covariance matrix is element 0, and the vector of average values is element 1.
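The average(dx_i * dx_j) form can be written out in plain Java as a minimal sketch. It uses double[][] arrays rather than INDArray, and it assumes that "average" means dividing by the number of records (not by the number of records minus one). The names (CovarianceSketch, mean, covariance) are hypothetical.

```java
import java.util.Arrays;

// Plain-Java illustration only; it does not call ND4J.
class CovarianceSketch {

    /** Column averages of a data set with one record per row. */
    static double[] mean(double[][] data) {
        int n = data[0].length;
        double[] mean = new double[n];
        for (double[] row : data) {
            for (int j = 0; j < n; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < n; j++) {
            mean[j] /= data.length;
        }
        return mean;
    }

    /** N x N matrix whose (i, j) element is the average of dx_i * dx_j. */
    static double[][] covariance(double[][] data) {
        int n = data[0].length;
        double[] mean = mean(data);
        double[][] cov = new double[n][n];
        for (double[] row : data) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    cov[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]);
                }
            }
        }
        // Assumption: divide by the record count, following "average" in the text.
        for (double[] covRow : cov) {
            for (int j = 0; j < n; j++) {
                covRow[j] /= data.length;
            }
        }
        return cov;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2}, {3, 6}, {5, 10}};
        System.out.println(Arrays.toString(mean(data)));
        System.out.println(Arrays.deepToString(covariance(data)));
    }
}
```

For the sample data the off-diagonal element works out to 16/3 under either form: the average of dx_0 * dx_1 is (8 + 0 + 8) / 3, and average(x_0 * x_1) - average(x_0) * average(x_1) is 70/3 - 18.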
-
principalComponents
public static INDArray[] principalComponents(INDArray cov)
Calculates the principal component vectors and their eigenvalues (lambda) for the covariance matrix. The result includes two things: the eigenvectors (modes) as result[0] and the eigenvalues (lambda) as result[1].
Parameters:
cov - The covariance matrix (calculated with the covarianceMatrix(in) method)
Returns:
Array INDArray[2] "result". The principal component vectors in decreasing flexibility are the columns of element 0 and the eigenvalues are element 1.
-
getCovarianceMatrix
public INDArray getCovarianceMatrix()
-
getMean
public INDArray getMean()
-
getEigenvectors
public INDArray getEigenvectors()
-
getEigenvalues
public INDArray getEigenvalues()
-
-