public class GaussianProcessRegression<T> extends java.lang.Object implements Regression<T>
A Gaussian process can be used as a prior probability distribution over functions in Bayesian inference. Given any set of N points in the desired domain of the functions, take a multivariate Gaussian whose covariance matrix parameter is the Gram matrix of the N points with some desired kernel, and sample from that Gaussian. Inference of continuous values with a Gaussian process prior is known as Gaussian process regression.
The fitting is performed in the reproducing kernel Hilbert space with the "kernel trick", using a squared-error loss function. The same estimate also arises as the kriging estimate of a Gaussian random field in spatial statistics.
A significant problem with Gaussian process prediction is that it typically scales as O(n³). For large problems (e.g. n > 10,000), both storing the Gram matrix and solving the associated linear systems are prohibitive on modern workstations. An extensive range of proposals has been suggested to deal with this problem. A popular approach is a reduced-rank approximation of the Gram matrix, known as the Nyström approximation. Subset of Regressors (SR) is another popular approach that uses an active set of m training samples selected from the training set of size n > m. Searching for the optimal subset of size m is infeasible in general because of the combinatorial number of candidate subsets. The samples in the active set could be selected randomly, but in general we might expect better performance if the samples are selected greedily with respect to some criterion. More recently, researchers have proposed relaxing the constraint that the inducing variables must be a subset of training/test cases, turning the discrete selection problem into one of continuous optimization.
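Concretely, with an active set of size m << n, both schemes replace the full n-by-n Gram matrix K by the reduced-rank form K ≈ K_{n,m} K_{m,m}^{-1} K_{n,m}^T, where K_{n,m} holds the kernel evaluations between the n training samples and the m active samples. This lowers the dominant training cost from O(n³) to O(nm²).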
Experimental evidence suggests that, for large m, the SR and Nyström methods have similar performance, but for small m the Nyström method can be quite poor. Embarrassments can also occur, such as the approximated predictive variance being negative. For these reasons we do not recommend the Nyström method over the SR method.
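As a quick orientation, the basic workflow with the double[] specialization looks like the following minimal sketch. The kernel bandwidth and noise variance below are illustrative values, not recommendations.

```java
import smile.math.kernel.GaussianKernel;
import smile.regression.GaussianProcessRegression;

public class GprExample {
    public static void main(String[] args) {
        // Toy 1-D training data: y = sin(x) on a grid.
        int n = 20;
        double[][] x = new double[n][1];
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            x[i][0] = 0.3 * i;
            y[i] = Math.sin(x[i][0]);
        }

        // Fit a regular GP; 1.0 is the kernel bandwidth and 1E-4 the
        // noise variance (both illustrative).
        GaussianProcessRegression<double[]> model =
                GaussianProcessRegression.fit(x, y, new GaussianKernel(1.0), 1E-4);

        // Point prediction and the fitted log marginal likelihood.
        System.out.println(model.predict(new double[]{1.5}));
        System.out.println(model.L);
    }
}
```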
Modifier and Type | Class and Description |
---|---|
class | GaussianProcessRegression.JointPrediction The joint prediction of multiple data points. |
Nested classes/interfaces inherited from interface smile.regression.Regression: Regression.Metric
Modifier and Type | Field and Description |
---|---|
smile.math.kernel.MercerKernel<T> | kernel The covariance/kernel function. |
double | L The log marginal likelihood, which may not be available (NaN) when the model is fit with approximate methods. |
double | mean The mean of the response variable. |
double | noise The variance of noise. |
T[] | regressors The regressors. |
double | sd The standard deviation of the response variable. |
double[] | w The linear weights. |
Constructor and Description |
---|
GaussianProcessRegression(smile.math.kernel.MercerKernel<T> kernel, T[] regressors, double[] weight, double noise) Constructor. |
GaussianProcessRegression(smile.math.kernel.MercerKernel<T> kernel, T[] regressors, double[] weight, double noise, double mean, double sd) Constructor. |
GaussianProcessRegression(smile.math.kernel.MercerKernel<T> kernel, T[] regressors, double[] weight, double noise, double mean, double sd, smile.math.matrix.Matrix.Cholesky cholesky, double L) Constructor. |
Modifier and Type | Method and Description |
---|---|
static <T> GaussianProcessRegression<T> | fit(T[] x, double[] y, smile.math.kernel.MercerKernel<T> kernel, double noise) Fits a regular Gaussian process model. |
static <T> GaussianProcessRegression<T> | fit(T[] x, double[] y, smile.math.kernel.MercerKernel<T> kernel, double noise, boolean normalize, double tol, int maxIter) Fits a regular Gaussian process model. |
static <T> GaussianProcessRegression<T> | fit(T[] x, double[] y, smile.math.kernel.MercerKernel<T> kernel, java.util.Properties prop) Fits a regular Gaussian process model. |
static <T> GaussianProcessRegression<T> | fit(T[] x, double[] y, T[] t, smile.math.kernel.MercerKernel<T> kernel, double noise) Fits an approximate Gaussian process model by the method of subset of regressors. |
static <T> GaussianProcessRegression<T> | fit(T[] x, double[] y, T[] t, smile.math.kernel.MercerKernel<T> kernel, double noise, boolean normalize) Fits an approximate Gaussian process model by the method of subset of regressors. |
static <T> GaussianProcessRegression<T> | fit(T[] x, double[] y, T[] t, smile.math.kernel.MercerKernel<T> kernel, java.util.Properties prop) Fits an approximate Gaussian process model by the method of subset of regressors. |
static <T> GaussianProcessRegression<T> | nystrom(T[] x, double[] y, T[] t, smile.math.kernel.MercerKernel<T> kernel, double noise) Fits an approximate Gaussian process model with the Nyström approximation of the kernel matrix. |
static <T> GaussianProcessRegression<T> | nystrom(T[] x, double[] y, T[] t, smile.math.kernel.MercerKernel<T> kernel, double noise, boolean normalize) Fits an approximate Gaussian process model with the Nyström approximation of the kernel matrix. |
static <T> GaussianProcessRegression<T> | nystrom(T[] x, double[] y, T[] t, smile.math.kernel.MercerKernel<T> kernel, java.util.Properties prop) Fits an approximate Gaussian process model with the Nyström approximation of the kernel matrix. |
double | predict(T x) Predicts the dependent variable of an instance. |
double | predict(T x, double[] estimation) Predicts the mean and standard deviation of an instance. |
GaussianProcessRegression.JointPrediction | query(T[] samples) Evaluates the Gaussian process at some query points. |
java.lang.String | toString() |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface smile.regression.Regression: applyAsDouble, metric, metric, predict
public final smile.math.kernel.MercerKernel<T> kernel
public final T[] regressors
public final double[] w
public final double mean
public final double sd
public final double noise
public final double L
public GaussianProcessRegression(smile.math.kernel.MercerKernel<T> kernel, T[] regressors, double[] weight, double noise)
Parameters:
kernel - the kernel function.
regressors - the regressors.
weight - the weights of the regressors.
noise - the variance of noise.
public GaussianProcessRegression(smile.math.kernel.MercerKernel<T> kernel, T[] regressors, double[] weight, double noise, double mean, double sd)
Parameters:
kernel - the kernel function.
regressors - the regressors.
weight - the weights of the regressors.
noise - the variance of noise.
mean - the mean of the response variable.
sd - the standard deviation of the response variable.
public GaussianProcessRegression(smile.math.kernel.MercerKernel<T> kernel, T[] regressors, double[] weight, double noise, double mean, double sd, smile.math.matrix.Matrix.Cholesky cholesky, double L)
Parameters:
kernel - the kernel function.
regressors - the regressors.
weight - the weights of the regressors.
noise - the variance of noise.
mean - the mean of the response variable.
sd - the standard deviation of the response variable.
cholesky - the Cholesky decomposition of the kernel matrix.
L - the log marginal likelihood.
public double predict(T x)
Description copied from interface: Regression
Specified by:
predict in interface Regression<T>
Parameters:
x - an instance.
public double predict(T x, double[] estimation)
Parameters:
x - an instance.
estimation - an output array of the estimated mean and standard deviation.
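Continuing the sketch above, a minimal example of this overload; per the parameter description, the two-element array receives the estimated mean and standard deviation:

```java
// Query the predictive distribution at a test point.
double[] estimation = new double[2];
double mu = model.predict(new double[]{1.5}, estimation);
System.out.printf("mean = %.4f (returned %.4f), sd = %.4f%n",
        estimation[0], mu, estimation[1]);
```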
public GaussianProcessRegression.JointPrediction query(T[] samples)
Parameters:
samples - query points.
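A minimal sketch of a joint query, continuing the example above; only the toString() summary of the returned JointPrediction is used here, so no further assumptions about its members are made:

```java
// Joint posterior over several query points.
double[][] samples = {{0.5}, {1.0}, {1.5}};
var joint = model.query(samples);
System.out.println(joint);
```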
public java.lang.String toString()
Overrides:
toString in class java.lang.Object
public static <T> GaussianProcessRegression<T> fit(T[] x, double[] y, smile.math.kernel.MercerKernel<T> kernel, java.util.Properties prop)
Parameters:
x - the training dataset.
y - the response variable.
kernel - the Mercer kernel.
prop - the hyperparameters and properties of the training algorithm.
public static <T> GaussianProcessRegression<T> fit(T[] x, double[] y, smile.math.kernel.MercerKernel<T> kernel, double noise)
Parameters:
x - the training dataset.
y - the response variable.
kernel - the Mercer kernel.
noise - the noise variance, which also works as a regularization parameter.
public static <T> GaussianProcessRegression<T> fit(T[] x, double[] y, smile.math.kernel.MercerKernel<T> kernel, double noise, boolean normalize, double tol, int maxIter)
Parameters:
x - the training dataset.
y - the response variable.
kernel - the Mercer kernel.
noise - the noise variance, which also works as a regularization parameter.
normalize - whether to normalize the response variable.
tol - the stopping tolerance of hyperparameter optimization.
maxIter - the maximum number of iterations of hyperparameter optimization. No optimization if maxIter <= 0.
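A sketch of this variant, continuing the example above, with response normalization and hyperparameter optimization enabled; the tolerance and iteration budget are illustrative settings:

```java
// Normalize the response and run up to 100 iterations of hyperparameter
// optimization with stopping tolerance 1E-4 (illustrative values).
GaussianProcessRegression<double[]> tuned =
        GaussianProcessRegression.fit(x, y, new GaussianKernel(1.0), 1E-4, true, 1E-4, 100);
```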
public static <T> GaussianProcessRegression<T> fit(T[] x, double[] y, T[] t, smile.math.kernel.MercerKernel<T> kernel, java.util.Properties prop)
Parameters:
x - the training dataset.
y - the response variable.
t - the inducing input: pre-selected samples acting as the active set of regressors. In the simplest case, these can be chosen randomly from the training set or as the centers of k-means clustering.
kernel - the Mercer kernel.
prop - the hyperparameters and properties of the training algorithm.
public static <T> GaussianProcessRegression<T> fit(T[] x, double[] y, T[] t, smile.math.kernel.MercerKernel<T> kernel, double noise)
Parameters:
x - the training dataset.
y - the response variable.
t - the inducing input: pre-selected samples acting as the active set of regressors. In the simplest case, these can be chosen randomly from the training set or as the centers of k-means clustering.
kernel - the Mercer kernel.
noise - the noise variance, which also works as a regularization parameter.
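A sketch of the subset-of-regressors fit with inducing points taken as k-means centers, as the parameter description suggests. The use of smile.clustering.KMeans and its centroids field is an assumption about the companion clustering API, and the subset size 10 is illustrative:

```java
import smile.clustering.KMeans;

// Pick m = 10 inducing points as k-means centers (assumed API; a random
// subset of x would also satisfy the contract of t).
double[][] t = KMeans.fit(x, 10).centroids;
GaussianProcessRegression<double[]> sor =
        GaussianProcessRegression.fit(x, y, t, new GaussianKernel(1.0), 1E-4);
```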
public static <T> GaussianProcessRegression<T> fit(T[] x, double[] y, T[] t, smile.math.kernel.MercerKernel<T> kernel, double noise, boolean normalize)
Parameters:
x - the training dataset.
y - the response variable.
t - the inducing input: pre-selected samples acting as the active set of regressors. In the simplest case, these can be chosen randomly from the training set or as the centers of k-means clustering.
kernel - the Mercer kernel.
noise - the noise variance, which also works as a regularization parameter.
normalize - whether to normalize the response variable.
public static <T> GaussianProcessRegression<T> nystrom(T[] x, double[] y, T[] t, smile.math.kernel.MercerKernel<T> kernel, java.util.Properties prop)
Parameters:
x - the training dataset.
y - the response variable.
t - the inducing input for the Nyström approximation. Commonly, these can be chosen as the centers of k-means clustering.
kernel - the Mercer kernel.
prop - the hyperparameters and properties of the training algorithm.
public static <T> GaussianProcessRegression<T> nystrom(T[] x, double[] y, T[] t, smile.math.kernel.MercerKernel<T> kernel, double noise)
Parameters:
x - the training dataset.
y - the response variable.
t - the inducing input: pre-selected samples acting as the active set of regressors. In the simplest case, these can be chosen randomly from the training set or as the centers of k-means clustering.
kernel - the Mercer kernel.
noise - the noise variance, which also works as a regularization parameter.
public static <T> GaussianProcessRegression<T> nystrom(T[] x, double[] y, T[] t, smile.math.kernel.MercerKernel<T> kernel, double noise, boolean normalize)
Parameters:
x - the training dataset.
y - the response variable.
t - the inducing input for the Nyström approximation. Commonly, these can be chosen as the centers of k-means clustering.
kernel - the Mercer kernel.
noise - the noise variance, which also works as a regularization parameter.
normalize - whether to normalize the response variable.
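For comparison, the same inducing points can be passed to the Nyström variant, keeping in mind the class-level caveat that the SR method is preferred for small m. This continues the sketch above and reuses its illustrative settings:

```java
// Nystrom approximation with the inducing points t from the sketch above,
// normalizing the response variable.
GaussianProcessRegression<double[]> nys =
        GaussianProcessRegression.nystrom(x, y, t, new GaussianKernel(1.0), 1E-4, true);
```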