public class LinearModel extends java.lang.Object implements OnlineRegression<double[]>, DataFrameRegression
Once a regression model has been constructed, it may be important to confirm the goodness of fit of the model and the statistical significance of the estimated parameters. Commonly used checks of goodness of fit include the R-squared, analysis of the pattern of residuals and hypothesis testing. Statistical significance can be checked by an F-test of the overall fit, followed by t-tests of individual parameters.
Interpretations of these diagnostic tests rest heavily on the model assumptions. Although examination of the residuals can be used to invalidate a model, the results of a t-test or F-test are sometimes more difficult to interpret if the model's assumptions are violated. For example, if the error term does not have a normal distribution, then in small samples the estimated parameters will not follow normal distributions either, which complicates inference. With relatively large samples, however, a central limit theorem can be invoked so that hypothesis testing may proceed using asymptotic approximations.
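The sketch below ties these checks to this class. It is a minimal example, assuming Smile's OLS.fit(Formula, DataFrame) entry point, DataFrame.of(double[][], String...) and made-up toy data; the diagnostic accessors (RSquared, adjustedRSquared, ftest, pvalue, ttest) are the ones documented on this page.

```java
import smile.data.DataFrame;
import smile.data.formula.Formula;
import smile.regression.LinearModel;
import smile.regression.OLS;

public class FitDiagnostics {
    public static void main(String[] args) {
        // Toy data (made up for illustration): y is roughly 2*x1 - x2 plus noise.
        double[][] data = {
            {1.0, 0.5,  1.6}, {2.0, 1.0,  3.1}, {3.0, 0.0,  6.2},
            {4.0, 2.0,  5.9}, {5.0, 1.5,  8.4}, {6.0, 3.0,  9.1},
            {7.0, 2.5, 11.7}, {8.0, 4.0, 12.2}, {9.0, 3.5, 14.6}
        };
        DataFrame df = DataFrame.of(data, "x1", "x2", "y");

        // Fit ordinary least squares; the fitted object is a LinearModel.
        LinearModel model = OLS.fit(Formula.lhs("y"), df);

        // Overall goodness of fit and its significance test.
        System.out.println("R^2          = " + model.RSquared());
        System.out.println("adjusted R^2 = " + model.adjustedRSquared());
        System.out.println("F-statistic  = " + model.ftest());
        System.out.println("p-value      = " + model.pvalue());

        // Per-coefficient t-tests (intercept included); each row typically holds
        // estimate, standard error, t-value and p-value.
        for (double[] row : model.ttest()) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```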
Nested classes/interfaces inherited from interface smile.regression.Regression: Regression.Metric
| Constructor and Description |
|---|
| LinearModel(smile.data.formula.Formula formula, smile.data.type.StructType schema, smile.math.matrix.Matrix X, double[] y, double[] w, double b) Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| double | adjustedRSquared() Returns adjusted R2 statistic. |
| double[] | coefficients() Returns the linear coefficients (without intercept). |
| int | df() Returns the degree-of-freedom of residual standard error. |
| double | error() Returns the residual standard error. |
| double[] | fittedValues() Returns the fitted values. |
| smile.data.formula.Formula | formula() Returns the formula associated with the model. |
| double | ftest() Returns the F-statistic of goodness-of-fit. |
| double | intercept() Returns the intercept. |
| double[] | predict(smile.data.DataFrame df) Predicts the dependent variables of a data frame. |
| double | predict(double[] x) Predicts the dependent variable of an instance. |
| double | predict(smile.data.Tuple x) Predicts the dependent variable of a tuple instance. |
| double | pvalue() Returns the p-value of goodness-of-fit test. |
| double[] | residuals() Returns the residuals, that is, response minus fitted values. |
| double | RSquared() Returns R2 statistic. |
| double | RSS() Returns the residual sum of squares. |
| smile.data.type.StructType | schema() Returns the schema of predictors. |
| java.lang.String | toString() |
| double[][] | ttest() Returns the t-test of the coefficients (including intercept). |
| void | update(smile.data.DataFrame data) Online update the regression model with a new data frame. |
| void | update(double[] x, double y) Growing window recursive least squares with lambda = 1. |
| void | update(double[] x, double y, double lambda) Recursive least squares. |
| void | update(smile.data.Tuple data) Online update the regression model with a new training instance. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface smile.regression.OnlineRegression: update

Methods inherited from interface smile.regression.Regression: applyAsDouble, metric, metric, predict
public LinearModel(smile.data.formula.Formula formula, smile.data.type.StructType schema, smile.math.matrix.Matrix X, double[] y, double[] w, double b)

Parameters:
formula - a symbolic description of the model to be fitted.
schema - the schema of input data.
X - the design matrix.
y - the response variable.
w - the linear weights.
b - the intercept.
public smile.data.formula.Formula formula()

Specified by: formula in interface DataFrameRegression
public smile.data.type.StructType schema()

Specified by: schema in interface DataFrameRegression
public double[][] ttest()
public double[] coefficients()
public double intercept()
public double[] residuals()
public double[] fittedValues()
public double RSS()
public double error()
public int df()
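A brief sketch of the accessors above, assuming model is a LinearModel fitted as in the earlier example:

```java
// Assumes "model" is the LinearModel fitted in the earlier sketch.
double[] w = model.coefficients();   // slopes, without the intercept
double b  = model.intercept();
double[] e = model.residuals();      // response minus fitted values

System.out.println("RSS = " + model.RSS()
        + ", residual standard error = " + model.error()
        + " on " + model.df() + " degrees of freedom");
```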
public double RSquared()
In the case of ordinary least-squares regression, R2 increases as we increase the number of variables in the model (R2 will not decrease). This illustrates a drawback to one possible use of R2, where one might try to include more variables in the model until "there is no more improvement". This leads to the alternative approach of looking at the adjusted R2.
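The following sketch illustrates the point under assumed data: data and dataWithJunk are hypothetical double[][] arrays, where the latter merely appends a random "junk" column to the former.

```java
// Hypothetical comparison: the same response fitted with and without an extra
// pure-noise predictor. "data" and "dataWithJunk" are assumed double[][] arrays,
// where the latter simply appends a random "junk" column to the former.
DataFrame base  = DataFrame.of(data,         "x1", "x2", "y");
DataFrame noisy = DataFrame.of(dataWithJunk, "x1", "x2", "junk", "y");

LinearModel small = OLS.fit(Formula.lhs("y"), base);
LinearModel large = OLS.fit(Formula.lhs("y"), noisy);

// Plain R^2 can only stay the same or rise when a predictor is added...
System.out.println(small.RSquared() + " <= " + large.RSquared());
// ...while adjusted R^2 penalizes the extra parameter and typically drops for a useless one.
System.out.println(small.adjustedRSquared() + " vs " + large.adjustedRSquared());
```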
public double adjustedRSquared()
public double ftest()
public double pvalue()
public double predict(double[] x)
Specified by: predict in interface Regression<double[]>
Parameters:
x - an instance.

public double predict(smile.data.Tuple x)
Specified by: predict in interface DataFrameRegression
Parameters:
x - a tuple instance.

public double[] predict(smile.data.DataFrame df)
Specified by: predict in interface DataFrameRegression
Parameters:
df - the data frame.
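A short usage sketch for the three predict overloads, assuming model is a fitted LinearModel and newData is a DataFrame with the same predictor columns as the training schema:

```java
// Single instance: pass the predictor values in schema order; the intercept is applied internally.
double yhat = model.predict(new double[] {7.5, 3.0});

// A whole data frame at once: one prediction per row.
double[] yhats = model.predict(newData);

// A single row as a Tuple (assuming DataFrame.get(i) yields a row Tuple, as in recent Smile versions).
double first = model.predict(newData.get(0));
```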
public void update(smile.data.Tuple data)

public void update(smile.data.DataFrame data)
public void update(double[] x, double y)
Specified by: update in interface OnlineRegression<double[]>
Parameters:
x - training instance.
y - response variable.

public void update(double[] x, double y, double lambda)
Parameters:
x - training instance.
y - response variable.
lambda - the forgetting factor in (0, 1]. The smaller lambda is, the smaller the contribution of previous samples to the covariance matrix. This makes the filter more sensitive to recent samples, which means more fluctuation in the filter coefficients. The lambda = 1 case is referred to as the growing window RLS algorithm. In practice, lambda is usually chosen between 0.98 and 1.
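A sketch of online refinement with these update methods, assuming model is an already-fitted LinearModel and xs/ys hold newly arriving observations:

```java
// Assumes "model" is an already-fitted LinearModel and xs/ys hold newly arriving observations.
double lambda = 0.99;                    // forget old samples slowly
for (int i = 0; i < xs.length; i++) {
    model.update(xs[i], ys[i], lambda);  // recursive least squares with forgetting factor
}

// With the two-argument overload (equivalently lambda = 1) this is growing-window RLS,
// where every past sample keeps full weight:
// model.update(x, y);
```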
public java.lang.String toString()

Overrides: toString in class java.lang.Object