public class LinearModel extends java.lang.Object implements OnlineRegression<double[]>, DataFrameRegression
Once a regression model has been constructed, it may be important to confirm the goodness of fit of the model and the statistical significance of the estimated parameters. Commonly used checks of goodness of fit include the R-squared, analysis of the pattern of residuals and hypothesis testing. Statistical significance can be checked by an F-test of the overall fit, followed by t-tests of individual parameters.
Interpretations of these diagnostic tests rest heavily on the model assumptions. Although examination of the residuals can be used to invalidate a model, the results of a t-test or F-test are sometimes more difficult to interpret if the model's assumptions are violated. For example, if the error term does not have a normal distribution, then in small samples the estimated parameters will not follow normal distributions, which complicates inference. With relatively large samples, however, a central limit theorem can be invoked such that hypothesis testing may proceed using asymptotic approximations.
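The diagnostics mentioned above can be illustrated with a minimal, self-contained sketch for the one-predictor case. It is not Smile's implementation: the class name `FitDiagnosticsSketch` and all of its methods are hypothetical, and the formulas are the standard textbook definitions with `p` predictors (excluding the intercept) and `n - p - 1` residual degrees of freedom.

```java
/** Illustrative sketch only: simple linear regression with basic fit diagnostics. */
final class FitDiagnosticsSketch {
    /** Least-squares fit of y = b0 + b1 * x; returns {b0, b1}. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    /** Residuals: response minus fitted values. */
    static double[] residuals(double[] beta, double[] x, double[] y) {
        double[] r = new double[x.length];
        for (int i = 0; i < x.length; i++) r[i] = y[i] - (beta[0] + beta[1] * x[i]);
        return r;
    }

    /** Residual sum of squares. */
    static double rss(double[] residuals) {
        double s = 0.0;
        for (double r : residuals) s += r * r;
        return s;
    }

    /** Total sum of squares of y around its mean. */
    static double tss(double[] y) {
        double mean = 0.0;
        for (double v : y) mean += v;
        mean /= y.length;
        double s = 0.0;
        for (double v : y) s += (v - mean) * (v - mean);
        return s;
    }

    /** Residual standard error with n - p - 1 degrees of freedom. */
    static double residualStandardError(double rss, int n, int p) {
        return Math.sqrt(rss / (n - p - 1));
    }

    /** F-statistic of the overall fit: explained variance per predictor over residual variance. */
    static double fStatistic(double tss, double rss, int n, int p) {
        return ((tss - rss) / p) / (rss / (n - p - 1));
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2.1, 3.9, 6.2, 7.8, 10.1};
        double[] beta = fit(x, y);
        double rss = rss(residuals(beta, x, y));
        System.out.println("intercept = " + beta[0] + ", slope = " + beta[1]);
        System.out.println("RSS = " + rss);
        System.out.println("residual standard error = " + residualStandardError(rss, x.length, 1));
        System.out.println("F = " + fStatistic(tss(y), rss, x.length, 1));
    }
}
```

A large F-statistic suggests that the predictor explains far more variance than the residual noise, but as noted above, turning it into a p-value relies on the model assumptions holding.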
Modifier and Type | Method and Description |
---|---|
double | adjustedRSquared(): Returns the adjusted R² statistic. |
double[] | coefficients(): Returns the linear coefficients (without intercept). |
int | df(): Returns the degrees of freedom of the residual standard error. |
double | error(): Returns the residual standard error. |
double[] | fittedValues(): Returns the fitted values. |
smile.data.formula.Formula | formula(): Returns the formula associated with the model. |
double | ftest(): Returns the F-statistic of goodness-of-fit. |
double | intercept(): Returns the intercept. |
double[] | predict(smile.data.DataFrame df): Predicts the dependent variables of a data frame. |
double | predict(double[] x): Predicts the dependent variable of an instance. |
double | predict(smile.data.Tuple x): Predicts the dependent variable of a tuple instance. |
double | pvalue(): Returns the p-value of the goodness-of-fit test. |
double[] | residuals(): Returns the residuals, that is, response minus fitted values. |
double | RSquared(): Returns the R² statistic. |
double | RSS(): Returns the residual sum of squares. |
smile.data.type.StructType | schema(): Returns the design matrix schema. |
java.lang.String | toString() |
double[][] | ttest(): Returns the t-test of the coefficients (including intercept). |
void | update(smile.data.DataFrame data): Updates the regression model online with a new data frame. |
void | update(double[] x, double y): Updates the regression model online with a new training instance. |
void | update(double[] x, double y, double lambda): Recursive least squares. |
void | update(smile.data.Tuple data): Updates the regression model online with a new training instance. |
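Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface OnlineRegression: update
Methods inherited from interface Regression: applyAsDouble, predict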
public smile.data.formula.Formula formula()
Returns the formula associated with the model.
Specified by: formula in interface DataFrameRegression
public smile.data.type.StructType schema()
Returns the design matrix schema.
Specified by: schema in interface DataFrameRegression
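public double[][] ttest()
Returns the t-test of the coefficients (including intercept).
public double[] coefficients()
Returns the linear coefficients (without intercept).
public double intercept()
Returns the intercept.
public double[] residuals()
Returns the residuals, that is, response minus fitted values.
public double[] fittedValues()
Returns the fitted values.
public double RSS()
Returns the residual sum of squares.
public double error()
Returns the residual standard error.
public int df()
Returns the degrees of freedom of the residual standard error.
public double RSquared()
Returns the R² statistic.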
In ordinary least-squares regression, R² never decreases as variables are added to the model. This illustrates a drawback of one possible use of R², where one keeps adding variables to the model until "there is no more improvement". This leads to the alternative approach of looking at the adjusted R², which penalizes the number of variables.
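The difference between the two statistics can be sketched with the standard textbook definitions. This is an illustration, not Smile's code: the class `RSquaredSketch` is hypothetical, `p` is assumed to count predictors excluding the intercept, and the RSS and TSS values in `main` are made up for demonstration.

```java
/** Illustrative sketch only: R² versus adjusted R² from RSS and TSS. */
final class RSquaredSketch {
    /** R² = 1 - RSS / TSS. */
    static double rSquared(double rss, double tss) {
        return 1.0 - rss / tss;
    }

    /** Adjusted R² = 1 - (1 - R²) (n - 1) / (n - p - 1), with p predictors excluding the intercept. */
    static double adjustedRSquared(double r2, int n, int p) {
        return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1);
    }

    public static void main(String[] args) {
        int n = 20;
        double tss = 100.0;
        // Hypothetical numbers: a third, nearly useless predictor lowers RSS only slightly.
        double r2Small = rSquared(10.0, tss);
        double r2Large = rSquared(9.9, tss);
        System.out.println("R2:          " + r2Small + " -> " + r2Large);
        System.out.println("adjusted R2: " + adjustedRSquared(r2Small, n, 2)
                + " -> " + adjustedRSquared(r2Large, n, 3));
    }
}
```

In this made-up case R² rises slightly when the extra predictor is added, while the adjusted R² falls, which is the behavior that makes the adjusted statistic more useful for comparing models of different size.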
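public double adjustedRSquared()
Returns the adjusted R² statistic.
public double ftest()
Returns the F-statistic of goodness-of-fit.
public double pvalue()
Returns the p-value of the goodness-of-fit test.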
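public double predict(double[] x)
Predicts the dependent variable of an instance.
Specified by: predict in interface Regression<double[]>
Parameters:
x - an instance.
public double predict(smile.data.Tuple x)
Predicts the dependent variable of a tuple instance.
Specified by: predict in interface DataFrameRegression
Parameters:
x - a tuple instance.
public double[] predict(smile.data.DataFrame df)
Predicts the dependent variables of a data frame.
Specified by: predict in interface DataFrameRegression
Parameters:
df - the data frame.
public void update(smile.data.Tuple data)
Updates the regression model online with a new training instance.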
public void update(smile.data.DataFrame data)
Updates the regression model online with a new data frame.
public void update(double[] x, double y)
Updates the regression model online with a new training instance.
Specified by: update in interface OnlineRegression<double[]>
Parameters:
x - training instance.
y - response variable.
public void update(double[] x, double y, double lambda)
Recursive least squares.
Parameters:
x - training instance.
y - response variable.
lambda - The forgetting factor in (0, 1]. Values closer to 1 will have longer memory and values closer to 0 will have shorter memory.
public java.lang.String toString()
Overrides: toString in class java.lang.Object
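To make the role of the forgetting factor in update(double[] x, double y, double lambda) more concrete, here is a minimal sketch of the textbook recursive least squares recursion. It is an assumption-laden illustration, not Smile's implementation: the class `RecursiveLeastSquaresSketch`, its constructor, and the initialization scale `delta` are all hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch only: textbook recursive least squares with a forgetting factor. */
final class RecursiveLeastSquaresSketch {
    private final double[] w;    // weights; the last entry is the intercept
    private final double[][] p;  // running estimate of the inverse (weighted) Gram matrix

    /** d is the number of features; delta scales the initial P = delta * I (hypothetical parameter). */
    RecursiveLeastSquaresSketch(int d, double delta) {
        w = new double[d + 1];
        p = new double[d + 1][d + 1];
        for (int i = 0; i <= d; i++) p[i][i] = delta;
    }

    double predict(double[] x) {
        double y = w[x.length];  // intercept
        for (int i = 0; i < x.length; i++) y += w[i] * x[i];
        return y;
    }

    /** One update step; lambda in (0, 1], where 1 means no forgetting. */
    void update(double[] x, double y, double lambda) {
        if (lambda <= 0.0 || lambda > 1.0) {
            throw new IllegalArgumentException("lambda must be in (0, 1]: " + lambda);
        }
        int m = w.length;
        double[] z = Arrays.copyOf(x, m);  // augment the instance with a constant 1
        z[m - 1] = 1.0;

        // pz = P z and denom = lambda + z' P z
        double[] pz = new double[m];
        double denom = lambda;
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++) pz[i] += p[i][j] * z[j];
        }
        for (int i = 0; i < m; i++) denom += z[i] * pz[i];

        // Correct the weights along the gain vector k = pz / denom.
        double err = y - predict(x);
        for (int i = 0; i < m; i++) w[i] += pz[i] / denom * err;

        // P <- (P - k (P z)') / lambda; dividing by lambda < 1 discounts old data.
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++) p[i][j] = (p[i][j] - pz[i] * pz[j] / denom) / lambda;
        }
    }

    public static void main(String[] args) {
        RecursiveLeastSquaresSketch rls = new RecursiveLeastSquaresSketch(1, 1e6);
        for (int i = 0; i < 20; i++) rls.update(new double[] {i}, 2.0 * i + 1.0, 1.0);
        System.out.println("prediction at x = 10: " + rls.predict(new double[] {10}));
    }
}
```

With lambda equal to 1 every past instance keeps full weight, so the estimate approaches the ordinary least-squares solution. With lambda below 1 the influence of an instance decays geometrically with its age, which lets the model track a drifting relationship at the cost of noisier estimates.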