Calculating the PRESS statistic in R
June 18, 2013
Engineers tend to use a version of the residual sum of squares (RSS) called PRESS, the predicted residual error sum of squares. The idea is that RSS describes how well a linear model fits the data used to estimate it, while PRESS estimates how well the model will predict new data.
Let's make up some data and fit a model (no seed is set, so rerunning this will give different numbers),
n <- 10
x <- rnorm(n)
y <- rnorm(n)
(m <- lm(y ~ x))
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept)            x
##      -1.037       -0.551
Here are the residuals,
(r <- resid(m))
##        1        2        3        4        5        6        7        8
## -0.51730 -0.19795  0.61276 -1.76696  1.45084  0.06266  1.01057 -0.29810
##        9       10
## -0.35962  0.00310
And here are the predictively adjusted residuals,
(pr <- resid(m)/(1 - lm.influence(m)$hat))
##         1         2         3         4         5         6         7
## -0.742199 -0.220755  0.689584 -2.066720  1.614070  0.115833  1.231767
##         8         9        10
## -0.337332 -0.530903  0.003685
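The denominator above uses the leverages, the diagonal entries of the hat matrix H = X (X'X)^{-1} X'. As a rough sketch of where those numbers come from, the block below builds H by hand and compares it with R's built-in accessors. The seed and the names `X`, `H`, and `h_manual` are choices made for this illustration, not part of the original post, so the data will not match the output shown above.

```r
# Self-contained sketch: leverages computed by hand vs. R's built-ins.
set.seed(1)                 # arbitrary seed, only so the sketch is reproducible
n <- 10
x <- rnorm(n)
y <- rnorm(n)
m <- lm(y ~ x)

X <- model.matrix(m)                      # n x 2 design matrix: intercept and x
H <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix H = X (X'X)^{-1} X'
h_manual <- diag(H)                       # leverages h_ii

# Both built-in routes should agree with the manual calculation
all.equal(unname(h_manual), unname(hatvalues(m)))
all.equal(unname(h_manual), unname(lm.influence(m)$hat))

# The leverages sum to the number of fitted coefficients (2 here)
sum(h_manual)
```

Forming H explicitly is fine for ten points but wasteful for large data; `hatvalues(m)` gets the same diagonal from the QR decomposition that `lm` has already computed.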
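The theoretical magic here is an exact identity for linear least squares: if you drop observation i, refit the model, and predict the dropped point, the resulting leave-one-out residual equals the ordinary residual divided by one minus that observation's leverage (its `hat` value). So these are the cross-validated residuals, obtained without refitting anything. The regular RSS is,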
sum(r^2)
## [1] 7.153
and the PRESS is,
sum(pr^2)
## [1] 9.878
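which is bigger because predicting is harder than fitting. In fact, each residual is divided by one minus its leverage, a number between 0 and 1, so PRESS can never be smaller than RSS.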
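The claim that the adjusted residuals equal the cross-validated ones can be checked by brute force. The following is a minimal sketch, assuming a fresh simulated data set (the seed, the data frame `d`, and the names `loo` and `fit_i` are made up for this example): it refits the model n times, each time leaving one row out, and compares the held-out prediction errors with the shortcut.

```r
# Self-contained sketch: leave-one-out residuals vs. the leverage shortcut.
set.seed(1)                 # arbitrary seed, only so the sketch is reproducible
n <- 10
d <- data.frame(x = rnorm(n), y = rnorm(n))
m <- lm(y ~ x, data = d)

# Shortcut from the post: ordinary residual divided by (1 - leverage)
pr <- resid(m) / (1 - hatvalues(m))

# Brute force: drop row i, refit, and predict the held-out point
loo <- numeric(n)
for (i in seq_len(n)) {
  fit_i  <- lm(y ~ x, data = d[-i, ])
  loo[i] <- d$y[i] - predict(fit_i, newdata = d[i, , drop = FALSE])
}

# The two sets of residuals agree up to floating-point error
all.equal(unname(pr), loo)

# RSS and the two routes to PRESS
c(RSS = sum(resid(m)^2), PRESS_shortcut = sum(pr^2), PRESS_loo = sum(loo^2))
```

The loop costs n model fits while the shortcut costs one, which is why the leverage formula is the usual way to get PRESS for a linear model.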
5 Comments
How to calculate manually?
Yes, as Usman writes, it would be great to see a simple example of how to calculate Predicted R-squared manually, using a simple data set of maybe 5 Xs and 5 Ys.