# Calculating the PRESS statistic in R

June 18, 2013

Engineers tend to use a version of the residual sum of squares (RSS) called PRESS, the predicted residual error sum of squares. The idea is that RSS describes how well a linear model fits the data it was fitted to, whereas PRESS tells you how well the model will predict new data.

Let's make up some data and fit a model,

```
n <- 10
x <- rnorm(n)
y <- rnorm(n)
(m <- lm(y ~ x))
```

```
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept)            x
##      -1.037       -0.551
```

Here are the residuals,

```
(r <- resid(m))
```

```
##        1        2        3        4        5        6        7        8
## -0.51730 -0.19795  0.61276 -1.76696  1.45084  0.06266  1.01057 -0.29810
##        9       10
## -0.35962  0.00310
```

And here are the predictively adjusted residuals,

```
# divide each residual by (1 - leverage); $hat holds the diagonal of the hat matrix
(pr <- resid(m)/(1 - lm.influence(m)$hat))
```

```
##         1         2         3         4         5         6         7
## -0.742199 -0.220755  0.689584 -2.066720  1.614070  0.115833  1.231767
##         8         9        10
## -0.337332 -0.530903  0.003685
```

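There is some theoretical magic that makes these equal to the leave-one-out cross-validated residuals: dividing the i-th residual by one minus its leverage (the i-th diagonal element of the hat matrix) gives exactly the error you would get by refitting the model without observation i and then predicting it. No refitting is needed. So the regular RSS is,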

```
sum(r^2)
```

```
## [1] 7.153
```

and the PRESS is,

```
sum(pr^2)
```

```
## [1] 9.878
```

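which is bigger because predicting is harder than fitting. More precisely, every leverage lies between 0 and 1, so each adjusted residual is at least as large in magnitude as the ordinary one, and PRESS can never be smaller than RSS.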
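The equivalence with cross-validation can be checked by brute force. The following is only a sketch: the seed, the data frame `d`, and the helper names `fit_i` and `loo` are illustrative assumptions, not part of the original example, so the numbers will differ from those printed above.

```r
# Leave-one-out check of the PRESS shortcut (illustrative sketch)
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 10
d <- data.frame(x = rnorm(n), y = rnorm(n))
m <- lm(y ~ x, data = d)

# Shortcut: ordinary residuals divided by (1 - leverage)
pr <- resid(m) / (1 - hatvalues(m))

# Brute force: refit without observation i, then predict observation i
loo <- sapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x, data = d[-i, ])
  d$y[i] - predict(fit_i, newdata = d[i, , drop = FALSE])
})

all.equal(unname(pr), unname(loo))  # the two sets of residuals should agree
sum(pr^2)                           # PRESS via the shortcut
sum(loo^2)                          # PRESS via explicit refitting
```

The shortcut needs one model fit, while the explicit loop needs `n` of them, which is why the leverage form is the one normally used for linear models.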


How to calculate manually?

Yes, as Usman writes, it would be great to see a simple example of how to calculate predicted R-squared manually, using a simple data set of maybe 5 Xs and 5 Ys.
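One possible way to do this by hand is sketched below. It assumes the usual definition of predicted R-squared, 1 - PRESS/TSS, where TSS is the total sum of squares, and it uses the simple-regression leverage formula h_i = 1/n + (x_i - mean(x))^2 / Sxx. The five data points are made up purely for illustration.

```r
# Five made-up points (illustrative only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
n <- length(x)

# Least-squares slope and intercept by hand
sxx <- sum((x - mean(x))^2)
b1  <- sum((x - mean(x)) * (y - mean(y))) / sxx
b0  <- mean(y) - b1 * mean(x)

# Ordinary residuals and leverages (simple linear regression formula)
r <- y - (b0 + b1 * x)
h <- 1 / n + (x - mean(x))^2 / sxx

# PRESS, total sum of squares, and predicted R-squared
press   <- sum((r / (1 - h))^2)
tss     <- sum((y - mean(y))^2)
pred_r2 <- 1 - press / tss

# Cross-check the hand calculation against lm()
m <- lm(y ~ x)
all.equal(press, sum((resid(m) / (1 - hatvalues(m)))^2))
c(r2 = summary(m)$r.squared, pred_r2 = pred_r2)
```

Because PRESS is never smaller than RSS, the predicted R-squared computed this way is never larger than the ordinary R-squared.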