
Whoops…I did it again…

October 19, 2013

…and forgot that R-squared statistics are weird in R when the intercept is removed (without an intercept, the total sum of squares is taken about zero rather than about the mean of the response). So much has been written about this issue that you really don’t have to read on. This post is really just me underlining one of my common silly mistakes so that I don’t ever repeat it. This FAQ says it all. I understand these arguments, but the problem for me is that in interactive mode I often centre the response variable before sending it to lm and then automatically add the -1 to the formula. But in my head I know there’s still an intercept. For some reason I can remember to mentally subtract a degree of freedom from the ANOVA tables, but I always forget about the R-squared. I think the solution is just to never use -1 if I don’t really mean it.
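To make the issue concrete, here is a minimal sketch of the two R-squared definitions involved. It is written in plain Python rather than R, the simulated data and all variable names are assumptions made up for illustration, and it only mimics the arithmetic: for a model without an intercept, R's summary() measures the residual sum of squares against the sum of squares about zero, whereas the usual definition measures it against the sum of squares about the mean.

```python
import random

random.seed(1)  # simulated data; the numbers are illustrative assumptions only
n = 50
x = [random.gauss(0, 1) for _ in range(n)]
y = [10 + 2 * xi + random.gauss(0, 1) for xi in x]  # response with a mean far from zero


def mean(values):
    return sum(values) / len(values)


x_bar, y_bar = mean(x), mean(y)

# Total sums of squares about the mean (null: y = ybar) and about zero (null: y = 0).
sst_centred = sum((yi - y_bar) ** 2 for yi in y)
sst_uncentred = sum(yi ** 2 for yi in y)

# Model with an intercept: least squares fit of y = b0 + b1 * x.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
rss_with = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
r2_with_intercept = 1 - rss_with / sst_centred

# Model without an intercept (the "-1" case): least squares fit of y = b * x.
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
rss_without = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))

# Uncentred definition: what gets reported when the intercept is removed.
r2_reported = 1 - rss_without / sst_uncentred
# Centred definition: the comparison against y = ybar that I usually have in mind.
r2_vs_mean = 1 - rss_without / sst_centred

print("with intercept:             ", r2_with_intercept)
print("no intercept, null y = 0:   ", r2_reported)
print("no intercept, null y = ybar:", r2_vs_mean)
```

With a response whose mean is far from zero, the through-the-origin fit is poor, so the value measured against y = ybar comes out negative, while the uncentred value always stays between 0 and 1. The two definitions answer different questions about the same fit.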

3 Comments
  1. Brian McGill
    October 19, 2013 9:00 pm

    I find the behavior of R with -1 in the formula highly counter-intuitive. I’d even go so far as to say wrong, or at least not useful. I don’t think I’ve ever wanted to test against a null of y=0 when I use -1. I still want a test vs. a null of y=ybar.

    • October 19, 2013 11:21 pm

      I’m not going to disagree, but to try and understand where they’re coming from…they do have a point that it is at least strange to calculate an R-squared when the null (y = ybar) isn’t nested within the alternative (y = yhat), which is what you get for models without an intercept. Don’t get me wrong, that’s what I want to do too; I’m just noting that it’s strange. On the other hand, I don’t at all get their problem with negative R-squares. For me, that’s a meaningful thing that says your model is really terrible.

      • October 23, 2013 2:08 am

        Speaking as someone who’s published a paper with negative R^2 values, I certainly agree that they’re meaningful! It means your model fits less well than taking the grand mean of the data. In my case I was comparing predictions of population dynamical models to time series data. The models had been fully parameterized from other data, so there were no free parameters to estimate. I wanted some way to summarize the match between the model predictions and the data, and R^2 seemed like a natural choice. And if it sometimes came out negative, well, as you say, that just meant that the model was terrible.
