Keeping prediction intervals positive

June 21, 2013

Often in ecology our response variables are strictly positive
(e.g. population density). As a result, some standard default
procedures won't work well. For example, here are some data,

library(ggplot2)  # needed for the plots below

x <- rnorm(10)
y <- exp(-x^2 - 2 + rnorm(10))
df <- data.frame(x, y)

Here's a standard ggplot exploration,

ggplot(df) + geom_point(aes(x, y)) + geom_smooth(aes(x, y), method = "loess")


But wait, that's not right: the prediction interval dips below zero!

The best way to deal with this problem is to use models that account
for bounded data. But if you just want to do some exploration and be quick about it, here's a
simple-minded solution that works well. Start by log-transforming the response (you might need log(y + 1) if it contains zeros) and fitting a loess model to the transformed data,

m <- loess(log(y) ~ x)
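If the response does contain zeros, the log(y + 1) variant mentioned above might look something like the sketch below. The data and the names (x0, y0, m0, back) are made up for illustration and are not part of the example in this post. One thing to watch: inverting log(y + 1) only guarantees values above -1, so this sketch clamps the back-transformed values at zero.

```r
# Hypothetical data with a couple of exact zeros, where log(y0) would be -Inf
set.seed(1)
x0 <- rnorm(10)
y0 <- exp(-x0^2 - 2 + rnorm(10))
y0[c(3, 7)] <- 0

m0 <- loess(log1p(y0) ~ x0)  # log1p(y0) is log(y0 + 1)

# Predict on the transformed scale with standard errors
xs <- seq(min(x0), max(x0), length.out = 100)
p0 <- predict(m0, newdata = data.frame(x0 = xs), se = TRUE)
half <- p0$se.fit * qt(0.975, p0$df)  # half-width of a 95% interval

# Invert with expm1 (exp(z) - 1) and clamp at zero
back <- function(z) pmax(expm1(z), 0)
fit0  <- back(p0$fit)
ymin0 <- back(p0$fit - half)
ymax0 <- back(p0$fit + half)
```

With plain log(y) the clamp is unnecessary, because exp() is always positive.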

Then use this unexported ggplot2 function (hence the triple colon) to get
predictions and interval bounds from the model fit,

pdf <- ggplot2:::predictdf(m, seq(min(x), max(x), length.out = 100), TRUE, 0.95)
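Because unexported functions can change between package versions, here is a rough equivalent that uses only predict() from base R. It is a sketch of what I understand the interval calculation to be (fit plus or minus a t quantile times the standard error), not a copy of ggplot2's internals. The names xseq, pred, ci and pred_df are my own.

```r
# Recreate the example data and the loess fit on the log scale
set.seed(1)
x <- rnorm(10)
y <- exp(-x^2 - 2 + rnorm(10))
m <- loess(log(y) ~ x)

# Predictions and standard errors on an even grid over the observed x range
xseq <- seq(min(x), max(x), length.out = 100)
pred <- predict(m, newdata = data.frame(x = xseq), se = TRUE)

# t-based interval half-width at the chosen level
level <- 0.95
ci <- pred$se.fit * qt(level / 2 + 0.5, pred$df)

# Same column names as used in the plotting step (still on the log scale)
pred_df <- data.frame(
  x    = xseq,
  y    = pred$fit,
  ymin = pred$fit - ci,
  ymax = pred$fit + ci
)
```

The columns are still on the log scale here, so they need the same back-transformation as in the next step.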

Then back-transform,

pdf <- within(pdf, {
    y <- exp(y)
    ymin <- exp(ymin)
    ymax <- exp(ymax)
})

and re-plot,

ggplot(pdf) + geom_path(aes(x, y), colour = "blue") + geom_ribbon(aes(x, ymin = ymin, 
    ymax = ymax), alpha = 0.2) + geom_point(aes(x, y), data = df)


That's better!
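For comparison, here is one way the model-based route mentioned earlier could be sketched. The choice of a Gamma GLM with a log link and a quadratic term in x is purely my assumption for illustration; other models for positive data would do just as well. The interval is an approximate (Wald) 95% band for the mean, computed on the link scale and then exponentiated, so it cannot go below zero.

```r
# Recreate the example data
set.seed(1)
x <- rnorm(10)
y <- exp(-x^2 - 2 + rnorm(10))
df <- data.frame(x, y)

# Assumption: Gamma errors, log link, quadratic in x
g <- glm(y ~ x + I(x^2), family = Gamma(link = "log"), data = df)

# Predict on the link (log) scale with standard errors
nd <- data.frame(x = seq(min(x), max(x), length.out = 100))
p <- predict(g, newdata = nd, type = "link", se.fit = TRUE)

# Build the band on the link scale, then back-transform with exp()
z <- qnorm(0.975)
glm_df <- data.frame(
  x    = nd$x,
  y    = exp(p$fit),
  ymin = exp(p$fit - z * p$se.fit),
  ymax = exp(p$fit + z * p$se.fit)
)
```

The resulting glm_df has the same x, y, ymin and ymax columns as the loess version, so it can be passed to the same plotting code.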

2 Comments
  1. crowding
    June 25, 2013 3:55 am

    Censoring is not quite the concept you want. Data censoring applies to situations such as measuring the weight of a sample of individuals where some individuals max out the scale, so you can’t record a weight (but you have recorded the failed measurement).

    Here you have data whose range is simply nonnegative, no censoring. For density data, a Poisson regression is more appropriate.

    • June 25, 2013 12:07 pm

      Quite right. Have edited above, thanks.

      Just in the interest of not looking like a complete idiot, censoring can come into play in these sorts of cases when the abundance measurements are modeled as continuous with a detection limit (e.g. densities in a large population, like phytoplankton). Another example is measuring the environmental concentrations of contaminants. These are the kinds of things I had in mind when writing the post, but I realize that I wasn’t at all clear. Thanks again for the clarification.
