# Keeping prediction intervals positive

Often in ecology our response variables are strictly positive (e.g. population density). As a result, some standard default procedures won't work well. For example, here are some data,

```
set.seed(1)
x <- rnorm(10)
# log(y) is -x^2 - 2 plus standard normal noise, so y is always positive
y <- exp(-x^2 - 2 + rnorm(10))
df <- data.frame(x, y)
```

Here's a standard ggplot exploration,

```
library(ggplot2)
ggplot(df) + geom_point(aes(x, y)) + geom_smooth(aes(x, y), method = "loess")
```

But wait, that's not right: the shaded interval (strictly speaking, a confidence band around the fitted curve) dips below zero!

The best way to deal with this problem is to use models that account for bounded data ~~censored responses (so-called left-censored in this case)~~. But if you just want to do some exploration and be quick about it, here's a simple-minded solution that works well. Start by log-transforming the response (use log(y + 1) instead if it contains zeros, because log(0) is -Inf) and fitting a loess model to the transformed data,

```
m <- loess(log(y) ~ x)
```
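If the response contains zeros, `log(y)` returns `-Inf` and `loess()` cannot use it. Below is a minimal sketch of the log(y + 1) variant on hypothetical data (`x0`, `y0`, `m0` and `fit0` are made-up names for illustration), using base R's `log1p()` and its inverse `expm1()`:

```r
# Hypothetical data: a hump-shaped response whose tails are exactly zero
x0 <- seq(-2, 2, length.out = 20)
y0 <- floor(5 * exp(-x0^2))

# log(0) is -Inf, so log(y0) cannot serve as a loess response
any(is.infinite(log(y0)))  # TRUE

# log1p(y) computes log(y + 1) and stays finite at zero
m0 <- loess(log1p(y0) ~ x0)

# expm1() undoes log1p(), putting fitted values back on the original scale
fit0 <- expm1(predict(m0))
```

One caveat of this design: since `expm1(z)` is greater than -1 for any finite `z`, values back-transformed from a log(y + 1) fit are bounded below by -1 rather than 0, so small negative values remain possible close to zero-valued data.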

Then use this hidden (unexported, hence the `:::`) ggplot2 function to get predictions and interval bounds for the model fit,

```
# predictdf() is internal to ggplot2 (hence :::), so it may change between versions
pdf <- ggplot2:::predictdf(m, seq(min(x), max(x), length.out = 100), TRUE, 0.95)
```
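Because `predictdf()` is internal to ggplot2, it could change or disappear in a future release. A rough equivalent using only the exported `predict()` method for loess fits is sketched below. It computes a pointwise t-based confidence interval for the fitted mean on the log scale, which is assumed (not guaranteed) to match what `predictdf()` does for loess models; `pdf2` and `half` are illustrative names.

```r
# Recreate the example data and the log-scale loess fit
set.seed(1)
x <- rnorm(10)
y <- exp(-x^2 - 2 + rnorm(10))
m <- loess(log(y) ~ x)

# Grid spanning the observed x values (loess does not extrapolate by default)
xseq <- seq(min(x), max(x), length.out = 100)

# With se = TRUE, predict() returns a list with fit, se.fit, residual.scale and df
p <- predict(m, newdata = data.frame(x = xseq), se = TRUE)

# Half-width of a pointwise 95% confidence interval for the fitted mean
half <- qt(0.975, p$df) * p$se.fit

pdf2 <- data.frame(x = xseq, y = p$fit, ymin = p$fit - half, ymax = p$fit + half)
```

The result has the same `x`, `y`, `ymin` and `ymax` columns, so it can be back-transformed and plotted in exactly the same way.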

Then back-transform,

```
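pdf <- within(pdf, {
  # exp() maps the log-scale fit and bounds back to the original scale,
  # where they are guaranteed to be positive
  y <- exp(y)
  ymin <- exp(ymin)
  ymax <- exp(ymax)
})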
```

and re-plot,

```
ggplot(pdf) +
  geom_path(aes(x, y), colour = "blue") +
  geom_ribbon(aes(x, ymin = ymin, ymax = ymax), alpha = 0.2) +
  geom_point(aes(x, y), data = df)
```

That's better!
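The band produced by `geom_smooth()` and `predictdf()` is a pointwise confidence interval for the fitted curve: it describes uncertainty in the smooth, not where new observations are likely to fall. If an interval for new observations is wanted, one rough normal-theory approximation (an assumption, not something ggplot2 provides) is to add the loess residual variance to the squared standard error on the log scale before back-transforming. Because `exp()` is strictly increasing and always positive, the back-transformed bounds keep their order and stay above zero. `se_new` and `pi_df` are made-up names.

```r
set.seed(1)
x <- rnorm(10)
y <- exp(-x^2 - 2 + rnorm(10))
m <- loess(log(y) ~ x)

xseq <- seq(min(x), max(x), length.out = 100)
p <- predict(m, newdata = data.frame(x = xseq), se = TRUE)
tq <- qt(0.975, p$df)

# Approximate standard error for a new observation on the log scale:
# uncertainty in the fitted curve plus residual scatter around it
se_new <- sqrt(p$se.fit^2 + p$residual.scale^2)

# exp() is increasing and positive, so bounds keep their order and stay > 0
pi_df <- data.frame(
  x = xseq,
  y = exp(p$fit),
  ymin = exp(p$fit - tq * se_new),
  ymax = exp(p$fit + tq * se_new)
)
```

This can be plotted with the same `geom_path()`/`geom_ribbon()` code by swapping `pi_df` in for `pdf`. Note that when log-scale errors are symmetric, `exp()` of the log-scale fit estimates the median of `y` rather than its mean.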

Censoring is not quite the concept you want. Data censoring applies to situations such as measuring the weight of a sample of individuals when some individuals max out the scale, so you can't record their weight (although you have recorded the failed measurement).

Here you have data whose range is simply nonnegative; there is no censoring. For density data based on counts, a Poisson regression (with sampled area as an offset) is more appropriate.

Quite right. Have edited above, thanks.

Just in the interest of not looking like a complete idiot: censoring can come into play in these sorts of cases when abundance measurements are modeled as continuous with a detection limit (e.g. densities in a large population, such as phytoplankton). Another example is measuring environmental concentrations of contaminants. These are the kinds of things I had in mind when writing the post, but I realize that I wasn't at all clear. Thanks again for the clarification.