# Merging / combining / adding together two formula objects in R

August 6, 2012

Here’s the basic issue of the post. Suppose we have two formula objects with the same response,

```
form1 <- y ~ -1 + a + sin(b)
form2 <- y ~ c + d
```

and we want to have a quick, intuitive way to get the formula that adds together all of the terms in each of the two formulas. For example, we want to do `form1 + form2` or `merge(form1, form2)`, or something like that, to get

```
y ~ -1 + a + sin(b) + c + d
```

Here’s an example of the benefits. Consider these data:

```
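> set.seed(1)
> # reconstructed: the printed values match 20 rnorm draws taken column by column
> df <- data.frame(y = rnorm(4), a = rnorm(4), b = rnorm(4), c = rnorm(4), d = rnorm(4))
> df
           y          a          b           c           d
1 -0.6264538  0.3295078  0.5757814 -0.62124058 -0.01619026
2  0.1836433 -0.8204684 -0.3053884 -2.21469989  0.94383621
3 -0.8356286  0.4874291  1.5117812  1.12493092  0.82122120
4  1.5952808  0.7383247  0.3898432 -0.04493361  0.59390132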

Suppose we want to fit `form1`, `form2`, and their union, `form1 + form2`. If this last addition were possible, we could just do

```
> lm(form1, df)
Call:
lm(formula = form1, data = df)
Coefficients:
     a  sin(b)
 1.425  -1.521

> lm(form2, df)
Call:
lm(formula = form2, data = df)
Coefficients:
(Intercept)            c            d
    -0.2375      -0.1422       0.4342

> lm(form1 + form2, df)
Call:
lm(formula = form1 + form2, data = df)
Coefficients:
      a   sin(b)        c        d
 2.7033  -2.9483  -0.1731   1.1990
```

But this isn’t possible in standard R packages (I’ve stopped saying that anything is not possible in R!).

Here’s my solution:

```
merge.formula <- function(form1, form2, ...){
  # get character strings of the names for the responses
  # (i.e. left hand sides, lhs); collapse because deparse can
  # return several strings for a long expression
  lhs1 <- paste(deparse(form1[[2]]), collapse = " ")
  lhs2 <- paste(deparse(form2[[2]]), collapse = " ")
  if(lhs1 != lhs2) stop('both formulas must have the same response')
  # get character strings of the right hand sides, split on ' + '
  # (note: this also splits inside terms such as I(a + b))
  rhs1 <- strsplit(paste(deparse(form1[[3]]), collapse = " "), " \\+ ")[[1]]
  rhs2 <- strsplit(paste(deparse(form2[[3]]), collapse = " "), " \\+ ")[[1]]
  # create the merged rhs and lhs in character string form
  rhs <- c(rhs1, rhs2)
  lhs <- lhs1
  # put the two sides together with the amazing
  # reformulate function
  out <- reformulate(rhs, lhs)
  # set the environment of the formula (i.e. where should
  # R look for variables when data aren't specified?)
  environment(out) <- parent.frame()
  return(out)
}
# this is how you get the addition operator working for formulas
Ops.formula <- function(e1, e2){
  FUN <- .Generic
  if(FUN == '+'){
    out <- merge(e1, e2)
    # merge.formula pointed the formula at this function's frame;
    # point it at the caller of `+` instead
    environment(out) <- parent.frame()
    return(out)
  }
  else stop("only '+' is implemented for formula objects so far")
}
```
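One caveat with splitting the deparsed text on ` + `: a term that itself contains ` + `, such as `I(a + b)`, gets cut in two. Below is a minimal sketch of a variant that asks `terms()` for the term labels instead. It is a different technique from the solution above, `merge_terms` is a hypothetical name, and it assumes two-sided formulas with at least one term and no `.` on the right hand side.

```r
# Hypothetical variant (not the original solution): merge two formulas
# by their term labels rather than by splitting deparsed text.
merge_terms <- function(form1, form2) {
  # assumes both formulas are two-sided, so [[2]] is the response
  if (!identical(form1[[2]], form2[[2]]))
    stop("both formulas must have the same response")
  t1 <- terms(form1)
  t2 <- terms(form2)
  # term labels keep compound terms such as I(a + b) in one piece
  labels <- unique(c(attr(t1, "term.labels"), attr(t2, "term.labels")))
  # keep the intercept only if neither formula dropped it
  keep_int <- attr(t1, "intercept") == 1 && attr(t2, "intercept") == 1
  out <- reformulate(labels, response = form1[[2]], intercept = keep_int)
  # same environment fix as in merge.formula
  environment(out) <- parent.frame()
  out
}

f_a <- merge_terms(y ~ -1 + a + sin(b), y ~ c + d)
f_b <- merge_terms(y ~ I(a + b), y ~ c)
print(f_a)
print(f_b)
```

Unlike the string version, this drops duplicated terms, and `terms()` may reorder terms by interaction order. A dropped intercept is written as `- 1` at the end of the formula rather than `-1` at the start.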

Here are some things I learned while writing these functions:

- The `reformulate` function is really useful…check it out.
- To use `strsplit` on ‘special’ characters, you often need to put backslashes in front of the character (in the `strsplit` help page find the phrase: “If you really want to split on ‘.’”).
- The `merge.formula` function needs to do `environment(out) <- parent.frame()`, or any function that uses the resulting formula will not be able to find the appropriate variables in the global environment if no data frame is passed with the formula (sorry for being terse here).
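The three points above can be tried out in a few lines. This is only a small sketch; `make_formula` is a made-up helper for illustration and is not part of the solution.

```r
# 1. reformulate() builds a formula from character term labels
f1 <- reformulate(c("a", "sin(b)"), response = "y")
print(f1)

# 2. '+' is a regular-expression metacharacter, so escape it with a
#    double backslash, or switch off regex matching with fixed = TRUE
parts_regex <- strsplit("a + sin(b) + c", " \\+ ")[[1]]
parts_fixed <- strsplit("a + sin(b) + c", " + ", fixed = TRUE)[[1]]
print(parts_regex)

# 3. a formula remembers the environment it was created in
make_formula <- function() {
  y ~ x   # created inside the function, so it points at this call's frame
}
f3 <- make_formula()
made_in_global <- identical(environment(f3), globalenv())
# reassign the environment; merge.formula does the analogous thing
# with parent.frame()
environment(f3) <- globalenv()
now_in_global <- identical(environment(f3), globalenv())
print(c(made_in_global, now_in_global))
```

The `fixed = TRUE` form is often the easier one to read when the separator is a literal string and no pattern matching is needed.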


4 Comments

This was extremely helpful. I needed a function that added a term to a model object and modifying this was very simple. Thanks! Please continue to be awesome!

You’re totally welcome!

Maybe this is what you want instead?

http://stat.ethz.ch/R-manual/R-patched/library/stats/html/add1.html

And please continue to be encouraging!

They don’t quite do what I am after, but thank you for pointing them out.

Cheers!