
Merging / combining / adding together two formula objects in R

August 6, 2012

Here’s the basic problem this post tackles. Suppose we have two formula objects with the same response,


form1 <- y ~ -1 + a + sin(b)
form2 <- y ~ c + d

and we want a quick, intuitive way to get the formula that combines all of the terms from both. For example, we want form1 + form2, or merge(form1, form2), or something similar, to return


y ~ -1 + a + sin(b) + c + d
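It helps to know how a formula is stored, because the functions further down rely on it. A minimal base R sketch: a two-sided formula is a call to `~` of length 3, with the response (left hand side) in position 2 and the terms (right hand side) in position 3, and deparse() turns either side back into a character string.

```r
# the two formulas from the post
form1 <- y ~ -1 + a + sin(b)
form2 <- y ~ c + d

# a two-sided formula is a call of length 3: `~`, lhs, rhs
length(form1)        # 3
form1[[1]]           # the symbol `~`
form1[[2]]           # y  (the response)
form1[[3]]           # -1 + a + sin(b)  (the right hand side)

# deparse() turns a side of the formula back into a character string
deparse(form2[[3]])  # "c + d"
```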

Here’s an example of why this would be useful. Consider these data:


> set.seed(1)
> df <- data.frame(y = rnorm(4), a = rnorm(4), b = rnorm(4), c = rnorm(4), d = rnorm(4))
> df
           y          a          b           c           d
1 -0.6264538  0.3295078  0.5757814 -0.62124058 -0.01619026
2  0.1836433 -0.8204684 -0.3053884 -2.21469989  0.94383621
3 -0.8356286  0.4874291  1.5117812  1.12493092  0.82122120
4  1.5952808  0.7383247  0.3898432 -0.04493361  0.59390132

Suppose we want to fit form1, form2, and their union, form1 + form2. If this last addition were possible, we could just do


> lm(form1, df)

Call:
lm(formula = form1, data = df)

Coefficients:
     a  sin(b)  
 1.425  -1.521  

> lm(form2, df)

Call:
lm(formula = form2, data = df)

Coefficients:
(Intercept)            c            d  
    -0.2375      -0.1422       0.4342  

> lm(form1 + form2, df)

Call:
lm(formula = form1 + form2, data = df)

Coefficients:
      a   sin(b)        c        d  
 2.7033  -2.9483  -0.1731   1.1990  

But this isn’t possible out of the box in R, because there is no + method for formula objects (though I’ve stopped saying that anything is not possible in R!).
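To see the problem concretely, here is a small sketch of what happens in a fresh session where no extra methods have been defined: since class "formula" has no + method, R falls back to ordinary arithmetic on the two objects, which fails with an error.

```r
# the two formulas from the post
form1 <- y ~ -1 + a + sin(b)
form2 <- y ~ c + d

# with no `+` method for class "formula", R tries ordinary
# arithmetic, which fails; tryCatch() captures the error
# instead of stopping the script
res <- tryCatch(form1 + form2, error = function(e) e)

inherits(res, "error")   # TRUE
conditionMessage(res)    # R's description of the failure
```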

Here’s my solution:


merge.formula <- function(form1, form2, ...){

	# both formulas need a left hand side (lhs, the response)
	# and a right hand side (rhs, the terms)
	if(length(form1) != 3L || length(form2) != 3L)
		stop('both formulas must be two-sided')

	# compare the responses as expressions rather than as
	# deparsed character strings
	lhs <- form1[[2]]
	if(!identical(lhs, form2[[2]])) stop('both formulas must have the same response')

	# get character vectors of the right hand side terms; collapsing
	# the deparse() output guards against a long formula being
	# split over several strings
	rhs1 <- strsplit(paste(deparse(form1[[3]]), collapse = " "), "\\s+\\+\\s+")[[1]]
	rhs2 <- strsplit(paste(deparse(form2[[3]]), collapse = " "), "\\s+\\+\\s+")[[1]]

	# create the merged rhs in character string form
	rhs <- c(rhs1, rhs2)

	# put the two sides together with the amazing
	# reformulate function; passing the response as an expression
	# (not a string) also keeps a response like log(y) intact
	out <- reformulate(rhs, lhs)

	# set the environment of the formula (i.e. where should
	# R look for variables when data aren't specified?)
	environment(out) <- parent.frame()

	return(out)
}

# this is how you get the addition operator working for formulas
Ops.formula <- function(e1, e2){
	FUN <- .Generic
	# only binary '+' is handled; unary operators leave e2 missing
	if(FUN == '+' && !missing(e2)){
		out <- merge(e1, e2)
		environment(out) <- parent.frame()
		return(out)
	}
	else stop("only '+' between two formulas is currently supported")
}
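It doesn’t give you the form1 + form2 syntax, but for comparison, base R’s update() can produce a formula for the same model. This is a different technique from the functions above, sketched under the assumption that both formulas are two-sided with the same response: the second right hand side is spliced into a `. ~ . + ...` template, where each `.` stands for the corresponding side of form1. Note that update() simplifies its result, so the -1 is moved to the end as "- 1".

```r
# the two formulas from the post
form1 <- y ~ -1 + a + sin(b)
form2 <- y ~ c + d

# build `. ~ . + <rhs of form2>`; in update(), each `.` stands for
# the corresponding side of form1
template <- eval(substitute(. ~ . + RHS, list(RHS = form2[[3]])))

# update() splices the old sides in and simplifies the result
merged <- update(form1, template)
merged

# the same terms as the hand-built merge, and still no intercept
attr(terms(merged), "term.labels")   # "a" "sin(b)" "c" "d"
attr(terms(merged), "intercept")     # 0
```

Because the terms and the missing intercept match, lm(merged, df) fits the same model as the merged formula built by merge.formula.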

Here are some things I learned while writing these functions:

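  1. The reformulate function, which builds a formula from a character vector of terms and a response, is really useful; check it out.
  2. strsplit treats its split argument as a regular expression, so to split on ‘special’ characters such as + or . you need to escape them with a backslash, which must itself be doubled inside an R string (hence the \\+ in the code above). Alternatively, pass fixed = TRUE. See the examples in the strsplit help page, starting at the phrase “If you really want to split on ‘.’”.
  3. The merge.formula function needs to do environment(out) <- parent.frame(). Otherwise the returned formula’s environment would be merge.formula’s own evaluation frame rather than the caller’s, so any function that used the formula without a data frame would look for its variables in the wrong place. Ops.formula resets the environment for the same reason.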
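Each of the three lessons above can be checked in a few lines of base R. This is a minimal sketch: make_formula_plain, make_formula_caller, fit_with, resp and pred are hypothetical names made up for this illustration, and the environment part assumes the helpers are defined at top level in a session with no global resp or pred.

```r
## 1. reformulate() builds a formula from character term labels
f <- reformulate(c("a", "sin(b)"), response = "y")
f                                                     # y ~ a + sin(b)
reformulate("a", response = "y", intercept = FALSE)   # y ~ a - 1

## 2. strsplit() treats its split argument as a regular expression
x <- "-1 + a + sin(b)"
strsplit(x, " + ")[[1]]               # no split: " +" means "one or more spaces"
strsplit(x, " \\+ ")[[1]]             # "-1" "a" "sin(b)"
strsplit(x, " + ", fixed = TRUE)[[1]] # same result, no escaping needed

## 3. a formula remembers the environment it was created in
# hypothetical helpers for this illustration
make_formula_plain <- function() reformulate("pred", response = "resp")
make_formula_caller <- function() {
  out <- reformulate("pred", response = "resp")
  environment(out) <- parent.frame()  # point at the caller instead
  out
}

fit_with <- function(maker) {
  # local variables only; no data frame is passed to lm()
  pred <- 1:10
  resp <- 2 * pred + 1
  lm(maker())
}

coef(fit_with(make_formula_caller))   # intercept 1, slope 2
res <- tryCatch(fit_with(make_formula_plain), error = function(e) e)
inherits(res, "error")                # TRUE: resp and pred are not visible
```

With make_formula_plain, the formula points at the helper’s own (empty) frame, whose enclosure is the global environment, so lm() cannot see the variables local to fit_with.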