
Multiple table data

May 30, 2012

Local environmental factors are often poor predictors of community composition. More accurate models now incorporate the effects of dispersal barriers, functional traits, species interactions, and (spatiotemporal and phylogenetic) autocorrelation. Recognition of the importance of these factors has led to what might be called the ‘next generation’ of community ecological data.

Next-generation data are characterised by complex multiple-table structures. The figure to the right represents a real next-generation data set: zooplankton species abundances and environmental variables replicated in space and time, at different time scales, with each species characterised by its traits. Such data cannot easily be organised into a standard data table with rows representing replicates and columns representing variables, yet most statistical tools require such a format. To meet this requirement, ecologists summarise parts of their data so that they fit into standard tables; functional diversity indices and community weighted mean traits are common summaries. This is the ‘summarise first, ask questions later’ approach, which discards potentially important information that has been averaged out through summarisation. For example, although relationships between functional diversity indices and environmental variables can be interesting, they provide little insight into how traits and environmental variables interact to affect species abundances; only non-summarised data provide information about such trait-by-environment interactions.
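To make the information loss concrete, here is a minimal sketch of one common summary, the community weighted mean (CWM) trait, computed as the abundance-weighted average of species trait values at each site. The species, sites, trait values and the helper name community_weighted_mean are all hypothetical, chosen only for illustration, and the sketch is written in Python rather than R.

```python
"""Community weighted mean (CWM) traits: a common 'summarise first' step.

Hypothetical toy data: three species with one trait (e.g. body length)
and abundances at two sites. Names and values are illustrative only.
"""


def community_weighted_mean(abundances, traits):
    """Return sum_i(a_i * t_i) / sum_i(a_i) for one site.

    abundances: dict mapping species name -> abundance at the site
    traits: dict mapping species name -> trait value
    """
    total = sum(abundances.values())
    if total == 0:
        raise ValueError("site has zero total abundance; CWM is undefined")
    return sum(a * traits[sp] for sp, a in abundances.items()) / total


# Hypothetical species-by-trait table
traits = {"sp_A": 1.0, "sp_B": 2.0, "sp_C": 3.0}

# Hypothetical site-by-species abundance table
abundance = {
    "site_1": {"sp_A": 0, "sp_B": 10, "sp_C": 0},
    "site_2": {"sp_A": 5, "sp_B": 0, "sp_C": 5},
}

# One number per site: the species-level detail is averaged away
cwm = {site: community_weighted_mean(ab, traits) for site, ab in abundance.items()}
print(cwm)  # {'site_1': 2.0, 'site_2': 2.0}
```

The two sites share no species in common, yet both have a CWM of 2.0. A model relating CWM to the environment therefore cannot tell these communities apart, which is the kind of information the ‘summarise first’ approach gives up.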

If our data all fit into a single table, they are easy to manage and model. For example, when each column is a variable, one simply relates these variables (columns) over a set of observations (rows) using model-fitting functions such as lm; ‘everyone’ understands this. But when data structures get more complex (e.g. the figure above), the complexities of data management can keep researchers from seeing how to apply what they know about statistical modelling. To address this issue, I am developing a more general relational structure that (I hope) allows researchers to see more clearly how to build models with their multiple-table data sets. The current incarnation of this work can be found in an R package, multitable, on CRAN and R-Forge.
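One way to picture the idea of matching tables along shared dimensions is to flatten them into a single long table with one row per site-species pair, replicating site-level and species-level variables along the dimension each one lacks. The sketch below does this in plain Python; it is not the multitable package's API, and the table contents, variable names and the helper to_long_table are hypothetical.

```python
"""Flatten multiple-table community data into one long table.

A minimal sketch, assuming three hypothetical tables that share
dimensions: abundance (site x species), environment (site) and
traits (species). This is NOT the multitable package's API; it only
illustrates matching tables along shared dimensions.
"""

from itertools import product

# site x species abundances (hypothetical values)
abundance = {
    ("site_1", "sp_A"): 0, ("site_1", "sp_B"): 10,
    ("site_2", "sp_A"): 5, ("site_2", "sp_B"): 0,
}
# one environmental variable per site (e.g. temperature)
environment = {"site_1": 12.0, "site_2": 18.0}
# one trait per species (e.g. body length)
traits = {"sp_A": 1.0, "sp_B": 2.0}


def to_long_table(abundance, environment, traits):
    """Return one row per (site, species) pair, replicating site-level
    and species-level variables along the dimension they lack."""
    rows = []
    for site, species in product(sorted(environment), sorted(traits)):
        env = environment[site]
        trait = traits[species]
        rows.append({
            "site": site,
            "species": species,
            "abundance": abundance[(site, species)],
            "env": env,
            "trait": trait,
            # trait-by-environment interaction column: information that
            # site-level summaries such as CWMs cannot represent
            "env_x_trait": env * trait,
        })
    return rows


rows = to_long_table(abundance, environment, traits)
for row in rows:
    print(row)
```

Once the data are in this shape, each row is an observation and each column a variable, so the familiar single-table tools apply again. In R, for instance, the equivalent data frame could be passed to lm (or glm with a count family) using a formula such as abundance ~ env * trait, whose env:trait term estimates the trait-by-environment interaction.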
