Why don’t we do experiments anymore?


I came into the science of ecology during the mid-1980s. This was a time when ecologists were learning the lesson that one cannot simply go out and collect observational data to test hypotheses. Trying to prove that species compete by demonstrating Hutchinsonian ratios turned out to be a rather futile endeavor, to say the least. The problem is that many different causal mechanisms can create the same general pattern in the data that one collects.

The field of ecology seems to have recently slid back into this “observation only” data trap. With the proliferation of data available online in various repositories and forms, we’ve moved into an era of simply downloading some data, analyzing them, and saying what they mean. Two main areas come to my mind here – niche modeling and phylogenetic diversity analyses – but this critique is not limited to these two areas.

Such practices have a number of problems to my mind. First and foremost, the same data are simply recycled over and over again. No new data are brought to problems.

Second, data sets were usually collected with a specific purpose in mind, and so the methods used to gather those data were tailored to the issue at hand. This often means that the data are poorly suited to many other problems for which they might superficially seem applicable. Many different issues arise here. For example, suppose a person is conducting a study of the interactions of two species. The person designs a methodology to quantify the abundances of those two species. In sampling those species, the researcher may also gather data on eight other species that are captured with this methodology, even though these methods sample the other species poorly. The person then posts those data, and someone else downloads the data and uses them as abundance estimates for all ten species in the data set. Problem? If you download data like this, do you even know there’s a problem?

Another huge problem in my mind is that only variables that are easy to measure become members of data sets, whether or not they are the correct variables to address the problem. For example, much of niche modeling is done with data on only temperature and water availability, because huge data sets are available on the web for these variables for the entire globe. These variables are correlated with a huge number of other environmental variables. Thus, patterns are there to be seen in almost all analyses. The problem is that because patterns will always be apparent, range limits and niche extents will always appear to be driven by temperature and water availability, because those are the only variables in the analyses. One cannot ascribe interpretations to variables that are not in the data set. Thus, even though many of these patterns may be driven by unmeasured variables that are correlated with water and temperature, the conclusions will always be that water and temperature are the drivers. If you’ve got a hammer, everything looks like a nail.

But most importantly, causal mechanisms are not manipulated experimentally to test the causal connections between hypothesized drivers and the responses they are thought to produce. Ultimately, only experiments can separate the correlated responses from the direct causal agents, and test for the operation of fundamentally assumed mechanisms.
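To make the point concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration, not real data: a hypothetical unmeasured variable (call it `hidden`) is the only thing that affects abundance, and in the observational survey it happens to track temperature. In the simulated experiment, temperature is assigned at random, so it is no longer tied to the hidden variable. The effect sizes and noise levels are arbitrary.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=2000, seed=1):
    """Return (observational r, experimental r) between temperature and abundance.

    Assumption for illustration: abundance depends ONLY on a hidden variable,
    never on temperature itself.
    """
    rng = random.Random(seed)

    # Observational survey: the hidden driver is correlated with temperature.
    temp_obs = [rng.gauss(0, 1) for _ in range(n)]
    hidden_obs = [t + rng.gauss(0, 0.5) for t in temp_obs]
    abund_obs = [h + rng.gauss(0, 0.5) for h in hidden_obs]

    # Experiment: temperature is assigned at random (cool or warm treatment),
    # independently of the hidden driver.
    hidden_exp = [rng.gauss(0, 1) for _ in range(n)]
    temp_exp = [rng.choice([-1.0, 1.0]) for _ in range(n)]
    abund_exp = [h + rng.gauss(0, 0.5) for h in hidden_exp]

    return pearson(temp_obs, abund_obs), pearson(temp_exp, abund_exp)


if __name__ == "__main__":
    r_obs, r_exp = simulate()
    print(f"observational r(temperature, abundance) = {r_obs:.2f}")
    print(f"experimental  r(temperature, abundance) = {r_exp:.2f}")
```

Under these assumptions the observational data show a strong temperature pattern even though temperature does nothing, while the randomized manipulation shows a correlation near zero. An analysis that only had temperature in the data set would have no way to tell the difference.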

One cannot say it enough: Correlation is not causation.
