Sacrificial pseudoreplication and Nested ANOVA

Nested ANOVA. This example is useful because it introduces three things at once: sacrificial pseudoreplication, nested ANOVA, and the comparative method.

The question to address is: is the design of limnetic sticklebacks better than that of benthic sticklebacks for feeding on open-water zooplankton? By design, I mean features of morphology that are functionally related to the ability to locate and accurately strike a small, elusive prey such as a calanoid copepod. These features include body shape, mouth morphology, eye size, and number of gill rakers.

Example A. The typical way this has been done is to measure feeding performance in 20 benthic fish and 20 limnetic fish, then use a t-test or simple one-way ANOVA to test the null hypothesis of no difference in performance (the sample size is just an example). While there is replication of fish within a treatment (benthic and limnetic are the two levels of the treatment "species" or "ecotype"), there is no replication of the treatment itself, so this is a classic case of pseudoreplication (replication at the wrong level). If we reject the null hypothesis of no difference in feeding performance and the difference is in the predicted direction (limnetics are better), all we can conclude is that the limnetics in this lake are better than the benthics in this lake. We cannot conclude that it is because of a better design.

What needs to be replicated is the "treatment", which in this case requires finding independently derived populations of benthic and limnetic ecotypes. There are actually independently derived species pairs of benthics and limnetics in six different lakes in British Columbia. These populations have become quite famous because Dolph Schluter has developed a nice research program on ecological speciation using this system and has, consequently, won some nice awards such as election to the Royal Society.

Example B1. So repeat the experiment with one randomly sampled benthic and one randomly sampled limnetic from each of the six lakes. We have replicated the treatments but since our sample size within each replicate is one, we may not have good estimates of the typical (average) feeding performance within each replicate. Instead...

Example B2. Measure performance in 10 benthics and 10 limnetics from each of the six lakes. Now just use the mean performance of each treatment within each lake. That is, we now have six limnetic means and six benthic means. Now just do the t-test or ANOVA on these 12 means with species as the "treatment". While we have replicated the treatment, we also have much less power to reject the null hypothesis than in Example A (40 fish vs. 12 means). Of course, the power to reject the correct null hypothesis in Example A is impossibly low, because at the level of the treatment there are only 2 means!
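Here is a minimal sketch of Example B2 in Python, using only the standard library. The data are invented (the ecotype means, the standard deviations, and the helper names `simulate_ecotype` and `pooled_t` are all my assumptions for illustration, not real stickleback measurements). The point is just to show the collapse from 120 fish to 12 lake means before the t-test.

```python
import random
from math import sqrt
from statistics import mean, variance

random.seed(1)  # invented data; the seed only makes the sketch repeatable

def simulate_ecotype(ecotype_mean, n_lakes=6, n_fish=10, lake_sd=1.0, fish_sd=2.0):
    """Return one list of fish scores per lake (all values are made up)."""
    lakes = []
    for _ in range(n_lakes):
        lake_mean = random.gauss(ecotype_mean, lake_sd)
        lakes.append([random.gauss(lake_mean, fish_sd) for _ in range(n_fish)])
    return lakes

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny)), df

limnetic = simulate_ecotype(12.0)  # hypothetical performance scores
benthic = simulate_ecotype(10.0)

# Example B2: one mean per lake per ecotype, so 6 + 6 = 12 data points.
limnetic_means = [mean(lake) for lake in limnetic]
benthic_means = [mean(lake) for lake in benthic]

t_means, df_means = pooled_t(limnetic_means, benthic_means)
print(f"t on lake means = {t_means:.2f} with {df_means} df")
```

The test on lake means has 6 + 6 - 2 = 10 degrees of freedom. The same function applied to the 20 + 20 fish of Example A would report 38, which is where the apparent extra power of Example A comes from.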

Example C. Instead of doing a t-test on the limnetic and benthic means within each of the six lakes, do a nested ANOVA. The treatment is still species, but there is a second factor, "site", which is nested within species. In an ANOVA table this appears as "site[species]". In this way, we are asking about differences at two levels: "among sites within a treatment" and "among treatments". By "among sites within a treatment" I mean: is there significant variation among the different benthic populations, or among the different limnetic populations? The nested factor is a "random effect", not a fixed effect, and it is very important to make sure that the statistical package you are using analyzes it this way. "Random effect" can mean a number of things: that the particular sites sampled are a random sample of the sites that are actually available, or that we are not interested in mean differences among the sites, or that it is not a treatment that is applied to both levels of the main treatment (species). Interestingly, none of these is really consistent with what we've done in this example: it was not a random sample of sites (but maybe we can imagine there are more sites that we do not know about), we may want to know the differences among the sites within a treatment, and the site is applied to both levels if we consider "site" to be the lake (but maybe just consider the "site" to be analogous to a bottle of fruit flies).

What we get is an ANOVA table with a sum of squares (SS), mean square (MS), F statistic, and P value for the main treatment (species) and for the nested factor (site[species]). If the P value for species is < 0.05, then there is a significant treatment effect, that is, there is a difference in the ability to feed on copepods between limnetics and benthics. While this is consistent with the hypothesis that it is the measured morphological difference between the species that is the cause of the difference in feeding ability, the difference could just as likely be a function of unmeasured differences in morphology. In other words, the replication still doesn't allow us to know the cause of the difference, and this is ultimately what we want! This is the fundamental problem with the comparative method; we will talk about this later.

If the P value for site[species] is < 0.05, then there is significant variation among sites within benthics and/or limnetics. If the design is balanced (equal numbers of sites within each species and equal numbers of fish measured within each site), then the test of the treatment gives the same result as if we had just ignored within-site variation and done a t-test on the site means (Example B2), so what do we gain? Good question. We do get a test of whether or not there is significant among-site variation. If the design is not balanced, then the nested ANOVA can be hard to compute, although there are ways to do it. But if you use nested ANOVA, the general advice is: try to have a balanced design!
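To make the balanced-design claim concrete, here is a sketch that builds the nested ANOVA table by hand for invented data (2 species, 6 sites per species, 10 fish per site; all means, standard deviations, and variable names are assumptions). It computes F statistics only, since the standard library has no F distribution for P values. Note that the F for species uses the site[species] mean square as its denominator, which is what treating site as a random effect amounts to in a balanced design.

```python
import random
from math import sqrt, isclose
from statistics import mean, variance

random.seed(2)  # invented data; the seed only makes the sketch repeatable
N_SITES, N_FISH = 6, 10

def simulate_ecotype(ecotype_mean, site_sd=1.0, fish_sd=2.0):
    """Return one list of fish scores per site (all values are made up)."""
    sites = []
    for _ in range(N_SITES):
        site_mean = random.gauss(ecotype_mean, site_sd)
        sites.append([random.gauss(site_mean, fish_sd) for _ in range(N_FISH)])
    return sites

data = {"limnetic": simulate_ecotype(12.0), "benthic": simulate_ecotype(10.0)}

grand = mean(y for sites in data.values() for site in sites for y in site)
species_mean = {sp: mean(y for site in sites for y in site)
                for sp, sites in data.items()}

# Sums of squares for the balanced nested design.
ss_species = N_SITES * N_FISH * sum((m - grand) ** 2 for m in species_mean.values())
ss_site = N_FISH * sum((mean(site) - species_mean[sp]) ** 2
                       for sp, sites in data.items() for site in sites)
ss_error = sum((y - mean(site)) ** 2
               for sites in data.values() for site in sites for y in site)

df_species = len(data) - 1                     # 2 - 1 = 1
df_site = len(data) * (N_SITES - 1)            # 2 * 5 = 10
df_error = len(data) * N_SITES * (N_FISH - 1)  # 12 * 9 = 108

ms_species = ss_species / df_species
ms_site = ss_site / df_site
ms_error = ss_error / df_error

f_species = ms_species / ms_site  # species tested over site[species]
f_site = ms_site / ms_error       # site[species] tested over fish within site

# The t-test on the 12 site means (Example B2), equal n so pooled var is the average.
lim = [mean(site) for site in data["limnetic"]]
ben = [mean(site) for site in data["benthic"]]
sp2 = (variance(lim) + variance(ben)) / 2
t = (mean(lim) - mean(ben)) / sqrt(sp2 * 2 / N_SITES)

print(f"species:       F({df_species},{df_site}) = {f_species:.3f}")
print(f"site[species]: F({df_site},{df_error}) = {f_site:.3f}")
print(f"t^2 on site means = {t ** 2:.3f}")
```

With a balanced design, the F for species and the squared t on the site means agree to rounding error, so the only extra information in the nested table is the site[species] row.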

Nested vs. two factor ANOVA. In the example above we have two factors, the main treatment (species) and the nested treatment (site). How does this differ from a two-factor ANOVA?

Example D. Measure the ability of 10 benthics and 10 limnetics to feed on calanoid copepods in a test aquarium with and without artificial vegetation. There are two treatments (species and structure). Most people would measure each fish both with and without artificial vegetation, but in a two-factor ANOVA these should be measured on different fish, or else the measures for the structure factor are not independent of each other (that is, the rank order of ability without vegetation will not likely be independent of the rank order with vegetation). This is a two-factor ANOVA with fixed effects: we fixed the levels of structure to "with" and "without", we want to know the difference between these levels, and we are applying the structure treatment to both levels of the other treatment, that is, to both benthics and limnetics.
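For contrast with the nested table, here is a sketch of the crossed, fixed-effects layout of Example D. I am assuming 10 different fish in each of the four species-by-structure cells, and the cell means and the labels "open" and "vegetated" are invented. The thing to notice is that every species occurs at every level of structure, there is an interaction term, and all three F statistics share the within-cell mean square as the denominator.

```python
import random
from math import isclose
from statistics import mean

random.seed(3)  # invented data; the seed only makes the sketch repeatable
N = 10          # assumed number of different fish per cell
species = ["limnetic", "benthic"]
structure = ["open", "vegetated"]

# Hypothetical true cell means, used only to generate example scores.
true_mean = {("limnetic", "open"): 12.0, ("limnetic", "vegetated"): 9.0,
             ("benthic", "open"): 10.0, ("benthic", "vegetated"): 9.5}
cells = {key: [random.gauss(mu, 2.0) for _ in range(N)]
         for key, mu in true_mean.items()}

grand = mean(y for fish in cells.values() for y in fish)
cell_mean = {key: mean(fish) for key, fish in cells.items()}
sp_mean = {s: mean(y for t in structure for y in cells[(s, t)]) for s in species}
st_mean = {t: mean(y for s in species for y in cells[(s, t)]) for t in structure}

# Sums of squares for a balanced two-factor design with interaction.
ss_species = len(structure) * N * sum((sp_mean[s] - grand) ** 2 for s in species)
ss_structure = len(species) * N * sum((st_mean[t] - grand) ** 2 for t in structure)
ss_inter = N * sum((cell_mean[(s, t)] - sp_mean[s] - st_mean[t] + grand) ** 2
                   for s in species for t in structure)
ss_error = sum((y - cell_mean[key]) ** 2 for key, fish in cells.items() for y in fish)
ss_total = sum((y - grand) ** 2 for fish in cells.values() for y in fish)

df_error = len(cells) * (N - 1)  # 4 * 9 = 36
ms_error = ss_error / df_error

# Each effect has 1 df here (2 levels per factor), so MS equals SS.
f_species = ss_species / ms_error
f_structure = ss_structure / ms_error
f_inter = ss_inter / ms_error

for name, f in [("species", f_species), ("structure", f_structure),
                ("species x structure", f_inter)]:
    print(f"{name}: F(1,{df_error}) = {f:.3f}")
```

In the nested design there is no interaction row, because a given site occurs in only one species; here the four sums of squares add up to the total because both factors are fully crossed and balanced.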

Sacrificial pseudoreplication.

If the main effect in a nested ANOVA is not significant, this will often be because power is low even though the effect size is large. Power is low because the among-treatment sample size is small (in Example C, this sample size is 6), which is usually the case because it is a lot of work to collect all the data for many, many replicates of the main effect. If the nested factor is also not significant, that is, there is no detectable difference among replicates within the treatment, many textbooks suggest pooling all of the replicates within a treatment and doing a new ANOVA with no nested effect. This is what Hurlbert calls sacrificial pseudoreplication; it is discussed further in the Jenkins paper (see reading list).
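A sketch of what pooling does to the test, again with invented data and assumed names (2 species, 6 sites each, 10 fish per site). For illustration it pools unconditionally, without first checking the site[species] test. Pooling folds the site[species] sum of squares into the error term, so the denominator for the species test goes from 10 degrees of freedom (sites) to 118 (fish).

```python
import random
from math import isclose
from statistics import mean

random.seed(4)  # invented data; the seed only makes the sketch repeatable
N_SITES, N_FISH = 6, 10

def simulate_ecotype(ecotype_mean, site_sd=1.0, fish_sd=2.0):
    """Return one list of fish scores per site (all values are made up)."""
    sites = []
    for _ in range(N_SITES):
        site_mean = random.gauss(ecotype_mean, site_sd)
        sites.append([random.gauss(site_mean, fish_sd) for _ in range(N_FISH)])
    return sites

data = {"limnetic": simulate_ecotype(12.0), "benthic": simulate_ecotype(10.0)}

grand = mean(y for sites in data.values() for site in sites for y in site)
species_mean = {sp: mean(y for site in sites for y in site)
                for sp, sites in data.items()}

ss_species = N_SITES * N_FISH * sum((m - grand) ** 2 for m in species_mean.values())
ss_site = N_FISH * sum((mean(site) - species_mean[sp]) ** 2
                       for sp, sites in data.items() for site in sites)
ss_error = sum((y - mean(site)) ** 2
               for sites in data.values() for site in sites for y in site)

df_species = len(data) - 1                     # 1
df_site = len(data) * (N_SITES - 1)            # 10
df_error = len(data) * N_SITES * (N_FISH - 1)  # 108
ms_species = ss_species / df_species

# Nested test: species over site[species], 10 denominator df.
f_nested = ms_species / (ss_site / df_site)

# Pooled test: drop the site term, so its SS and df join the error term.
df_pooled = df_site + df_error                 # 118
f_pooled = ms_species / ((ss_site + ss_error) / df_pooled)

# Same thing computed directly, ignoring which site each fish came from.
ss_within_species = sum((y - species_mean[sp]) ** 2
                        for sp, sites in data.items() for site in sites for y in site)

print(f"nested: F({df_species},{df_site}) = {f_nested:.3f}")
print(f"pooled: F({df_species},{df_pooled}) = {f_pooled:.3f}")
```

The pooled test is just the one-way ANOVA of Example A run on fish from all six lakes at once: individual fish are treated as the replicates of the treatment, which is why it counts as pseudoreplication.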