
I want to perform an ANCOVA on data concerning the density of plant epiphytes. First, I would like to know whether there is any difference in plant density between two slopes, one N and one S, but I also have other data such as altitude, canopy openness and height of the host plant. I know that my covariate would have to be the two slopes (N and S). I built this model, which runs in R, although I have no idea whether it performs well. I would also like to know what difference it makes if I use the symbol + or *.

model1 <- aov(density~slope+altitude+canopy+height)
summary(model1)
model1
Silverfish
Pauloc
  • + will calculate main effects only; * will estimate the main effects and also the interactions between the factors connected with *. ANCOVA frameworks usually estimate only a main effect of the continuous factor, but interactions between all grouped factors. – russellpierce Mar 09 '13 at 20:04

3 Answers


The basic tool for this is lm; note that aov is a wrapper for lm.

In particular, if you have some grouping variable (factor), $g$, and a continuous covariate $x$, the model y ~ x + g would fit a main effects ANCOVA model, while y ~ x * g would fit a model which includes interaction with the covariate. aov will take the same formulas.
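As a minimal sketch of the two formulas, assuming simulated data (the data frame `d`, the variable names and the coefficients below are made up for illustration and are not the asker's data):

```r
set.seed(1)
n <- 60
d <- data.frame(
  g = factor(rep(c("N", "S"), each = n / 2)),  # grouping factor, e.g. slope
  x = runif(n, 0, 10)                          # continuous covariate
)
# Simulated response: common slope 0.5, group S shifted up by 1.5
d$y <- 2 + 0.5 * d$x + 1.5 * (d$g == "S") + rnorm(n)

fit_main <- lm(y ~ x + g, data = d)  # one common slope, separate intercepts
fit_int  <- lm(y ~ x * g, data = d)  # separate slopes and separate intercepts

coef(fit_main)  # intercept, slope for x, shift for group S
coef(fit_int)   # as above plus the x:g slope difference

# aov takes the same formula and produces the same fitted coefficients
fit_aov <- aov(y ~ x + g, data = d)
all.equal(coef(fit_aov), coef(fit_main))
```

The interaction model has one extra coefficient, which is the difference in slope between the two groups.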

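Pay particular attention to the Note in the help on aov (?aov), which warns that aov is designed for balanced designs and that the results can be hard to interpret without balance.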

As for + vs *, russellpierce pretty much covers it, but I'd recommend you look at ?lm and ?formula, and most especially at section 11.1 of the manual An Introduction to R, which comes with R. You can find it online, or on your computer via the "Help" pull-down menu in either R or RStudio.
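To see how R expands the two operators without fitting anything, you can inspect the term labels of a formula. This is a small sketch using the variable names from the question; no data are needed because only the formula is parsed:

```r
# Terms produced by + (main effects only)
plus_terms <- attr(terms(density ~ slope + altitude), "term.labels")

# Terms produced by * (main effects and their interaction)
star_terms <- attr(terms(density ~ slope * altitude), "term.labels")

# Writing the interaction out by hand with : gives the same terms as *
long_terms <- attr(terms(density ~ slope + altitude + slope:altitude),
                   "term.labels")

plus_terms
star_terms
identical(star_terms, long_terms)
```

So a * b is shorthand for a + b + a:b.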

iNyar
Glen_b
  • Suppose I have two group factors $g_1,g_2$ and two covariates $x_1,x_2$, with my model being $$y_{ij}=\mu+\alpha_i+\eta_j+x_{ij1}\gamma_1+x_{ij2}\gamma_2+\epsilon_{ij}$$ Does y~g_1+g_2+x_1+x_2 do the same trick? Do the F values obtained against x_1 and x_2 test $\gamma_1=0$ and $\gamma_2=0$ respectively? – Sayan Dec 27 '15 at 06:01
  • Not sure how I missed this. Yes. And if you want to test both at once, fit the model both with and without them and pass the fitted lm objects to `anova` (you'll soon see if you give them in the wrong order, because some SS will be negative if you do) – Glen_b Oct 15 '17 at 20:05
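The joint test described in the last comment could look roughly like this. It is a sketch on simulated data; the data frame `d`, the factor levels and the effect sizes are assumptions made up for illustration:

```r
set.seed(2)
n <- 80
d <- data.frame(
  g1 = factor(rep(c("a", "b"), times = n / 2)),          # first group factor
  g2 = factor(rep(c("u", "v", "w", "z"), each = n / 4)), # second group factor
  x1 = rnorm(n),                                         # first covariate
  x2 = rnorm(n)                                          # second covariate
)
# Simulated response: only g1 and x1 have a real effect here
d$y <- 1 + 0.8 * (d$g1 == "b") + 0.6 * d$x1 + rnorm(n)

reduced <- lm(y ~ g1 + g2, data = d)            # without the covariates
full    <- lm(y ~ g1 + g2 + x1 + x2, data = d)  # with both covariates

# Smaller model first: a single F test of gamma_1 = gamma_2 = 0
cmp <- anova(reduced, full)
cmp
```

The second row of the table carries the 2-degree-of-freedom F test for dropping both covariates together.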

I recommend getting and reading Discovering Statistics using R by Field. He has a nice section on ANCOVA.

To run ANCOVA in R load the following packages:

car
compute.es
effects
ggplot2
multcomp
pastecs
WRS

If you are using lm or aov (I use aov), make sure that you set the contrasts using the "contrasts" function before fitting the model. By default R uses non-orthogonal treatment contrasts (contr.treatment) for unordered factors, which can make the type III sums of squares in an ANCOVA misleading. To set orthogonal contrasts, use:

contrasts(dataname$factorvariable) <- contr.poly(3)  # 3 = number of levels of the factor

then run your model as

model.1=aov(dv~covariate+factorvariable, data=dataname)

To view the model use:

Anova(model.1, type="III") 

Make sure you use Anova with a capital "A" (from the car package) here, and not anova. This will give results using type III SS.

summary.lm(model.1) will give another summary and includes the R-sq. output.

posth=glht(model.1, linfct=mcp(factorvariable="Tukey"))  ##gives the post-hoc Tukey analysis
summary(posth) ##shows the output in a nice format.
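Put together, the fitting steps above can be sketched as follows. The data are simulated, so the values in `dataname` are assumptions for illustration only. The sketch also assumes car may not be installed, in which case it falls back to base R's drop1, which gives marginal F tests for this main-effects model:

```r
set.seed(3)
n <- 90
dataname <- data.frame(
  factorvariable = factor(rep(c("a", "b", "c"), each = n / 3)),
  covariate = rnorm(n, mean = 10, sd = 2)
)
# Simulated response: common slope 0.7, group shifts 0, 1, 2 (made-up values)
dataname$dv <- 5 + 0.7 * dataname$covariate +
  (as.integer(dataname$factorvariable) - 1) + rnorm(n)

# Set the contrasts before fitting, as described above
contrasts(dataname$factorvariable) <- contr.poly(3)
model.1 <- aov(dv ~ covariate + factorvariable, data = dataname)

if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(model.1, type = "III"))  # type III tests from car
} else {
  print(drop1(model.1, test = "F"))         # base-R marginal F tests
}

# R-squared from the lm-style summary
summary.lm(model.1)$r.squared
```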

If you want to test for homogeneity of regression slopes you can also include an interaction term for the IV and covariate. That would be:

model=aov(dv~covariate+IV+covariate:IV, data=dataname)

If the interaction term is significant then you do not have homogeneity of regression slopes.
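A minimal sketch of that check on simulated data, where the two groups are deliberately given different slopes (the data frame `dataname` and the slope values are assumptions for illustration). Comparing the models with and without the interaction gives the F test for the covariate:IV term:

```r
set.seed(4)
n <- 60
dataname <- data.frame(
  IV = factor(rep(c("N", "S"), each = n / 2)),
  covariate = runif(n, 0, 10)
)
# Simulated response with unequal slopes: 0.5 in group N, 1.5 in group S
dataname$dv <- 1 +
  ifelse(dataname$IV == "S", 1.5, 0.5) * dataname$covariate + rnorm(n)

common   <- lm(dv ~ covariate + IV, data = dataname)                 # one slope
separate <- lm(dv ~ covariate + IV + covariate:IV, data = dataname)  # two slopes

# F test of the interaction term: a small p-value means the slopes differ
slopes_test <- anova(common, separate)
slopes_test
```

Because the slopes were simulated to differ, the interaction should come out clearly significant here; with real data a non-significant result is what supports the homogeneity assumption.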

eyanquenb
  • Why do non-orthogonal contrasts mess everything up? – tintinthong Aug 25 '16 at 23:27
  • To answer the question above about why non-orthogonal contrasts mess everything up: R defaults to non-orthogonal contrasts (i.e. differences between means), which can cause problems if you want to see the contribution of each IV separately. When we specify orthogonal contrasts we tell R that we want the SS for the IVs to be completely partitioned and non-overlapping. In this way we can see the variation attributed to each predictor cleanly and clearly. If you do not specify, R defaults to a more liberal approach to the contrast. –  Aug 31 '16 at 20:07
  • Why the interest in type III SS? – Frank Harrell Oct 04 '18 at 16:21
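For anyone wondering what "orthogonal" means for the contrasts discussed in these comments, here is a quick base-R check. The helper name `is_orthogonal` is hypothetical and introduced only for this sketch; it tests whether the contrast columns are orthogonal to each other and to the intercept column:

```r
# TRUE if all columns of the contrast matrix, together with a constant
# (intercept) column, are mutually orthogonal
is_orthogonal <- function(cm) {
  m <- crossprod(cbind(1, cm))          # cross-products of all column pairs
  all(abs(m[upper.tri(m)]) < 1e-8)      # every off-diagonal entry is zero
}

ct <- contr.treatment(3)  # R's default for unordered factors
cp <- contr.poly(3)       # polynomial contrasts, as used in the answer
ch <- contr.helmert(3)    # Helmert contrasts

is_orthogonal(ct)  # FALSE: treatment columns do not sum to zero
is_orthogonal(cp)  # TRUE
is_orthogonal(ch)  # TRUE
```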

Here is complementary documentation, http://goo.gl/yxUZ1R, of the procedure suggested by @Butorovich. In addition, my observation is that when the covariate is binary, summary(lm.object) gives the same IV estimate as that generated by Anova(lm.object, type="III").

X.X
  • It isn't clear that this is supposed to be an answer. Is it? If so, please edit to clarify. If it is a question, please ask by clicking the `ASK QUESTION` at the top & asking it there. Then we can help you properly. – gung - Reinstate Monica Feb 10 '15 at 22:38
  • Agreed. The message has been revised as a (complementary) answer to the previous one. – X.X Feb 10 '15 at 23:39