10

I am doing a one way ANOVA (per species) with custom contrasts.

     [,1] [,2] [,3] [,4]
0.5    -1    0    0    0
5       1   -1    0    0
12.5    0    1   -1    0
25      0    0    1   -1
50      0    0    0    1

where I compare intensity 0.5 against 5, 5 against 12.5 and so on. These is the data I'm working on

enter image description here

with the following results

Generalized least squares fit by REML
  Model: dark ~ intensity 
  Data: skofijski.diurnal[skofijski.diurnal$species == "niphargus", ] 
       AIC      BIC    logLik
  63.41333 67.66163 -25.70667

Coefficients:
            Value Std.Error  t-value p-value
(Intercept) 16.95 0.2140872 79.17334  0.0000
intensity1   2.20 0.4281744  5.13809  0.0001
intensity2   1.40 0.5244044  2.66970  0.0175
intensity3   2.10 0.5244044  4.00454  0.0011
intensity4   1.80 0.4281744  4.20389  0.0008

 Correlation: 
           (Intr) intns1 intns2 intns3
intensity1 0.000                      
intensity2 0.000  0.612               
intensity3 0.000  0.408  0.667        
intensity4 0.000  0.250  0.408  0.612 

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-2.3500484 -0.7833495  0.2611165  0.7833495  1.3055824 

Residual standard error: 0.9574271 
Degrees of freedom: 20 total; 15 residual

16.95 is the global mean for "niphargus". In intensity1, I'm comparing means for intensity 0.5 against 5.

If I understood this right, the coefficient for intensity1 of 2.2 should be half the difference between means of intensity levels 0.5 and 5. However, my hand calculations don't match those of the summary. Can anyone chip in what am I doing wrong?

ce1 <- skofijski.diurnal$intensity
levels(ce1) <- c("0.5", "5", "0", "0", "0")
ce1 <- as.factor(as.character(ce1))
tapply(skofijski.diurnal$dark, ce1, mean)
       0    0.5      5 
  14.500 11.875 13.000 
diff(tapply(skofijski.diurnal$dark, ce1, mean))/2
      0.5       5 
  -1.3125  0.5625 
chl
  • 50,972
  • 18
  • 205
  • 364
Roman Luštrik
  • 3,338
  • 3
  • 31
  • 39
  • Could you provide the lm() function from R that you used to estimate. How exactly did you use the contrasts function? – Philippe Sep 11 '12 at 00:50
  • btw `geom_points(position=position_dodge(width=0.75))` will fix the way the points in your plot don't line up with the boxes. – flies Sep 10 '18 at 13:49
  • @flies since my question, there's been an introduction of `geom_jitter`, which is a shortcut for all the geom_point() parameters that jitter. – Roman Luštrik Sep 11 '18 at 16:06
  • I didn't notice the jitter there. does `geom_jitter(position_dodge)` work? I've been using `geom_points(position_jitterdodge)` to add dots to boxplots with dodging. – flies Sep 12 '18 at 21:21
  • @flies see the docs for `geom_jitter` [here](https://ggplot2.tidyverse.org/reference/geom_jitter.html). In my experience since my above answer, I find it unnecessary to use boxplots. Ever. If I have many points, I use violin plots which show density of points in much finer details than boxplots. Boxplots were invented back when plotting many points or their densities was not convenient. Perhaps it's time we start thinking of dropping this (handicapped) visualization. – Roman Luštrik Sep 14 '18 at 07:19

2 Answers2

11

The matrix you specified for the contrasts is correct in principle. To convert it into an appropriate contrast matrix, you need to calculate the generalized inverse of your original matrix.

If M is your matrix:

M

#     [,1] [,2] [,3] [,4]
#0.5    -1    0    0    0
#5       1   -1    0    0
#12.5    0    1   -1    0
#25      0    0    1   -1
#50      0    0    0    1 

Now, calculate the generalized inverse using ginv and transpose the result using t:

library(MASS)
t(ginv(M))

#     [,1] [,2] [,3] [,4]
#[1,] -0.8 -0.6 -0.4 -0.2
#[2,]  0.2 -0.6 -0.4 -0.2
#[3,]  0.2  0.4 -0.4 -0.2
#[4,]  0.2  0.4  0.6 -0.2
#[5,]  0.2  0.4  0.6  0.8

The result is identical to the one of @Greg Snow. Use this matrix for your analysis.

This is a much easier way than doing it manually.


There is an even easier way to generate a matrix of sliding differences (a.k.a. repeated contrasts). This can be done with the function contr.sdif and the number of factor levels as a parameter. If you have five factor levels, like in your example:

library(MASS)
contr.sdif(5)

#   2-1  3-2  4-3  5-4
#1 -0.8 -0.6 -0.4 -0.2
#2  0.2 -0.6 -0.4 -0.2
#3  0.2  0.4 -0.4 -0.2
#4  0.2  0.4  0.6 -0.2
#5  0.2  0.4  0.6  0.8
Sven Hohenstein
  • 6,285
  • 25
  • 30
  • 39
4

If the matrix at the top is how you are encoding the dummy variables (what you are passing to the C or contrast function in R) then they the first one is comparing the 1st level to the others (actually 0.8 times the 1st subtracted from 0.2 times the sum of the others).

The second term compares the 1st 2 levels to the last 3. The 3rd compares the 1st 3 levels to the last2 and the 4th compares the 1st 4 levels to the last one.

If you want to do the comparisons that you describe (compare each pair) then the dummy variable encoding that you want is:

      [,1] [,2] [,3] [,4]
[1,] -0.8 -0.6 -0.4 -0.2
[2,]  0.2 -0.6 -0.4 -0.2
[3,]  0.2  0.4 -0.4 -0.2
[4,]  0.2  0.4  0.6 -0.2
[5,]  0.2  0.4  0.6  0.8
Greg Snow
  • 46,563
  • 2
  • 90
  • 159
  • Is the use of this generalized inverese matrix also necessary when using `aov()` instead of `lm()`? I'm asking, because I have read several tutorials, in which contrast matrices for `aov()` are constructed just as the one given by Roman. E.g. see Chapter 5 in http://cran.r-project.org/doc/contrib/Vikneswaran-ED_companion.pdf – crsh Oct 18 '12 at 16:14
  • 2
    The `aov` function calls the `lm` function to do the main computations, so things like contrast matrices will have the same effect in both. – Greg Snow Oct 18 '12 at 16:26