
Need some help interpreting the results of the summary() function.

I am fitting a model with lme() from the nlme package in R.

I have a simple (and quite small) dataset with three grouping variables: Origin, Genotype and Time; the response is a continuous variable named Maxi.

Origin = 2 levels, called Ka and La

Genotype = 3 levels nested within origin Ka and 2 levels nested within origin La

Time = 2 levels nested within each genotype

I am interested in the main effects of Origin, Time and their interaction. In addition to testing I'd like to have their estimates. I want to model the random part as Genotype nested within Origin. Here's the model I had in mind:

model = lme(fixed = Maxi ~ Origin*Time, random = ~ 1 |Genotype)

anova() etc. work fine and there is actually no significant interaction, but here's the problem:

when I run summary(model), I get:

Fixed effects: Maxi ~ Origin * Time 
                                   Value Std.Error DF   t-value p-value
(Intercept)                    15.399386 1.1127382 20 13.839181  0.0000
OriginLa                       -1.986388 1.7702416  3 -1.122100  0.3435
Timeeve                         0.074444 0.8942694 20  0.083246  0.9345
OriginLa:Timeeve               -1.387448 1.5648876 20 -0.886612  0.3858

Where are the estimates for the other levels of the factors? I thought that to interpret these fixed effects the summary table would have to show all the levels in some manner. Or do I interpret it like this:

  1. the estimate for OriginKa is 15.399386
  2. the estimate for OriginLa is 15.399386-1.986388
  3. the estimate for Timemor is 15.399386
  4. the estimate for Timeeve is 15.399386+0.074444

    and then I can't even guess how to interpret the interaction estimate...

It doesn't feel intuitively right that the estimate would be the same for a level of the Origin factor and a level of the Time factor.

Notes:

  1. I did NOT make my data into a groupedData object (is that always necessary?)
  2. I wanted to include random = ~ 1 | Origin/Genotype in the model, but that produced NaNs in the output; apparently the model became too complex, or my data is arranged incorrectly?

So the Questions are:

  1. How do I interpret the summary to get the estimates, or is there something wrong here, or is there another way of getting the estimates? I need the estimates for both levels of Time within both Origins.
  2. How do I specify the random effects with this data structure? Have I done it right?

Any pointers?

Here's the data needed to reproduce my problem:

Orig Genot   Time Maxi
Ka  Ka1     mor 14,59
Ka  Ka1     eve 13,42
Ka  Ka11    mor 14,08
Ka  Ka11    eve 16,29
Ka  Ka15    mor 14,38
Ka  Ka15    eve 14,56
La  La1     mor 17,82
La  La1     eve 13,28
Ka  Ka1     mor 16,44
Ka  Ka1     eve 15,52
Ka  Ka15    mor 13,76
Ka  Ka15    eve 13,55
Ka  Ka1     mor 19,15
Ka  Ka1     eve 19,12
La  La6     mor 10,54
La  La6     mor 11,38
La  La6     eve 10,48
Ka  Ka15    mor 15,25
Ka  Ka15    eve 16,51
La  La1     mor 17,46
La  La1     eve 15,57
Ka  Ka1     mor 16,83
Ka  Ka1     eve 15,63
Ka  Ka15    mor 14,54
Ka  Ka15    eve 15,09
La  La1     mor 11,3
La  La1     eve 11,94
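For completeness, here is a sketch of reading the data above into R and refitting the model. The column names (Origin, Genotype, Time) are assumptions based on the flattened header row; dec = "," handles the decimal commas. Time is releveled so that mor is the reference, matching the summary output shown in the question (which reports a Timeeve coefficient).

```r
# A sketch: read the data above (decimal commas handled by dec = ",") and
# refit the model.  Column names are assumptions matching the question's text.
library(nlme)

dat <- read.table(header = TRUE, dec = ",", text = "
Origin Genotype Time Maxi
Ka Ka1  mor 14,59
Ka Ka1  eve 13,42
Ka Ka11 mor 14,08
Ka Ka11 eve 16,29
Ka Ka15 mor 14,38
Ka Ka15 eve 14,56
La La1  mor 17,82
La La1  eve 13,28
Ka Ka1  mor 16,44
Ka Ka1  eve 15,52
Ka Ka15 mor 13,76
Ka Ka15 eve 13,55
Ka Ka1  mor 19,15
Ka Ka1  eve 19,12
La La6  mor 10,54
La La6  mor 11,38
La La6  eve 10,48
Ka Ka15 mor 15,25
Ka Ka15 eve 16,51
La La1  mor 17,46
La La1  eve 15,57
Ka Ka1  mor 16,83
Ka Ka1  eve 15,63
Ka Ka15 mor 14,54
Ka Ka15 eve 15,09
La La1  mor 11,3
La La1  eve 11,94
")

# Make 'mor' the reference level of Time, so the output matches the
# summary table in the question.
dat$Time <- factor(dat$Time, levels = c("mor", "eve"))

model <- lme(fixed = Maxi ~ Origin * Time, random = ~ 1 | Genotype, data = dat)
summary(model)
```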
tuhinokkaeläin

1 Answer


I'll have to let someone else address the question of how best to specify your random effects.

The question of how to understand the coefficients is a FAQ. You have two factors with two levels each. Thus, there are four means. You have four parameters ((Intercept), OriginLa, Timeeve, and OriginLa:Timeeve). With them you can recreate your four means. Your "do I interpret..." is essentially right, except that OriginLa, e.g., is only the difference between OriginKa and OriginLa when Time is mor, and Timeeve is only the difference between Timeeve and Timemor when Origin is Ka. To compute the mean where Origin is La and Time is eve, you sum all four coefficients. It may help you to read my answers here: Interpretation of betas when there are multiple categorical variables, and here: Interpretation of interaction term.
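In numbers, using the estimates from the summary table in the question (a minimal sketch; in nlme the same values can be pulled out programmatically with fixef(model)):

```r
# Reconstructing the four cell means from the fixed-effect estimates shown in
# the question's summary table (values copied from there).
b0 <- 15.399386   # (Intercept): mean for Origin = Ka, Time = mor
b1 <- -1.986388   # OriginLa: La minus Ka, at Time = mor
b2 <-  0.074444   # Timeeve: eve minus mor, at Origin = Ka
b3 <- -1.387448   # OriginLa:Timeeve: extra shift specific to the La/eve cell

mean_Ka_mor <- b0                    # 15.399
mean_La_mor <- b0 + b1               # 13.413
mean_Ka_eve <- b0 + b2               # 15.474
mean_La_eve <- b0 + b1 + b2 + b3     # 12.100
```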

gung - Reinstate Monica
  • I thought this might be a FAQ, and was hesitant to ask, so thanks for answering! This is a bit off-topic, but is there no function in `nlme` to automatically extract these? Or in another package? Even if one really should know how to do the calculation. I think I might be able to do it with `predict.lme`, but so far haven't been able to set up the correct newdata -dataframe, perhaps because of the interaction-term? `data.frame(Origin = c("Ka", "Ka", "La", "La"), Time = c("mor", "eve", "mor", "eve"))`, returns `cannot evaluate groups for desired levels on 'newdata'` – tuhinokkaeläin Feb 23 '15 at 19:28
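Regarding the predict error in the comment above: predict.lme has a level argument, and level = 0 requests population-level (fixed-effects only) predictions, which do not require the grouping factor in newdata. A hedged sketch, assuming the fitted model object from the question:

```r
# Population-level predictions (fixed effects only) do not need the
# grouping factor Genotype in newdata; level = 0 requests them.
newdat <- data.frame(Origin = c("Ka", "Ka", "La", "La"),
                     Time   = c("mor", "eve", "mor", "eve"))
predict(model, newdata = newdat, level = 0)
```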
    @tuhinokkaeläin, it's OK to ask. We prefer you search the site 1st, but then you might not know what to search for. Note also that you asked about the proper specification of the RE, which is *definitely* on-topic here, & which I didn't answer (I'm honestly not certain); hopefully someone will provide that information soon. W/o running your model, I'm not sure why `predict` didn't work. You could ask on [SO], since you do have a reproducible example. – gung - Reinstate Monica Feb 23 '15 at 19:32
  • So far this community has been nothing but helpful, I really appreciate you guys. And yes, that was the problem, I did not know what to search for, thanks for pointing me the right direction. I do always make an effort before posting, and try to avoid creating double-posts. I'm still a bit unsure about the estimates though. Would this be right: β0 = OriginKa,Timemor; β0 + β1 = OriginLa,Timemor; β0 + β2 = OriginKa,Timeeve; β0 + β1 + β2 = OriginLa,Timeeve. But this would only be true _on the condition_ that the interaction is non-significant? – tuhinokkaeläin Feb 23 '15 at 20:38
  • @tuhinokkaeläin, close. The mean of `OriginLa, Timeeve` is $\beta_0 + \beta_1 + \beta_2 + \beta_3$, *whether or not the interaction is significant*. – gung - Reinstate Monica Feb 23 '15 at 20:44
  • Aha right! I think I get it now. I suppose I got confused and somehow got this mixed up with another problem I've been tackling; the general recommendation is _not to_ interpret the main effects if the interaction is significant, right? In this case, if the interaction _was_ significant, I could not trust the `anova()` results for `Origin` or `Time`, correct? Similarly, should I then also forfeit making multiple comparisons? Or is that exactly what I should then do, via contrasts (i.e. test simple main effects)? Which leads to another hardship; how to define the contrast matrix in this case... – tuhinokkaeläin Feb 23 '15 at 21:45
  • @tuhinokkaeläin, the issue is that in this case, the 'main effects' are actually simple effects at the reference level of the other factor. `anova()` can be fine, if you want a sequential test (since that is what it does). You already have 4 means, so there is no need for multiple comparisons. – gung - Reinstate Monica Feb 23 '15 at 21:55
  • I think I might be twisting my brain the wrong way, because I just came from a different but still slightly similar model. To summarise what you are saying: it's okay, in this particular setting, to use `anova()` and to trust its reported F- and p-values (at least to the extent that they can be trusted in mixed models) regardless of the significance of the interaction term? I'm still confused, but at this point I'm "accepting" your answer (: – tuhinokkaeläin Feb 23 '15 at 23:21
  • The mixed model issue is irrelevant. The issue is that the tests are sequentially nested, which is either what you want or it isn't. You needn't accept my answer; it didn't cover the RE specification, eg, you could just (unaccept &) upvote & leave the Q to see if you get the rest, if you prefer. – gung - Reinstate Monica Feb 24 '15 at 00:08
  • Okay, I'll do that instead. But I still don't understand. Is the table produced with `anova.lme` interpretable in the classical sense as a table of tests for differences between the levels? I notice it includes the intercept term as well, and I suppose each term and p-value in the table is then somehow nested within the term before that (or something), similar to the fixed effects table from `summary()`, and that is what you mean with sequential? In the classical hypothesis testing setting, isn't `anova.lme` what I would use to see if differences exist between the levels of a factor? – tuhinokkaeläin Feb 24 '15 at 18:08
  • addition: I would not want sequential tests in this case, if I now correctly understand the concept at all. I am not doing model selection, nor am I interested in dropping factors / comparing models. So would in fact `anova(model, type = "marginal")` produce a table where the SS for a factor are calculated given that all other factors are also in the model? (i.e. classical inference on categorical factors). I'm probably pushing your patience (and the comment section's length) with my inadequate knowledge, sorry about that. – tuhinokkaeläin Feb 24 '15 at 18:49
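For reference, anova.lme in nlme does take a type argument, so the two kinds of tests discussed in these comments can be requested directly (a sketch, assuming the fitted model from the question):

```r
anova(model)                     # sequential (Type I) tests: each term tested
                                 # after the terms listed before it
anova(model, type = "marginal")  # marginal tests: each term given all others
```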
  • @tuhinokkaeläin, it may help you to read my answer here: [How to interpret type I (sequential) ANOVA and MANOVA?](http://stats.stackexchange.com/a/20455/7290) – gung - Reinstate Monica Feb 24 '15 at 20:52
  • That was indeed helpful, and brought to memory some Euler diagrams (I have a little stats background, some of which is now returning to mind). If I understood correctly, the "type II SS" are not commonly used, and what is left then is either sequential or marginal testing. Sequential testing tests the inclusion of each predictor in order, and in marginal testing the full model is the model including all the predictors and the null model is a model where each term in turn has been dropped from the full model. The choice of testing has no effect on the interpretation of the interaction p-value. – tuhinokkaeläin Feb 24 '15 at 23:44
  • @tuhinokkaeläin, the testing is just whether you believe the effect exists, the interpretation is as discussed above. – gung - Reinstate Monica Feb 24 '15 at 23:52
  • I think I will go with marginal, because I would like to see how much of the variability is explained by each predictor, while also retaining all of the other predictors and taking into account the variability explained by them, because, in my case, I'm using `lme` as an alternative to a more classical ANOVA (which would not accommodate random effects and variance functions etc.), i.e. testing for interesting effects. Is my reasoning... reasonable? Also, is the (un)balancedness of data an important criterion in selecting the type of testing? – tuhinokkaeläin Feb 25 '15 at 01:02
  • @tuhinokkaeläin, marginal is more common b/c many people have trouble understanding sequential. Unbalancedness is important in that if the data are balanced, marginal = sequential. – gung - Reinstate Monica Feb 25 '15 at 01:07
  • I did read your comments somewhere on the FDA using type III as default. Interesting. I also read several of your other comments elsewhere and realize typeIII/marginal would give intrinsically wrong SS. However, isn't sequential testing kind of limited too, it would make me choose in advance the order of testing, and changing the order would give different results, not to mention that non-statisticians will probably always get confused reading the results. Is this just a fact we have to live with, or is this one of those things most people have trouble understanding? – tuhinokkaeläin Feb 25 '15 at 01:46
  • @tuhinokkaeläin, you have to choose the order in advance. This is just a fact we have to live with, *and* it is one of those things most people have trouble understanding. – gung - Reinstate Monica Feb 25 '15 at 03:59
  • Okay then. Well, this was a learning experience, like stats always are for me. Thank you a lot for always answering, I think I'll be using CV a lot more in the future because of your awesome input! – tuhinokkaeläin Feb 25 '15 at 15:17
  • I was not supposed to return here anymore, but I forgot to ask, and it's about the original question, namely do I also calculate the std. errors from the summary table for the fixed effects in the same way as I would the means? – tuhinokkaeläin Feb 26 '15 at 21:56