How to account for different degrees of familial relatedness (eg, twins) in a linear mixed model? Is heteroscedasticity a problem?

Question

(Let me preface this by stating that I am somewhat of an R newb and also a LMM newb.)

I have a very, very simple research question. I have a continuous outcome variable ("HAPPY"). I have two factors that divide my sample into 4 cells (Factor1, Factor2), and I want to assess whether there are any main effects or an interactions. A simple 2 x 2 ANOVA should do the trick.

There are two issues:

The individuals in my sample are family members. Every single individual has a fraternal twin or an identical twin in the dataset. I know that this throws off all the independence assumptions, so I was instructed to use linear mixed models and include the family identification variable (which identifies an individual as being part of a twin pair) ("Family") in my dataset as a random effect.

Right now, this is my R syntax:
```
model <- lme(HAPPY ~ Factor1*Factor2, random= ~1|Family, data=mydata,
             method="REML", na.action=na.omit)
summary(model)
anova(model,type="marginal")
```
So right now, there is only one random effect. But -- theoretically, we'd expect that identical twins and fraternal twins would differ in their degree of relatedness (the variable is called "twintype"). Should this be reflected in my model, and if so...how? Conceptually, I'm struggling with how it would be specified. And practically, I'm also struggling -- putting the above syntax together was actually quite the accomplishment for me!
Heteroscedasticity. What is the best way to assess if heteroscedasticity is affecting my model? Is it something I need to be concerned about and check for? Right now I'm just eyeballing--plot(fitted(model), resid(model))--and looking for cone shapes.

And if heteroscedasticity is an issue, what can I do to fix it? From what I've gathered from the google machine, I can enter a weighting command that will account for the inconstant variance. Could I modify my syntax above to read --
```
model <- lme(HAPPY ~ Factor1*Factor2, random= ~1|Family, data=mydata, 
             method="REML", na.action=na.omit, weights=varPower())
```

Yup yup -- I actually saw this post and it was helpful, but I posted again because I'm using lme and wasn't confident that I could successfully apply the solution to my case. And because I have the additional heterscedasticity question. — R Frank, Aug 28 '12 at 18:07
Informal eyeballing for changes in variance is probably a good way to start. If heteroscedasticity is a problem there are a few options. Two that come to mind are (1) variance function estimation which I have spoken of before and (2) generalized (weighted) least squares. The weighting is generally inversely proportiona to the standard deviation at the point in the x space (usually you can make a rough estimate to get reasonable weights). — Michael R. Chernick, Aug 28 '12 at 18:09
For the nmle R package would adding `weights=varPower())` be a solution? — R Frank, Aug 28 '12 at 18:19
Macro, I see that you discussed a solution to the lmer twin study question, any thoughts on how it would it apply when I have MZ/DZ twins and two contrast-coded predictor variables w/interaction? — R Frank, Aug 28 '12 at 18:23
@RFrank, I think you can use the same random effects structure I described in the other answer, although you may delete the random slopes in the predictors if that doesn't make sense in the context of your application. Have a look at the second bullet point to see how that model characterizes correlations between both MZ twin pairs and DZ twin pairs and let me know if you still have questions. FYI for future reference, you can use '@' in front of a user name to ping someone you've been talking to in a comment thread- I only noticed your comment because I happened to look at this question again — Macro, Aug 28 '12 at 21:52
Ah, thanks very much for your help @Macro! I'm going to tackle this tomorrow and will post the syntax/data structure I come up with to see if it makes sense. — R Frank, Aug 30 '12 at 04:52
@Macro, I realized (I hope correctly?) that I'd need to switch to lmer to accommodate multiple random effects. So I had a variable with a family ID ("Family") that uniquely identifies each twin pair. I created a new ID variable ("MZID") that is identical for a given pair of MZ twins but unique for each individual that's a member of a DZ twin pair. So now I have: model — R Frank, Aug 31 '12 at 21:22
@RFrank, I think that `lme` can accommodate multiple random effects but I do think that switching to `lmer` isn't a bad idea for various reasons. Re your main question: if there are multiple different MZ twin pairs within a family (at least in some cases), you should use model 1. If there are not multiple different MZ twin pairs any families then, I _think_, the two models you specified would be equivalent except that in model two, the two random effects would be allowed to be correlated with each other whereas they are independent in model one. — Macro, Aug 31 '12 at 22:17
@Marco, correct, there is only one twin pair per family. Thank you so. so. much for your help on this. Maybe the model I'll actually go with is lmer(HAPPY ~ Factor1*Factor2 + (1|Family)+(A/Family)). (Easier to just to work with the dummy variable A.) I THINK I should actually treat the random effects as independent; typically people think of the environmental (DZ) and genetic (MZ) effects as being additive. — R Frank, Aug 31 '12 at 22:35
Typo above, I meant ...(+1|Family)+(A|Family)). This is what my random effects results look like with the model mentioned above: Groups Name Variance Std.Dev. Corr Family (Intercept) 0.78679 0.88701 Family (Intercept) 1.48261 1.21763 A 2.07840 1.44167 -0.604 Residual 2.77799 1.66673 — R Frank, Aug 31 '12 at 22:42
Oops. Just noticed that I pinged you incorrectly above. @Macro. — R Frank, Aug 31 '12 at 23:01
@RFrank, you're very welcome! Note that `R` automatically includes intercepts in their linear model formulas, e.g. `(A|Family))` will automatically specify the model with a random slope in `A` **and** a random intercept, both grouped by `Family`. If you don't want the intercept you have to manually delete it, e.g. `(A-1|Family))`. It looks like you've included two random intercepts grouped by family - one which is correlated with the slope in `A`, and one which isn't. I'm pretty sure that is not what you intended - try `lmer(HAPPY ~ Factor1*Factor2 + (1|Family)+(A-1|Family))`. — Macro, Aug 31 '12 at 23:22
Ah! OK, great. Thanks again - this board is fantastic. No, I did not realize that. So I would characterize this when I describe it to others as a "random intercept random slope" model in that one of the random effects depends on twin-type status. — R Frank, Sep 01 '12 at 00:14
@RFrank, I think I would just call it a random effects model, followed by a description but I think what you said is still a fair description. — Macro, Sep 01 '12 at 14:51
Hi @Macro-not sure if you're still out there, but your assistance on this was extremely helpful to me. Now I have a new hiccup. The outcome variable ("HAPPY") is available at 2 more times. (Some subjects weren't available at all times.) There is some possibility that the slopes could differ across Factor1&2, which would be interesting. I'm wondering how to incorporate this. If I enter each individual's data at each timepoint on a separate line, add an individual ID variable, and add a "time" variable, would this work? lmer(HAPPY ~ Time*Factor1*Factor2 + (1|Individual)+(1|Family)+(A-1|Family)) — R Frank, Dec 07 '12 at 02:59

How to account for different degrees of familial relatedness (eg, twins) in a linear mixed model? Is heteroscedasticity a problem?

0 Answers0