First post here.

I'm pretty much a newbie both in statistics and using R, but nevertheless trying to fit a linear mixed model with the package nlme. My question is about transformations/variance functions. Pretty basic stuff I suppose, but I cannot find any good sources clearly explaining the concept. Also I recognize this question might as well be in SO, but I also need help with the statistical side of things.

My model thus far is this:

library(nlme)
model <- lme(fixed = A ~ B, random = ~ 1 | C)

So, simply one fixed effect and one random effect.

During model validation, my plot of pooled residuals vs. fitted values and my Q-Q plots showed heterogeneity of variance and non-normality.

The way I understand it, my choices now are either a transformation or a variance function passed to lme through the weights argument (or am I missing some other way of coping with the problem?). But a variance function alone would, hopefully, correct only the heterogeneity and would not help with normality, am I correct? Using a transformation would also have its downsides. Which should I employ as a default, and why? Or should I employ both if one doesn't work?

And most importantly, could someone please explain the basic idea behind the variance functions in complete layman's terms, and how I should go about implementing them? Practical examples with R-code would also be much appreciated!

And of course any referred literature would be of great help here, for a newbie.

Oh and thanks so much for this awesome forum by the way, I've learned a ton reading this stuff!

EDIT: After writing this I found an excellent reference explaining the various variance functions and how to use them in R (with practical examples): Zuur et al., "Mixed Effects Models and Extensions in Ecology with R", pp. 72-86. I highly recommend it as a first source for anyone new to the concept. It's easy reading, and after reading it you will understand all the basics.
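
For anyone landing here later, here is a minimal sketch of how a variance function plugs into lme, as far as I understand it. The idea is that instead of assuming one constant residual standard deviation, the model lets it differ between groups (varIdent) or change with the fitted values (varPower). The data below are simulated, and every name in it (dat, m0, m_ident, m_power, the group means and standard deviations) is made up for illustration; it is not my photosynthesis data.

```r
library(nlme)

set.seed(1)

# Simulated stand-in data (hypothetical): 3 levels of B, 10 groups of C,
# 6 replicates per B-by-C combination.
dat <- expand.grid(rep = 1:6,
                   B = factor(c("b1", "b2", "b3")),
                   C = factor(1:10))

# Random intercept for each level of C.
group_effect <- rnorm(10, sd = 0.5)[as.integer(dat$C)]

# Mean and residual SD both depend on B, so the residual spread is heterogeneous.
mu      <- unname(c(b1 = 5,   b2 = 8,   b3 = 12)[as.character(dat$B)])
sigma_B <- unname(c(b1 = 0.3, b2 = 0.8, b3 = 1.5)[as.character(dat$B)])
dat$A   <- mu + group_effect + rnorm(nrow(dat), sd = sigma_B)

# Baseline: constant residual variance.
m0 <- lme(A ~ B, random = ~ 1 | C, data = dat, method = "REML")

# varIdent: a separate residual SD for each level of B.
m_ident <- lme(A ~ B, random = ~ 1 | C, data = dat, method = "REML",
               weights = varIdent(form = ~ 1 | B))

# varPower: residual SD proportional to |fitted value|^delta.
m_power <- lme(A ~ B, random = ~ 1 | C, data = dat, method = "REML",
               weights = varPower(form = ~ fitted(.)))

# Same fixed effects and REML throughout, so the fits can be compared.
AIC(m0, m_ident, m_power)
anova(m0, m_ident)

# Re-check the residuals on the normalized scale, which accounts for the
# fitted variance structure.
plot(m_ident, resid(., type = "normalized") ~ fitted(.))
```

If the variance function is doing its job, the normalized residuals should show a roughly even spread across the fitted values, even though the raw residuals do not.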

tuhinokkaeläin
  • You are right that a variance function won't do anything about the non-normality. A transformation *might* fix both problems if you are lucky - so that's what I'd suggest. Is $A$ a positive measurement, so that $\log A$, $\sqrt A$, etc. would make sense? What do you see as the down side of using a transformation? – Russ Lenth Sep 22 '14 at 17:00
  • PS sometimes there is a natural choice for a transformation. Can you say more about what kind of measurement $A$ is? – Russ Lenth Sep 22 '14 at 17:01
  • @rvl Thanks for answering! But see my edit, I am now a bit more learned. And yes _A_ is a measurement of net photosynthesis that always (in the current setting) gets a positive value and no zeros. And I see now that if it didn't, it wouldn't make sense to use a fixed or a power function. I've used graphics and AIC-values to try and select the best function, but it could also be argued that the best one would be an identity function, because most (if not all) variation seems to be in the grouping factor _B_. – tuhinokkaeläin Sep 24 '14 at 07:52
  • Also, if I need to compare several models of _A_ as DV (if I run several models for slightly different needs during the measuring campaign) should I then always use the same variance function for each model of _A_, for the models to be equal and comparable, and to overall make sense together? – tuhinokkaeläin Sep 24 '14 at 07:59
  • As for transformations... their use is still a bit of a mystery for me, but I've been told to avoid them till the end. The reasons for this I don't know, but I've thought that using them makes inference and prediction tricky or impossible, and if you use something like the Box-Cox, then, like Rob Hyndman said [here](http://stats.stackexchange.com/questions/1713/express-answers-in-terms-of-original-units-in-box-cox-transformed-data) : "If the Box-Cox transformation yields a symmetric distribution, then the mean of the transformed data is back-transformed to the median on the original scale." – tuhinokkaeläin Sep 24 '14 at 08:10
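
The back-transformation point quoted in the last comment can be checked numerically. This is only a sketch on simulated log-normal data (the distribution, its parameters, and the names y and back are assumptions for illustration), where the log transformation gives an exactly symmetric distribution:

```r
set.seed(42)

# Hypothetical positive, right-skewed response; log(y) is normal by construction.
y <- rlnorm(1e5, meanlog = 1, sdlog = 0.8)

# Mean on the transformed (log) scale, back-transformed to the original scale.
back <- exp(mean(log(y)))

# The back-transformed value tracks the median of y, not its mean.
c(back_transformed = back, median = median(y), mean = mean(y))
```

So a model fitted to log A and back-transformed naively describes the median of A, which is worth keeping in mind when reporting predictions on the original scale.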
