I want to analyze fish aquaculture data, comparing growth rates by state while controlling for tank as a random effect. I will use the length-weight equation

$$ w = aL^b $$

but I want b to be a function of state of origin. I can do this using the nlme function in the nlme package. Here is a toy data set similar to mine.

tl <- seq(100, 300, 1/5)
state <- c(rep(x = c("DE", "TX", "FL", "VA", "SC"), times = length(tl)/5), "DE")

ai <- 0.000004
bi <- 3.3

lw <- NULL
set.seed(1234)
for(i in 1:length(tl)) {
  lw[i] <- log(ai) + bi * log(tl[i]) + rnorm(n = 1, 0, 0.1)
}

w <- exp(lw)    
df <- data.frame(cbind(w, tl))
df$state <- as.factor(state)
df$tank <- as.factor(c(rep(1:15, length(tl)/15), 1:11))

This is the code I would use:

nl <- nlme(w ~ a * tl^b, fixed = list(a ~ 1, b ~ state),
           random = b ~ 1 | tank, data = df,
           start = c(a = rep(0.0000001, times = 1),
                     b = rep(3.1, times = 5)))
summary(nl)

However, a plot of the data reveals that the error is likely to be multiplicative on the arithmetic scale (variability in w (weight) increases with increasing tl (total length)):

    plot(tl, exp(lw))

My current model using nlme is

$$ w = aL^b + \epsilon \quad \epsilon \sim N(0, \sigma^2) $$

where b is a function of state. However, I want to make the equivalent log-linear model:

$$ w = aL^b e^{\epsilon} \quad \epsilon \sim N(0, \sigma^2) $$

$$ \log(w) = \log(a) + b \cdot \log(L) + \epsilon $$

If it weren't for the fact that I want b to be a function of state, I could use the lme function in the nlme package:

lm1 <- lme(log(w) ~ log(tl), data = df, random = ~ 1|tank)
summary(lm1)

How would I make the log-linear model equivalent to my nonlinear model, apart from the error distribution? Xiao et al. (2011) discuss this issue, but only for non-mixed models and not for cases where a or b is a function of a factor.

http://www.esajournals.org/doi/pdf/10.1890/11-0538.1

The final part of the question is whether I can use AIC to compare the two models, or whether I should just use the log-linear model if the data clearly appear to have that error structure.

  • There is a little problem with your log-linear model, since $\log(\epsilon)$ is ill-defined when $\epsilon \sim N(0,\sigma^2)$. It should instead be $$ w = aL^b\exp(\epsilon),$$ where $\epsilon \sim N(0,\sigma^2)$. Then, $$ \log(w) = \log(a) + b\log(L)+\epsilon.$$ I don't know if there is a relation between ML estimators of the log-linear and non-linear models. – Alexandre Patriota Jan 01 '14 at 03:48
  • Oops, thanks. That is what I had written down on paper, I just typed it up incorrectly. I'll use the excuse that it was my first time writing an equation in LaTeX :P – djhocking Jan 01 '14 at 16:26

1 Answer


This is actually pretty easy: in statistical terms, you just add an interaction between total length tl and state.

Regenerating data (with some tiny cosmetic/efficiency tweaks: you don't need the for loop):

tl <- seq(100, 300, 1/5)
state <- c(rep(x = c("DE", "TX", "FL", "VA", "SC"),
               times = length(tl)/5), "DE")
ai <- 0.000004
bi <- 3.3

set.seed(1234)
lw <- log(ai) + bi * log(tl) + rnorm(length(tl))
df <- data.frame(lw, w = exp(lw), tl, state = factor(state),
                 tank = factor(c(rep(1:15, length(tl)/15), 1:11)))

Fitting the nonlinear mixed model:

library(nlme)
nl <- nlme(w ~ a * tl^b, fixed = list(a ~ 1, b ~ state),
           random = b ~ 1 | tank, data = df,
           start = c(a = rep(0.0000001, times = 1),
                     b = rep(3.1, times = 5)))
fixef(nl)

Now as a linear mixed model. Note that the model above does not let the intercept vary either by state or by tank; that might be inadvisable, but I've replicated it in the model below.

lmm <- lme(lw ~ 1 + log(tl):state,
           random = ~ (log(tl) - 1) | tank, data = df)
fixef(lmm)
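
If you did want the intercept to vary, something like this sketch should work (fixed intercepts varying by state plus a random intercept for tank; lmm2 is just an illustrative name, not part of the fit above):

## sketch: let the intercept vary by state (fixed) and by tank (random)
lmm2 <- lme(lw ~ state + log(tl):state,
            random = ~ 1 | tank, data = df)
fixef(lmm2)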

library(ggplot2)
g0 <- ggplot(df, aes(x = tl, y = w, colour = state)) +
    geom_point(alpha = 0.3) +
    scale_y_log10() + scale_x_log10() +
    geom_line(aes(group = tank), alpha = 0.3) +
    theme_bw()
g0 + geom_line(data = transform(df, w = predict(nl))) +
    geom_line(data = transform(df, w = exp(predict(lmm))), lty = 2)

[Figure: weight vs. total length on log-log axes, coloured by state, with fitted lines from the nonlinear model (solid) and the log-linear model (dashed).]

As for comparing the logged with the unlogged model: the naive AIC comparison will be wrong, but you can correct it as follows.

The adjustment to the AIC for log-transformation should be $2 \sum_i \log(y_i)$: as explained by Jack Weiss, the Jacobian of the transformation is $d(\log y)/dy = 1/y$, so each likelihood term picks up a factor of $1/y_i$, the log-likelihood picks up $\sum_i \log(1/y_i)$, and the AIC picks up $-2 \sum_i \log(1/y_i) = 2 \sum_i \log(y_i)$.

Comparing negative log-likelihoods:

set.seed(101)
y <- rlnorm(100)
## Normal negative log-likelihood of log(y), plus the Jacobian term sum(log(y))
L1 <- -sum(dnorm(log(y), log = TRUE)) + sum(log(y))
## log-Normal negative log-likelihood of y on the original scale
L2 <- -sum(dlnorm(y, log = TRUE))
all.equal(L1, L2)  ## TRUE
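
Applied to the models above, a sketch (lmm_ml is an illustrative name; lme fits by REML by default while nlme fits by ML, so refit the log-linear model by ML before comparing):

## refit the log-linear model by maximum likelihood for a comparable likelihood
lmm_ml <- update(lmm, method = "ML")
## AIC of the log-scale fit, adjusted back to the scale of w, vs. the nonlinear fit
AIC(lmm_ml) + 2 * sum(log(df$w))
AIC(nl)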
  • Thanks Ben! That was easy. I tried log(tl)*state and didn't think to try log(tl):state. Thanks for the streamlined data generation; I still think in for loops from learning C 15 years ago as an undergrad. I will also have state + log(tl):state to vary the intercept by state in the real model. I will have to think more about the random effects, because tank is a nuisance parameter that shouldn't have much effect since they are replicate aquaculture tanks in one room. – djhocking Jan 01 '14 at 16:46
  • In general (IMO) you should still include the random effects (go re-read Hurlbert 1984 ...) – Ben Bolker Jan 01 '14 at 16:56
  • I know I need to include a random effect, I just meant I have to think more about random intercept or if I should include a random slope for anything, random on a, b, or both ((log(tl)-1) |tank or ~1|tank, etc.). – djhocking Jan 01 '14 at 17:00
  • Note: all of Jack Weiss's excellent lectures and class exercises/notes are linked in [this post](https://stats.stackexchange.com/a/339330/80624) – theforestecologist Jun 05 '18 at 06:08