4

I am trying to model the relationship between forest age and individual tree mortality rate. The probability of mortality declines rapidly as forests go from being very young, and then creeps back up as they age (Juveniles more likely to die than intermediate aged trees. After this initial decline in mortality probability, mortality creeps back up as a function of age).

Here are some example data generated in R, and a plot of the relationship. I build the example data using two independently parameterized exponential functions, and then summing them together.

#age range, combine two curves to get J
age <- seq(1:200)
age.m1 <- 0.1    * exp(-age * .1) #mortality rate declines initially from juvenile stage
age.m2 <- 0.001 * exp(age* 0.015) #mortality rate creeps up as trees get older
age.m3 <- age.m1 + age.m2 #combine the two mortality rates
#generate some scatter in the data to plot
scatter <- age.m3 + rnorm(length(age.m3),0, 0.002)

#plot data and true relationship
plot(scatter, pch=16, ylab = 'tree mortality rate', xlab='forest age')
lines(smooth.spline(age.m3), lwd=3, col='purple')

The outcome looks like this: enter image description here

I'd like to fit a function that could capture this non-linear relationship, so that I could compare the parameters of these fits to different sets of data. Specifically, I'm asking:

  1. Which equation could fit an asymmetric curve like this using any statistical software?
  2. Can you demonstrate it works using the nls package, or similar, in R?
colin
  • 862
  • 1
  • 11
  • 27
  • 1
    There isn't enough information here to provide definitive answers. What's needed are some indications of the shape of *your* data, of the error structure, and any relevant theoretical suggestions. To illustrate what I am requesting, take a look at the considerations involved in the solutions at http://stats.stackexchange.com/a/64039/919 and http://stats.stackexchange.com/a/148166/919, both of which solve problems that have *exactly* the characteristics you describe here (parameterized nonlinear U-shaped curves being fit using `R`). – whuber Jun 02 '16 at 19:48
  • Is there anything theory suggests about a particular functional form for the non-linear relationship? If not you'd perhaps be as well to use a linear model - that's linear in the parameters - with a spline basis function. – Scortchi - Reinstate Monica Jun 02 '16 at 19:48
  • @Scortchi The theoretical basis is what leads to the summing of two different relationships between age and mortality rate, and the asymmetric curve. Will a spline function give me parameters such that I could compare multiple forests? – colin Jun 02 '16 at 19:50
  • @whuber your first link is useful! I got it! Sorry about the cross post. Would you suggest I remove this post? I didn't mean to offend you. I thought the shape of *my data* was clear based on the simulated data I provided, which I believe are a reasonable approximation. I also went through the relevant theoretical basis for why the relationship took this shape. I'm unsure what else to add. – colin Jun 02 '16 at 19:55
  • 1
    I don't see any cross-post: are you perhaps suggesting one of my links does answer your question? If so, that's fine and we can point this thread at it. I see you did provide a *narrative* description that leads to a U-shaped curve, but a more useful description would be *quantitative* (although I realize that can be problematic to come by). We should bear in mind that the model needn't be perfect, but only good enough. However, if we have the opportunity to adopt a model that is likely to match the reality and isn't overly complicated, then we should try to do so. – whuber Jun 02 '16 at 20:00
  • @whuber on second look, I don't see how the first example can really capture this relationship. You specify a combination of a linear, and a negative exponential response. This is a combination of a positive and negative exponential response. Its not intuitive to me how just putting two exponential terms into the model would solve the problem, as they would be redundant. – colin Jun 02 '16 at 20:01
  • @whuber Is the code that generates the relationship not a quantitative description that leads to the J-shaped curve, supported by the narrative description above? – colin Jun 02 '16 at 20:02
  • 1
    It is--but I had understood it as being only a qualitative illustration rather than as a statement of a specific model. Important information like that shouldn't be buried in code! (Not everybody is conversant with every software platform, but it's reasonable to expect readers to understand English and standard mathematical notation.) Furthermore, once I saw that your simulated errors were inappropriate for mortality rates (they should have a Poisson-like distribution, not a Normal one) I stopped paying any more attention to those details, figuring they weren't meant to be studied closely. – whuber Jun 02 '16 at 20:06
  • I thought the same. Just to answer your question on splines: you could of course compare two fitted curves or even include 'forest' and interactions as a parameter. But the parameter estimates wouldn't have much theoretical import - the parameters don't have any life outside the fitted model, someone said. – Scortchi - Reinstate Monica Jun 02 '16 at 20:28
  • Fit a polynomial? – Matthew Gunn Jun 03 '16 at 04:07
  • @MatthewGunn The [second link in my original comment](http://stats.stackexchange.com/questions/147329/how-to-represent-kwh-usage-by-year-against-average-temperature/148166#148166) systematically compares and contrasts a polynomial fit to one that is motivated by theoretical considerations. In addition to the disadvantages indicated there, polynomials tend to be too sensitive to individual and extreme values. IMHO although useful, they ought to be low down on one's list of procedures to consider. – whuber Jun 03 '16 at 13:19
  • 1
    What's wrong with the function you used to generate the data? In what way would it not already be a good answer to your question? – Glen_b Jun 03 '16 at 13:56
  • 1
    @Glen_b I generated the data by summing two exponential functions. If I attempt to fit two exponential functions in nls (or anything else), with identical format, it will throw a singularity error. – colin Jun 03 '16 at 14:32
  • That's the nub of your question, then - after 12 comments! – Scortchi - Reinstate Monica Jun 03 '16 at 18:38
  • @Scortchi what is a 'nub'? – colin Jun 03 '16 at 23:03
  • @colin [*The crux or central point*](https://www.google.com/search?q=nub+definition) – Glen_b Jun 04 '16 at 01:54

2 Answers2

4

FIRST APPROACH:

Hurst et al. use the following form for a similar problem (as far as I can see):

mortality = a + b * age * exp(c * age)

where a, b, and c are parameters. The call to nls would be along the lines

dat <- data.frame(age=age, mortality=scatter)
fit <- nls(
    mortality ~ a + b * age * exp(c*age), 
    data=dat, start=list(a=0.05, b=-0.001, c=-0.001)
)
plot(mortality~age, data=dat)
lines(predict(fit))

where dat is the data.frame with your data, including the columns mortality and age. The fit does not look very nice, though.

enter image description here

SECOND APPROACH:

To exploit the known functional relation, you can use constrained optimisation to get a reasonable fit:

ll <- function(par, dat) -sum(dnorm(dat[,"mortality"], 
        mean=par[1]*exp(par[2]*dat[,"age"]) +   
             par[3]*exp(par[4]*dat[,"age"]), 
        sd=par[5], log=TRUE)
)

fit_mle <- constrOptim(
    theta=c(1,-0.05,1, 0.05, 1), 
    f=ll, grad=NULL, 
    ui=diag(c(1,-1,1,1,1)), ci=rep(0,5), dat=dat
)

p <- fit_mle[["par"]]
y <- p[1]*exp(p[2]*age)+p[3]*exp(p[4]*age)
plot(mortality~age, data=dat)
lines(y)

The important constraints are that the second parameter needs to be negative and the fourth parameter positive.

enter image description here

Karsten W.
  • 667
  • 6
  • 25
3

You just need to choose some sensible options in nls. Here I used constrained nls, forcing the coefficients to have opposite signs. I ran this after your code above:

mfit=nls(scatter ~ a1*exp(-b1*age)+a2*exp(b2*age),
    start=list(a1=.01,a2=.02,b1=.04,b2=.04),lower=list(0,0,0,0),
    algorithm="port",trace=TRUE)
 lines(age,fitted(mfit),col="yellow2",lwd=2,lty=3)

enter image description here

The fitted curve is the light dashed line right up the middle of the purple curve

Glen_b
  • 257,508
  • 32
  • 553
  • 939
  • ah, I didn't realize you could force the coefficient signs that way, given that nls may explore a coefficient space that flips the sign. Great! – colin Jun 06 '16 at 00:52
  • 1
    Even without using formal constraints, with a bit of cleverness and reparameterization you may be able to achieve a similar effect (i.e. impose positivity), so similar models might be fitted in packages that don't have formal constraints as options. – Glen_b Jun 06 '16 at 01:17