suitable non-linear equation to capture a 'J-shaped' relationship between x and y

Question

I am trying to model the relationship between forest age and individual tree mortality rate. The probability of mortality declines rapidly as forests go from being very young, and then creeps back up as they age (Juveniles more likely to die than intermediate aged trees. After this initial decline in mortality probability, mortality creeps back up as a function of age).

Here are some example data generated in R, and a plot of the relationship. I build the example data using two independently parameterized exponential functions, and then summing them together.

#age range, combine two curves to get J
age <- seq(1:200)
age.m1 <- 0.1    * exp(-age * .1) #mortality rate declines initially from juvenile stage
age.m2 <- 0.001 * exp(age* 0.015) #mortality rate creeps up as trees get older
age.m3 <- age.m1 + age.m2 #combine the two mortality rates
#generate some scatter in the data to plot
scatter <- age.m3 + rnorm(length(age.m3),0, 0.002)

#plot data and true relationship
plot(scatter, pch=16, ylab = 'tree mortality rate', xlab='forest age')
lines(smooth.spline(age.m3), lwd=3, col='purple')

The outcome looks like this:

I'd like to fit a function that could capture this non-linear relationship, so that I could compare the parameters of these fits to different sets of data. Specifically, I'm asking:

Which equation could fit an asymmetric curve like this using any statistical software?
Can you demonstrate it works using the nls package, or similar, in R?

There isn't enough information here to provide definitive answers. What's needed are some indications of the shape of *your* data, of the error structure, and any relevant theoretical suggestions. To illustrate what I am requesting, take a look at the considerations involved in the solutions at http://stats.stackexchange.com/a/64039/919 and http://stats.stackexchange.com/a/148166/919, both of which solve problems that have *exactly* the characteristics you describe here (parameterized nonlinear U-shaped curves being fit using `R`). — whuber, Jun 02 '16 at 19:48
Is there anything theory suggests about a particular functional form for the non-linear relationship? If not you'd perhaps be as well to use a linear model - that's linear in the parameters - with a spline basis function. — Scortchi - Reinstate Monica, Jun 02 '16 at 19:48
@Scortchi The theoretical basis is what leads to the summing of two different relationships between age and mortality rate, and the asymmetric curve. Will a spline function give me parameters such that I could compare multiple forests? — colin, Jun 02 '16 at 19:50
@whuber your first link is useful! I got it! Sorry about the cross post. Would you suggest I remove this post? I didn't mean to offend you. I thought the shape of *my data* was clear based on the simulated data I provided, which I believe are a reasonable approximation. I also went through the relevant theoretical basis for why the relationship took this shape. I'm unsure what else to add. — colin, Jun 02 '16 at 19:55
I don't see any cross-post: are you perhaps suggesting one of my links does answer your question? If so, that's fine and we can point this thread at it. I see you did provide a *narrative* description that leads to a U-shaped curve, but a more useful description would be *quantitative* (although I realize that can be problematic to come by). We should bear in mind that the model needn't be perfect, but only good enough. However, if we have the opportunity to adopt a model that is likely to match the reality and isn't overly complicated, then we should try to do so. — whuber, Jun 02 '16 at 20:00
@whuber on second look, I don't see how the first example can really capture this relationship. You specify a combination of a linear, and a negative exponential response. This is a combination of a positive and negative exponential response. Its not intuitive to me how just putting two exponential terms into the model would solve the problem, as they would be redundant. — colin, Jun 02 '16 at 20:01
@whuber Is the code that generates the relationship not a quantitative description that leads to the J-shaped curve, supported by the narrative description above? — colin, Jun 02 '16 at 20:02
It is--but I had understood it as being only a qualitative illustration rather than as a statement of a specific model. Important information like that shouldn't be buried in code! (Not everybody is conversant with every software platform, but it's reasonable to expect readers to understand English and standard mathematical notation.) Furthermore, once I saw that your simulated errors were inappropriate for mortality rates (they should have a Poisson-like distribution, not a Normal one) I stopped paying any more attention to those details, figuring they weren't meant to be studied closely. — whuber, Jun 02 '16 at 20:06
I thought the same. Just to answer your question on splines: you could of course compare two fitted curves or even include 'forest' and interactions as a parameter. But the parameter estimates wouldn't have much theoretical import - the parameters don't have any life outside the fitted model, someone said. — Scortchi - Reinstate Monica, Jun 02 '16 at 20:28
@MatthewGunn The [second link in my original comment](http://stats.stackexchange.com/questions/147329/how-to-represent-kwh-usage-by-year-against-average-temperature/148166#148166) systematically compares and contrasts a polynomial fit to one that is motivated by theoretical considerations. In addition to the disadvantages indicated there, polynomials tend to be too sensitive to individual and extreme values. IMHO although useful, they ought to be low down on one's list of procedures to consider. — whuber, Jun 03 '16 at 13:19
What's wrong with the function you used to generate the data? In what way would it not already be a good answer to your question? — Glen_b, Jun 03 '16 at 13:56
@Glen_b I generated the data by summing two exponential functions. If I attempt to fit two exponential functions in nls (or anything else), with identical format, it will throw a singularity error. — colin, Jun 03 '16 at 14:32
@colin [*The crux or central point*](https://www.google.com/search?q=nub+definition) — Glen_b, Jun 04 '16 at 01:54

Karsten W. · Answer 1 · 2016-06-03T15:20:51.423

4

FIRST APPROACH:

Hurst et al. use the following form for a similar problem (as far as I can see):

mortality = a + b * age * exp(c * age)

where a, b, and c are parameters. The call to nls would be along the lines

dat <- data.frame(age=age, mortality=scatter)
fit <- nls(
    mortality ~ a + b * age * exp(c*age), 
    data=dat, start=list(a=0.05, b=-0.001, c=-0.001)
)
plot(mortality~age, data=dat)
lines(predict(fit))

where dat is the data.frame with your data, including the columns mortality and age. The fit does not look very nice, though.

SECOND APPROACH:

To exploit the known functional relation, you can use constrained optimisation to get a reasonable fit:

ll <- function(par, dat) -sum(dnorm(dat[,"mortality"], 
        mean=par[1]*exp(par[2]*dat[,"age"]) +   
             par[3]*exp(par[4]*dat[,"age"]), 
        sd=par[5], log=TRUE)
)

fit_mle <- constrOptim(
    theta=c(1,-0.05,1, 0.05, 1), 
    f=ll, grad=NULL, 
    ui=diag(c(1,-1,1,1,1)), ci=rep(0,5), dat=dat
)

p <- fit_mle[["par"]]
y <- p[1]*exp(p[2]*age)+p[3]*exp(p[4]*age)
plot(mortality~age, data=dat)
lines(y)

The important constraints are that the second parameter needs to be negative and the fourth parameter positive.

edited Jun 03 '16 at 15:20

answered Jun 03 '16 at 14:07

Karsten W.

667
6
25

this is progress! But yeah, the fit is not great, considering these are idealized data where we know the true functional form. – colin Jun 03 '16 at 14:33
Edited the answer with an second approach. – Karsten W. Jun 03 '16 at 15:26
you have nailed it! – colin Jun 03 '16 at 15:27
What's the difference between the optimization procedure in `nls` & `constrOptim`? (You can use constraints in both, I think.) – Scortchi - Reinstate Monica Jun 03 '16 at 18:45
Interesting! I did not know about constrains in nls – Karsten W. Jun 04 '16 at 09:30

score 3 · Accepted Answer · answered Jun 04 '16 at 05:57

3

You just need to choose some sensible options in nls. Here I used constrained nls, forcing the coefficients to have opposite signs. I ran this after your code above:

mfit=nls(scatter ~ a1*exp(-b1*age)+a2*exp(b2*age),
    start=list(a1=.01,a2=.02,b1=.04,b2=.04),lower=list(0,0,0,0),
    algorithm="port",trace=TRUE)
 lines(age,fitted(mfit),col="yellow2",lwd=2,lty=3)

The fitted curve is the light dashed line right up the middle of the purple curve

answered Jun 04 '16 at 05:57

Glen_b

257,508
32
553
939

ah, I didn't realize you could force the coefficient signs that way, given that nls may explore a coefficient space that flips the sign. Great! – colin Jun 06 '16 at 00:52
1

Even without using formal constraints, with a bit of cleverness and reparameterization you may be able to achieve a similar effect (i.e. impose positivity), so similar models might be fitted in packages that don't have formal constraints as options. – Glen_b Jun 06 '16 at 01:17

suitable non-linear equation to capture a 'J-shaped' relationship between x and y

2 Answers2