
I have time-series count data $N_{i,j}$ (population sizes in site $i$ and year $j$), and I want to relate year-to-year changes to environmental conditions $x_{i,j}$. For this, I want to fit the following model:

$$\begin{aligned} \log(\mu_{i,j+1}) &= \log(\mu_{i,j}) + \alpha + \beta x_{i,j} + \gamma_j \\ N_{i,j} &\sim \operatorname{Poisson}(\mu_{i,j}) \\ \gamma_j &\sim \operatorname{Normal}(0, \sigma) \end{aligned}$$

I'm interested in the parameter $\beta$, the slope of the relationship. $\gamma_{j}$ is a random effect for year (the residuals within a single year were correlated).

This is an autoregressive model. How can I fit such a model? I tried looking at the PESTS project (PESTS R code here), but I could not work out whether or how my model can be fitted with it.

(Note that I am trying to avoid Bayesian tools because of computation time - I have thousands of such models).
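To make the data-generating process concrete, here is a minimal simulation sketch of the model above. All dimensions and parameter values (`n_sites`, `n_years`, `alpha`, `beta`, `sigma`, and the starting value $\mu_{i,0} = 50$) are arbitrary choices for illustration, not taken from the question; note that $\mu_{i,0}$ must be supplied to complete the model, as probabilityislogic points out in the comments.

```python
import numpy as np

rng = np.random.default_rng(1)

n_sites, n_years = 20, 15            # hypothetical dimensions
alpha, beta, sigma = 0.05, 0.3, 0.1  # hypothetical parameter values

x = rng.normal(size=(n_sites, n_years))       # environmental covariate x_{i,j}
gamma = rng.normal(0.0, sigma, size=n_years)  # year random effect gamma_j

mu = np.empty((n_sites, n_years))
mu[:, 0] = 50.0                      # mu_{i,0} must be chosen/modelled separately
for j in range(n_years - 1):
    # log(mu_{i,j+1}) = log(mu_{i,j}) + alpha + beta * x_{i,j} + gamma_j
    mu[:, j + 1] = mu[:, j] * np.exp(alpha + beta * x[:, j] + gamma[j])

N = rng.poisson(mu)                  # observed counts N_{i,j} ~ Poisson(mu_{i,j})
print(N.shape)                       # (20, 15)
```

The sequential loop makes the fitting difficulty visible: each $\mu_{i,j+1}$ depends on the latent $\mu_{i,j}$, so the likelihood cannot be written as one pass over independent rows.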

Tomas
  • If your question is *only* how to do this in R, this question would be off-topic for CV (see our [help center](http://stats.stackexchange.com/help/on-topic)). A similar question, but w/ a reproducible example might be on-topic on SO; you could also try the r-help-listserv. If you have a question about the substantive statistical issues here, or can reform your Q in a software neutral way, please edit; if not, this Q may be closed. – gung - Reinstate Monica Mar 20 '14 at 13:14
  • This question appears to be off-topic because it is about how to use R. – gung - Reinstate Monica Mar 20 '14 at 13:16
  • 2
    @gung, OK, I've edited the question (but what about the other [5,688 R questions?](http://stats.stackexchange.com/questions/tagged/r)). This question definitely needs a statistician, not a pure R programmer. Only a statistician would know whether, for example, PESTS models can be used to fit this. You need to understand those models very well. – Tomas Mar 20 '14 at 13:26
  • Thanks, @Curious. You're right, it's a gray area. There are questions that include elements about R (or SAS, Stata, etc.), but if they are *only* "what is the code / function / library to run this", they are off-topic here. This has been a matter of much discussion on meta.CV & the consensus is evolving. There are plenty of questions tagged / taggable w/ `[r]` that remain on-topic here. – gung - Reinstate Monica Mar 20 '14 at 13:34
  • The BayesX software can fit glmms, based on reml methods. There is also an R interface R2BayesX. Also, you need a model for $\mu_{i, 0}$ to complete the structure. – probabilityislogic Mar 26 '14 at 20:39
  • @probabilityislogic I can write the model in JAGS or WinBUGS, but it takes too long to compute. Does BayesX do the same? How can something called BayesX be based on REML? I thought that was a frequentist method. PS: $\mu_{i,0}$ can be treated as a model parameter if necessary (i.e. it would get an uninformative prior in BUGS). – Tomas Mar 28 '14 at 12:32
  • BayesX, despite the name, also offers REML-based inference. But it does not offer the kind of AR-structure you've specified.... – fabians Mar 28 '14 at 13:10
  • Wouldn't it make more sense to have log(N_ij) as the offset term (i.e., the realized value of the population), not the expected value \mu_ij? Surely, what affects the subsequent time period is not the theoretical/latent quantity \mu but the size of the pop. that actually was present previously... If so, this would be a simple offset, e.g. something like `glmer(N ~ x + (1|year), offset = log(Nminus1))`. – fabians Mar 28 '14 at 13:10
  • where `Nminus1` means "`N` in the previous period". Say a disease kills most of the pop. in i=1 -- N_j2 won't depend on mu_j1 (i.e., a function of covariate x and the average level of the year), but on N_j1 (i.e., how many survived the disease in i=1). – fabians Mar 28 '14 at 13:23
  • @fabians - I was thinking along this line too, but I came to the conclusion that this is not the correct way to do it, for several reasons: 1) you wrote: *"what affects the subsequent time period is not the theoretical/latent quantity \mu but the size of the pop. that actually was present previously"* - well, you could say the same about the $\mu_{i,j+1}$ on the left side too - but this is how the model values $\mu$ work; they are theoretical model coefficients bound by the population model this way, and I think it's correct. This is how GLM models work in general. – Tomas Mar 28 '14 at 13:53
  • @fabians and 2) if you rewrite the model as $\log(\mu_{i,j+1}/\mu_{i,j}) = \alpha + \beta x_{i,j} + \gamma_j$, then it is clear you are modelling population growth, and it will perhaps make much more sense to you than if you wrote $\log(\mu_{i,j+1}/N_{i,j}) = ...$. And finally 3) how would you take $\log(N_{i,j})$ when $N_{i,j}$ is zero? This is the way GLM models work. – Tomas Mar 28 '14 at 13:57
  • @Curious: 1) the point is that it makes things vastly more complicated if you have a latent, unmeasurable quantity on the right-hand side of the regression equation. It's very different from the general GLM approach, which says the latent, unobservable EV is a function of measured covariates (and maybe latent group characteristics). 2) well, only the population that was actually present in the previous period can procreate, so no, I don't think that makes more sense. 3) so use log1p(N) and accept a small loss in precision. – fabians Mar 28 '14 at 14:13
  • I'm skeptical you will find any method able to fit the model as specified quickly & at scale, because by using the EV of the previous period as offset, you will have to estimate this in a sequential fashion... This will be very hard to do in non-Bayesian approaches, IMHO. In my experience with data analysis it's better to start with a simpler model -- have you ever tried the approach I suggested on your data? Don't try something complicated unless simpler alternatives no longer work (OK, unless you have strong theoretical reasons to favor a certain model, but it doesn't sound like you do). – fabians Mar 28 '14 at 14:16
  • @fabians I get your point about simplicity. However I don't agree on points 1, 2 - I insist my approach goes along the GLM line (see my equation in pt. 2 above) and it doesn't make sense to bring $N$ instead of $\mu$ into the equation, be it the right or left side. Well, but if you say "OK, your approach is theoretically correct, but don't be so strict and just go for the simple solution for practical reasons", I get your point! Maybe I complicate things too much, but I simply hate "solutions" like $\log(x+1)$. Why 1? Why not 0.5 or 0.01 or 0.0001? I think there's a lot of dirt in the simple solution. – Tomas Mar 28 '14 at 14:37
  • @fabians as an aside, the model I proposed is already used in the literature, [look at the citation here](http://stats.stackexchange.com/a/62543/5509). Unfortunately the transformation described there only works for a simpler model without the random effect $\gamma_{j}$. – Tomas Mar 28 '14 at 14:39
  • Whether you use $log(x+1)$ or 0.1 or 0.001 or whatever won't make much difference unless you have lots of very small $N$. I'm merely pointing out that actually fitting feasible, appropriate models to your data to see if they work and gradually building up complexity as needed as opposed to specifying a maximally complex design from the get-go is a much more promising strategy for real data analysis problems. – fabians Mar 28 '14 at 14:54
  • @fabians yes, for some species I have a lot of very small $N_{i,j}$ - like 0, 1, 2, 3, i.e. small numbers per site (low density) - yet the data are very reliable because the species are present at a large number of sites. Anyway, I get your point about gradually building complexity, but how do I know whether my simpler model is good enough? By comparing the simple model's result against the "perfect" Bayesian approach for several species? – Tomas Mar 28 '14 at 15:38
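For reference, the core of the offset approach suggested by fabians in the comments can be illustrated numerically. The sketch below is in Python rather than R, fits only the fixed effects ($\alpha$, $\beta$), and drops the `(1|year)` random effect for brevity, so it is a simplification of the suggested `glmer` call, not a replacement for it. All parameter values and sample sizes are made up for the demonstration, and `N_prev` is kept strictly positive so the plain `log` offset is defined.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate counts whose log-growth depends on a covariate (hypothetical values).
n = 500
x = rng.normal(size=n)
N_prev = rng.poisson(30, size=n) + 1        # previous-period counts (kept positive)
alpha_true, beta_true = 0.1, 0.4
mu = N_prev * np.exp(alpha_true + beta_true * x)
N = rng.poisson(mu)

# Design matrix with intercept (alpha) and covariate (beta); offset = log(N_prev).
X = np.column_stack([np.ones(n), x])
offset = np.log(N_prev)

# IRLS for a Poisson GLM with log link and an offset term.
b = np.zeros(2)
for _ in range(50):
    eta = offset + X @ b
    mu_hat = np.exp(eta)
    z = (eta - offset) + (N - mu_hat) / mu_hat     # working response
    W = mu_hat                                     # IRLS weights for Poisson/log
    b_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    if np.max(np.abs(b_new - b)) < 1e-10:
        b = b_new
        break
    b = b_new

print(b)   # estimates should land close to (alpha_true, beta_true)
```

This is exactly the conditional model $\log(\mu_{i,j+1}/N_{i,j}) = \alpha + \beta x_{i,j}$ that the comment thread debates: it is cheap to fit at scale, but it conditions on the observed $N_{i,j}$ rather than the latent $\mu_{i,j}$, which is the substantive disagreement above.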

0 Answers