
The data: Each year, from January to July, a certain "thing" was measured on a selection of plants. Within each month, some plants were measured almost every day and others only about weekly. This was done for 5 years, and each year a new selection of plants was measured (since the ones from the previous year were dead by then).

Question: How do I model the variation over time and potential correlation structures?

So far, my only idea is to use "Year" and "Month" as factors, e.g. "Year + Month + Year*Month" in my regression equation, with "plant" as a random effect.
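As a rough sketch of what the fixed-effects part of "Year + Month + Year*Month" amounts to (assuming R-style formula notation and treatment coding, neither of which is stated here): `Year*Month` already expands to `Year + Month + Year:Month`, so the separate main effects are implied. With 5 years and 7 months, the model has one parameter per Year-by-Month cell. The level names below are made up for illustration.

```python
from itertools import product

# Hypothetical level names: 5 years, months January to July.
years = [f"Y{i}" for i in range(1, 6)]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul"]


def design_columns(years, months):
    """Column names for Year + Month + Year:Month under treatment coding.

    The first level of each factor is the reference level and gets no column.
    """
    cols = ["Intercept"]
    cols += [f"Year[{y}]" for y in years[1:]]      # main effect of Year
    cols += [f"Month[{m}]" for m in months[1:]]    # main effect of Month
    cols += [
        f"Year[{y}]:Month[{m}]"                    # interaction terms
        for y, m in product(years[1:], months[1:])
    ]
    return cols


cols = design_columns(years, months)
n_cells = len(years) * len(months)
print(len(cols), n_cells)  # prints: 35 35
```

The parameter count equals the number of Year-by-Month cells, so the fixed part fits a separate mean for every cell. It says nothing about how the measurement changes from day to day within a month.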

But, is this too simple? What would you do?

Bono
  • In addition to the commenter's suggestion of ARIMAX, distributed lag models might also be worth checking – Carl PCH May 02 '17 at 14:45

1 Answer


I would use a time series model for this type of analysis. That is, I would treat the data you've gathered as an interrupted, cyclical series of measurements. Specifically, since you're trying to measure the effect of a certain, presumably non-time-series-based "thing", I would use ARIMAX, a time series model that can incorporate the effects of exogenous variables (the "X" in ARIMAX).

If you're trying to measure the difference between plant species, or between treatments of the same species (for example), you could use ARIMAX to estimate the impact of the species or treatment difference on growth rate. If you wanted to include other variables, such as precipitation and/or temperature, to control for their impact, ARIMAX allows that as well.
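To make the exogenous-variable idea concrete, here is a minimal sketch of the simplest case: an AR(1) with one exogenous regressor, i.e. ARIMAX(1,0,0), fitted by conditional least squares on noise-free synthetic data. It assumes equally spaced observations from a single series. The covariate `x`, the coefficient values, and the helper `fit_arx1` are all hypothetical. In practice a library routine such as statsmodels' `SARIMAX` with its `exog` argument would be the more usual choice (note that it treats the exogenous terms as a regression with ARIMA errors, a slightly different parameterisation).

```python
def fit_arx1(y, x):
    """Least-squares fit of y[t] = c + phi * y[t-1] + beta * x[t].

    Returns (c, phi, beta). Conditions on the first observation y[0].
    """
    rows = [(1.0, y[t - 1], x[t]) for t in range(1, len(y))]
    target = [y[t] for t in range(1, len(y))]
    k = 3
    # Augmented normal equations [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * v for r, v in zip(rows, target))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * w for u, w in zip(a[r], a[col])]
    return tuple(a[i][k] / a[i][i] for i in range(k))


# Synthetic, noise-free series generated from known (made-up) coefficients.
c_true, phi_true, beta_true = 1.0, 0.6, 2.0
x = [float(t % 3) for t in range(60)]  # hypothetical exogenous covariate
y = [0.0]
for t in range(1, 60):
    y.append(c_true + phi_true * y[-1] + beta_true * x[t])

c_hat, phi_hat, beta_hat = fit_arx1(y, x)
print(round(c_hat, 6), round(phi_hat, 6), round(beta_hat, 6))
```

Because the data contain no noise, the fit recovers the generating coefficients up to floating-point error. With real measurements the estimates would carry uncertainty, and gaps between seasons or irregular sampling would need to be handled before a model like this applies.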

Thomas Cleberg