
I would like to identify change points in a multivariate time series using the `ecp` package in R. My data represent a collection of documents that was divided into ten groups. There are ten variables (groups), 90 time units (weeks), and thousands of documents per group and time unit. I calculated the proportion of documents within each group for each of the 90 weeks. What I want to check is whether there are any time points at which the distribution of documents across the groups changed dramatically.

An example of the data:

    > head(df, 3)
        week     1     2     3     4     5     6     7     8     9    10
    1 week 1 0.168 0.185 0.092 0.099 0.056 0.075 0.070 0.071 0.087 0.097
    2 week 2 0.186 0.159 0.101 0.115 0.063 0.062 0.074 0.066 0.070 0.104
    3 week 3 0.183 0.149 0.078 0.107 0.058 0.069 0.079 0.086 0.093 0.098

I would be grateful for any help with the following questions:

  1. Is it appropriate to use change point analysis (and the `ecp` package in particular) for this kind of data?
  2. I would like to detect not only change points but also trend changes. Would it make sense to divide the data into segments based on the change points and then perform a multivariate Mann-Kendall test on each segment?
  3. Any other advice about possible solutions would be highly appreciated (I am not a statistician and am completely new to change point and time series analysis).
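
To make questions 1 and 2 concrete, here is a rough sketch of the workflow I have in mind. It is not a validated analysis. The data are simulated as a stand-in for my real proportions (the helper `sim_props`, the Dirichlet-style parameters, and the shift after week 45 are all made up for illustration), and `min.size = 10` is an arbitrary choice. For the trend step it uses Kendall's tau of each group's proportion against time within each segment, which is the univariate Mann-Kendall idea applied column by column rather than the multivariate test.

```r
library(ecp)

set.seed(1)

# Hypothetical helper: draw n rows of proportions that sum to 1
# (normalised gamma draws, i.e. a Dirichlet sample with parameters alpha).
sim_props <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

# Made-up group mixes before and after an assumed shift at week 46.
alpha_before <- 200 * c(0.17, 0.17, 0.09, 0.10, 0.06, 0.07, 0.07, 0.07, 0.09, 0.11)
alpha_after  <- 200 * c(0.10, 0.12, 0.14, 0.10, 0.10, 0.07, 0.07, 0.10, 0.09, 0.11)

# 90 weeks x 10 groups, standing in for the real data.
X <- rbind(sim_props(45, alpha_before), sim_props(45, alpha_after))

# Divisive change point estimation; min.size is the smallest allowed segment.
fit <- e.divisive(X, sig.lvl = 0.05, R = 199, min.size = 10)

fit$estimates  # segment boundaries, including 1 and nrow(X) + 1
fit$cluster    # segment label for each week

# Within each estimated segment, test each group for a monotonic trend:
# Kendall's tau between the proportion and the week index.
trend_by_segment <- lapply(split(seq_len(nrow(X)), fit$cluster), function(idx) {
  apply(X[idx, , drop = FALSE], 2, function(y) {
    cor.test(y, idx, method = "kendall")$p.value
  })
})

trend_by_segment  # one vector of 10 p-values per segment
```

With ten groups and several segments this produces many p-values, so some multiple-testing adjustment (for example `p.adjust`) would presumably be needed before reading anything into them.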
wpooh
  • +1 I have had good results applying `ecp` to comparable multivariate time series as short as 15-20 observations and would therefore expect it to be a useful tool for your problem. Since you have composition data, you might be able to help `ecp` out by means of a preliminary transformation like the [CLR or ILR](https://stats.stackexchange.com/questions/259208). – whuber Jun 09 '20 at 13:59
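
As a minimal sketch of the CLR (centered log-ratio) transformation mentioned in the comment, assuming all proportions are strictly positive: take logs and subtract each row's mean log. The function name `clr` and the matrix `props` are illustrative only; the three rows are the example weeks shown in the question. Zeros would need separate handling (for instance a small replacement value) before taking logs.

```r
# Rows are weeks, columns are the ten groups (values from the question).
props <- rbind(
  c(0.168, 0.185, 0.092, 0.099, 0.056, 0.075, 0.070, 0.071, 0.087, 0.097),
  c(0.186, 0.159, 0.101, 0.115, 0.063, 0.062, 0.074, 0.066, 0.070, 0.104),
  c(0.183, 0.149, 0.078, 0.107, 0.058, 0.069, 0.079, 0.086, 0.093, 0.098)
)

# Centered log-ratio: log of each part minus the mean log of its row.
# Assumes every entry of p is strictly positive.
clr <- function(p) {
  lp <- log(p)
  lp - rowMeans(lp)  # the row means recycle down the columns, one per row
}

Z <- clr(props)
rowSums(Z)  # zero for every row, up to floating-point error
```

The transformed matrix (here `Z`) would then be passed to `ecp::e.divisive` in place of the raw proportions.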
