Number of segments to divide a time-series

Question

Suppose we have a time series $X_t$ with the following decomposition:

$$X_t=\mu + \varepsilon_t,$$

where $\mu$ is the mean and $\varepsilon_t$ is the error term.

The model complexity increases if we divide this time series into several segments, say $k$, and fit the model above to each segment separately. As the model complexity increases, the approximation accuracy also increases. So I want to introduce a regularisation term that will help in deciding the number of segments $k$ into which the time series should be divided. The approximation error can be defined as

$$ \epsilon = \frac{1}{m} \sum_{i=1}^{m} (\mu_{1}-X_{i})^2+\frac{1}{n-m} \sum_{i=m+1}^{n} (\mu_{2}-X_{i})^2, $$

where I have divided the time series into 2 segments at observation $m$, $n$ is the number of observations, and $\mu_{1}, \mu_{2}$ are the respective segment means. The deviations are squared because the raw deviations from a segment mean always sum to zero. Now I want to find the optimal number of segments in general. Please note that I want to introduce a "regularisation" term that will help in deciding the optimal number of segments.
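
To make the request concrete, the kind of criterion I have in mind could be sketched as follows. The squared-error loss, the segment boundaries $\tau_j$ and the penalty weight $\lambda$ are my own assumptions for illustration, not an established formula:

$$ \hat{k} = \arg\min_{k}\; \min_{0=\tau_0<\tau_1<\dots<\tau_k=n} \left[ \sum_{j=1}^{k} \sum_{i=\tau_{j-1}+1}^{\tau_j} (X_i-\mu_j)^2 + \lambda k \right], $$

where $n$ is the number of observations, $\mu_j$ is the mean of the $j$-th segment, and $\lambda > 0$ controls how strongly additional segments are penalised.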

Although this question is stated quite differently, it appears to be identical to http://stats.stackexchange.com/q/2432 . — whuber, May 24 '11 at 16:16
@whuber, good catch, judging from the comments to my answer, this looks like what OP actually wants. — mpiktas, May 25 '11 at 07:00

score 2 · Answer 1 · edited Apr 13 '17 at 12:44


It seems that you have a change-point problem. Also look at the change-point tag for related questions on this site. For fitting these types of models, R, for example, has the packages segmented and strucchange. The relevant function for finding the optimal number of segments in the strucchange package is breakpoints. Here is a simple example:

library(strucchange)

e <- rnorm(100)  ## error term

## the mean: 10 for the first 20 observations,
## then 5 for the next 30 and 3 for the last 50
mu <- c(rep(10, 20), rep(5, 30), rep(3, 50))

## generate the time series x
x <- mu + e

> breakpoints(x ~ 1, data = data.frame(x = x))

     Optimal 3-segment partition: 

Call:
breakpoints.formula(formula = x ~ 1, data = data.frame(x = x))

Breakpoints at observation number:
20 50 

Corresponding to breakdates:
0.2 0.5
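
As far as I know, breakpoints picks the number of breaks by minimising BIC by default, which is one form of penalised criterion. To show how a penalty on the number of segments can select $k$, here is a minimal base-R sketch. It is not the strucchange implementation: it assumes a squared-error loss and a BIC-style penalty of two parameters per segment, and the function names make_cost, best_rss, backtrack and choose_k are hypothetical helpers made up for this illustration.

```r
## Minimal sketch (NOT the strucchange algorithm): choose the number of
## mean-shift segments by penalised least squares, using base R only.

## Returns cost(i, j): the RSS of fitting a constant mean to x[i..j],
## computed from cumulative sums (vectorised over i or j).
make_cost <- function(x) {
  s1 <- c(0, cumsum(x))
  s2 <- c(0, cumsum(x^2))
  function(i, j) {
    len <- j - i + 1
    sx <- s1[j + 1] - s1[i]
    (s2[j + 1] - s2[i]) - sx^2 / len
  }
}

## Dynamic programme: rss[k, t] is the minimal RSS of splitting x[1..t]
## into k segments of at least min_len observations each;
## last[k, t] is the start index of the final segment in that split.
best_rss <- function(x, max_k, min_len = 1) {
  n <- length(x)
  cost <- make_cost(x)
  rss <- matrix(Inf, max_k, n)
  last <- matrix(NA_integer_, max_k, n)
  rss[1, min_len:n] <- cost(1, min_len:n)
  for (k in seq_len(max_k)[-1]) {
    for (t in (k * min_len):n) {
      starts <- ((k - 1) * min_len + 1):(t - min_len + 1)
      cand <- rss[k - 1, starts - 1] + cost(starts, t)
      rss[k, t] <- min(cand)
      last[k, t] <- starts[which.min(cand)]
    }
  }
  list(rss = rss, last = last)
}

## Recover the last observation of every segment except the final one.
backtrack <- function(last, k, n) {
  ends <- integer(0)
  t <- n
  while (k > 1) {
    s <- last[k, t]
    ends <- c(s - 1L, ends)
    t <- s - 1L
    k <- k - 1
  }
  ends
}

## Pick k by a BIC-style criterion (an assumption of this sketch):
## n * log(RSS / n) plus log(n) for each of roughly 2 parameters per segment.
choose_k <- function(x, max_k = 5, min_len = 5) {
  n <- length(x)
  stopifnot(max_k * min_len <= n)
  fit <- best_rss(x, max_k, min_len)
  crit <- n * log(fit$rss[, n] / n) + log(n) * 2 * seq_len(max_k)
  k <- which.min(crit)
  list(k = k, breakpoints = backtrack(fit$last, k, n), criterion = crit)
}

## Usage on data simulated as in the example above
set.seed(1)
x <- c(rep(10, 20), rep(5, 30), rep(3, 50)) + rnorm(100)
res <- choose_k(x)
res$k            # selected number of segments
res$breakpoints  # last observation of each segment but the final one
```

The penalty term log(n) * 2 * k plays the role of the regularisation: without it the RSS can only decrease as k grows, so the largest k would always win.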


answered May 24 '11 at 11:45

mpiktas


Thanks for the reply. I want to understand the mathematical formula behind this and then maybe implement it. – Amit May 24 '11 at 11:56
@Amit, the reasoning behind this formula is not simple; you can get the references from the **strucchange** [vignette](http://cran.r-project.org/web/packages/strucchange/strucchange.pdf). Why do you want to implement it yourself? – mpiktas May 24 '11 at 12:19
I want to understand how, in general, regularisation can be applied to these kinds of problems. – Amit May 24 '11 at 13:32
@Amit, why do you call this regularisation? Do you have any references? – mpiktas May 24 '11 at 13:37
Regularisation in general helps to overcome the problem of overfitting. – Amit May 24 '11 at 14:40
