
Suppose we have a time series $X_t$ with the following decomposition:

$$X_t=\mu + \varepsilon_t,$$

where $\mu$ is the mean and $\varepsilon_t$ is the error term.

The model complexity increases if we divide this time series into some number of segments, say $k$, and repeat the above process for each segment. As the model complexity increases, the approximation accuracy also increases. So I want to introduce a regularisation term that will help decide the number of segments $k$ into which the time series should be divided. The error in approximation can be defined as

$$ \epsilon = \frac{1}{m} \sum_{i=1}^{m} (X_{i}-\mu_{1})^2+\frac{1}{n-m} \sum_{i=m+1}^{n} (X_{i}-\mu_{2})^2, $$

here I have divided the time series of length $n$ into 2 segments, the first $m$ observations and the remaining $n-m$, and $\mu_{1}, \mu_{2}$ are their respective means. Now I want to find the optimal number of segments in general. Note that I want to introduce a "regularisation" term here that will help decide the optimal number of segments.
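One common way to write such a regularised problem is penalised least squares. The following is a sketch, not taken from the question: the break points $0=\tau_0<\tau_1<\dots<\tau_k=n$ and the penalty weight $\lambda>0$ are notation introduced here, and $\lambda$ is a tuning parameter. Choose $k$ and the break points to minimise

$$\min_{k,\;\tau_1<\dots<\tau_{k-1}} \; \sum_{j=1}^{k}\sum_{i=\tau_{j-1}+1}^{\tau_j}\left(X_i-\hat\mu_j\right)^2 + \lambda k, \qquad \hat\mu_j=\frac{1}{\tau_j-\tau_{j-1}}\sum_{i=\tau_{j-1}+1}^{\tau_j}X_i .$$

The first term never increases as $k$ grows and can always be driven to zero by taking $k=n$, so without the penalty the fit overfits; the term $\lambda k$ charges for every extra segment. Taking $\lambda$ proportional to $\sigma^2\log n$ gives a criterion in the spirit of BIC.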

Amit
  • Although this question is stated quite differently, it appears to be identical to http://stats.stackexchange.com/q/2432. – whuber May 24 '11 at 16:16
  • @whuber, good catch; judging from the comments on my answer, this looks like what the OP actually wants. – mpiktas May 25 '11 at 07:00

1 Answer


It seems that you have a change-point problem. Also look at the change-point tag for related questions on this site. For fitting this type of model, R, for example, has the packages segmented and strucchange. The relevant function in strucchange for finding the optimal number of segments is `breakpoints`. Here is a simple example:

```r
library(strucchange)

e <- rnorm(100)  ## error term

## the mean: 10 for the first 20 observations,
## then 5 for the next 30 and 3 for the last 50
mu <- c(rep(10, 20), rep(5, 30), rep(3, 50))

## generate the time series x
x <- mu + e

breakpoints(x ~ 1, data = data.frame(x = x))
```

Output:

```
     Optimal 3-segment partition: 

Call:
breakpoints.formula(formula = x ~ 1, data = data.frame(x = x))

Breakpoints at observation number:
20 50 

Corresponding to breakdates:
0.2 0.5 
```
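To show the idea behind choosing the number of segments, here is a minimal base-R sketch of penalised least-squares segmentation by dynamic programming ("optimal partitioning"). It is not strucchange's implementation: the helper `segment_penalised`, its `beta` penalty per segment and its `min_size` argument are hypothetical names chosen for illustration, and `beta = 2 * log(n)` is a BIC-style choice that assumes the noise variance is 1.

```r
## Hypothetical helper (not part of strucchange): penalised least-squares
## segmentation by dynamic programming ("optimal partitioning").
## Minimises  sum of within-segment RSS + beta * (number of segments).
segment_penalised <- function(x, beta, min_size = 2) {
  n <- length(x)
  stopifnot(n >= min_size)
  cs  <- c(0, cumsum(x))    # cs[j + 1]  = x[1] + ... + x[j]
  cs2 <- c(0, cumsum(x^2))  # cs2[j + 1] = x[1]^2 + ... + x[j]^2

  # RSS of x[i..j] around its own mean, in O(1) via the cumulative sums
  seg_rss <- function(i, j) {
    s <- cs[j + 1] - cs[i]
    (cs2[j + 1] - cs2[i]) - s^2 / (j - i + 1)
  }

  best <- c(0, rep(Inf, n))  # best[t + 1] = minimal penalised cost of x[1..t]
  last <- integer(n)         # last[t] = end of the previous segment (0 = none)

  for (t in min_size:n) {
    # the last segment is x[(s + 1)..t]; s = 0 means it starts at x[1]
    cand <- c(0, if (t - min_size >= min_size) min_size:(t - min_size))
    for (s in cand) {
      cost <- best[s + 1] + seg_rss(s + 1, t) + beta
      if (cost < best[t + 1]) {
        best[t + 1] <- cost
        last[t] <- s
      }
    }
  }

  # walk back from n to recover the segment end points
  ends <- c()
  t <- n
  while (t > 0) {
    ends <- c(t, ends)
    t <- last[t]
  }
  list(breaks = ends[-length(ends)], n_segments = length(ends), cost = best[n + 1])
}

## usage on data simulated as in the example above
mu <- c(rep(10, 20), rep(5, 30), rep(3, 50))
x  <- mu + rnorm(100)
segment_penalised(x, beta = 2 * log(length(x)))$breaks  # typically near 20 and 50
```

Larger values of `beta` give fewer segments and `beta = 0` lets the fit split wherever that lowers the RSS, so `beta` plays the role of the regularisation weight.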
mpiktas
  • Thanks for the reply. I want to understand the mathematical formula behind this and then maybe implement it. – Amit May 24 '11 at 11:56
  • @Amit, the reasoning behind this formula is not simple; you can find the references in the **strucchange** [vignette](http://cran.r-project.org/web/packages/strucchange/strucchange.pdf). Why do you want to implement it yourself? – mpiktas May 24 '11 at 12:19
  • I want to understand how, in general, regularisation can be applied to these kinds of problems. – Amit May 24 '11 at 13:32
  • @Amit, why do you call this regularisation? Do you have any references? – mpiktas May 24 '11 at 13:37
  • Regularisation in general helps to overcome the problem of overfitting. – Amit May 24 '11 at 14:40
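To make the overfitting point concrete, here is a small base-R sketch using the two-segment squared-error idea from the question. The helpers `rss` and `best_split`, the seed, and the BIC-style penalty `2 * log(n)` (which assumes unit noise variance) are illustrative assumptions. On a series with constant mean, the best two-segment fit still has a lower RSS than one segment, so an unpenalised criterion always prefers to split; with a penalty per segment, the split is kept only if the RSS drop exceeds `beta`.

```r
## Hypothetical helpers for illustration: rss() and best_split().
rss <- function(v) sum((v - mean(v))^2)   # squared error around the segment mean

# Best single split: first segment x[1..m], second x[(m + 1)..n]
best_split <- function(x, min_size = 2) {
  n <- length(x)
  splits <- min_size:(n - min_size)
  rss2 <- vapply(splits, function(m) rss(x[1:m]) + rss(x[(m + 1):n]), numeric(1))
  list(m = splits[which.min(rss2)], rss = min(rss2))
}

set.seed(1)
x <- rnorm(100)   # constant mean: there is really only one segment

one <- rss(x)               # one segment
two <- best_split(x)$rss    # best two-segment fit
two <= one                  # TRUE for any data: splitting never increases RSS

# Penalised comparison: each segment costs beta (BIC-style, assuming sigma^2 = 1)
beta <- 2 * log(length(x))
split_wins <- (two + 2 * beta) < (one + beta)
split_wins   # split only if the RSS drop exceeds beta
```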