My first question on this excellent forum that has already helped me many times.

Here is my problem.

I want to estimate the effect of a large number (c. 900) of independent events that are distributed over 15,000 time intervals on a dependent variable. I have successfully fitted this model using dummy variables, one for each event, and gotten reasonable results (about one fifth are significant).

However, as each event has an effect over several periods (say 3), the model predicts poorly when events take place in close proximity to one another. Because I know the general shape of the delayed effect "curve" (for example: lag1 is the biggest, lag0 is half of lag1, and lag2 is 0.2 of lag1), I want to fit a restricted model that incorporates this information.

The only way I've managed to do this is by using nls in R. With just two events, the model would look like:

Response=b0+
(r[1]*b1)*lag0(event1)+
(r[2]*b1)*lag1(event1)+
(r[3]*b1)*lag2(event1)+
(r[1]*b2)*lag0(event2)+
(r[2]*b2)*lag1(event2)+
(r[3]*b2)*lag2(event2)+
[error term]

Where r is a vector with prior info on the relative shape of the delayed impacts, for example (0.5, 1, 0.2).

This seems possible to estimate in theory, but as I have 2,700+ variables and 900+ parameters, the optimizer can't find an optimum. This happens even when I use the OLS estimates from the model without lags as starting values for the optimization. (Incidentally, it takes about 15 minutes for the optimizer to give up.)

We never talked about restricted parameters in my econometrics classes, except for Koyck lags and such, and my maths skills are quite poor, so I don't even know if this could actually be fitted by a linear model or if I should use some sort of Lagrange estimator.

There might be other questions like this on this forum, but I don't even know what terms to search for, and I haven't found any help. There is a lot of info on models where, for example, b1+b2=q, but my restriction is of the form b1/b2=q.

If nothing else, it would be of great help just to know what this sort of model is called, so that I can search for help.

Here is the start of the actual formula. The first 15 terms are lags of the dependent variable. Then come the dummy variables (with 3 lags) for the events.

Response1~b1001*RespL1+b1002*RespL2+b1003*RespL3+b1004*RespL4+b1005*RespL5+b1006*RespL6+b1007*RespL7+b1008*RespL8+b1009*RespL9+b1010*RespL10+b1011*RespL11+b1012*RespL12+b1013*RespL13+b1014*RespL14+b1015*RespL15+b0003/r[1]*imp0003+b0003/r[2]*imp0003L1+b0003/r[3]*imp0003L2+b0003/r[4]*imp0003L3+b0004/r[1]*imp0004+b0004/r[2]*imp0004L1+b0004/r[3]*imp0004L2+b0004/r[4]*imp0004L3+b0005/r[1]*imp0005+b0005/r[2]     
kjetil b halvorsen
  • Hi Björn and welcome to submitting to CV :) It would be really helpful if you could give out a part of your data and show a minimal example with R code on what you are trying to do. – Gumeo Dec 10 '15 at 12:46
  • Regarding the constraint you have, you can reformulate it as $b_1-q\cdot b_2 = 0$. Then you have a linear constraint! Hope this might help. – Gumeo Dec 10 '15 at 12:47
  • Thanks Gumeo! I never thought about how you can reformulate it like that. Could you tell me if this means I can use something with higher performance than NLS or ML, which both iterate to find a global optimum? I will try to give an example. The actual model I'm working on is pretty commercial stuff, so I will have to redo it somewhat. – Björn Backgård Dec 10 '15 at 14:21
  • It would be nice to have a minimal working example, but you can look at this [answer](http://stackoverflow.com/questions/12452108/r-how-to-add-constraints-for-a-model-to-be-estimated-via-lm-or-nls) to get an idea for how you can add the constraint into your model using nls. – Gumeo Dec 10 '15 at 14:24
  • Actually, I did get the nls model working with constraints and all (3k something of them). But the model doesn't solve. Probably because of the very large number of independent variables. – Björn Backgård Dec 10 '15 at 15:19
  • It reads like you have a time series with interventions. I would look into literature regarding this and look for R implementations and in particular Bayesian approaches. – Roland Dec 10 '15 at 16:02

1 Answer

If the regression model is linear (or partly linear, like a GLM (generalized linear model)), so that $b_1 x_1+b_2 x_2$ is (part of) a linear predictor, you just write $\frac{b_1}{b_2}=q$ as $b_1 = q b_2$. Then that part of the linear predictor becomes $q b_2 x_1 + b_2 x_2 = b_2 (q x_1 + x_2)$, so you can eliminate $b_1$ from the model and introduce a new predictor $w = q x_1 + x_2$. If your model is non-linear, you should post the detailed model here and some similar trick could work out.
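
Applied to the lag structure in the question, this substitution would give one composite regressor per event, $z_k = r_1\,\text{lag0}(\text{event}_k) + r_2\,\text{lag1}(\text{event}_k) + r_3\,\text{lag2}(\text{event}_k)$, with a single coefficient $b_k$, so the model stays linear in the parameters. Below is a minimal sketch on simulated data, using only the Python standard library. The helper names `lagged`, `composite` and `ols`, the 200 intervals, the two event dates and the "true" coefficients are all assumptions made for illustration and are not taken from the actual data. In R, the same composite columns could presumably be passed straight to `lm`.

```python
import random

def lagged(x, k):
    """Shift a series k periods later, padding the start with zeros."""
    return [0.0] * k + list(x[:len(x) - k])

def composite(event, r):
    """Collapse the lags of one event dummy into a single regressor:
    z_t = r[0]*event_t + r[1]*event_{t-1} + r[2]*event_{t-2} + ..."""
    cols = [lagged(event, k) for k in range(len(r))]
    return [sum(rk * col[t] for rk, col in zip(r, cols))
            for t in range(len(event))]

def ols(columns, y):
    """OLS with intercept: solve the normal equations (X'X) b = X'y
    by Gaussian elimination with partial pivoting."""
    n = len(y)
    X = [[1.0] * n] + [list(c) for c in columns]  # stored column-wise
    p = len(X)
    # Augmented matrix [X'X | X'y]
    A = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(p)]
         + [sum(a * b for a, b in zip(X[i], y))] for i in range(p)]
    for i in range(p):
        piv = max(range(i, p), key=lambda k: abs(A[k][i]))
        A[i], A[piv] = A[piv], A[i]
        for k in range(i + 1, p):
            f = A[k][i] / A[i][i]
            A[k] = [akj - f * aij for akj, aij in zip(A[k], A[i])]
    beta = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (A[i][p] - tail) / A[i][i]
    return beta

# --- simulated example (all values are illustrative assumptions) ---
random.seed(0)
T = 200
r = (0.5, 1.0, 0.2)               # known relative shape of the lag curve
event1 = [1.0 if t == 50 else 0.0 for t in range(T)]
event2 = [1.0 if t == 51 else 0.0 for t in range(T)]  # overlaps event1

z1 = composite(event1, r)
z2 = composite(event2, r)

b0_true, b1_true, b2_true = 2.0, 3.0, -1.5
y = [b0_true + b1_true * z1[t] + b2_true * z2[t] + random.gauss(0.0, 0.05)
     for t in range(T)]

# One coefficient per event instead of one per event-lag pair
beta = ols([z1, z2], y)
print("intercept, b1, b2 estimates:", beta)
```

Because the shape vector `r` is absorbed into the regressors, the fit has one parameter per event (about 900 rather than 2,700 in the question) and needs no iterative optimizer or starting values.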

kjetil b halvorsen
  • Thanks Kjetil. I will look into how this could be done using GLM. But that would mean solving the model by ML, right? So it might still be difficult. I noticed that I had made an error in the formula. The lag distribution curve was inverted. Will try again and update. – Björn Backgård Dec 10 '15 at 17:22
  • No, that 'trick' is not dependent on estimation method! – kjetil b halvorsen Dec 10 '15 at 18:02