For a sales prediction task, I have two datasets: one contains the sales records, with each store occurring multiple times, and the other has extra information on each of the 1115 distinct stores. The second dataset has a variable called Promo2, which indicates whether the corresponding store is participating in a consecutive promo. Three more variables revolve around this information: two give the year and week in which a store started participating in Promo2, and one describes the consecutive intervals in which Promo2 is started.

The problem is that not all stores participate in Promo2, and for those that do not, the year, week, and interval are undefined and therefore missing. For those that do participate, the year and week variables take discrete values, whereas the interval variable is categorical with 3 levels.

The question is: how should I deal with predictors that are naturally missing for some records and have discrete observations otherwise?

My approach is to put 0s in the missing records of the year/week variables and to make dummies out of the interval variable.
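To make the approach concrete, here is a minimal pandas sketch of what I have in mind. Only `Promo2` is a real column name from my description; `Promo2SinceYear`, `Promo2SinceWeek`, `PromoInterval`, the interval labels `A`/`B`/`C`, and the toy values are placeholders for illustration.

```python
import numpy as np
import pandas as pd

# Toy store table. Column names other than Promo2 and all values are
# hypothetical placeholders, not taken from the real dataset.
store = pd.DataFrame({
    "Store": [1, 2, 3, 4, 5],
    "Promo2": [0, 1, 1, 0, 1],
    "Promo2SinceYear": [np.nan, 2010.0, 2011.0, np.nan, 2012.0],
    "Promo2SinceWeek": [np.nan, 13.0, 14.0, np.nan, 40.0],
    "PromoInterval": [None, "A", "B", None, "C"],
})


def encode_promo2(df: pd.DataFrame) -> pd.DataFrame:
    """Fill structurally missing year/week with 0 and dummy-code the interval."""
    out = df.copy()

    # Year/week are undefined for non-participating stores: fill with 0.
    for col in ["Promo2SinceYear", "Promo2SinceWeek"]:
        out[col] = out[col].fillna(0).astype(int)

    # One dummy column per interval level. Rows with a missing interval get
    # 0 in every dummy column (the default dummy_na=False behaviour).
    dummies = pd.get_dummies(out["PromoInterval"], prefix="PromoInterval", dtype=int)
    return pd.concat([out.drop(columns="PromoInterval"), dummies], axis=1)


encoded = encode_promo2(store)
print(encoded)
```

Here the `Promo2` indicator stays in the table, so the 0s in the year/week columns can be read as "not applicable" rather than as real values. Whether to keep that indicator alongside the nested variables is exactly the part I am unsure about.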

  • Thank you @Alexis. But I have a doubt. In his answer to [this question](https://stats.stackexchange.com/questions/372257/how-do-you-deal-with-nested-variables-in-a-regression-model), Ben says, "it is possible (and usually desirable) to remove the initial explanatory variable from the model altogether, and simply use the nested variable on its own". For a [similar question](https://stats.stackexchange.com/questions/6563/80-of-missing-data-in-a-single-variable), whuber says the indicator variable must be kept along with the nested variable. Which one makes more sense in my case? – Ritik P. Nayak Mar 02 '22 at 23:02
