
Say you have the following information for both the training and test data:

Response: $y$

Covariates: $X = (x_1, x_2, ..., x_p)$

However, you know that your covariates, $X$, are not enough to predict $y$ well. You know for sure that there are other covariates, $Z = (z_1, z_2, ..., z_m)$, whose inclusion would improve the prediction of $y$, but you simply cannot obtain the values of $Z$.

What prediction methods are good for this scenario?

The only method I've come across is Prediction with Missing Data via Bayesian Additive Regression Trees. I'm looking for more methods, since this one did not perform much better than linear regression or random forest on my data. Also, my data did not include any missing values.

193381

1 Answer


Short answer: if you have too few covariates to predict $y$ well, you need to gather more information.

The problem you are facing is omitted-variable bias. A question (with useful answers) about which variables to include can be found here. The paper you linked deals with missing values within a covariate, not with covariates that are missing entirely.
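To make the point concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration (a single $x$, a single unobserved $z$ correlated with $x$, a linear model with coefficients 1 and 2, Gaussian noise); it is not your data or the linked paper's method. It fits ordinary least squares with and without $z$ and compares the estimated coefficient on $x$ and the $R^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (illustrative assumption only):
# z is correlated with x and also drives y.
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)
y = 1.0 * x + 2.0 * z + rng.normal(size=n)


def ols(features, target):
    """Least-squares fit with an intercept; returns (coefficients, R^2)."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    r2 = 1.0 - resid.var() / target.var()
    return coef, r2


# Full model: both x and z are observed.
coef_full, r2_full = ols(np.column_stack([x, z]), y)

# Short model: z is unavailable, so only x is used.
coef_short, r2_short = ols(x, y)

print("coefficient on x, with z:   ", coef_full[1])
print("coefficient on x, without z:", coef_short[1])
print("R^2 with z:   ", r2_full)
print("R^2 without z:", r2_short)
```

Under these assumptions the coefficient on $x$ in the short model absorbs part of the effect of $z$ (in the population it is $1 + 2 \cdot 0.8 = 2.6$ rather than 1), and the $R^2$ drops because the part of $z$ that is unrelated to $x$ cannot be recovered from $x$ by any method.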

Qaswed