What are some prediction methods when you have missing covariate information?

Question

Say, you have the following information for training and test data:

Response: $y$

Covariates: $X = (x_1, x_2, ..., x_p)$

However, you know that your covariates, $X$, are not enough to predict $y$ well. You know for sure that there are other covariates, $Z = (z_1, z_2, ..., z_m)$, that would improve the prediction of $y$ if they were included. But you just can't get the information for $Z$.

What prediction methods are good for this scenario?

The only method I've come across is "Prediction with Missing Data via Bayesian Additive Regression Trees". I'm looking for more methods, since it did not perform much better than linear regression or random forest on my data. Also, my data did not contain any missing values.

It's not totally clear to me from your description but I guess we can assume you have partial information on the Z's? — Jorne Biccler, Jan 23 '17 at 10:46
No partial information. Just the knowledge that you don't have enough information to predict. — 193381, Jan 24 '17 at 01:30

score 1 · Answer 1 · answered Jun 26 '19 at 09:48

Short answer: if you have too few covariates to predict $y$ well, you need to gather more information.

The problem you are facing is omitted-variable bias. A question (with useful answers) about which variables to include can be found here. The paper you linked deals with missing values within a covariate, not with covariates that are missing entirely.
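To make this concrete, here is a minimal simulation sketch in plain Python. The data-generating process is purely hypothetical (one observed covariate `x`, one unobserved covariate `z` correlated with `x`, and coefficients I picked for illustration); it is not taken from your data. It compares a regression on `x` alone with a regression on both `x` and `z`.

```python
import random

random.seed(0)
n = 20000

# Hypothetical data-generating process (all numbers are illustrative assumptions):
#   z = 0.5 * x + u,   y = 1.0 * x + 2.0 * z + e,   with u, e ~ N(0, 1)
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.5 * xi + random.gauss(0, 1) for xi in x]
y = [1.0 * xi + 2.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

# Short model: least squares of y on x only (z is unavailable).
slope_short = cov(x, y) / cov(x, x)
icpt_short = mean(y) - slope_short * mean(x)
mse_short = mean([(yi - icpt_short - slope_short * xi) ** 2
                  for xi, yi in zip(x, y)])

# Full model: least squares of y on x and z via the 2x2 normal equations.
sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
sxy, szy = cov(x, y), cov(z, y)
det = sxx * szz - sxz ** 2
bx = (szz * sxy - sxz * szy) / det
bz = (sxx * szy - sxz * sxy) / det
icpt_full = mean(y) - bx * mean(x) - bz * mean(z)
mse_full = mean([(yi - icpt_full - bx * xi - bz * zi) ** 2
                 for xi, zi, yi in zip(x, z, y)])

print(f"short model: slope on x = {slope_short:.2f}, MSE = {mse_short:.2f}")
print(f"full model:  slope on x = {bx:.2f}, slope on z = {bz:.2f}, MSE = {mse_full:.2f}")
```

Under these assumptions the short model's slope on `x` lands near 2 rather than the true 1, because it absorbs the part of `z` that is correlated with `x`, and its error variance is close to 5 rather than 1. The part of `z` that is unrelated to `x` cannot be recovered from `x` by any method, which is why gathering more information is the only real fix.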
