I built a model (with VW, log loss) on a base set of predictors (p in the thousands). It did not predict well.

I added set A of predictors (p ≈ 5), and performance improved immensely.

I added set B of predictors (p in the thousands) without set A, and the result was only slightly better than the base model.

I then fit a model on the base predictors + A + B, and it performed terribly, much worse than even the base model. In most of the models the coefficients range from about -3.5 to 3.5. In the model with all the predictors, there is only one negative coefficient (-0.93); the rest are positive, ranging up to 8.0.

I suspect that collinearity is the culprit. How should I test whether groups of predictors are collinear?
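
To make the question concrete, here is a minimal sketch of the kind of group-level check I have in mind. It is not something VW provides; it assumes the predictors can be exported as dense NumPy arrays, and the names `group_r2`, `standardize`, and the synthetic `base`, `B`, `A_indep`, `A_collinear` matrices are hypothetical. The idea is to regress each column of one group on all columns of the other groups: an R² near 1 (a very large variance inflation factor) means that column is almost a linear combination of the others. The condition number of the standardized, stacked matrix is a second, cruder signal.

```python
import numpy as np

def group_r2(target, others):
    """R^2 of each column of `target` regressed (with intercept) on `others`.

    Assumes no column of `target` is constant (otherwise ss_tot is zero).
    """
    target = np.asarray(target, dtype=float)
    others = np.asarray(others, dtype=float)
    # Centering both sides is equivalent to fitting an intercept.
    t = target - target.mean(axis=0)
    o = others - others.mean(axis=0)
    # lstsq uses an SVD, so it tolerates rank-deficient `others`.
    coef, *_ = np.linalg.lstsq(o, t, rcond=None)
    resid = t - o @ coef
    ss_res = (resid ** 2).sum(axis=0)
    ss_tot = (t ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

def standardize(X):
    """Center each column and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Synthetic stand-ins for the real predictor groups (assumption: dense data).
rng = np.random.default_rng(0)
n = 500
base = rng.normal(size=(n, 20))
B = rng.normal(size=(n, 30))

# Group A, case 1: unrelated to the other predictors.
A_indep = rng.normal(size=(n, 2))
# Group A, case 2: almost an exact linear combination of columns of B.
A_collinear = B[:, :2] + B[:, 2:4] + 0.01 * rng.normal(size=(n, 2))

others = np.hstack([base, B])
r2_indep = group_r2(A_indep, others)
r2_coll = group_r2(A_collinear, others)

# Variance inflation factor for each column of A given the other groups.
vif_indep = 1.0 / (1.0 - r2_indep)
vif_coll = 1.0 / (1.0 - r2_coll)

# Condition number of the full standardized design matrix.
cond_indep = np.linalg.cond(standardize(np.hstack([others, A_indep])))
cond_coll = np.linalg.cond(standardize(np.hstack([others, A_collinear])))

print("R^2 (independent A):", r2_indep)
print("R^2 (collinear A):  ", r2_coll)
print("condition numbers:  ", cond_indep, cond_coll)
```

With thousands of sparse columns a dense least-squares solve may be too expensive, so in practice I would probably only regress the ~5 columns of A on a sample of rows, or on a reduced set of columns from base and B.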

jarfa