I fit a model (with VW, log loss) on a set of base predictors (p in the thousands). It did not predict well.
I added set A of predictors (p ≈ 5) to the base set, and the model improved immensely.
I added set B of predictors (p in the thousands) to the base set, without set A, and the model was only slightly better than the base model.
I then fit a model on the base predictors + A + B, and it performed terribly, much worse than even the base model. In most of the models the coefficients range from about -3.5 to 3.5. In the model with all the predictors, only one coefficient is negative (-0.93); the rest are positive, ranging up to 8.0.
I suspect that collinearity is the culprit. How should I test whether groups of predictors are collinear?
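
To make the question concrete, here is a minimal sketch of the kind of group-level check I have in mind. It is not anything VW provides. It uses plain NumPy on a dense matrix, and the function names (`group_r2`, `condition_number`) and the synthetic data are hypothetical, made up for illustration. The idea is to regress each column of one group (say A) on all columns of another group (say base + B): an R^2 near 1 means that column is almost a linear combination of the other group. The condition number of the standardized combined matrix gives a second, cruder signal. With thousands of sparse predictors this dense approach may be too slow or memory-hungry, so treat it as a sketch rather than a recipe.

```python
import numpy as np

def group_r2(target, others):
    """R^2 from regressing each column of `target` on all columns of `others`.

    Assumes dense arrays with more rows than columns and no constant
    columns in `target` (a constant column would give a zero denominator).
    """
    target = np.asarray(target, dtype=float)
    others = np.asarray(others, dtype=float)
    # Intercept column, so R^2 is measured around each target column's mean.
    design = np.column_stack([np.ones(others.shape[0]), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    ss_res = (resid ** 2).sum(axis=0)
    ss_tot = ((target - target.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

def condition_number(X):
    """Ratio of largest to smallest singular value of the standardized matrix.

    Assumes no constant columns (their standard deviation would be zero).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    s = np.linalg.svd(Z, compute_uv=False)
    return s[0] / s[-1]

# Synthetic illustration (made-up data, not my real predictors).
rng = np.random.default_rng(0)
n = 500
B = rng.normal(size=(n, 20))
# Two columns that are nearly exact linear combinations of columns of B.
A_collinear = B[:, :2] + B[:, 2:4] + 0.01 * rng.normal(size=(n, 2))
# Two columns generated independently of B.
A_independent = rng.normal(size=(n, 2))

r2_collinear = group_r2(A_collinear, B)
r2_independent = group_r2(A_independent, B)
cond_B = condition_number(B)
cond_all = condition_number(np.column_stack([B, A_collinear]))

print("R^2, collinear group:  ", r2_collinear)
print("R^2, independent group:", r2_independent)
print("condition number, B only:", cond_B)
print("condition number, B + collinear A:", cond_all)
```

The per-column R^2 is the same quantity that enters a variance inflation factor, VIF = 1 / (1 - R^2), so a very large VIF for columns of one group regressed on another would point at collinearity between the groups.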