I am currently doing an analysis for my Master's thesis and have encountered some results I cannot explain.
In my paper, I am trying to explore which factors determine whether or not people joined a local energy initiative. Since I have a lot of different variables, my instructor suggested a model-building approach: concretely, I add sets of predictors to my logistic regression and keep only those that are significant in the model before adding the next set. To assess model fit, I was told to use classification tables.
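To make the procedure concrete, here is a minimal sketch of one step (Python/statsmodels, with hypothetical column names such as "joined" and hypothetical predictor lists; not my actual code):

```python
import statsmodels.api as sm

def fit_logit(df, predictors, outcome="joined"):
    """Fit a logistic regression of `outcome` on `predictors`, with an intercept."""
    X = sm.add_constant(df[predictors])          # df: pandas DataFrame, one row per case
    return sm.Logit(df[outcome], X).fit(disp=0)

def keep_significant(model, alpha=0.05):
    """Return the names of predictors whose Wald p-value is below alpha (intercept excluded)."""
    pvals = model.pvalues.drop("const")
    return list(pvals[pvals < alpha].index)

# Step 1: neighbourhood dummies only.
# m1 = fit_logit(df, neighbourhood_dummies)
# Step 2: add the next set, then retain only the significant predictors.
# m2_full = fit_logit(df, neighbourhood_dummies + individual_vars)
# kept = keep_significant(m2_full)
# m2 = fit_logit(df, neighbourhood_dummies + kept)
```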
My problem now is the following:
I start with a set of dummies to control for participants coming from different neighbourhoods. This basic model classifies 56% of cases correctly. Then I add the second set of predictors; some of them are significant, so I keep those in the model. But when I look at the classification table again, the classification has gotten worse, even worse than chance (48%).
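For reference, by "classification table" I mean the usual cross-tabulation of observed versus predicted membership at a 0.5 cutoff. A sketch of how I get the percentage correct (hypothetical names, continuing the snippet above; not my actual code):

```python
import numpy as np
import pandas as pd

def classification_table(model, X, y, cutoff=0.5):
    """Cross-tabulate observed vs. predicted membership at `cutoff` and
    return the table plus the percentage of cases classified correctly."""
    pred = (np.asarray(model.predict(X)) >= cutoff).astype(int)
    table = pd.crosstab(np.asarray(y), pred, rownames=["observed"], colnames=["predicted"])
    pct_correct = float((pred == np.asarray(y)).mean()) * 100
    return table, pct_correct

# Hypothetical use (X must include the constant column, as in the fitting step):
# table1, pct1 = classification_table(m1, sm.add_constant(df[neighbourhood_dummies]), df["joined"])
# table2, pct2 = classification_table(m2, sm.add_constant(df[neighbourhood_dummies + kept]), df["joined"])
```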
How can I find significant predictors and yet end up with a model that classifies worse than chance?
EDIT FOR ADDITIONAL INFO:
My dataset consists of 636 cases: 318 are participants in the initiative and 318 are not. The sets of variables I use are structured as follows:
1) "Control": People come from 30 different neighbourhoods, so I added 29 dummy variables to control for differences due to neighbourhood membership (not the best approach, I know, but I´m just following orders on this one)
2) Individual predictors: 15 demographic and psychological variables
3) Assessment of group predictors: 8 variables that measure how individuals perceive the group of potential participants
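Regarding 1): with 30 neighbourhoods, 29 dummies correspond to standard reference coding. A toy sketch of what that looks like (not my actual code or data):

```python
import pandas as pd

# Toy example of reference-coding a neighbourhood variable into k-1 dummies.
df = pd.DataFrame({"neighbourhood": ["A", "B", "C", "A", "C"]})
dummies = pd.get_dummies(df["neighbourhood"], prefix="nbhd", drop_first=True)  # "A" becomes the reference level
df = pd.concat([df, dummies], axis=1)
neighbourhood_dummies = list(dummies.columns)  # ["nbhd_B", "nbhd_C"]; with 30 neighbourhoods this gives 29 dummies
```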
I computed the classification tables on the same data that I used for building the model. Unfortunately, I only have this one dataset, and I'm trying to figure out which predictors are most promising for future (causal) research.