Is it appropriate to split 100+ variables into three groups, run each group through its own decision tree, and then feed the newly created features into separate logistic models (one per group) to determine the most significant features for use in a final model? An example outcome is the likelihood that you or your girlfriend will buy something at either of two stores.
Further, after running the first three logistic models and collecting all of the significant features from them, is it correct to then run those features through three more logistic models that have somewhat different binary events? An example is: (a) you or your girlfriend buys something at the first store, (b) you buy something at the second store, (c) your girlfriend buys something at the second store. It's important to note that the first store has different attributes from the second store: it is much smaller, in a different location, etc.
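Does this method of variable selection introduce bias, and could it lead to overfitting? It seems incorrect to me. I suspect omitted variable bias is one issue that can arise, since variables screened out in one group never get evaluated alongside variables from the other groups.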
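To make the overfitting concern concrete, here is a minimal simulation sketch. It is not the pipeline described above: as a simplifying assumption it replaces the decision-tree and logistic steps with plain correlation screening, and all names (`simulate`, `n_keep`, and so on) are hypothetical. The data are pure noise, so any feature that "survives" screening does so by chance, and the sketch compares how strong the kept features look on the data used to select them versus on held-out rows.

```python
import random
import statistics


def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def simulate(n_rows=200, n_features=100, n_keep=5, seed=0):
    """Screen noise features on one half of the data, then re-check on the other."""
    rng = random.Random(seed)
    # Pure noise: by construction, no feature is related to the binary outcome.
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    y = [rng.randint(0, 1) for _ in range(n_rows)]

    half = n_rows // 2
    train_rows, test_rows = range(half), range(half, n_rows)

    def col_corr(j, rows):
        return correlation([X[i][j] for i in rows], [y[i] for i in rows])

    # Screening step (stand-in for "keep the significant features").
    train_corr = {j: col_corr(j, train_rows) for j in range(n_features)}
    kept = sorted(train_corr, key=lambda j: abs(train_corr[j]), reverse=True)[:n_keep]

    # Average strength of the kept features, in-sample versus held-out.
    in_sample = statistics.fmean(abs(train_corr[j]) for j in kept)
    out_sample = statistics.fmean(abs(col_corr(j, test_rows)) for j in kept)
    return in_sample, out_sample


if __name__ == "__main__":
    in_s, out_s = simulate()
    print(f"mean |corr| of kept features, selection data: {in_s:.3f}")
    print(f"mean |corr| of kept features, held-out data:  {out_s:.3f}")
```

If the in-sample figure comes out noticeably larger than the held-out one, that gap is selection bias alone, since no real signal exists here. My worry is that reusing the same rows across several screening stages would compound this effect.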