Are there any techniques for dimensionality reduction with multiple groups of highly correlated variables and variable-specific nonlinear interactions? Specific details are below.
I am mainly interested in efficiently identifying the variables that participate in interactions, since those are the ones that need to be selected.
Here is a relevant example:
- there are 15 groups of variables
- each group has 10 variables
- the variables in each group are highly correlated with each other (.75-.9 per pair)
- the variables in any given group are not significantly correlated with a variable from a different group (the groups of variables are independent)
- In 8 of the groups (for example), one variable interacts with a variable outside its group, and that interaction has a significant impact on the dependent variable. (The other variables in such a group aren't needed.)
- Some of the groups without an interacting member are still relevant to the model, and the best variable from each of these groups can be effectively selected by running a univariate random forest (see the sketch after this list).
- There are 15-20 additional variables that may contribute to the model. These are not part of any correlated group and have low correlations with the other variables.
The variables are all real numbers.
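For concreteness, here is a minimal simulation sketch of the kind of structure described above, followed by the within-group univariate selection mentioned in the list. The sample size, noise scale, number of ungrouped variables, and which variables interact are all my own hypothetical choices, not part of the real problem:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000                                    # hypothetical sample size
n_groups, group_size = 15, 10

# Each group is driven by its own latent factor plus noise, giving
# within-group pairwise correlations of roughly 0.86 (inside the stated
# .75-.9 range) and independence across groups.
groups = []
for _ in range(n_groups):
    latent = rng.normal(size=n)
    groups.append(np.column_stack(
        [latent + rng.normal(scale=0.4, size=n) for _ in range(group_size)]
    ))

X_extra = rng.normal(size=(n, 18))          # the ~15-20 ungrouped variables

# Dependent variable: cross-group interactions between the first member of
# eight of the groups (paired up), main effects for two other groups and one
# ungrouped variable, plus noise.
y = np.zeros(n)
for a, b in [(0, 1), (2, 3), (4, 5), (6, 7)]:
    y += groups[a][:, 0] * groups[b][:, 0]
y += groups[9][:, 0] + groups[10][:, 0] + X_extra[:, 0]
y += 0.5 * rng.normal(size=n)

X = np.hstack(groups + [X_extra])           # 150 grouped + 18 ungrouped columns


def best_in_group(block, y):
    """Pick the column whose single-feature random forest best predicts y."""
    scores = [
        cross_val_score(
            RandomForestRegressor(n_estimators=100, random_state=0),
            block[:, [j]], y, cv=3, scoring="r2",
        ).mean()
        for j in range(block.shape[1])
    ]
    return int(np.argmax(scores))


# Works for the main-effect groups (9 and 10 here); for the purely
# interacting groups the univariate signal is essentially absent, which is
# exactly the difficulty.
best = {g: best_in_group(block, y) for g, block in enumerate(groups)}
```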
The above is a simplification, but since I'm looking for a practical solution, it is adequate. In reality there could also be 3-way interactions, and a group could have a much less important 2nd or even 3rd interacting variable (adding the same "real" information, so it would be redundant).
Random forests are highly effective in this application when the correct variables are known.
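As a reference point for that claim, and continuing the simulated data from the sketch above, a random forest fit only on the "correct" columns (known here only because the data are simulated) can be checked with cross-validation:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# The interacting member of groups 0-7 (columns 0, 10, ..., 70), the relevant
# member of groups 9 and 10 (columns 90 and 100), and the one relevant
# ungrouped variable (column 150).
true_cols = [g * 10 for g in range(8)] + [90, 100, 150]

rf = RandomForestRegressor(n_estimators=500, random_state=0)
print("CV R^2 on the correct variables:",
      cross_val_score(rf, X[:, true_cols], y, cv=5, scoring="r2").mean())
```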
Is there a way, other than an exhaustive search (prohibitive computation time), to identify the variables that participate in interactions? I'm primarily interested in the 2-way case, but 3-way would be useful as well.