Principal components regression is one well-respected way to handle collinear predictors, and my sense is that it is related to entropy maximization. I don't have time right now to think this through, and I would welcome comments (or other answers here) that rigorously consider principal components regression in the context of entropy maximization.
The first principal component is the linear combination of predictors that captures the maximum amount of variance; the second is the linear combination orthogonal to the first component that captures the most remaining variance, and so on. (This is best done with predictors normalized to zero mean and unit variance.) As a result, collinear predictors tend to group together into the same principal component. Then the (mutually orthogonal) principal components rather than the original predictor variables are used as the independent variables for the regression. Often only a subset of principal components is included in the final model.
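As a rough illustration, here is a minimal sketch of this in R with simulated collinear data (the simulated variables and the choice of two retained components are purely illustrative, not a recommendation):

```r
## Minimal sketch of principal components regression on simulated collinear data
## (variable names and the number of retained components are illustrative only).
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(n)
y  <- 2 * x1 + x3 + rnorm(n)

X  <- scale(cbind(x1, x2, x3))  # zero mean, unit variance
pc <- princomp(X)               # principal components of the predictors
summary(pc)                     # proportion of variance captured by each component

scores <- pc$scores[, 1:2]      # keep, say, the first two components
fit    <- lm(y ~ scores)        # regress on the (mutually orthogonal) scores
summary(fit)
```

Because the retained scores are orthogonal, their regression coefficients do not suffer from the variance inflation that the collinear x1 and x2 would produce in an ordinary regression.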
You can take this a step further and perform ridge regression, in which each principal component receives a continuous weight (rather than the all-or-none inclusion above). This helps mitigate problems from collinearity and overfitting without discarding any information from the predictors.
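One way to see the "different weights" point: writing the scaled predictor matrix as X = U D V', ridge regression shrinks the fitted values along the j-th principal component by the factor d_j^2 / (d_j^2 + lambda), so low-variance (collinear) directions are damped most but never dropped outright. A small illustrative sketch (simulated data and the value of lambda are arbitrary):

```r
## Illustrative only: per-component ridge shrinkage factors d_j^2 / (d_j^2 + lambda).
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(n)
X  <- scale(cbind(x1, x2, x3))

d      <- svd(X)$d              # singular values of the scaled predictor matrix
lambda <- 5                     # arbitrary example value
round(d^2 / (d^2 + lambda), 3)  # weight given to each principal component
```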
There are R functions for performing these calculations: the princomp function in the stats package and lm.ridge in the MASS package are both included in the standard R distribution.
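For instance, a sketch of how lm.ridge might be called (simulated data again; the lambda grid is arbitrary and would normally be chosen by cross-validation or by the criteria that select() reports):

```r
## Sketch of ridge regression with MASS::lm.ridge (illustrative lambda grid).
library(MASS)
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(n)
y  <- 2 * x1 + x3 + rnorm(n)

fit <- lm.ridge(y ~ x1 + x2 + x3, lambda = seq(0, 10, by = 0.1))
select(fit)   # lambda suggested by the HKB, L-W, and GCV criteria
plot(fit)     # coefficient traces as lambda increases
```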
This paper presents a robust extension of principal components regression based on a maximum correntropy criterion.