Principal components regression is one well-respected way to handle collinear predictors, and my sense is that it is related to entropy maximization. I don't have time right now to think this through, and I would welcome comments (or other answers here) that rigorously consider principal components regression in the context of entropy maximization.
The first principal component is the linear combination of predictors that captures the maximum amount of variance; the second is the linear combination orthogonal to the first component that captures the most remaining variance, and so on. (This is best done with predictors normalized to zero mean and unit variance.) As a result, collinear predictors tend to group together into the same principal component. Then the (mutually orthogonal) principal components rather than the original predictor variables are used as the independent variables for the regression. Often only a subset of principal components is included in the final model.
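As a rough illustration, here is a minimal sketch of this in R with simulated collinear data (the simulated variables and the choice of two retained components are purely illustrative, not a recommendation):

```r
## Minimal sketch of principal components regression on simulated collinear data
## (variable names and the number of retained components are illustrative only).
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(n)
y  <- 2 * x1 + x3 + rnorm(n)

X  <- scale(cbind(x1, x2, x3))  # zero mean, unit variance
pc <- princomp(X)               # principal components of the predictors
summary(pc)                     # proportion of variance captured by each component

scores <- pc$scores[, 1:2]      # keep, say, the first two components
fit    <- lm(y ~ scores)        # regress on the (mutually orthogonal) scores
summary(fit)
```

Because the retained scores are orthogonal, their regression coefficients do not suffer from the variance inflation that the collinear x1 and x2 would produce in an ordinary regression.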
You can take this a step further and perform ridge regression, in which each principal component receives a continuous weight (rather than the all-or-none inclusion above). This helps mitigate problems from collinearity and overfitting without discarding any information from the predictors.
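One way to see the "different weights" point: writing the scaled predictor matrix as X = U D V', ridge regression shrinks the fitted values along the j-th principal component by the factor d_j^2 / (d_j^2 + lambda), so low-variance (collinear) directions are damped most but never dropped outright. A small illustrative sketch (simulated data and the value of lambda are arbitrary):

```r
## Illustrative only: per-component ridge shrinkage factors d_j^2 / (d_j^2 + lambda).
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(n)
X  <- scale(cbind(x1, x2, x3))

d      <- svd(X)$d              # singular values of the scaled predictor matrix
lambda <- 5                     # arbitrary example value
round(d^2 / (d^2 + lambda), 3)  # weight given to each principal component
```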
There are R functions for performing these calculations: the princomp function in the stats package and lm.ridge in the MASS package are both included in the standard R distribution.
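For instance, a sketch of how lm.ridge might be called (simulated data again; the lambda grid is arbitrary and would normally be chosen by cross-validation or by the criteria that select() reports):

```r
## Sketch of ridge regression with MASS::lm.ridge (illustrative lambda grid).
library(MASS)
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(n)
y  <- 2 * x1 + x3 + rnorm(n)

fit <- lm.ridge(y ~ x1 + x2 + x3, lambda = seq(0, 10, by = 0.1))
select(fit)   # lambda suggested by the HKB, L-W, and GCV criteria
plot(fit)     # coefficient traces as lambda increases
```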
This paper presents a robust extension of principal components regression based on a maximum correntropy criterion.