
Question:

When we say the correlation is 1 in both Ridge and Elastic Net, does it only mean $x_1 = x_2$?

Story:

Ridge tends to allocate similar coefficients to highly correlated features,

which is mentioned here:

Why Lasso or ElasticNet perform better than Ridge when the features are correlated

Recently I saw a similar result relating Elastic Net to LASSO on Wikipedia:

https://en.wikipedia.org/wiki/Lasso_(statistics)#cite_note-Zou_2005-5

Elastic Net can absorb its Ridge part into an ordinary least-squares problem and then reduce to an equivalent LASSO. I believe the result above must be related to Ridge.
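The step that absorbs the Ridge part into OLS is a data-augmentation trick (Zou & Hastie, 2005). A minimal sketch of my own (not code from the post) showing that the ridge penalty is exactly equivalent to ordinary least squares on an augmented design:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(10, 5)
y = rng.randn(10)
lam = 0.7  # ridge penalty strength (arbitrary choice for illustration)

# Closed-form ridge solution: (X'X + lam*I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

# Plain OLS on the augmented data [X; sqrt(lam)*I], [y; 0]
# gives the same minimizer, because the extra rows contribute
# exactly the penalty term lam * ||beta||^2 to the squared loss.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(5)])
y_aug = np.concatenate([y, np.zeros(5)])
beta_ols, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

print(np.allclose(beta_ridge, beta_ols))  # True
```

Applying the same augmentation inside the Elastic Net objective leaves only the $L_1$ penalty, which is why it becomes a LASSO on the augmented data.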

Analysis:

Intuitively, if $x_1=x_2,$ their correlation is 1. Then $$\beta_1x_1 + \beta_2x_2=(\beta_1+\beta_2)x_1.$$

Assume the sum is constant: $\beta_1+\beta_2 = C.$ To minimize the $L_2$-norm $\beta_1^2+\beta_2^2,$ the optimal solution is $$\beta_1 = \beta_2.$$ This deduction is also used to partially prove the sparsity of LASSO on correlated features.

However, when $x_2=2x_1,$ their correlation is still 1. The fit then becomes $(\beta_1 + 2\beta_2)x_1,$ so with $\beta_1 + 2\beta_2 = C$ fixed, the $L_2$-optimal solution should be $$\beta_2 = 2\beta_1.$$
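Both claims can be checked numerically. The sketch below (my own check, not part of the original derivation) minimizes $\beta_1^2+\beta_2^2$ under the linear constraint $a\beta_1 + b\beta_2 = C$ with `scipy.optimize.minimize`:

```python
import numpy as np
from scipy.optimize import minimize

def min_l2_on_line(a, b, C):
    # Minimize beta1^2 + beta2^2 subject to a*beta1 + b*beta2 = C.
    cons = {'type': 'eq', 'fun': lambda z: a * z[0] + b * z[1] - C}
    res = minimize(lambda z: z[0] ** 2 + z[1] ** 2, x0=[1.0, 1.0],
                   constraints=[cons])
    return res.x

# x_1 = x_2 case: the fit is (beta1 + beta2)*x_1, so a = b = 1.
print(min_l2_on_line(1.0, 1.0, 3.0))  # ~ [1.5, 1.5], i.e. beta1 = beta2

# x_2 = 2*x_1 case: the fit is (beta1 + 2*beta2)*x_1, so a = 1, b = 2.
print(min_l2_on_line(1.0, 2.0, 5.0))  # ~ [1.0, 2.0], i.e. beta2 = 2*beta1
```

In general the minimizer is $\beta = \frac{C}{a^2+b^2}(a, b),$ i.e. the coefficients are split in proportion to the column scales, not equally.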

So when we say the correlation is 1 in both Ridge and Elastic Net, does it only mean $x_1 = x_2$?

Test:

I tested the example in scikit-learn. The result is consistent with my conclusion.

from sklearn.linear_model import Ridge
import numpy as np

n_samples, n_features = 10, 5
rng = np.random.RandomState(0)
y = rng.randn(n_samples)
X = rng.randn(n_samples, n_features)

# Case 1: identical columns
X[:, 1] = X[:, 0]
print('X[1] = X[0]:')
for x in range(10):
    clf = Ridge(alpha=x * 0.1, fit_intercept=False)
    clf.fit(X, y)
    print(clf)
    print(clf.coef_[0:2])

# Case 2: proportional columns, correlation still 1
X[:, 1] = 2 * X[:, 0]
print('X[1] = 2*X[0]:')
for x in range(10):
    clf = Ridge(alpha=x * 0.1, fit_intercept=False)
    clf.fit(X, y)
    print(clf)
    print(clf.coef_[0:2])

Here are the results. They seem to confirm the conclusion.

If $x_0=x_1,$ we have $\beta_0 = \beta_1:$

Ridge(alpha=0.0, fit_intercept=False)
[0.03862135 0.03862135]
Ridge(alpha=0.1, fit_intercept=False)
[0.02039476 0.02039476]
Ridge(alpha=0.2, fit_intercept=False)
[0.00455329 0.00455329]
Ridge(alpha=0.30000000000000004, fit_intercept=False)
[-0.00931821 -0.00931821]
Ridge(alpha=0.4, fit_intercept=False)
[-0.02154448 -0.02154448]
Ridge(alpha=0.5, fit_intercept=False)
[-0.03238315 -0.03238315]
Ridge(alpha=0.6000000000000001, fit_intercept=False)
[-0.04204119 -0.04204119]
Ridge(alpha=0.7000000000000001, fit_intercept=False)
[-0.05068676 -0.05068676]
Ridge(alpha=0.8, fit_intercept=False)
[-0.0584579 -0.0584579]
Ridge(alpha=0.9, fit_intercept=False)
[-0.06546893 -0.06546893]

If $x_1=2x_0,$ we have $\beta_1 = 2\beta_0$ (the $\alpha=0$ row is degenerate, since the unpenalized problem is singular with duplicated columns):

Ridge(alpha=0.0, fit_intercept=False)
[-1.27062478e+15  6.35312388e+14]
Ridge(alpha=0.1, fit_intercept=False)
[0.0082309  0.01646181]
Ridge(alpha=0.2, fit_intercept=False)
[0.00185263 0.00370526]
Ridge(alpha=0.30000000000000004, fit_intercept=False)
[-0.00381988 -0.00763976]
Ridge(alpha=0.4, fit_intercept=False)
[-0.00889342 -0.01778684]
Ridge(alpha=0.5, fit_intercept=False)
[-0.01345436 -0.02690873]
Ridge(alpha=0.6000000000000001, fit_intercept=False)
[-0.01757333 -0.03514665]
Ridge(alpha=0.7000000000000001, fit_intercept=False)
[-0.02130859 -0.04261718]
Ridge(alpha=0.8, fit_intercept=False)
[-0.02470868 -0.04941737]
Ridge(alpha=0.9, fit_intercept=False)
[-0.02781434 -0.05562869]
