
Why does the output from Matlab and Python differ for ridge regression? I use the `ridge` command in Matlab and scikit-learn in Python for ridge regression.

Matlab

X = [1 1 2 ; 3 4 2 ; 6 5 2 ; 5 5 3];
Y = [1 0 0 1];
k = 10; % which is the ridge parameter

b = ridge(Y,X,k,0)

The coefficients are estimated as

b =    0.3057    -0.0211    -0.0316    0.1741

Python

import numpy as np
X = np.array([[1, 1, 2] , [3, 4, 2] , [6, 5, 2] , [5, 5, 3]])
Y = np.r_[1,0,0,1].T

from sklearn import linear_model

clf = linear_model.Ridge(alpha=10)
clf.fit(X, Y)       

b = np.hstack((clf.intercept_, clf.coef_))

The coefficients are estimated as

 b =  0.716   -0.037   -0.054    0.057

Why is this difference observed?


EDIT: For those who think that centering and scaling is the issue: the input data is not scaled or centered, since I used the scaled parameter as 0, as seen in

b = ridge(Y,X,k,0)

and ridge regression in scikit-learn does not normalize by default:

>>> clf
Ridge(alpha=10, copy_X=True, fit_intercept=True, max_iter=None,   normalize=False, solver='auto', tol=0.001)

And here is the Matlab output when it is normalised, b = ridge(Y,X,k,1):

 b = -0.0467   -0.0597   0.0870
  • Does the discussion at http://stats.stackexchange.com/questions/23060/ answer your question? – Juho Kokkala Dec 10 '15 at 11:58
  • Thanks for the comment. No, it does not answer the question. That post is about centering and scaling of data. In the problem above, in both Matlab and Python, the input data is not scaled or centered, so ideally they should give the same results. – prashanth Dec 10 '15 at 12:05
  • Did you notice that the answer to the question I linked says that Matlab's ridge automatically scales and centers the inputs? If scikit-learn's ridge does not, that explains why the results are different. – Juho Kokkala Dec 10 '15 at 12:18
  • Yes, you are right. But in the command 'b = ridge(Y,X,k,0)' I used the scaled parameter as 0, which does not do the scaling and centering. In the post, if you see, the scaled parameter is specified as 1, which does the centering and scaling. And scikit by default does not do scaling and centering, as observed from the normalize=False flag seen here: Ridge(alpha=10, copy_X=True, fit_intercept=True, max_iter=None, normalize=False, solver='auto', tol=0.001) – prashanth Dec 10 '15 at 12:27
  • If software is not standardizing all non-constant variables, then it's not doing Ridge Regression--it's doing some *ad hoc* variation of it. This is an especially important and subtle point whenever interactions are included, because (by the Cauchy-Schwarz inequality) a standardized interaction is never the same as the interaction of standardized variables. – whuber Dec 15 '15 at 20:49

1 Answer


MATLAB always uses the centred and scaled variables for the computations within ridge; it just back-transforms the coefficients before returning them. Because your matrix is really small, this probably makes a noticeable difference. You can reproduce the Python results in MATLAB easily:

X = [1 1 2 ; 3 4 2 ; 6 5 2 ; 5 5 3];
Y = [1 0 0 1];
k = 10; % which is the ridge parameter     
Xn = [ones(4,1), X];

(Xn'*Xn +  diag([0,k,k,k]))\ (Xn'*Y')  %Same as sklearn

ans =
    0.7165
   -0.0377
   -0.0544
    0.0572
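
For the other direction, here is a sketch in Python/NumPy that reproduces MATLAB's ridge(Y,X,k,0) output, assuming the behaviour described above: standardize the predictors to zero mean and unit standard deviation (with the n-1 normalization used by MATLAB's std), solve the ridge normal equations on the standardized data, and back-transform the coefficients to the original scale.

import numpy as np

X = np.array([[1, 1, 2], [3, 4, 2], [6, 5, 2], [5, 5, 3]], dtype=float)
Y = np.array([1, 0, 0, 1], dtype=float)
k = 10.0

# Standardize the predictors: zero mean, unit SD (ddof=1, matching MATLAB's std)
mu = X.mean(axis=0)
sigma = X.std(axis=0, ddof=1)
Z = (X - mu) / sigma

# Ridge solution on the centred and scaled data
b_std = np.linalg.solve(Z.T @ Z + k * np.eye(X.shape[1]), Z.T @ (Y - Y.mean()))

# Back-transform to the original scale, as ridge(Y,X,k,0) does
b = b_std / sigma
b0 = Y.mean() - mu @ b
print(np.hstack([b0, b]))   # approx [0.3057, -0.0211, -0.0316, 0.1741], matching MATLAB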
  • Thanks, that helps. So this is like a bug in Matlab, as it should be giving the same results. – prashanth Dec 10 '15 at 12:50
  • I am glad I could help. I would not call it a bug; I am sure they use a particular reference. It would be almost impossible for something like that to go unnoticed by past users and by their internal unit tests. Probably one of the 1970s references in the `ridge` [docs](https://www.mathworks.com/help/stats/ridge.html) does this procedure, and in small samples the difference is more pronounced. – usεr11852 Dec 10 '15 at 13:00
  • The penalty in ridge regression makes little sense if predictor variables are in different scales. I'd be more inclined to call the scikit-learn default behavior a "bug"; at best, it leaves a serious trap for the unwary. – EdM Dec 10 '15 at 14:17
  • @EdM You could say the same for the rest of scikit learn, IMHO. – Sycorax Dec 10 '15 at 15:16
  • @EdM So do you mean to say that normalization is a recommended procedure in ridge regression? If yes, why? – prashanth Dec 11 '15 at 06:48
  • @apt-getinstallhappyness: Yes, it is recommended, because your features might be on vastly different scales. In that case the influence of $\lambda$/regularisation is very different between features. If you standardise the features beforehand, all features are regularised to roughly the same degree. – usεr11852 Dec 11 '15 at 08:28
  • @apt-getinstallhappyness: absolutely! Ridge regression applies the same penalty to the squares of all coefficients. So unless all predictor variables are in comparable scales, different predictors will be penalized differently, and it will matter whether you measure lengths, say, in inches, feet, millimeters, or kilometers. That's why MATLAB normalizes variables to do the regression even though it then back-transforms to the original scales. – EdM Dec 11 '15 at 08:30
  • @EdM OK. The coefficients obtained for the original data and the normalized data are very different. Now, if normalization is a recommended procedure, how do I normalize new test data for which I have to predict y? Do I have to use the same mean and SD from the training data, or are they to be computed from the test data itself? And if the coefficients from the normalized data are used, the predictions are completely out of scale. How do we tackle this issue? – prashanth Dec 11 '15 at 09:47
  • @apt-getinstallhappyness : You want to run the ridge regression on the normalized data, then back-transform the coefficients to the original scales. (Don't forget the intercept, too.) Then you can use the back-transformed coefficients for predictions on new data points. As I understand it, that's what the MATLAB procedure does automatically if you provide it data in the original scales to start; check the manual. Absolutely no experience with scikit-learn. – EdM Dec 11 '15 at 18:36
  • @apt-getinstallhappyness: I am of the same opinion as EdM (+1). If you are curious how MATLAB's `ridge` works just type `edit ridge`; it is an open-sourced function anyway. – usεr11852 Dec 11 '15 at 21:07
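
To make EdM's point about new data concrete, here is a minimal sketch (the test points in X_new are made up for illustration). Once the coefficients have been back-transformed, you can apply them directly to raw test data; equivalently, you can standardize the test data with the training mean and SD and use the standardized-scale coefficients. Both routes give the same predictions.

import numpy as np

# Training data from the question
X_train = np.array([[1, 1, 2], [3, 4, 2], [6, 5, 2], [5, 5, 3]], dtype=float)
Y_train = np.array([1, 0, 0, 1], dtype=float)
k = 10.0

# Fit on standardized training data (as in the sketch after the answer)
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
Z = (X_train - mu) / sigma
b_std = np.linalg.solve(Z.T @ Z + k * np.eye(3), Z.T @ (Y_train - Y_train.mean()))

# Back-transform once
b = b_std / sigma
b0 = Y_train.mean() - mu @ b

X_new = np.array([[2, 3, 2], [4, 4, 3]], dtype=float)  # hypothetical test points

# Option 1: raw test data with the back-transformed coefficients
pred_raw = b0 + X_new @ b

# Option 2: test data standardized with the *training* mean/SD, standardized coefficients
pred_std = Y_train.mean() + ((X_new - mu) / sigma) @ b_std

print(np.allclose(pred_raw, pred_std))   # True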