I am learning about ridge regression, so I am implementing it in MATLAB as practice. However, I am having trouble finding a structure of data where ridge regression performs better than ordinary least squares.
Reading up, I've found that collinear data often benefits from regularization. However, when I implemented this in the code below, least squares performs just as well as ridge regression (the best lambda parameter is on the order of 1e-10, i.e. almost no regularization at all). MATLAB warns that X is rank deficient (rank = 2) when I use the built-in backslash operator for least squares, yet it still performs well.
Does anyone know why it behaves this way? Is my data perhaps not collinear enough to show a real performance difference, or have I misunderstood something?
% Generate data: columns 2 and 3 are exact linear functions of column 1,
% so X is perfectly collinear (rank 2, matching MATLAB's warning)
clear;
Nt = 100;
X(:,1) = randn(Nt,1);
X(:,2) = 2*X(:,1) + 6;
X(:,3) = 12*X(:,2) + 16;
p = [0.74, 3, 4.5];
y = X*p' + randn(Nt,1);  % noisy linear response
% Ordinary least squares (this is where MATLAB reports rank deficiency)
pLS = X\y;
%pLS = pinv(X'*X)*(X'*y);
nmseN = sum((X*pLS-y).^2)/length(y)/var(y);  % normalized MSE on training data
% Tikhonov regularization (ridge): grid search for the lambda with lowest NMSE
lspace = logspace(-10,-1,1000);
bestNMSE = inf;
bestLambda = -1;
I = eye(size(X,2));
for k = 1:length(lspace)
    lambda = lspace(k);
    prLS = pinv(X'*X + lambda*I)*(X'*y);  % I'*I is just I, so lambda*I suffices
    nmse = sum((X*prLS-y).^2)/length(y)/var(y);
    if nmse < bestNMSE
        bestNMSE = nmse;
        bestLambda = lambda;
    end
end
% Refit with the best lambda and score it the same way
prLS = pinv(X'*X + bestLambda*I)*(X'*y);
nmseR = sum((X*prLS-y).^2)/length(y)/var(y);
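In case it is relevant: both NMSE values above are computed on the same data the models were fit to. A sketch of the held-out comparison I could run instead, drawing fresh test data from the same generating process (the `Xtest`/`ytest` names are mine, not part of the code above):

```matlab
% Sketch: score OLS (pLS) and ridge (prLS) on fresh data from the same process
Ntest = 100;
Xtest(:,1) = randn(Ntest,1);
Xtest(:,2) = 2*Xtest(:,1) + 6;
Xtest(:,3) = 12*Xtest(:,2) + 16;
ytest = Xtest*p' + randn(Ntest,1);

nmseLS_test    = sum((Xtest*pLS  - ytest).^2)/Ntest/var(ytest);
nmseRidge_test = sum((Xtest*prLS - ytest).^2)/Ntest/var(ytest);
```

Even with this I see essentially identical numbers, which is part of what confuses me.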