For learning purposes I have been using a subset of the MNIST data: just the training set and just the digits 3 and 8. I want to try linear regression for this classification task. (Yes, I realise that logistic regression is more appropriate.) In R syntax (this is not an R-specific question), I do:
b <- qr.solve(cbind(1, x), y)

where:

- qr.solve() solves systems of equations via the QR decomposition.
- x is the data matrix: one row per handwritten digit, one column per pixel of the 28x28 image (see the sketch below for how x and y are built).
- cbind() prepends a column of 1s to x so that I can fit an intercept.
- y is the true identity of each digit, coded 0 = 3, 1 = 8.
- b is the vector of fitted coefficients.
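For concreteness, the setup looks roughly like this (a sketch only; `images` and `labels` are placeholder names for however the MNIST training set happens to be loaded, with one flattened 28x28 image per row of `images`):

keep <- labels %in% c(3, 8)           # keep only the 3s and 8s from the training set
x <- images[keep, ]                   # one flattened digit per row, one pixel per column
y <- as.numeric(labels[keep] == 8)    # 0 = 3, 1 = 8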
The problem is that qr.solve() fails with an error saying the matrix is singular. I take this to mean that the columns of x are linearly dependent, which can happen in several ways. I have removed duplicate columns, zero columns, and columns that contain only a single value. After doing this, qr.solve() still chokes and reports a singular matrix.
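The cleanup I describe is roughly the following (a sketch, not my exact code), followed by a rank check on the design matrix:

x <- x[, apply(x, 2, function(col) length(unique(col)) > 1)]   # drop zero/constant columns
x <- x[, !duplicated(t(x))]                                    # drop exact duplicate columns
qr(cbind(1, x))$rank                                           # anything below ncol(x) + 1 means still rank deficient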
How can I determine which columns are causing trouble? How can I modify x so that the linear dependence is gone? (I realise that lasso and ridge regression would be good options here, since they regularise the fit and cope with linearly dependent columns.) One thing that works is to add noise to x, but that seems fishy to me.
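For reference, the noise workaround amounts to something like this (again a sketch; the noise scale is arbitrary, which is part of why it feels fishy):

set.seed(1)                                   # only so the jitter is reproducible
x_noisy <- x + rnorm(length(x), sd = 1e-3)    # sd chosen arbitrarily
b <- qr.solve(cbind(1, x_noisy), y)           # now runs without the singularity error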