This question will be a little wordy - I'll try to summarize at the end.
I'm currently working on a machine learning library in which I'm implementing GLMs. To fit the models I've been implementing an Iteratively Reweighted Least Squares (IRLS) algorithm (as described here, around slide 26; I'll use that notation below).
However, in practice I'm running into some numerical issues. My test case is a logistic regression model, i.e. a Bernoulli distribution with a logit link function, which also simplifies some of the maths in the algorithm.
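To make the setup concrete, here's a stripped-down numpy sketch of the update I'm doing, not my actual library code, just the textbook $b = (X^TWX)^{-1}X^TWz$ step specialised to the Bernoulli/logit case:

```python
import numpy as np

def irls_logistic(X, y, n_iter=25):
    """Plain IRLS for logistic regression (sketch, not library code)."""
    n, p = X.shape
    b = np.zeros(p)                                   # start at b_0 = 0, i.e. mu = 0.5
    for _ in range(n_iter):
        eta = X @ b                                   # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))               # fitted probabilities
        w = mu * (1.0 - mu)                           # diagonal of W
        z = eta + (y - mu) / w                        # working response
        WX = X * w[:, None]
        b = np.linalg.solve(X.T @ WX, X.T @ (w * z))  # weighted least squares step
    return b
```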
The problem I'm running into is that the values on the $w$ diagonal get very close to zero, which causes numerical overflow when I invert the matrix. From what I can tell, if $w$ is very close to zero then $z$ is very close to the fitted values, so the two inverses should cancel each other out and the algorithm has converged. Computationally, however, this is not happening.
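The only workaround I've thought of so far is to clamp the fitted probabilities away from 0 and 1 so the weights never underflow, and to use a linear solver rather than an explicit inverse. Something like the following (the `1e-10` threshold is an arbitrary guess on my part):

```python
import numpy as np

def irls_step(X, y, b, eps=1e-10):
    """One IRLS step with the probabilities clamped away from 0 and 1."""
    eta = X @ b
    mu = 1.0 / (1.0 + np.exp(-eta))
    mu = np.clip(mu, eps, 1.0 - eps)      # keep w = mu*(1-mu) strictly positive
    w = mu * (1.0 - mu)
    z = eta + (y - mu) / w
    WX = X * w[:, None]
    return np.linalg.solve(X.T @ WX, X.T @ (w * z))
```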
How is this algorithm generally implemented in practice? Is it easier to implement it in the form $b_{i+1} = b_i + \text{STUFF}$? If so, how do I choose my initial value $b_0$?
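To make that concrete, I believe the incremental form would be the Newton-Raphson step $b_{i+1} = b_i + (X^TWX)^{-1}X^T(y - \mu)$, which as far as I can tell is algebraically the same update, e.g.:

```python
import numpy as np

def newton_step(X, y, b):
    """Incremental form of the same update: b_{i+1} = b_i + (X'WX)^{-1} X'(y - mu)."""
    eta = X @ b
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = mu * (1.0 - mu)
    grad = X.T @ (y - mu)                 # score vector
    hess = X.T @ (X * w[:, None])         # Fisher information X'WX
    return b + np.linalg.solve(hess, grad)
```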
EDIT: I've noticed that if I use fewer iterations I can get a well-fitted model. Maybe implementing a stopping condition would be enough, but I'd like to know whether it's possible to have the algorithm converge and stay numerically stable.
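For example, the simplest stopping rule I can think of would be checking the relative change in the coefficient vector between iterations (this is just my guess at a sensible criterion, not something from the slides):

```python
import numpy as np

def converged(b_old, b_new, tol=1e-8):
    """Stop when the coefficients have (relatively) stopped moving."""
    return np.linalg.norm(b_new - b_old) <= tol * (np.linalg.norm(b_old) + tol)
```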
For reference, my source code is here. It's likely to change in the future.