I stumbled across these two questions and their accepted answers:

(1) Do we need gradient descent to find the coefficients of a linear regression model?

(2) Why use gradient descent for linear regression, when a closed-form math solution is available?

The accepted answers to the two questions contradict each other.

The answer to question one states that gradient descent is the worst algorithm for solving linear regression and should never be used.

The answer to question two, on the other hand, argues that for huge systems gradient descent can be much more efficient than direct methods such as QR decomposition.

How do these questions relate to each other? Is one of the answers incorrect / outdated? Or did I miss the difference between the two questions?
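
For concreteness, here is a minimal NumPy sketch of the two routes in question; the problem size, step size, and iteration count are illustrative choices of mine, not taken from either answer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy problem: n observations, p coefficients.
n, p = 1_000, 5
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

# Route 1: closed form via a (thin) QR factorization of X.
Q, R = np.linalg.qr(X)                  # X = Q R, with R upper triangular (p x p)
beta_qr = np.linalg.solve(R, Q.T @ y)

# Route 2: plain batch gradient descent on 0.5 * ||X b - y||^2.
beta_gd = np.zeros(p)
lr = 0.1 / n                            # step size; must satisfy lr < 2 / lambda_max(X^T X)
for _ in range(5_000):
    grad = X.T @ (X @ beta_gd - y)      # gradient of the squared-error loss
    beta_gd -= lr * grad

print(np.allclose(beta_qr, beta_gd, atol=1e-6))   # True: both recover the same coefficients
```

At this scale both routes agree; the disagreement between the two answers is about what happens as the problem grows, and about the tuning (step size, stopping rule) that the iterative route requires.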

user3617992
  • Hi, welcome. You have provided a solution yourself: *"that gradient descent would be much more efficient for huge systems than using QR."* – Jim Nov 17 '18 at 15:36
  • OK, that's what I had guessed as well. But the tenor of the answer in link (1) seemed to be that gradient descent was the WORST solution, which no one would EVER use to solve a linear regression problem. – user3617992 Nov 17 '18 at 18:16
  • Is there an estimate on the size of the matrix, when it makes sense to switch from QR (or SVD) to a gradient descent method? – user3617992 Nov 17 '18 at 18:17
  • I personally would prefer to use no more than half of the available system memory, so that would guide me in estimating the maximum matrix size. – James Phillips Nov 17 '18 at 18:28
  • I agree most with Mark Stone's answer. Linear-algebra-based linear system solvers are very fast and very good. I can solve a 100,000 observation, 1,000 variable least squares problem using QR decomposition in about 25 seconds. 100,000 observations, 100 variables takes about 0.3 seconds. A symmetric 10,000 variable linear system solved using Cholesky takes about 3 seconds. Basically, I'd use mature, linear-algebra-based linear system solvers unless they don't finish in a reasonable amount of time (see the Cholesky sketch after these comments). – Matthew Gunn Nov 17 '18 at 22:14
  • What if you have to solve >20 linear systems with >1k × 1k matrices (all different), and not just once, but inside an optimization problem that doesn't have a closed-form solution anyway? (That is, you have to use gradient descent to solve the optimization problem and solve a linear system in each iteration.) It's an honest question, because I've tried implementing both (solving with TensorFlow solve/solve_ls and using gradient descent), but it's difficult to say which is faster, because I couldn't prove the solutions are correct... What about singularity? – user3617992 Nov 18 '18 at 07:25
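
For reference, a minimal SciPy sketch of the Cholesky-on-normal-equations route that Matthew Gunn's comment describes; the 100,000 × 100 size is taken from his timings, while the random data and everything else are illustrative:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(1)

# Sizes from the comment above; the data is synthetic and purely illustrative.
n, p = 100_000, 100
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

# Normal equations: (X^T X) beta = X^T y, solved with a Cholesky factorization.
# Caveat: forming X^T X squares the condition number, so QR or SVD is the
# safer choice when X is ill-conditioned; Cholesky is simply the fastest of
# the direct methods when X^T X is comfortably positive definite.
XtX = X.T @ X
Xty = X.T @ y
beta = cho_solve(cho_factor(XtX), Xty)
```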

0 Answers