6

I know the computational cost for the closed-form solution of linear regression is $O(n^3)$, but I can't find a comparable cost statement for gradient descent.

There are some similar questions here where people talk about how gradient descent is more efficient, but they present formulas that are not in $O(\cdot)$ form and do not cite where they got their information.

So to reiterate, I am looking for the computational complexity for gradient descent in the form of $O(\cdot)$, something where $O(\cdot) < O(n^3)$.

It's possible I'm thinking about this wrong and there is no big $O$ comparison. If so please let me know. Thank you.

QuantStyle
    You can estimate parameters of linear regression using gradient descent, you probably meant OLS https://en.wikipedia.org/wiki/Ordinary_least_squares – Tim May 12 '19 at 10:05

3 Answers

5

The computational cost of gradient descent depends on the number of iterations it takes to converge. But according to the Machine Learning course by Stanford University, the complexity of gradient descent is $O(kn^2)$, so when $n$ is very large it is recommended to use gradient descent instead of the closed form of linear regression.
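For intuition, here is a minimal NumPy sketch of batch gradient descent for linear regression (my own illustration, not code from the course; the function name and learning rate are arbitrary). Each iteration is dominated by the matrix-vector products, so $k$ iterations over an $m \times n$ design matrix cost $O(kmn)$:

```python
import numpy as np

def gradient_descent_lr(X, y, lr=0.01, iters=1000):
    """Batch gradient descent for least-squares linear regression.

    X: (m, n) design matrix, y: (m,) targets.
    Each iteration costs O(m*n) for the two matrix-vector products,
    so the total cost is O(k*m*n) for k iterations.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m  # gradient of the mean squared error, O(m*n)
        theta -= lr * grad
    return theta
```

On a well-conditioned problem this converges to the same coefficients the normal equations would give, without ever forming or inverting $X^\top X$.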

source: https://www.coursera.org/learn/machine-learning/supplement/bjjZW/normal-equation

  • 3
    Is it correct to assume that $k$ here is the number of iterations? Otherwise, what's the meaning of $k$? – gc5 Mar 18 '21 at 18:30
2

The number of iterations a gradient method takes to reach a local optimum within a prescribed tolerance is problem dependent: it depends on the shape of the surface you are exploring and on the initial guess. Hence, no general $O(\cdot)$ expression for the complexity can be given.

F. Tusell
  • Aside from what I said in my answer, where do you get from the $O(n^3)$ complexity for closed form linear regression? Check https://datascience.stackexchange.com/questions/35804/what-is-the-time-complexity-of-linear-regression. – F. Tusell Mar 26 '20 at 18:08
  • In that case, we can give it in terms of $t$, where $t$ is the number of iterations. For example, $O(tN^5)$ – Zeel B Patel Mar 06 '21 at 01:20
0

Gradient descent has a time complexity of $O(ndk)$, where $d$ is the number of features, $n$ is the number of rows, and $k$ is the number of iterations. So, when $d$ and $n$ are large, it is better to use gradient descent.
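As a quick sanity check (a sketch with a synthetic, well-conditioned design matrix; the sizes and learning rate are my own assumptions), both routes recover the same coefficients. The closed form pays $O(nd^2)$ to build the normal equations plus $O(d^3)$ to solve them, while gradient descent pays $O(ndk)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
beta_true = np.arange(1, d + 1, dtype=float)
y = X @ beta_true

# Closed form: form X^T X in O(n*d^2), solve the normal equations in O(d^3).
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent: k iterations at O(n*d) each, i.e. O(n*d*k) total.
beta_gd = np.zeros(d)
lr = 0.1
for _ in range(2000):
    beta_gd -= lr * X.T @ (X @ beta_gd - y) / n

# Both recover the true coefficients on this noiseless problem.
assert np.allclose(beta_closed, beta_gd, atol=1e-3)
```

Which route wins in practice depends on whether $d^3$ (one expensive solve) or $ndk$ (many cheap iterations) is smaller for your data.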