5

Ridge regression has been around since at least the 1970s (the earliest paper I could find), but to the best of my knowledge we have only recently (as of January 2020) started using it. In Stata it takes me several seconds to run on a simple data set, so I can imagine it might have taken days to run 20 years ago.

My question is: are there models that are available in theory but that we simply cannot run because of limited computing power? I would be intrigued to hear of some.

Paze
  • 1,751
  • 7
  • 21
  • 3
    I don't know about Stata but I have seen ridge regression in use since the 1980s at least. There are many methods that cannot be run precisely. For example k-means clustering. All available algorithms provide an approximate solution that depends on initialisation. Finding a global optimum of the objective function (which would be the proper k-means solution) is NP-hard and can only be done for datasets of size $<30$ or so (not sure about the precise number). There are many, many methods like this. – Christian Hennig Jan 08 '20 at 23:08
  • You could argue that ridge regression (at least with that sort of terminology) originates in 1962; see the references of [this](https://stats.stackexchange.com/a/151351/805) answer – Glen_b Jan 09 '20 at 01:16
  • You could maybe make a case for Manus Foster (1961) *An Application of the Wiener-Kolmogorov Smoothing Theory to Matrix Inversion* or perhaps even Tikhonov earlier than that (though before this I think his work on regularization wasn't explicitly regression related; that might be worth checking) – Glen_b Jan 09 '20 at 01:25
  • 1
    It surprises me to hear that ridge would take several seconds in Stata, given that there is a closed-form solution that seems no more expensive than plain OLS (see the sketch after these comments): https://stats.stackexchange.com/questions/276983/ridge-regression-with-r – Christoph Hanck Jan 09 '20 at 07:19
  • There are lots of procedures that will remain computationally infeasible, such as exhaustive feature selection, where the number of models that must be evaluated grows exponentially with the number of features. Of course it is probably not a sensible thing to do anyway, but even for simple linear regression, it quickly becomes infeasible. – Dikran Marsupial Jan 09 '20 at 10:19
  • 1
    @ChristophHanck it could be that the implementation generates the ridge trace (investigating different values of the ridge parameter), but if done in canonical form, even that shouldn't take seconds for simple datasets. – Dikran Marsupial Jan 09 '20 at 10:21
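
To make the closed-form point in the comments concrete, here is a minimal sketch (plain numpy on simulated data; the sizes and the penalty below are hypothetical, not taken from the question, and this is not Stata's or any package's implementation) of the ridge estimator $(X^\top X + \lambda I)^{-1} X^\top y$ timed next to plain OLS:

```python
# Closed-form ridge vs. OLS on simulated data: the only extra work is
# adding lambda*I to X'X before solving, so the cost is essentially the same.
import time
import numpy as np

rng = np.random.default_rng(0)
n, p = 10_000, 50                     # hypothetical "simple" data set
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)
lam = 1.0                             # hypothetical ridge penalty

t0 = time.perf_counter()
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
t1 = time.perf_counter()
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
t2 = time.perf_counter()

print(f"OLS:   {t1 - t0:.4f} s")
print(f"Ridge: {t2 - t1:.4f} s")      # essentially the same cost as OLS
```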

1 Answer

2

This reminds me of a recent conversation with my dad, who was surprised to hear that I still run models that take days, just as he did decades ago:

> Are you still doing computations that take longer than a day? Shouldn't everything run fast by now? What are you people (physicists) improving in your models nowadays, when there have already been satisfactory models in the past? If a model is sufficient, why improve it?

He was probably joking/provoking. There are always more problems with extra layers of complexity, or problems that require more precise answers.

Computing power is never enough to solve all problems, and there will always be problems that are conceived before the computational power to solve them exists (the ultimate question).


There are a lot of examples that are beyond the current limits of computational power, and these examples are continuously changing. A search (e.g. on Google) will give you a reasonable list*:

https://scholar.google.com/scholar?as_ylo=2019&q=supercomputer+days

Often these problems involve some sort of high dimensionality:

You could split this dimensionality into two types: either the problem/question itself is larger (increasing the complexity and size of the system being investigated), or the method used to solve it is larger (e.g. introducing hyperparameters or using Monte Carlo methods; sometimes these are not necessary and a more direct analytical solution/approximation exists). Ridge regression is a bit of both: it is applied to problems of large size (many covariates), and it introduces a hyperparameter that needs to be optimized (the regularization parameter).
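
As a small illustration of that second kind of "largeness", here is a minimal sketch (plain numpy on simulated data; the grid, fold count, and data sizes are hypothetical) of tuning the ridge regularization parameter by grid search with K-fold cross-validation, so the closed-form fit is repeated once per candidate value and per fold:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 100                       # many covariates relative to n
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, k=5):
    """Mean squared prediction error over k folds for a given penalty."""
    folds = np.array_split(np.arange(len(y)), k)
    errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return np.mean(errors)

lambdas = np.logspace(-3, 3, 25)      # hypothetical candidate grid
best = min(lambdas, key=lambda lam: cv_error(X, y, lam))
print(f"selected regularization parameter: {best:.3g}")
```

Each additional hyperparameter multiplies the work in this way, which is one reason methods can outpace the available computing power even when a single fit is cheap.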


*That list, however, reveals more the current state of the art: what is currently at or near the limits of computational power (and thus much of it will still develop as computational power increases, and can be considered ahead of current computational power).

The search does not so much reveal theory that is so far ahead of its time that it is not even applied on supercomputers in a simple form (or only rarely), because anything that is not yet applied on a supercomputer will not easily generate hits for the search term 'supercomputer'.

A very particular example of a method that is ahead of its time, and that you will not easily find with that particular Google search, is quantum computing.

Sextus Empiricus
  • 43,080
  • 1
  • 72
  • 161