
Can I get some examples where linear regression might give inaccurate predictions? Preferably with a Python code example. I have obtained one example in Anscombe's Quartet. Any others?

Dave
  • Obviously, if the relationship between the variables is not linear, then linear regression is not going to be terribly useful. There are lots of non-linear relationships. –  Dec 14 '21 at 20:29
  • You obtained *four* examples with Anscombe's Quartet: that's what "quartet" means! You can construct an infinitely varied set of such examples by following the procedure at https://stats.stackexchange.com/a/152034/919 (which includes working `R` code). – whuber Dec 14 '21 at 20:51
  • I think you have to be much clearer in your question. I can approximate any non-linear relationship by adding transformed input variables, e.g. x_1^2 * sin(x_2). – seanv507 Dec 14 '21 at 21:48
  • @seanv507 What about all of Anscombe's Quartet? – Dave Dec 14 '21 at 21:49
  • Anscombe's quartet https://en.wikipedia.org/wiki/Anscombe%27s_quartet has (a) a linear relationship, (b) a quadratic relationship, and two examples with outliers where I assume *robust* linear regression would identify the expected relationship. – seanv507 Dec 15 '21 at 08:03

1 Answer


I am going to assume you are talking about using a linear regression model in machine learning (as in fitting a linear equation to predict the outputs associated with future unknown inputs). Instead of "accuracy," we often think about minimizing risk (and thus maximizing accuracy).

So your question is essentially asking when a linear regression model would give us high risk (more specifically, high risk on unseen data, often estimated via the structural risk). The answer to that question involves many factors, which are described in detail here and here.

One factor, which could be considered the most important and which I think is what you are really asking about, is whether the output value for a given input can be expressed as a linear combination of the input variables, either in their original form or after transforming them. Of course, depending on the transformation used, one must be careful not to overfit.
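Since you asked for Python, here is a minimal sketch of that point using NumPy and scikit-learn (the quadratic data and variable names are my own illustration, not from Anscombe's quartet): a plain linear fit fails on a quadratic relationship, while adding a transformed (squared) input variable recovers a good fit that is still linear in its parameters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Nonlinear ground truth: y = x^2 plus noise.
x = np.linspace(-3, 3, 100).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(scale=0.5, size=100)

# A plain linear fit on the raw input does badly: the best line
# through a symmetric parabola is nearly flat, so R^2 is close to 0.
raw_fit = LinearRegression().fit(x, y)
print(f"R^2 with raw input:       {raw_fit.score(x, y):.3f}")

# After adding the squared input as a feature, the model is still
# linear in its parameters, and the fit becomes nearly perfect.
x2 = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
transformed_fit = LinearRegression().fit(x2, y)
print(f"R^2 with squared feature: {transformed_fit.score(x2, y):.3f}")
```

The failure here is not in the fitting machinery but in the feature space: the raw input simply cannot express the relationship, which is the same lesson as the quadratic panel of Anscombe's quartet.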

Phoenix
  • As a general proposition, whether the input variables enjoy any linear relationship among each other is *completely irrelevant* in linear regression. I believe what you meant to say is that the *response variable* must be reasonably well described by a linear combination of the explanatory (input) variables. – whuber Dec 21 '21 at 23:49