
Can I get some examples where linear regression might give inaccurate predictions? Preferably with a Python code example. I have obtained one example in Anscombe's Quartet. Any others?

Dave
  • Obviously, if the relationship between the variables is not linear, then linear regression is not going to be terribly useful. There are lots of non-linear relationships. –  Dec 14 '21 at 20:29
  • You obtained *four* examples with Anscombe's Quartet: that's what "quartet" means! You can construct an infinitely varied set of such examples by following the procedure at https://stats.stackexchange.com/a/152034/919 (which includes working `R` code). – whuber Dec 14 '21 at 20:51
  • I think you have to be much clearer in your question. I can approximate any non-linear relationship by adding transformed input variables, e.g. x_1^2 * sin(x_2). – seanv507 Dec 14 '21 at 21:48
  • @seanv507 What about all of Anscombe's Quartet? – Dave Dec 14 '21 at 21:49
  • Anscombe's quartet https://en.wikipedia.org/wiki/Anscombe%27s_quartet has (a) a linear relationship, (b) a quadratic relationship, and two examples with outliers where I assume *robust* linear regression would identify the expected relationship. – seanv507 Dec 15 '21 at 08:03

1 Answer


I am going to assume you are talking about using a linear regression model in machine learning (as in fitting a linear equation to predict the outputs associated with future unknown inputs). Instead of "accuracy," we often think about minimizing risk (and thus maximizing accuracy).

So your question is essentially asking when a linear regression model would give us high risk (more specifically, high risk on unseen data, often estimated via the structural risk). The answer to that question involves many factors, which are described in detail here and here.

One factor, which could be considered the most important and which I think is what you are really asking about, is whether the output value for a given input can be expressed as a linear combination of the input variables, either in their original form or after transforming them. Of course, depending on the transformation used, one must be careful not to overfit.
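Since you asked for Python, here is a minimal sketch of that point using NumPy and scikit-learn (the quadratic data and variable names are my own illustration, not from Anscombe's quartet): a plain linear fit fails on a quadratic relationship, while adding a transformed (squared) input variable recovers a good fit that is still linear in its parameters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Nonlinear ground truth: y = x^2 plus noise.
x = np.linspace(-3, 3, 100).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(scale=0.5, size=100)

# A plain linear fit on the raw input does badly: the best line
# through a symmetric parabola is nearly flat, so R^2 is close to 0.
raw_fit = LinearRegression().fit(x, y)
print(f"R^2 with raw input:       {raw_fit.score(x, y):.3f}")

# After adding the squared input as a feature, the model is still
# linear in its parameters, and the fit becomes nearly perfect.
x2 = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
transformed_fit = LinearRegression().fit(x2, y)
print(f"R^2 with squared feature: {transformed_fit.score(x2, y):.3f}")
```

The failure here is not in the fitting machinery but in the feature space: the raw input simply cannot express the relationship, which is the same lesson as the quadratic panel of Anscombe's quartet.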

Phoenix
  • As a general proposition, whether the input variables enjoy any linear relationship among each other is *completely irrelevant* in linear regression. I believe what you meant to say is that the *response variable* must be reasonably well described by a linear combination of the explanatory (input) variables. – whuber Dec 21 '21 at 23:49