I am trying to use exploratory data analysis to decide which kind of model to use for prediction with my data (linear regression, neural networks, etc.); basically I am choosing between linear and nonlinear models. What kind of exploratory analysis can I do to decide whether a linear model is good enough for me or whether a nonlinear model would be better?
-
possible duplicate of [Exploratory analysis](http://stats.stackexchange.com/questions/69495/exploratory-analysis) – Peter Flom Sep 08 '13 at 20:03
-
@PeterFlom to my reading, that's a quite different question. The other one was rightly put on hold, but this is asking something different from that one, and I think it's quite possible to answer this one. – Glen_b Sep 08 '13 at 23:09
-
@Glen_b OK, we'll leave it open – Peter Flom Sep 08 '13 at 23:33
1 Answer
Whether a linear model is adequate is often assessed post hoc, via diagnostic analysis of the residuals.
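As a rough illustration of that residual check, here is a minimal pure-Python sketch (my own, not a standard routine): fit a straight line, then correlate its residuals with the squared, centred $x$. The function names and the "curvature signal" heuristic are assumptions for illustration only; in practice you would look at a residuals-versus-fitted plot rather than rely on a single number.

```python
"""Heuristic residual check for leftover curvature after a straight-line fit."""


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


def curvature_signal(x, y):
    """Correlation between straight-line residuals and squared, centred x.

    Near 0: no obvious quadratic-type pattern left in the residuals.
    Near +/-1: the residuals still bend, so a straight line may be inadequate.
    """
    a, b = ols_line(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    mx = sum(x) / len(x)
    return corr(resid, [(xi - mx) ** 2 for xi in x])


if __name__ == "__main__":
    x = list(range(-10, 11))
    y_lin = [2 + 3 * xi + (-1) ** xi for xi in x]  # linear trend plus alternating noise
    y_quad = [xi ** 2 for xi in x]                  # clearly curved
    print(round(curvature_signal(x, y_lin), 3))
    print(round(curvature_signal(x, y_quad), 3))
```

The noisy linear series gives a value near 0, while the quadratic one gives a value near 1, because its straight-line residuals are exactly the leftover curvature.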
One arguably exploratory approach would be to use partial regression plots, also called added-variable plots. While they're often conceived of as post-hoc diagnostics, if you haven't put any variables in a model yet (say, you're investigating the relationship between $y$ and a single $x$), they're exploratory in nature.
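To make the added-variable idea concrete, here is a minimal pure-Python sketch for the simplest case of one predictor of interest, $x$, and one other predictor, $z$ (the names and helper functions are illustrative, not from any library): regress $y$ on $z$ and $x$ on $z$, keep both sets of residuals, and scatter one against the other. Curvature in that scatter hints at a nonlinear relationship in $x$ after adjusting for $z$, and the least-squares slope of the scatter equals $x$'s coefficient in the full linear model (the Frisch-Waugh-Lovell result). With several adjusting predictors you would need multiple regression for the residuals (e.g. `numpy.linalg.lstsq`) instead of the single-predictor helper assumed here.

```python
"""Added-variable (partial regression) plot coordinates with one adjusting predictor."""


def residuals_on(target, z):
    """Residuals from the least-squares line of `target` on a single predictor `z`."""
    n = len(z)
    mz, mt = sum(z) / n, sum(target) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szt = sum((zi - mz) * (ti - mt) for zi, ti in zip(z, target))
    slope = szt / szz
    intercept = mt - slope * mz
    return [ti - (intercept + slope * zi) for zi, ti in zip(z, target)]


def added_variable_points(y, x, z):
    """Pairs (x-residual, y-residual) after removing z from both; scatter these."""
    return list(zip(residuals_on(x, z), residuals_on(y, z)))


def av_slope(points):
    """Least-squares slope of the AV scatter.

    Both residual sets have mean zero, so a line through the origin suffices.
    """
    sxy = sum(ex * ey for ex, ey in points)
    sxx = sum(ex * ex for ex, _ in points)
    return sxy / sxx


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    z = [3, 1, 4, 1, 5, 9, 2, 6]
    y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]  # exact linear model
    pts = added_variable_points(y, x, z)
    print(av_slope(pts))  # recovers x's coefficient, 2 (up to rounding)
```

The pairs in `pts` can be passed to any scatter-plot function; here the points fall exactly on a line of slope 2 because the simulated relationship is linear.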
A second approach would be via Tukey's ladder of powers: if transforming the $x$'s alone achieves reasonable linearity (as long as the other assumptions remain plausible), linear regression may be fully adequate.
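A minimal sketch of walking the ladder, assuming strictly positive $x$: apply each common rung ($p = -2, -1, -0.5, 0, 0.5, 1, 2$, with $p = 0$ meaning $\log x$) and score how linear $y$ looks against the transformed $x$. Using $|r|$ as the score and the function names are my own illustrative choices; a high correlation is only a crude guide, and you would still inspect the transformed scatterplot and the other regression assumptions.

```python
"""Pick a rung of Tukey's ladder of powers for x by how linear y looks against it."""
import math

# Common rungs of the ladder; p = 0 stands for the log transform.
LADDER = [-2, -1, -0.5, 0, 0.5, 1, 2]


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


def ladder_transform(x, p):
    """Apply the power p to each value of x (log when p == 0)."""
    if p == 0:
        return [math.log(v) for v in x]
    return [v ** p for v in x]


def best_rung(x, y):
    """Return (best power, {power: |correlation|}) over the ladder.

    |r| is only a crude linearity score; always look at the
    transformed scatterplot too.
    """
    if min(x) <= 0:
        raise ValueError("the ladder of powers needs strictly positive x")
    scores = {p: abs(corr(ladder_transform(x, p), y)) for p in LADDER}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    x = [float(v) for v in range(1, 21)]
    y = [5 + 2 * math.log(v) for v in x]  # linear in log(x)
    best, scores = best_rung(x, y)
    print(best)  # 0, i.e. the log rung
```

If some rung makes the relationship look straight, an ordinary linear regression on the transformed $x$ may be all you need.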
There are other possible choices.
If you rule out transformation, you might look at loess/lowess plots or other scatterplot smoothers as an indicator of a nonlinear relationship.
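For a feel of what a scatterplot smoother adds, here is a simplified, pure-Python stand-in for lowess: a single tricube-weighted local linear pass, without the robustness iterations of the full procedure. The function names, the default `frac`, and the max-gap summary are my own illustrative assumptions, and the sketch assumes mostly distinct $x$ values and at least three points. A smooth that tracks the straight-line fit closely is consistent with linearity; one that bends away from it suggests a nonlinear relationship. For real work, a full implementation such as `statsmodels.nonparametric.smoothers_lowess.lowess` is the safer choice.

```python
"""A simplified LOWESS-style smoother compared against a straight-line fit."""
import math


def smooth(x, y, frac=0.5):
    """Tricube-weighted local linear fit at each x value.

    Simplified: a single pass, without LOWESS's robustness iterations.
    `frac` is the share of points used in each local fit.
    """
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))  # neighbourhood size
    fitted = []
    for x0 in x:
        d = [abs(xi - x0) for xi in x]
        h = sorted(d)[k - 1]  # distance to the k-th nearest point
        w = [(1 - (di / h) ** 3) ** 3 if di < h else 0.0 for di in d]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def max_gap_from_line(x, y, frac=0.5):
    """Largest |smooth - straight line|; large values hint at nonlinearity."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    line = [my + slope * (xi - mx) for xi in x]
    return max(abs(s - l) for s, l in zip(smooth(x, y, frac), line))


if __name__ == "__main__":
    x = [float(v) for v in range(-10, 11)]
    print(max_gap_from_line(x, [2 + 3 * v for v in x]))  # essentially 0
    print(max_gap_from_line(x, [v ** 2 for v in x]))     # large relative to y's spread
```

A local linear fit reproduces exactly linear data, so any sizeable gap between the smooth and the line comes from curvature in the data rather than from the smoother itself.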
If you have multiple predictor variables, it becomes quite tricky to assess nonlinearity without already having adjusted for the other predictors.

-
From that Wikipedia page: *"Since the strengths and weaknesses of partial regression plots are widely discussed in the literature, it is not discussed in any detail here."* Useful. Is there any discussion of the uses/interpretation/shortcomings of partial regression plots anywhere easily accessible? I don't have access to the books mentioned on Wikipedia. – naught101 Aug 17 '16 at 02:21
-
@naught101 Silverfish's answer [here](http://stats.stackexchange.com/questions/125561/what-does-an-added-variable-plot-partial-regression-plot-explain-in-a-multiple) may help, and the list of useful properties [here](http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/partregr.htm) might also be helpful – Glen_b Aug 17 '16 at 02:40
-
Heh, that second link actually has the same question in it verbatim - looks like a significant part of the Wikipedia page has been copied directly from there. Silverfish's answer is quite good though :) – naught101 Aug 17 '16 at 05:07
-
@naught Yeah, I nearly reported it as a copyright violation, but there's a notice at the bottom of the Wikipedia page "This article incorporates public domain material from websites or documents of the National Institute of Standards and Technology." ... I presume it's legit, then. – Glen_b Aug 17 '16 at 08:41