I'm wondering what the difference is between:

  1. a 'predicted by residual plot', where I plot the residuals of the regression against the predicted values of the regression;
  2. the case where I plot the residuals against the predictor variables.

Also I'm wondering how to make such a plot in R in the case of multiple regression. Do I have to make a plot for each predictor separately?

chl
upabove
  • [Here](http://www.miabella-llc.com/demo.html) is a web-based, interactive tool for [plotting regression results in three dimensions](http://www.miabella-llc.com/demo.html). You can enter your data set through an online form at the bottom of the page. This 3-D plot works with one dependent variable and two explanatory variables. You can also set the intercept to zero (i.e., remove the intercept from the regression equation). The graphics require a WebGL-capable browser. The most recent versions of all major desktop browsers support WebGL (although Safari's WebGL might be disabled by default, a – Mountains Jan 15 '14 at 21:15
  • @Android3D Because you already posted this answer at http://stats.stackexchange.com/questions/73320/how-to-visualize-a-fitted-multiple-regression-model/82396#82396, I am converting it to a comment here. – whuber Jan 15 '14 at 21:19

2 Answers

A plot of residuals versus predicted response is mainly used to spot possible heteroskedasticity (non-constant variance across the range of the predicted values), as well as influential observations (possible outliers). Usually, we expect such a plot to exhibit no particular pattern (a funnel-like shape would indicate that the variance increases with the mean). Plotting residuals against one predictor can be used to check the linearity assumption. Again, we do not expect any systematic structure in this plot; if there is some, it suggests a transformation (of the response variable or the predictor) or the addition of higher-order (e.g., quadratic) terms to the initial model.
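
As a minimal sketch of the two kinds of plot described above, here is a base-R example. The built-in `mtcars` data and the model `mpg ~ wt + hp` are assumptions chosen only for illustration; they are not part of the question.

```r
# Illustrative model on the built-in mtcars data (an assumed example dataset)
fit <- lm(mpg ~ wt + hp, data = mtcars)

res <- residuals(fit)   # ordinary residuals
fv  <- fitted(fit)      # predicted (fitted) values

op <- par(mfrow = c(1, 3))

# 1. Residuals vs. predicted values: look for a funnel shape or isolated points
plot(fv, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# 2. Residuals vs. each predictor: look for curvature
for (v in c("wt", "hp")) {
  plot(mtcars[[v]], res, xlab = v, ylab = "Residuals",
       main = paste("Residuals vs", v))
  abline(h = 0, lty = 2)
  lines(lowess(mtcars[[v]], res), col = "red")  # smooth trend to highlight any pattern
}

par(op)
```

The red `lowess` curve is only a visual aid: if it stays close to the dashed zero line, there is no obvious sign of non-linearity in that predictor.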

More information can be found in any textbook on regression or on-line, e.g. Graphical Residual Analysis or Using Plots to Check Model Assumptions.

As for the case where you have to deal with multiple predictors, you can use partial residual plots, available in R in the car (`crPlot`) or faraway (`prplot`) packages. However, if you are willing to spend some time reading the on-line documentation, I highly recommend installing the rms package and its ecosystem of goodies for regression modeling.
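
For a rough idea of what a partial residual plot shows without installing anything, base R can return partial residuals directly via `residuals(fit, type = "partial")`. The sketch below again assumes the built-in `mtcars` data and an illustrative model; it is not the implementation used by car or faraway.

```r
# Illustrative model on the built-in mtcars data (an assumed example dataset)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Partial residuals: one column per term, each equal to the
# ordinary residuals plus that term's (centered) fitted contribution
pres <- residuals(fit, type = "partial")

op <- par(mfrow = c(1, ncol(pres)))
for (v in colnames(pres)) {
  plot(mtcars[[v]], pres[, v], xlab = v,
       ylab = paste("Partial residual for", v))
  abline(lm(pres[, v] ~ mtcars[[v]]), lty = 2)        # linear component
  lines(lowess(mtcars[[v]], pres[, v]), col = "red")  # smooth, to reveal curvature
}
par(op)

# With the car package installed, car::crPlots(fit) draws the same kind of
# plot for every term at once.
```

There is one panel per predictor, so the answer to "a plot for each predictor separately?" is yes, but a loop or a helper such as `crPlots` produces them in one go.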

chl
  • And in the case of plotting the residuals against the predictors to check linearity, does that mean you plot the same residuals against the different predictors? Does it need to be linear with each predictor? – upabove Nov 11 '11 at 14:32
  • No, when you have multiple predictors, you don't use the regular residuals but the partial residuals; this is known as either a component-plus-residual plot (`crPlots` in car) or a partial residual plot (`prplot` in faraway). However, curvature from other predictors can "leak" into these plots, so they're not guaranteed to give you exactly what you want. A CERES plot (also in car) is better but still not perfect. These plots are indeed created for each predictor in the model separately; I believe car has a function to do them all at once, and faraway may too. – Aaron left Stack Overflow Nov 11 '11 at 16:41
  • Here's a function to make all the prplots for a given lm fit, using ggplot2: https://gist.github.com/3041812. – Quantitative Historian Jul 03 '12 at 22:18

After you fit an lm object, you can plot it.

e.g.:

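set.seed(1)  # make the simulated data reproducible
model <- lm(y ~ x, data = data.frame(y = rnorm(25), x = rnorm(25)))
plot(model)  # draws the default diagnostic plots one after another
?plot.lm     # documentation for the diagnostic plots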
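
If only the residuals-versus-fitted plot is wanted, `plot.lm` accepts a `which` argument. The following is a small sketch on simulated data; the seed and the data frame name `d` are assumptions made for illustration.

```r
set.seed(1)  # assumed seed, only so the simulated data are reproducible
d <- data.frame(y = rnorm(25), x = rnorm(25))
model <- lm(y ~ x, data = d)

# All four default diagnostic plots on one page
op <- par(mfrow = c(2, 2))
plot(model)
par(op)

# Only the residuals-vs-fitted plot
plot(model, which = 1)
```

By default `plot.lm` shows, in order, Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage; `which = 1` selects the first of these.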

edit: example 2, which you should have posted yourself:

rm(list = ls(all = TRUE))  # clear the workspace
library(foreign)           # provides read.dta() for Stata files
Data <- read.dta('http://dl.dropbox.com/u/22681355/child.iq.dta')
model <- lm(ppvt ~ momage + educ_cat, data = Data)
plot(model)
Zach
  • thanks! can you explain what each plot refers to? are these all predicted by residual plots? – upabove Nov 11 '11 at 15:01
  • @DBR: Each plot is accurately titled, and you can read more in `?plot.lm`. Plot 1 is the "predicted by residual plot" you are looking for. Plot 2 is to test the residuals for normality, and plots 3 and 4 are more advanced. – Zach Nov 11 '11 at 15:07
  • http://dl.dropbox.com/u/22681355/plot.tiff this is what mine looks like, so it's not really titled and I can't really figure out which one is which. plot.lm doesn't seem to help? – upabove Nov 11 '11 at 15:17
  • @Dbr: Please post a reproducible example. Whatever you're plotting, it's not an `lm` object. Did you try the code I posted? – Zach Nov 11 '11 at 16:16
  • http://dl.dropbox.com/u/22681355/child.iq.dta this is the datafile. All I'm doing is model – upabove Nov 11 '11 at 16:23
  • @Dbr I can't open that data file. What version of R are you using? Perhaps save it as a csv file. – Zach Nov 11 '11 at 19:52
  • let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/1768/discussion-between-zach-and-dbr) – Zach Nov 11 '11 at 19:53
  • @Dbr what happens when you run the example code I posted? Do you get the correct charts? – Zach Nov 11 '11 at 19:54
  • It gives me a single residuals vs fitted plot, which would be perfect. You can read my datafile using read.dta in R. Sorry about the format, it's an example file from Gelman & Hill. – upabove Nov 11 '11 at 21:16
  • @Dbr you need to practice posting **reproducible examples.** Just this once, I've edited my post to demonstrate the sort of thing you need to do in the future. – Zach Nov 12 '11 at 02:38
  • Thank you, I'll do that in the future, but could you figure out what's missing? Because it gives me 9 plots and I can't tell which is which. – upabove Nov 12 '11 at 07:55
  • When you run example 2 in my post, it still doesn't work? Uninstall R, and re-install the latest version. Copy and paste my code into a fresh R console. If that doesn't work, you've got a problem I can't help you with. – Zach Nov 12 '11 at 18:34
  • +1 for your follow-up. @Dbr Please remember the advice you were given on SO too: a reproducible example is far better than a public link to a Stata datafile that will likely be dead in a few weeks. – chl Nov 12 '11 at 19:21