Questions tagged [residuals]

The residuals of a model are the actual values minus the predicted values. Many statistical models make assumptions about the error, which is estimated by the residuals.

In the context of regression, the $i^{th}$ residual is defined as:

$$\epsilon_i = y_i - \hat{y}_i$$

Where $y_i$ is the actual value of the $i^{th}$ observation and $\hat{y}_i$ is its estimated or fitted value.

In a least squares regression that includes an intercept, the residuals sum to 0, i.e. $\sum_i \epsilon_i = 0$. The goal of least squares is to find the $\beta$ that minimizes the sum of squared residuals (SSR), i.e.

$$\hat{\beta}_{OLS} = \underset{\beta} {\text{argmin}} \sum_i \epsilon_i^2 = \underset{\beta} {\text{argmin}} \sum_i ( y_i - \sum_j x_{ij}\beta_j)^2$$
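
As a rough illustration of the definitions above, here is a minimal Python sketch for the single-predictor case with an intercept. The data values and the helper names (`fit_simple_ols`, `residuals`, `ssr`) are made up for this example and do not come from any particular library. It computes the residuals as actual minus fitted values, checks that they sum to roughly zero, and shows that nudging the fitted coefficients increases the SSR.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares fit of y = b0 + b1 * x (intercept included)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(x, y, b0, b1):
    """Residuals: actual value minus fitted value, one per observation."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def ssr(res):
    """Sum of squared residuals."""
    return sum(r ** 2 for r in res)


# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
res = residuals(x, y, b0, b1)

print("sum of residuals:", sum(res))  # zero up to floating-point error
print("SSR at the OLS fit:", ssr(res))

# Any other coefficients give a larger SSR than the least squares solution.
print("SSR with shifted intercept:", ssr(residuals(x, y, b0 + 0.1, b1)))
```

Note that the zero-sum property relies on the intercept term; a fit forced through the origin does not guarantee it.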

1529 questions
131
votes
3 answers

What if residuals are normally distributed, but y is not?

I've got a weird question. Assume that you have a small sample where the dependent variable that you're going to analyze with a simple linear model is highly left skewed. Thus you assume that $u$ is not normally distributed, because this would…
MarkDollar
  • 5,575
  • 14
  • 44
  • 60
103
votes
5 answers

Diagnostic plots for count regression

What diagnostic plots (and perhaps formal tests) do you find most informative for regressions where the outcome is a count variable? I'm especially interested in Poisson and negative binomial models, as well as zero-inflated and hurdle counterparts…
99
votes
1 answer

Interpreting plot.lm()

I had a question about interpreting the graphs generated by plot(lm) in R. I was wondering if you guys could tell me how to interpret the scale-location and leverage-residual plots? Any comments would be appreciated. Assume basic knowledge of…
Guest
  • 991
  • 2
  • 7
  • 3
86
votes
5 answers

What do the residuals in a logistic regression mean?

In answering this question, John Christie suggested that the fit of logistic regression models should be assessed by evaluating the residuals. I'm familiar with how to interpret residuals in OLS: they are in the same scale as the DV and very clearly…
russellpierce
  • 17,079
  • 16
  • 67
  • 98
59
votes
7 answers

Are residuals "predicted minus actual" or "actual minus predicted"

I've seen "residuals" defined variously as being either "predicted minus actual values" or "actual minus predicted values". For illustration purposes, to show that both formulas are widely used, compare the following Web searches: residual…
Tripartio
  • 1,517
  • 1
  • 13
  • 19
57
votes
3 answers

ANOVA assumption normality/normal distribution of residuals

The Wikipedia page on ANOVA lists three assumptions, namely: Independence of cases – this is an assumption of the model that simplifies the statistical analysis. Normality – the distributions of the residuals are normal. Equality (or "homogeneity")…
Roman Luštrik
  • 3,338
  • 3
  • 31
  • 39
56
votes
5 answers

Regression when the OLS residuals are not normally distributed

There are several threads on this site discussing how to determine if the OLS residuals are asymptotically normally distributed. Another way to evaluate the normality of the residuals with R code is provided in this excellent answer. This is another…
53
votes
2 answers

Why is a Bayesian not allowed to look at the residuals?

In the article "Discussion: Should Ecologists Become Bayesians?" Brian Dennis gives a surprisingly balanced and positive view of Bayesian statistics when his aim seems to be to warn people about it. However, in one paragraph, without any citations…
Mankka
  • 633
  • 5
  • 8
53
votes
2 answers

How to read Cook's distance plots?

Does anyone know how to work out whether points 7, 16 and 29 are influential points or not? I read somewhere that because Cook's distance is lower than 1, they are not. Am I right?
Platypezid
  • 1,197
  • 3
  • 13
  • 16
50
votes
4 answers

Normality of dependent variable = normality of residuals?

This issue seems to rear its ugly head all the time, and I'm trying to decapitate it for my own understanding of statistics (and sanity!). The assumptions of general linear models (t-test, ANOVA, regression etc.) include the "assumption of…
DeanP
  • 841
  • 2
  • 11
  • 11
49
votes
5 answers

What is residual standard error?

When running a multiple regression model in R, one of the outputs is a residual standard error of 0.0589 on 95,161 degrees of freedom. I know that the 95,161 degrees of freedom is given by the difference between the number of observations in my…
ustroetz
  • 741
  • 1
  • 8
  • 14
46
votes
3 answers

What is the relationship between the mean squared error and the residual sum of squares function?

Looking at the Wikipedia definitions of Mean Squared Error (MSE) and Residual Sum of Squares (RSS), it looks to me that $$\text{MSE} = \frac{1}{N} \text{RSS} = \frac{1}{N} \sum (f_i -y_i)^2$$ where $N$ is the number of samples and $f_i$ is our…
Josh
  • 3,408
  • 4
  • 22
  • 46
43
votes
2 answers

Interpreting the residuals vs. fitted values plot for verifying the assumptions of a linear model

Consider the following figure from Faraway's Linear Models with R (2005, p. 59). The first plot seems to indicate that the residuals and the fitted values are uncorrelated, as they should be in a homoscedastic linear model with normally distributed…
Evan Aad
  • 1,221
  • 2
  • 12
  • 18
43
votes
3 answers

R - Confused on Residual Terminology

Root mean square error, residual sum of squares, residual standard error, mean squared error, test error. I thought I used to understand these terms, but the more I do statistics problems the more I have gotten myself confused where I second guess…
user3788557
  • 1,479
  • 4
  • 22
  • 24
40
votes
5 answers

What is the difference between errors and residuals?

While these two ubiquitous terms are often used synonymously, there sometimes seems to be a distinction. Is there indeed a difference, or are they exactly synonymous?
Constantin
  • 1,117
  • 1
  • 9
  • 24