
I ran a regression analysis to predict performance on an exam and then calculated the residuals. In Excel, the average of the residuals came out to about -0.011 and the sum to -2.34. In SPSS, when I requested that the residuals be saved while running the regression and then ran descriptive statistics on them, the average was 3.4E-14 and the sum was 0.0000. I'm not sure why two programs would produce such different values. I'm also a tad confused as to what could be causing the negative sum of residuals, since the plots don't reveal any glaring heteroscedasticity or outliers.

With regards to the sum/average of residuals being equal to zero: does the value have to be exactly zero, or does something with a very negative exponent qualify as zero? What would be considered close to zero, i.e., is there a threshold? If there are references that could be provided for this threshold or range, that would be much appreciated too. Thanks!
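To illustrate what I mean by "tiny but not exactly zero", here is a minimal sketch with made-up data (in Python/NumPy rather than my actual Excel/SPSS setup and exam data):

```python
import numpy as np

# Made-up data, not the actual exam scores: fit y = b0 + b1*x by ordinary
# least squares (with an intercept) and inspect the residuals.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 + 2.0 * x + rng.normal(size=200)

X = np.column_stack([np.ones_like(x), x])      # design matrix with a column of ones
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficient estimates
residuals = y - X @ beta

print("sum of residuals: ", residuals.sum())      # tiny (e.g. ~1e-14), not exactly 0
print("mean of residuals:", residuals.mean())     # tiny as well
print("machine epsilon:  ", np.finfo(float).eps)  # ~2.2e-16, the scale of rounding error
```

Here the sum is roughly on the order of the number of observations times machine epsilon times the scale of the data, which is what I would naively call "numerically zero", but I don't know whether that's the right yardstick, hence the question.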

user28687
  • Both the sum and mean of residuals in a least-squares regression are exactly zero so long as an intercept term is included in the model. This is a consequence of the "normal equations" that are solved to find the estimates of the regression coefficients - see e.g. [this answer](http://stats.stackexchange.com/a/263327/22228). Any discrepancy from this is purely due to numerical error in the calculation. This holds true regardless of homoskedasticity/heteroskedasticity or outliers, and even regardless of the distribution of the true error term (which the residuals are an estimate of). – Silverfish Feb 27 '17 at 01:35
  • This question might be considered a duplicate of [Why do residuals in linear regression always sum to zero when an intercept is included?](http://stats.stackexchange.com/q/189584/22228) If that doesn't answer your question, you might want to edit this one to clarify. It sounds to me like you are interested in whether the mean of the true error term (usually denoted $\varepsilon$ or, particularly in econometrics, $u$) is zero... unfortunately we can't deduce this by looking at the residuals (usually denoted $\hat \varepsilon$) – Silverfish Feb 27 '17 at 01:38
  • The only correct answer is 0 for both the sum and the average; whatever remains is error. When the errors are large, they tend to be mistakes; when they are small, they tend to be machine error, and can be made smaller by extending precision. – Carl Feb 27 '17 at 02:54
  • When comparing Excel's results to those of any statistical package, the first two hypotheses to evaluate are (1) the possibility of user error and (2) that Excel is incorrect. These are far more likely than the alternatives; namely, (3) that the other software is incorrect or (4) there is more than one correct answer. – whuber Feb 27 '17 at 14:33
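For reference, here is a sketch of the normal-equations argument referenced in the comments above, in standard OLS notation with a design matrix $X$ whose first column is all ones (the intercept):

$$
X^\top X \hat\beta = X^\top y
\quad\Longrightarrow\quad
X^\top \hat\varepsilon = X^\top\bigl(y - X\hat\beta\bigr) = X^\top y - X^\top X \hat\beta = 0.
$$

The row of $X^\top \hat\varepsilon = 0$ corresponding to the column of ones reads $\mathbf{1}^\top \hat\varepsilon = \sum_i \hat\varepsilon_i = 0$, so the sum (and hence the mean) of the residuals is exactly zero in exact arithmetic; any nonzero value reported by software is rounding error.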

0 Answers