
As far as I understand, estimating the error of a model, say an artificial neural network, requires knowing the "true" model. Wikipedia's article "Errors and residuals" says: "The error (or disturbance) of an observed value is the deviation of the observed value from the (unobservable) true value of a quantity of interest", while "the residual of an observed value is the difference between the observed value and the estimated value".

However, knowing the true value is usually not possible in practice. Shouldn't the root mean square error, which is the square root of the mean squared deviation of the observed values from the predicted values, therefore be called the root mean square residual, since it measures the difference between the observed value (the measurement) and the estimated value (the model output)?
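To make the distinction concrete, here is a minimal sketch on simulated data, where the "true" model is known only because I invented it. The model y = 1 + 2x, the noise level, and the names `rms` and `simulate` are all assumptions for illustration, not taken from any library. It computes the RMS of the errors (deviations from the true line) and the RMS of the residuals (deviations from a least-squares fit).

```python
import math
import random


def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def simulate(n=200, seed=0):
    """Return (RMS of errors, RMS of residuals) for a simulated data set."""
    rng = random.Random(seed)
    true_a, true_b = 1.0, 2.0  # hypothetical "true" model, unknown in practice

    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    errors = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unobservable in practice
    ys = [true_a + true_b * x + e for x, e in zip(xs, errors)]

    # Ordinary least-squares fit of a straight line (closed form).
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    a_hat = mean_y - b_hat * mean_x

    # Residuals: observed value minus the value estimated by the fitted model.
    residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]
    return rms(errors), rms(residuals)


if __name__ == "__main__":
    rms_error, rms_residual = simulate()
    print("RMS of errors (needs the true model):", rms_error)
    print("RMS of residuals (needs only the fit):", rms_residual)
```

Only the second quantity can be computed from real data, yet it is the one usually reported as "RMSE". In this in-sample setting the RMS of the residuals cannot exceed the RMS of the errors, because the least-squares line minimizes the sum of squares over all lines, including the true one.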

Nick Cox
Funkwecker
  • 4
    You have a very good case. Nevertheless, in the literature I scan, RMSE is common and RMS residual is (I think) not seen at all. SE of residuals is more common. Statistical terminology is more or less a nightmare if you start to think about it: most statistical people might agree that in principle we should start again, simplify, and throw out poorly chosen terms. The problem is agreeing on what is good. You could start with "standard error", which, as seen, is also estimated, and what does "standard" mean anyway? (Answer: I presume it is intended to echo standard deviation.) – Nick Cox Nov 24 '15 at 12:21
  • 2
    I'm not at all sure that the term "error" is consistently used, *pace* Wikipedia. For instance, the Merriam-Webster dictionary actually uses the phrase "residual error" for the residual and several of the top Google hits in a search for "error residual" (focusing on medicine and investment) define "error" as the residual itself. – whuber Nov 24 '15 at 14:35
  • 3
    As usual @whuber is correct. Distinctions between error and residual are most likely to be 20th or 21st century attempts to make mathematically rigorous the difference between the underlying process (or population) and a sample result. I'll bet that going back to 18th and 19th and early 20th century thinkers would underline that they were not in doubt that there was a difference between the number they had and what it estimated, but they didn't see much point in elaborating the issue with a complicated notation or terminology. – Nick Cox Nov 24 '15 at 15:06

0 Answers