Context: When we talk about the performance of linear regression, we rely on performance metrics such as:

  1. Mean Absolute Error
  2. Mean Square Error
  3. Root Mean Square Error
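
For reference, the usual textbook definitions of these three metrics can be written as follows. The notation is an assumption and not taken from the post: $y_i$ is the actual value, $\hat{y}_i$ the prediction, and $n$ the number of observations.

```latex
\mathrm{MAE}  = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
\qquad
\mathrm{MSE}  = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```

Because each error is squared before averaging, a single large error contributes far more to MSE and RMSE than it does to MAE.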

We calculate MAE to find the average error and then compare that error with the mean of our data. After that, we look at MSE and RMSE. Why do we use MSE and RMSE? I read an article which said that by using MSE and RMSE we penalize the outliers.

  1. What does it mean to punish/penalize the outliers?

  2. Why do we need to check the spread using MSE and RMSE?

  3. If our MAE is large, that means we have outliers in the data set, right?

  4. If we know our data set has outliers, then why do we calculate MSE and RMSE? We can simply remove them if they are not necessary!

  5. Suppose our error is 1.2, the mean is 14 and the RMSE is 1.4. Then, as I see it, the RMSE is telling us that the predictions deviate by 1.4 from the actual values, which is exactly what the error is telling us.

Abbreviations:

  • MAE = Mean Absolute Error
  • MSE = Mean Square Error
  • RMSE = Root Mean Square Error
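
As a minimal sketch of what "penalizing" large errors means, the following compares two sets of predictions that have the same MAE. The data and the helper names `mae`, `mse` and `rmse` are hypothetical, made up for illustration, and are not taken from the ISLR data set.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean square error: average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical actual values and two hypothetical sets of predictions.
actual = [10.0, 12.0, 14.0, 16.0, 18.0]
even_pred = [11.0, 11.0, 15.0, 15.0, 19.0]  # every prediction is off by 1
one_big = [10.0, 12.0, 14.0, 16.0, 23.0]    # four exact, one off by 5

for name, pred in [("even errors", even_pred), ("one big error", one_big)]:
    print(f"{name}: MAE={mae(actual, pred):.3f}, RMSE={rmse(actual, pred):.3f}")
# even errors: MAE=1.000, RMSE=1.000
# one big error: MAE=1.000, RMSE=2.236
```

Both sets of predictions have the same total absolute error, so MAE cannot tell them apart. RMSE rises when that error is concentrated in a single point, which is the sense in which it "penalizes" large errors. RMSE is never smaller than MAE, so a gap between the two (such as 1.2 versus 1.4 in question 5, if the "error" there is the MAE) suggests that the errors are not all the same size.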

kjetil b halvorsen
  • I guess that punishing the outliers means that you slap them with a stick so that they misbehave less. But for the rest I am confused about this question. Who tells you that you need to punish the outliers, and why? What is the background of this question? – Sextus Empiricus Jan 27 '21 at 09:50
  • @Xi'an MAE = Mean absolute error; MSE = Mean squared error; RMSE = Root mean squared error – ketan gangal Jan 27 '21 at 10:50
  • @SextusEmpiricus I am studying performance metrics for linear regression; that is where I came across these three terms and the term "punish". – ketan gangal Jan 27 '21 at 10:51
  • Could you tell some more background? What is the problem where you are applying it? What is the point of these terms? – Sextus Empiricus Jan 27 '21 at 10:53
  • @SextusEmpiricus I am reading ISLR (An Introduction to Statistical Learning). There is a data set there known as the sales data, which contains TV, Newspaper, Radio and Sales. I trained the model, but I am at the performance evaluation part, where we use MSE, MAE and RMSE to check the performance of our model – ketan gangal Jan 27 '21 at 11:06
  • I've added the tag `outliers`. At that tag are many helpful threads. There is an attitude, which seems close to the surface here, that outliers are bad data points that should be removed or at least discounted. In contrast my favourite definition of outliers runs that they are data points surprising given your model and your understanding of the problem. Sometimes they really are bad data points, or points irrelevant to your goal, but often they just imply that you might need a different model. – Nick Cox Jan 27 '21 at 11:16
  • Outliers can be removed when they are the result of experimental error rather than a reflection of the population under investigation. For instance, a designer wants to know the average height of some population in order to make furniture with optimal dimensions for that group. He might then have a sample of measurements in cm such as 153, 164, 146, 154, 161, 202. This 202 is an outlier (how do you think the analyst knows that?); it reflects an error of the model, and presumably it is an error of the measurement, so we might decide to discard this measurement and continue the analysis with the remainder. – Sextus Empiricus Jan 27 '21 at 11:31
  • What performance of the model do you evaluate? And what is the confusion? You ask for instance *"what does punish the outliers means?"*. To answer that it would be helpful if you explain in what problem this is done. Without context punishment might mean to put the outliers into jail or give them a fine. – Sextus Empiricus Jan 27 '21 at 11:33
  • @SextusEmpiricus see question now! – ketan gangal Jan 27 '21 at 15:37
  • @NickCox see question now! – ketan gangal Jan 27 '21 at 15:37
  • You post a lot of questions. Some of these have already been answered in other questions and others might relate to wrong assumptions. What is your specific confusion? – Sextus Empiricus Jan 27 '21 at 16:16
  • @SextusEmpiricus Just tell me: why do we calculate MSE and RMSE? What is the significance of this calculation in performance evaluation? – ketan gangal Jan 27 '21 at 16:21
  • These measures are used in a multitude of ways. It is unclear what you are referring to. It is broad, even when you narrow it down to 'performance evaluation', which can still be many different things. Did you look up some general introductions to MSE and RMSE? Those introductions explain why MSE and RMSE are used. Why are these introductions confusing you? How do you interpret these definitions? What is confusing you? What is it that you understand about 'error' but no longer understand about MSE/RMSE? Do you have a concrete example/application that you do not understand? – Sextus Empiricus Jan 27 '21 at 16:52
  • @SextusEmpiricus Read this article: https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d Can you explain the "Difference" and "Conclusion" parts? – ketan gangal Jan 27 '21 at 17:43
  • Does this help? https://stats.stackexchange.com/questions/118/why-square-the-difference-instead-of-taking-the-absolute-value-in-standard-devia – Sextus Empiricus Jan 27 '21 at 22:14
  • @SextusEmpiricus Thanks for the link. I got all the answers. Problem is solved, sir. – ketan gangal Jan 28 '21 at 03:00

0 Answers