How can I determine whether a mean squared error is low or high? For example, if my linear regression problem gives a mean squared error of 21.67, how do I decide whether that error is low or high? Is there a benchmark?

  • Mean absolute deviation (MAD), or mean absolute error (MAE), would be easier to interpret as they use the same scale as the data itself. Also, you could look at $1-R^2$ or $1-R^2_{adj.}$, which also indicate how large your errors are compared with the data itself. – Richard Hardy Jan 15 '15 at 10:38
  • @RichardHardy what about residual sum of squares?? – Elizabeth Susan Joseph Jan 15 '15 at 10:41
  • I think it shares the same problem as MSE; it is a relative measure and is also measured on a scale different from that of the data. After all, $RSS = MSE \cdot N$, where $N$ is the number of data points. – Richard Hardy Jan 15 '15 at 10:49
  • @RichardHardy - that's great. This is the first time I have come across the term mean absolute deviation. Are there any resources to learn more about it? I am implementing linear regression in Python. – Elizabeth Susan Joseph Jan 15 '15 at 10:54
  • MAD is quite a simple thing: take the absolute values of all errors and calculate the mean. In MSE, you square the errors first and then calculate the mean, whereas in MAD you take absolute values instead of squaring. Consequently, the interpretation is as straightforward as it can be. There must be hundreds of sources from which you can learn about MAD; I cannot recommend any particular one. Regarding Python - sorry, I have no experience with it (I use R). But once you have your errors, calculating MAD manually is very simple. – Richard Hardy Jan 15 '15 at 11:02
  • @RichardHardy thanks a lot. So I will use MAD to measure the performance of the model. – Elizabeth Susan Joseph Jan 15 '15 at 11:35
  • Well, this is a deviation from your original question; easy interpretation does not make MAD a universally preferred measure of model performance. But good luck anyway! Just be careful and do not draw stronger conclusions than the particular tool you are using warrants :) – Richard Hardy Jan 15 '15 at 11:41
  • @RichardHardy is there a resource to study more on linear regression?? – Elizabeth Susan Joseph Jan 15 '15 at 11:43
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/20265/discussion-between-richard-hardy-and-elizabeth-susan-joseph). – Richard Hardy Jan 15 '15 at 11:45
  • @RichardHardy $\text{RMSE}=\sqrt{\text{MSE}}$ would also be on the same scale as the data – Glen_b Jan 15 '15 at 11:49
  • @Glen_b - so which one should I implement ?? – Elizabeth Susan Joseph Jan 15 '15 at 11:53
  • There's no 'should' about it. It depends on what you're interested in looking at. I just wanted to point out that there was no problem with using MSE, since you can use RMSE if you're after something in the units of the data. In terms of your original question, there's no bench mark. – Glen_b Jan 15 '15 at 11:56
  • @Glen_b - but even RMSE returns a single number, right? So if I get an RMSE of 4.34, how do I know whether this error is low or high? – Elizabeth Susan Joseph Jan 15 '15 at 11:59
  • That would be a matter for you and your specific application. I can't tell you what's low or high for your purposes. I already explained that there's no benchmark. The fact that it's only a single number does nothing to change that. – Glen_b Jan 15 '15 at 14:46
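The measures mentioned in the comments above (RSS, MSE, RMSE, MAE/MAD, and $1-R^2$) can all be computed from the residuals in a few lines of Python. The following is only a minimal sketch using NumPy; the function name `error_summary` and the four data points are made up for illustration, and $R^2$ is assumed to be defined as $1 - RSS/TSS$.

```python
import numpy as np

def error_summary(y_true, y_pred):
    """Return several error measures for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    n = residuals.size
    rss = float(np.sum(residuals ** 2))                  # residual sum of squares
    mse = rss / n                                        # so RSS = MSE * N
    tss = float(np.sum((y_true - y_true.mean()) ** 2))   # total sum of squares
    return {
        "RSS": rss,
        "MSE": mse,
        "RMSE": mse ** 0.5,                              # same units as the data
        "MAE": float(np.mean(np.abs(residuals))),        # same units as the data
        "1 - R^2": rss / tss,                            # unit-free share of unexplained variation
    }

# Illustrative (made-up) observations and fitted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
summary = error_summary(y_true, y_pred)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

None of these numbers is "low" or "high" on its own. RMSE and MAE are merely easier to read because they are in the units of the response, and $1-R^2$ is unit-free.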

1 Answer

MSE is a relative measure. If $y_i$ is your data point and $\hat{y}_i$ is an estimate for this data point, then MSE is:

$$MSE = \frac{1}{N} \sum^N_{i=1} \left( \hat{y}_i - y_i \right)^2$$

If $y$ is measured in meters, MSE will give different results than if $y$ is measured in kilometers, and so on. You can read more about similar measures of fit here.
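To see the dependence on units concretely, here is a small sketch in Python with NumPy. The three distances and their estimates are invented for illustration, and `mse` is just a helper written for this example. Because the errors are squared, converting meters to kilometers (dividing by 1000) divides the MSE by $1000^2$.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, following the formula above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

# Made-up distances and estimates, in meters.
y_m = np.array([1200.0, 3400.0, 560.0])
yhat_m = np.array([1000.0, 3500.0, 600.0])

mse_m = mse(y_m, yhat_m)                      # units: square meters
mse_km = mse(y_m / 1000.0, yhat_m / 1000.0)   # same data in kilometers

print(mse_m, mse_km, mse_m / mse_km)
```

The same predictions give an MSE of 17200 in one unit and about 0.0172 in the other, which is why a bare MSE value cannot be judged as low or high.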

So an MSE is low or high only in comparison with the MSE of some other model fitted to the same data.
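One simple "other model" to compare against is a baseline that always predicts the mean of the observed values. The sketch below assumes that choice of baseline (it is not prescribed anywhere in this thread) and uses invented numbers; `mse` is a helper defined only for this example.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

# Made-up observed values and predictions from some fitted model.
y = np.array([20.0, 25.0, 30.0, 35.0, 40.0])
model_pred = np.array([22.0, 24.0, 31.0, 33.0, 41.0])

# Baseline (an assumption of this sketch): always predict the mean of y.
baseline_pred = np.full_like(y, y.mean())

model_mse = mse(y, model_pred)
baseline_mse = mse(y, baseline_pred)

print(f"model MSE:    {model_mse:.2f}")
print(f"baseline MSE: {baseline_mse:.2f}")
print(f"ratio:        {model_mse / baseline_mse:.3f}")
```

A ratio well below 1 says the model does better than ignoring the predictors entirely; it still does not say whether the model is good enough for a particular application.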

Tim
  • I was working on a housing prices problem in which I got an MSE of 24. So how do I compare it to some other model? Can you be more specific? I am still confused. – Elizabeth Susan Joseph Jan 15 '15 at 10:39
  • Why do you insist on comparing models using MSE? You could use something else instead (as I suggested in a comment to the original post). – Richard Hardy Jan 15 '15 at 10:51
  • @ElizabethSusanJoseph as Richard said - why do you insist on MSE? MSE *won't* tell you that your model is "correct"; you can use it only to say whether it is worse (or better) than another model. – Tim Jan 15 '15 at 11:02
  • But yes, you can compare two models using MSE; the model with the smaller MSE produces fitted values that are closer to the true values (which is what you want). However, be careful; you can always build a very rich model that has a very low MSE but that won't mean your rich model is necessarily better than a parsimonious model with larger MSE. How to go about that? It's a broad topic... :) – Richard Hardy Jan 15 '15 at 11:06
  • @Tim - I was working through a problem in which they were calculating mean squared error. This is the first time I am implementing linear regression. – Elizabeth Susan Joseph Jan 15 '15 at 11:37
  • @ElizabethSusanJoseph [here](http://ww2.coastal.edu/kingw/statistics/R-tutorials/simplelinear.html) you have a tutorial on linear regression in R. You can find other similar tutorials on the web; however, it would be best for you to start with a statistics handbook so that you understand regression well enough to use it. – Tim Jan 15 '15 at 11:41
  • @Tim - Thanks a lot. I will definitely get a stats handbook. – Elizabeth Susan Joseph Jan 15 '15 at 11:57
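The caution raised in the comments above, that a very rich model can always reach a very low MSE without being better, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated from a straight line plus noise, the "parsimonious" and "rich" models are polynomials of degree 1 and 8 fitted with `numpy.polyfit`, and the sample sizes, noise level, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Simulate y = 2x + 1 plus Gaussian noise (an assumed data-generating process)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=n)
    return x, y

def mse(y_true, y_pred):
    return float(np.mean((y_pred - y_true) ** 2))

x_train, y_train = make_data(30)
x_test, y_test = make_data(30)   # fresh data not used for fitting

results = {}
for degree in (1, 8):            # parsimonious vs. rich polynomial
    coefs = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = {
        "train_mse": mse(y_train, np.polyval(coefs, x_train)),
        "test_mse": mse(y_test, np.polyval(coefs, x_test)),
    }

for degree, r in results.items():
    print(f"degree {degree}: train MSE {r['train_mse']:.3f}, test MSE {r['test_mse']:.3f}")
```

The richer polynomial can never have a larger training MSE than the straight line, because the straight line is a special case of it. Its MSE on fresh data is a different matter and is often worse, which is why comparing models by the MSE on the data used for fitting alone can mislead.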