
Assume all values are real. Let $Y$ be a vector of $n$ observations and $\hat{Y}$ a vector of predictions. Then the MSE of the predictions is

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^n(\hat{Y}_i-Y_i)^2$$
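A minimal NumPy sketch of this definition (the helper name `mse` is mine, not from the question):

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error of predictions y_hat against observations y."""
    y_hat, y = np.asarray(y_hat, dtype=float), np.asarray(y, dtype=float)
    return np.mean((y_hat - y) ** 2)

# Two predictions off by 0.1, one exact: MSE = (0.1**2 + 0 + 0.1**2) / 3
print(mse([1.1, 2.0, 2.9], [1.0, 2.0, 3.0]))  # ≈ 0.00667
```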

Let $$S = \{s~|~s = (\hat{Z},~Z)= (a + b\hat{Y},~a +bY),~b\neq0~\}$$ be a set of transformations of the pair of prediction and observation vectors.

We can see that $\mathrm{MSE}(s_k)$, $s_k \in S$, does not take the same value for all $s_k$, even though, 'proportionally', every $s_k$ is equally accurate: replacing $(\hat{Y},~Y)$ by $(a + b\hat{Y},~a + bY)$ multiplies the MSE by $b^2$.
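This non-invariance is easy to check numerically; here is a hedged sketch (the transform parameters `a`, `b` and the synthetic data are mine, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100)                       # observations
y_hat = y + rng.normal(scale=0.1, size=100)    # noisy predictions

def mse(p, o):
    return np.mean((p - o) ** 2)

a, b = 5.0, 3.0
base = mse(y_hat, y)
shifted = mse(a + b * y_hat, a + b * y)  # same transform applied to both vectors

# (a + b*y_hat - (a + b*y))**2 == b**2 * (y_hat - y)**2, so MSE scales by b**2
print(base, shifted, shifted / base)  # ratio is b**2 = 9
```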

What is an error function that takes the same value for all $s_k$? More generally, I'm interested in error functions that help compare the accuracy of predictions across different data sets.

Matt Munson
  • By introducing subscripts on $a$ and $b$ you are suggesting that $S$ is either finite or countable; thus, you seem not to be looking at all such transformations, but only a predetermined set of them. That could greatly complicate the answer. Could you clarify what you're trying to ask? Regardless, could you explain the sense in which $R^2$ does not answer your question? That might help us understand better what you really mean by "accuracy of predictions." – whuber May 06 '16 at 14:57
  • @whuber sorry, I wasn't aware of the convention about subscripting, I was just trying to say that $a$ and $b$ are not constants. It looks like $R^2$ might be close enough to what I'm looking for, but I can't really digest it fast enough to answer immediately. I'm surprised though that I keep seeing $MSE$ for predictions and not $R^2$. Regardless, it looks like $R^2$ still has problems when some values are negative and others positive, especially when the mean of $Y$ approaches $0$. – Matt Munson May 06 '16 at 15:48
  • That last comment suggests you should double-check what you think $R^2$ is. There are no problems whatsoever in computing it, regardless of the signs of the data or projections, unless the observations are perfectly constant. – whuber May 06 '16 at 15:54
  • @whuber Ah, I see what you mean. $R^2$ is exactly what I'm looking for. So what's so great about the $MSE$ then (if you will forgive my ignorance)? – Matt Munson May 06 '16 at 16:16
  • The MSE is essential for understanding how accurate *on an absolute scale* a regression (or predictor) might be. Contrast these two conclusions: "We are able to predict a company's annual income with an $R^2$ of $0.99$" and "We are able to predict a company's annual income to within a range of $10$ million Euros." (These could be perfectly consistent statements.) The former is primarily a statement about how your model fits your particular dataset whereas the latter tells us in practical terms how good the predictions really are. – whuber May 06 '16 at 18:47
  • @whuber now I'm not as sure about $R^2$. Don't we get a more sensible value if we use $|y_i-\bar{y}|$ and $|e_i|$ instead of the squares? – Matt Munson May 06 '16 at 20:56
  • There are many, many alternatives of that sort. Since you use MSE as your point of departure, though, it practically screams for using $R^2$ as the normalized version you are seeking. – whuber May 06 '16 at 21:15
  • @whuber Fair enough. I suppose we could just as easily replace MSE with the mean absolute error. I don't understand why squared versions are always preferred for these kinds of things. – Matt Munson May 06 '16 at 21:30
  • For some insight to why people like squares, you might enjoy starting with http://stats.stackexchange.com/questions/118/ . In your situation I think there may be deeper reasons related to a decision-analytic formulation of these problems. It turns out in many cases that squaring doesn't matter, but the behavior of a *loss function* near the origin is fundamentally important. Much can be proven for loss functions that behave quadratically near the origin--and that covers a lot of practical ground, where it doesn't hurt much to make small errors but larger errors should get larger penalties. – whuber May 06 '16 at 21:38
  • @whuber That's a very interesting page. I have heard some of the arguments before about the math being more convenient and the properties of SD as it relates to the normal distribution. I trust those arguments make sense. However, if I want the estimate as an end in itself, and not an intermediate, I always come back to absolute values. If you tell me the SD of the uniform $[-1,1]$ is $\sqrt{1/3}=0.5773...$, is that actually an insightful value? Whereas $MAD=1/2$ is a value that I can actually reason with (and not just because it's a nice number). And likewise for other distributions. – Matt Munson May 07 '16 at 02:17
  • @whuber I have to say, though, the idea of the residuals as a vector $v$ in $\mathbb{R}^n$, which then gives $SD \propto \|v\|$, is a pretty cool insight. I hadn't heard that before. – Matt Munson May 07 '16 at 02:33
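The $R^2$ invariance discussed in the comments can also be verified numerically. A sketch under my own assumptions (synthetic data, hand-rolled `r_squared` using the usual $1 - SS_{res}/SS_{tot}$ definition): both $SS_{res}$ and $SS_{tot}$ pick up the same factor of $b^2$ under $y \mapsto a + by$, so their ratio, and hence $R^2$, is unchanged.

```python
import numpy as np

def r_squared(y_hat, y):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_hat, y = np.asarray(y_hat, dtype=float), np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(1)
y = rng.normal(size=50)
y_hat = y + rng.normal(scale=0.2, size=50)

a, b = -2.0, 7.0
r1 = r_squared(y_hat, y)
r2 = r_squared(a + b * y_hat, a + b * y)  # same affine map on both vectors
print(r1, r2)  # equal up to floating-point rounding
```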

0 Answers