What is the semantic difference between Mean Squared Error (MSE) and Mean Squared Prediction Error (MSPE)?
2 Answers
The difference is not the mathematical expression, but rather what you are measuring.
Mean squared error measures the expected squared distance between an estimator and the true underlying parameter:
$$\text{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right].$$
It is thus a measurement of the quality of an estimator.
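To make the definition concrete, here is a minimal sketch that approximates the MSE of the sample mean by simulation. The population values (`mu`, `sigma`), the sample size `n`, and the helper `monte_carlo_mse` are all assumptions made up for illustration; none of them come from the question.

```python
import numpy as np

def monte_carlo_mse(estimator, theta, sampler, n_trials=20000):
    """Approximate E[(theta_hat - theta)^2] by averaging over simulated samples."""
    estimates = np.array([estimator(sampler()) for _ in range(n_trials)])
    return float(np.mean((estimates - theta) ** 2))

rng = np.random.default_rng(0)
mu, sigma, n = 170.0, 10.0, 25  # hypothetical population mean, sd, and sample size

# The estimator is the sample mean; the target parameter is the population mean mu.
mse = monte_carlo_mse(np.mean, mu, lambda: rng.normal(mu, sigma, size=n))

# The sample mean is unbiased, so its MSE equals its variance, sigma^2 / n.
theoretical = sigma**2 / n
print(mse, theoretical)
```

The simulated value should land close to `sigma**2 / n`. The point is that the error is measured against a fixed parameter of the population, not against any individual observation.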
The mean squared prediction error measures the expected squared distance between what your predictor predicts for a specific value and what the true value is:
$$\text{MSPE}(L) = E\left[\sum_{i=1}^n\left(g(x_i) - \widehat{g}(x_i)\right)^2\right].$$
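Here $g$ is the true function, $\widehat{g}$ is the fitted predictor, and $x_1, \dots, x_n$ are the inputs at which predictions are made. It is thus a measurement of the quality of a predictor.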
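As a rough illustration of the MSPE formula, the sketch below simulates it for a straight-line fit. The true function `g`, the design points, and the noise level are hypothetical choices, and the sum is left undivided by $n$ to match the expression as written above.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(x):
    """Hypothetical true regression function (an assumption for this sketch)."""
    return 2.0 * x + 1.0

x = np.linspace(0.0, 1.0, 20)  # fixed design points x_1, ..., x_n
noise_sd = 0.5                 # assumed standard deviation of the observation noise
n_trials = 5000

total = 0.0
for _ in range(n_trials):
    # Draw a fresh noisy sample, refit the line, and compare the fit to the true g.
    y = g(x) + rng.normal(0.0, noise_sd, size=x.size)
    slope, intercept = np.polyfit(x, y, deg=1)
    g_hat = slope * x + intercept
    total += np.sum((g(x) - g_hat) ** 2)

mspe = total / n_trials  # Monte Carlo approximation of E[sum_i (g(x_i) - g_hat(x_i))^2]
print(mspe)
```

For a correctly specified least-squares fit with $p$ coefficients, this expectation is $\sigma^2 p$, so the result here should be near $0.5^2 \cdot 2 = 0.5$. Note that the comparison is between functions evaluated at the inputs, not between a single estimate and a single parameter.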
The most important thing to understand is the difference between a predictor and an estimator. An example of an estimator would be taking the average height of a sample of people to estimate the average height of a population. An example of a predictor is averaging the heights of an individual's two parents to guess that individual's height. They are thus solving two very different problems.

- But the wiki page of MSE also gives an example of MSE on predictors: http://en.wikipedia.org/wiki/Mean_squared_error – avocado Dec 26 '13 at 13:09
- Not sure estimator vs predictor is meaningful here. Both are metrics that measure actual y vs f(x) where f(x) is meant to approximate y from feature vector x – Terence Parr Dec 10 '18 at 18:28
- This answer would be better if it addressed the possibility that MSE may be used to mean different things in different contexts. – eric_kernfeld Feb 10 '19 at 21:12
Typically, MSE involves only training data. The error here refers to how far the observed training responses are from the fitted responses (from a model fit on that same training data).
On the other hand, MSPE typically involves a test set that was not part of the model training. The error here refers to how far the predicted test responses (from a model already fit on the training data) are from the observed test responses.
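The distinction can be sketched in a few lines. Everything below is an assumption chosen for illustration: the data-generating function, the noise level, the polynomial degree, and the helper names `make_data` and `train_and_test_errors`. The errors are averaged over repeated draws so the comparison does not hinge on one lucky split.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    """Simulate n points from a hypothetical noisy relationship."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.3, size=n)
    return x, y

def train_and_test_errors(n_train=30, n_test=1000, degree=9):
    x_train, y_train = make_data(n_train)
    x_test, y_test = make_data(n_test)  # never seen during fitting
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    # "MSE" in this answer's sense: fitted vs. observed on the training data.
    train_mse = np.mean((y_train - np.polyval(coeffs, x_train)) ** 2)
    # "MSPE" in this answer's sense: predicted vs. observed on held-out data.
    test_mspe = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
    return train_mse, test_mspe

errors = np.array([train_and_test_errors() for _ in range(200)])
avg_train_mse, avg_test_mspe = errors.mean(axis=0)
print(avg_train_mse, avg_test_mspe)
```

With a flexible model like this one, the training error tends to come out smaller than the held-out error, because the fit has already adapted to the training noise.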
