
What criteria can be used to tell whether the predictions of one model will be more reliable than those of other specifications?

Background:

  1. We have data on $N$ computers.
  2. However, prices are available only for approximately $N/2$ of these computers.
  3. I build a bunch of different models using these $N/2$ observations.
  4. Using one of these models (the "best" one), I want to predict the prices that are not available in my data (they do not exist in reality).
  5. What criteria can be used to decide whether some model is the "best" in my case when comparing it with other specifications?

I am inclined to believe that $R^2_{adj.}$ is an appropriate measure here. Is that right?

Please look at my answer below.

  • What do you mean, operationally, by 'reliable'? – Glen_b Jan 25 '15 at 01:26
  • @Glen_b I added some clarifications to the question. In other words, I have a bunch of models and I want to look at some measures and decide whether this model is better than that model. What are these measures? $R^2_{adj.}$? Or something else? – Vladimir Iashin Jan 25 '15 at 04:12
  • Now you need to define what you mean by 'best'. We can't choose *your* criteria for what's 'reliable' or 'best' *for your purposes* -- at least not without some clear identification of what you need to achieve. However, if you seek out-of-sample performance of some kind, you probably don't want to rely on $R^2$ (including adjusted $R^2$). You might look into cross-validation, but you *still need a criterion to optimize*. – Glen_b Jan 25 '15 at 06:37
  • @Glen_b I'm sorry, but it seems that we have some misunderstanding. Exactly: I want to find that _criterion_ that, ceteris paribus, gives me some understanding that my model does well in comparison to other specifications. – Vladimir Iashin Jan 25 '15 at 07:51
  • I think that adjusted $R^2$ is a nice criterion. But [not always](http://people.duke.edu/~rnau/rsquared.htm). Also see [my related answer](http://stats.stackexchange.com/a/131217/31372) and the links within. Additionally, you can consider information-theory-based measures, such as AIC/BIC. – Aleksandr Blekh Jan 25 '15 at 09:25
  • We've gone from 'reliable' to 'best' to 'do well'. You still need to define what it is you want to do well *at*. There's an infinite number of choices of things that *all* - when you optimize them - count as being 'best' or 'doing well' ... *at that criterion*. If people give you a criterion to optimize, they're *choosing your preferences for you*. Let's say I suggested MSE as a criterion and you use cross-validation so it has out-of-sample validity. Why would that be better than mean absolute error? – Glen_b Jan 25 '15 at 16:58

2 Answers


How well your model works and how well your model fits are different questions.

Have a look at the Wikipedia page on "goodness of fit". $R^2$ is a good start. You should also check the properties of the residuals, e.g. quantile plots, residual plots, etc. They'll help you understand how the errors are distributed and why. That'll help you understand how well your model fits.
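
For concreteness, here is a minimal sketch of such residual checks. The feature names (`ram_gb`, `cpu_ghz`) and the synthetic data are made up, so treat it only as an illustration of the diagnostics, not as the specification to fit:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Hypothetical stand-in for the ~N/2 computers with observed prices.
rng = np.random.default_rng(0)
df = pd.DataFrame({"ram_gb": rng.integers(2, 32, 200),
                   "cpu_ghz": rng.uniform(1.5, 4.0, 200)})
df["price"] = 100 + 25 * df["ram_gb"] + 150 * df["cpu_ghz"] + rng.normal(0, 50, 200)

# Fit one candidate specification and look at in-sample fit measures.
X = sm.add_constant(df[["ram_gb", "cpu_ghz"]])
fit = sm.OLS(df["price"], X).fit()
print("R^2:", fit.rsquared, "adj. R^2:", fit.rsquared_adj)

# Residual diagnostics: quantile (Q-Q) plot and residuals vs. fitted values.
sm.qqplot(fit.resid, line="s")
plt.show()

plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0, color="grey")
plt.xlabel("Fitted price")
plt.ylabel("Residual")
plt.show()
```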

I suggest dividing the data you do have prices for, fitting your model with some of it, and using the rest to cross-validate: an 80/20 split is conventional. Performance on the validation set can then be evaluated using the same techniques as mentioned above. That'll help you understand how well your model might work, assuming you have enough data and the future instances you're predicting are like the past.
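
Here is a minimal sketch of that 80/20 split, again with hypothetical column names and synthetic data. The point is only the mechanics of holding out a validation set and scoring predictions on it; which criterion to score with (MSE, MAE, ...) is still your choice, as the comments above stress:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Hypothetical data; replace with the ~N/2 computers that have observed prices.
rng = np.random.default_rng(1)
df = pd.DataFrame({"ram_gb": rng.integers(2, 32, 200),
                   "cpu_ghz": rng.uniform(1.5, 4.0, 200)})
df["price"] = 100 + 25 * df["ram_gb"] + 150 * df["cpu_ghz"] + rng.normal(0, 50, 200)

X, y = df[["ram_gb", "cpu_ghz"]], df["price"]

# 80% for fitting, 20% held out for validation.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_val)

# Out-of-sample performance on the held-out 20%.
print("validation MSE:", mean_squared_error(y_val, pred))
print("validation MAE:", mean_absolute_error(y_val, pred))
```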

The only real answer to how well your model actually works is to forward test it and evaluate. I.e. try it and see. The true answer is always retrospective.

Emir
  • I'd recommend using the Mean Squared Prediction Error to determine how well your model predicts. See: http://en.wikipedia.org/wiki/Mean_squared_prediction_error – StatsStudent Jan 24 '15 at 18:42
  • Thanks for the tips. I also want to note that I do not want to predict the future. I want to predict values that just do not exist. – Vladimir Iashin Jan 24 '15 at 19:40
  • That may change things a lot, since it could imply that you are interpolating. Is your sample patterned in some way? Do you just have a random half of the available observations? – Emir Jan 24 '15 at 21:47
  • Unfortunately, the price values do not exist at all, since these are observations of computers that entered the market after the period I care about. My aim is to predict how much a computer would have cost had it existed at the point in time I care about. Yes, there is some bias in that these computers are newer (by half a year) than those I use as a base for my models. – Vladimir Iashin Jan 25 '15 at 04:13
  • I'd suggest proceeding as I've suggested in the answer; it seems to apply entirely to your problem as I've understood it. If you think there is a systematic relationship with time and you have enough data, then you can try to capture it by including time as a variable. – Emir Jan 25 '15 at 06:32
  • @StatsStudent thanks, it is nice to know about that method. However, it assumes that I have the true prices, which, in fact, I don't. – Vladimir Iashin Jan 25 '15 at 07:56
  • @Emir undoubtedly, your answer is reasonable and I will try cross-validation. I just want some precision in the answer I'm looking for, not general thoughts [sorry for the criticism]. – Vladimir Iashin Jan 25 '15 at 07:59

If we want to compare the predictive performance of models, here is what I gather from the replies:

Absolute values

  1. AIC / BIC. These criteria can be used when the dependent variable is the same. Moreover, BIC is useful only when the number of observations is equal among the candidate models. These are appropriate for nested models; see "Is there any reason to prefer the AIC or BIC over the other?" and also "AIC & BIC vs. Crossvalidation". (A small sketch after the "Tests" list below compares two specifications this way.)
  2. RESET. See "Use PRESS, not R squared…".
  3. $R^2_{adj.}$. See "What's a good value for R-squared?".

Tests

  1. Cross-validation techniques.
  2. Lack-of-fit sum of squares (see the Wikipedia article).
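
As a rough illustration of how these criteria can be put side by side, here is a minimal sketch comparing two candidate specifications by AIC/BIC, adjusted $R^2$, and 5-fold cross-validated MSE. The formulas, column names, and data are hypothetical; it only shows the mechanics:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data; replace with the ~N/2 computers that have observed prices.
rng = np.random.default_rng(2)
df = pd.DataFrame({"ram_gb": rng.integers(2, 32, 200),
                   "cpu_ghz": rng.uniform(1.5, 4.0, 200)})
df["price"] = 100 + 25 * df["ram_gb"] + 150 * df["cpu_ghz"] + rng.normal(0, 50, 200)

# Two candidate specifications with the same dependent variable and the same rows,
# so AIC/BIC (and adjusted R^2) are comparable across them. Lower AIC/BIC is better.
specs = {"ram only": "price ~ ram_gb",
         "ram + cpu": "price ~ ram_gb + cpu_ghz"}
for name, formula in specs.items():
    fit = smf.ols(formula, data=df).fit()
    print(f"{name}: AIC={fit.aic:.1f}  BIC={fit.bic:.1f}  adj. R^2={fit.rsquared_adj:.3f}")

# 5-fold cross-validated MSE for the fuller specification
# (sklearn returns the negative MSE, hence the minus sign).
cv_mse = -cross_val_score(LinearRegression(),
                          df[["ram_gb", "cpu_ghz"]], df["price"],
                          scoring="neg_mean_squared_error", cv=5).mean()
print("5-fold CV MSE (ram + cpu):", round(cv_mse, 1))
```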

Thanks to @Emir and @AleksandrBlekh

  • Sample size might make AICc preferable. I also like the analytic simplicity of AIC. If you have a good closed form for BIC, then use it. These are derived from the Kullback-Leibler divergence, which comes from Shannon's mathematical theory of information. These criteria come from the same root source and are all approximations to the log-likelihood. Their differences arise from the assumptions and substitutions used in the derivation. Be cautious about how you use AIC to compare models: it must apply to the same data, and cross-validation doesn't do that. – EngrStudent Jan 25 '15 at 18:31