
I'm not sure what statistical test to use in the following situation:

I have three different algorithms producing regressions for an observation.

I now want to know which of the algorithms yields the best prediction of the true values, i.e. which algorithm's output correlates best with the observation. Specifically: is one of the algorithms significantly better than the others?

user7226
  • Sounds as if you have correlations between the observed values and each of three variables that comprise predicted values. Where are you getting stuck? – rolando2 Nov 05 '11 at 18:37

2 Answers


I'm a little uncertain what you mean by algorithms, but if you mean different regression models used for prediction: since I'm currently in the process of comparing different models myself, I figured I'd try to summarize what I've found so far (I've made this a community wiki since I'm far from experienced in this field).

Background

In general, when you create a regression model consisting of k parameters, the larger k gets, the larger the risk of over-fitting. If you have a unique parameter for each individual you will get a perfect fit, but it amounts to naming each of the studied individuals, and you won't be able to generalize your results to the population you're studying.
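To make that concrete, here is a minimal sketch (mine, not part of the original answer) fitting polynomials of increasing degree to simulated data: the in-sample fit keeps improving while the out-of-sample fit eventually falls apart. It assumes NumPy and scikit-learn are available; the data are made up.

```python
# Illustration of over-fitting: more parameters always improve the in-sample
# R^2, but the out-of-sample R^2 eventually gets worse.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=200)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

for degree in (1, 3, 10, 20):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    print(degree,
          round(r2_score(y_train, model.predict(x_train)), 3),   # in-sample
          round(r2_score(y_test, model.predict(x_test)), 3))     # out-of-sample
```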

Problem

As I see it there are two questions:

  • How many parameters can I have with a certain sized dataset?
  • How do I measure the additional value of a parameter while at the same time avoiding over-fitting?

Number of possible parameters

If you have a continuous outcome you're usually allowed more parameters than with a dichotomous one, since a continuous variable contains more information than just yes/no. You can test for effect modification and polynomials, but every such test adds to the risk of over-fitting - you should think about the plausible effect modifications beforehand rather than testing them at random.

When working with dichotomous outcomes there is a "rule of thumb":

  • $m \le \frac{n}{10}$ for $n < 100$
  • $m \le \sqrt{n}$ for $n \ge 100$

where $m$ is the maximum number of parameters and $n$ is the number of outcomes or non-outcomes, whichever is smaller.
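A tiny hypothetical helper, just to illustrate the rule of thumb (the function and `n_events` are my own names; `n_events` stands for the smaller of the two outcome counts):

```python
# Apply the rule of thumb above: n/10 parameters below 100 events,
# sqrt(n) parameters from 100 events and up.
import math

def max_parameters(n_events: int) -> int:
    if n_events < 100:
        return n_events // 10
    return int(math.sqrt(n_events))

print(max_parameters(80))   # -> 8 parameters
print(max_parameters(400))  # -> 20 parameters
```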

Selecting your parameters

It sounds like you've already created your models, but if you're trying to figure out which parameters to choose you can use stepwise forward/backward selection: you work through your parameters by either sequentially including the most significant or excluding the least significant. This methodology has, however, been listed as a statistical sin, and this question examined the possible alternatives. Here's also a question discussing the use of forward selection in survival settings.
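If you do go the forward-selection route despite those caveats, a minimal sketch could look like the following; using the AIC as the inclusion criterion and the statsmodels OLS interface are my own assumptions, not something prescribed by the linked questions.

```python
# Greedy forward selection by AIC (a sketch, not a recommendation):
# at each step add the candidate column that lowers the AIC the most,
# and stop when no remaining candidate improves it.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_select(X: pd.DataFrame, y):
    """Return the column names chosen by greedy AIC-based forward selection."""
    selected = []
    remaining = list(X.columns)
    best_aic = sm.OLS(y, np.ones(len(y))).fit().aic  # intercept-only model

    while remaining:
        scores = []
        for col in remaining:
            exog = sm.add_constant(X[selected + [col]])
            scores.append((sm.OLS(y, exog).fit().aic, col))
        aic, col = min(scores)
        if aic >= best_aic:          # no candidate improves the AIC
            break
        best_aic = aic
        selected.append(col)
        remaining.remove(col)
    return selected
```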

Choosing a model

I guess this is the heart of the question. The methods that I've seen so far are AIC, BIC & ROC. Frank Harrell also mentioned, in one of my questions, the adequacy index as another possible option. I'm waiting for his book to arrive, but this article of his seems to discuss the problem in a comprehensive way.

BIC & AIC

The BIC and AIC might be good model selectors; this question discusses them in detail. To summarize:

  • $AIC = -2\ln(\text{likelihood}) + 2k$
  • $BIC = -2\ln(\text{likelihood}) + \ln(N) \cdot k$

where:

  • k = model degrees of freedom
  • N = number of observations

Since $\ln(8) > 2$, the BIC penalizes each additional parameter harder than the AIC whenever there are 8 or more observations. The choice between them seems to be debated - I'm still struggling to find some easy-to-understand examples.
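As a rough illustration (the data and model choices are placeholders of mine), this is how the two criteria could be read off competing models fitted with statsmodels; lower values are better under each criterion.

```python
# Fit two competing OLS models and compare their AIC/BIC.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 2.0 * x1 + rng.normal(size=100)          # x2 is pure noise here

small = sm.OLS(y, sm.add_constant(np.column_stack([x1]))).fit()
large = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print("small model: AIC", round(small.aic, 1), "BIC", round(small.bic, 1))
print("large model: AIC", round(large.aic, 1), "BIC", round(large.bic, 1))
```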

ROC & C-statistic

In my own research I was hoping to use the area under the ROC curve (AUC = C-statistic) as a model selector. In my case it turned out not to be as discriminative as I would have wanted, but the theory is fairly easy to understand and it has some simple interpretations:

  • Area under ROC = .5: no discrimination = flipping a coin
  • Area under ROC ≥ .7: acceptable
  • Area under ROC ≥ .8: excellent discrimination
  • Area under ROC ≥ .9: outstanding discrimination - very rare

Unlike the BIC/AIC, the ROC curve does not take the number of parameters into account.
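For completeness, a minimal sketch of computing the AUC/C-statistic with scikit-learn; the logistic model and the simulated data are assumptions of mine, not from the answer.

```python
# Compute the area under the ROC curve (C-statistic) for a fitted
# logistic regression's predicted probabilities on a dichotomous outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)
prob = model.predict_proba(X)[:, 1]          # predicted probability of y = 1
print("AUC =", round(roc_auc_score(y, prob), 3))
```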

Max Gordon

Cross-validate the out-of-sample accuracy for each regression and compare them: the better model will likely be more accurate out-of-sample.
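A minimal sketch of what that could look like with scikit-learn; the three candidate models and the simulated data are placeholders for whatever algorithms the question is actually comparing.

```python
# Compare candidate regressions by cross-validated out-of-sample error.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(size=300)

candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in candidates.items():
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error")
    print(f"{name}: mean CV MSE = {mse.mean():.3f}")
```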

Zach