
I built a classification model based on a Random Forest classifier, and I have been asked to assess the statistical significance of its performance.

So far, I have trained and tested it on two different datasets, A and B, respectively.

What kind of test can I use?

WildThing

1 Answer


You can get an unbiased estimate of the classification error with the out-of-bag (OOB) error estimate. See the explanation here: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#ooberr
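For concreteness, here is a minimal sketch of pulling the OOB error out of a fitted forest, assuming the randomForest package in R and a hypothetical training data frame `trainA` with a factor column `label`:

```r
# Minimal sketch: OOB error from the randomForest package in R.
# "trainA" and its factor column "label" are hypothetical placeholders.
library(randomForest)

rf <- randomForest(label ~ ., data = trainA, ntree = 500)

# err.rate has one row per tree; the last row's "OOB" column is the
# out-of-bag error after all trees have been grown.
oob_error <- rf$err.rate[rf$ntree, "OOB"]
print(oob_error)
```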

I suppose you could fit the model many times with different random seeds. If your classification error is better than expected by pure chance in at least 95% of your trials, then your model is significant (at an alpha level of 0.05).
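A rough sketch of that repeated-fitting idea, again assuming the randomForest package and the hypothetical `trainA` data frame, with chance-level error taken here as the error of always predicting the majority class:

```r
# Sketch of the repeated-fitting procedure described above.
# trainA / label are hypothetical placeholders.
library(randomForest)

# Chance-level error: one minus the majority-class frequency.
chance_error <- 1 - max(table(trainA$label)) / nrow(trainA)

n_trials <- 100
oob_errors <- sapply(seq_len(n_trials), function(i) {
  set.seed(i)                               # a different random seed per trial
  rf <- randomForest(label ~ ., data = trainA, ntree = 500)
  rf$err.rate[rf$ntree, "OOB"]
})

# Fraction of trials in which the forest beats chance; >= 0.95 corresponds
# to the alpha = 0.05 criterion described above.
mean(oob_errors < chance_error)
```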

You may not even have to go to all that trouble -- your out-of-bag error should converge to some value as more trees are added. I do not know how to estimate a confidence interval for it without the above procedure, but someone smarter than I am may know...

Edit: This thread looks relevant to the question of estimating a confidence interval on OOB classification error - Bootstrapping estimates of out-of-sample error
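One possible reading of that thread, sketched under the same assumptions as above (randomForest package, hypothetical `trainA` data frame), is to bootstrap the training rows, refit, and take quantiles of the resulting OOB errors:

```r
# Rough bootstrap sketch for an interval on the OOB error; one possible
# reading of the linked thread, not a prescription from it.
library(randomForest)

n_boot <- 200
boot_err <- replicate(n_boot, {
  idx <- sample(nrow(trainA), replace = TRUE)   # resample rows with replacement
  rf <- randomForest(label ~ ., data = trainA[idx, ], ntree = 500)
  rf$err.rate[rf$ntree, "OOB"]
})

quantile(boot_err, c(0.025, 0.975))             # approximate 95% interval
```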

ahwillia
  • I already computed the OOB classification error, and it gave me satisfactory results (from my point of view). But I want to use a more powerful analysis tool. Someone advised me to use something like ANOVA; is it possible to use it here? – WildThing Dec 14 '13 at 20:00
  • More powerful in what sense? If you want to estimate a p-value or a confidence interval for OOB error, I would stand by my answer above. I'm not sure how one would apply an ANOVA, or what that would tell you in addition to the OOB error. I'm pretty sure that RF is not designed to be mixed with hypothesis testing anyways. – ahwillia Dec 14 '13 at 20:05
  • **'I'm pretty sure that RF is not designed to be mixed with hypothesis testing anyways'** can you please elaborate? – WildThing Dec 14 '13 at 20:12
  • Well, hypothesis tests typically make several assumptions (most universally, that the data are normally distributed). RF is generally used to tackle datasets that are more complex than this and therefore do not satisfy these assumptions. My answer above suggests that you use bootstrapping, which doesn't rely on any normality assumption. Still, I don't think this is typically done -- the classification error or other measures of model performance are usually seen as more important than estimating a p-value. I could be wrong though. – ahwillia Dec 14 '13 at 20:29
  • What about ANOVA variants like the Kruskal-Wallis or Friedman tests? They are generally used when the distributions are not normal. – WildThing Dec 15 '13 at 13:57
  • I don't see why you would want to apply that here. What Alex Williams states I believe to be true: hypothesis testing is not really meant to be applied to these models. They are algorithmic models built around predictive accuracy, not tests. I would definitely read Leo Breiman's "Statistical Modeling: The Two Cultures" to get a grasp of this. You can generate a prediction interval using the quantregForest package in R (a small sketch follows this comment thread). The only issue I can think of is that this validation only applies to this particular (labeled) sample; if your population has a different distribution you could be in trouble. – JEquihua Dec 15 '13 at 17:15
  • @JEquihua can you provide more details on the quantregForest package and how it generates the prediction interval? – ahwillia Dec 15 '13 at 18:18
  • 1
    I would refer you to the original article: N. Meinshausen (2006) "Quantile Regression Forests", Journal of Machine Learning Research 7. http://jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf – JEquihua Dec 15 '13 at 20:17
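For the quantregForest suggestion in the comments above, here is a small usage sketch. Note that quantile regression forests target a numeric response, so this illustrates a prediction interval for a regression problem rather than the classification setting of the question; the data frames and the 5%/95% quantile choice are assumptions, and the `what` argument is the quantile interface of recent versions of the package:

```r
# Sketch of a prediction interval with the quantregForest package.
# trainReg / testReg are hypothetical data frames with a numeric response "y";
# the 5% and 95% conditional quantiles give an approximate 90% interval.
library(quantregForest)

predictors <- setdiff(names(trainReg), "y")

qrf <- quantregForest(x = trainReg[, predictors], y = trainReg$y)

# Conditional quantiles for new observations; "what" names the requested
# quantiles in recent versions of the package.
interval <- predict(qrf, newdata = testReg[, predictors], what = c(0.05, 0.95))
head(interval)
```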