
Breiman et al. recommend the 1-SE rule and show that it is successful in screening out noise variables. On page 80 of their book, I am confused by the '1 S.E. rule': $$R'[T(k_l)]\leq R'[T(k_0)]+S.E\{R'[T(k_0)]\} $$

Here $T_1, T_2, \ldots$ is the sequence of trees (with varying numbers of variables), and the corresponding K-fold cross-validation estimates of prediction error are $R'[T_1], R'[T_2], \ldots$. The tree selected is then $T(k_l)$, where $k_l$ is the maximum $k$ satisfying the inequality above. Please note that $R'[T(k_0)]=\min_k R'[T_k]$.

My question is: how do I calculate $S.E\{R'[T(k_0)]\}$? It is only a single value. Please correct me where I am wrong.

Biostat

1 Answer


Isn't it as simple as calculating the standard error of the mean of $R'[T_i]$ (for a given $i$), treating each cross-validation fold as an "independent" measurement? That is, calculating the standard deviation of $R'[T_i]$ across the $K$ folds and then dividing by $\sqrt{K-1}$ gives a reasonable resampling-based proxy for that standard error.
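
As a rough illustration of the idea, here is a minimal Python sketch (standard library only). Everything in it is an assumption made for the example: the per-fold error values are made up, the function name `one_se_rule` is hypothetical, and the trees are assumed to be ordered from largest to most pruned, so that the maximum $k$ is the simplest tree. Note that dividing the population standard deviation (denominator $K$) by $\sqrt{K-1}$ gives the same number as dividing the sample standard deviation (denominator $K-1$) by $\sqrt{K}$.

```python
import math
import statistics


def one_se_rule(fold_errors):
    """Apply the 1-SE rule to per-fold cross-validation errors.

    fold_errors[k] holds the K per-fold error estimates for tree k.
    Assumption: trees are ordered from largest to most pruned, so the
    largest qualifying index is the simplest acceptable tree.
    Returns (k0, kl, se_k0) using 0-based indices.
    """
    # R'[T_k]: mean error over the folds for each tree
    means = [statistics.mean(errs) for errs in fold_errors]
    # Standard error of each mean: population SD divided by sqrt(K - 1)
    ses = [statistics.pstdev(errs) / math.sqrt(len(errs) - 1)
           for errs in fold_errors]
    # k0: tree with the minimum cross-validated error
    k0 = min(range(len(means)), key=lambda k: means[k])
    threshold = means[k0] + ses[k0]
    # kl: maximum k whose error is within one SE of the minimum
    kl = max(k for k in range(len(means)) if means[k] <= threshold)
    return k0, kl, ses[k0]


# Hypothetical 5-fold errors for four trees (illustrative numbers only)
fold_errors = [
    [0.30, 0.34, 0.32, 0.36, 0.28],
    [0.24, 0.28, 0.26, 0.30, 0.22],
    [0.25, 0.29, 0.27, 0.31, 0.23],
    [0.33, 0.37, 0.35, 0.39, 0.31],
]

k0, kl, se_k0 = one_se_rule(fold_errors)
print(k0, kl, se_k0)
```

With repeated K-fold cross-validation, the same function could be fed one error value per resample instead of one per fold, which matches the reading of $K$ as the number of resamples.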

Yevgeny
  • what are $i$ and $K$? – Biostat Nov 22 '11 at 17:56
  • _K_ is the number of resamples (which equals the number of folds in the case of a single replication of K-fold cross-validation). _i_ corresponds to the model with the lowest value of the metric _R'_ (k0 in your notation above). Basically, resampling is used to estimate both R' and its SE. – Yevgeny Nov 22 '11 at 18:05
  • Can I use the following estimate for the SE? $SD(X_m)$, where $X_m=mean(R[T_i])$ and $m$ denotes the replication of K-fold CV. – Biostat Nov 22 '11 at 18:45
  • That is likely to cause a far too parsimonious model to be chosen, since it overestimates the standard error of R' because you are not using the [error of the mean](http://en.wikipedia.org/wiki/Standard_error_(statistics)#Standard_error_of_the_mean). If you are in doubt, you could try calculating the SE with and without dividing by sqrt(k-1) and see what kind of models you get. – Yevgeny Nov 22 '11 at 19:07
  • By the way, is there any reason why you are not using the R package _caret_, which supports this feature (and lots of other goodies)? See the bottom of page 19 of the [caretTrain vignette](http://cran.r-project.org/web/packages/caret/vignettes/caretTrain.pdf). – Yevgeny Nov 22 '11 at 19:13