
This post is related to an earlier post I published here a few days ago, in which I was having trouble with prediction error rates: the classification tree I grew underperforms a naive intercept-only model, which uses no predictors and simply bets on the majority class of the zero-one coded binary response variable.
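
To be concrete, by the intercept-only model I just mean always predicting the majority class; here is a minimal sketch of the baseline error rate I have in mind (illustrative only, where y stands for any binary response factor):

# Error rate of a model with no predictors: always predict the majority class
baseline_error <- function(y) {
  tab <- table(y)
  1 - max(tab) / sum(tab)
}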

I then went ahead and cross-validated the tree, and the results below show that the single-node model is the best, with the least prediction error. Is this related to the problem in the post I referred to above, and how can I resolve it? Thanks a lot!

> set.seed(47306)
> cv.h2 <- cv.tree(tree.h2, FUN=prune.misclass)
> cv.h2
$size
[1] 26  9  6  4  1

$dev
[1] 270 270 270 270 270

$k
[1] -Inf 0.00 1.00 2.50 2.67

$method
[1] "misclass"

attr(,"class")
[1] "prune"         "tree.sequence"
> min.error = which.min(cv.h2$dev) 
> min.error
[1] 1
> table(usedta[class.train,]$h2)

1poorHlth 0goodHlth 
      270      1305
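
In case it is useful, here is a rough sanity check I have in mind (a sketch reusing the objects above; as far as I understand, dev from prune.misclass is a misclassification count, so dividing by the training-set size gives a CV error rate):

# The CV deviance (270) equals the minority-class count in the training data,
# which would mean every pruned subtree misclassifies exactly the minority
# cases, i.e. it behaves like the majority-class (intercept-only) model.
y.train  <- usedta[class.train, ]$h2
n.train  <- length(y.train)
baseline <- min(table(y.train))   # errors from always predicting the majority class
cbind(size          = cv.h2$size,
      cv.error.rate = cv.h2$dev / n.train,
      baseline.rate = baseline / n.train)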
Here the real problem, I guess, is that all the deviances are equal regardless of the number of terminal nodes or how I grow the tree. Any thoughts? – WaterWood Aug 27 '20 at 15:35
