I have a data set with 1962 observations and 46 columns. Column 46 is the target, with 3 classes: 1, 2, and 3. Six of the other columns are nominal variables and the rest are ordinal. I have preprocessed them as follows:
# columns 1-4, 6, 9 are nominal; column 46 is the target
for (i in c(1:4, 6, 9, 46)) {
  cw_alldata_known[, i] <- as.factor(cw_alldata_known[, i])
}
# the remaining predictors are ordinal
for (i in c(5, 7, 8, 10:45)) {
  cw_alldata_known[, i] <- as.ordered(cw_alldata_known[, i])
}
Then I divide the data 50/50 into a training set (cw.train) and a test set (cw.test).
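The split is plain random sampling of rows, roughly like this (a sketch; the seed is arbitrary):
set.seed(1)  # arbitrary seed, only for reproducibility
idx <- sample(nrow(cw_alldata_known), nrow(cw_alldata_known) / 2)
cw.train <- cw_alldata_known[idx, ]
cw.test  <- cw_alldata_known[-idx, ]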
I fitted a decision tree using the party package in R:
cw.ctree <- ctree(credit.rating ~ ., data = cw.train)
Then I also fitted a random forest using the randomForest package:
cw.forest <- randomForest(credit.rating ~ ., data = cw.train, ntree = 107)
I have tried other ntree values, but 107 seems to be the best.
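I compared values roughly like this, reading the out-of-bag error of one larger forest (a sketch; fit$err.rate[, "OOB"] holds the OOB error after each number of trees):
cw.big <- randomForest(credit.rating ~ ., data = cw.train, ntree = 500)
plot(cw.big)                         # OOB error versus number of trees
which.min(cw.big$err.rate[, "OOB"])  # 107 was the minimum in my run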
The decision tree's accuracy on the test set is around 61%, while the random forest's is only about 56% (computed as sketched below). I have read that random forests are often more robust and reliable than single trees, so why doesn't the forest outperform the decision tree in this case?
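For reference, the accuracies come from predicting on the held-out half (a sketch; it assumes the target column is named credit.rating, as in the formulas above):
ctree.pred  <- predict(cw.ctree, newdata = cw.test)   # predicted classes from the tree
forest.pred <- predict(cw.forest, newdata = cw.test)  # predicted classes from the forest
mean(ctree.pred == cw.test$credit.rating)             # ~0.61
mean(forest.pred == cw.test$credit.rating)            # ~0.56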