
I need a library, or something already implemented, for the SVM and Random Forest algorithms. Can you give me some ideas? I don't have experience and I don't know what to choose.

The characteristics of my classification problem are: 27 dimensions, 9 classes, 50,000 entries in the training set and 150,000 in the test set.

yonutix

1 Answer


I, too, would suggest the 'caret' package in R.

With it you can build many models and compare their performance.

http://topepo.github.io/caret/training.html
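The answer recommends caret in R. For readers working in Python, scikit-learn offers a similar workflow. This is a swapped-in library, not part of the original answer. The sketch is hedged and rests on stated assumptions. Synthetic data from `make_classification`, shaped like the question (27 features, 9 classes), stands in for the real data set. The hyperparameters (`C=1.0`, `n_estimators=100`) are arbitrary defaults, not tuned values. The helper name `compare_models` is hypothetical.

```python
# Hypothetical sketch: compare an SVM and a random forest with 5-fold
# cross-validation in scikit-learn (an alternative to caret in R).
# Assumption: synthetic data stands in for the real 27-feature, 9-class set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(n_samples=2000, random_state=0):
    """Return the mean 5-fold CV accuracy of each model on synthetic data."""
    # Synthetic stand-in shaped like the question: 27 features, 9 classes.
    X, y = make_classification(
        n_samples=n_samples,
        n_features=27,
        n_informative=15,
        n_classes=9,
        random_state=random_state,
    )
    models = {
        # SVMs are sensitive to feature scale, so standardize first.
        "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
        # Random forests do not need feature scaling.
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
    }
    return {
        name: cross_val_score(model, X, y, cv=5).mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    for name, acc in compare_models().items():
        print(f"{name}: {acc:.3f}")
```

To use your own data, replace the `make_classification` call with your `X` and `y`. On 50,000 rows, the kernel SVM is likely to take noticeably longer to fit than the random forest.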

By the way, the ratio of the training set to the test set is usually higher than yours.

Have a look at this discussion: https://stackoverflow.com/questions/13610074/is-there-a-rule-of-thumb-for-how-to-divide-a-dataset-into-training-and-validatio
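To make the ratio concrete, a small hypothetical helper `training_fraction` computes the share of all labelled cases used for training. Splits such as 70/30 or 80/20 are often quoted as starting points, but they are conventions rather than rules.

```python
def training_fraction(n_train, n_test):
    """Hypothetical helper: fraction of all labelled cases used for training."""
    return n_train / (n_train + n_test)


# Numbers from the question: 50,000 training cases, 150,000 test cases.
frac = training_fraction(50_000, 150_000)
print(f"training fraction: {frac:.0%}")  # prints "training fraction: 25%"
```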

user3875022
  • But few people have the luxury of such a large data set. $n_{\text{train}} = 50000$ cases for $p = 27$ features is a pretty comfortable training set size, and $1.5 \times 10^5$ test cases is nice as well (assuming these are independent cases...) – cbeleites unhappy with SX Apr 11 '15 at 16:42
  • I understand your point, but in the machine learning community, usually the training set is suggested to be larger than the test set, not the other way around. – user3875022 Apr 12 '15 at 18:14
  • There are several discussions on this point: http://stats.stackexchange.com/questions/23331/why-is-there-an-asymmetry-between-the-training-step-and-evaluation-step – user3875022 Apr 12 '15 at 18:15
  • Also see this: http://www.quora.com/Can-the-validation-set-be-larger-than-the-training-set – user3875022 Apr 12 '15 at 18:15
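The claim that the training set is comfortably large can be put in rough numbers. The hypothetical helper below divides the training-set size by the number of features and by the number of classes. The per-class figure assumes roughly balanced classes, which the question does not state.

```python
def samples_per_unit(n_train, n_features, n_classes):
    """Hypothetical helper: training cases per feature and per class.

    Assumes roughly balanced classes; the question does not say so.
    """
    return n_train / n_features, n_train / n_classes


# Numbers from the question: 50,000 training cases, 27 features, 9 classes.
per_feature, per_class = samples_per_unit(50_000, 27, 9)
print(f"{per_feature:.0f} cases per feature, {per_class:.0f} per class")
# prints "1852 cases per feature, 5556 per class"
```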