I am trying to get up to speed with R. I eventually want to use R libraries for text classification. I was just wondering what people's experiences are with R's scalability when it comes to text classification.
I am likely to run into high-dimensional data (~300k dimensions). I am looking at using SVM and Random Forest in particular as classification algorithms.
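For reference, here is roughly the kind of call I have in mind: a toy sketch with made-up data, just to show which packages I mean (e1071 for the SVM, randomForest for Random Forest). My real feature matrix would of course be far wider.

    library(e1071)
    library(randomForest)

    # toy data: 200 "documents" x 50 "terms", 10 classes
    set.seed(1)
    x <- matrix(rnorm(200 * 50), nrow = 200)
    y <- factor(sample(letters[1:10], 200, replace = TRUE))

    svm_fit <- svm(x, y, kernel = "linear")     # linear kernels are common for text
    rf_fit  <- randomForest(x, y, ntree = 100)  # unsure this scales to ~300k columns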
Would R libraries scale to my problem size?
Thanks.
EDIT 1: Just to clarify, my data set is likely to have 1000-3000 rows (perhaps a bit more) and 10 classes.
EDIT 2: Since I am very new to R, I would ask posters to be as specific as possible. For example, if you are suggesting a workflow/pipeline, please mention the R libraries involved in each step if you can. Additional pointers (to examples, sample code, etc.) would be icing on the cake.
EDIT 3: First off, thanks everyone for your comments. Secondly, I apologize; perhaps I should have given more context for the problem. I am new to R but not so much to text classification. I have already done pre-processing (stemming, stopword removal, tf-idf conversion, etc.) on part of my data using the tm package, just to get a feel for things. tm was so slow even on about 200 docs that I got concerned about scalability. Then I started playing with FSelector, and even that was really slow. That's the point at which I made my OP.
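To be concrete, my pre-processing so far looks roughly like this (simplified; the real corpus is read from files rather than a character vector, and stemming also needs the SnowballC package installed):

    library(tm)

    docs <- c("first sample document", "second sample document")  # placeholder text
    corpus <- VCorpus(VectorSource(docs))
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeWords, stopwords("english"))
    corpus <- tm_map(corpus, stemDocument)

    # tf-idf weighted document-term matrix
    dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTfIdf))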
EDIT 4: It just occurred to me that I have 10 classes and about 300 training documents per class, and I am in fact building the term-by-document matrix out of the entire training set, resulting in very high dimensionality. But how about reducing the 1-out-of-k classification problem to a series of binary classification problems? That would considerably reduce the number of training documents (and hence the dimensionality) at each of the k-1 steps, wouldn't it? So is this approach a good one? How does it compare in terms of accuracy to the usual multi-class implementation?
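In case it helps, the simplest variant of what I mean is a one-vs-rest setup, roughly like the sketch below (using e1071::svm as the base binary classifier; dtm_matrix and labels are stand-ins for my real tf-idf matrix and class vector):

    library(e1071)

    # train one binary "this class vs. everything else" model per class
    one_vs_rest <- function(x, y) {
      lapply(levels(y), function(cls) {
        y_bin <- factor(ifelse(y == cls, cls, "other"))
        svm(x, y_bin, kernel = "linear", probability = TRUE)
      })
    }

    # models <- one_vs_rest(dtm_matrix, labels)
    # At prediction time: score a new document with every model and pick the
    # class whose model assigns the highest probability to its own class.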