Because a random forest is a collection of independent CARTs, each trained on a random subset of features and records, it lends itself to parallelization. The combine()
function in the randomForest package will stitch together independently trained forests. Here is a toy example. As @mpq's answer states, you should not use the formula notation, but pass in a dataframe/matrix of variables and a vector of outcomes. I shamelessly lifted these from the docs.
library("doMC")
library("randomForest")
data(iris)
registerDoMC(4)  # number of cores on the machine
darkAndScaryForest <- foreach(y = seq(10), .combine = combine) %dopar% {
  set.seed(y)  # not strictly needed, but makes each worker reproducible
  # x/y interface rather than the formula interface, per @mpq's advice;
  # norm.votes = FALSE is required so combine() can merge the raw vote counts
  randomForest(iris[, -5], iris$Species, ntree = 50, norm.votes = FALSE)
}
I passed the randomForest combine function to the similarly named .combine parameter, which controls how the loop's outputs are merged. The downside is that you get no OOB error rate or, more tragically, no variable importance.
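You can see the same behavior without any parallel machinery; a minimal sketch, combining two small forests trained on iris:

```r
library("randomForest")
data(iris)

# two forests trained separately; norm.votes = FALSE so combine() can sum votes
f1 <- randomForest(iris[, -5], iris$Species, ntree = 50, norm.votes = FALSE)
f2 <- randomForest(iris[, -5], iris$Species, ntree = 50, norm.votes = FALSE)

both <- combine(f1, f2)
both$ntree              # 100: the trees are pooled
is.null(both$err.rate)  # TRUE: combine() discards the OOB-based components
```

As the combine() docs note, the err.rate, confusion, mse, and rsq components of the merged object are set to NULL, which is exactly why the parallel version above loses its OOB error rate.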
Edit:
After rereading the post I realize that I said nothing about the 34+ factor issue. A wholly un-thought-out answer could be to represent them as binary variables: each factor level becomes its own column, encoded 0/1 for its presence/absence. By doing some variable selection on the unimportant indicators and removing them, you could keep your feature space from growing too large.
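A sketch of that encoding using base R's model.matrix(); the data frame and column name here are made up for illustration:

```r
set.seed(1)

# hypothetical data frame with one high-cardinality factor column
df <- data.frame(
  state = factor(sample(state.name, 100, replace = TRUE)),
  y     = rnorm(100)
)

# model.matrix() expands each factor level into its own 0/1 indicator column;
# the "- 1" drops the intercept so every observed level gets a column
X <- model.matrix(~ state - 1, df)
dim(X)  # 100 rows, one column per observed level
```

You could then feed X (plus any numeric predictors) into randomForest's x/y interface and prune the indicator columns that come out with low importance.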