I am trying to use a random forest model in R:

    library(randomForest)

    RF <- randomForest(as.factor(y) ~ ., data = train, importance = TRUE,
                       proximity = FALSE, ntree = 1000, keep.forest = TRUE)

`y` is binary. The model runs without any issues; however, I have some questions.
I have about 50,000 rows of data and 300 variables. According to other threads, using `proximity=TRUE` will likely run very slowly or require a lot of memory, since it builds an n × n proximity matrix (50,000 × 50,000 here).
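To put the memory concern in numbers: assuming the proximity matrix is held as a dense n × n matrix of 8-byte doubles (the `proximity` component of a fitted `randomForest` object is an n × n matrix), the returned matrix alone for 50,000 rows comes to roughly 20 GB. This back-of-the-envelope sketch ignores any extra working memory used while the matrix is computed.

```r
# Back-of-the-envelope size of the n x n proximity matrix,
# assuming dense storage as 8-byte doubles.
n_rows <- 50000              # training rows, from the question
bytes_per_double <- 8
prox_bytes <- n_rows^2 * bytes_per_double
prox_gb <- prox_bytes / 1e9  # decimal gigabytes
print(prox_gb)               # [1] 20
```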
- Would using `proximity=TRUE` give me different results if memory were not an issue?
- My final outcome of interest is the predicted probability that a case is a 1. When I run `predicted = predict(RF)`, I get 1s and 0s as my predictions. I would rather get probabilities.
- How do we implement a random forest model in production? I have 1000 trees. If I want to use this model in real time, how do we code it? Do we have to code 1000 decision trees to make a prediction for each new case?
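On the `proximity=TRUE` question: as I understand it, proximities are computed by dropping rows down each tree after it has been grown and counting how often pairs of rows land in the same terminal node, so they should not change the trees or the predictions. Rather than take that on faith, this small check on simulated data (the data frame, sizes, and seeds are made up for illustration) fits the same model twice with the same seed and compares the out-of-bag results.

```r
library(randomForest)

# Hypothetical simulated stand-in for the real data: 500 rows, 10 predictors.
set.seed(1)
n <- 500
train <- data.frame(matrix(rnorm(n * 10), ncol = 10))
train$y <- as.integer(train$X1 + rnorm(n) > 0)

fit_rf <- function(prox) {
  set.seed(42)  # same seed for both fits so the bootstrap samples can match
  randomForest(as.factor(y) ~ ., data = train, ntree = 200,
               proximity = prox)
}
rf_no_prox <- fit_rf(FALSE)
rf_prox <- fit_rf(TRUE)

# Compare out-of-bag class predictions and vote fractions from the two fits.
same_pred <- identical(rf_no_prox$predicted, rf_prox$predicted)
same_votes <- isTRUE(all.equal(rf_no_prox$votes, rf_prox$votes))
print(c(same_pred = same_pred, same_votes = same_votes))

# The extra output is the n x n proximity matrix.
print(dim(rf_prox$proximity))
```

If both values print `TRUE`, the proximity matrix is the only thing `proximity=TRUE` adds.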
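On the probability question: plain `predict(RF)` uses the default `type = "response"`, which returns class labels. `predict()` for a `randomForest` classifier also accepts `type = "prob"`, which returns a matrix with one column per class holding the fraction of tree votes for that class. Called without `newdata`, it returns out-of-bag results for the training rows. The data below are simulated stand-ins, and `test` is a hypothetical new-data frame.

```r
library(randomForest)

# Hypothetical simulated data standing in for the real training set.
set.seed(1)
n <- 500
train <- data.frame(matrix(rnorm(n * 10), ncol = 10))
train$y <- as.integer(train$X1 + rnorm(n) > 0)
test <- data.frame(matrix(rnorm(5 * 10), ncol = 10))  # 5 hypothetical new cases

RF <- randomForest(as.factor(y) ~ ., data = train, ntree = 200)

# Without newdata: out-of-bag vote fractions for each training row.
oob_prob <- predict(RF, type = "prob")
p1_train <- oob_prob[, "1"]   # column named after the factor level "1"

# With newdata: vote fractions for new cases, one column per class level.
new_prob <- predict(RF, newdata = test, type = "prob")
p1_new <- new_prob[, "1"]
print(round(p1_new, 3))
```

`RF$votes` holds the same out-of-bag vote fractions directly, since `randomForest()` normalizes votes by default.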
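On real-time use: in R you normally do not re-code the trees yourself. With `keep.forest = TRUE` (the default when no test set is passed), the fitted object already contains all the trees, and `predict()` runs a new case through every tree and combines the votes. One common pattern, sketched here under the assumption that the real-time scorer can run R, is to save the fitted object once with `saveRDS()` and load it in the scoring process. The file path, the `score_case()` helper, and the simulated data are hypothetical.

```r
library(randomForest)

# Hypothetical simulated training data.
set.seed(1)
n <- 500
train <- data.frame(matrix(rnorm(n * 10), ncol = 10))
train$y <- as.integer(train$X1 + rnorm(n) > 0)

RF <- randomForest(as.factor(y) ~ ., data = train, ntree = 200,
                   keep.forest = TRUE)  # the forest must be kept to predict later

# Offline: save the fitted object (all trees included) to disk once.
model_path <- file.path(tempdir(), "rf_model.rds")  # hypothetical location
saveRDS(RF, model_path)

# Scoring process: load the model once at startup, then reuse it.
rf_loaded <- readRDS(model_path)

# Hypothetical helper: probability of class "1" for one incoming case.
score_case <- function(model, new_case) {
  # new_case: a one-row data.frame with the same predictor columns as train
  predict(model, newdata = new_case, type = "prob")[, "1"]
}

new_case <- train[1, setdiff(names(train), "y")]  # stand-in for a live request
print(score_case(rf_loaded, new_case))
```

The model is loaded once and kept in memory rather than re-read per request, because a 1000-tree object fitted on 50,000 rows can be large on disk.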
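To see what `predict()` does with the individual trees, `predict.all = TRUE` also returns each tree's own prediction, and `getTree()` returns one tree as a table of nodes (left and right daughter, split variable, split point, status, prediction). If the model ever had to be scored outside R, exporting those tables is one possible route, but that is an assumption about your setup rather than something the package does for you. The data are simulated, and `ntree = 201` is odd so that a two-class vote cannot tie.

```r
library(randomForest)

# Hypothetical simulated data.
set.seed(1)
n <- 500
train <- data.frame(matrix(rnorm(n * 10), ncol = 10))
train$y <- as.integer(train$X1 + rnorm(n) > 0)
new_case <- data.frame(matrix(rnorm(10), ncol = 10))  # one hypothetical case

RF <- randomForest(as.factor(y) ~ ., data = train, ntree = 201)

# predict.all = TRUE returns the forest result plus every tree's prediction.
p <- predict(RF, newdata = new_case, predict.all = TRUE)
tree_votes <- p$individual[1, ]   # one class label per tree
share_of_ones <- mean(tree_votes == "1")
print(share_of_ones)
print(p$aggregate)                # the forest's combined prediction

# Inspect the structure of a single tree (here the first one).
tree1 <- getTree(RF, k = 1, labelVar = TRUE)
print(head(tree1))
```

With two classes and the default cutoff, `p$aggregate` should be the majority class among `tree_votes`, and `share_of_ones` should match what `type = "prob"` reports for that case.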
Thank you.