The `NaiveBayes()` function in the **klaR** package uses the classical R formula interface, whereby you express your outcome as a function of its predictors, e.g. `spam ~ x1 + x2 + x3`. If your data are stored in a `data.frame`, you can include all predictors on the right-hand side of the formula using dot notation: `NaiveBayes(spam ~ ., data = df)` means "`spam` as a function of all other variables present in the `data.frame` called `df`."
Here is a toy example using the `spam` dataset discussed in *The Elements of Statistical Learning* (Hastie et al., Springer, 2nd ed., 2009), which is available online. This is only meant to get you started with the R function, not with the methodological aspects of using an NB classifier.
```r
data(spam, package = "ElemStatLearn")
library(klaR)

# set up a training sample (2/3 of the observations)
set.seed(101)  # for reproducibility
train.ind <- sample(1:nrow(spam), ceiling(nrow(spam) * 2/3), replace = FALSE)

# fit the NB classifier on the training sample
nb.res <- NaiveBayes(spam ~ ., data = spam[train.ind, ])

# show the estimated class-conditional densities, 8 predictors per page
opar <- par(mfrow = c(2, 4))
plot(nb.res)
par(opar)

# predict on holdout units
nb.pred <- predict(nb.res, spam[-train.ind, ])

# raw accuracy from the confusion matrix
confusion.mat <- table(nb.pred$class, spam[-train.ind, "spam"])
sum(diag(confusion.mat)) / sum(confusion.mat)
```
A recommended add-on package for this kind of ML task is the **caret** package. It offers many useful tools for preprocessing data, handling training/test samples, running different classifiers on the same data, and summarizing the results. It is available from CRAN and comes with a lot of vignettes describing common tasks.
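As a rough sketch of how the same task could look in caret (assuming caret and klaR are installed; the seed and the 5-fold cross-validation setup are arbitrary choices, not requirements):

```r
library(caret)
data(spam, package = "ElemStatLearn")

# stratified 2/3 training split (seed chosen arbitrarily)
set.seed(101)
idx <- createDataPartition(spam$spam, p = 2/3, list = FALSE)

# method = "nb" wraps klaR's NaiveBayes; tuning is assessed by 5-fold CV
fit <- train(spam ~ ., data = spam[idx, ], method = "nb",
             trControl = trainControl(method = "cv", number = 5))

# evaluate on the holdout sample
pred <- predict(fit, spam[-idx, ])
confusionMatrix(pred, spam[-idx, "spam"])
```

`confusionMatrix()` reports accuracy alongside sensitivity, specificity, and kappa, which saves you from computing them by hand as in the example above.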