Sorry if the question is elementary; I am a beginner in both R and statistics.
I am reading An Introduction to Machine Learning with R. We are looking at a dataset called Sonar. It contains 61 variables: the first 60 are numerical, and the last one is an unordered factor with levels M and R (meaning mine and rock). We want to create a model that predicts whether a given observation is a rock or a mine:
library("mlbench")
data(Sonar)
## 60/40 split
tr <- sample(nrow(Sonar), round(nrow(Sonar) * 0.6))
train <- Sonar[tr, ]
test <- Sonar[-tr, ]
model <- glm(Class ~ ., data = train, family = "binomial")
p <- predict(model, test, type = "response")
summary(p)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.7265 0.5079 1.0000 1.0000
cl <- ifelse(p > 0.5, "M", "R")
From the last line, it seems clear that each entry of p is the probability that the corresponding observation is a mine.
My question: what in the code determines that the entries of p are the probabilities of mines, and not of rocks?
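For reference, here is a minimal sketch of how one might check which class the fitted probabilities refer to. It relies on the rule documented in `?family`: when a binomial response is a factor, the first level denotes failure and all others success, so the values from `predict(..., type = "response")` should be the probability of not being in the first level. The data frame `toy` and its column `x` are hypothetical stand-ins, not the Sonar data; the same check could be run on `train` and `test`.

```r
# Toy data (hypothetical, not the Sonar set): x tends to be larger for class "R".
set.seed(1)
toy <- data.frame(
  x = c(rnorm(50, mean = 0), rnorm(50, mean = 2)),
  Class = factor(rep(c("M", "R"), each = 50))
)

# Level order decides what glm treats as "failure" (first level) and "success".
levels(toy$Class)

fit <- glm(Class ~ x, data = toy, family = "binomial")
p <- predict(fit, toy, type = "response")

# Average fitted probability within each true class:
# the class with the higher average is the one p refers to.
mean_p <- tapply(p, toy$Class, mean)
mean_p

# Map probabilities to labels using the level order explicitly,
# rather than hard-coding which label goes with p > 0.5.
cl <- ifelse(p > 0.5, levels(toy$Class)[2], levels(toy$Class)[1])
table(predicted = cl, actual = toy$Class)
```

If the average of p comes out higher for the second level, then p is the probability of that level, and the labels in the `ifelse` call should be ordered accordingly.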