Interpreting the results for the glm() function?

Question

Sorry if the question is elementary; I am a beginner in both R and statistics.

I am reading An Introduction to Machine Learning with R. We are looking at a dataset called Sonar. It contains 61 variables; the first 60 are numerical, and the last one is an unordered factor with levels M and R (meaning mine and rock). We want to create a model that predicts whether a given observation is a rock or a mine:

library("mlbench")
data(Sonar)
## 60/40 split
tr <- sample(nrow(Sonar), round(nrow(Sonar) * 0.6))
train <- Sonar[tr, ]
test <- Sonar[-tr, ]
model <- glm(Class ~ ., data = train, family = "binomial")
p <- predict(model, test, type = "response")
summary(p)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.7265  0.5079  1.0000  1.0000
cl <- ifelse(p > 0.5, "M", "R")

From the last line, it seems clear that each entry of p is the probability that the corresponding observation is a mine.

My Question: What, in the code, determines that the entries of p are the probabilities of mines, and not of rocks?

@SmallChess By that point p is already created. So by that point a high p already means a high chance it's a mine. If we had written the last line as cl <- ifelse(p > 0.5, "R", "M") then we would be wrong, because we would misidentify the mines as rocks and the rocks as mines. — Ovi, Apr 09 '18 at 04:10
https://stats.stackexchange.com/questions/8254/choose-factor-level-as-dummy-base-in-lm-in-r — SmallChess, Apr 09 '18 at 04:15
This isn't quite clear. If you are only asking about R code, it is probably off topic here. If you are asking how to interpret the output from a logistic regression model in R, you can find that here: [Interpretation of R's output for binomial regression](https://stats.stackexchange.com/q/86351/). — gung - Reinstate Monica, Apr 09 '18 at 14:57

score 1 · Accepted Answer · answered Apr 09 '18 at 04:17

I believe R uses the first level of the factor as the reference level, which here is "M". You can reorder the levels yourself, for example with relevel(). Search for "R reference factor level".
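To make the point about reference levels concrete, here is a minimal sketch on toy data. The data frame `d`, its column `x`, and the simulated values are made up for illustration and are not the Sonar data. As documented in `?family`, when a binomial response is a factor, the first level denotes failure and all others success, so the fitted probabilities refer to the non-reference level. Running `levels(train$Class)` on the real data should therefore settle which class `p` refers to.

```r
# Toy stand-in for Sonar (hypothetical values): one numeric predictor
# and a two-level factor response with levels "M" and "R".
set.seed(1)
d <- data.frame(
  x     = c(rnorm(50, mean = 0), rnorm(50, mean = 2)),
  Class = factor(rep(c("M", "R"), each = 50))
)

# Factor levels are sorted alphabetically by default; the first one
# is the reference ("failure") level for glm().
levels(d$Class)

fit <- glm(Class ~ x, data = d, family = "binomial")
p   <- predict(fit, d, type = "response")

# Mean fitted probability per true class: the class with the higher
# mean is the one whose probability p represents.
tapply(p, d$Class, mean)

# Changing the reference level with relevel() flips the meaning of p.
d2 <- d
d2$Class <- relevel(d2$Class, ref = "R")
fit2 <- glm(Class ~ x, data = d2, family = "binomial")
p2   <- predict(fit2, d2, type = "response")

# The two sets of probabilities should be complements of each other.
all.equal(unname(p), unname(1 - p2))
```

If the labels in an `ifelse(p > 0.5, ...)` call are assigned the wrong way round, a confusion table such as `table(cl, test$Class)` would typically show it as an accuracy well below 50%.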

SmallChess

@Ovi Please accept my answer. Your confusion comes from how R chooses the reference level of a factor. You can always reorder the levels if you like. – SmallChess Apr 09 '18 at 04:31
Yeah – Ovi Apr 09 '18 at 04:32
