
Sorry if the question is elementary; I am a beginner in both R and statistics.

I am reading An Introduction to Machine Learning with R. We are looking at a dataset called Sonar. It contains 61 variables: the first 60 are numeric, and the last one is an unordered factor with levels M and R (mine and rock). We want to create a model that predicts whether a given observation is a rock or a mine:

```r
library("mlbench")
data(Sonar)
## 60/40 split
tr <- sample(nrow(Sonar), round(nrow(Sonar) * 0.6))
train <- Sonar[tr, ]
test <- Sonar[-tr, ]
model <- glm(Class ~ ., data = train, family = "binomial")
p <- predict(model, test, type = "response")
summary(p)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.7265  0.5079  1.0000  1.0000
cl <- ifelse(p > 0.5, "M", "R")
```

From the last line, it seems clear that each entry of p is the probability that the corresponding observation is a mine.

My Question: What, in the code, determines that the entries of p are the probabilities of mines, and not of rocks?
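One way to check empirically which level p refers to uses a property of logistic regression with an intercept: at the fitted solution, the mean fitted probability equals the observed share of the level being modelled. The sketch below uses simulated data so it runs without mlbench; the names x, y and fit are hypothetical, and the factor is built to have the same levels, "M" and "R".

```r
# Simulated two-class data; factor() sorts levels alphabetically: "M", "R"
set.seed(1)
x <- rnorm(200)
y <- factor(ifelse(x + rnorm(200) > 0, "R", "M"))
print(levels(y))

fit <- glm(y ~ x, family = binomial)
p <- predict(fit, type = "response")

# With an intercept, the mean fitted probability equals the observed share
# of the modelled level, so compare it against both candidate levels
print(c(mean_p = mean(p), share_M = mean(y == "M"), share_R = mean(y == "R")))
```

Here mean_p should match share_R, which agrees with the rule stated on the ?glm help page: for a factor response, the first level denotes "failure" and all other levels "success".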

Ovi
  • You said it yourself, the last line. What's your question? – SmallChess Apr 09 '18 at 04:03
  • @SmallChess By that point p is already created, so a high p already means a high chance it's a mine. If we had written the last line as cl <- ifelse(p > 0.5, "R", "M"), we would be wrong, because we would misidentify mines as rocks and rocks as mines. – Ovi Apr 09 '18 at 04:10
  • https://stats.stackexchange.com/questions/8254/choose-factor-level-as-dummy-base-in-lm-in-r – SmallChess Apr 09 '18 at 04:15
  • This isn't quite clear. If you are only asking about R code, it is probably off topic here. If you are asking how to interpret the output from a logistic regression model in R, you can find that here: [Interpretation of R's output for binomial regression](https://stats.stackexchange.com/q/86351/). – gung - Reinstate Monica Apr 09 '18 at 14:57

1 Answer


I believe R used your first factor level, "M", as the reference level. You can reorder the levels yourself, for example with relevel(); searching for "R reference factor level" will turn up more detail.
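A minimal sketch of reordering the levels with relevel(), again on simulated data (x, y, y_RM, fit_MR and fit_RM are hypothetical names). With the default order, "M" is the reference level and the fitted probabilities refer to "R"; making "R" the reference turns them into probabilities of "M", so the two sets of predictions should sum to one for every observation.

```r
set.seed(1)
x <- rnorm(200)
y <- factor(ifelse(x + rnorm(200) > 0, "R", "M"))

# Default order: "M" is the first (reference) level, so this gives P(y == "R")
fit_MR <- glm(y ~ x, family = binomial)
p_R <- predict(fit_MR, type = "response")

# Make "R" the reference level; the model now gives P(y == "M")
y_RM <- relevel(y, ref = "R")
fit_RM <- glm(y_RM ~ x, family = binomial)
p_M <- predict(fit_RM, type = "response")

# Label each observation with the level its probability actually refers to
cl <- factor(ifelse(p_R > 0.5, "R", "M"), levels = levels(y))
print(table(predicted = cl, observed = y))
```

Applied to Sonar, whose levels are "M" then "R", the same rule would make p the probability of "R"; checking levels(Sonar$Class) before thresholding is cheap. A rule such as ifelse(p > 0.5, levels(y)[2], levels(y)[1]) keeps the labels tied to whichever level glm() actually modelled.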

SmallChess