This question is based on Everitt et al. (A Handbook of Statistical Analyses Using R) and I am trying to answer these questions:
Load the Default dataset from the ISLR library. The dataset contains information on ten thousand customers. The aim here is to predict which customers will default on their credit card debt. It is a four-dimensional dataset with 10000 observations. The question of interest is to predict individuals who will default. We want to examine how each predictor variable is related to the response (default). Do the following on this dataset:
a) Perform descriptive analysis on the dataset to have an insight. Use summaries and appropriate exploratory graphics to answer the question of interest.
b) Use R to build a logistic regression model.
c) Discuss your result. Which predictor variables were important? Are there interactions?
However, I am more interested in understanding when one should use -1 in a model formula, and what it means to exclude the intercept from a model. Here is my setup code:
# Set up data
data("Default", package = "ISLR")
# Create binary (0/1) and labelled versions of the default indicator
default_binary <- ifelse(regexpr('Yes', Default$default) == -1, 0, 1)
dflt_str <- ifelse(regexpr('Yes', Default$default) == -1,
                   "Not Defaulted",
                   "Defaulted")
# Create binary (0/1) and labelled versions of the student indicator
stdn <- ifelse(regexpr('Yes', Default$student) == -1, 0, 1)
stdn_str <- ifelse(regexpr('Yes', Default$student) == -1,
                   "Not-Student",
                   "Student")
blnc <- Default$balance
incm <- Default$income
df <- data.frame(default_binary, dflt_str, stdn, stdn_str, blnc, incm)
# with intercept
fm0 <- default_binary ~ stdn + blnc + incm
# no intercept as indicated by -1
fm1 <- default_binary ~ -1 + stdn + blnc + incm
regression_model_without_minus_1 <- glm(fm0, family = binomial())
regression_model_with_minus_1 <- glm(fm1, family = binomial())
And for the summary of each model, I get:
Can someone please explain the difference between the results with and without -1 in these models, along with the merits and drawbacks of each? Thanks for helping me!
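To make the comparison concrete, here is a minimal sketch of what -1 changes mechanically. It uses simulated data as a stand-in for Default (the sample size, coefficients, and the names y, fit_int, fit_noint, and base are assumptions for illustration, not values from the ISLR data), so it runs with base R only. It shows that -1 drops the column of ones from the design matrix, which forces the log-odds to be 0 (a probability of 0.5) when every predictor is 0.

```r
# Simulated stand-in for the Default data (hypothetical values, not the ISLR data)
set.seed(1)
n    <- 2000
stdn <- rbinom(n, 1, 0.3)                # 0/1 student indicator
blnc <- runif(n, 0, 2500)                # balance
eta  <- -6 + 0.004 * blnc - 0.5 * stdn   # assumed true log-odds, with a nonzero intercept
y    <- rbinom(n, 1, plogis(eta))        # simulated 0/1 default outcome

fit_int   <- glm(y ~ stdn + blnc,      family = binomial())  # with intercept
fit_noint <- glm(y ~ -1 + stdn + blnc, family = binomial())  # intercept removed

# The design matrices differ only by the column of ones
colnames(model.matrix(fit_int))    # "(Intercept)" "stdn" "blnc"
colnames(model.matrix(fit_noint))  # "stdn" "blnc"

# Predicted default probability for a non-student with zero balance
base    <- data.frame(stdn = 0, blnc = 0)
p_int   <- predict(fit_int,   newdata = base, type = "response")
p_noint <- predict(fit_noint, newdata = base, type = "response")
p_noint  # 0.5: with no intercept the log-odds at all-zero predictors are forced to 0

# The no-intercept model is a constrained version of the intercept model,
# so the intercept model's residual deviance cannot be larger
deviance(fit_int)
deviance(fit_noint)
```

In this sketch stdn is a numeric 0/1 variable, so -1 genuinely constrains the model. If stdn were instead an R factor placed first in the formula, -1 would make R fit one coefficient per level, which is only a reparameterisation of the intercept model and leaves the fitted values unchanged.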