
I'm currently reviewing the provided solution for a GLM problem and I'm completely confused by the answers.

The training data are staged as follows: X.train <- model.matrix(purchase ~ age + I(age^2) + job + marital + edu_years + housing + loan + phone + month + weekday + PC1, data = data_train). Note that edu_years is a numeric variable derived from a factor: for example, "University Degree" becomes edu_years = 16 and "High School" becomes edu_years = 12.

We applied elastic net shrinkage with family = "binomial", alpha = 0.5, and the lambda value that minimizes the cross-validated misclassification error (lambda.min):

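    library(glmnet)

    # Step 1: cross-validate to find the lambda with the lowest misclassification error
    cv <- cv.glmnet(
      x = X.train, y = data_train$purchase,
      family = "binomial", type.measure = "class", alpha = 0.5
    )

    # Step 2: refit at that lambda and list the coefficients
    m <- glmnet(
      x = X.train, y = data_train$purchase,
      family = "binomial", lambda = cv$lambda.min, alpha = 0.5
    )
    coef(m)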

which produced the following coefficients:

    35 x 1 sparse Matrix of class "dgCMatrix" 
    s0 
    (Intercept)       . 
    age               -0.0401357217 
    I(age^2)          0.0004797536 
    jobblue-collar   -0.0084693625 
    jobentrepreneur  -0.1595694892 
    jobhousemaid     -0.0289770492 
    jobmanagement    -0.1120492444 
    jobretired        0.1929430562 
    jobself-employed  0.0323896279 
    jobservices      -0.1139322164 
    jobstudent        0.4306243751 
    jobtechnician     0.0166998929 
    jobunemployed     . 
    jobunknown       -0.0163235431 
    maritalmarried   -0.0274773044 
    maritalsingle     0.0113559891
    edu_years         0.0261161381 
    housingyes       -0.0620277221
    housingunknown    0.2380700634 
    loanyes           . 
    phonelandline     0.1167420258 
    monthaug          0.1068785237 
    monthdec          0.4249593236 
    monthjul          0.5236570756 
    monthjun          0.3557837676 
    monthmar          1.1081939172 
    monthmay         -0.6986577159 
    monthnov         -0.0726119578 
    monthoct          1.1145412898 
    monthsep          0.5499560819 
    weekdaymon       -0.1925877915 
    weekdaythu        . 
    weekdaytue       -0.0001003185 
    weekdaywed        0.0636492095 
    PC1               0.6756501552

The solution then makes a few statements:

  • Start by assuming every call has a 48.2% chance of resulting in a purchase. This is the same as an odds ratio of 0.93.

  • Multiply the odds ratio by the following odds factors for the job of the person being called:

    • Retired: 1.21
    • Student: 1.13
    • Other jobs: 1.00
    • Blue-collar: 0.96
  • Then multiply the result by the following odds factors for education, a slight adjustment with more education leading to higher purchase rates (not all shown):

    • University degree: 1.06
    • High school: 1.05
    • 6 years of education: 1.02
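
As far as I can tell, the recipe in these statements amounts to the arithmetic below. This is only a minimal sketch of my reading: the numbers are the ones quoted from the solution, while `profile_odds` and `odds_to_prob` are helper names I made up for illustration, and I use the unrounded baseline odds rather than 0.93.

```r
# Baseline probability and odds factors as quoted from the solution
p_base <- 0.482
odds_base <- p_base / (1 - p_base)  # probability -> odds, roughly 0.93

job_factor <- c(retired = 1.21, student = 1.13, other = 1.00, blue_collar = 0.96)
edu_factor <- c(university = 1.06, high_school = 1.05, six_years = 1.02)

# Hypothetical helper: odds for one job/education profile
profile_odds <- function(job, edu) {
  unname(odds_base * job_factor[job] * edu_factor[edu])
}

# Hypothetical helper: convert odds back to a probability
odds_to_prob <- function(odds) odds / (1 + odds)

# Example profile: a retired person with a university degree
odds_case <- profile_odds("retired", "university")
prob_case <- odds_to_prob(odds_case)

round(c(base_odds = odds_base, case_odds = odds_case, case_prob = prob_case), 3)
```

The factors multiply on the odds scale, and only the final result is converted back to a probability.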

Can someone please explain where these statements come from? Specifically, I have 3 questions:

  1. How were the multiplicative odds factors created? I thought the formula for odds factors would be e^(x/(1-x)) because we're using a binomial family but that doesn't produce the same results. I also noticed that e^(x) replicates the "retired" multiplicative factor but none of the others and I'm just not sure what to think.
  2. Why are we assuming every call has a 48.2% chance of resulting in a purchase? I thought that would be the intercept but in the output above the intercept is 0.
  3. I noticed (0.482)/(1-0.482) = 0.93. Why aren't we doing e^[0.482/(1-0.482)] instead?
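
For reference, this is the check behind my remark in question 1 that e^(x) matches "retired" but not the others. It is a sketch under two assumptions: that exp(coefficient) is the multiplicative change in odds per unit of a predictor (the usual logistic-regression reading), and that the education factor for k years would therefore be exp(k * coefficient), since edu_years is numeric.

```r
# Coefficients copied from the glmnet output above
b <- c(jobretired = 0.1929430562, jobstudent = 0.4306243751,
       `jobblue-collar` = -0.0084693625, edu_years = 0.0261161381)

# exp(coefficient): multiplicative change in odds for a dummy variable
exp_job <- exp(b[c("jobretired", "jobstudent", "jobblue-collar")])

# edu_years is numeric, so k years of education scale the odds by exp(k * coefficient)
# (assumption: 16 = University Degree, 12 = High School, as described above)
exp_edu <- exp(c(`16` = 16, `12` = 12, `6` = 6) * b["edu_years"])

round(exp_job, 2)
round(exp_edu, 2)
```

Only the retired value lands near the solution's 1.21; the student, blue-collar, and education values do not line up with the quoted factors.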
  • Does https://stats.stackexchange.com/questions/133623/ help with (1) and (3)? – whuber Dec 08 '20 at 21:48
  • @whuber Thanks for linking me to that! I think it helps with #3, in that I'm now interpreting the base case as having an odds ratio of 0.93, which is what gets multiplied by the variable adjustments. So if I had a case of a blue-collar job with HS education, the odds would be 0.96 * 1.05 * Odds(Base) = 0.96 * 1.05 * 0.93 = 0.93744, and the probability of success for this candidate would be Odds(Case) / (1 + Odds(Case)) = 48.38%? Is it also implying that for #2 the chance of the base case isn't reflected in the coefficients, and that 48% comes from somewhere else? For #1, it doesn't help me. – Burton_Gustice Dec 08 '20 at 23:06
  • I believe (2) comes from somewhere else. The values given for the odds ratios do not seem related to the values you posted for the output. – whuber Dec 08 '20 at 23:49
