
I have performed logistic regression (using 'LOGIT') on variables from the Titanic dataset. The formula used is:

survived ~ age + sex + pclass

I have obtained results as follows:

==================== Summary() ====================
                           Logit Regression Results                           
==============================================================================
Dep. Variable:               survived   No. Observations:                  714
Model:                          Logit   Df Residuals:                      710
Method:                           MLE   Df Model:                            3
Date:                Mon, 20 Jul 2020   Pseudo R-squ.:                  0.3289
Time:                        14:29:27   Log-Likelihood:                -323.65
converged:                       True   LL-Null:                       -482.26
Covariance Type:            nonrobust   LLR p-value:                 1.860e-68
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept       5.0560      0.502     10.069      0.000       4.072       6.040
sex[T.male]    -2.5221      0.207    -12.168      0.000      -2.928      -2.116
age            -0.3693      0.076     -4.841      0.000      -0.519      -0.220
pclass         -1.2885      0.139     -9.253      0.000      -1.561      -1.016
===============================================================================

==================== Summary2() ====================
                         Results: Logit
=================================================================
Model:              Logit            Pseudo R-squared: 0.329     
Dependent Variable: survived         AIC:              655.2909  
Date:               2020-07-20 14:29 BIC:              673.5745  
No. Observations:   714              Log-Likelihood:   -323.65   
Df Model:           3                LL-Null:          -482.26   
Df Residuals:       710              LLR p-value:      1.8597e-68
Converged:          1.0000           Scale:            1.0000    
No. Iterations:     6.0000                                       
------------------------------------------------------------------
              Coef.   Std.Err.     z      P>|z|    [0.025   0.975]
------------------------------------------------------------------
Intercept     5.0560    0.5021   10.0692  0.0000   4.0719   6.0402
sex[T.male]  -2.5221    0.2073  -12.1676  0.0000  -2.9284  -2.1159
age          -0.3693    0.0763   -4.8415  0.0000  -0.5188  -0.2198
pclass       -1.2885    0.1393   -9.2528  0.0000  -1.5615  -1.0156
=================================================================

Edit: I want to explain the results in lay terms. I want to determine how much the odds of survival change with changes in each predictor variable. To clarify, I want to know:

  1. What are the odds of a male surviving as compared to a female?

  2. How do odds change for every 1 year increase in age of the person?

I understand it is a very basic question, but it is important to have reliable knowledge about it.

Robert Long
rnso
  • Logit = Log-odds, that solves your question. – TrungDung Jul 20 '20 at 09:47
  • Please see the edit in my question above, where I have clarified my questions. – rnso Jul 20 '20 at 11:06
  • With a lay audience I wonder if your bigger problem might be distinguishing "odds" from "probability." Otherwise, as @TDT put it, a coefficient is the change in the log of the odds per unit change of the predictor, so exponentiate to get the associated change of odds. – EdM Jul 20 '20 at 12:12
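To illustrate the odds-versus-probability distinction raised in the comment above, here is a minimal sketch. The two baseline probabilities (0.75 and 0.20) are hypothetical values chosen only for illustration, not estimates from the model; the odds ratio is exp(-2.5221), taken from the sex[T.male] coefficient in the table. The helper names are my own.

```python
import math

def probability_to_odds(p: float) -> float:
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)

def odds_to_probability(odds: float) -> float:
    """Convert odds back to a probability: odds / (1 + odds)."""
    return odds / (1 + odds)

def apply_odds_ratio(baseline_p: float, odds_ratio: float) -> float:
    """Probability obtained after multiplying the baseline odds by an odds ratio."""
    return odds_to_probability(probability_to_odds(baseline_p) * odds_ratio)

# Odds ratio for male vs female, from the sex[T.male] coefficient in the table.
odds_ratio_male = math.exp(-2.5221)

# Hypothetical baseline survival probabilities (illustration only).
for baseline in (0.75, 0.20):
    shifted = apply_odds_ratio(baseline, odds_ratio_male)
    print(f"baseline p = {baseline:.2f} -> p after applying OR = {shifted:.3f}")
```

The same odds ratio moves the probability by very different amounts depending on the starting point, which is why "92% lower odds" should not be read as "92% lower probability".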

1 Answer


The question title is:

How to get log odds from these results of logistic regression

The estimates are already on the log-odds scale. All you have to do is read the relevant entry.

What are the odds of a male surviving as compared to a female?

The coefficient for sex[T.male] is -2.5221: holding the other variables constant, the log-odds of survival for a male are 2.5221 lower than for a female. If we exponentiate the coefficient we get

> exp(-2.5221)
[1] 0.0803

This is the odds ratio of survival for males compared to females. That is, the odds of survival for males are about 92% lower than the odds of survival for females.
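The same calculation can be sketched in Python, the language the model output above appears to come from. This uses only the coefficient copied from the table; the variable names are my own.

```python
import math

# Coefficient for sex[T.male], copied from the regression table (log-odds scale).
coef_male = -2.5221

# Exponentiating a log-odds coefficient gives an odds ratio.
odds_ratio = math.exp(coef_male)

# Percentage change in the odds relative to the reference group (females).
pct_change = (odds_ratio - 1) * 100

print(f"odds ratio (male vs female): {odds_ratio:.4f}")
print(f"change in odds: {pct_change:.0f}%")
```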

How do odds change for every 1 year increase in age of the person?

Every 1 year increase in age is associated with a 0.3693 decrease in log-odds of survival holding the other variables constant. If we exponentiate this:

> exp(-0.3693)
[1] 0.691

So every 1-year increase in age is associated with a 31% decrease in the odds of survival, holding the other variables constant.
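For changes larger than one unit, the coefficient is multiplied by the size of the change before exponentiating, so the effect compounds multiplicatively rather than adding up. A minimal sketch, assuming the age coefficient from the table and a hypothetical helper `odds_multiplier`:

```python
import math

# Coefficient for age, copied from the regression table (log-odds scale).
coef_age = -0.3693

def odds_multiplier(coef: float, delta: float) -> float:
    """Factor by which the odds are multiplied when the predictor rises by `delta` units."""
    return math.exp(coef * delta)

one_unit = odds_multiplier(coef_age, 1)
five_units = odds_multiplier(coef_age, 5)

print(f"1-unit increase: odds multiplied by {one_unit:.3f}")
print(f"5-unit increase: odds multiplied by {five_units:.3f}")
```

Note that the 5-unit multiplier equals the 1-unit multiplier raised to the fifth power; it is not five times a 31% decrease.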

Robert Long
  • This is exactly what I wanted - to put results in simple lay terms. Can we apply same method to confidence intervals in output to get CI of odds? – rnso Jul 20 '20 at 12:26
  • Sure, just exponentiate the CI limits. Note that this results in an asymmetrical CI relative to the odds ratio itself. Remember that survival is being analysed on the log-odds scale, with statistical tests performed and the CI defined on that scale. The transformation to odds ratio is really just a convenience. – Robert Long Jul 20 '20 at 12:55
  • +1 although with respect to CI, as in the comments, that's hard enough explaining to a scientific audience. How does one help a lay audience understand frequentist CI? – EdM Jul 20 '20 at 13:34
  • @EdM To a lay audience I always explain a frequentist CI as "a plausible range" for the parameter. I don't think there is any way to express the technically correct meaning. Also, it nicely avoids having to explain to a lay audience what a p value is. – Robert Long Jul 20 '20 at 14:07
  • Why are log odds generally given and not odds ratios, when the latter are used more in discussions? – rnso Jul 20 '20 at 16:46
  • For the reason I gave in my first comment. The analysis is fundamentally on the log-odds scale. It's up to the useR to interpret the results in the way that suits them best. – Robert Long Jul 20 '20 at 17:13
  • @EdM : By lay persons I meant professionals who are not statisticians but who `use` statistics. They need not know how results are being calculated but only how to interpret them for practical purposes. Probably `lay` is not the best word for them. – rnso Jul 20 '20 at 17:18
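
The suggestion in the comments to exponentiate the CI limits can be sketched as follows, using the sex[T.male] row of the Summary2() table. The variable names are my own; the point is only to show that an interval that is roughly symmetric on the log-odds scale becomes asymmetric on the odds-ratio scale.

```python
import math

# Point estimate and 95% CI limits for sex[T.male], copied from the
# Summary2() table (all on the log-odds scale).
coef, lower, upper = -2.5221, -2.9284, -2.1159

# Exponentiate each value to move to the odds-ratio scale.
or_point = math.exp(coef)
or_lower = math.exp(lower)
or_upper = math.exp(upper)

print(f"odds ratio: {or_point:.4f}  95% CI: [{or_lower:.4f}, {or_upper:.4f}]")

# Distance from the point estimate to each limit, on both scales.
print(f"log-odds scale:   {coef - lower:.4f} below, {upper - coef:.4f} above")
print(f"odds-ratio scale: {or_point - or_lower:.4f} below, {or_upper - or_point:.4f} above")
```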