
This is my first time building a model outside of school. I cleaned the data, checked Cohen's kappa and ROC-based cutoffs, and also tried a random forest. The accuracy of predicting the 1 outcome is about 37%, which is not great. The data are imbalanced: only about 2% of the observations have a 1 outcome. Since I am new, I do not know any more advanced techniques for handling imbalanced data, so I am working with what I learned in school. My biggest issue comes up when trying to deploy the model: because of the -15 intercept and the very small coefficients on the independent variables, it does not even seem possible to get a predicted probability of 50%.

Predicting the probability of a 1 outcome is the main point of this model. Does anybody have recommendations on what I may be doing wrong or what I can fix?

    Call:
    glm(formula = reportf ~ logoov + proptypef + attyrankf + agentrankf + 
        logcyv + bldgclassf, family = binomial, data = dataraw)

    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -2.8890  -0.1883  -0.1044  -0.0666   3.5935  

    Coefficients:
                        Estimate Std. Error z value
    (Intercept)        -14.96646    1.20137 -12.458
    logoov              -0.05469    0.01894  -2.888
    proptypefHotel       1.22973    0.46980   2.618
    proptypefLand        0.50910    0.43122   1.181
    proptypefMedical     1.00078    0.39476   2.535
    proptypefOffice      0.43494    0.30130   1.444
    proptypefOther       1.35004    0.59146   2.283
    proptypefRetail      0.55648    0.28549   1.949
    proptypefWarehouse  -0.14890    0.31039  -0.480
    attyrankf2           1.66554    0.24403   6.825
    attyrankf3           3.05391    0.50576   6.038
    attyrankf4           6.46433    0.36854  17.541
    attyrankf5           1.71835    0.30571   5.621
    agentrankf3         -0.37274    0.27457  -1.358
    agentrankf4         -0.18359    0.29121  -0.630
    agentrankf5         -0.29880    0.20285  -1.473
    logcyv               0.67442    0.06809   9.905
    bldgclassfB         -1.15206    0.32494  -3.546
    bldgclassfC         -0.36693    0.38989  -0.941
    bldgclassfD        -14.02519  374.15157  -0.037
    bldgclassfE          0.11344    0.22446   0.505
    bldgclassfX          0.73890    0.33774   2.188

    Signif. codes:  
    0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 2267.0  on 8570  degrees of freedom
    Residual deviance: 1401.3  on 8549  degrees of freedom
    AIC: 1445.3
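
As a rough check on whether 50% is reachable at all, the reported coefficients can be plugged into the inverse logit by hand. The sketch below assumes a hypothetical property: a Hotel with `attyrankf` level 4, reference levels for `agentrankf` and `bldgclassf`, `logoov = 14` and `logcyv = 16`. Those covariate values are illustrative assumptions, not values taken from the data, so the result only shows what the fitted equation implies under them.

```r
# Coefficients copied from the summary output above.
b <- c(intercept = -14.96646, logoov = -0.05469, logcyv = 0.67442,
       hotel = 1.22973, atty4 = 6.46433)

# Hypothetical covariate profile (assumed values, not from the data).
logoov <- 14
logcyv <- 16

# Linear predictor (log-odds) for a Hotel with attyrankf = 4 and
# reference levels for agentrankf and bldgclassf.
eta <- b[["intercept"]] + b[["logoov"]] * logoov + b[["logcyv"]] * logcyv +
  b[["hotel"]] + b[["atty4"]]

# The inverse logit turns log-odds into a probability.
p <- plogis(eta)

# Value of logcyv at which this profile crosses 50% (eta = 0),
# holding the other terms fixed.
logcyv_50 <- -(eta - b[["logcyv"]] * logcyv) / b[["logcyv"]]

print(c(eta = eta, p = p, logcyv_50 = logcyv_50))
```

Under these assumed values the log-odds come out positive (about 2.75), which corresponds to a probability of roughly 0.94. So the large negative intercept does not by itself cap the probability below 50%. Whether profiles like this actually occur in `dataraw` is a separate question that only the data can answer.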
kjetil b halvorsen
rl123
  • Have a look at https://stats.stackexchange.com/questions/25389/obtaining-predicted-values-y-1-or-0-from-a-logistic-regression-model-fit – mdewey Jul 15 '19 at 16:15
  • log(odds)=-14.966 when logoov=0, proptypef=0, attyrankf=0, agentrankf=0, logcyv=0, bldgclassf=0. – user158565 Jul 15 '19 at 17:08
  • Sorry, my question was written incorrectly. I know how to interpret the coefficients; my issue is that with the intercept at -14, no matter how many of those coefficients are applied, the results will be negative or near 1. I never get a 90% probability, even for results I am aware of, if that makes sense. – rl123 Jul 15 '19 at 17:11
  • Try to plug in the largest available values of logoov and logcyv. – user158565 Jul 18 '19 at 02:42
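
On the cutoff question behind the first comment: with an event rate near 2%, fitted probabilities will seldom exceed 0.5, so classifying at 0.5 tends to flag very few of the actual 1s. A minimal sketch of comparing cutoffs follows. It uses simulated stand-in data, since `dataraw` is not available here. The variable names, the simulation parameters, and the choice of the observed prevalence as an alternative cutoff are all assumptions for illustration, not a recommendation specific to this model.

```r
set.seed(1)

# Simulated stand-in data (hypothetical; not the poster's dataraw).
n <- 5000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-4.5 + 1.2 * x))  # rare outcome

fit <- glm(y ~ x, family = binomial)
p_hat <- predict(fit, type = "response")  # fitted probabilities

# Sensitivity: share of actual 1s flagged at a given cutoff.
sensitivity <- function(cutoff) mean(p_hat[y == 1] >= cutoff)

prevalence <- mean(y)
print(c(prevalence = prevalence,
        sens_at_half = sensitivity(0.5),
        sens_at_prev = sensitivity(prevalence)))
```

Lowering the cutoff can only increase sensitivity, at the price of more false positives, so the cutoff is best chosen from the relative costs of the two kinds of error. If the goal is the probability itself, as stated in the question, no cutoff is needed and the fitted probabilities can be used directly.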
