This is my first time building a model outside of school. I cleaned the data, fit the logistic regression shown below, evaluated it with Cohen's kappa and cutoffs/ROC, and also tried a random forest. The accuracy for predicting the 1 outcome is only about 37%, which is not great. The data are imbalanced: only 2% of the observations have a 1 outcome. Since I am new, I don't know any advanced techniques for handling imbalanced data, so I am working with what I learned in school. My biggest issue comes up when trying to deploy this model: because of the -15 intercept and the very small coefficients on the independent variables, it isn't even possible to reach a predicted probability of 50%.
Predicting the probability of a 1 outcome is the main point of this model. Does anybody have recommendations on what I may be doing wrong or what I can fix?
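For context on the cutoff question, here is a minimal sketch of how sensitivity (the share of true 1s that get flagged) changes with the cutoff when the 1 outcome is rare. It uses simulated data, not `dataraw`; the variable names, the simulated coefficients, and the three cutoffs are assumptions chosen only for illustration.

```r
# Simulated rare-outcome data (illustrative only, not the real dataset)
set.seed(1)
n <- 5000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-4.5 + 1.2 * x))  # low base rate of 1s

fit   <- glm(y ~ x, family = binomial)
p_hat <- predict(fit, type = "response")   # predicted probabilities

# Compare the default 0.5 cutoff with two lower, arbitrary cutoffs
cutoffs <- c(0.5, 0.1, 0.02)
sens <- sapply(cutoffs, function(k) mean(p_hat[y == 1] >= k))  # true 1s flagged
spec <- sapply(cutoffs, function(k) mean(p_hat[y == 0] <  k))  # true 0s not flagged

print(data.frame(cutoff = cutoffs, sensitivity = sens, specificity = spec))
```

Lowering the cutoff can only keep or raise sensitivity, at the cost of specificity, so with a rare outcome the 0.5 cutoff is not necessarily the one to judge the model by.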
Call:
glm(formula = reportf ~ logoov + proptypef + attyrankf + agentrankf +
logcyv + bldgclassf, family = binomial, data = dataraw)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.8890  -0.1883  -0.1044  -0.0666   3.5935
Coefficients:
                    Estimate Std. Error z value
(Intercept)        -14.96646    1.20137 -12.458
logoov              -0.05469    0.01894  -2.888
proptypefHotel       1.22973    0.46980   2.618
proptypefLand        0.50910    0.43122   1.181
proptypefMedical     1.00078    0.39476   2.535
proptypefOffice      0.43494    0.30130   1.444
proptypefOther       1.35004    0.59146   2.283
proptypefRetail      0.55648    0.28549   1.949
proptypefWarehouse  -0.14890    0.31039  -0.480
attyrankf2           1.66554    0.24403   6.825
attyrankf3           3.05391    0.50576   6.038
attyrankf4           6.46433    0.36854  17.541
attyrankf5           1.71835    0.30571   5.621
agentrankf3         -0.37274    0.27457  -1.358
agentrankf4         -0.18359    0.29121  -0.630
agentrankf5         -0.29880    0.20285  -1.473
logcyv               0.67442    0.06809   9.905
bldgclassfB         -1.15206    0.32494  -3.546
bldgclassfC         -0.36693    0.38989  -0.941
bldgclassfD        -14.02519  374.15157  -0.037
bldgclassfE          0.11344    0.22446   0.505
bldgclassfX          0.73890    0.33774   2.188
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2267.0 on 8570 degrees of freedom
Residual deviance: 1401.3 on 8549 degrees of freedom
AIC: 1445.3
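
To make the concern about the intercept concrete, here is a rough sketch of how a predicted probability is computed by hand from the coefficients in the summary above. The predictor values (`logoov = 13`, `logcyv = 14`) are made up for illustration, since the actual range of those variables is not shown in the post. The last step backs out a fitted probability from the minimum deviance residual, assuming each row is a single 0/1 observation.

```r
# Coefficients copied from the summary above
b0       <- -14.96646   # (Intercept)
b_logoov <-  -0.05469
b_logcyv <-   0.67442
b_atty4  <-   6.46433   # attyrankf4

# Hypothetical property: these two values are assumptions, not from the data
logoov <- 13
logcyv <- 14

# Log-odds with every factor at its reference level
eta_ref   <- b0 + b_logoov * logoov + b_logcyv * logcyv
# Same property, but with attyrankf = 4
eta_atty4 <- eta_ref + b_atty4

# plogis() is the inverse logit: 1 / (1 + exp(-eta))
p_ref   <- plogis(eta_ref)
p_atty4 <- plogis(eta_atty4)
print(c(reference = p_ref, attyrank4 = p_atty4))

# For a 0/1 response, an observation with y = 0 has deviance residual
# -sqrt(-2 * log(1 - p)), so the minimum residual implies a fitted p of:
p_from_min_resid <- 1 - exp(-(2.8890^2) / 2)
print(p_from_min_resid)
```

Under these assumed inputs the log-odds for the `attyrankf4` case come out slightly above zero, so the probability is slightly above 50%. The minimum deviance residual of -2.8890 likewise corresponds to a fitted probability of roughly 0.98 for at least one observation whose outcome was 0. The intercept is only the log-odds when every predictor equals zero and every factor is at its reference level, which may be far from any real row if `logcyv` is typically large.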