I was playing around with some data on the 2016 US presidential election and got a result that doesn't seem to make sense.
I am fitting a Logit model with the percentage of votes for Trump as the dependent variable and two independent variables: the average minimum wage and the unemployment rate from 2012 to 2015.
Here is my code:
import statsmodels.api as sm
import pandas as pd
df = pd.read_csv("Data_sets/pres_and_unemp_data.csv", index_col=0)
y = df["pct"]
X = df[["min_wage", "Rate"]]
result = sm.Logit(y, X).fit()
print(result.summary2())
print(df.corr())
And this is the output:
Optimization terminated successfully.
Current function value: 0.647033
Iterations 4
                         Results: Logit
================================================================
Model:              Logit            Pseudo R-squared: -0.383
Dependent Variable: pct              AIC:              2123.6788
Date:               2020-03-03 17:08 BIC:              2134.4813
No. Observations:   1638             Log-Likelihood:   -1059.8
Df Model:           1                LL-Null:          -766.38
Df Residuals:       1636             LLR p-value:      1.0000
Converged:          1.0000           Scale:            1.0000
No. Iterations:     4.0000
------------------------------------------------------------------
               Coef.   Std.Err.     z     P>|z|    [0.025   0.975]
------------------------------------------------------------------
min_wage       0.0418    0.0191  2.1860  0.0288    0.0043   0.0794
Rate           0.0182    0.0212  0.8555  0.3923   -0.0234   0.0598
================================================================
              Rate  min_wage       pct
Rate      1.000000  0.233336 -0.131478
min_wage  0.233336  1.000000 -0.310230
pct      -0.131478 -0.310230  1.000000
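As a sanity check, the figures in the summary do seem to be consistent with each other. The sketch below recomputes them from the printed values only. It rests on two assumptions that are not stated in the output itself: that "Current function value" is the negative log-likelihood divided by the number of observations, and that the reported pseudo R-squared is McFadden's, i.e. 1 - LL / LL-Null. The variable names are my own.

```python
import math

# Figures copied from the summary output above.
n_obs = 1638
mean_neg_loglik = 0.647033   # "Current function value"
ll_null = -766.38            # "LL-Null"
n_params = 2                 # min_wage and Rate; X has no intercept column

# Assumption: "Current function value" is -LL / n_obs.
ll_model = -n_obs * mean_neg_loglik

# Assumption: the pseudo R-squared is McFadden's, 1 - LL / LL-Null.
pseudo_r2 = 1 - ll_model / ll_null

# Standard information criteria for a model with n_params parameters.
aic = 2 * n_params - 2 * ll_model
bic = n_params * math.log(n_obs) - 2 * ll_model

print(f"LL        = {ll_model:.1f}")
print(f"pseudo R2 = {pseudo_r2:.3f}")
print(f"AIC       = {aic:.2f}")
print(f"BIC       = {bic:.2f}")
```

Under those assumptions the recomputed values agree with the summary up to rounding, so the negative pseudo R-squared is simply the fitted log-likelihood (-1059.8) being lower than LL-Null (-766.38). What I don't understand is how the fitted model can end up with a worse log-likelihood than the null model.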