Say I build a linear regression model to identify linear dependencies between the variables in my data. Some of these variables are categorical.
If I want to evaluate the contribution of a given predictor, how do I do it? Can I compare the coefficients directly? I read in other answers that the |t| value gives a sense of the strength of a predictor; how exactly does that work?
I understand that for a categorical variable with K levels, only K-1 dummy variables are created, and that this is standard practice to avoid obvious multicollinearity. But how can I still identify the contribution associated with the level (predictor) that was dropped?
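For illustration, here is the K-1 coding on a toy column (the column name and level names are made up): with drop_first, the first level becomes the reference and is absorbed into the intercept, so only K-1 dummy columns remain.

```python
import pandas as pd

# Toy categorical column with K = 3 levels ("a", "b", "c").
s = pd.Series(["a", "b", "c", "b"], name="cat")

# drop_first=True drops the first level ("a"), which becomes the
# reference category; its effect is carried by the intercept.
dummies = pd.get_dummies(s, prefix="cat", drop_first=True)
print(list(dummies.columns))  # → ['cat_b', 'cat_c']
```

Each remaining coefficient is then the difference from the reference level, not an absolute effect.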
Here is the model:
mod = smf.ols('dependent ~ first_category + second_category + object_price', data=df).fit()
And the output of mod.summary():
OLS Regression Results
==============================================================================
Dep. Variable: dependent R-squared: 0.227
Model: OLS Adj. R-squared: 0.226
Method: Least Squares F-statistic: 261.7
Date: Thu, 04 Sep 2014 Prob (F-statistic): 0.00
Time: 14:59:24 Log-Likelihood: -86099.
No. Observations: 17866 AIC: 1.722e+05
Df Residuals: 17845 BIC: 1.724e+05
Df Model: 20
===========================================================================================
coef std err t P>|t| [95.0% Conf. Int.]
-------------------------------------------------------------------------------------------
Intercept 27.6888 1.017 27.235 0.000 25.696 29.682
first_category[T.o] -1.3250 0.848 -1.562 0.118 -2.987 0.337
first_category[T.v] -10.4557 1.125 -9.294 0.000 -12.661 -8.251
second_category[T.SL0004] 21.9987 0.808 27.213 0.000 20.414 23.583
second_category[T.SL0005] -2.3710 2.458 -0.965 0.335 -7.188 2.446
second_category[T.SL0006] 7.2716 3.609 2.015 0.044 0.197 14.346
second_category[T.SL0007] 20.1545 1.495 13.482 0.000 17.224 23.085
second_category[T.SL0008] 13.3333 0.794 16.788 0.000 11.777 14.890
second_category[T.SL0009] 18.5605 2.189 8.478 0.000 14.270 22.851
second_category[T.SL0010] 6.7351 1.158 5.817 0.000 4.465 9.005
second_category[T.SL0011] 2.6791 0.689 3.888 0.000 1.329 4.030
second_category[T.SL0012] -0.8159 3.811 -0.214 0.830 -8.285 6.654
second_category[T.SL0014] 8.2550 11.359 0.727 0.467 -14.010 30.520
second_category[T.SL0016] 1.6220 1.229 1.320 0.187 -0.787 4.031
second_category[T.SL0017] -14.3253 2.642 -5.422 0.000 -19.504 -9.147
second_category[T.SL0018] 1.4823 3.193 0.464 0.643 -4.777 7.741
second_category[T.SL0019] 20.0228 2.850 7.024 0.000 14.436 25.610
second_category[T.SL0020] -11.7478 8.691 -1.352 0.176 -28.782 5.287
budget -0.5682 0.014 -40.828 0.000 -0.595 -0.541
object_price 0.0037 0.000 33.192 0.000 0.003 0.004
hour -0.9244 0.040 -23.244 0.000 -1.002 -0.846
==============================================================================
Omnibus: 2997.054 Durbin-Watson: 1.001
Prob(Omnibus): 0.000 Jarque-Bera (JB): 4758.803
Skew: 1.183 Prob(JB): 0.00
Kurtosis: 3.892 Cond. No. 1.59e+05
==============================================================================
Warnings:
[1] The condition number is large, 1.59e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
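For reference, the coefficients and t-statistics shown in the summary table can also be pulled directly from the fitted results as pandas Series keyed by term name. A minimal sketch on synthetic stand-in data (the real df is not shown, so the values here are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the real data frame.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "first_category": rng.choice(["m", "o", "v"], size=n),
    "object_price": rng.uniform(0, 100, size=n),
})
df["dependent"] = 5.0 + 0.2 * df["object_price"] + rng.normal(0, 1, size=n)

mod = smf.ols("dependent ~ first_category + object_price", data=df).fit()

# Per-term coefficients and t-statistics, as in the summary table.
print(mod.params["object_price"])
print(mod.tvalues["object_price"])
```

Note that mod.params values are in the units of each predictor, which is why they are not directly comparable across predictors measured on different scales, whereas mod.tvalues are unitless.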