
I carried out a binary logistic regression using glm. Below you can see the (modified) output. I included -1 in the formula so that all values are displayed, even the baseline that is used as the reference category.

My questions are: 1) Some groups are still missing: for example "stfwrk" has 7 levels but only 6 are shown, and "Statemplyd" has 2 (it is dichotomous) but only 1 appears.

2) What do the suffixes after the "SEF" categories mean (".L", ".Q", "^4", etc.)? Why doesn't it display the defined levels, as it does for example with "eduScndary"? (The variable containing the "edu" items was created in much the same way as the SEF item, except that it is a character variable.) I assume it has to do with the cut and labels commands. Is there another way to have these labels assigned at the exact cut points? Or how do I get it to display "A", "B", etc. in the model?

This is how I created the "SEF" item: the original variable "SEFcls" is a factor with 488 levels, and I used cut to collapse those 488 levels into 5 groups:

  ## Collapse the 488 level codes into 5 ordered groups.
  ## NB: 5 labels need 6 breaks; the upper break of 488 (the number of
  ## levels) is assumed here, and include.lowest = TRUE keeps code 1 in "A".
  SEF <- as.numeric(SEFcls)
  SEF <- cut(SEF, breaks = c(1, 8, 52, 171, 279, 488),
             labels = c("A", "B", "C", "D", "E"),
             ordered_result = TRUE, right = TRUE, include.lowest = TRUE)

The variable SEF is now an Ord.factor w/ 5 levels. When I compute table or summary I get the correct results with the correct, previously assigned labels:

   A    B    C    D    E 
3411 2098 1744 1120  141 
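
For reference, a minimal sketch of how the factor and the contrasts R attaches to it can be inspected (SEF is the object created above):

  ## Inspect the collapsed factor: its class, the level counts, and
  ## the contrast matrix R attaches to an ordered factor
  ## (orthogonal polynomial contrasts by default).
  str(SEF)        # Ord.factor w/ 5 levels
  table(SEF)      # the counts shown above
  contrasts(SEF)  # columns named .L, .Q, .C, ^4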

Output:

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2818  -0.4392  -0.2883  -0.0802   3.4007  

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)               -1.146810   0.390622  -2.936  0.00333 ** 
age                       -0.010576   0.001831  -5.777 7.59e-09 ***
gendrFem                  -0.297994   0.058344  -5.108 3.26e-07 ***
stfwrk1                    0.084032   0.384767   0.218  0.82712    
stfwrk2                   -0.332194   0.319690  -1.039  0.29875    
stfwrk3                   -0.157778   0.291956  -0.540  0.58891    
stfwrk4                    0.326489   0.282949   1.154  0.24855    
stfwrk5                    0.125305   0.265983   0.471  0.63757    
stfwrk6                    0.299977   0.269225   1.923  0.05448 .  
hlthGood                  -0.033777   0.071748  -0.471  0.63780    
hlthFair                   0.169100   0.084155   2.009  0.04450 *  
hlthBad                   -0.054457   0.132281  -0.412  0.68058    
hlthVery bad               0.176113   0.240020   0.734  0.46311    
SEF.L                      0.694256   0.214092   3.243  0.00118 ** 
SEF.Q                     -0.042545   0.218816  -0.194  0.84584    
SEF.C                     -0.165405   0.196888  -0.840  0.40085    
SEF^4                     -0.048406   0.154774  -0.313  0.75447    
SEF^5                     -0.037543   0.125438  -0.299  0.76471    
SEF^6                      0.004365   0.096144   0.045  0.96379    
SEF^7                      0.175295   0.083457   2.100  0.03569 *  
eduSecondary              -0.058418   0.152025  -0.384  0.70078    
eduSnrClass                0.151239   0.126941   1.191  0.23349    
eduSnrClass               -0.081495   0.117589  -0.693  0.48828    
eduThirdlvl               -0.581437   0.131859  -4.410 1.04e-05 ***
eduDctrl                  -0.836041   0.390912  -2.139  0.03246 *  
StatEmplyd                -0.155013   0.088234  -1.757  0.07894 .  
sclLess than once a month -0.044219   0.238654  -0.185  0.85301    
sclOnce a month            0.183115   0.236095   0.776  0.43799    
sclSeveral times a month   0.108922   0.231849   0.470  0.63850    
sclOnce a week            -0.009763   0.233962  -0.042  0.96671    
sclSeveral times a week   -0.031426   0.233323  -0.135  0.89286    
sclEvery day               0.072457   0.242567   0.299  0.76516    
cntryB                    -1.560045   0.180680  -8.634  < 2e-16 ***
cntryCz                   -1.683876   0.194952  -8.637  < 2e-16 ***
cntryGer                  -1.282113   0.150699  -8.508  < 2e-16 ***
cntryDen                   0.151659   0.137432   1.104  0.26980    

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 13733  on 20443  degrees of freedom
Residual deviance: 10185  on 20384  degrees of freedom
(19741 observations deleted due to missingness)
AIC: 10305

Number of Fisher Scoring iterations: 17
  • It would help if you can attach a sample of the original data, and also make the question as concise as possible by removing all the extraneous categorical variables in your regression. – Alex Feb 15 '17 at 23:50
  • This seems like a question for stackoverflow.com, as it is about R programming, not statistics or machine learning – user31264 Feb 16 '17 at 00:21
  • I think it belongs here because it is about interpreting a statistical method... – bethanyP Feb 16 '17 at 05:48
  • I'm voting to close this as off topic since it is about regular usage of R and does not provide a reproducible example. – Tim Feb 16 '17 at 09:03
  • You need to revise some introductory material on how R handles ordered factors. – mdewey Feb 16 '17 at 13:45
  • See also [Qualitative variable coding in regression leads to “singularities”](http://stats.stackexchange.com/q/70699/17230), [Polynomial contrasts for regression](http://stats.stackexchange.com/q/105115/17230), & the very useful http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm. – Scortchi - Reinstate Monica Feb 16 '17 at 14:48

2 Answers


First, nothing is missing: one level of each factor is the reference category, absorbed into the intercept. There are different ways of parameterizing categorical variables in regression, but nothing here is wrong.
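
As a tiny illustration (a sketch with a made-up three-level factor f): the model matrix has one column fewer than the factor has levels, because the first level is absorbed into the intercept.

  ## Treatment coding in action: level "a" is the reference, so only
  ## columns for "b" and "c" appear alongside the intercept.
  f <- factor(c("a", "b", "c", "a", "b"))
  model.matrix(~ f)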

Second, the .L, .Q, .C, ^4, ... suffixes stand for linear, quadratic, cubic and higher-order polynomial contrasts, which R uses by default for ordered factors.
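
A minimal sketch of where those suffixes come from (assuming a 5-level ordered factor, as SEF was created): R's default contrast for ordered factors is contr.poly(), whose columns are labelled .L, .Q, .C, ^4, ... rather than with the level names.

  ## R's defaults: treatment coding for unordered factors,
  ## orthogonal polynomial contrasts for ordered ones.
  options("contrasts")

  ## The contrast matrix for a 5-level ordered factor; its column
  ## names, appended to the factor name, are what appear in the
  ## coefficient table.
  contr.poly(5)

So SEF.L is the linear trend across the ordered levels, SEF.Q the quadratic component, and so on; no single level label appears because each coefficient belongs to a polynomial term rather than to one level.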

Peter Flom

The parameter estimates shown represent differences from a reference category (the one for which no parameter is shown, since it is fixed at 0). This is called treatment coding, the default in R for unordered factors. For a factor with $n$ levels you only need to estimate $n-1$ parameters. You could also express the estimates as deviations from the average over all categories rather than from a single reference level. This is called sum coding, for which the contrast matrix is given by contr.sum(); the parameter estimates are then constrained to sum to 0. See ?contrasts for more options.
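
A minimal sketch of the two coding schemes (the 4-level size and the names y, f and dat in the commented call are made up for illustration):

  ## Treatment coding (R's default for unordered factors): each column
  ## compares one level against the reference level, which gets no
  ## column of its own.
  contr.treatment(4)

  ## Sum ("deviation") coding: estimates are deviations from the grand
  ## mean and are constrained to sum to zero.
  contr.sum(4)

  ## One way to request a specific coding for a single fit
  ## (illustrative; y, f and dat are hypothetical names):
  # fit <- glm(y ~ f, family = binomial, data = dat,
  #            contrasts = list(f = "contr.sum"))

Either way R reports $n-1$ estimated contrasts for an $n$-level factor; the codings differ only in what those contrasts compare.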

Knarpie