Produce a GLM intercept that does not include reference levels for categorical variables?

Question

I realize that a similar question to this has been asked, but it was not ultimately resolved. I have tried the suggestions posted to that question here, but have had no success. I am using the following code:

allinfa4.exp = glm(survive ~ year + julianvisit + class + sitedist + roaddist
+ ngwdist, family = binomial(logexp(alldata$expos)), data=alldata)
summary(allinfa4.exp)

 Call:
glm(formula = survive ~ year + julianvisit + class + sitedist + 
roaddist + ngwdist, family = binomial(logexp(alldata$expos)), 
data = alldata)

Deviance Residuals: 
Min       1Q   Median       3Q      Max  
-2.6435   0.3477   0.4164   0.4960   0.9488  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  4.458e+00  7.117e-01   6.265 3.74e-10 ***
year2013     3.680e-01  1.862e-01   1.976  0.04819 *  
year2014     2.136e-02  1.802e-01   0.119  0.90564    
julianvisit -5.714e-03  3.890e-03  -1.469  0.14192    
classb       2.863e-02  2.194e-01   0.131  0.89615    
classc      -2.394e-01  2.277e-01  -1.051  0.29304    
classd      -1.868e-01  2.479e-01  -0.754  0.45109    
classe      -4.500e-01  2.076e-01  -2.167  0.03021 *  
classf      -5.728e-01  2.005e-01  -2.858  0.00427 ** 
classg      -8.495e-01  3.554e-01  -2.390  0.01684 *  
classh      -1.858e-01  2.224e-01  -0.835  0.40351    
classi      -3.196e-01  4.417e-01  -0.724  0.46932    
sitedist    -2.607e-04  5.043e-04  -0.517  0.60520    
roaddist     6.768e-05  4.311e-04   0.157  0.87525    
ngwdist     -5.751e-05  9.456e-05  -0.608  0.54306

The main thing to note here is that I have two categorical variables, year and class, and R has combined the first level of each (2012 and class a) into a reference level intercept term. Not only do I need to know the intercept term for these levels individually, but I also need to know the base intercept terms itself (beta0), just as SAS produces.

I have tried changing the contrasts and deviation coding to accomplish this, but although doing so allows me to extract different levels, it changes the way they are calculated and still does not produce beta0. I've also tried adding +0 and -1, but this also does not provide what I need. Is what I'm trying to do simply impossible in R? It may seem like a strange request, but beta0 is necessary to convert the results of logistic exposure (special kind of logistic regression for nest-survival data) to daily survival rates. Any help would be hugely appreciated. Thanks!

Here is an example of SAS output I want to emulate (taken from a similar analysis done by my lab mate) :

Parameter Estimates
Parameter   Estimate    Standard Error  DF  t Value Pr > |t|              
beta0       7.8404      2.8479          19  2.75    0.0127  
NT         -3.8786      1.8831          19  -2.06   0.0534  
bgdensity  -0.1127      0.1614          19  -0.70   0.4935  
nwh         1.3466      1.4625          19  0.92    0.3687      
NRD        -2.6981      1.9496          19  -1.38   0.1824      
NAGW       -0.4898      2.2518          19  -0.22   0.8301

The case where there a multiple categorical predictors is fundamentally different from the case with a single categorical predictor (in the linked question). Can you show us a simple reproducible example, and can you show us what SAS produces? Have you looked into the `lsmeans` package? — Ben Bolker, Nov 18 '14 at 18:49
The family argument looks weird. Generally one would use `binomial` or `"binomial"`. Supplying it as a returned value from a function call seems odd. A further puzzlement; there is no `logexp` function in in base:R. You can supply an interaction term `year*class` if you want to disaggregate the intercept-contribution of class, but you will still be getting an "all-factor intercept" (due to treatment constrasts) if any of the other terms are discrete. You may want to read up on `?contrasts` and choose the type of contrast that you are familiar with. — DWin, Nov 18 '14 at 19:09
@BondedDust: the OP is doing an analysis along the lines of [this rpubs example](http://www.rpubs.com/bbolker/logregexp), or the [other question they have asked on SO](http://stackoverflow.com/questions/22875502/valid-starting-values-error-for-glm-logistic-regression) — Ben Bolker, Nov 18 '14 at 19:19
It sounds like `(Intercept)` coefficient would be the "beta0" that the questioner was asking for if he used an interaction model. BTW the RPubs site has no author attribution. Is that you? — DWin, Nov 18 '14 at 19:33
@BenBolker: thanks. I've added a SAS output table to demonstrate what I need. I don't actually have access to SAS currently, so the data set is different. Sorry, I don't post here often; what are you looking for re: reproducible example? I've tried to install Ismeans but it does not seem to be available anymore. — , Nov 18 '14 at 20:28
@BondedDust: Thanks for the suggestion. I have looked into contrasts quite a bit but there is no option that I can find that will produce what I've described. I'm a little confused on your comments about interaction terms. I did try adding the interaction to see what would happen, but it does not disaggregate the intercept. I also want to avoid introducing unnecessary complexity into the model as that interaction does not happen to be a variable of interest. — , Nov 18 '14 at 20:32
Time for a reproducible example. Appears to me the "beta0" is the same thing as the "(Intercept)" although a value of 7 in a logistic regression seems pretty outlandish. — DWin, Nov 18 '14 at 20:37
For the SAS output it looks like all of the predictors are continuous, which also changes the problem qualitatively; in this case a single intercept makes sense, if the variables are centered (see Schielzeth 2010 MEE "Simple means to improve the interpretability of regression coefficients"). Info on reproducible examples is [here](http://tinyurl.com/reproducible-000). lsmeans package looks [alive and well](http://cran.r-project.org/web/packages/lsmeans/index.html) ?? — Ben Bolker, Nov 18 '14 at 21:34
PS value of 7 in a logistic regression is reasonable (but not necessarily sensible) *if* the range of continuous predictors in the data set is far from zero — Ben Bolker, Nov 18 '14 at 21:36

Produce a GLM intercept that does not include reference levels for categorical variables?

0 Answers0