3

I want to extract standard deviation of residual from glmer() function in R .

So I wrote :

lmer_obj = glmer(Y ~ X1 + X2 + (1|Subj), data=D, family=binomial)
sigma(lmer_obj)

I noticed that the last command sigma(lmer_obj) returns always 1 irrespective of the data That is, whether I used the cbpp data or my own simulated data from multilevel logistic distribution, the residual standard error is always 1.

How can I get the residual standard deviation from glmer() function?

gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
user81411
  • 731
  • 1
  • 7
  • 14
  • Please define "residual" first in the context of a logistic regression setting. Do you expect its standard deviation to be a single value, no matter the regressors? – Michael M Jul 15 '15 at 13:23
  • @MichaelM In logistic regression $logit(\pi_i)=\beta_0+\beta_1 x_i$ , I see no residual . But in two-level logistic regression model $logit(p_{ij})=\gamma_{00}+\gamma_{10}x_{ij}+\gamma_{01}z_{j}+\gamma_{11}x_{ij}z_j+u_{0j}+u_{1j}x_{ij}$ , here variance of $u_{1j}$ is residual variance $\sigma_1^2$ , and I expect this to be a single value . – user81411 Jul 15 '15 at 13:33
  • @MichaelM Could you please give me some reference in the context of your comment ? – user81411 Jul 15 '15 at 13:40
  • 1
    Although asked in the context of R, this is really a conceptual misunderstanding about logistic regression & GLMMs. This Q should be considered on topic here. – gung - Reinstate Monica Jul 15 '15 at 13:44

1 Answers1

7

Logistic regression models, whether they include random effects or not, do not have an error term (see: Logistic Regression - Error Term and its Distribution). This is generally true of most GLiMs (although linear regression is a special case of GLiMs and does have an error term).

Of course, given a predicted probability and an observed response value, various residuals can be formed (see: Logistic regression and error terms). People naturally want to use these to assess their models, because that's what you do with linear models. However, looking at these residuals will typically lead you astray (see: Interpretation of plot(glm.model)).

Thus, you are better off abandoning your quest for the standard deviation of the residuals, in my honest opinion. That said, if you are dead set on this, you could try:

library(lme4)
data(cbpp)
m1 = glmer(cbind(incidence,size-incidence)~period+(1|herd), family=binomial, data=cbpp)
sd(residuals(m1, type="deviance"))  # [1] 1.133436
sd(residuals(m1, type="response"))  # [1] 0.1001946

(Edit: the question is somewhat ambiguous.)
If you want the estimated standard deviation of the population from which the sample was drawn, you could try:

attr(summary(m1)$varcor$herd, "stddev")
# (Intercept) 
#   0.6420699 

If you want the standard deviation of the predicted random effects (these are not necessarily the same as above, see: Why do the estimated values from a Best Linear Unbiased Predictor (BLUP) differ from a Best Linear Unbiased Estimator (BLUE)?), you could try:

sd(ranef(m1)$herd[,1])  # [1] 0.5342242

If you are interested in the dispersion, it is almost always necessarily $1$ for logistic regression models. If the response data are binomial with $n>1$ (i.e., not Bernoulli), you can have over- (or, less likely, under-) dispersion, but introducing the random effects here gets you back to $1$. That piece of information is what the sigma(m1) is giving you (i.e., not the standard deviation of the residuals); you could also get it via:

summary(m1)$sigma  # [1] 1
gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
  • 1
    A comment by the OP to the question leads me to suspect that `sigma(...)` is intended in a different way: namely, to obtain an estimate of the SD of a latent random component of this mixed model. – whuber Jul 15 '15 at 20:43
  • @whuber: it is used to measure under- or overdispersion (if quasibinomial etc. family was chosen). Maybe this is compatible with your comment? – Michael M Jul 15 '15 at 21:01
  • @Michael Yes, that is what I thought Munira is looking for. Munira: if that's the case, it would help to edit your question to explain precisely what you were hoping the `sigma` function would calculate. – whuber Jul 15 '15 at 21:09
  • A two-level logistic regression : $$\text{logit}(p_{ij})=\pi_{0j}+\pi_{1j} x_{ij}$$ $$\pi_{0j}=\gamma_{00}+\gamma_{01}z_j+u_{0j}$$ $$\pi_{1j}=\gamma_{10}+\gamma_{11}z_j+u_{1j}$$ where ,$$ \begin{bmatrix} u_{0j} \\ u_{1j} \\ \end{bmatrix} =N \begin{pmatrix} \begin{bmatrix} 0 \\ 0 \\ \end{bmatrix},\begin{bmatrix} \sigma_0^2&\sigma_{01} \\ \sigma_{01}&\sigma_1^2 \\ \end{bmatrix} \end{pmatrix} $$ . – user81411 Jul 17 '15 at 11:06
  • @whuber `lmer_obj = glmer(Y ~ X*Z + (1|cluster), data=D, family=binomial)` I expected `attr(summary(lmer_obj)$varcor$herd, "stddev")` would calculate $\sigma_0$ and `sigma(lmer_obj)` would calculate $\sigma_1$ . – user81411 Jul 17 '15 at 11:07
  • Still I am not understanding how can I calculate $\sigma_1$ ? – user81411 Jul 17 '15 at 11:11
  • In your example with `cbpp` data , `summary(m1)$residuals` gives $56$ residual values . What are those actually if logistic regression do not have error term ? – user81411 Jul 17 '15 at 11:15
  • See http://stats.stackexchange.com/questions/1432. – whuber Jul 17 '15 at 12:14
  • @Munira, please read the linked threads. You may also want to read this: [What is the difference between errors and residuals?](http://stats.stackexchange.com/a/133391/7290) There are 56 residual values b/c there are 56 observations & 56 predicted probabilities. I used the `cbpp` example b/c you referred to it. Note that it does not have random slopes, $u_{1j}$, & thus no $\sigma_1$ (in your terms). It only has random intercepts, which you might call $\sigma_0$. At this point there is a distinction b/t the estimate of the population SD & the SD of the REs. Both are shown above. – gung - Reinstate Monica Jul 17 '15 at 14:52
  • That is , `sd(ranef(m1)$herd[,1])` calculates $\sigma_0$ . Aren't I right ? – user81411 Jul 18 '15 at 13:58
  • @Munira, it depends on what you mean by $\sigma_0$. The code you show calculates the sample SD of the predicted random effects. If you want the estimated SD of the population from which the sample was drawn, you need `attr(summary(m1)$varcor$herd, "stddev")`. Because predicted random effects are 'shrunk' towards 0, the former won't typically equal the latter. From your comment above with the model displayed with $\LaTeX$, I would guess you may be after the latter. – gung - Reinstate Monica Jul 18 '15 at 14:34
  • I have understood that `m1 = glmer(cbind(incidence,size-incidence)~period+(1|herd), family=binomial, data=cbpp) ` is a random intercept model . But in case of random slope model (model in which regression coefficients vary randomly between level 2 units , such as, `lmer_obj = glmer(Y ~ X*Z + (1|cluster), data=D, family=binomial)`) , could you please tell how can I extract the estimates of random slopes ? From one of my comments above with the model displayed with latex , I am telling $u_{1j}$ as random slopes . `ranef(lmer_obj)` gives estimates of random intercept , i.e., $u_{0j}$ . – user81411 Jul 19 '15 at 01:51
  • @Munira, I need access to the actual object--I need a reproducible example. In addition, simply asking for code is off topic here. If you are at the point where you understand the issues & just need code help, you should ask on [SO]. Having said that, the example for `?lmer` in the documentation has both a slope & an intercept `m1 = lmer(Reaction~Days+(Days|Subject), sleepstudy)`. For that example, you would use `attr(summary(fm1)$varcor$Subject, "stddev")[2]`. To figure this out for yourself, notice that the number you want shows up in `summary(m1)`, so use `str(summary(m1))` to find it. – gung - Reinstate Monica Jul 19 '15 at 04:06
  • Could you please explain a little bit : what is the difference between `m1 = lmer(Reaction~Days+(Days|Subject), sleepstudy)` and `m1 = lmer(Reaction~Days+(1|Subject), sleepstudy)` , That is , in the `formula` if I use $1$ before vertical bar `(1|Subject)` , what does it mean ? – user81411 Jul 19 '15 at 05:45
  • @Munira, try: [R's lmer cheat-sheet](http://stats.stackexchange.com/q/13166/7290). – gung - Reinstate Monica Jul 19 '15 at 14:03
  • Would you please help a little bit that isn't `attr(summary(m1)$varcor$herd, "stddev")` point estimates of standard deviation of the random effect ? – user81411 Jul 30 '15 at 11:55
  • Could you please tell me what is *estimated standard deviation of population* in general term ? Is it $s=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2$ ? – user81411 Jul 30 '15 at 14:02
  • @Munira, I'm not sure I understand your questions. Are you asking what the difference between a population & a sample is? If so, [this thread](http://stats.stackexchange.com/q/269/7290) may help you. – gung - Reinstate Monica Jul 30 '15 at 17:51