
I am trying to run a Bayesian logit on the data here. I am using bayesglm() in the arm package in R. The coding is straightforward enough:

df = read.csv("http://dl.dropbox.com/u/1791181/bayesglm.csv", header=T)
library(arm)
model = bayesglm(PASS ~ SEX + HIGH, family=binomial(link="logit"), data=df)

summary(model) gives the following output:

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.10381    0.10240   1.014    0.311    
SEXMale      0.02408    0.09363   0.257    0.797    
HIGH        -0.27503    0.03562  -7.721 1.15e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2658.2  on 1999  degrees of freedom
Residual deviance: 2594.3  on 2000  degrees of freedom
AIC: 2600.3

Please walk me through this. I understand that this code uses a very weak prior (since I am not specifying the prior means), so the output is going to be practically the same as if I had used glm() instead of bayesglm(). But the output should still be in the Bayesian spirit, right? What are the $p$-values and $z$-values here? Aren't these frequentist inference tools? Are they interpreted differently here?
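
A quick way to check the "practically the same" claim, assuming the data frame df loaded above, is to fit both models and compare the coefficients (a sketch, not a definitive test):

# compare arm's weakly informative default prior with a plain glm() fit
freq.fit  = glm(PASS ~ SEX + HIGH, family=binomial(link="logit"), data=df)
bayes.fit = bayesglm(PASS ~ SEX + HIGH, family=binomial(link="logit"), data=df)
cbind(glm=coef(freq.fit), bayesglm=coef(bayes.fit))  # estimates should nearly coincide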

  • This is a comment and not an answer, but this is what would make some sense to me. You get estimates which are probably the values for which the posterior distribution is maximized. It might also be possible that they are just the means of the posterior? Worth checking out if you can. But no matter what the exact details are, once you have some estimates you can test them by the usual Estimate/Std. Error -> z-score procedure which works if the posterior is close enough to a normal (it goes to a normal under some conditions that usually hold). – Erik Jul 02 '12 at 06:29
  • Erik... You are correct: the coefficients are indeed the means of the posterior densities. My question is about the p- and z-values. What do they represent here? – user3671 Jul 02 '12 at 06:32
  • Ok. If you have a density that is approximately normally distributed you can test whether its mean is zero by taking the z-score = mean / standard deviation of the distribution and comparing it with the standard normal distribution. Then you look at how unlikely your value or a larger value would be under the standard normal distribution -> p-value. See z-score on Wikipedia for details. (A short R sketch of this check appears after these comments.) – Erik Jul 02 '12 at 07:06
  • Well, yes. But why bother doing that in a Bayesian setting? In Bayesian inference, the point estimate is my best guess about the random parameter, so there's no need to test it. At the most, I can include a "credible interval", which is analogous to a frequentist "confidence interval" but whose statistical interpretation is vastly different. This is the confusing part about the summary() output. The spirit is Bayesian, but the output is frequentist? – user3671 Jul 02 '12 at 07:13
  • One point is that the estimate you get will be different, since you used a prior. And while the point estimate is the "best guess", if you want to show in a Bayesian way that something has an effect you would try to show that the credible interval does not contain zero. When you approximate the posterior by a normal with the same mean and sd (asymptotically correct), the (1-p) credible interval is the widest symmetric credible interval that still excludes zero, so your answer is basically the same. That p is the p-value stated above. – Erik Jul 02 '12 at 07:32
  • @user3671 I agree with you. The p-value output in the table you display appears to be the usual frequentist p-value and there would be no reason to look at it in that context when doing Bayesian inference. – Michael R. Chernick Jul 02 '12 at 13:12
  • my bet is that this program defaults to a standard glm call if you don't specify a prior. This is what should happen, because for glms, the posterior is the sampling distribution of the MLE. Do you get p-values if you put in a prior distribution? The p-values are "bayesian" in the sense that it tells you how much posterior probability is "near" zero. This is because $\beta-\hat{\beta}$ has the same asymptotic dependence on $\beta$ and $\hat{\beta}$. – probabilityislogic Jul 02 '12 at 13:35
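
To make the z-score check described in the comments concrete, here is a minimal sketch using the HIGH row of the coefficient table above (the estimate and standard error are copied from the posted output):

est = -0.27503              # posterior mean of HIGH, from the summary table
se  =  0.03562              # posterior sd, reported as Std. Error
z   = est/se                # -7.721, the reported z value
p   = 2*pnorm(-abs(z))      # 1.15e-14, matching Pr(>|z|) in the table
est + c(-1, 1)*abs(z)*se    # the (1-p) normal-approximation interval; its upper endpoint is exactly 0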

1 Answer


Great question! Although there are Bayesian p-values, and one of the authors of the arm package is an advocate, what you are seeing in your output is not a Bayesian p-value. Check the class of model

class(model)
"bayesglm" "glm"      "lm" 

and you can see that class bayesglm inherits from glm. Furthermore, examination of the arm package shows no specific summary method for a bayesglm object. So when you do

summary(model)

you are actually doing

summary.glm(model)

and getting a frequentist interpretation of the results. If you want a more Bayesian perspective, the function in arm is display().
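
For example (a sketch, assuming the model object from the question; sim() is arm's function for drawing from the approximate posterior, and coef() on its result should return the matrix of simulated coefficients):

methods(class="bayesglm")                  # no bayesglm-specific summary method, so summary() falls through to summary.glm()
display(model)                             # arm's compact summary of the fit
sims = coef(sim(model, n.sims=1000))       # draws from the approximate posterior of the coefficients
quantile(sims[, "HIGH"], c(0.025, 0.975))  # e.g. a 95% credible interval for HIGH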

  • +1 Excellent answer! This is the trouble with R, there are so many highly intelligent statisticians who write horrible code that leaves these kinds of landmines lying around. – Bogdanovist Jul 02 '12 at 22:05
  • It seems like [a deliberate choice](http://andrewgelman.com/2012/07/moving-beyond-hopeless-graphics) on the designer's part, rather than an oversight. – atiretoo Jul 02 '12 at 22:23
  • After reading the link I agree with the intent, but in that case summary() should have been re-implemented to simply call display() rather than giving nonsense results without warning. The person who asked this question got tripped up by code that broke the user model for R that has been established by every other object they have ever used. That's terrible programming practice. – Bogdanovist Jul 02 '12 at 22:37
  • 2
    Many thanks, atiretoo. This raises another question. What's the difference between display() and summary()? It seems to me that the output from the former is just the output from the latter, less two columns, and rounded to 2 digits. It would appear so, from Gelman's post you've linked above. – user3671 Jul 03 '12 at 12:04
  • Yes, and from the discussion at Andrew Gelman's blog it sounds like they'll fix this in future versions of the arm package. – atiretoo Jul 03 '12 at 19:54
  • One last corollary question. If the outputs from display() and summary() are the same, then there's really no point in going with display() till they get around to fixing it, right? xtable(), for example, cannot handle display() (a possible workaround is sketched after these comments). (I had forgotten to mark your response as correct - a mistake I just corrected. Thanks, atiretoo). – user3671 Jul 04 '12 at 03:28
  • Thanks! And you're welcome. I agree that if you want to use something like xtable() then you won't want to use display(). – atiretoo Jul 04 '12 at 04:19
  • Hi, running `display(model, detail = TRUE)` actually displays the z and p values. Are these Bayesian p-values? – HNSKD Jun 25 '21 at 01:40
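
As a workaround for the xtable() issue raised in the comments, one option (a sketch, using arm's coef() and se.coef() accessors, which should work on a bayesglm fit since it inherits from glm) is to build the table yourself:

library(xtable)
# posterior means and standard deviations in a plain data frame that xtable() can handle
tab = data.frame(Estimate=coef(model), Std.Error=se.coef(model))
xtable(tab)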