There are a couple of ways you can do this. The first is to use the posterior mean of each $\mu_i$ as the "fitted value" corresponding to $\hat{\beta}X$ in OLS, calculate the residuals from those fitted values, and plug their variance into the $R^2$ calculation as usual. An alternative is to use the posterior mean of the residual variance ($1/\tau$) directly as the estimate of the residual variance in the $R^2$ calculation. Either way, the calculation itself would be done in R. The former comes closest to how $R^2$ is calculated in classical statistics.
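For concreteness, here is a minimal sketch of both calculations in R. It assumes the JAGS output has been collected into a coda object `fit` (e.g. from `rjags` or `R2jags`) with monitored nodes `mu[i]` and `tau`, and that `y` holds the observed responses; all object names are illustrative.

```r
post   <- as.matrix(fit)                                     # stack the MCMC chains into one matrix
mu.hat <- colMeans(post)[paste0("mu[", seq_along(y), "]")]   # posterior mean of each mu_i

ss.res <- sum((y - mu.hat)^2)                                # residual sum of squares
ss.tot <- sum((y - mean(y))^2)                               # total sum of squares
r2.fit <- 1 - ss.res / ss.tot                                # approach 1: residual-based R^2

sigma2 <- mean(1 / post[, "tau"])                            # posterior mean of the residual variance
r2.tau <- 1 - sigma2 / var(y)                                # approach 2: variance-based R^2
```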
No doubt there are other approaches, which (hopefully) others will point out in their answers.
However, the bigger issues are a) with $R^2$ as a criterion at all and b) with comparing OLS estimation to anything else using $R^2$ as the criterion. I'll skip over the first, pointing you to statisticalengineering.com and Andrew Gelman as references. The second issue arises because, for a given set of predictors, OLS maximizes the in-sample $R^2$ (a consequence of the "least squares" property), so no other technique that is not equivalent to OLS will produce as high an $R^2$ on the data used to fit the model. Consequently, your Bayesian approach is doomed if maximizing $R^2$ is the criterion of choice.
You might instead use an out-of-sample criterion, for example k-fold cross-validation, which would necessitate multiple runs of JAGS on subsets of the data, then comparing the out-of-sample predicted values to the actual values. You can generate the predicted values inside JAGS, as described in the answer to Missing values in response variable in JAGS, or in R of course; a sketch of one fold follows.
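Here is a rough sketch of a single fold of that procedure using `rjags`, with held-out responses set to `NA` so JAGS imputes them from the posterior predictive distribution. The simple linear regression model, the variable names (`x`, `y`, `test.idx`), and the run length are all illustrative rather than taken from your model.

```r
library(rjags)

model.string <- "
model {
  for (i in 1:n) {
    y[i]  ~ dnorm(mu[i], tau)       # held-out y[i] (set to NA) are imputed by JAGS
    mu[i] <- beta0 + beta1 * x[i]
  }
  beta0 ~ dnorm(0, 1.0E-6)
  beta1 ~ dnorm(0, 1.0E-6)
  tau   ~ dgamma(0.01, 0.01)
}"

y.train           <- y
y.train[test.idx] <- NA             # hold out one fold of the data

jm  <- jags.model(textConnection(model.string),
                  data = list(y = y.train, x = x, n = length(y)))
sam <- coda.samples(jm, variable.names = "y", n.iter = 5000)

y.pred     <- colMeans(as.matrix(sam))[paste0("y[", test.idx, "]")]  # posterior predictive means
fold.error <- sum((y[test.idx] - y.pred)^2)                          # out-of-sample error, this fold
```

Looping this over the k folds and summing the fold errors gives a cross-validated error that puts the Bayesian model and OLS on an equal footing.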
I'll also point out that the dgamma(0.01, 0.01) distribution has largely fallen out of favor, as it is actually quite informative near zero. The answers to priors for lognormal models might help with that.
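If you do want to move away from it, one widely used alternative (in the spirit of Gelman's 2006 recommendations on variance priors, not anything specific to your model) is to put a flat or weakly informative prior on the standard deviation and derive the precision from it inside the JAGS model:

```
sigma ~ dunif(0, 100)          # weakly informative prior on the residual sd
tau   <- 1 / (sigma * sigma)   # precision used in dnorm(mu[i], tau)
```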