Horseshoe priors and random slope/intercept regressions

Question

I'm interested in using the horseshoe prior (or the related hierarchical-shrinkage family of priors) for the regression coefficients of a traditional multilevel regression (e.g., random slopes/intercepts). Horseshoe priors are similar to the lasso and other regularization techniques, but have been found to perform better in many situations. A regression coefficient $\beta_i$, where $i \in \{1,\dots,D\}$ indexes the predictors, has a horseshoe prior if its standard deviation is the product of a local $(\lambda_i)$ and a global $(\tau)$ scaling parameter.

$$\beta_{i} \sim Normal(0,\lambda_{i}) \\\lambda_{i} \sim Cauchy^{+}(0,\tau) \\\tau \sim Cauchy^{+}(0,1)$$
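
As a short sketch of why the "product of local and global scale" description matches the hierarchy written above: a half-Cauchy with scale $\tau$ is just $\tau$ times a standard half-Cauchy. The symbol $\tilde{\lambda}_i$ is introduced here only for illustration and is not part of the notation used elsewhere in this question.

$$\tilde{\lambda}_{i} \sim Cauchy^{+}(0,1), \quad \lambda_{i} = \tau\,\tilde{\lambda}_{i} \;\Longrightarrow\; \lambda_{i} \mid \tau \sim Cauchy^{+}(0,\tau) \\\beta_{i} \mid \tilde{\lambda}_{i},\tau \sim Normal(0,\tau\,\tilde{\lambda}_{i})$$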

I am uncertain as to the best way to extend this to a random slope/intercept framework. For example, group $j$'s $i$th coefficient is often modeled as normally distributed around a group-level mean $(\gamma_i)$ with a group-level standard deviation $(\sigma_i)$.

$$\beta_{i,j} \sim Normal(\gamma_{i},\sigma_{i}) \\\gamma_{i} \sim Normal(0,\psi) \\\sigma_{i}\sim Cauchy^{+}(0,c) $$

This tends to shrink estimates of $\beta_{i,j}$ towards $\gamma_i$ based on the average dispersion around the coefficient mean. However, if only a small number of groups are substantially different from the mean, I'm concerned that the predictive or explanatory ability of the model may decrease. If I wanted to add a horseshoe prior to these coefficients, would it be appropriate to give each group's coefficient its own independent $\lambda$?

$$\beta_{i,j} \sim Normal(\gamma_i,\lambda_{i,j}) \\\gamma_{i} \sim Normal(0,\lambda_{i,0}) \\\lambda_{i,j} \sim Cauchy^{+}(0,\tau) \\\tau \sim Cauchy^{+}(0,1)$$

Would it be better for the $\lambda_{i,j}$'s to have an extra level of hierarchy that controls for dispersion around $\gamma_i$?

$$\beta_{i,j} \sim Normal(\gamma_i,\lambda_{i,j}) \\\gamma_{i} \sim Normal(0,\lambda_{i,0}) \\\lambda_{i,j} \sim Cauchy^{+}(0,\phi_i) \\\lambda_{i,0} \sim Cauchy^{+}(0,\tau) \\\phi_{i} \sim Cauchy^{+}(0,\tau) \\\tau \sim Cauchy^{+}(0,1)$$
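
One way to get a feel for what the two candidate priors imply, before fitting anything, is to simulate from them. The following is only a minimal sketch in plain Python (standard library only), not the Stan code. `half_cauchy` and `draw_prior` are hypothetical helper names, and the sketch assumes $\lambda_{i,0} \sim Cauchy^{+}(0,\tau)$ in the first formulation as well, which the equations there leave implicit.

```python
import math
import random


def half_cauchy(rng, scale):
    """Draw from Cauchy+(0, scale) by inverse-CDF sampling of a Cauchy, then abs()."""
    return scale * abs(math.tan(math.pi * (rng.random() - 0.5)))


def draw_prior(rng, n_coef, n_groups, extra_level):
    """Return one joint prior draw beta[i][j] under either formulation.

    extra_level=False: lambda_{i,j} ~ Cauchy+(0, tau)       (first proposal)
    extra_level=True:  lambda_{i,j} ~ Cauchy+(0, phi_i),
                       phi_i ~ Cauchy+(0, tau)              (second proposal)
    """
    tau = half_cauchy(rng, 1.0)                # global scale
    beta = []
    for _ in range(n_coef):
        lam0 = half_cauchy(rng, tau)           # lambda_{i,0} (assumed prior, see lead-in)
        gamma = rng.gauss(0.0, lam0)           # gamma_i, the group-level mean
        # phi_i exists only in the second formulation; otherwise use tau directly
        scale = half_cauchy(rng, tau) if extra_level else tau
        row = []
        for _ in range(n_groups):
            lam = half_cauchy(rng, scale)      # lambda_{i,j}, the local scale
            row.append(rng.gauss(gamma, lam))  # beta_{i,j}
        beta.append(row)
    return beta


if __name__ == "__main__":
    rng = random.Random(2016)
    for extra in (False, True):
        draw = draw_prior(rng, n_coef=2, n_groups=5, extra_level=extra)
        print("extra_level =", extra, draw)
```

Repeating `draw_prior` many times with `extra_level=False` and then `True`, and comparing how far the $\beta_{i,j}$ sit from each other within a row, may give a rough picture of how much extra per-coefficient pooling the $\phi_i$ level introduces. Because every scale is heavy-tailed, individual draws can be extreme, so quantiles are likely more informative than means or variances.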

I've played around with modeling some of these options in Stan, but I would appreciate thoughts or advice on whether or not these formulations make statistical sense.

I am not aware of any research using such priors on group-specific deviations from common parameters. Most of the research on hierarchical shrinkage priors has focused on the case where there are many common regression coefficients and the researcher believes that many of them are zero and some are far from zero, but has no idea which. A hierarchical shrinkage prior could be used for group-specific deviations from common parameters, but it seems odd in most contexts to believe that most of the deviations are zero and some are far from zero while having no idea which. — Ben Goodrich, Feb 29 '16 at 22:08
2017, does this paper help? https://arxiv.org/abs/1707.01694 — hhh, Aug 29 '18 at 13:19
