I'm interested in using the horseshoe prior (or the related hierarchical-shrinkage family of priors) for the regression coefficients of a traditional multilevel regression (e.g., random slopes/intercepts). Horseshoe priors are similar to the lasso and other regularization techniques, but have been found to perform better in many situations. A regression coefficient $\beta_i$, where $i \in \{1,\dots,D\}$ indexes the $D$ predictors, has a horseshoe prior if its standard deviation is the product of a local $(\lambda_i)$ and a global $(\tau)$ scaling parameter.
$$\beta_{i} \sim Normal(0,\lambda_{i}) \\\lambda_{i} \sim Cauchy^{+}(0,\tau) \\\tau \sim Cauchy^{+}(0,1)$$
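Because a half-Cauchy variable with scale $\tau$ is just $\tau$ times a standard half-Cauchy variable, the same prior can be rewritten so that the product of the two scales is explicit. Here $\tilde\lambda_i$ is a symbol introduced only for this illustration, denoting a unit-scale local parameter:
$$\tilde\lambda_{i} \sim Cauchy^{+}(0,1) \\\lambda_{i} = \tau\,\tilde\lambda_{i} \\\beta_{i} \sim Normal(0,\tau\,\tilde\lambda_{i})$$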
I am uncertain about the best way to extend this to a random-intercept (or random-slope) framework. For example, group $j$'s $i$th coefficient is often modeled as normally distributed around a group-level mean $(\gamma_i)$ with a group-level standard deviation $(\sigma_i)$.
$$\beta_{i,j} \sim Normal(\gamma_{i},\sigma_{i}) \\\gamma_{i} \sim Normal(0,\psi) \\\sigma_{i}\sim Cauchy^{+}(0,c) $$
This tends to shrink estimates of $\beta_{i,j}$ towards $\gamma_i$ based on the average dispersion around the coefficient mean. However, if only a small number of groups are substantially different from the mean, I'm concerned that the predictive or explanatory ability of the model may decrease. If I wanted to add a horseshoe prior to these coefficients, would it be appropriate to give each group's coefficient its own independent $\lambda$?
$$\beta_{i,j} \sim Normal(\gamma_i,\lambda_{i,j}) \\\gamma_{i} \sim Normal(0,\lambda_{i,0}) \\\lambda_{i,j} \sim Cauchy^{+}(0,\tau) \\\tau \sim Cauchy^{+}(0,1)$$
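One way to get a feel for what this per-group prior implies is to simulate from it. The following is a minimal sketch in plain Python, not the Stan model itself. The function names `half_cauchy` and `draw_group_coefs` are hypothetical, and holding $\gamma_i$ fixed at a supplied value is a simplifying assumption made only for illustration.

```python
import math
import random


def half_cauchy(rng, scale):
    """Draw from Cauchy^+(0, scale) by inverse-CDF sampling."""
    # If U ~ Uniform(0, 1), then scale * tan(pi * U / 2) is half-Cauchy.
    return scale * math.tan(math.pi * rng.random() / 2.0)


def draw_group_coefs(rng, n_groups, gamma=0.0):
    """One prior draw of beta_{i,1..J} for a single predictor i.

    tau is shared, and every group j gets its own lambda_{i,j}.
    gamma (the group-level mean) is held fixed here for simplicity.
    """
    tau = half_cauchy(rng, 1.0)  # tau ~ Cauchy+(0, 1)
    lambdas = [half_cauchy(rng, tau) for _ in range(n_groups)]
    betas = [rng.gauss(gamma, lam) for lam in lambdas]
    return tau, lambdas, betas


if __name__ == "__main__":
    rng = random.Random(2024)  # seeded only so the run is repeatable
    tau, lambdas, betas = draw_group_coefs(rng, n_groups=8)
    print("tau:", tau)
    for lam, beta in zip(lambdas, betas):
        print(f"lambda={lam:.3g}  beta={beta:.3g}")
```

Repeating the draw many times should show most $\beta_{i,j}$ sitting close to $\gamma_i$ with an occasional group far away, which is the behavior I am hoping the per-group $\lambda_{i,j}$ would allow.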
Would it be better for the $\lambda_{i,j}$'s to have an extra level of hierarchy that governs the dispersion around $\gamma_i$?
$$\beta_{i,j} \sim Normal(\gamma_i,\lambda_{i,j}) \\\gamma_{i} \sim Normal(0,\lambda_{i,0}) \\\lambda_{i,j} \sim Cauchy^{+}(0,\phi_i) \\\lambda_{i,0} \sim Cauchy^{+}(0,\tau) \\\phi_{i} \sim Cauchy^{+}(0,\tau) \\\tau \sim Cauchy^{+}(0,1)$$
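The generative structure of this extra-level version can be sketched the same way. Again this is only an illustrative prior simulation in plain Python under stated assumptions: `half_cauchy` and `draw_predictor` are hypothetical names, and the global scale `tau` is passed in as an argument because it is shared across all predictors.

```python
import math
import random


def half_cauchy(rng, scale):
    """Draw from Cauchy^+(0, scale) via the inverse CDF."""
    return scale * math.tan(math.pi * rng.random() / 2.0)


def draw_predictor(rng, n_groups, tau):
    """One joint prior draw for predictor i, given the global scale tau."""
    lam0 = half_cauchy(rng, tau)   # lambda_{i,0}: scale of gamma_i
    phi = half_cauchy(rng, tau)    # phi_i: dispersion of groups around gamma_i
    gamma = rng.gauss(0.0, lam0)   # gamma_i ~ Normal(0, lambda_{i,0})
    # Each group still has its own local scale, now centered on phi_i.
    lambdas = [half_cauchy(rng, phi) for _ in range(n_groups)]
    betas = [rng.gauss(gamma, lam) for lam in lambdas]
    return {"phi": phi, "gamma": gamma, "lambdas": lambdas, "betas": betas}


if __name__ == "__main__":
    rng = random.Random(11)        # seeded only so the run is repeatable
    tau = half_cauchy(rng, 1.0)    # tau ~ Cauchy+(0, 1), shared by all predictors
    for i in range(3):             # three predictors, five groups each
        draw = draw_predictor(rng, n_groups=5, tau=tau)
        print(i, draw["gamma"], draw["phi"], draw["betas"])
```

The difference from the previous formulation is that $\phi_i$ lets the typical spread of the groups around $\gamma_i$ vary by predictor, while each $\lambda_{i,j}$ can still let an individual group escape that shrinkage.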
I've played around with modeling some of these options in Stan, but I would appreciate thoughts or advice on whether or not these formulations make statistical sense.