I wish to model data from an experiment using a hierarchical Bayesian logistic regression. The experiment involved many subjects, and many trials collected from each subject. The DV is the outcome of each trial, coded as 0s and 1s. I have numerous IVs, which vary on a trial-to-trial basis. I wish to model the effects of these IVs on each individual subject's responses, such that I have a regression coefficient for each IV per subject. However, I also want to estimate these regression coefficients hierarchically, such that a coefficient associated with a particular IV for a given subject is informed by the same coefficients from other subjects. Here is the relevant portion of my code:
def make_model(group, formula, data):
"""Construct the hierarchical logistic linear model according to the
formula. Assumes a fully balanced repeated-measures design (i.e., each
subject is a complete sub-model).
"""
# coefficient names
names = dmatrix(formula, data).design_info.column_names
# hyperpriors
mu = [pm.Normal(name='%s_mu_%s' % (group, n), sd=100.) for n in names]
sd = [pm.HalfCauchy(name='%s_sd_%s' % (group, n), beta=30.) for n in names]
for subj, subj_data in data.groupby('subj'):
# design matrix
x = np.asarray(dmatrix(formula, subj_data))
# priors
beta = [pm.Normal(name='%s_%s' % (subj, n), mu=mu[i], sd=sd[i]) for
i, n in enumerate(names)]
# deterministic variables
theta = dot(x, stack(*beta))
p = pm.invlogit(theta)
obs = subj_data.stay.values
# likelihood
pm.Bernoulli(name='%s_Y' % subj, p=p, observed=obs)
Basically, I create a design matrix for the complete data set using patsy
. I then create hyper-priors for each regression coefficient. Then I loop over the individual subjects, creating subject-specific design matrices, subject-level priors for each coefficient, and a subject-level likelihood vector as I go.
Unfortunately, the model is so large that it never compiles. I suspect that I might get better performance by constructing a single likelihood vector rather than multiple ones, but I'm not sure how I could "match up" each item in that vector to the correct priors. Any suggestions?