Edit: I see that I missed a key part of the question, which is your confusion about what a1, a2, b1, and b2 actually are. These refer to the 4 cell means in the 2x2 design. Imagine that subjects get randomly assigned twice: first to either condition a or condition b, and then to either condition 1 or condition 2. So each subject ends up in one of the groups a1, a2, b1, or b2. In what I wrote below, I use these labels as shorthand to refer to the group means rather than the groups themselves (as in the previous advice you received).
Actually, I think it is more appropriate to use
$$
d=\frac{(a1-b1)-(a2-b2)}{2\sigma}
$$
rather than the definition you mentioned. I covered this on my blog a few months back (LINK), but I'll go over the basic argument again here.
If we take your numerator and distribute the implicit $-1$, we see that it equals
$$
a1-b1-a2+b2=(+1)a1 + (-1)a2 + (-1)b1 + (+1)b2.
$$
The key here is to realize that this is still a comparison between two group means, just like in the classical definition of Cohen's d. We are comparing the a1 and b2 groups (which have coefficients of +1 in the above sum) against the a2 and b1 groups (which have coefficients of -1). So the two relevant means to use in computing d are the mean of the a1 and b2 means, $\mu_1=\frac{a1+b2}{2}$, and the mean of the a2 and b1 means, $\mu_2=\frac{a2+b1}{2}$. This gives us
$$
d=\frac{\mu_1-\mu_2}{\sigma}=\frac{\frac{a1+b2}{2}-\frac{a2+b1}{2}}{\sigma}=\frac{(a1-b1)-(a2-b2)}{2\sigma}.
$$
I think that this is the most natural extension of the classical Cohen's d to a 2x2 interaction effect.
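To make the equivalence concrete, here is a minimal Python sketch that computes d both ways: directly from the difference of differences, and as a standardized difference between the two averaged means $\mu_1$ and $\mu_2$. The function names and the numbers are made up for illustration, and I'm assuming you already have a single value of $\sigma$ in hand (for example, a pooled within-cell standard deviation).

```python
def interaction_d(a1, a2, b1, b2, sigma):
    """Interaction effect size: ((a1 - b1) - (a2 - b2)) / (2 * sigma).

    a1, a2, b1, b2 are the four cell means; sigma is assumed to be
    a common standard deviation supplied by the caller.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return ((a1 - b1) - (a2 - b2)) / (2 * sigma)


def interaction_d_via_means(a1, a2, b1, b2, sigma):
    """Same quantity, written as a comparison of two averaged means."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    mu_1 = (a1 + b2) / 2  # cells with coefficient +1
    mu_2 = (a2 + b1) / 2  # cells with coefficient -1
    return (mu_1 - mu_2) / sigma


# Hypothetical cell means and sigma, for illustration only.
d_direct = interaction_d(a1=10.0, a2=8.0, b1=7.0, b2=9.0, sigma=2.0)
d_means = interaction_d_via_means(a1=10.0, a2=8.0, b1=7.0, b2=9.0, sigma=2.0)
print(d_direct, d_means)  # 1.0 1.0
```

With these made-up numbers, the difference of differences is (10 - 7) - (8 - 9) = 4, and dividing by $2\sigma = 4$ gives d = 1, the same as $(9.5 - 7.5)/2$.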
If you're not convinced yet, see my blog comment (HERE) for some further arguments for why this should be preferred over the effect size definition that you mentioned.