
I've got a dataset where a treatment $W$ has been applied to units $i$ within clusters $c$. $W$ is constant within each cluster. As a component of an algorithm that I'm implementing (which was developed for non-clustered data), I need to match each unit $i$ with its match $m_i$.

For example, think of schools that receive new books versus those that don't. I want to match students from treated schools with their nearest neighbor in schools that were in the control group.

I've got a solution in mind, but I'm looking for a canonical solution/standard practice/state of the art. Ideally something that has already been programmed in R.

My first thought was to match schools with each other by minimizing a Mahalanobis distance between them, using methods from this thread. This would use only cluster-level covariates (including cluster averages of individual-level data). Then, with cluster matches in hand, I'd match individuals within a treated cluster to their counterparts in the matched control cluster, using individual-level covariates.
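To make the two-stage idea concrete, here is a minimal sketch in Python (not R, and not a canonical implementation). It makes several assumptions for illustration: exactly two covariates, cluster-level covariates that are just the cluster means of the individual covariates, covariance matrices estimated from all clusters (stage 1) and all units (stage 2), and nearest-neighbor matching with replacement. All function names and the toy data are hypothetical.

```python
from statistics import mean

def inv_cov_2d(points):
    """Inverse of the 2x2 sample covariance matrix of a list of (x, y) pairs."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # must be non-zero (covariates not collinear)
    return ((syy / det, -sxy / det), (-sxy / det, sxx / det))

def mahalanobis_sq(a, b, vi):
    """Squared Mahalanobis distance between 2-d points a and b."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    return dx * dx * vi[0][0] + 2 * dx * dy * vi[0][1] + dy * dy * vi[1][1]

def nearest(target, candidates, vi):
    """Key of the candidate closest to target; candidates is {key: (x, y)}."""
    return min(candidates, key=lambda k: mahalanobis_sq(target, candidates[k], vi))

def two_stage_match(units):
    """units: dicts with keys 'id', 'cluster', 'treated', 'x' (2 covariates)."""
    clusters = {}
    for u in units:
        clusters.setdefault(u["cluster"], []).append(u)

    # Stage 1: match each treated cluster to its nearest control cluster,
    # using cluster means of the individual covariates.
    cl_cov = {c: (mean(u["x"][0] for u in m), mean(u["x"][1] for u in m))
              for c, m in clusters.items()}
    treated = {c for c, m in clusters.items() if m[0]["treated"]}
    controls = {c: v for c, v in cl_cov.items() if c not in treated}
    vi_cl = inv_cov_2d(list(cl_cov.values()))
    cluster_match = {c: nearest(cl_cov[c], controls, vi_cl) for c in treated}

    # Stage 2: within each matched cluster pair, match every treated unit
    # to its nearest control unit (with replacement).
    vi_u = inv_cov_2d([u["x"] for u in units])
    unit_match = {}
    for c, m in cluster_match.items():
        pool = {u["id"]: u["x"] for u in clusters[m]}
        for u in clusters[c]:
            unit_match[u["id"]] = nearest(u["x"], pool, vi_u)
    return cluster_match, unit_match

# Toy data: schools A and B are treated; C, D, E, F are controls.
rows = [("a1", "A", True, (0, 0)), ("a2", "A", True, (2, 2)),
        ("b1", "B", True, (10, 10)), ("b2", "B", True, (12, 12)),
        ("c1", "C", False, (0, 1)), ("c2", "C", False, (2, 3)),
        ("d1", "D", False, (10, 11)), ("d2", "D", False, (12, 9)),
        ("e1", "E", False, (1, 11)), ("f1", "F", False, (11, 1))]
units = [dict(id=i, cluster=c, treated=t, x=x) for i, c, t, x in rows]

cluster_match, unit_match = two_stage_match(units)
print(cluster_match)
print(unit_match)
```

On this toy data, school A is paired with C and B with D, and both `b1` and `b2` end up matched to `d1`, which shows what matching with replacement implies. If each control may be used only once, the `nearest` step would have to be replaced by an assignment procedure.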

The above strikes me as logical, but I'd rather not re-invent the wheel, and I haven't done much matching before. Is there a standard practice for problems of this sort? Note I need the (nearest neighbor) matches themselves, not just an estimate of an average treatment effect.

generic_user
  • Why don't you just use a propensity score? If you think cluster-level variables are important to the outcome variable of interest, just include them in the propensity score. – Bill Jan 29 '16 at 19:12
  • First, thanks @ssdecontrol for putting a bounty on my question! Feel free to edit my question if you want to improve it or emphasize certain aspects of it. – generic_user Jan 29 '16 at 20:43
  • @Bill are you talking about matching on the estimated probability of treatment? This seems a bit coarse. Say that treatment assignment is completely randomized. Your propensity score matches will be pure noise. – generic_user Jan 29 '16 at 20:43
  • To add: In many settings, the purpose of matching is to balance unobserved covariates that vary with the treatment, under the assumption that such can be done by balancing observed covariates. In my case, I actually want $i$ and its match to be really, really similar. – generic_user Jan 29 '16 at 20:57
  • OK, but you can probably see from your response that it's important to know why you want them to match. You could just minimize Euclidean distance if you want a close match. – Bill Jan 29 '16 at 22:06
  • Well, Mahalanobis is better than Euclidean. But the real question is how to deal with the multilevel structure. – generic_user Jan 29 '16 at 22:31
  • Given that you know the number of clusters you expect, why would $k$-NN be out of the question? Also, can you actually state the question you want to answer? I say this because I think that you are asking for a non-standard intermediate step rather than the actual final question, and maybe you don't need this match-up step. Do you want to "compare treatments"? (Apologies, maybe I misunderstand the question.) – usεr11852 Feb 01 '16 at 07:19
  • I think this is somehow borderline off-topic to statistics and is much more related to algorithms in computer science. – caveman Feb 02 '16 at 04:49
  • I'm curious to know why you are not using multilevel analysis in a multilevel setting? – mandata Feb 04 '16 at 03:01

0 Answers