Assume we have a sample in which only a small fraction of units is treated, and we want to estimate the treatment effect using propensity score matching. However, since the treatment group is small, matching within a caliper distance will produce a completely different matched sample every time, and thus the randomness in the results is huge.
After reading some literature, I found mixed arguments. Is it common practice to, for example, repeat the caliper matching 1,000 times, estimate the coefficients on each matched sample, and average them (without bootstrapping the original sample)? And, subsequently, to estimate the standard error as the standard deviation of the estimated coefficients across the 1,000 repetitions? The argument would of course be that this gives unbiased estimates, since the matching within the caliper is always random.
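To make the procedure concrete, here is a minimal sketch of what I mean, in Python with NumPy only. Everything in it is an assumption for illustration: the function names `caliper_match_once` and `repeated_matching` are hypothetical, the data are simulated, the propensity score is treated as known rather than estimated, the matching is greedy 1:1 without replacement, and the "coefficient" is simply the mean outcome difference across matched pairs.

```python
import numpy as np

def caliper_match_once(ps, treated, caliper, rng):
    """Greedy 1:1 caliper matching without replacement (hypothetical helper).

    Treated units are visited in random order; each one gets a control
    drawn at random from the unused controls within the caliper.
    """
    treated_idx = rng.permutation(np.flatnonzero(treated))
    available = np.ones(ps.size, dtype=bool)
    available[treated] = False  # only controls can serve as matches
    pairs = []
    for t in treated_idx:
        candidates = np.flatnonzero(available & (np.abs(ps - ps[t]) <= caliper))
        if candidates.size == 0:
            continue  # no control within the caliper: this treated unit is dropped
        c = rng.choice(candidates)
        available[c] = False  # without replacement
        pairs.append((t, c))
    return np.array(pairs, dtype=int).reshape(-1, 2)

def repeated_matching(y, ps, treated, caliper, n_rep=1000, seed=0):
    """Re-run the random matching n_rep times on the SAME sample (no bootstrap)."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_rep):
        pairs = caliper_match_once(ps, treated, caliper, rng)
        if len(pairs) == 0:
            continue
        # Effect estimate for this matched sample: mean treated-minus-control difference.
        estimates.append(np.mean(y[pairs[:, 0]] - y[pairs[:, 1]]))
    estimates = np.array(estimates)
    # Average estimate, and the SD across repetitions as the proposed "standard error".
    return estimates.mean(), estimates.std(ddof=1), estimates

# Simulated example (assumed data-generating process, true effect = 2.0).
data_rng = np.random.default_rng(42)
n = 2000
x = data_rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-(-3.0 + x)))  # propensity score, assumed known here
treated = data_rng.random(n) < ps        # only a small fraction ends up treated
y = 2.0 * treated + x + data_rng.normal(size=n)

mean_est, sd_est, estimates = repeated_matching(y, ps, treated, caliper=0.01, n_rep=200)
print(f"average estimate: {mean_est:.3f}, SD across repetitions: {sd_est:.3f}")
```

Note that the spread here is computed over re-matchings of one fixed sample, which is exactly the "without bootstrapping the original sample" part of my question.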
I found some articles stating that similar methods yield accurate estimates of the population parameters, but I can't find much practical evidence of researchers actually using this method. The articles are, e.g.:
- Austin, P. C., & Small, D. S. (2014). The use of bootstrapping when using propensity-score matching without replacement: a simulation study. Statistics in Medicine, 33(24), 4306-4319.
- Bai, H. (2013). A Bootstrap Procedure of Propensity Score Estimation. Journal of Experimental Education, 81(2), 157-177.
Thanks ;)