Matching is not recovering the true effect in simulated data

Question

I am trying to recover the true (simulated) effect of a treatment Z on an outcome Y, which is set to ATE = 5 (the csv file for the data is located here: https://www.dropbox.com/s/92obn9hsu3tqy92/synthetic_data_2.csv?dl=0). I am able to recover the true effect using a linear model, however, for some reason, I am unable to get the same effect using MatchIt (or Opmatch). As the main confounder (variable name “C_p”) is a binary variable, I have tried converting it to numeric, integer, and factor, but the same problem persists. I have also tried “cem” method and “nearest” but no progress.

After suspecting that something is convoluted in the original simulated file, I simulated some new data (see below). Using these data, I am recovering the true effect using lm. With matching, the effect is closer to the truth compared to the original problem, but still biased. Using a t.test, we see that the ATE is -4.15 – (-2.55)= -1.6, yet it should be equal to 5.

Any ideas of why matching is not recovering the true effect of synthetic_data_2.csv, using matching?

setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
setwd("../../")


library(MatchIt)
library(dplyr)

#setwd("path/to/dir")
imf.meta <- read.csv("synthetic_data_2.csv", sep= ",")

imf.meta$Z <- imf.meta$'T'
#imf.meta$C_p <- as.numeric(imf.meta$C_p) AD: tried numeric, integer, and factor


# Executing a matching algorithm
imf.meta_nomiss <- imf.meta %>%
  select(C1, C2, C3, Cp, Y, Z) %>%
  na.omit()

# tried different approaches
mod_match <- matchit(Z ~ C1 + C2 + C3 + Cp,
                     method = "nearest", data = imf.meta_nomiss)
# mod_match <- matchit(Z ~ C1 + C2 + C3 + C_p,
#                      method = "cem", data = imf.meta_nomiss)
# mod_match <- matchit(Z ~ C_p,
#                      method = "nearest", data = imf.meta_nomiss)
# mod_match <- matchit(Z ~ C_p,
#                      method = "cem", data = imf.meta_nomiss)
dta_m <- match.data(mod_match)


# Estimating treatment effects
with(dta_m, t.test(Y ~ Z))

# recover treatment effect withouth additional adjusting
lm_treat1 <- lm(Y ~ Z, data = dta_m)
summary(lm_treat1)

# recover treatment effect with adjusting


lm_treat2 <- lm(Y ~ Z + C1 + C2 + C3 + C_p, data = dta_m)
summary(lm_treat2)




### Simulate new data
# all covariates are continous
n <- 2000
p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.4 + 0.2 * (X[, 1] > 0))
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)


# create binary covariate
n <- 2000
p <- 10
X <- matrix(rnorm(n * p), n, p)
X[,1] <- rbinom(n, 1, 0.6)
W <- rbinom(n, 1, 0.1+0.7 * (X[, 1] > 0.5))
#Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
Y <- pmax(X[, 1], 0)*(-10) + 5*W  + rnorm(n)

#   na.omit()
imf.meta_nomiss <- as.data.frame(X) 
imf.meta_nomiss$Y <- Y
imf.meta_nomiss$W <- W

# compare with grf
library(grf)
tau.forest <- causal_forest(X, Y, W)
average_treatment_effect(tau.forest, target.sample = "all")

# compare with lm
lm_treat1 <- lm(Y ~ W + V1+V2+V3+V4+V5+V6+V7+V8+V9+V10, data = imf.meta_nomiss)
summary(lm_treat1)



# use matching
mod_match <- matchit(W ~ V1+V2+V3+V4+V5+V6+V7+V8+V9+V10,
                     method = "nearest", data = imf.meta_nomiss)
dta_m <- match.data(mod_match)

# Estimating treatment effects
with(dta_m, t.test(Y ~ W))

# AD with and without mweights is not making a difference
# without weights
lm_treat3 <- lm(Y ~ W, data = dta_m)
summary(lm_treat3)

# with weigths
lm_treat4 <- lm(Y ~ W, data = dta_m, weights=weights )
summary(lm_treat4)

score 2 · Accepted Answer · answered Aug 19 '21 at 08:34

1:1 propensity score matching without replacement is not generally unbiased; it works only when there are many more control units than treated units and when there is good overlap between the treatment groups. There are a few things that could possibly be going on here that I'll attempt to address.

Let's consider your analysis of the Dropbox CSV file. Starting with nearest neighbor matching on the propensity score:

> mod_match <- matchit(Z ~ C1 + C2 + C3 + Cp,
+                      method = "nearest", data = imf.meta_nomiss)
> summary(mod_match, improvement = FALSE)

Call:
matchit(formula = Z ~ C1 + C2 + C3 + Cp, data = imf.meta_nomiss, 
    method = "nearest")

Summary of Balance for All Data:
         Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
distance        0.7011        0.2248          1.8304     1.2078    0.3973   0.6309
C1            105.1623       94.4411          0.2145     1.0402    0.0620   0.0961
C2            612.5812      409.4929          1.1874     0.9812    0.2965   0.4356
C3           -200.4626     -200.5272          0.0013     0.9718    0.0063   0.0176
Cp              0.7255        0.3189          0.9112          .    0.4066   0.4066


Summary of Balance for Matched Data:
         Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
distance        0.7011        0.2934          1.5668     1.2203    0.3127   0.5714          1.5668
C1            105.1623       98.1299          0.1407     1.0524    0.0411   0.0710          1.1167
C2            612.5812      464.6477          0.8649     1.2906    0.2179   0.3556          1.1090
C3           -200.4626     -200.9724          0.0103     0.9481    0.0093   0.0248          1.1554
Cp              0.7255        0.4006          0.7281          .    0.3249   0.3249          1.0761

Sample Sizes:
          Control Treated
All          3424    2576
Matched      2576    2576
Unmatched     848       0
Discarded       0       0

We can notice a few things here. First, the data are very imbalanced. There are large standardized mean differences and KS statistics (eCDF max) for C2 and Cp. Let's take a look at the distribution of C2 before and after matching:

There are areas of non-overlap in the treated distribution, which means nearest neighbor matching without a caliper will fundamentally be unable to achieve balance, yielding bias in the effect estimate. Looking at this plot, there is no hope in accurately estimating the ATT using matching without a caliper. Supplying a caliper (e.g., caliper = .05) restricts the analysis to the area of overlap and allows us to achieve balance and accurately recover the treatment effect (the treatment effect seems to be constant so the average treatment effect in the caliper-matched sample is equal to the ATE and ATT, though this is rarely true in real data).

> #Caliper matching
> mod_match <- matchit(Z ~ C1 + C2 + C3 + Cp,
+                      method = "nearest", data = imf.meta_nomiss, caliper = .05)
> summary(mod_match, un = FALSE)

Call:
matchit(formula = Z ~ C1 + C2 + C3 + Cp, data = imf.meta_nomiss, 
    method = "nearest", caliper = 0.05)

Summary of Balance for Matched Data:
         Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
distance        0.4874        0.4785          0.0344     1.0574    0.0065   0.0317          0.0345
C1            100.2388      101.6385         -0.0280     0.9981    0.0076   0.0246          1.1019
C2            521.3383      520.3469          0.0058     1.0987    0.0045   0.0167          0.8306
C3           -200.0333     -201.4824          0.0293     0.9590    0.0112   0.0281          1.1515
Cp              0.5805        0.5558          0.0552          .    0.0246   0.0246          0.8712

Sample Sizes:
          Control Treated
All          3424    2576
Matched      1137    1137
Unmatched    2287    1439
Discarded       0       0

> lm_treat1 <- lm(Y ~ Z, data = dta_m, weights = weights)
> lmtest::coeftest(lm_treat1, vcov. = sandwich::vcovCL, cluster = ~subclass)

t test of coefficients:

            Estimate Std. Error  t value Pr(>|t|)    
(Intercept) -55.8991     1.7003 -32.8756   <2e-16 ***
Z             3.2060     2.3028   1.3922    0.164    
---

Why didn't CEM provide the right answer? CEM, like caliper matching, restricts the analysis to an area of common support, so it should recover the effect correctly. You analyzed the matched data incorrectly. The MatchIt vignette on estimating effects explains how to estimate effects after CEM; you must include the weights in the effect estimation. When we do this, we find an estimate with a confidence interval that contains 5, indicating that CEM is successful.

> lm_treat1 <- lm(Y ~ Z, data = dta_m, weights = weights)
> lmtest::coeftest(lm_treat1, vcov. = sandwich::vcovCL, cluster = ~subclass)

t test of coefficients:

            Estimate Std. Error  t value Pr(>|t|)    
(Intercept) -62.3837     3.0945 -20.1594   <2e-16 ***
Z             1.8940     2.9900   0.6335   0.5265    
---

Let's move on to the analysis of the data you simulated with the code in your question. Examining the balance of the covariates before matching, we see that the covariates are mostly balanced except for V1, which has an extreme imbalance. But more important is that the numbers of treated and control units are very close together, with a few more treated units than control units. This is not a situation in which nearest neighbor matching without replacement will succeed. Barely any matching is taking place; the matched sample is almost identical to the unmatched sample. If you instead use matching with replacement (i.e., replace = TRUE), balance is achieved and the effect estimate is correct. (To keep this answer from getting too long I'll let this be an exercise for the reader.)

I think your attempts to analyze these data suffered from a few problems: 1) You applied a blunt instrument instead of using the method most appropriate for the data, 2) You did not assess balance to see if your method was working, and 3) You analyzed the data incorrectly. Matching is a method that requires care and nuance; its strength is that you can tailor the method to incorporate substantive information about the subject matter and build trust in the reader that you have eliminated all bias due to confounding by the measured covariates. In order to realize these benefits, though, you have to do the matching in line with best practices as described in the MatchIt vignettes. It is critical to use the matching method most appropriate for the data at hand (described in the "Matching Methods" MatchIt vignette), ensure you have achieved balance after matching (described in the "Assessing Balance" MatchIt vignette), and estimated the effect correctly (described in the "Estimating Effects" MatchIt vignette). Hopefully taking a closer look at these documents will provide some clarity.

many thanks for your thoughtful reply. While I assumed I knew matching well, this problem and your reply made me realize how much of an art matching is rather than an exact science. For example, although, I tried many different calipers (e.g., 1 standard deviation and below), I never went below 0.1 and concluded that something is strange with my data. I also combined “replacement” but that obviously did not work propensity score matching if the caliper is rough. Yet replacement is key for CEM in this problem. — Adel, Aug 27 '21 at 18:35
Two small follow ups, (1) in the summary output "summary(mod_match, improvement = FALSE)", what is the column "distance" stand for? Is it the average distance but should th at not be the same for control? — Adel, Aug 27 '21 at 18:39
And (2) I am trying to replicate the problem in optmatch, like the following, but optimal matching has different mechanics.: ps.mod = glm(Z ~ C1 + C2 + C3 + C_p, data=imf.meta, family="binomial") ps.dist = optmatch::match_on(ps.mod, data = imf.meta) ps.match = fullmatch(caliper(ps.dist, width=0.001*sd(ps.dist@.Data)), data= imf.meta, remove.unmatchables=T, max.controls=1, tol = 0.005) — Adel, Aug 27 '21 at 18:40
`distance` is the propensity score. You can do the same thing using `MatchIt` with `method = "full"` as you can using `fullmatch` if you prefer `MatchIt`'s syntax. The tough part about `optmatch` is that extracting meaningful matching weights is not straightforward, which is why `MatchIt` does it for you. Remember you must incorporate those weights to get valid estimates. — Noah, Aug 27 '21 at 22:29

Matching is not recovering the true effect in simulated data

1 Answers1

Linked