Propensity Score Matching - Multiple observations in control group

Question

I am quite new to propensity score matching and have the following issue with my data. I have two groups (treatment vs. control); these could be, for example, two types of loyalty program at a retail store. In my case, individuals are offered the option to join one of the two programs, and their shopping expenditure per visit is then tracked.

Because individuals choose which of the two programs to join, my data are subject to self-selection bias. I was considering propensity score matching (PSM) with replacement, in three steps:

a) Model whether an individual joins the treatment program (= 1) rather than the control program (= 0).

b) Compute the matching weights.

c) Use the weights obtained from PSM in the model for expenditure per visit.
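To make steps (b) and (c) concrete, here is a minimal, language-agnostic sketch in Python (not MatchIt code) of 1:1 nearest-neighbor matching with replacement. It assumes the propensity scores from step (a) have already been estimated (for example, by logistic regression) and uses hypothetical scores that mirror the example below, where A, B, and C are all matched to D. Treated units get weight 1, and each control gets a weight equal to the number of times it is used as a match, which is one common way to weight for the effect on the treated.

```python
# Minimal sketch: 1:1 nearest-neighbor matching WITH replacement on
# propensity scores that are assumed to be already estimated (step a).
# Treated units get weight 1; each control gets a weight equal to the
# number of times it was used as a match (an ATT-style weighting).
from collections import Counter


def match_with_replacement(scores, treated):
    """Return {unit_id: weight} for all units used in the matched sample.

    scores  -- dict mapping unit id -> estimated propensity score
    treated -- dict mapping unit id -> 1 (treatment) or 0 (control)
    """
    controls = [u for u in scores if treated[u] == 0]
    times_used = Counter()
    weights = {}
    for u in scores:
        if treated[u] != 1:
            continue
        # Nearest control on the propensity score; controls may be reused.
        best = min(controls, key=lambda c: abs(scores[c] - scores[u]))
        times_used[best] += 1
        weights[u] = 1.0
    for c, k in times_used.items():
        weights[c] = float(k)
    return weights


# Hypothetical scores mirroring the question's example: A, B and C are
# treated and all end up matched to the same control D; E is never used.
scores = {"A": 0.60, "B": 0.62, "C": 0.58, "D": 0.61, "E": 0.20}
treated = {"A": 1, "B": 1, "C": 1, "D": 0, "E": 0}
w = match_with_replacement(scores, treated)
print(w)  # {'A': 1.0, 'B': 1.0, 'C': 1.0, 'D': 3.0}
```

Here D ends up with weight X = 3. MatchIt's own weights for matching with replacement are based on how often each control is matched but are normalized differently (see the MatchIt documentation); multiplying all control weights by the same constant does not change a weighted control-group mean.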

The issue I face is that individuals in either group can visit the store multiple times. I do not want to use average expenditure per visit or treat my data as panel data. In this case, how should the weights computed in step (b) be divided among an individual's multiple observations?

For example, suppose individuals A, B, and C are all matched to individual D, and the corresponding weights are computed (A, B, C in treatment vs. D in control, or vice versa). Suppose the weight for individual D is X. Now suppose A, B, and C each make only one trip, whereas D makes three trips. Should the weight computed for D (= X) be divided among D's three observations, i.e., should each of D's visits get weight X/3? Should all three of D's observations be given weight X? Or is there some other solution? I would appreciate it if anyone could point me to resources (e.g., literature from statistics) that discuss a similar issue.
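The two options in the question can be compared numerically. The following Python sketch uses hypothetical data: control D has weight X = 3 and three visits, and a second hypothetical control, F, has weight 1 and one visit (all spend values are made up). It computes the weighted mean spend per visit when each visit inherits the full weight X ("full") versus when X is split evenly across the visits ("split").

```python
# Hypothetical illustration: two ways to spread an individual-level
# matching weight over that individual's visit-level rows.


def visit_weights(person_weight, visits, scheme):
    """Return per-visit weights for one individual.

    scheme="full"  -> every visit inherits the full individual weight X
    scheme="split" -> the weight is divided evenly, X / n_visits
    """
    n = len(visits)
    if scheme == "full":
        return [person_weight] * n
    if scheme == "split":
        return [person_weight / n] * n
    raise ValueError(f"unknown scheme: {scheme}")


def weighted_mean_spend(people, scheme):
    """Weighted mean spend per visit over all visits of all individuals."""
    num = den = 0.0
    for weight, visits in people.values():
        for w, spend in zip(visit_weights(weight, visits, scheme), visits):
            num += w * spend
            den += w
    return num / den


# Hypothetical control group: D (weight X = 3, three visits) and
# F (weight 1, one visit). Spend values are made up.
controls = {"D": (3.0, [10.0, 20.0, 30.0]), "F": (1.0, [50.0])}
print(weighted_mean_spend(controls, "full"))   # 23.0
print(weighted_mean_spend(controls, "split"))  # 27.5
```

Under "full", D's three visits carry 9 of the 10 units of weight, so frequent visitors dominate. Under "split", the result equals weighting each individual's average spend per visit by X (D: 20 × 3, F: 50 × 1, giving 110 / 4 = 27.5), so for a weighted mean it is arithmetically the per-person-average analysis the question hopes to avoid. Neither scheme, by itself, accounts for the correlation between visits by the same person.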

I am using the MatchIt package in R to implement the model. Any help with handling this problem in R would be appreciated.

I found some discussions related to this issue:

a) Propensity score matching with panel data: matching with panel data (not my case)

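b) Statalist thread on propensity score matching with hospital data and multiple observations per person: https://www.statalist.org/forums/forum/general-stata-discussion/general/590352-propensity-score-matching-on-hospital-data-with-multiple-observations-per-person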

Please let me know if any part of the question is unclear. Thank you in advance.

EDIT

Brief Description of the data

The dataset contains each individual's shopping spend (in dollars) at a retail store on every visit. Individuals can choose to join Program 1 or Program 2, the two available loyalty programs (Program 1 already existed, whereas Program 2 was newly introduced; hence I call them control and treatment here). The dataset covers about 7000 unique individuals, roughly 4500 in the treatment group and about 2500 in the control group. Altogether, I have information on about 10000 store visits over a period of 3 days, with some individuals making repeat purchases (500 individuals in the control group account for 3500 store visits). The primary dependent variable is the shopping expenditure (in dollars) per visit, a continuous variable; the independent variables include demographic and socio-economic factors such as income and household size (about 20 variables in total). I am not interested in examining purchase incidence or timing, and I would prefer not to remove observations.
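The approximate counts above already show how much the weight-allocation choice matters. The following Python arithmetic assumes (an assumption, not stated above) that everyone other than the 500 repeat-visiting controls made exactly one visit, which is consistent with the stated totals, and that all individuals carry equal weight (hypothetical).

```python
# Back-of-the-envelope arithmetic using the approximate counts above.
# ASSUMPTION: everyone except the 500 repeat-visiting controls made exactly
# one visit; this is consistent with the stated totals (10000 - 3500 = 6500
# visits for the remaining 7000 - 500 = 6500 individuals).
n_individuals = 7000
n_visits = 10000
n_control = 2500
n_repeat_controls = 500
repeat_control_visits = 3500

others = n_individuals - n_repeat_controls
assert n_visits - repeat_control_visits == others  # one visit each

control_visits = repeat_control_visits + (n_control - n_repeat_controls)
visits_per_repeat = repeat_control_visits / n_repeat_controls

# With equal individual weights (hypothetical), the share of control-group
# weight carried by repeat visitors under each allocation scheme:
share_full = repeat_control_visits / control_visits   # every visit weighted fully
share_split = n_repeat_controls / n_control            # weight split across visits

print(control_visits)         # 5500
print(visits_per_repeat)      # 7.0
print(round(share_full, 3))   # 0.636
print(share_split)            # 0.2
```

Under these assumptions, the repeat visitors are 20% of the control individuals but about 64% of the control visits (7 visits each on average), so giving every visit the full individual weight would let them dominate the control-group comparison.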

You gave almost no background data. Is shopping $ the outcome? What's its distribution? How many potential predictors do you have? What is the number of subjects in the sample? — Frank Harrell, Mar 05 '18 at 13:24
I have edited my original question and added a section with more details about the data, @FrankHarrell. Please let me know if you need further clarification, Dr. Harrell. — Prometheus, Mar 05 '18 at 13:56
Thanks for the new information. This seems to be a low-dimensional data situation where a propensity score is not needed at all. Propensity scores are used when you need data reduction, i.e., to reduce the dimensionality of the predictors without using Y. This appears to call more for regular longitudinal regression modeling that takes the correlation structure into account (generalized least squares or a mixed-effects model), allowing for the possibly strange distribution of Y. I'm not sure, but you may want to total all purchases per individual and keep it simple, allowing for zero $ by using ordinal regression. — Frank Harrell, Mar 05 '18 at 15:24
Although what you said is true, isn't propensity score matching (PSM) also used to address selection bias when random assignment is not available? My understanding is that PSM is used in quasi-experiments to create two comparable groups (treatment and control) to account for self-selection. Or is the primary objective of PSM solely to reduce dimensionality? For example: http://pareonline.net/pdf/v20n13.pdf — Prometheus, Mar 05 '18 at 19:11
PSM is not able to adjust for confounders that are not measured. It is an indirect adjustment method that can lose efficiency and hide interactions. Direct adjustment is preferred if the effective sample size is adequate (covariate adjustment; stratifying if covariates are not continuous). See Chapter 17 of http://fharrell.com/doc/bbr.pdf. One way to become less interested in PSM is to realize that you need to deal with the non-collapsibility of the odds ratio by adjusting for very important predictors anyway. Propensity ignores outcome heterogeneity, concentrating on baseline distributions. — Frank Harrell, Mar 05 '18 at 20:49
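Frank Harrell's suggestion in the comments to total all purchases per individual amounts to collapsing the visit-level data to one row per person. A minimal Python sketch with hypothetical IDs and amounts follows; individuals on a hypothetical roster with no recorded purchases get a total of 0, which is one way the zero-$ case he mentions could arise. Fitting the ordinal regression itself (for example, with `orm` from Harrell's rms package in R) is outside this sketch.

```python
# Minimal sketch: collapse visit-level rows to one total spend per person.
# People on the (hypothetical) roster with no recorded visits get 0.0.
from collections import defaultdict


def totals_per_person(visits, roster):
    """visits: iterable of (person_id, spend); roster: all person ids."""
    totals = defaultdict(float)
    for person, spend in visits:
        totals[person] += spend
    # .get() does not insert into the defaultdict; missing people map to 0.0.
    return {person: totals.get(person, 0.0) for person in roster}


# Hypothetical data: D visits three times, A once, G never.
visits = [("D", 10.0), ("D", 20.0), ("D", 30.0), ("A", 15.0)]
roster = ["A", "D", "G"]
print(totals_per_person(visits, roster))  # {'A': 15.0, 'D': 60.0, 'G': 0.0}
```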

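Frank Harrell's last comment names stratifying, when covariates are not continuous, as one form of direct adjustment. A minimal Python sketch of that idea, with a hypothetical categorical covariate ("small" vs. "large" household) and made-up outcomes: the treated-minus-control difference in mean spend is computed within each stratum and then averaged with stratum sizes as weights. It illustrates only the stratification idea; covariate adjustment by regression is the other option he names.

```python
# Hypothetical sketch of direct adjustment by stratification on one
# categorical covariate.
from collections import defaultdict
from statistics import mean


def stratified_difference(rows):
    """Size-weighted average of within-stratum treated-minus-control means.

    rows: iterable of (stratum, treated_flag, outcome), treated_flag in {0, 1}.
    """
    groups = defaultdict(lambda: {0: [], 1: []})
    for stratum, treated, y in rows:
        groups[stratum][treated].append(y)
    num = den = 0.0
    for arms in groups.values():
        if not arms[0] or not arms[1]:
            continue  # a stratum with only one arm offers no comparison
        n = len(arms[0]) + len(arms[1])
        num += n * (mean(arms[1]) - mean(arms[0]))
        den += n
    return num / den


# Made-up data in which treated individuals are concentrated in the
# lower-spending stratum.
rows = [
    ("small", 1, 30.0), ("small", 1, 34.0), ("small", 0, 28.0),
    ("large", 1, 60.0), ("large", 0, 50.0), ("large", 0, 54.0),
]
crude = mean(y for _, t, y in rows if t == 1) - mean(y for _, t, y in rows if t == 0)
print(round(crude, 2))              # -2.67
print(stratified_difference(rows))  # 6.0
```

In this made-up example the unadjusted difference is negative while the within-stratum differences (4 and 8) are both positive, because the treated group is concentrated in the lower-spending stratum.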