
Inverse propensity weighting involves a machine learning model that takes features and outputs the predicted probability that a given person is in the sample. Let $w_i$ be the inverse of that output for the $i^{th}$ person in the sample. Then the inverse propensity estimate of the mean is $\frac{\sum w_i y_i}{\sum w_i}$, where the sums run over the sample and $y_i$ is the target variable for the $i^{th}$ participant in the sample.
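For concreteness, here is a minimal sketch of the estimator I mean, in Python with numpy and scikit-learn. All of the data, the logistic propensity model, and the variable names are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: X are features for everyone, in_sample marks who
# ended up in the sample, y is the target variable.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
in_sample = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # selection depends on X
y = X[:, 0] + rng.normal(size=5000)

# Propensity model: predicted probability of being in the sample.
prop_model = LogisticRegression().fit(X, in_sample)
p = prop_model.predict_proba(X)[:, 1]

# Inverse propensity weights, restricted to the sampled people.
mask = in_sample == 1
w = 1.0 / p[mask]
y_s = y[mask]

# Weighted estimate of the mean: sum(w * y) / sum(w).
mu_hat = np.sum(w * y_s) / np.sum(w)
print(mu_hat)
```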

My question is: what is the common way to compute confidence intervals in this situation?

I'm a little confused by how little I find when I google this. In a sense, inverse propensity weighting is similar to the Horvitz-Thompson estimator (where the weights are known rather than estimated), for which there is a variance estimate... But I can't find a straight, formulaic answer.

I assume that computing CIs with inverse propensity weights is very common in practice. What do most practitioners do?
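To make the question concrete, here is the kind of thing I could write down myself: a normal-approximation interval from the linearized variance of the weighted mean, treating the estimated weights as if they were fixed (all names and numbers below are made up for illustration). What I can't tell is whether this is what practitioners actually do, since it ignores the extra variability from estimating the propensities.

```python
import numpy as np
from scipy import stats

def ipw_mean_ci(y_s, w, alpha=0.05):
    """Normal-approximation CI for the weighted mean, treating the
    (estimated) inverse propensity weights w as if they were fixed."""
    mu_hat = np.sum(w * y_s) / np.sum(w)
    # Linearized variance of the ratio estimator sum(w * y) / sum(w).
    var_hat = np.sum(w**2 * (y_s - mu_hat) ** 2) / np.sum(w) ** 2
    se = np.sqrt(var_hat)
    z = stats.norm.ppf(1 - alpha / 2)
    return mu_hat, (mu_hat - z * se, mu_hat + z * se)

# Example with made-up numbers.
rng = np.random.default_rng(0)
y_s = rng.normal(size=500)
w = rng.uniform(1.0, 10.0, size=500)
print(ipw_mean_ci(y_s, w))
```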

(As a similar, but more involved, question: what is the proper way to do a two-sample hypothesis test that the weighted means are different? The naive way of creating synthetic datasets where each individual is counted int(inverse propensity) times is a terrible idea: it will easily make non-statistically-significant results statistically significant; a small simulation illustrating this is sketched below. I'm sure this is done all the time by practitioners... What is the proper way to do it?)
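To illustrate why I think the naive replication approach is terrible, here is a small simulation sketch (the distributions, weights, and sample sizes are entirely made up): both groups are drawn from the same distribution, yet replicating each person int(weight) times and running an ordinary t-test rejects far more often than the nominal 5%, because the test thinks the sample size is the sum of the weights.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sim, rejections = 1000, 0

for _ in range(n_sim):
    # Two hypothetical samples with the same underlying mean (null is true).
    y_a = rng.normal(0.0, 1.0, size=200)
    y_b = rng.normal(0.0, 1.0, size=200)

    # Hypothetical inverse propensity weights, truncated to integers.
    w_a = rng.uniform(1.0, 10.0, size=200).astype(int)
    w_b = rng.uniform(1.0, 10.0, size=200).astype(int)

    # Naive approach: count each person int(weight) times, then t-test.
    rep_a = np.repeat(y_a, w_a)
    rep_b = np.repeat(y_b, w_b)
    _, p = stats.ttest_ind(rep_a, rep_b)
    rejections += p < 0.05

# With a valid test this should be close to 0.05; here it is far larger.
print(rejections / n_sim)
```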
