Background: weighted mean
In survey statistics, the respondents in a sample are often assigned weights so that their answers better represent the general population. These weights are often computed as the inverse of estimated propensity scores (obtained via logistic regression or other methods).
Let there be $n$ responses and their respective (estimated) weights: $y_1, ..., y_n$ and $w_1, ..., w_n$.
To estimate something such as the population mean, the weighted mean is employed, i.e.:
$$\bar y_w = \frac{\sum{w_i y_i}}{\sum{w_i}}$$
Question: what's behind this formula for the variance of the weighted mean? (assumptions/derivations?!)
What I'm searching for is an estimator of the variance of $\bar y_w$. I found the following proposed estimator:
$$ \widehat{\sigma_{\bar{y}_w}^2} = \frac{n}{(n-1)(\sum{w_i} )^2} \sum w_i^2(y_i - \bar{y}_w)^2 $$
It is from the paper "The standard error of a weighted mean concentration—I. Bootstrapping vs other methods." by Gatz, Donald F., and Luther Smith (1995) (pdf).
It states that this formula is an "approximate ratio variance". I've looked through the works this paper references (specifically, Cochran's 1977 book on sampling techniques), but couldn't find a detailed description of how this formula was derived or what its underlying assumptions are. E.g.: do the weights and the outcome need to be uncorrelated? Should the outcome be i.i.d.? (Or just have equal variance, or equal expectation?) My guess is that it comes from some sort of Taylor expansion, but without the details I cannot know for sure.
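To make sure I'm reading the formula correctly, here is a minimal sketch of how I would compute it in plain Python. The function names `weighted_mean` and `ratio_variance` are my own (not from the paper), and the data are made up purely for illustration.

```python
def weighted_mean(y, w):
    """Weighted mean: sum(w_i * y_i) / sum(w_i)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def ratio_variance(y, w):
    """Estimator quoted above:
    n / ((n - 1) * (sum w_i)^2) * sum(w_i^2 * (y_i - ybar_w)^2).
    """
    n = len(y)
    ybar_w = weighted_mean(y, w)
    total_w = sum(w)
    # Weighted squared deviations around the weighted mean
    ss = sum(wi ** 2 * (yi - ybar_w) ** 2 for wi, yi in zip(w, y))
    return n / ((n - 1) * total_w ** 2) * ss


# Made-up responses and weights, for illustration only
y = [2.0, 4.0, 4.0, 5.0, 7.0]
w = [1.0, 2.0, 1.0, 3.0, 1.0]

print(weighted_mean(y, w))
print(ratio_variance(y, w))
```

One sanity check I can do: with all weights equal, the formula collapses to $s^2/n$, the usual estimator of the variance of the sample mean, so at least that special case looks right.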
I would appreciate:
- Insights into this formula?
- Any other relevant/useful formulas for the variance of the weighted mean?
- Any references/explanations will be much appreciated.
Thanks in advance.