4

Background: weighted mean

In survey statistics, the respondents in a sample are often assigned weights that adjust their answers so they represent the general population. These weights are commonly fitted as the inverse of estimated propensity scores (via logistic regression or other methods).

Let there be $n$ samples of the responses and their respective (estimated) weights: $y_1, ..., y_n$ and $w_1, ..., w_n$.

To estimate something such as the population mean, the weighted mean is employed, i.e.:

$$\bar y_w = \frac{\sum{w_i y_i}}{\sum{w_i}}$$

Question: what's behind this formula of the variance for the weighted mean? (assumptions/derivations?!)

What I'm searching for is an estimation of the variance of $\bar y_w$. I found the following proposed estimator:

$$ \widehat{\sigma_{\bar{y}_w}^2} = \frac{n}{(n-1)(\sum{w_i} )^2} \sum w_i^2(y_i - \bar{y}_w)^2 $$
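For concreteness, here is a minimal numpy sketch of the estimator exactly as written above (the function names are mine). One easy sanity check: with equal weights it collapses to the familiar $s^2/n$.

```python
import numpy as np

def weighted_mean(y, w):
    """Weighted mean: sum(w*y) / sum(w)."""
    return np.sum(w * y) / np.sum(w)

def gatz_smith_var(y, w):
    """The 'approximate ratio variance' of the weighted mean from the question."""
    n = len(y)
    ybar_w = weighted_mean(y, w)
    return n / ((n - 1) * np.sum(w) ** 2) * np.sum(w**2 * (y - ybar_w) ** 2)

rng = np.random.default_rng(0)
y = rng.normal(size=50)
w = np.full(50, 3.0)  # equal weights: estimator should reduce to s^2 / n
assert np.isclose(gatz_smith_var(y, w), y.var(ddof=1) / len(y))
```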

It is from the paper "The standard error of a weighted mean concentration—I. Bootstrapping vs other methods." by Gatz, Donald F., and Luther Smith (1995).

The paper calls this formula an "approximate ratio variance". I've looked at the references it cites (specifically, Cochran's 1977 book on sampling techniques), and couldn't find a detailed account of how this formula was derived or of its underlying assumptions. E.g.: do the weights and the outcome need to be uncorrelated? Should the outcome be i.i.d.? (Or just have equal variance, or equal expectation?) I'd guess it involves some sort of Taylor expansion, but without the details I cannot know for sure.

I would appreciate:

  1. Insights into this formula?
  2. Any other relevant/useful formulas for the variance of the weighted mean?
  3. Any references/explanations will be much appreciated.

Thanks upfront.

Tal Galili
    For what it's worth, this variance is arrived at and presented on p. 247 of Sampling 2nd Ed, by Lohr. It doesn't answer your question but is an additional reference of interest. That whole chapter is of interest. – luke.sonnet Aug 11 '21 at 18:10

2 Answers

3

You can get a general answer to this question (and the specific answer) just from considering the variances of sums. Suppose there are $N$ individuals in the population and you sample $n$ of them. The $X_i$ are fixed (Bob's opinion is whatever it is, whether you measure it or not) but the sampling indicators are random ($R_{\textrm{Bob}}=1$ if you sampled Bob).

The population total is $T=\sum_{i=1}^N X_i$. Your estimator is $$\hat T=\sum_{i=1}^N R_iw_iX_i$$ Its variance is $$\mathrm{var}\left[\sum_{i=1}^N R_iw_iX_i \right]= \sum_{i,j=1}^Nw_iw_jX_iX_j\mathrm{cov}[R_i,R_j]$$ Now, that isn't any use on its own, because it depends on $X_i$ for unsampled $i$; but we can do a weighted estimate of this total, just like the weighted mean we started with: $$\widehat{\mathrm{var}}[\hat T]= \sum_{i,j=1}^NR_iR_jw_{ij}w_iw_jX_iX_j\mathrm{cov}[R_i,R_j]$$ where $1/w_{ij}$ is (an estimate of) the probability that both $i$ and $j$ are sampled (and $R_iR_j$ is an indicator that both $i$ and $j$ were observed).
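The design-unbiasedness of $\hat T$ is easy to check by simulation under a design with known inclusion probabilities $\pi_i$ and $w_i=1/\pi_i$. A hedged sketch (the population values and probabilities below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000
X = rng.normal(5.0, 2.0, size=N)    # fixed population values
pi = rng.uniform(0.2, 0.8, size=N)  # known inclusion probabilities
w = 1.0 / pi                        # design weights

T = X.sum()
reps = 2000
estimates = np.empty(reps)
for r in range(reps):
    R = rng.random(N) < pi          # independent (Poisson) sampling indicators
    estimates[r] = np.sum(R * w * X)

# The weighted total is design-unbiased: its mean over reps is close to T.
assert abs(estimates.mean() - T) / abs(T) < 0.01
```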

You could evaluate this for any precisely-specified sampling design, because you know the sampling probabilities.

Now make the approximation that the sampling is independent for different individuals (either $N$ is very large or $n$ isn't fixed and you're just sampling each individual independently). Only the $i=j$ terms remain and you get $$\widehat{\mathrm{var}}[\hat T]= \sum_{i=1}^NR_iw_i^3X_i^2\mathrm{var}[R_i]$$ and approximating the sampling probability $\pi_i$ by $1/w_i$, $$\widehat{\mathrm{var}}[\hat T]= \sum_{i=1}^NR_iw_i^3X_i^2w_i^{-1}(1-w_i^{-1})=\sum_{i\in\textrm{sample}}w_i^2X_i^2(1-w_i^{-1})$$ That's the total. By the same arguments, the denominator of the mean, the estimated $N$, has variance $$\widehat{\mathrm{var}}[\hat N]= \sum_{i\in\textrm{sample}}w_i^2(1-w_i^{-1})$$
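A quick simulation of this plug-in estimator under independent (Poisson) sampling, again with invented population values, suggests that it tracks the true design variance of $\hat T$ on average:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000
X = rng.normal(5.0, 2.0, size=N)
pi = rng.uniform(0.2, 0.8, size=N)
w = 1.0 / pi

reps = 2000
tot_hat = np.empty(reps)
var_hat = np.empty(reps)
for r in range(reps):
    R = rng.random(N) < pi
    ws, xs = w[R], X[R]                        # observed sample only
    tot_hat[r] = np.sum(ws * xs)
    # plug-in variance estimate from the formula above
    var_hat[r] = np.sum(ws**2 * xs**2 * (1 - 1 / ws))

# On average the estimator matches the empirical sampling variance of T-hat.
assert abs(var_hat.mean() / tot_hat.var() - 1) < 0.15
```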

Next, we decide to apply this to $X=Y-\bar Y_w$, and use the (Taylor series) approximation for the variance of a ratio $$\widehat{\mathrm{var}}\left[\bar Y_w \right]= \frac{T^2}{N^2}\left(\frac{\mathrm{var}[\hat T]}{E[\hat T]^2} -2\frac{\mathrm{cov}[\hat T, \hat N]}{E[\hat T]E[\hat N]} + \frac{\mathrm{var}[\hat N]}{E[\hat N]^2} \right)$$
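For completeness, the Taylor (delta-method) step behind this formula: expand $\hat R = \hat T/\hat N$ to first order around $(T, N)$,

$$\hat R \approx \frac{T}{N} + \frac{1}{N}\left(\hat T - T\right) - \frac{T}{N^2}\left(\hat N - N\right)$$

and take the variance of the linear part, using $E[\hat T]=T$ and $E[\hat N]=N$:

$$\mathrm{var}[\hat R] \approx \frac{T^2}{N^2}\left(\frac{\mathrm{var}[\hat T]}{T^2} - 2\frac{\mathrm{cov}[\hat T,\hat N]}{TN} + \frac{\mathrm{var}[\hat N]}{N^2}\right)$$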

At this point we note that the covariance term and the second variance term are of smaller order than the first variance term, and that $\hat T$ is unbiased for $T$, so the whole expression simplifies to $\widehat{\mathrm{var}}[\hat T]/N^2\approx\widehat{\mathrm{var}}[\hat T]/(\sum w_i)^2$.

This doesn't give you quite what you want (we've lost the $n/(n-1)$ and acquired a $(1-w_i^{-1})$), but doing the argument more carefully gives something closer.
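Numerically, the two estimators nearly coincide when the sampling fractions $1/w_i$ are small and $n$ is large, since then $(1-w_i^{-1})\approx 1$ and $n/(n-1)\approx 1$. A quick check with made-up weights and outcomes:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
y = rng.normal(10.0, 3.0, size=n)
w = rng.uniform(50.0, 100.0, size=n)    # large weights: pi_i = 1/w_i is small

ybar_w = np.sum(w * y) / np.sum(w)
resid = y - ybar_w                      # the X = Y - ybar_w centering

# Linearization estimate: var-hat[T-hat] applied to X, divided by (sum w)^2
lin = np.sum(w**2 * resid**2 * (1 - 1 / w)) / np.sum(w) ** 2
# Gatz-Smith formula from the question
gs = n / ((n - 1) * np.sum(w) ** 2) * np.sum(w**2 * resid**2)

# With small sampling fractions and large n, the two agree closely.
assert abs(lin / gs - 1) < 0.05
```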

Thomas Lumley
  • ( for reference, a development of the taylor expansion of the variance of ratios: https://www.stat.cmu.edu/~hseltman/files/ratio.pdf ) – Tal Galili May 25 '21 at 05:29
  • Dear Thomas, what a beautiful answer you wrote, thank you! I read through it carefully, and believe I understood it. (I'll add some followup comments) – Tal Galili May 25 '21 at 05:54
  • Regarding the addition of $(1-1/w_i)$, I guess that since we expect $\pi_i$ to be very small (i.e.: $n \ll N$), this term should be close to 1. – Tal Galili May 25 '21 at 06:07
  • Lastly, are there any references I can use to see more expansions on this? Specifically, I'm wondering what "standard" cases exists in which this formula "breaks" or should be done differently. – Tal Galili May 25 '21 at 06:18
  • 2
    My favourite reference for survey-type formulas is "Model Assisted Survey Sampling" by Särndal, Swensson, and Wretman. They show where the $n/(n-1)$ comes from (it comes from negative correlation in the sampling indicators). – Thomas Lumley May 25 '21 at 23:20
  • after going over this more times, there is one thing I'm not sure about. You wrote: "we decide to apply this to $X=Y-\bar Y_w$". And I don't understand this part. I thought that $X=Y$, and that $\hat T / \hat N$ is the $\bar y_w$. Could you please explain? – Tal Galili Jun 08 '21 at 10:39
  • For reference: in "Model Assisted Survey Sampling" by Särndal, Swensson, and Wretman, this formula is given on page 182 (equation 5.7.4). The delta term is defined on page 36 (2.6.1, 2.6.2), and also page 43 (2.8.4). This is based on result 5.6.12 from page 181, and also on the Taylor linearization from page 178 (results 5.6.5-5.6.7 - a big one). All are based on having the variance come from the indicator random variables. – Tal Galili Jun 24 '21 at 20:28
  • p.s.: also in the book, we don't have the terms $n/(n-1)$, and we do have an extra $(1-1/w_i)$, but I suspect it's also based on having that approach 1 when N is large (although I didn't see it mentioned in the book). – Tal Galili Jun 24 '21 at 20:44
  • Lastly, to answer my own question regarding $X=Y - \bar Y_w$. The answer is that we're taking the Taylor approximation of the ratio around the mean (i.e.: the $a$ in the Taylor series is the weighted mean: https://en.wikipedia.org/wiki/Taylor_series#Definition). This is given in the book on page 173 (5.5.7). The result of this (finding the coefficients, leading to $Y - \bar Y_w$) is given on page 178 (5.6.5). Very cool result... – Tal Galili Jun 24 '21 at 20:52
  • For future reference: I've delved a bit into the archives and organized a bunch of learnings about this topic in Wikipedia. See here: https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Survey_sampling_perspective – Tal Galili Jun 28 '21 at 16:12
0

A bit nonrigorous, but define $z_i = w_i y_i$. Calculate the variance of the total, $\sum_i z_i$, using the usual formula. Then substitute $w_i y_i$ back for $z_i$ and notice that the total of the $z_i$, once divided by $\sum_i w_i$, is the weighted mean. This is the fastest/most intuitive way to get it -- you're just treating $w_i y_i$ as a single random variable and estimating its variance.

(The formula will differ by a normalizing constant $(\sum_i w_i)^2$, which is just there to make sure that the weights in the squared term inside the sum add up to 1.)

The lack of rigor comes from ignoring the variance introduced by dividing out the normalizing constant. To show this is negligible you'd need a Taylor expansion, but that doesn't add intuition. The post above provides more rigor; my goal is just to show why you should expect it to be true.
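A small numeric check of this shortcut (the variable names are mine): with equal weights, the centering $z_i - \bar z$ coincides with $w_i(y_i - \bar y_w)$, so the shortcut reproduces the Gatz-Smith formula exactly; with unequal weights the two centerings differ slightly.

```python
import numpy as np

rng = np.random.default_rng(4)
n, c = 40, 2.5
y = rng.normal(size=n)
w = np.full(n, c)          # equal weights, where the shortcut is exact
z = w * y

# Usual variance-of-a-total estimate for sum(z_i), then normalize
zbar = z.mean()
var_total = n / (n - 1) * np.sum((z - zbar) ** 2)
shortcut = var_total / np.sum(w) ** 2

# Gatz-Smith estimator, for comparison
ybar_w = np.sum(w * y) / np.sum(w)
gs = n / ((n - 1) * np.sum(w) ** 2) * np.sum(w**2 * (y - ybar_w) ** 2)

assert np.isclose(shortcut, gs)
```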