22

Unbiased weighted variance was already addressed here and elsewhere but there still seems to be a surprising amount of confusion. There appears to be a consensus toward the formula presented in the first link as well as in the Wikipedia article. This also looks like the formula used by R, Mathematica, and GSL (but not MATLAB). However, the Wikipedia article also contains the following line which looks like a great sanity check for a weighted variance implementation:

For example, if values {2,2,4,5,5,5} are drawn from the same distribution, then we can treat this set as an unweighted sample, or we can treat it as the weighted sample {2,4,5} with corresponding weights {2,1,3}, and we should get the same results.

My calculations give 2.1667 for the variance of the original values but 2.9545 for the weighted variance. Should I really expect them to be the same? Why or why not?

confusedCoder

1 Answer

20

Yes, you should expect both examples (unweighted vs weighted) to give you the same results.

I have implemented the two algorithms from the Wikipedia article.

This one works:

If all of the $x_i$ are drawn from the same distribution and the integer weights $w_i$ indicate frequency of occurrence in the sample, then the unbiased estimator of the weighted population variance is given by:

$s^2 = \frac{1}{V_1 - 1} \sum_{i=1}^N w_i \left(x_i - \mu^*\right)^2,$

where $V_1 = \sum_{i=1}^N w_i$ is the sum of the weights and $\mu^* = \frac{1}{V_1} \sum_{i=1}^N w_i x_i$ is the weighted mean.
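As a quick sanity check (a minimal Python sketch of my own, using the example values from the question, not code from the article): when the weights are occurrence counts, this estimator gives exactly the plain $n-1$ sample variance, whichever way the sample is encoded.

```python
def weighted_var_freq(x, w):
    # s^2 = 1/(V1 - 1) * sum(w_i * (x_i - mu)^2), with V1 = sum(w_i);
    # valid when the w_i are integer occurrence (repeat) counts.
    V1 = sum(w)
    mu = sum(wi * xi for wi, xi in zip(w, x)) / V1
    return sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x)) / (V1 - 1)

# The same sample encoded two ways gives the same answer:
print(round(weighted_var_freq([2, 2, 4, 5, 5, 5], [1] * 6), 4))  # 2.1667
print(round(weighted_var_freq([2, 4, 5], [2, 1, 3]), 4))         # 2.1667
```

With all weights equal to 1, $V_1 = n$ and the formula reduces to the familiar $1/(n-1)$ correction.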

However, this one (using fractional weights) does not work for me:

If each $x_i$ is drawn from a Gaussian distribution with variance $1/w_i$, the unbiased estimator of a weighted population variance is given by:

$s^2 = \frac{V_1}{V_1^2 - V_2} \sum_{i=1}^N w_i \left(x_i - \mu^*\right)^2,$

where $V_2 = \sum_{i=1}^N w_i^2$.
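For comparison (again a sketch of mine, not part of the article), applying this second estimator to the question's weighted sample reproduces the 2.9545 figure rather than 2.1667, which is exactly the mismatch observed:

```python
def weighted_var_rel(x, w):
    # s^2 = V1 / (V1^2 - V2) * sum(w_i * (x_i - mu)^2),
    # with V1 = sum(w_i) and V2 = sum(w_i^2).
    V1 = sum(w)
    V2 = sum(wi ** 2 for wi in w)
    mu = sum(wi * xi for wi, xi in zip(w, x)) / V1
    return V1 / (V1 ** 2 - V2) * sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x))

print(round(weighted_var_rel([2, 4, 5], [2, 1, 3]), 4))  # 2.9545, not 2.1667
```

Note that with all weights equal to 1 it also reduces to the $1/(n-1)$ correction, since $V_1^2 - V_2 = n^2 - n$; the two estimators diverge only when the weights are unequal.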

I am still investigating the reasons why the second equation does not work as intended.

/EDIT: Found the reason why the second equation did not work as I thought: it applies only to normalized ("probability/reliability") weights, and it is NOT unbiased. If you don't use "occurrences/repeat" weights (which count the number of times each observation was observed and thus how many times it should be repeated in the calculation), you lose the total number of observations, and without that total you cannot apply a correction factor.

So this explains the difference between your weighted and unweighted results: your weighted computation is biased.

Thus, if you want an unbiased weighted variance, use only "occurrences/repeat" weights and the first equation I posted above. If that's not possible, you cannot fully unbias the estimate.
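To illustrate the lost-information point (a hypothetical Python sketch; the variable names are mine): normalized weights always sum to 1, so the total count is gone, unless you stored it separately beforehand, in which case the repeat weights can be recovered.

```python
# Repeat weights and their normalized (frequency) version.
w_repeat = [2, 1, 3]
N = sum(w_repeat)                      # total number of observations: 6
w_freq = [wi / N for wi in w_repeat]   # normalized weights

# From w_freq alone, sum(w_freq) == 1.0 regardless of N: the total is
# gone, so the (V1 - 1) correction factor cannot be computed.
assert abs(sum(w_freq) - 1.0) < 1e-12

# But if N was stored beforehand, the repeat weights are recoverable:
recovered = [round(wi * N) for wi in w_freq]
assert recovered == w_repeat
```

This is the same back-and-forth conversion by $N$ discussed in the comments below.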

For more theoretical details, here is another post about unbiased weighted covariance, with a reference about why we cannot unbias with probability/reliability-type weights and a Python implementation.

/EDIT a few years later: there is still some confusion as to why we cannot unbias probability/reliability weights.

First, to clarify: the difference between probability/reliability weights and repeat/occurrences weights is that probability/reliability weights are normalized, whereas repeat/occurrences weights are not, so you can recover the total number of occurrences by summing the latter but not the former. This total is necessary for unbiasing, because without it you lose what I would call the statistical magnitude, which others call polarization.

Indeed, it's like anything else in statistics: if I say that 10% of my subpopulation has disease X, what does that mean for the broader population? It depends on how large my subpopulation is: if it's only 100 people, the 10% figure doesn't mean much, but if it's 1 million people, it may faithfully represent the whole population. It's the same here: if we don't know the total N, we can't know how representative our metric is of the whole population, and hence we cannot unbias. Unbiasing is exactly the process of generalizing to the broader population.

gaborous
  • After reading and thinking a lot through this I still don't get an intuitive meaning or example of the term "reliability weights". Can you please elaborate a bit on that? – Peter Aug 31 '17 at 13:58
  • @Peter reliability weights are normalized weights, e.g., bounded between 0 and 1 or -1 and 1. They represent a frequency (e.g., 0.1 means that this sample was seen 10% of the time compared to all other samples). I did not invent the term; it can be found in publications. Repeat weights are the opposite: each weight represents the number of occurrences, the cardinality (e.g., 10 if the sample was observed 10 times). – gaborous Sep 01 '17 at 14:06
  • This is confusing because what you call repeat weights is often also called _frequency weights_, but I think I got the difference. It depends on normalization, right? – Peter Sep 03 '17 at 09:33
  • No, frequency weights are an alternative name for reliability weights. Repeat weights give the number of occurrences, not the frequency. With repeat weights there is no normalization at all; that's the point: as soon as you normalize your weights, you lose the base frequency, so you cannot totally unbias your calculations. The only way is to keep the total number of occurrences. If you really want to use frequency weights, I think that if you store the total number of occurrences N beforehand, you can convert back and forth to repeat weights by multiplying the frequency weights by N, and then that's OK. – gaborous Sep 06 '17 at 18:45
  • And if your weights are 1/variance weights, how would you call those? Would that be "reliability weights" then? – Tom Wenseleers Jun 18 '19 at 11:10
  • @TomWenseleers yes reliability weights because you lose the ability to do sum(weights) to get the unbiased N (total of all weights) value, which is necessary to do any *really* unbiased calculation. – gaborous Aug 11 '19 at 01:19
  • A few years later: so yes, the difference between frequency/reliability weights and repeat/occurrences weights is that frequency/reliability weights are normalized, whereas repeat/occurrences weights are not, so you can get the total number of occurrences by just summing the latter but not the former. This is necessary to unbias, because otherwise you lose the ability to know what I would call the statistical magnitude, which others call [polarization](https://stats.stackexchange.com/questions/61225/correct-equation-for-weighted-unbiased-sample-covariance/61298). – gaborous Jan 31 '21 at 05:00
  • Any references for these terms? Because it sounds like it should be reliability weights vs frequency/repeat/occurrences weights. – Tal Galili Jun 06 '21 at 10:12
  • @TalGalili It's been a very long time, so unfortunately I don't have the references around. But for sure it's not your typology: frequency weights represent a fraction/probability, while occurrences/repeat weights represent a count. "Reliability" is debatable, but I clearly remember it being used as a synonym for frequency weights, though maybe the paper(s) were mistaken. Anyway, as long as you differentiate these two types of weights and unbias them accordingly, it doesn't matter much what you call them IMHO. – gaborous Sep 18 '21 at 12:49
  • I agree with @TalGalili that frequency weights and repeat/case weights are the same thing in the most common terminology. See for example https://stats.oarc.ucla.edu/other/mult-pkg/faq/what-types-of-weights-do-sas-stata-and-spss-support/, https://www.stata.com/help.cgi?weight, and https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Frequency_weights. – Milan Bouchet-Valat Jan 20 '22 at 21:44
  • @MilanBouchet-Valat In the link you provided, they use "probability weights" for weights where the total count is lost, versus "frequency weights" for what I called "repeat" weights above. What the weights are called does not matter much IMHO; feel free to use the terminology you want. I used the terminology I read in a paper at the time (I think in machine learning, hence "reliability"), and I am not clinging to any particular terminology: my point (and the issue I had at the time) was how to completely unbias weights, and I found out that the type matters. – gaborous Jan 28 '22 at 00:03