Determine weights from selection probabilities in a "with replacement" sampling scheme

Question

Possible Duplicate:
Derive househould weights from a uniformly distributed person sample

EDIT: Essentially, I have answered the question myself in the linked question: Derive househould weights from a uniformly distributed person sample. I am keeping the text below for reference.

For a finite population $U$, I know the probability $\pi_{i,k}$ of sampling case $i$ exactly $k$ times, where $0 \le k \le m_i$. In my case,

$$\pi_{i,k} = \binom{m_i}{k} p^k(1-p)^{m_i - k}$$

for some $p$. I also have a sample of size $n$ (taken with replacement), but the total size $N$ of the population is unknown.
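
Writing $K_i$ for the number of times case $i$ appears in the sample (the symbol $K_i$ is notation introduced here for illustration), two quantities follow directly from this binomial form, assuming $m_i \ge 1$ and $0 < p \le 1$:

$$\Pr(K_i \ge 1) = 1 - \pi_{i,0} = 1 - (1-p)^{m_i}, \qquad \mathrm{E}[K_i] = \sum_{k=0}^{m_i} k\,\pi_{i,k} = m_i\,p.$$

The first is the probability that case $i$ shows up at all, which is what matters once duplicates are removed; the second is the expected number of occurrences, which is what matters if duplicates are kept.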

I am trying to derive a reweighting $w_i$ of the sample that will give unbiased and/or minimum variance estimates of totals for, say, an attribute $x_i$. (Estimating $N$ is a special case.) What would be the correct weights for my sample? Is it advisable to remove the duplicates, given that I can reliably identify them? EDIT: What if duplicates cannot be reliably identified?

My initial guess is $w_i = 1/(1 - \pi_{i,0})$, but I don't see how to justify it mathematically. I'd also appreciate pointers to the literature on this topic. (Perhaps this very problem has been treated in a previous paper?)
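
A guess like this can at least be checked numerically. The sketch below is only an illustration under assumptions that are not part of the problem statement: the selection counts are drawn independently across cases, every $m_i \ge 1$, and the toy population (`m`, `x`, `p`) and the function name `simulate_estimators` are made up. It compares two candidate weights: $1/(1-\pi_{i,0})$ applied once per distinct case (duplicates removed), and $1/(m_i p)$ applied to every occurrence (duplicates kept).

```python
import random


def simulate_estimators(m, x, p, n_reps=4000, seed=0):
    """Monte Carlo sketch: estimate a population total under two weightings.

    m[i] is the maximum number of times case i can be drawn, x[i] is its
    attribute, and p is the binomial parameter. Requires every m[i] >= 1.
    """
    rng = random.Random(seed)
    # Duplicates removed: weight = 1 / P(case appears at least once).
    w_distinct = [1.0 / (1.0 - (1.0 - p) ** mi) for mi in m]
    # Duplicates kept: weight per occurrence = 1 / E[number of occurrences].
    w_draws = [1.0 / (mi * p) for mi in m]

    est_distinct, est_draws = [], []
    for _ in range(n_reps):
        t_distinct = 0.0
        t_draws = 0.0
        for mi, xi, wd, wk in zip(m, x, w_distinct, w_draws):
            # Binomial(mi, p) count as a sum of mi Bernoulli(p) trials
            # (independence across cases is an assumption of this sketch).
            k = sum(rng.random() < p for _ in range(mi))
            if k > 0:
                t_distinct += wd * xi
            t_draws += k * wk * xi
        est_distinct.append(t_distinct)
        est_draws.append(t_draws)
    return est_distinct, est_draws


# Hypothetical toy population: 50 cases with m_i cycling through 1..5.
m = [1, 2, 3, 4, 5] * 10
x = [10.0 + 2.0 * mi for mi in m]
est_distinct, est_draws = simulate_estimators(m, x, p=0.1)

print("true total:", sum(x))
print("mean estimate, duplicates removed:", sum(est_distinct) / len(est_distinct))
print("mean estimate, duplicates kept:", sum(est_draws) / len(est_draws))
```

Both averages should land close to the true total, because each estimator is unbiased by linearity of expectation (this part does not rely on independence, only on the marginal probabilities $\pi_{i,k}$). Setting every `x[i]` to 1 turns either one into an estimate of $N$. The simulation says nothing about which weighting has the smaller variance in general.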

See Derive househould weights from a uniformly distributed person sample for a more practical description of the problem. The question asked here is different.

Perhaps Calculating % unsampled in sampling with replacement is related.

You should start with a good sampling book: Cochran; Kish; Hansen, Hurwitz & Madow if you can get it; Chaudhuri & Stenger. None of them would address your question directly, as it is kinda obvious to a sampling statistician, but you will get a feeling of how these sampling and inverse probability weighting procedures work. — StasK, Apr 12 '12 at 20:24
Thank you for the references, I'll check if our library has them. -- If it's obvious to you, would you consider providing a hint? Also, [this estimation question](http://stats.stackexchange.com/q/9240/6432) seems to address the same problem, can you confirm this? In that case, the question should be closed as dupe. — krlmlr, Apr 12 '12 at 20:33
Households are different units; if you want to analyse the data at HH level, you have to redefine your units, which, at the data set level, would appear like getting rid of the duplicates. However, the individual level characteristics like age or gender don't make sense for the household, so you will not only be deleting rows, but you will also be deleting columns, for the data set to make any sense. Inverse probability weighting is known as Horvitz-Thompson estimator, and that would be one of the first mathematical concepts you will encounter in these books. — StasK, Apr 12 '12 at 20:47
Right. I delete rows and ignore columns. (Do I have to delete the duplicates? What if I cannot?) Horvitz-Thompson can be used to estimate the mean, but how about the population total? -- This I have learned "the hard way". How can I help you help me more? :-) — krlmlr, Apr 12 '12 at 21:16
Many thanks for providing the references. I was able to find the answer to my problem in Cochran (1972), Section 11.9. Also, I no longer think that this question is different from the linked one; I am going to update [my answer there](http://stats.stackexchange.com/a/26340/6432). — krlmlr, Apr 13 '12 at 11:06
Horvitz-Thompson is the estimator of the total. The mean is estimated as a ratio of two estimators of the total: the numerator is the variable of interest, and the denominator is 1. The total in the denominator may be known, as in your case with knowledge of the total # of people in the register, or unknown, as in your case of counting the households. Please close this question or post your own answer -- I think you get a badge of some kind for answering your own question. — StasK, Apr 13 '12 at 13:15
@StasK: I can only delete the question, not close it. Could you close it but keep it for reference, please? — krlmlr, Apr 13 '12 at 13:50
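
For reference, the estimators described in the comments above can be written out as follows. This is a sketch in notation that is not from the thread: $s$ denotes the set of distinct sampled units and $\pi_i$ the probability that unit $i$ is included at least once (with the binomial form in the question, that would be $\pi_i = 1 - \pi_{i,0}$).

$$\hat T_x = \sum_{i \in s} \frac{x_i}{\pi_i}, \qquad \hat N = \hat T_1 = \sum_{i \in s} \frac{1}{\pi_i}, \qquad \hat{\bar x} = \frac{\hat T_x}{\hat N}.$$

$\hat T_x$ is the Horvitz-Thompson estimator of the total and is unbiased whenever every $\pi_i > 0$. If $N$ is known, it replaces $\hat N$ in the denominator; if it is estimated, the ratio $\hat{\bar x}$ is in general only approximately unbiased.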
