
I'm trying to find an elementary derivation (proof was the wrong word) of the Horvitz-Thompson estimator: $$ \hat{Y}=\sum_{i\in s}\frac{y_{i}}{\pi_{i}} $$ where $i \in s$ if and only if unit $i$ is included in the sample $s$, and $\pi_{i}$ is the probability that unit $i$ is included in the sample.

My apologies if I've messed up the definitions; I'm rather new to Sampling theory.

Can someone point me in the right direction? Many thanks.

SubOptimal
  • Have you checked the original paper? – SmallChess Feb 10 '17 at 08:10
  • What do you want to "prove"? It's obviously an estimator. Do you want to show it's unbiased as an estimator of the mean? Do you want to compute its moments? Something else? – whuber Feb 10 '17 at 14:03
  • Mary Thompson's book? Sharon Lohr's book? Sarndal, Swensson and Wretman? – StasK Feb 10 '17 at 22:51
  • @StudentT I've been wrestling with the paper for a while now. I'm not sure what some of the steps mean. – SubOptimal Feb 13 '17 at 00:05
  • It was a poorly put question, @whuber. I've amended it, requesting a derivation rather than a proof. – SubOptimal Feb 13 '17 at 00:06
  • @StudentT Your request is counter to how this site works. Because this question has not been answered, the best procedure is to edit it rather than start a new one. SubOptimal, thank you for the edits: your request is now clear. – whuber Feb 13 '17 at 00:18
  • @whuber ok sorry my mistake. Please edit and we will help. – SmallChess Feb 13 '17 at 00:19
  • Thank you for your suggestions, @StasK. Unfortunately, neither Sarndal et al. nor Thompson seems to have a derivation. I'm waiting for a copy of Lohr. – SubOptimal Feb 13 '17 at 00:51

1 Answer


Let $\pi_{i}$ be the probability that unit $U_{i}$ is included in a sample of size $n$ drawn by a without-replacement sampling procedure. Now define a random variable $t_{i}$, for $i=1,2,\ldots,N$, by \begin{equation} t_{i}=\begin{cases} 1, & U_{i}\in s\\ 0, & \text{ otherwise. } \end{cases} \end{equation} Since a without-replacement sampling procedure yields $n$ distinct units, it is clear that \begin{equation} \sum_{i=1}^{N}t_{i}=n, \end{equation} and, because $t_{i}$ is a Bernoulli indicator, \begin{equation} E(t_{i})=P(U_{i}\in s)=\pi_{i}. \end{equation}
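As a quick sanity check (my addition, not part of the derivation itself), both identities can be verified by simulation for the special case of simple random sampling without replacement, where every unit has $\pi_{i}=n/N$:

```python
import random

# Illustrative design: simple random sampling without replacement,
# for which every inclusion probability is pi_i = n / N.
N, n = 10, 4
reps = 200_000
random.seed(0)

counts = [0] * N  # how many times each unit enters the sample
for _ in range(reps):
    s = random.sample(range(N), n)  # n distinct units, so sum_i t_i = n
    for i in s:
        counts[i] += 1

# Empirical E(t_i) for each unit; all should be close to n/N = 0.4
est = [c / reps for c in counts]
print(est)
```

The empirical inclusion frequencies settle near $n/N$ for every unit, matching $E(t_{i})=\pi_{i}$.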

A general linear function of the sample values can be written as \begin{eqnarray} T &=& \sum_{i=1}^{n}c_{i}y_{i},\qquad \text{ or }\\ T &=& \sum_{i=1}^{N}t_{i}c_{i}Y_{i}, \end{eqnarray} where $c_{i}$ is a constant attached to the unit $U_{i}$ whenever it is selected into the sample. Taking the expectation of $T$ and using $E(t_{i})=\pi_{i}$, we get \begin{equation} E(T)= \sum_{i=1}^{N}\pi_{i}c_{i}Y_{i}. \end{equation} For $T$ to be an unbiased estimator of the population total $\sum_{i=1}^{N}Y_{i}$, the constant $c_{i}$ must equal $1/\pi_{i}$. Thus an unbiased estimator for the population total, as suggested by Horvitz and Thompson, is given by \begin{eqnarray} \hat{Y}_{HT} &=& \sum_{i=1}^{N}t_{i}\left(\dfrac{Y_{i}}{\pi_{i}}\right)\qquad \text{ or }\\ \hat{Y}_{HT} &=& \sum_{i=1}^{n}\dfrac{y_{i}}{\pi_{i}}. \end{eqnarray}
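To see the unbiasedness numerically, here is a small simulation sketch (the $Y_{i}$ and $\pi_{i}$ values are made up for illustration). It uses Poisson sampling, where each unit is included independently with probability $\pi_{i}$; this is not the fixed-size design assumed in the derivation, but the inclusion probabilities are known exactly and the same expectation argument applies:

```python
import random

# Check E(Y_hat_HT) = sum_i Y_i by simulation under Poisson sampling:
# each unit is included independently with its own probability pi_i.
random.seed(1)
Y  = [12.0, 3.5, 8.0, 20.0, 1.0, 7.5]   # made-up population values
pi = [0.2, 0.5, 0.3, 0.8, 0.4, 0.6]     # made-up inclusion probabilities
total = sum(Y)                          # true population total = 52.0

reps = 200_000
acc = 0.0
for _ in range(reps):
    # Horvitz-Thompson estimate for one realized sample
    ht = sum(y / p for y, p in zip(Y, pi) if random.random() < p)
    acc += ht

print(acc / reps, total)  # the long-run average approaches sum(Y)
```

Averaged over many replications, $\hat{Y}_{HT}$ converges to the true total, exactly as $E(\hat{Y}_{HT})=\sum_{i=1}^{N}Y_{i}$ predicts.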

L.V.Rao