
I'm trying to find an elementary derivation (proof was the wrong word) of the Horvitz-Thompson estimator: $$ \hat{Y}=\sum_{i\in s}\frac{y_{i}}{\pi_{i}} $$ where $i \in s$ if and only if unit $i$ is included in the sample $s$, and $\pi_{i}$ is the probability that unit $i$ is included in the sample.

My apologies if I've messed up the definitions; I'm rather new to Sampling theory.

Can someone point me in the right direction? Many thanks.

SubOptimal
  • Have you checked the original paper? – SmallChess Feb 10 '17 at 08:10
  • What do you want to "prove"? It's obviously an estimator. Do you want to show it's unbiased as an estimator of the mean? Do you want to compute its moments? Something else? – whuber Feb 10 '17 at 14:03
  • Mary Thompson's book? Sharon Lohr's book? Sarndal, Swensson and Wretman? – StasK Feb 10 '17 at 22:51
  • @StudentT I've been wrestling with the paper for a while now. I'm not sure what some of the steps mean. – SubOptimal Feb 13 '17 at 00:05
  • It was a poorly put question, @whuber. I've amended it, requesting a derivation rather than a proof. – SubOptimal Feb 13 '17 at 00:06
  • @StudentT Your request is counter to how this site works. Because this question has not been answered, the best procedure is to edit it rather than start a new one. SubOptimal, thank you for the edits: your request is now clear. – whuber Feb 13 '17 at 00:18
  • @whuber ok sorry my mistake. Please edit and we will help. – SmallChess Feb 13 '17 at 00:19
  • Thank you for your suggestions, @StasK. Unfortunately, neither Sarndal et al. nor Thompson seems to have a derivation. I'm waiting for a copy of Lohr. – SubOptimal Feb 13 '17 at 00:51

1 Answer


Let $\pi_{i}$ be the probability that unit $U_{i}$ is included in a sample of size $n$ drawn by a without-replacement sampling procedure. Now define a random variable $t_{i}$, for $i=1,2,\ldots,N$, by \begin{equation} t_{i}=\begin{cases} 1, & U_{i}\in s\\ 0, & \text{ otherwise. } \end{cases} \end{equation} Since a without-replacement sampling procedure yields $n$ distinct units, it is clear that \begin{equation} \sum_{i=1}^{N}t_{i}=n, \end{equation} and, because $t_{i}$ is a Bernoulli indicator, \begin{equation} E(t_{i})=P(U_{i}\in s)=\pi_{i}. \end{equation}
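As a quick sanity check (my addition, not part of the derivation itself), both identities can be verified by simulation for the special case of simple random sampling without replacement, where every unit has $\pi_{i}=n/N$:

```python
import random

# Illustrative design: simple random sampling without replacement,
# for which every inclusion probability is pi_i = n / N.
N, n = 10, 4
reps = 200_000
random.seed(0)

counts = [0] * N  # how many times each unit enters the sample
for _ in range(reps):
    s = random.sample(range(N), n)  # n distinct units, so sum_i t_i = n
    for i in s:
        counts[i] += 1

# Empirical E(t_i) for each unit; all should be close to n/N = 0.4
est = [c / reps for c in counts]
print(est)
```

The empirical inclusion frequencies settle near $n/N$ for every unit, matching $E(t_{i})=\pi_{i}$.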

A general linear function of the sample values can be written as \begin{eqnarray} T &=& \sum_{i=1}^{n}c_{i}y_{i},\qquad \text{ or }\\ T &=& \sum_{i=1}^{N}t_{i}c_{i}Y_{i}, \end{eqnarray} where $c_{i}$ is a constant attached to the unit $U_{i}$ whenever it is selected into the sample. Taking the expectation of $T$ and using $E(t_{i})=\pi_{i}$, we get \begin{equation} E(T)= \sum_{i=1}^{N}\pi_{i}c_{i}Y_{i}. \end{equation} For $T$ to be an unbiased estimator of the population total $\sum_{i=1}^{N}Y_{i}$, the constant $c_{i}$ must equal $1/\pi_{i}$. Thus an unbiased estimator for the population total, as suggested by Horvitz and Thompson, is given by \begin{eqnarray} \hat{Y}_{HT} &=& \sum_{i=1}^{N}t_{i}\left(\dfrac{Y_{i}}{\pi_{i}}\right)\qquad \text{ or }\\ \hat{Y}_{HT} &=& \sum_{i=1}^{n}\dfrac{y_{i}}{\pi_{i}}. \end{eqnarray}
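To see the unbiasedness numerically, here is a small simulation sketch (the $Y_{i}$ and $\pi_{i}$ values are made up for illustration). It uses Poisson sampling, where each unit is included independently with probability $\pi_{i}$; this is not the fixed-size design assumed in the derivation, but the inclusion probabilities are known exactly and the same expectation argument applies:

```python
import random

# Check E(Y_hat_HT) = sum_i Y_i by simulation under Poisson sampling:
# each unit is included independently with its own probability pi_i.
random.seed(1)
Y  = [12.0, 3.5, 8.0, 20.0, 1.0, 7.5]   # made-up population values
pi = [0.2, 0.5, 0.3, 0.8, 0.4, 0.6]     # made-up inclusion probabilities
total = sum(Y)                          # true population total = 52.0

reps = 200_000
acc = 0.0
for _ in range(reps):
    # Horvitz-Thompson estimate for one realized sample
    ht = sum(y / p for y, p in zip(Y, pi) if random.random() < p)
    acc += ht

print(acc / reps, total)  # the long-run average approaches sum(Y)
```

Averaged over many replications, $\hat{Y}_{HT}$ converges to the true total, exactly as $E(\hat{Y}_{HT})=\sum_{i=1}^{N}Y_{i}$ predicts.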

L.V.Rao