
Problem:

Assume that we want to estimate $f(\theta)$, where $f$ is a pre-specified strictly increasing function and $\theta$ is an unknown parameter.

Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be unbiased estimators of $\theta$. My question is which of the following is the better estimator of $f(\theta)$:

  1. $f\left(\frac{\hat{\theta}_1 + \hat{\theta}_2}{2}\right)$, or
  2. $\frac{f(\hat{\theta}_1) + f(\hat{\theta}_2)}{2}$.

My approach:

My approach is to take the expectation of each candidate, using a Taylor expansion of $f$ around $\theta$, provided these expansions are well defined.

  1. $$\mathbb{E}f\left(\frac{\hat{\theta}_1 + \hat{\theta}_2}{2}\right) = f(\theta) + \frac{f''(\theta)}{2}\text{Var}\left(\frac{\hat{\theta}_1 + \hat{\theta}_2}{2}\right) + \ldots $$

  2. $$\mathbb{E} \left[\frac{f(\hat{\theta}_1) + f(\hat{\theta}_2)}{2} \right] = f(\theta) + \frac{f''(\theta)}{2}\left(\frac{\text{Var}(\hat{\theta}_1) + \text{Var}(\hat{\theta}_2)}{2}\right) + \ldots $$

Here the first-order term vanishes in each expectation because $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased.

Let's further assume $f''(\theta)>0$. Then, since $$0 \le \text{Var}\left(\frac{\hat{\theta}_1 + \hat{\theta}_2}{2}\right) \le \frac{\text{Var}(\hat{\theta}_1) + \text{Var}(\hat{\theta}_2)}{2}, $$ the first estimator (i.e. a function of the average) has the smaller second-order bias term, and so seems less distant from $f(\theta)$, as long as the remaining terms (of order $\ge 3$) are ignored. The last inequality comes from a simple observation: $$ \frac{\text{Var}(\hat{\theta}_1) + \text{Var}(\hat{\theta}_2)}{2} - \text{Var}\left(\frac{\hat{\theta}_1 + \hat{\theta}_2}{2}\right) = \dfrac{1}{4}\left( \text{Var}(\hat{\theta}_1) + \text{Var}(\hat{\theta}_2) - 2\,\text{Cov}\left(\hat{\theta}_1, \hat{\theta}_2\right) \right) = \dfrac{\text{Var}(\hat{\theta}_1 - \hat{\theta}_2)}{4} \ge 0. $$
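As a quick numerical sanity check (my own sketch, not part of the original question): take $f(x)=x^2$, so $f''>0$ and the Taylor expansion is exact, and build two correlated unbiased estimators from bivariate normal noise. The variances, covariance, and `n_sims` below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 2.0          # true parameter
f = lambda x: x**2   # convex f, so f''(theta) = 2 > 0

# Two correlated unbiased estimators: theta_hat_i = theta + noise_i
n_sims = 1_000_000
cov = np.array([[1.0, 0.3],   # Var(theta_hat_1) = 1, Cov = 0.3
                [0.3, 2.0]])  # Var(theta_hat_2) = 2
noise = rng.multivariate_normal([0.0, 0.0], cov, size=n_sims)
th1, th2 = theta + noise[:, 0], theta + noise[:, 1]

est1 = f((th1 + th2) / 2)      # f of the average
est2 = (f(th1) + f(th2)) / 2   # average of f

print("bias of f(average):  ", est1.mean() - f(theta))
print("bias of average of f:", est2.mean() - f(theta))
# Second-order prediction: bias = (f''/2) * Var = Var here, so
# Var((th1+th2)/2) = (1 + 2 + 2*0.3)/4 = 0.9 for the first estimator,
# vs (Var1 + Var2)/2 = 1.5 for the second.
```

Both empirical biases should match the second-order predictions (0.9 vs 1.5), with the function-of-the-average estimator less biased, consistent with the variance inequality above.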

Question:

I wonder whether this reasoning is sound, and whether there is another way to justify which estimator is better.

Edit:

Let's restrict the class of $f$ to strictly increasing functions.

inmybrain
  • An example would help, e.g.: Suppose someone gives you the mean $\theta_1$, median $\theta_2$, and size $n$ of a sample drawn from a normal parent $N(\mu,\sigma)$. What are your best estimates for $\mu$ and $\mu^2$? It should be possible to work this out using the joint distribution at https://math.stackexchange.com/questions/477115/asymptotic-correlation-between-sample-mean-and-sample-median -- note the answers may well be weighted averages and the covariance may be significant. – Matt F. Mar 02 '20 at 14:51
  • Where do you obtain the last inequality about variances? It's not generally true. Counterexamples include the cases $\hat\theta_2 = -\hat\theta_1$ (for which the right hand side is zero) and when the $\hat\theta_i$ are independent (for which the right hand side is half the left hand side). – whuber Mar 02 '20 at 14:54
  • @whuber Oops, sorry for that. I used the opposite inequality. Now it is edited. – inmybrain Mar 02 '20 at 19:43
  • Bear in mind that if the function represents something other than a linear transformation of the inputs, the two will yield different answers to different questions. To get a sense of that, it may help to read my answer here: [Difference between generalized linear models & generalized linear mixed models](https://stats.stackexchange.com/a/32421/7290). – gung - Reinstate Monica Mar 02 '20 at 19:57
  • @gung-ReinstateMonica Thanks! Then, let's keep our focus on the strictly increasing $f$ (see my edit). Does this make sense to you? – inmybrain Mar 02 '20 at 20:19
  • Why would unbiasedness matter when transforms do not preserve unbiasedness? – Xi'an Mar 02 '20 at 21:03
  • @Xi'an Because only $\theta$ can be estimated consistently. It is implicitly assumed that an unbiased estimator for $f(\theta)$ cannot be found directly. – inmybrain Mar 02 '20 at 21:11
  • @MattF. Thanks for your comments and the example. It seems more appropriate to say I am more interested in estimating $\mu^2$ using two (possibly correlated) unbiased estimators of $\mu$. – inmybrain Mar 04 '20 at 01:43

1 Answer


Suppose $\theta_1$ and $\theta_2$ are unbiased estimators of $\mu$ with a bivariate normal distribution.

Let $\sigma_1 = k \sigma_2$, with $k<1$, so $\theta_1$ is the more efficient estimator.

Let the correlation be $r$, with $-1<r<1$.

We seek the least-variance estimates for $\mu$ (linear in the $\theta_i$) and for $\mu^2$ (quadratic in the $\theta_i$). The results are:

\begin{align} \min_{a,b}Var[\mu-(a\theta_1+b\theta_2)] \text{ has }&\mu \sim \frac{(1-kr)\theta_1+(k^2-kr)\theta_2}{1-2kr+k^2}\\ \min_{c,d,e}Var[\mu^2-(c\theta_1^2+d\theta_1\theta_2+e\theta_2^2)] \text { has }&\mu^2 \sim \left(\frac{(1-kr)\theta_1+(k^2-kr)\theta_2}{1-2kr+k^2}\right)^{\!2} \end{align}
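A sketch of where the first line comes from (this derivation is my addition; it assumes the minimization is taken over unbiased combinations, i.e. $b = 1-a$, which is the natural reading since an unconstrained minimum would be trivial): $$Var\big[a\theta_1+(1-a)\theta_2\big] = a^2\sigma_1^2+(1-a)^2\sigma_2^2+2a(1-a)\,r\,\sigma_1\sigma_2 .$$ Setting the derivative with respect to $a$ to zero gives $$a^* = \frac{\sigma_2^2 - r\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2-2r\sigma_1\sigma_2} = \frac{1-kr}{1-2kr+k^2}$$ after substituting $\sigma_1 = k\sigma_2$, which is the weight on $\theta_1$ above.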

So for functions $f$ that are quadratic or well approximated by quadratics, if $\frac{\theta_1+\theta_2}{2}$ is the best linear estimate for $\mu$, then $f\left(\frac{\theta_1+\theta_2}{2}\right)$ is the best linear estimate for $f(\mu)$. In general an appropriate weighted average inside $f$ works even better.

Example: $\theta_1$ is the mean of a sample from a normal parent with mean $\mu$, and $\theta_2$ is the median from the same sample. Then $k=r=\sqrt{2/\pi}$, and the above formula shows that the best estimate for $\mu$ is $\theta_1$, and the best estimate for $\mu^2$ is $\theta_1^2$, i.e. ignoring the median.

Example: $\theta_1$ is the mean of a sample from a normal parent with mean $\mu$, and $\theta_2$ is the median of a different sample of the same size. Then $k=\sqrt{2/\pi}$, $r=0$, so the best estimate for $\mu$ is $(\pi\theta_1+2\theta_2)/(\pi+2)$, or $0.611\,\theta_1 + 0.389\,\theta_2$, and the best estimate for $\mu^2$ is the square of that.

Example: $\theta_1$ is the mean of a sample from a normal parent with mean $\mu$, and $\theta_2$ is the mean of a different sample of the same size. Then $k=1$, $r=0$, so the best estimate for $\mu$ is $(\theta_1+\theta_2)/2$, and the best estimate for $\mu^2$ is $\left(\frac{\theta_1+\theta_2}2\right)^{\!2}$.
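A quick Monte Carlo sketch of the first example (my addition; the sample size `n` and replication count `reps` are arbitrary): it estimates $k$ and $r$ for the mean and median of the same normal sample and plugs them into the weight formula above.

```python
import numpy as np

rng = np.random.default_rng(1)

mu, sigma, n, reps = 0.0, 1.0, 101, 100_000
x = rng.normal(mu, sigma, size=(reps, n))
means = x.mean(axis=1)
medians = np.median(x, axis=1)

k = means.std() / medians.std()            # sigma_1 = k * sigma_2
r = np.corrcoef(means, medians)[0, 1]      # correlation between the estimators
w1 = (1 - k * r) / (1 - 2 * k * r + k**2)  # weight on theta_1 from the answer

print(f"k ~ {k:.3f}, asymptotic sqrt(2/pi) ~ {np.sqrt(2 / np.pi):.3f}")
print(f"r ~ {r:.3f}")
print(f"weight on mean ~ {w1:.3f}  (theory: 1, i.e. ignore the median)")
```

Since $k$ and $r$ are both close to $\sqrt{2/\pi}$ at moderate $n$, the numerator and denominator of the weight formula nearly coincide and the estimated weight on the mean is close to 1, matching the claim that the median should be ignored.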

Matt F.
  • Thanks for your answer! I was more curious about general conditions of $f$ or the estimators, but this also helps. – inmybrain Mar 04 '20 at 14:10
  • What functions $f$ do you have in mind that aren't well approximated by quadratics, or what estimators do you have in mind that aren't distributed normally? – Matt F. Mar 04 '20 at 14:59
  • @inmybrain, if this is worthy of thanks, it is also worth an upvote, or a clearer specification of what situations you care about that it doesn't cover? – Matt F. Mar 05 '20 at 17:28