
I wouldn't be surprised if this question has already been asked, as it sounds like a standard bookwork result. However, I'm not sure I know the language to describe it, and when I type in the title no similar questions come up. Feel free to delete it if you can point me to a duplicate.

Suppose I have a random variable $Y \sim \psi(w)$ drawn from some arbitrary distribution $\psi$ parameterised by a parameter $w$, and all I know is the conditional expectation $E(Y \mid w)$, which we can write as a function $f(w)$. $f$ is not linear.

If I have a sample of $N$ realised variates of $Y$, $\{y_i\}_{i=1}^{N}$, I can see two ways to estimate $w$ (a runnable sketch of both follows the list).

  1. Take the mean of $y_i$ and invert that: $$\hat{w} = f^{-1}\left( \frac{1}{N} \sum_{i=1}^N y_i \right)$$
  2. Apply $f^{-1}$ to each $y_i$ and take the mean: $$\tilde{w} = \frac{1}{N} \sum_{i=1}^N f^{-1}\left( y_i \right)$$
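For concreteness, here is a small simulation of the two estimators. The specific choices ($f(w) = e^w$, with $Y \mid w$ exponential so that $E(Y \mid w) = f(w)$) are mine, just to make the comparison runnable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Concrete toy setup (my choice, just for illustration):
# f(w) = exp(w), and Y | w is exponential with mean f(w),
# so E(Y | w) = f(w) and f^{-1}(y) = log(y).
w_true = 1.0
f, f_inv = np.exp, np.log

N = 1_000
y = rng.exponential(scale=f(w_true), size=N)

# Estimator 1: invert the sample mean.
w_hat = f_inv(y.mean())

# Estimator 2: average the inverted observations.
w_tilde = f_inv(y).mean()

print(w_hat, w_tilde)  # w_hat ~ 1.0; w_tilde sits visibly below it
```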

Which of the two is correct? They can't be the same in general, because $f$ (and hence $f^{-1}$) is nonlinear; if $f^{-1}$ is convex or concave, Jensen's inequality even gives the direction of the difference (correct me if I'm wrong).

My first guess would be 1, because for large $N$, $\frac{1}{N} \sum_{i=1}^N y_i \approx E(Y \mid w)$ and $f^{-1}(E(Y \mid w)) = w$, so by continuity of $f^{-1}$ the estimator $\hat{w}$ should converge to $w$. But I'm not sure whether that's a valid argument, or whether there is a similar argument for 2. Is there some general result for the expectation of the second estimator?
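If I try a heuristic second-order Taylor expansion of $f^{-1}$ around $\mu = E(Y \mid w) = f(w)$ (assuming $f^{-1}$ is twice differentiable and $\operatorname{Var}(Y \mid w)$ is finite), I get

$$E\left[f^{-1}(Y)\right] \approx w + \frac{1}{2}\left(f^{-1}\right)''\!\big(f(w)\big)\,\operatorname{Var}(Y \mid w),$$

which suggests each term of estimator 2 carries a bias that does not shrink with $N$, while applying the same expansion to $\hat{w} = f^{-1}(\bar{y})$ involves $\operatorname{Var}(\bar{y}) = \operatorname{Var}(Y \mid w)/N$, so its bias should vanish as $N \to \infty$. But I'd appreciate confirmation that this is the right way to think about it.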

Marses
  • It's not about being correct. If you're a Frequentist, then it's about which long term properties are desirable for you. – JimB Jul 21 '21 at 20:24
  • In that case what are the pros and cons of each? Plus wouldn't a frequentist just say that correct = unbiased? – Marses Jul 22 '21 at 11:44
  • I hope no Frequentist would say that. You've asked a question that would require much more space than available here. Unbiasedness is a reasonable (not "correct" or "best") property but not completely at the expense of the precision of the estimator. – JimB Jul 22 '21 at 15:34
  • Approach 1 makes sense. It is the substitution estimator for $w$. If $f^{-1}$ is differentiable then consistency and the limit distribution follow from the delta method. I don't think approach 2 makes any sense at all. $Y = E[Y|w] + \epsilon$ has mean-zero noise and there is no guarantee that $f^{-1}(Y)$ will behave like $f^{-1}(E[Y|w]) + \varepsilon$ with $\varepsilon$ mean-zero unless $f^{-1}$ is linear. –  Jul 22 '21 at 20:02
  • As a fix to approach 2: if $f$ is monotone and thus invertible, you could compute $\operatorname{median}(f^{-1}(Y_i))$, which equals $f^{-1}(\operatorname{median}(Y_i))$. If $Y$ is symmetric (so its median equals its mean), then this will give you an estimator of $w$ (a sketch follows these comments). –  Jul 22 '21 at 20:16
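To make the median fix in the last comment concrete, here is a minimal sketch; the specific choices ($f(w) = w^3$, symmetric Gaussian noise around $f(w)$) are illustrative assumptions, not from the comment:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: f(w) = w^3 is monotone on all of R, and adding
# symmetric Gaussian noise makes median(Y) = E(Y | w) = f(w).
w_true = 1.5
f = lambda w: w ** 3
f_inv = np.cbrt  # cube root, defined for negative arguments too

N = 1_000
y = f(w_true) + rng.normal(scale=1.0, size=N)

# Monotonicity gives median(f^{-1}(y_i)) = f^{-1}(median(y_i)),
# and symmetry gives median(Y) = f(w), so this targets w directly.
w_med = np.median(f_inv(y))
print(w_med)  # close to 1.5
```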

1 Answer


If the function is invertible and the result is not too complex, you could use your approach #2 with a small adjustment: don't just take the mean value, but look at the whole distribution of the computed results $f^{-1}(y_i)$. It will have a mode you can use as a point estimate, and you can read interval bounds directly off its quantiles.
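A minimal sketch of this idea, reusing the question's illustrative setup ($f(w) = e^w$, exponential $Y$):

```python
import numpy as np

rng = np.random.default_rng(2)

# Reusing the question's toy setup: f(w) = exp(w), Y | w exponential.
w_true, N = 1.0, 1_000
y = rng.exponential(scale=np.exp(w_true), size=N)

# Distribution of the computed results f^{-1}(y_i).
inv = np.log(y)

# A percentile interval describes the spread of the transformed
# observations; note its centre need not coincide with w when
# f^{-1} is nonlinear (cf. Jensen's inequality in the question).
lo, hi = np.percentile(inv, [2.5, 97.5])
print(np.median(inv), (lo, hi))
```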

If the function is not invertible, it might still be easy to get a good value using non-linear optimization. You could do a grid search, a genetic algorithm, or any other optimization method to find a value of $w$ that reliably predicts the values of $y$ you have.
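A sketch of the grid-search variant, under the same illustrative setup as above; the moment-matching objective (match $f(w)$ to the sample mean) is one reasonable choice, not the only one:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy setup again: f(w) = exp(w); it stands in for any f,
# invertible or not, as long as it can be evaluated.
f = np.exp
w_true, N = 1.0, 1_000
y = rng.exponential(scale=f(w_true), size=N)

# Simple grid search: pick the w whose predicted mean f(w) best
# matches the sample mean (a moment-matching objective). Any other
# optimizer, e.g. scipy.optimize.minimize_scalar, would do the same job.
grid = np.linspace(-5.0, 5.0, 100_001)
w_grid = grid[np.argmin((f(grid) - y.mean()) ** 2)]
print(w_grid)  # close to w_true
```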