This question uses the derivations found here.
The short version
Consider a regression model. If the error variance is a known function of the data (rather than a constant), under what conditions can we still draw conclusions about the variance and relative efficiency of the OLS estimates?
The long version
Notation
Denote:
- $X = \left[\matrix{ X_{11} & \dots & X_{1p} \\ \vdots & \ddots & \vdots \\ X_{n1} & \dots & X_{np} \\ }\right]$
- $\beta = \left(\beta_1, \dots, \beta_p\right)'$
- $Y = \left(Y_1, \dots, Y_n\right)'$
- $\epsilon = \left(\epsilon_1, \dots, \epsilon_n\right)'$ (all column vectors)
Assume:
- $Y= X \beta + \epsilon$
- $\operatorname{E}\left(\epsilon\,|\,X\right)=0$ so that $\operatorname{E}\left(Y\,|\,X\right) = X \beta$
- $\operatorname{Var}\left(\epsilon\right)$ is diagonal.
- $X$ is deterministic so we can drop the "$\left(\cdot\,|\,X\right)$".
Define:
- $\hat{\beta}$: the OLS estimate of $\beta$ in the model $Y=X \beta + \epsilon$
- $\tilde{\beta}$: an arbitrary competing linear unbiased estimate $\tilde{\beta} = A'Y$ (unbiasedness requires $A'X = I$)
- $B = X \left(X'X\right)^{-1}$
Background
We derive $\operatorname{Var}\left(\hat{\beta}\right)$ by writing $\hat{\beta} = \left(X'X\right)^{-1}X'Y = \beta + \left(X'X\right)^{-1}X'\epsilon$ and assuming that $\operatorname{E}\left(\epsilon\epsilon'\right) = \sigma^2 I$. Then we can conclude that: $$\begin{align} \operatorname{Var}\left(\hat{\beta}\right) &= \left(X'X\right)^{-1} X' \underbrace{\operatorname{E}\left(\epsilon\epsilon'\right)}_{=\sigma^2 I} X \left(X'X\right)^{-1} \\ &= \sigma^2 \left(X'X\right)^{-1} X' X \left(X'X\right)^{-1} \\ &= \sigma^2 \left(X'X\right)^{-1} \\ \end{align}$$
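(As a numerical sanity check, here is a small Monte Carlo sketch with an arbitrary simulated design and homoskedastic Gaussian errors; the dimensions, coefficients, and seed are just placeholders.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 200, 3, 2.0

X = rng.normal(size=(n, p))          # fixed (simulated) design
beta = np.array([1.0, -0.5, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

# Theoretical OLS covariance under E(eps eps') = sigma^2 I
var_theory = sigma2 * XtX_inv

# Monte Carlo check: covariance of beta_hat across repeated error draws
betas = []
for _ in range(20000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    betas.append(XtX_inv @ X.T @ y)
var_mc = np.cov(np.array(betas), rowvar=False)

print(np.max(np.abs(var_mc - var_theory)))  # small, up to Monte Carlo error
```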
This in turn is used to show that $\hat{\beta}$ is efficient among linear unbiased estimators: $$\begin{align} \operatorname{Var}\left(\tilde{\beta}\right) - \operatorname{Var}\left(\hat{\beta}\right) &= \sigma^2 A'A - \sigma^2 \left(X'X\right)^{-1} \\ &= \sigma^2 A' M A \\ &\geq 0 \end{align}$$ where $M = I - X\left(X'X\right)^{-1}X'$ is the (idempotent) residual-maker matrix, so $A'MA$ is positive semidefinite.
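(A quick way to check this numerically is to parameterize the competing estimator as $A' = \left(X'X\right)^{-1}X' + D'M$ for an arbitrary $D$, which automatically satisfies $A'X = I$; the simulated design below is purely for illustration.)

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4
X = rng.normal(size=(n, p))

XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T        # residual-maker matrix, M X = 0

# Any linear unbiased estimator can be written A' = (X'X)^{-1} X' + D' M
D = rng.normal(size=(n, p))
A = X @ XtX_inv + M @ D                  # then A'X = I, so A'Y is unbiased

diff = A.T @ A - XtX_inv                 # variance difference, up to the factor sigma^2
print(np.allclose(A.T @ X, np.eye(p)))              # unbiasedness holds
print(np.linalg.eigvalsh(diff).min() >= -1e-10)     # difference is positive semidefinite
```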
The question
What if $\operatorname{Var}\left(\epsilon\right) = h\left(X\right)$ for a known function $h$?
This leaves us with $$ \operatorname{Var}\left(\hat{\beta}\right) = B' h\left(X\right) B $$ which is nice, but $$ \operatorname{Var}\left(\tilde{\beta}\right) - \operatorname{Var}\left(\hat{\beta}\right) = A' h\left(X\right) A - B' h\left(X\right) B $$ doesn't, by itself, tell us whether the difference is positive semidefinite.
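(Concretely, the difference can even be negative semidefinite: taking $\tilde{\beta}$ to be the GLS estimator with weight matrix $h(X)^{-1}$ gives $\operatorname{Var}(\tilde{\beta}) \leq \operatorname{Var}(\hat{\beta})$ in the positive semidefinite ordering. A sketch, with an assumed diagonal $h$ proportional to one regressor:)

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.uniform(1, 5, size=(n, p - 1))])

# A known diagonal h(X): here, error variance proportional to the first regressor (an assumption)
h = np.diag(X[:, 1])

XtX_inv = np.linalg.inv(X.T @ X)
B = X @ XtX_inv
var_ols = B.T @ h @ B                               # B' h(X) B

# Competing unbiased estimator: GLS with weight matrix h(X)^{-1}
h_inv = np.linalg.inv(h)
A = h_inv @ X @ np.linalg.inv(X.T @ h_inv @ X)      # A'Y is the GLS estimator
var_gls = A.T @ h @ A                               # equals (X' h^{-1} X)^{-1}

diff = var_gls - var_ols                            # Var(beta_tilde) - Var(beta_hat)
print(np.linalg.eigvalsh(diff))                     # all <= 0 (up to rounding): OLS not efficient here
```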
What conditions on $h$ will allow us to learn something about $\operatorname{Var}\left(\hat{\beta}\right)$ and $\operatorname{Var}\left(\tilde{\beta}\right) - \operatorname{Var}\left(\hat{\beta}\right)$? Or (as per AdamO's comment) about the relative efficiency?
For instance, this reduces to generalized least squares when $h(X) = X \Omega X'$ for some known $\Omega$. But I'm mainly still interested in the case (as per the assumptions at the beginning) where $h(X)$ is diagonal.
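(For the diagonal case, a sketch of that reduction: GLS with weight matrix $h(X)^{-1}$ is just OLS after rescaling each row by $h_{ii}^{-1/2}$, i.e. weighted least squares. The particular $h$ and coefficients below are assumptions for illustration.)

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 80, 3
X = np.column_stack([np.ones(n), rng.uniform(1, 4, size=(n, p - 1))])
beta = np.array([0.5, 1.0, -2.0])

# Assumed diagonal h(X): error variance equal to the second column of X (strictly positive here)
h_diag = X[:, 1]
y = X @ beta + rng.normal(size=n) * np.sqrt(h_diag)

# GLS with weight matrix h(X)^{-1} ...
W = np.diag(1.0 / h_diag)
beta_gls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# ... equals OLS on rows rescaled by h_ii^{-1/2} (weighted least squares)
scale = 1.0 / np.sqrt(h_diag)
Xw, yw = X * scale[:, None], y * scale
beta_wls = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)

print(np.allclose(beta_gls, beta_wls))  # True
```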
Similarly, consider $$ h\left(X\right) = \left[\matrix{f(X_{1 \cdot} \beta) & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & f(X_{n \cdot} \beta) \\}\right] $$
where $X_{i \cdot}$ is the $i$-th row of $X$ and $f(z) = z$ (implied if $Y_i$ is Poisson, so that its variance equals its mean $X_{i \cdot} \beta$) or $f(z) \propto z^2$ (implied if $Y_i$ is lognormal or gamma). This looks suspiciously like iteratively reweighted least squares.
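(For reference, a minimal IRLS sketch for the $f(z) = z$ case, simulating a Poisson response with an identity link; the design, sample size, starting values, and convergence tolerance are all assumptions.)

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 2, size=n)])
beta_true = np.array([2.0, 1.5])               # keeps the mean x_i' beta positive

# Simulate a Poisson-type response: Var(Y_i) = E(Y_i) = x_i' beta  (identity link, an assumption)
mu = X @ beta_true
y = rng.poisson(mu)

# Iteratively reweighted least squares with f(z) = z
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # start from plain OLS
for _ in range(25):
    w = 1.0 / np.clip(X @ beta_hat, 1e-8, None)    # weights 1 / f(x_i' beta_hat)
    XtW = X.T * w                                  # X' W with W = diag(w)
    beta_new = np.linalg.solve(XtW @ X, XtW @ y)
    if np.allclose(beta_new, beta_hat, atol=1e-10):
        break
    beta_hat = beta_new

print(beta_hat)   # close to beta_true for large n
```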