
Consider the posterior distribution $p(\theta|x)$. We aim to find a "good" estimate of the random variable $\theta$. The Bayes risk associated with the loss function $L(\hat{\theta}, \theta)$ is the posterior expected loss $E(L(\hat{\theta}, \theta)|x)$.

For the mean square error loss function $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$, a well-known result is that $\hat{\theta}=E(\theta|x)$ minimises the Bayes risk (the minimum mean square error estimator).
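For reference, this follows from the standard decomposition of the posterior expected loss (written out here only as a pointer, with $\text{Var}(\theta|x)$ denoting the posterior variance): $$E\big[(\hat{\theta}-\theta)^2\mid x\big]=\big(\hat{\theta}-E(\theta|x)\big)^2+\text{Var}(\theta|x),$$ so the risk is minimised exactly when the first term vanishes, i.e. at $\hat{\theta}=E(\theta|x)$.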

But what about the multidimensional case, when we aim to estimate a random vector $\theta=(\theta_1, \dots, \theta_n)^T$? With $L(\hat{\theta}, \theta) = \sum_i(\hat{\theta}_i - \theta_i)^2$, is the best estimate $\hat{\theta}_i = E(\theta_i|x)$?

If this is indeed the case, it is counter-intuitive to me because this estimate would be characterised solely by the posterior marginals, whereas the Maximum A Posteriori (MAP) estimator depends on the joint posterior distribution.

Finally, the same question for the cost function $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$. In one dimension, the minimum is attained at the median of $p(\theta|x)$. In multiple dimensions, with $L(\hat{\theta}, \theta) = \sum_i|\hat{\theta}_i - \theta_i|$, is the minimum obtained at the medians of the posterior marginals?

nbedou

1 Answer


For the mean square error loss function $$L(\hat{\theta},\theta)=(\hat{\theta}-\theta)^2$$ a well-known result is that $$\hat{\theta}=\mathbb{E}(\theta|x)$$ minimises the Bayes risk (minimum mean square error estimator).

generalises quite easily [see, e.g., my book The Bayesian Choice, Chap. 2, Corollary 2.5.3] to

For the quadratic error loss function $$L(\hat{\theta},\theta)=(\hat{\theta}-\theta)^\text{T}\mathbf{A}\,(\hat{\theta}-\theta)$$ the posterior expectation $$\hat{\theta}=\mathbb{E}(\theta|x)$$ minimises the posterior risk, whatever the positive definite matrix $\mathbf{A}$.
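A sketch of why $\mathbf{A}$ plays no role in the location of the minimum (the same completing-the-square argument as in one dimension, spelled out here for completeness): $$\mathbb{E}\left[(\hat{\theta}-\theta)^\text{T}\mathbf{A}\,(\hat{\theta}-\theta)\mid x\right]=\big(\hat{\theta}-\mathbb{E}(\theta|x)\big)^\text{T}\mathbf{A}\,\big(\hat{\theta}-\mathbb{E}(\theta|x)\big)+\mathbb{E}\left[\big(\theta-\mathbb{E}(\theta|x)\big)^\text{T}\mathbf{A}\,\big(\theta-\mathbb{E}(\theta|x)\big)\mid x\right],$$ since the cross term has zero posterior expectation. The second term does not depend on $\hat{\theta}$ and the first is non-negative, vanishing exactly at $\hat{\theta}=\mathbb{E}(\theta|x)$ because $\mathbf{A}$ is positive definite.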

The remark

this is counter-intuitive to me because this estimate would be characterised solely by the posterior marginals, whereas the Maximum A Posteriori (MAP) estimator depends on the joint posterior distribution.

just points out that the posterior expectation only depends on the marginal posterior distributions of the $\theta_i$'s, which is not at all counter-intuitive when considering that minimising the sum $$\mathbb{E}\left[\sum_{i=1}^p (\hat{\theta}_i-\theta_i)^2\right]= \sum_{i=1}^p\mathbb{E}\left[(\hat{\theta}_i-\theta_i)^2\right]$$ is equivalent to minimising each term of the sum, which only depends on the marginal posterior distribution of the corresponding $\theta_i$. The MAP is not directly related to a loss function.
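A quick Monte Carlo sanity check of this decomposition (a sketch only: the correlated Gaussian below is merely a stand-in for some joint posterior, and all variable names are invented for the demo) shows the vector of marginal posterior means beating perturbed estimates under the summed squared loss, even though the coordinates are strongly dependent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for draws from a joint posterior p(theta | x) with dependent
# coordinates, so the joint is *not* the product of its marginals.
post_mean_true = np.array([1.0, -2.0])
post_cov = np.array([[1.0, 0.8],
                     [0.8, 2.0]])
theta = rng.multivariate_normal(post_mean_true, post_cov, size=200_000)

def summed_sq_risk(est):
    """Monte Carlo estimate of E[ sum_i (est_i - theta_i)^2 | x ]."""
    return np.mean(np.sum((est - theta) ** 2, axis=1))

est_mean = theta.mean(axis=0)  # componentwise (marginal) posterior means
print("posterior means :", est_mean, " risk:", summed_sq_risk(est_mean))

# Moving any coordinate away from its marginal mean only increases the risk.
for delta in ([0.3, 0.0], [0.0, -0.3], [0.2, 0.2]):
    est = est_mean + np.array(delta)
    print("perturbed       :", est, " risk:", summed_sq_risk(est))
```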

And the above argument applies to

for the cost function $L(\hat{\theta},\theta)=|\hat{\theta}-\theta|$, in one dimension, the minimum is attained at the median of $p(\theta|x)$. In multiple dimensions, with $$L(\hat{\theta},\theta)=\sum_{i=1}^p |\hat{\theta}_i-\theta_i|$$ is the minimum obtained at the medians of the posterior marginals?

in that minimising the sum over $i$ amounts to minimising each expected absolute error separately, so the minimiser is the vector of the marginal posterior medians.
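The same kind of Monte Carlo sketch (again with an invented, skewed stand-in posterior, chosen so that marginal means and medians differ) illustrates that the vector of marginal posterior medians gives a lower summed absolute risk than the posterior mean:

```python
import numpy as np

rng = np.random.default_rng(1)

# Skewed, dependent stand-in posterior: a correlated Gaussian pushed through
# exp, so each marginal is lognormal and its mean differs from its median.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=200_000)
theta = np.exp(z)

def summed_abs_risk(est):
    """Monte Carlo estimate of E[ sum_i |est_i - theta_i| | x ]."""
    return np.mean(np.sum(np.abs(est - theta), axis=1))

est_median = np.median(theta, axis=0)  # vector of marginal posterior medians
est_mean = theta.mean(axis=0)          # vector of marginal posterior means

print("marginal medians:", est_median, " risk:", summed_abs_risk(est_median))
print("posterior means :", est_mean, "  risk:", summed_abs_risk(est_mean))
```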

Xi'an