
Some reference links since I don't have enough reputation to post more than 2 links:

"First post": Estimating parameters of a normal distribution from noisy observation of samples

"Second post": Determining true mean from noisy observations

Wikipedia article on weighted mean: https://en.wikipedia.org/wiki/Weighted_arithmetic_mean

I would like to estimate the parameters of a normal distribution from noisy observations. I found the first post, in which the author describes a solution, but after reading it and the threads it links to (such as the second post) I am still confused, so I am trying again here. I will be as concrete as possible, and hopefully this post will help others with the same problem, as I expect there are many. Lastly, since I am still learning statistics, if anyone spots an error in the question please feel free to edit it or point it out. Thanks in advance for all the help.

Here is the problem (the notation closely follows the first post for easier comparison): I have $n$ observations $z_{i} = x_{i} + e_{i}$, $i \in \{1,\dots,n\}$. Each $x_{i}$ is drawn from a univariate normal distribution $\mathcal{N}(\mu,\gamma^2)$, where $\mu$ and $\gamma$ are scalars. Each observation $z_{i}$ is perturbed by noise $e_{i} \sim \mathcal{N}(0,\sigma_{i}^2)$, where the scalars $\sigma_{i}$ differ from one observation to the next. Given that I know all the $z_{i}$ and $\sigma_{i}$, how do I estimate $\mu$ and $\gamma$?

Let's target the estimation of $\mu$ first. As I understand it, this is a problem of estimating parameters from weighted observations. In the special case where we assign each $z_{i}$ the same weight, $\hat{\mu} = \frac{\sum_{i=1}^{n}z_{i}}{n}$. However, since each $z_{i}$ is perturbed by an $e_{i}$ with a different variance $\sigma_{i}^2$, a smarter approach is to give each $z_{i}$ its own weight $w_{i}$ and apply the weighted mean formula $\hat{\mu} = \frac{\sum_{i=1}^{n}w_{i}z_{i}}{\sum_{i=1}^{n}w_{i}}$. The intuition is to assign a $z_{i}$ whose $e_{i}$ has a smaller $\sigma_{i}$ a larger weight, since it has been perturbed "less", and to assign a $z_{i}$ whose $e_{i}$ has a larger $\sigma_{i}$ a smaller weight, since it has been perturbed "more".
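To make the intuition concrete, here is a tiny numerical sketch comparing the plain mean with an inverse-variance weighted mean (the data values are made up purely for illustration):

```python
# Minimal sketch: plain vs. inverse-variance weighted mean of noisy observations.
# The data below are made up for illustration only.
z = [2.1, 1.8, 2.5, 1.2]      # observations z_i
sigma = [0.1, 0.2, 0.5, 2.0]  # noise std devs sigma_i (larger = noisier)

# Equal weights: ordinary sample mean
mu_plain = sum(z) / len(z)

# Inverse-variance weights: noisier observations count for less
w = [1.0 / s**2 for s in sigma]
mu_weighted = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)

print(mu_plain, mu_weighted)
```

The last, very noisy observation ($\sigma_4 = 2$) drags the plain mean down, while the weighted mean mostly ignores it.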

Next, since $z_{i} = x_{i} + e_{i}$ and we assume $x_{i}$ and $e_{i}$ come from mutually independent normal distributions, we have $z_{i} \sim \mathcal{N}(\mu + 0,\gamma^2 + \sigma_{i}^2)$. According to the Wikipedia article on the weighted mean and the second post, one way to pick the weights is to set $w_{i} = \frac{1}{\gamma^2 + \sigma_{i}^2}$. Plugging this into the weighted mean formula gives an estimator for $\mu$:

$\hat{\mu} = \frac{\sum_{i=1}^{n}z_{i}/(\gamma^2 + \sigma_{i}^2)}{\sum_{i=1}^{n}1/(\gamma^2 + \sigma_{i}^2)}$ (1)
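Just to check my own reading of equation (1), here is how I would compute it if $\gamma$ were somehow known (the value of `gamma` below is a hypothetical input, since in the actual problem it is unknown):

```python
# Equation (1) as a function: weighted estimate of mu, *assuming* gamma
# were known. In the actual problem gamma is unknown -- it is passed in
# here only to make the formula concrete.
def estimate_mu(z, sigma, gamma):
    w = [1.0 / (gamma**2 + s**2) for s in sigma]
    return sum(wi * zi for wi, zi in zip(w, z)) / sum(w)

# Made-up data, same as nothing in particular
z = [2.1, 1.8, 2.5, 1.2]
sigma = [0.1, 0.2, 0.5, 2.0]
print(estimate_mu(z, sigma, gamma=0.3))
```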

Here comes my first question: is my understanding so far correct? The second post says that the reason for setting $w_{i} = \frac{1}{\gamma^2 + \sigma_{i}^2}$, assuming $\sum_{i=1}^{n}w_{i} = 1$, is "easily obtained with a Lagrange multiplier or by re-interpreting the situation geometrically as a distance minimization problem". Could anyone point me to that derivation? I would like to see the calculation.

Moving on: we cannot actually evaluate equation (1), because we don't know $\gamma$, which is the second parameter we want to estimate. Following the first post and the "weighted sample variance" section of the Wikipedia article on the weighted mean, it seems we can estimate $\gamma$ with the following equation:

$\hat{\gamma}^2 = \frac{\sum_{i=1}^{n}(x_{i}-\hat{\mu})^2/(\gamma^2 + \sigma_{i}^2)}{\sum_{i=1}^{n}1/(\gamma^2 + \sigma_{i}^2)}$ (2).

Here comes my second question: is my derivation so far correct? Most importantly, equations (1) and (2) have the unknown parameters intertwined (especially since $\hat{\mu}$ appears in (2)). How can we solve these two equations to obtain $\hat{\mu}$ and $\hat{\gamma}$?
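One naive idea I had is to solve the coupled equations by fixed-point iteration: start from a guess for $\gamma$, compute $\hat{\mu}$ via (1), update $\hat{\gamma}^2$ via (2), and repeat until the values stop changing. I am not at all sure this is justified (note that I have to substitute the observed $z_{i}$ for the unobservable $x_{i}$ in (2), and (2) does not subtract the noise variances, so the result may well be biased), but here is a sketch of what I mean, on synthetic data:

```python
import random

# Fixed-point sketch for equations (1) and (2): alternate the two updates
# until the estimates stabilize. This is my own guess at a solution method,
# not an established derivation. Equation (2) is evaluated with the observed
# z_i, since the true x_i are not available -- this likely biases gamma^2
# upward, which is part of what I am asking about.
def estimate(z, sigma, gamma0=1.0, iters=100):
    g2 = gamma0**2
    for _ in range(iters):
        w = [1.0 / (g2 + s**2) for s in sigma]
        mu = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)        # eq. (1)
        g2_new = sum(wi * (zi - mu)**2 for wi, zi in zip(w, z)) / sum(w)  # eq. (2)
        if abs(g2_new - g2) < 1e-12:
            break
        g2 = g2_new
    return mu, g2**0.5

# Synthetic check: x_i ~ N(2, 0.5^2), e_i ~ N(0, sigma_i^2)
random.seed(0)
sigma = [random.uniform(0.05, 0.5) for _ in range(2000)]
z = [random.gauss(2.0, 0.5) + random.gauss(0.0, s) for s in sigma]

mu_hat, gamma_hat = estimate(z, sigma)
print(mu_hat, gamma_hat)
```

On this data $\hat{\mu}$ lands close to the true $\mu = 2$, but $\hat{\gamma}$ comes out somewhat larger than the true $\gamma = 0.5$, which makes me suspect equation (2) is not quite the right estimator here.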

Philip2011
