I have an elementary question that I'm sure must be answered in textbooks somewhere, but I haven't found it yet. Why is the average the right way to estimate a parameter observed under Gaussian noise?
Let's flesh this out a bit. Here's my model. There is an unknown parameter $\theta$. We have iid random variables $X_1,\dots,X_n$, defined as $X_i = \theta + Y_i$ where $Y_i \sim \mathcal{N}(0, \sigma^2)$. In other words, the $Y_i$'s are Gaussian noise, and each $X_i$ is a noisy observation of the underlying parameter. Now, given observations of $X_1,\dots,X_n$, we want to estimate the underlying parameter $\theta$.
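To make the setup concrete, here is a minimal simulation of this model in Python with NumPy; the particular values of $\theta$, $\sigma$, and $n$ are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 2.5   # hypothetical true parameter (made up for illustration)
sigma = 1.0   # noise standard deviation
n = 100       # number of observations

# X_i = theta + Y_i, with Y_i ~ N(0, sigma^2) i.i.d.
X = theta + rng.normal(0.0, sigma, size=n)
```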
I think I remember reading that the optimal way to estimate $\theta$ is to use the average: i.e., $\hat{\theta} = \frac{1}{n} (X_1 + \dots + X_n)$. Is this true? In what sense is it optimal, and why?
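As a quick sanity check of the averaging estimator (again with made-up values), the sketch below suggests the error of $\hat{\theta}$ shrinks as $n$ grows, which is the behavior I'm asking about:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 2.5, 1.0   # made-up values for illustration

for n in (10, 100, 10_000):
    X = theta + rng.normal(0.0, sigma, size=n)
    theta_hat = X.mean()  # the averaging estimator
    print(f"n={n:6d}  theta_hat={theta_hat:.4f}  error={abs(theta_hat - theta):.4f}")
```

But an empirical check isn't an explanation, hence the question.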