
I understand that when the data errors are independent and Gaussian distributed, the maximum likelihood solution is the least squares solution (i.e. I understand this answer). However, some of the sources I am reading imply that the equivalence goes in both directions; e.g. one asks "The choice of parameters resulting from a least squares linear regression corresponds to the maximum likelihood estimate of which likelihood function?", and the intended answer is presumably the Gaussian likelihood function.
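
To fix notation, here is a minimal sketch of the direction I do understand, assuming i.i.d. Gaussian errors with known variance $\sigma^2$:

$$\log L(\beta) = \sum_i \log\left[\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y_i-x_i\beta)^2}{2\sigma^2}\right)\right] = \text{const} - \frac{1}{2\sigma^2}\sum_i(y_i-x_i\beta)^2,$$

so maximizing the Gaussian likelihood over $\beta$ is exactly minimizing $\sum_i(y_i-x_i\beta)^2$.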

But my understanding is that any function satisfying both of the following conditions could in theory also be a likelihood function corresponding to the least squares estimator (made concrete in the sketch below the list):

  • is an increasing function of $-\sum_i(y_i-x_i\beta)^2$
  • is a normalized probability density function
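
To make the first condition concrete: if $g$ is any strictly increasing function, then

$$\arg\max_\beta\, g\!\left(-\sum_i(y_i-x_i\beta)^2\right) = \arg\min_\beta\, \sum_i(y_i-x_i\beta)^2,$$

so any such $g$ shares its maximizer with least squares (assuming $g$ could also be normalized as a density).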

Is this correct (and my sources were simply sloppy with language), or am I missing something, and is the Gaussian the only likelihood function that can correspond to the least squares estimator?

– The Hagen

  • Your understanding is not correct. For instance, one increasing function is the square root. That corresponds to a Laplace distribution of errors, whose MLE is median regression, typically producing a different solution than the OLS model. Note that the function *must* be applied to each term of the sum--nothing else would make sense. – whuber Sep 01 '20 at 20:50
  • I meant the function should be an increasing function of the entire summation, e.g. $-\sqrt{\sum_i(y_i-x_i\beta)^2}$, rather than a function applied to each term of the summation. When you say nothing else would make sense, is this because the likelihood function needs to decompose into a product of probability density functions, one for each independent observation? – The Hagen Sep 01 '20 at 21:12
  • The sum arises by combining independent data (through the multiplication rule, it's the log of the product of probabilities). Applying a square root to such a sum (if it has two or more terms) cannot possibly arise in any way from a sample, and so would never be a likelihood for a sample. – whuber Sep 01 '20 at 21:14
  • Sure, I agree the square root would never be a likelihood for a sample; I was just emphasizing that the function would operate on the entire sum. But supposing we had some kind of weird constraints on our parameter space, such that it wasn't the real line (or maybe not even intervals of the real line), could we conceivably have other strange (possibly non-analytic) functions as the likelihood function? – The Hagen Sep 01 '20 at 21:18
  • Many likelihoods *already* acquire interesting forms -- there's no need to introduce artificial complications like that! But constraining the parameter space never changes the form in which the function itself is expressed (although it can permit it to be re-expressed in a different form, in unusual circumstances). Furthermore, the log likelihood enjoys some important properties that are destroyed by your approach. For instance, differences in log likelihoods are used to test hypotheses and construct confidence intervals. – whuber Sep 01 '20 at 21:20
  • Thank you for the discussion, and for reminding me that the sum arises naturally from the product of probabilities! I definitely agree that the Gaussian is the most natural choice and probably the only choice for most parameter spaces. I guess I need more experience to understand more of the practical side, but this helped me understand the equivalence between the approaches. – The Hagen Sep 01 '20 at 21:30
  • @The Hagen: The way I think of it is that, for the normal, the likelihood contains an $\exp(-\sum_i x_i^2)$ factor, where the $x_i$ are the differences between the observations and the mean. But the difference between an observation and the mean is, by definition, the residual, so when you maximize the likelihood you are minimizing the sum of squared residuals. – mlofton Sep 01 '20 at 23:37
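
The comments above can be made concrete with a small numerical sketch (not from the original thread; it assumes `numpy` and `scipy` are available and uses simulated data): the Gaussian MLE is the least-squares fit, while the Laplace MLE minimizes the sum of absolute residuals and generally lands on different coefficients.

```python
# A small numerical sketch (assumptions: numpy and scipy are available; the data
# below are simulated, not from any real source). Under Gaussian errors the MLE
# is the ordinary least-squares fit; under Laplace errors the MLE minimizes the
# sum of absolute residuals (median regression) and generally differs from OLS.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
# Heavy-tailed noise so the two fits visibly disagree.
y = 1.0 + 2.0 * x + rng.standard_t(df=2, size=n)

X = np.column_stack([np.ones(n), x])  # design matrix with an intercept column

# Gaussian log-likelihood is, up to constants, -(1/(2*sigma^2)) * sum of squared
# residuals, so its maximizer in beta is the least-squares solution.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

def lad_loss(b):
    # Laplace log-likelihood is, up to constants, -sum of absolute residuals.
    return np.sum(np.abs(y - X @ b))

beta_lad = minimize(lad_loss, x0=beta_ols, method="Nelder-Mead").x

print("Gaussian MLE / OLS coefficients:", beta_ols)
print("Laplace MLE / LAD coefficients: ", beta_lad)
```

The two coefficient vectors typically differ here because the simulated noise is heavy-tailed; the point is simply that changing the error distribution in the likelihood changes which fit maximizes it.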

0 Answers