Likelihood functions are typically based on the joint probability of the variables involved. In linear regression we have variables $X$ and $Y$, but the derivation of the MLE under a zero-mean Gaussian assumption for the errors uses the conditional probability $p(y \mid x)$ for the likelihood function. Why does this deviate from the standard usage of the joint probability $p(y, x)$?
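To make the contrast concrete (my notation, assuming the usual setup of $n$ i.i.d. pairs $(x_i, y_i)$ with $y_i \mid x_i \sim \mathcal{N}(x_i^\top \beta, \sigma^2)$), the two candidate likelihoods would be

$$L_{\text{cond}}(\beta, \sigma^2) = \prod_{i=1}^{n} p(y_i \mid x_i; \beta, \sigma^2), \qquad L_{\text{joint}}(\beta, \sigma^2) = \prod_{i=1}^{n} p(y_i, x_i; \beta, \sigma^2),$$

and the regression derivations I have seen maximize the first rather than the second.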
- Models have a regression structure if the predictor or explanatory variables are either design variables (variables subject to experimental control) or ancillary (variables whose joint marginal distribution does not depend on the model parameters). With this type of structure, the analysis can be made conditional on the explanatory variables, and it is customary to do so. – Jesper for President May 25 '20 at 23:08
- @JesperforPresident Hmm, I see. So then why does Gaussian discriminant analysis use the joint probability instead of the conditional? – David May 25 '20 at 23:37
- I am not an expert on LDA, but from what I understand the explanatory variables are not ancillary for LDA. – Jesper for President May 27 '20 at 01:41
- @David, can you provide a reference for LDA? My understanding is that, for LDA, we also classify variables $y$ depending on predictors $x$. E.g., the choice of travel mode *given*, say, age, gender, income. – Christoph Hanck May 27 '20 at 13:20
- @ChristophHanck Sure. I saw it in these notes: http://cs229.stanford.edu/notes/cs229-notes2.pdf, p. 5 towards the bottom. Note that I'm not well versed in MLE, but it looks to me like the joint probability is used. – David May 27 '20 at 21:05
- Hm, which part of these notes do you refer to? E.g., "we can then use the Gaussian Discriminant Analysis (GDA) model, which models p(x|y)" looks pretty conditional to me? – Christoph Hanck May 28 '20 at 07:01
- @ChristophHanck, the second-to-last equation on page 5 writes the likelihood as the product of the joint distributions, whereas when MLE is performed for regression under the zero-mean Gaussian assumption, the likelihood is written as the product of the conditional distributions. – David May 28 '20 at 16:24
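For reference, the contrast discussed in the comments, written out under my reading of the cited notes (notation mine), is between a joint likelihood for GDA and a conditional likelihood for regression:

$$\ell_{\text{GDA}} = \sum_{i=1}^{n} \log p(x_i, y_i) = \sum_{i=1}^{n} \log \bigl( p(x_i \mid y_i)\, p(y_i) \bigr), \qquad \ell_{\text{regression}} = \sum_{i=1}^{n} \log p(y_i \mid x_i).$$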
2 Answers
Since $p(y,x)=p_1(y|x)p_2(x)$ by the definition of conditional density, the log-likelihood is given by $\log p(y,x)=\log p_1(y|x) + \log p_2(x)$. However, since $\log p_2(x)$ does not depend on the parameters of interest, $\beta$ and $\sigma^2$, maximizing $\log p_1(y|x)$ is equivalent to maximizing $\log p(y,x)$.
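A minimal numerical sketch of this equivalence (my own illustration, not part of the answer; it assumes $x_i \sim \mathcal{N}(0,1)$ so that the marginal $p_2(x)$ carries no information about $\beta$ or $\sigma^2$): the conditional and joint log-likelihoods differ only by the constant $\sum_i \log p_2(x_i)$, so maximizing either one should return the same estimates.

```python
# Sketch: conditional vs. joint log-likelihood maximization in linear regression.
# Assumes x ~ N(0, 1) independently of (beta, sigma), so log p2(x) is constant in
# the parameters and both objectives share the same maximizer.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n, beta_true, sigma_true = 500, 2.0, 1.5
x = rng.normal(size=n)                                  # marginal p2(x): N(0, 1), parameter-free
y = beta_true * x + rng.normal(scale=sigma_true, size=n)

def neg_cond_loglik(theta):
    beta, log_sigma = theta
    sigma = np.exp(log_sigma)                           # parameterize by log(sigma) to keep sigma > 0
    return -np.sum(norm.logpdf(y, loc=beta * x, scale=sigma))

def neg_joint_loglik(theta):
    # joint log-lik = conditional log-lik + sum_i log p2(x_i); the extra term is parameter-free
    return neg_cond_loglik(theta) - np.sum(norm.logpdf(x, loc=0.0, scale=1.0))

theta0 = np.array([0.0, 0.0])
fit_cond = minimize(neg_cond_loglik, theta0, method="Nelder-Mead")
fit_joint = minimize(neg_joint_loglik, theta0, method="Nelder-Mead")

print("conditional MLE (beta, sigma):", fit_cond.x[0], np.exp(fit_cond.x[1]))
print("joint MLE       (beta, sigma):", fit_joint.x[0], np.exp(fit_joint.x[1]))
```

Both fits should agree with each other (up to optimizer tolerance) and with the usual OLS estimate of $\beta$.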

user0131
Regression models are models for conditional expectations. Linear regression is then a model for the conditional expectation of $Y$ given $X=x$. So $x$, the realized value, is usually treated as a known value, without uncertainty (if that is not the case for you, look into errors-in-variables models).
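Spelled out (a sketch in my own notation): with the $x_i$ treated as fixed, the only source of randomness is the error term, so

$$y_i = x_i^\top \beta + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \;\Longrightarrow\; \mathbb{E}[Y \mid X = x_i] = x_i^\top \beta, \qquad y_i \mid x_i \sim \mathcal{N}(x_i^\top \beta, \sigma^2),$$

and the likelihood is built from these conditional densities.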
You can find much more information in these other posts:

kjetil b halvorsen