
Consider $x_{1}, x_{2},\ldots, x_{n} \in \mathbb{R}$ and independent random variables $y_{1}, y_{2},\ldots, y_{n}$
where $ y_i = \theta_{0} + \theta_{1}x_i^2 + \theta_{2}\exp{(x_i)} + \varepsilon_i $
and where the $\varepsilon_i$ are distributed as $N(0,\sigma^2)$.
Derive the maximum likelihood estimator for $\theta_{0},\theta_{1},\theta_{2}$.

I understand the first step is to find the PDF, but how do I do that?

Kirsten
  • If the $\varepsilon_i$'s are independent normal, then so are the $y_i$'s. The likelihood is the joint pdf of $y_1,\ldots,y_n$. – StubbornAtom Jun 13 '21 at 11:05
  • Thank you. Why is that? – Kirsten Jun 13 '21 at 20:59
  • The idea of the likelihood is to consider the joint probability (although people don't like to use this term, for reasons) $P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n)$, i.e. the probability of the random variables all taking specific values (the realised values $y_i$). Since the randomness comes only from the $\varepsilon_i$, and these are independent, one can write the joint probability as the product of the individual probabilities. Keep in mind, however, that this is just the idea: likelihoods should NOT be interpreted as actual probabilities! – Erin Sprünken Jun 15 '21 at 07:36
  • Thanks, but my tutor told me to start with the PDF. The PDF is about actual probabilities. My question is about the very first step. – Kirsten Jun 15 '21 at 15:57
  • Checking https://stats.stackexchange.com/questions/485011/what-is-a-random-variable-and-what-isnt-in-regression-models – Kirsten Jun 16 '21 at 08:52

2 Answers


Since the error terms are IID normal random variables, you have the likelihood:

$$\begin{align} L_\boldsymbol{\varepsilon}(\sigma) &= \prod_{i=1}^n \text{N}(\varepsilon_i| 0, \sigma^2) \\[6pt] &= \prod_{i=1}^n (2 \pi \sigma^2)^{-1/2} \cdot \exp \Bigg( - \frac{1}{2 \sigma^2} \cdot \varepsilon_i^2 \Bigg) \\[6pt] &= (2 \pi \sigma^2)^{-n/2} \cdot \exp \Bigg( - \frac{1}{2 \sigma^2} \sum_{i=1}^n \varepsilon_i^2 \Bigg). \\[6pt] \end{align}$$

Now, using the transformation $\varepsilon_i = y_i - \theta_0 - \theta_1 x_i^2 - \theta_2 \exp(x_i)$ you get:

$$\begin{align} L_{\mathbf{y}, \mathbf{x}}(\boldsymbol{\theta}, \sigma) &= \prod_{i=1}^n \text{N}(y_i - \theta_0 - \theta_1 x_i^2 - \theta_2 \exp(x_i)| 0, \sigma^2) \\[6pt] &= (2 \pi \sigma^2)^{-n/2} \cdot \exp \Bigg( - \frac{1}{2 \sigma^2} \sum_{i=1}^n (y_i - \theta_0 - \theta_1 x_i^2 - \theta_2 \exp(x_i))^2 \Bigg), \\[6pt] \end{align}$$

so the log-likelihood function is:

$$\ell_{\mathbf{y}, \mathbf{x}}(\boldsymbol{\theta}, \sigma) = \text{const} - n \log \sigma - \frac{1}{2 \sigma^2} \sum_{i=1}^n (y_i - \theta_0 - \theta_1 x_i^2 - \theta_2 \exp(x_i))^2.$$
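
For a concrete check, note that maximizing this log-likelihood over $\boldsymbol{\theta}$ (for any fixed $\sigma$) is the same as minimizing the sum of squared residuals in the last term. Below is a minimal numerical sketch in Python (NumPy/SciPy); the simulated data and the "true" parameter values in it are assumptions for illustration only, and the optimiser is handed $\log \sigma$ simply to keep $\sigma$ positive.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulated data; these parameter values are assumptions for illustration only
n = 200
x = rng.uniform(-1.0, 1.0, n)
theta_true = np.array([1.0, 2.0, -0.5])          # theta_0, theta_1, theta_2
sigma_true = 0.3
y = theta_true[0] + theta_true[1] * x**2 + theta_true[2] * np.exp(x) \
    + rng.normal(0.0, sigma_true, n)

def neg_log_lik(params):
    """Negative of the log-likelihood above (constant term dropped)."""
    t0, t1, t2, log_sigma = params
    sigma = np.exp(log_sigma)                    # parameterise by log(sigma) so sigma > 0
    resid = y - t0 - t1 * x**2 - t2 * np.exp(x)
    return n * np.log(sigma) + np.sum(resid**2) / (2.0 * sigma**2)

res = minimize(neg_log_lik, x0=np.zeros(4), method="BFGS")
theta_mle, sigma_mle = res.x[:3], np.exp(res.x[3])
print(theta_mle, sigma_mle)
```

Since the model is linear in the parameters, the closed-form least-squares solution (derived in the other answer) gives the same $\hat{\boldsymbol{\theta}}$.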

Ben
  • Thank you, but I was told to start with the PDF. Why does the randomness only come from the noise? – Kirsten Jun 17 '21 at 01:48
  • The likelihood I have written here has the same form as the sampling density, so you can use it for that if you want. Re the randomness: regression holds the explanatory variables as fixed conditioning values, so what else is random in the model? – Ben Jun 17 '21 at 03:52
  • Thanks, I will send you some bounty. – Kirsten Jun 17 '21 at 04:58
  • My lecturer mentioned that logistic regression directly models conditional probabilities. I guess that is why I thought randomness was involved. – Kirsten Jun 17 '21 at 21:18
  • The randomness is situated in the "error term" of the regression model. This is really just definitional, since we usually *define* the error term as the deviation of an outcome from the true regression function (i.e., the true conditional expectation of the response variable). – Ben Jul 19 '21 at 22:14

We have $$y_i - \theta_0 - \theta_1x_i^2 - \theta_2 \exp(x_i)=\epsilon_i \sim N(0, \sigma^2)$$

By independence, the joint probability density is

$$f(\epsilon_1, \ldots, \epsilon_n; \sigma)=\prod_{i=1}^n \frac1{\sigma \sqrt{2\pi}}\exp\left(-\frac{\epsilon_i^2}{2\sigma^2} \right)$$

Substituting $\epsilon_i=y_i-\theta_0-\theta_1x_i^2-\theta_2\exp(x_i)$, we obtain the likelihood function, which we want to maximize:

\begin{align}&L(\theta_0, \theta_1, \theta_2, \sigma; x_1,\ldots, x_n, y_1, \ldots, y_n) \\&=\prod_{i=1}^n \frac1{\sigma \sqrt{2\pi}}\exp \left(-\frac12 \left( \frac{y_i-\theta_0-\theta_1x_i^2 - \theta_2 \exp(x_i)}{\sigma}\right)^2 \right) \\ &=\frac1{\sigma^n (2\pi)^{\frac{n}2}}\exp \left(-\frac1{2\sigma^2} \sum_{i=1}^n\left(y_i-\theta_0-\theta_1x_i^2-\theta_2\exp(x_i)\right)^2 \right)\end{align}

Taking the logarithm and dropping constant terms, we want to minimize

$$\sum_{i=1}^n (y_i-\theta_0-\theta_1x_i^2-\theta_2\exp(x_i))^2$$

which is a convex problem. Differentiating with respect to $\theta_0, \theta_1, \theta_2$ and setting each partial derivative to $0$ gives us:

$$n \theta_0 + \left( \sum_{i=1}^n x_i^2\right) \theta_1 + \left(\sum_{i=1}^n \exp(x_i) \right) \theta_2 = \sum_{i=1}^n y_i $$

$$\left( \sum_{i=1}^n x_i^2\right) \theta_0 + \left( \sum_{i=1}^n x_i^4\right) \theta_1 + \left(\sum_{i=1}^n x_i^2\exp(x_i) \right) \theta_2 = \sum_{i=1}^n x_i^2y_i $$

$$\left( \sum_{i=1}^n \exp(x_i)\right) \theta_0 + \left( \sum_{i=1}^n x_i^2\exp(x_i)\right) \theta_1 + \left(\sum_{i=1}^n \exp(2x_i) \right) \theta_2 = \sum_{i=1}^n \exp(x_i)y_i $$

Solving the linear system will give you the MLE.
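
For completeness, here is a minimal sketch of solving this system numerically with NumPy; the arrays `x` and `y` below are placeholder data (assumptions for illustration). Since the model is linear in the parameters, the system is just $(X^\top X)\boldsymbol{\theta} = X^\top \mathbf{y}$ for the design matrix $X$ with columns $1$, $x_i^2$, $\exp(x_i)$.

```python
import numpy as np

# Placeholder data; replace x and y with the observed values
x = np.array([0.1, 0.5, -0.3, 0.8, 1.2])
y = np.array([1.4, 2.1, 0.9, 2.8, 3.9])

# Design matrix with columns 1, x_i^2, exp(x_i)
X = np.column_stack([np.ones_like(x), x**2, np.exp(x)])

# Normal equations: (X^T X) theta = X^T y
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent and numerically more stable: least squares directly on X
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_hat)   # MLE of theta_0, theta_1, theta_2
print(theta_ls)
```

Both calls return the same estimate; `np.linalg.lstsq` simply avoids forming $X^\top X$ explicitly.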

Siong Thye Goh