
Consider $x_{1}, x_{2},\ldots, x_{n} \in \mathbb{R}$ and independent random variables $y_{1}, y_{2},\ldots, y_{n}$
where $ y_i = \theta_{0} + \theta_{1}x_i^2 + \theta_{2}\exp{(x_i)} + \varepsilon_i $
and where the $\varepsilon_i$ are distributed as $N(0,\sigma^2)$.
Derive the maximum likelihood estimator for $\theta_{0},\theta_{1},\theta_{2}$.

I understand the first step is to find the PDF, but how do I do that?

Kirsten
  • If the $\varepsilon_i$'s are independent normal, then so are the $y_i$'s. The likelihood is the joint pdf of $y_1,\ldots,y_n$. – StubbornAtom Jun 13 '21 at 11:05
  • Thank you. Why is that? – Kirsten Jun 13 '21 at 20:59
  • The idea of the likelihood is to consider the joint probability (although people don't like to use this term, for reasons) $P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n)$, i.e. the probability of the random variables all taking specific values (the realised values $y_i$). Since the randomness comes only from the $\varepsilon_i$, and these are independent, one can write the joint probability as the product of the individual probabilities. Keep in mind, however, that this is just the idea: likelihoods should NOT be interpreted as actual probabilities! – Erin Sprünken Jun 15 '21 at 07:36
  • Thanks, but my tutor told me to start with the PDF. The PDF is about actual probabilities. My question is about the very first step. – Kirsten Jun 15 '21 at 15:57
  • Checking https://stats.stackexchange.com/questions/485011/what-is-a-random-variable-and-what-isnt-in-regression-models – Kirsten Jun 16 '21 at 08:52

2 Answers


Since the error terms are IID normal random variables, you have the likelihood:

$$\begin{align} L_\boldsymbol{\varepsilon}(\sigma) &= \prod_{i=1}^n \text{N}(\varepsilon_i| 0, \sigma^2) \\[6pt] &= \prod_{i=1}^n (2 \pi \sigma^2)^{-1/2} \cdot \exp \Bigg( - \frac{1}{2 \sigma^2} \cdot \varepsilon_i^2 \Bigg) \\[6pt] &= (2 \pi \sigma^2)^{-n/2} \cdot \exp \Bigg( - \frac{1}{2 \sigma^2} \sum_{i=1}^n \varepsilon_i^2 \Bigg). \\[6pt] \end{align}$$

Now, using the transformation $\varepsilon_i = y_i - \theta_0 - \theta_1 x_i^2 - \theta_2 \exp(x_i)$ you get:

$$\begin{align} L_{\mathbf{y}, \mathbf{x}}(\boldsymbol{\theta}, \sigma) &= \prod_{i=1}^n \text{N}(y_i - \theta_0 - \theta_1 x_i^2 - \theta_2 \exp(x_i)| 0, \sigma^2) \\[6pt] &= (2 \pi \sigma^2)^{-n/2} \cdot \exp \Bigg( - \frac{1}{2 \sigma^2} \sum_{i=1}^n (y_i - \theta_0 - \theta_1 x_i^2 - \theta_2 \exp(x_i))^2 \Bigg), \\[6pt] \end{align}$$

so the log-likelihood function is:

$$\ell_{\mathbf{y}, \mathbf{x}}(\boldsymbol{\theta}, \sigma) = \text{const} - n \log \sigma - \frac{1}{2 \sigma^2} \sum_{i=1}^n (y_i - \theta_0 - \theta_1 x_i^2 - \theta_2 \exp(x_i))^2.$$
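
For a concrete check, note that maximizing this log-likelihood over $\boldsymbol{\theta}$ (for any fixed $\sigma$) is the same as minimizing the sum of squared residuals in the last term. Below is a minimal numerical sketch in Python (NumPy/SciPy); the simulated data and the "true" parameter values in it are assumptions for illustration only, and the optimiser is handed $\log \sigma$ simply to keep $\sigma$ positive.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulated data; these parameter values are assumptions for illustration only
n = 200
x = rng.uniform(-1.0, 1.0, n)
theta_true = np.array([1.0, 2.0, -0.5])          # theta_0, theta_1, theta_2
sigma_true = 0.3
y = theta_true[0] + theta_true[1] * x**2 + theta_true[2] * np.exp(x) \
    + rng.normal(0.0, sigma_true, n)

def neg_log_lik(params):
    """Negative of the log-likelihood above (constant term dropped)."""
    t0, t1, t2, log_sigma = params
    sigma = np.exp(log_sigma)                    # parameterise by log(sigma) so sigma > 0
    resid = y - t0 - t1 * x**2 - t2 * np.exp(x)
    return n * np.log(sigma) + np.sum(resid**2) / (2.0 * sigma**2)

res = minimize(neg_log_lik, x0=np.zeros(4), method="BFGS")
theta_mle, sigma_mle = res.x[:3], np.exp(res.x[3])
print(theta_mle, sigma_mle)
```

Since the model is linear in the parameters, the closed-form least-squares solution (derived in the other answer) gives the same $\hat{\boldsymbol{\theta}}$.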

Ben
  • Thank you, but I was told to start with the PDF. Why does the randomness only come from the noise? – Kirsten Jun 17 '21 at 01:48
  • The likelihood I have written here has the same form as the sampling density, so you can use it for that if you want. Re the randomness: regression holds the explanatory variables as fixed conditioning values, so what else is random in the model? – Ben Jun 17 '21 at 03:52
  • Thanks, I will send you some bounty. – Kirsten Jun 17 '21 at 04:58
  • My lecturer mentioned that logistic regression directly models conditional probabilities. I guess that is why I thought randomness was involved. – Kirsten Jun 17 '21 at 21:18
  • The randomness is situated in the "error term" of the regression model. This is really just definitional, since we usually *define* the error term as the deviation of an outcome from the true regression function (i.e., the true conditional expectation of the response variable). – Ben Jul 19 '21 at 22:14

We have $$y_i - \theta_0 - \theta_1x_i^2 - \theta_2 \exp(x_i)=\epsilon_i \sim N(0, \sigma^2)$$

By independence, the joint probability density is

$$f(\epsilon_1, \ldots, \epsilon_n; \sigma)=\prod_{i=1}^n \frac1{\sigma \sqrt{2\pi}}\exp\left(-\frac{\epsilon_i^2}{2\sigma^2} \right)$$

Substituting $\epsilon_i=y_i-\theta_0-\theta_1x_i^2-\theta_2\exp(x_i)$, we obtain the likelihood function, which we want to maximize:

\begin{align}&L(\theta_0, \theta_1, \theta_2, \sigma; x_1,\ldots, x_n, y_1, \ldots, y_n) \\&=\prod_{i=1}^n \frac1{\sigma \sqrt{2\pi}}\exp \left(-\frac12 \left( \frac{y_i-\theta_0-\theta_1x_i^2 - \theta_2 \exp(x_i)}{\sigma}\right)^2 \right) \\ &=\frac1{\sigma^n (2\pi)^{\frac{n}2}}\exp \left(-\frac1{2\sigma^2} \sum_{i=1}^n\left(y_i-\theta_0-\theta_1x_i^2-\theta_2\exp(x_i)\right)^2 \right)\end{align}

Taking the logarithm and dropping constant terms, we want to minimize

$$\sum_{i=1}^n (y_i-\theta_0-\theta_1x_i^2-\theta_2\exp(x_i))^2$$

which is a convex problem. Differentiating with respect to $\theta_0, \theta_1, \theta_2$ and setting each partial derivative to $0$ gives us:

$$n \theta_0 + \left( \sum_{i=1}^n x_i^2\right) \theta_1 + \left(\sum_{i=1}^n \exp(x_i) \right) \theta_2 = \sum_{i=1}^n y_i $$

$$\left( \sum_{i=1}^n x_i^2\right) \theta_0 + \left( \sum_{i=1}^n x_i^4\right) \theta_1 + \left(\sum_{i=1}^n x_i^2\exp(x_i) \right) \theta_2 = \sum_{i=1}^n x_i^2y_i $$

$$\left( \sum_{i=1}^n \exp(x_i)\right) \theta_0 + \left( \sum_{i=1}^n x_i^2\exp(x_i)\right) \theta_1 + \left(\sum_{i=1}^n \exp(2x_i) \right) \theta_2 = \sum_{i=1}^n \exp(x_i)y_i $$

Solving the linear system will give you the MLE.
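
For completeness, here is a minimal sketch of solving this system numerically with NumPy; the arrays `x` and `y` below are placeholder data (assumptions for illustration). Since the model is linear in the parameters, the system is just $(X^\top X)\boldsymbol{\theta} = X^\top \mathbf{y}$ for the design matrix $X$ with columns $1$, $x_i^2$, $\exp(x_i)$.

```python
import numpy as np

# Placeholder data; replace x and y with the observed values
x = np.array([0.1, 0.5, -0.3, 0.8, 1.2])
y = np.array([1.4, 2.1, 0.9, 2.8, 3.9])

# Design matrix with columns 1, x_i^2, exp(x_i)
X = np.column_stack([np.ones_like(x), x**2, np.exp(x)])

# Normal equations: (X^T X) theta = X^T y
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent and numerically more stable: least squares directly on X
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_hat)   # MLE of theta_0, theta_1, theta_2
print(theta_ls)
```

Both calls return the same estimate; `np.linalg.lstsq` simply avoids forming $X^\top X$ explicitly.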

Siong Thye Goh