
Maximum likelihood estimation theory often comes with the following theoretical result on the variance of the estimator:

$\sigma^2(\hat \theta_{MLE}) \sim -\left(E\left[\frac{\partial^2 \log L(\theta=\theta_0)}{\partial \theta^2}\right]\right)^{-1}$

which is called the inverse of the Fisher information ($\theta_0$ being the unknown true parameters). The proofs of this result that I have seen always consider the problem of estimating the parameters of an unknown distribution given $\{y_1,\dots,y_T\}$, a sample of independent and identically distributed observations. These kinds of proofs do not seem to me to apply immediately to simple linear regression, where we try to fit the model:

$\hat y_i = \alpha x_i +\hat \epsilon_i$

and the sample here is drawn from variables $\{y_1,\dots,y_T\}$ which are independent but not identically distributed (e.g. $E[\hat y_i]=\alpha x_i$ depends on $i$). Nevertheless, MLE theory is used to derive the formulas of linear regression and also the variance of the estimators.
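
To make the structure concrete, the likelihood I have in mind is the following (this is my own sketch, assuming Gaussian errors $\hat \epsilon_i \sim N(0,\sigma^2)$ with $\sigma^2$ known):

$\log L(\alpha) = \sum_{i=1}^{T} \log f(y_i;\, \alpha x_i, \sigma^2) = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{T}(y_i-\alpha x_i)^2$

The factors are independent but not identically distributed, since the mean $\alpha x_i$ changes with $i$. Differentiating twice with respect to $\alpha$ gives

$-E\left[\frac{\partial^2 \log L(\alpha)}{\partial \alpha^2}\right] = \frac{\sum_{i=1}^{T} x_i^2}{\sigma^2}$

and its inverse, $\sigma^2/\sum_i x_i^2$, is indeed the usual OLS variance of $\hat\alpha$. What is not clear to me is why the asymptotic argument behind the general result still goes through when the factors are not identical.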

Where can I find a presentation of MLE which applies directly to linear regression problems? (or maybe understand why the standard presentation applies...)
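
For reference, here is a minimal simulation sketch of the comparison I have been making (my own code, not taken from any reference; the design values $x_i$ and the parameters are arbitrary choices for illustration). It checks the empirical variance of $\hat\alpha$ over many replications against the inverse Fisher information $\sigma^2/\sum_i x_i^2$ from the sketch above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed (non-random) design and true parameters -- arbitrary values for illustration.
T = 200
x = np.linspace(0.5, 5.0, T)
alpha_true = 2.0
sigma = 1.5

# Under Gaussian errors the MLE of alpha coincides with the OLS estimate
# alpha_hat = sum(x_i * y_i) / sum(x_i^2) for the no-intercept model.
n_rep = 5000
alpha_hat = np.empty(n_rep)
for r in range(n_rep):
    y = alpha_true * x + rng.normal(0.0, sigma, size=T)
    alpha_hat[r] = (x @ y) / (x @ x)

# Inverse Fisher information for alpha: sigma^2 / sum(x_i^2).
var_fisher = sigma**2 / (x @ x)
print("empirical variance of alpha_hat:", alpha_hat.var())
print("inverse Fisher information     :", var_fisher)
```

The two numbers agree closely (for this model the variance formula is in fact exact in finite samples), so my question is really about where to find the theory that justifies the general MLE result in this non-i.i.d. setting.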

Thomas
    OLS assumption: $\hat \epsilon_i$ i.i.d. $N(0,\sigma^2)$. Now blast away with MLE. – Mark L. Stone Jan 14 '17 at 12:51
  • Thanks, but is it not a problem, in making the connection, that the errors $\hat \epsilon_i$ are unobserved and the parameter $\alpha$ is unknown (so that we cannot infer the error term)? – Thomas Jan 14 '17 at 13:40
  • It is not a problem. The errors are unknown. If you knew what the errors were, they wouldn't need to be errors. The distribution of the errors is assumed known, other than the unknown parameter $\sigma^2$. – Mark L. Stone Jan 14 '17 at 21:06
  • Let me give an example to explain my doubt. In this pdf: https://www.le.ac.uk/users/dsgp1/COURSES/THIRDMET/MYLECTURES/1XMAXILIKE.pdf a fundamental point in obtaining the Fisher estimate of the variance is equation (22). It relies on the central limit theorem and on the fact that the likelihood can be written as $L=\prod_{i=1}^N f(y_i; \theta)$, with the function $f$ fixed. How can this framework be applied to OLS? In that case the likelihood is of the type $L=\prod_{i=1}^N f(y_i; \mu_i, \theta)$, with the non-random parameters $\mu_i$ depending on $i$. I do not see how the central limit theorem applies in the second case. – Thomas Jan 16 '17 at 14:54
  • MLE has "nothing" to do with Central Limit Theorem, unless the asymptotic behavior is considered. Anyhow, look at http://stats.stackexchange.com/questions/143705/maximum-likelihood-method-vs-least-squares-method and the links in there. – Mark L. Stone Jan 16 '17 at 16:27
  • Thanks, I will now check the links and see if I find the answer to my doubt! But the relation between the Fisher information matrix and the variance of the estimator (the one I am referring to in the question) is a consequence of the asymptotic behavior. Am I wrong? (A small numerical sketch of the check I have in mind is below, after these comments.) – Thomas Jan 17 '17 at 09:39
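
For concreteness, here is a minimal numerical sketch of the check discussed in these comments (my own code, with made-up data; the model, the design values and the parameterization are assumptions for illustration). It maximizes the Gaussian likelihood numerically, "blasting away with MLE" as suggested above, and compares the inverse-Hessian variance estimate for $\alpha$ with the closed-form $\sigma^2/\sum_i x_i^2$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# One simulated data set from the regression model (illustrative values).
T = 200
x = np.linspace(0.5, 5.0, T)
alpha_true, sigma_true = 2.0, 1.5
y = alpha_true * x + rng.normal(0.0, sigma_true, size=T)

# Negative log-likelihood of the model y_i ~ N(alpha * x_i, sigma^2),
# parameterized as theta = (alpha, log sigma) so that sigma stays positive.
def negloglik(theta):
    alpha, log_sigma = theta
    sigma = np.exp(log_sigma)
    resid = y - alpha * x
    return 0.5 * T * np.log(2 * np.pi * sigma**2) + 0.5 * np.sum(resid**2) / sigma**2

# Maximize the likelihood numerically.
fit = minimize(negloglik, x0=np.array([0.0, 0.0]), method="BFGS")

# BFGS keeps an approximation to the inverse Hessian of the negative
# log-likelihood at the optimum, i.e. a rough observed-information
# estimate of the variance of the MLE.
print("alpha_hat:", fit.x[0])
print("inverse-Hessian variance for alpha:", fit.hess_inv[0, 0])
print("closed-form sigma^2 / sum(x_i^2)  :", sigma_true**2 / (x @ x))
```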

0 Answers