
I have a sample of $n = 100$ with two "successes" (two kids having a disease among 100), so we obviously have a binomial distribution.

First I had to compute the maximum likelihood (ML) estimator $\hat{p}$. I got $\hat{p} = \frac{k}{n}$.
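With my numbers, $k = 2$ and $n = 100$, this gives $\hat{p} = \frac{2}{100} = 0.02$.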

Now I have to derive the asymptotic normal distribution of $\hat{p}$ via the central limit theorem (CLT).

I know that the expected value of $\hat{p}$ is finite and that the variance is also finite, so I know it will be normally distributed.

I have to know the expected value and variance of $\hat{p}$ to get the asymptotic normal distribution, right?

I know that the expected value is $\frac{k}{n}$. But what is the variance?

Michael
    Why is $\hat{p}$ a random variable at all? Also, under almost any model that you might dream up, the CLT will not give you the asymptotic distribution of $\hat{p}$ directly; you have to infer from what the CLT tells you that the asymptotic distribution is that of a degenerate random variable that non-statisticians call a constant. See [this answer](http://stats.stackexchange.com/a/22532/6633) to a related question. – Dilip Sarwate Oct 26 '13 at 13:37
    Apparently "$k$" is a *random variable* referring to the number of successes and you are modeling it with a Binomial$(100, \hat{p})$ distribution. What is the mean of that distribution? What is its variance? How are these related to $k/n$? – whuber Oct 26 '13 at 15:14
  • "*I have a sample n=100 with two "successes" (Two kids having a disease among 100). So we obviously have a binomial distribution*" -- this conclusion isn't obvious to me, since we haven't established homogeneity of probability, or independence (kids from one school, or one neighborhood, with a highly contagious disease, for example, wouldn't be expected have independent disease status). How did you arrive at this being obvious without making explicit assumptions to that effect? – Glen_b Oct 28 '13 at 00:47

1 Answer


Each child can be modeled as a Bernoulli r.v. $X_i$ with probability of having the disease equal to $p_i$, $X_i \sim B(p_i)$, $i=1,\dots ,n$. If you assume a) that $p_1 =p_2=\dots=p_n=p$ and b) that these are independent r.v.'s, then their joint density is

$$f(x_1,\dots,x_n) = \prod_{i=1}^{n}p^{x_i}(1-p)^{1-x_i}$$ and their log-likelihood function, viewed as a function of $p$, is

$$\ln L =\sum_{i=1}^{n}\left\{x_i\ln p+(1-x_i)\ln (1-p)\right\}$$
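Differentiating with respect to $p$ and setting the score equal to zero gives

$$\frac{\partial \ln L}{\partial p} = \frac{\sum_{i=1}^{n}x_i}{p} - \frac{n-\sum_{i=1}^{n}x_i}{1-p} = 0,$$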

which leads to the MLE for $p$, $$\hat p =\frac 1n\sum_{i=1}^{n}x_i,$$ which is unbiased, since $$E\hat p =\frac 1n\sum_{i=1}^{n}EX_i = \frac 1n np =p$$

Consider now the variable $$U_i = X_i - E(X_i) = X_i -p \Rightarrow X_i = U_i + p$$ We have $$EU_i = 0,\qquad Var(U_i) = Var(X_i) = p(1-p) $$ so the sequence is covariance-stationary.

Substitute for the $X_i$'s in the estimator,

$$\hat p =\frac 1n\sum_{i=1}^{n}(U_i+p) = \frac 1n\sum_{i=1}^{n}U_i +p,$$ and consider the quantity $$\sqrt n (\hat p-p) =\sqrt n\,\frac 1n\sum_{i=1}^{n}U_i= \frac {1}{\sqrt n}\sum_{i=1}^{n}U_i$$

Since the $U_i$'s are covariance-stationary (and evidently i.i.d.), the CLT certainly applies, and so

$$\sqrt n (\hat p-p) \rightarrow_d N\left (0, p(1-p)\right) $$
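To check this numerically, here is a minimal simulation sketch (assuming NumPy is available; $p = 0.02$ and $n = 100$ mirror the question, and the replication count is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 0.02, 100, 100_000

# Each row is one sample of n Bernoulli(p) draws; phat is the per-sample mean.
samples = rng.binomial(1, p, size=(reps, n))
phat = samples.mean(axis=1)

# The CLT says sqrt(n) * (phat - p) should have sd close to sqrt(p * (1 - p)).
scaled = np.sqrt(n) * (phat - p)
print("empirical sd:  ", scaled.std())
print("theoretical sd:", np.sqrt(p * (1 - p)))  # = 0.14 for p = 0.02
```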

For approximate statistical inference, we manipulate this expression by treating the limit as exact: $$ \sqrt n (\hat p-p) = Z \Rightarrow \hat p = \frac {1} {\sqrt n}Z +p, \qquad Z \sim N\left(0, p(1-p)\right)$$

and write that, for "large samples"

$$\hat p \sim_{approx} N\left (p, \frac {p(1-p)}{n}\right)$$

(but not when $n$ truly goes to infinity, since then $\hat p$ no longer has a non-degenerate distribution but collapses to a constant, the true value $p$, because $\hat p$ is a consistent estimator).
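As a usage sketch, plugging $\hat p$ in for $p$ gives the familiar approximate (Wald) standard error; a minimal Python illustration with the question's $k = 2$, $n = 100$ (the normal approximation is admittedly rough for a $\hat p$ this close to zero):

```python
import math

# Numbers from the question: k = 2 successes in n = 100 trials.
k, n = 2, 100
phat = k / n

# Plug-in (Wald) standard error: sqrt(p(1-p)/n) with p replaced by phat.
se = math.sqrt(phat * (1 - phat) / n)

# Approximate 95% interval from the normal approximation.
lo, hi = phat - 1.96 * se, phat + 1.96 * se
print(f"phat = {phat}, se = {se:.4f}, approx 95% CI = ({lo:.4f}, {hi:.4f})")
```

Note that the lower endpoint comes out negative here, which is one symptom of how strained the approximation is at these values.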

Alecos Papadopoulos
  • Thanks Alecos! In the meantime I got this as well, and it's nice that you confirm my results! :-) But there is something else I want to know: this is the approximation. What is the exact distribution of the estimated $p$? – Michael Oct 27 '13 at 16:46
    Michael, that's another question. You have the functional form of $\hat p$. Look up how we derive the distribution of a function of a _discrete_ random variable. And if you cannot solve this, post the question. – Alecos Papadopoulos Oct 27 '13 at 21:44