Assume $n$ samples from a Bernoulli distribution with unknown parameter $p$, i.e.
$x_{1},\dots,x_{n}\underset{iid}{\sim}\mathrm{Ber}\left(p\right)$
It is known that an approximate $\left(1-\alpha\right)$ confidence interval for $p$ (the Wald interval) is given by:
$CI=\hat{p}\pm z_{1-\frac{\alpha}{2}}\cdot\sqrt{\hat{p}\left(1-\hat{p}\right)/n}$
where $\hat{p}=\frac{1}{n}\sum_{i=1}^{n}x_{i}$
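For context, my understanding of where this interval comes from: by the CLT,
$\frac{\hat{p}-p}{\sqrt{p\left(1-p\right)/n}}\xrightarrow{d}N\left(0,1\right)$
and the unknown $p\left(1-p\right)$ is then replaced by the plug-in estimate $\hat{p}\left(1-\hat{p}\right)$, a substitution that Slutsky's theorem justifies when the plug-in is consistent, so consistency seems to be exactly what the formula relies on.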
This means that the estimator used in this formula for the variance $p\left(1-p\right)$ of a single observation is:
$s^2 = \hat{p}\left(1-\hat{p}\right) $
The latter is not obvious to me. I have managed to prove that this estimator is biased for every finite sample size (computation below), but I have failed to prove consistency (a quick numerical check, sketched below, is what makes me suspect it holds).
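For completeness, the bias computation I have in mind: using $E\left[\hat{p}\right]=p$ and $\operatorname{Var}\left(\hat{p}\right)=p\left(1-p\right)/n$,
$E\left[\hat{p}\left(1-\hat{p}\right)\right]=E\left[\hat{p}\right]-E\left[\hat{p}^{2}\right]=p-\frac{p\left(1-p\right)}{n}-p^{2}=\frac{n-1}{n}\,p\left(1-p\right)$
which is strictly smaller than $p\left(1-p\right)$ for every finite $n$ (when $0<p<1$).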
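And by a numerical check I mean a minimal sketch along these lines (assuming numpy; the true $p=0.3$ is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3  # hypothetical true parameter, chosen arbitrarily

# If the plug-in estimator is consistent, p_hat*(1 - p_hat) should
# approach p*(1 - p) as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    x = rng.binomial(1, p, size=n)  # n iid Ber(p) draws
    p_hat = x.mean()
    print(n, p_hat * (1 - p_hat), p * (1 - p))
```

If consistency holds, the printed values should settle near $p\left(1-p\right)=0.21$ as $n$ grows, which is what I observe, but of course that is not a proof.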
Since it is not clear (to me) whether this estimator is consistent, and it is clearly biased, why should we use it rather than the usual unbiased estimator, $\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{i}-\hat{p}\right)^{2}=\frac{n}{n-1}\,\hat{p}\left(1-\hat{p}\right)$?
Is there a rigorous mathematical reason for this? (A reference would be great as well.) Or am I misunderstanding something else?