1

For two sequences of random variables $X_1,\dots, X_n\sim_{iid} \text{Bernoulli}(p)$ and $Y_1,\dots, Y_n\sim_{iid} \text{Bernoulli}(q)$ (with all $X_i$ and $Y_j$ independent), the CLT gives $$ \sqrt{n}(\bar{X}_n-p)\to N(0,p(1-p)) $$ and $$ \sqrt{n}(\bar{Y}_n-q)\to N(0,q(1-q)). $$ Using the Delta method, we get $$ \sqrt{n}\left(\log\frac{\bar{X}_n}{1-\bar{X}_n}-\log \frac{p}{1-p}\right)\to N\left(0, \frac{1}{p(1-p)}\right). $$

(1) Question: What is the distribution of $\sqrt{n}\left\{\left(\log\frac{\bar{X}_n}{1-\bar{X}_n}-\log \frac{p}{1-p}\right)-\left(\log\frac{\bar{Y}_n}{1-\bar{Y}_n}-\log \frac{q}{1-q}\right)\right\}$?

Do we have $$\sqrt{n}\left\{\left(\log\frac{\bar{X}_n}{1-\bar{X}_n}-\log \frac{p}{1-p}\right)-\left(\log\frac{\bar{Y}_n}{1-\bar{Y}_n}-\log \frac{q}{1-q}\right)\right\}\to N\left(0, \frac{1}{p(1-p)}-\frac{1}{q(1-q)}\right)?$$ If not, how do we get it?

I think I am right. (2) Question: what is a consistent estimator of $\frac{1}{p(1-p)}-\frac{1}{q(1-q)}$?

Bob
  • Suppose $q=1/2:$ have you noticed your suggested variance of $p(1-p)-q(1-q)$ is *negative?* – whuber Feb 03 '22 at 18:16
  • @whuber I edited my question. Thanks! – Bob Feb 03 '22 at 18:22
  • 1
    Now suppose $p=1/2:$ your suggested variance of $1/(p(1-p))-1/(q(1-q))$ again is negative. It sounds like you ought to begin with https://stats.stackexchange.com/questions/26886. – whuber Feb 03 '22 at 18:37
  • Maybe! It depends on whether the $Y_i$ are independent of the $X_j.$ Your phrase "two independent sequence[s]" is ambiguous. Perhaps you mean that *everything* is independent? If so, have you thought of applying the CLT directly to the sequence $X_n-Y_n$? – whuber Feb 03 '22 at 18:53
  • 2
    The title asks for a different distribution than the text. Which one is the problem that you wish to get answered in this question?$$\sqrt{n}(\bar{X}_n-p)-\sqrt{n}(\bar{Y}_n-q)$$ versus $$\sqrt{n}\left\{\left(\log\frac{\bar{X}_n}{1-\bar{X}_n}-\log \frac{p}{1-p}\right)-\left(\log\frac{\bar{Y}_n}{1-\bar{Y}_n}-\log \frac{q}{1-q}\right)\right\}$$ – Sextus Empiricus Feb 03 '22 at 19:25
  • Some parts of your text make the question difficult to read. I understand that 'Beroulli' should be 'Bernoulli', but what is $\bar{X}_n$ supposed to mean? I guess it is the mean of a sample of $n$ different $X_i$. (My confusion is that the subscript $n$ occurs both as the $n$-th member of the sequence $X_1,X_2,\dots$ and in the mean of the $X_i$.) – Sextus Empiricus Feb 03 '22 at 19:33
  • 1
    What if $\bar{X}_n = 1$? Then the division $\frac{\bar{X}_n}{1-\bar{X}_n}$ is undefined (division by zero). Why do you make this transformation? – Sextus Empiricus Feb 03 '22 at 19:35
  • Now that you have changed the question so substantially, the suggestion to analyze $X_n-Y_n$ is irrelevant. – whuber Feb 03 '22 at 19:56

2 Answers

3

You're almost correct. As a reminder, you have to assume that $p, q \in (0, 1)$ so that the log-odds are defined.

Assume two independent sequences of random variables, $A_n$ and $B_n$, converge in distribution to $A$ and $B$: $A_n \rightarrow A$ and $B_n \rightarrow B$. Then for any real numbers, $c$ and $d$, we have $c \cdot A_n + d \cdot B_n \rightarrow c \cdot A + d \cdot B$.

In your example, take $$A_n := \sqrt{n}\left\{\log\frac{\bar{X}_n}{1-\bar{X}_n}-\log \frac{p}{1-p}\right\},$$ $$B_n := \sqrt{n}\left\{\log\frac{\bar{Y}_n}{1-\bar{Y}_n}-\log \frac{q}{1-q}\right\}.$$ Then $A_n - B_n \rightarrow A - B$, where $A \sim N\left\{0, \frac{1}{p(1-p)}\right\}$ and $B \sim N\left\{0, \frac{1}{q(1-q)}\right\}$ are independent.

You made a mistake in calculating the variance. For independent random variables $A$ and $B$, $\operatorname{var}(A - B) = \operatorname{var}(A) + \operatorname{var}(B)$. So the difference converges to $$N\left\{0, \frac{1}{p(1-p)} + \frac{1}{q(1-q)}\right\}.$$
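A quick Monte Carlo sketch can check this limit numerically (the values of $p$, $q$, $n$, and the replication count are arbitrary illustrative choices, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n, reps = 0.3, 0.6, 2000, 4000  # arbitrary illustrative values

# Draw `reps` independent samples of size n and form the centered logit difference.
x_bar = rng.binomial(n, p, size=reps) / n
y_bar = rng.binomial(n, q, size=reps) / n
logit = lambda t: np.log(t / (1 - t))
diff = np.sqrt(n) * ((logit(x_bar) - logit(p)) - (logit(y_bar) - logit(q)))

# Variances add (they do not subtract) for a difference of independent terms.
theory = 1 / (p * (1 - p)) + 1 / (q * (1 - q))
print(diff.var(), theory)
```

The empirical variance of `diff` should land near `theory`, not near the difference of the two reciprocal variances.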

I see two ways to get a consistent estimate of the variance, $\frac{1}{p(1-p)} + \frac{1}{q(1-q)}$.

1. Multivariate Delta Method

Since we're assuming $\bar{X}_n$ and $\bar{Y}_n$ are $\sqrt{n}$ consistent estimators of $p$ and $q$, we know that

$$\sqrt{n} \begin{bmatrix} \bar{X}_n - p \\ \bar{Y}_n - q\ \end{bmatrix} \rightarrow N(0, \Sigma), \text{where}$$ $$\Sigma = \begin{bmatrix} p(1-p) & 0 \\ 0 & q(1-q) \\ \end{bmatrix}. $$

You can definitely estimate $\Sigma$ since it's the variance of two independent Binomial random variables. Plug $\hat{p}$ and $\hat{q}$ into $\Sigma$ to get $\hat{\Sigma}$.

Let $h(p, q) = \log\left(\frac{p}{1-p}\right) - \log\left(\frac{q}{1-q}\right)$ and let $\nabla h(p, q)$ be the gradient of $h$.

The Delta Method gives the asymptotic distribution and covariance matrix: $$\sqrt{n} \{h(\bar{X}_n, \bar{Y}_n) - h(p, q)\} \rightarrow N \{0, \nabla h(p, q)^T \cdot \Sigma \cdot \nabla h(p, q) \}$$

A consistent estimate of the variance is given by plugging $\hat{\Sigma}$ into the expression for the variance above.
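As a small numeric sketch (with hypothetical sample proportions standing in for $\hat{p}$ and $\hat{q}$), the sandwich form $\nabla h^T \Sigma \nabla h$ collapses to the sum of reciprocal variances:

```python
import numpy as np

p_hat, q_hat = 0.3, 0.6  # hypothetical sample proportions

# Plug-in covariance matrix and gradient of h(p, q) = logit(p) - logit(q).
Sigma_hat = np.diag([p_hat * (1 - p_hat), q_hat * (1 - q_hat)])
grad_h = np.array([1 / (p_hat * (1 - p_hat)), -1 / (q_hat * (1 - q_hat))])

# Sandwich form: grad^T . Sigma . grad.
var_hat = grad_h @ Sigma_hat @ grad_h
closed_form = 1 / (p_hat * (1 - p_hat)) + 1 / (q_hat * (1 - q_hat))
print(var_hat, closed_form)
```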

2. Invariance Property of the Maximum Likelihood Estimate (MLE)

The invariance property of the MLE states that if $\hat{\theta}$ is an MLE of $\theta$, then $f(\hat{\theta})$ is an MLE of $f(\theta)$. So plug $\hat{p} = \frac{1}{n} \sum_i X_i$ and $\hat{q} = \frac{1}{n} \sum_i Y_i$ into $f(p, q) = \frac{1}{p(1-p)} + \frac{1}{q(1-q)}$.
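A minimal sketch of this plug-in estimator, using simulated Bernoulli data with arbitrary true values of $p$ and $q$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 0.3, 0.6, 100_000  # arbitrary illustrative values

x = rng.binomial(1, p, size=n)
y = rng.binomial(1, q, size=n)
p_hat, q_hat = x.mean(), y.mean()  # MLEs of p and q

# Invariance: plug the MLEs into f(p, q) = 1/(p(1-p)) + 1/(q(1-q)).
f_hat = 1 / (p_hat * (1 - p_hat)) + 1 / (q_hat * (1 - q_hat))
truth = 1 / (p * (1 - p)) + 1 / (q * (1 - q))
print(f_hat, truth)
```

By consistency of $\hat{p}$ and $\hat{q}$ and the continuous mapping theorem, `f_hat` converges to `truth` as $n$ grows.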

Eli
  • Thanks! For the consistent-estimate part, can I say that since $\bar{X}_n\to p$ and $\bar{Y}_n\to q$ in probability, $\bar{X}_n$ and $\bar{Y}_n$ are consistent estimators, and then plug them into $f(p,q)$ to get a consistent estimator? – Bob Feb 03 '22 at 23:21
  • Yes to your first question, though I wasn’t thinking of it that way. The sample mean is the maximum likelihood estimate of $p$ in a $Binom(n, p)$, so you know it’s consistent. – Eli Feb 04 '22 at 04:08
  • @Bob: Try to solve it the same way, & post a new question if you should get stuck at some point. – Scortchi - Reinstate Monica Feb 04 '22 at 08:44
  • @Bob: Please do not ask new questions in a comment, but as a new Question! – kjetil b halvorsen Feb 04 '22 at 13:19
3

I'm answering the title question; the question in the body is unclear to me.


Using the rules for a variance operator we have:

$$\begin{align} \mathbb{V}(\sqrt{n} (\bar{X}_n-p) - \sqrt{n} (\bar{Y}_n-q)) &= \mathbb{V}(\sqrt{n} (\bar{X}_n-p)) + \mathbb{V}(\sqrt{n} (\bar{Y}_n-q)) \\[14pt] &= n \cdot \mathbb{V}(\bar{X}_n-p) + n \cdot \mathbb{V}(\bar{Y}_n-q) \\[14pt] &= n \cdot \mathbb{V}(\bar{X}_n) + n \cdot \mathbb{V}(\bar{Y}_n) \\[8pt] &= n \cdot \frac{p(1-p)}{n} + n \cdot \frac{q(1-q)}{n} \\[6pt] &= p(1-p) + q(1-q). \\[6pt] \end{align}$$

Since linear functions of normal random variables are normal random variables, you have the limiting distribution:

$$\sqrt{n} (\bar{X}_n-p) - \sqrt{n} (\bar{Y}_n-q) \overset{\text{approx}}{\sim} \text{N}(0, p(1-p) + q(1-q)).$$
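A short simulation sketch (with arbitrary illustrative parameter values) confirms the variance derivation above:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, n, reps = 0.3, 0.6, 500, 5000  # arbitrary illustrative values

# Sample means of n Bernoulli draws, repeated `reps` times.
x_bar = rng.binomial(n, p, size=reps) / n
y_bar = rng.binomial(n, q, size=reps) / n
z = np.sqrt(n) * ((x_bar - p) - (y_bar - q))

# The variance computation gives p(1-p) + q(1-q) exactly, for every n.
theory = p * (1 - p) + q * (1 - q)
print(z.mean(), z.var(), theory)
```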

Ben