Calculating a Confidence Interval for a Proportion for a Sample of Different Size

Question

I'm interested in a (preferably analytic) solution or approximation to the following problem:

Let $s_1$ be a sample of size $N_1$ from an unknown distribution, with proportion of successes $p_1$. Let $s_2$ be an independent sample of size $N_2$ from the same distribution, with proportion of successes $p_2$. Given $N_1$, $p_1$, and $N_2$, can we calculate a confidence interval for $p_2$?

I would love a general purpose analytic solution if anyone has one, but for simplicity I am fine with considering the case where both $s_1$ and $s_2$ satisfy the conditions for their sampling distributions to be approximated by a Gaussian distribution.

Now, my approaches to solving this have led me to 2 options:

1. Find upper and lower bounds for the confidence interval of $p$ (the population proportion of "successes"), and plug these back into confidence intervals for $p_2$ using the sampling distribution for $p$ with size $N_2$. Then take the max and min of those intervals. Or
2. Treat $p$ as a normally distributed random variable with $\mu=p_1$ and $\sigma=\sqrt{\frac{p_1(1-p_1)}{N_1}}$, which would imply the CDF for $p_2$ can be found by:

$CDF(x) = \int_0^1{NormPDF(\frac{y-p_1}{\sqrt{\frac{p_1(1-p_1)}{N_1}}})\cdot NormCDF(\frac{x-y}{\sqrt{\frac{y(1-y)}{N_2}}})dy}$

where $NormPDF$ and $NormCDF$ are the PDF and CDF functions for the standard normal distribution.
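Even without a closed form, the integral in option 2 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `mixture_cdf`, the `steps` parameter, and the midpoint rule are illustrative choices, not part of the question. It also assumes the $NormPDF$ term is divided by $\sigma$ so that the weights integrate to roughly one, a scaling that is not written in the formula above.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF, expressed through erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mixture_cdf(x, p1, n1, n2, steps=4000):
    """Approximate CDF of p2 at x under option 2 (requires 0 < p1 < 1).

    Assumption: p ~ Normal(p1, sigma1), truncated to (0, 1) without
    renormalising, and p2 | p ~ Normal(p, sqrt(p(1-p)/n2)).
    """
    sigma1 = math.sqrt(p1 * (1.0 - p1) / n1)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h  # midpoints avoid y = 0 and y = 1, where sigma2 = 0
        sigma2 = math.sqrt(y * (1.0 - y) / n2)
        weight = norm_pdf((y - p1) / sigma1) / sigma1  # density of p at y
        total += weight * norm_cdf((x - y) / sigma2)
    return total * h
```

Evaluating `mixture_cdf` on a grid of `x` values and reading off the 2.5% and 97.5% points would give a numerical reference against which any analytic approximation could be compared.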

The problem with 1 is that the interval found will be much wider than I would ideally want (this is what I am currently using in my equations). The problem with 2 is that I have no idea how to convert this into an analytic function (perhaps through an approximation with $erf$, since I assume the integral has no analytic solution). My goal is to graph these intervals as a function of $p_1$ in Desmos, along with other sampling/prediction strategies for comparison; this is why I would really like an analytic solution or approximation.
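For concreteness, option 1 might look like the sketch below. It assumes Wald (normal-approximation) intervals at both stages and the same level for each; the name `nested_interval` and the `level` parameter are hypothetical. Because the two half-widths are stacked one after the other, they add roughly linearly, which is one way to see why the result comes out wide.

```python
import math
from statistics import NormalDist

def wald_bounds(p, n, z):
    """Normal-approximation bounds around p for a sample of size n."""
    half = z * math.sqrt(p * (1.0 - p) / n)
    return p - half, p + half

def nested_interval(p1, n1, n2, level=0.95):
    """Option 1: interval for p from s1, then intervals for p2 at each endpoint."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Stage 1: confidence bounds for the population proportion p.
    p_lo, p_hi = wald_bounds(p1, n1, z)
    p_lo, p_hi = max(0.0, p_lo), min(1.0, p_hi)
    # Stage 2: sampling intervals for p2 if p were at either endpoint.
    a_lo, a_hi = wald_bounds(p_lo, n2, z)
    b_lo, b_hi = wald_bounds(p_hi, n2, z)
    # Take the overall min and max, clipped to [0, 1].
    return max(0.0, min(a_lo, b_lo)), min(1.0, max(a_hi, b_hi))
```

Calling `nested_interval(p1, N1, N2)` over a range of `p1` values gives the curve that option 1 would plot.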

If someone can solve this, or point me in the right direction toward a solution, that would be greatly appreciated!

Are you looking for something like this: http://www.sthda.com/english/wiki/two-proportions-z-test-in-r ? See historic source: https://www.ncbi.nlm.nih.gov/pubmed/19978918 — Sextus Empiricus, Feb 12 '19 at 14:34
I think it's a *prediction interval* you're after - confidence intervals express uncertainty about parameters. See [Prediction interval for binomial random variable](https://stats.stackexchange.com/q/255570/17230). If error functions count as analytic for your purposes, then you should be able to get a fairly straightforward approximate result assuming normality throughout by working out the variance of $p_2-p_1$ & finding the pivotal quantity you can use to construct prediction intervals for $p_2$. — Scortchi - Reinstate Monica, Feb 12 '19 at 15:59
@Scortchi sounds like that answer solves the question I was trying to ask. You're right, I'm definitely looking for a prediction interval here, not a confidence interval. My mistake — rsmith49, Feb 12 '19 at 19:03
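The normal-approximation route suggested in the comments can be sketched as follows. For independent samples, $\operatorname{Var}(p_2-p_1)=p(1-p)\left(\frac{1}{N_1}+\frac{1}{N_2}\right)$; plugging in $p_1$ for the unknown $p$ (an assumption of this sketch, and a weak one for small samples or proportions near 0 or 1) gives the approximate interval $p_1 \pm z\sqrt{p_1(1-p_1)\left(\frac{1}{N_1}+\frac{1}{N_2}\right)}$, which is closed-form and so should be straightforward to graph. The function name `prediction_interval` and the `level` parameter are hypothetical.

```python
import math
from statistics import NormalDist

def prediction_interval(p1, n1, n2, level=0.95):
    """Approximate prediction interval for p2 given p1, n1 and n2.

    Assumption: (p2 - p1) / se is treated as standard normal, with the
    unknown p replaced by the observed p1 in the standard error.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided critical value
    se = math.sqrt(p1 * (1.0 - p1) * (1.0 / n1 + 1.0 / n2))
    # Clip to [0, 1] because a proportion cannot fall outside that range.
    return max(0.0, p1 - z * se), min(1.0, p1 + z * se)
```

As $N_2$ grows, the $1/N_2$ term vanishes and the interval shrinks toward the usual Wald confidence interval for $p$ based on $s_1$ alone.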
