For example, suppose I have two Bernoulli random variables and want to test the alternative $p_1 < p_2$ against the null $p_1 \geq p_2$. I take some samples from each distribution and find that the empirical proportions satisfy $\hat{p}_1 < \hat{p}_2$, but the test doesn't reject the null because I didn't take enough samples to establish significance. I then gather more samples and perform the same test at the same significance level, using both my old samples and my new ones, and this time it does reject the null.
Is this "algorithm" kosher? Or can the fact that I gathered the new samples only because the old ones satisfied $\hat{p}_1 < \hat{p}_2$ and the initial test failed to reject somehow affect the correctness of the procedure? If it does, is there still a problem if I discard the old samples and use only the new ones for the next test?
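One way to probe this empirically is a small simulation at the boundary of the null, $p_1 = p_2$, which counts how often each version of the procedure ends in a rejection. The sketch below is only illustrative and rests on assumptions not stated above: it uses a one-sided pooled two-proportion z-test, equal sample sizes in both arms and both stages, and hypothetical names (`rejects`, `draw`, `simulate`) and default settings chosen for the example.

```python
import math
import random
from statistics import NormalDist


def rejects(x1, n1, x2, n2, alpha):
    """One-sided pooled two-proportion z-test of H0: p1 >= p2 vs H1: p1 < p2.

    Assumed test choice; the question does not name a specific test.
    """
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return False  # all successes or all failures: statistic undefined
    z = (x1 / n1 - x2 / n2) / se
    return z < NormalDist().inv_cdf(alpha)  # reject for large negative z


def draw(rng, n, p):
    """Number of successes in n independent Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate(trials=5000, n=50, p=0.5, alpha=0.05, seed=0):
    """Estimate rejection rates when p1 = p2 = p (boundary of the null)."""
    rng = random.Random(seed)
    first = pooled = fresh = 0
    for _ in range(trials):
        x1, x2 = draw(rng, n, p), draw(rng, n, p)
        if rejects(x1, n, x2, n, alpha):
            # The first test already rejected, so every variant rejects.
            first += 1
            pooled += 1
            fresh += 1
            continue
        if x1 >= x2:
            continue  # estimates point the wrong way: no second stage
        # Second stage, reached only if p1_hat < p2_hat without rejection.
        y1, y2 = draw(rng, n, p), draw(rng, n, p)
        pooled += rejects(x1 + y1, 2 * n, x2 + y2, 2 * n, alpha)  # old + new
        fresh += rejects(y1, n, y2, n, alpha)  # new samples only
    return {
        "first_test_only": first / trials,
        "pooled_retest": pooled / trials,
        "fresh_retest": fresh / trials,
    }


if __name__ == "__main__":
    for name, rate in simulate().items():
        print(f"{name}: {rate:.4f}")
```

Comparing each printed rate with `alpha` indicates whether the procedure as a whole keeps its nominal level. By construction, the two retest rates can never fall below the first-test rate, since they count every first-stage rejection plus any rejection obtained in the second stage.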