Here's my scenario. I take 1000 samples each from my control and treatment groups. I find that the difference in means is not statistically significant (stat-sig) under a paired t-test. I then add 1000 more samples each to control and treatment, and test the entire batch of 2000 pairs for significance. I repeat this up to exactly 5 times (until I have 5000 samples each for control and treatment), stopping early if the p-value is stat-sig. Is this p-hacking? If so, is there something like the Benjamini-Hochberg correction that I should apply every time I calculate the p-value for an augmented batch (the original batch plus 1000 new samples)?
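To make my worry concrete, here is a rough simulation I sketched of the procedure. Everything in it is my own toy setup, not real data: I assume normally distributed paired observations with zero true effect, test at a nominal alpha of 0.05 on every look, and the batch size, number of looks, seed, and simulation count are arbitrary choices.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)   # arbitrary seed for reproducibility
ALPHA = 0.05
BATCH = 1000      # samples added per look
MAX_LOOKS = 5     # test after 1000, 2000, ..., 5000 pairs
N_SIMS = 2000     # number of simulated experiments

false_positives = 0
for _ in range(N_SIMS):
    # The null is true: control and treatment come from the same distribution.
    control = rng.normal(size=BATCH * MAX_LOOKS)
    treatment = rng.normal(size=BATCH * MAX_LOOKS)
    for look in range(1, MAX_LOOKS + 1):
        n = look * BATCH
        _, p = ttest_rel(control[:n], treatment[:n])
        if p < ALPHA:            # stop at the first significant look
            false_positives += 1
            break

print(f"overall type I error ~ {false_positives / N_SIMS:.3f}")
```

When I run something like this, the overall false-positive rate comes out well above the nominal 5%, which is what makes me suspect the procedure needs some kind of correction.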
More details on why I don't simply start with 5000 samples or use a power analysis to determine my sample size:
- After sampling, each sample requires a manual evaluation (I can't go into the details, but it is something that cannot be automated). This is expensive, so if I can get away with a smaller batch size, I save money and time. That is my main motivation for checking whether there is a stat-sig difference between control and treatment on a smaller batch before I add more samples.
- I don't know the effect size and have no way to estimate it, so using a power analysis to pick a sample size up front seems to be a no-go.