Disclaimer: I am quite sceptical about statistical inference, its tests and some of its concepts (sample size amongst them). I always recommend this article by Breiman to anyone working with data. My answer comes from a practitioner's perspective, and I would never use it in a maths exam (which is how I obtained my degree).
Regarding your particular problem, I believe the issue is not so much sample size as experiment design and sampling scheme.
You should consider a few things before deciding on your sampling:
- How costly is it for you to gather data?
- Does your process change over time?
- Do you know beforehand of any variables affecting production?
I work in the manufacturing sector, and the most important constraint we face is often the first one. If gathering data were free and you could measure your variable of interest on every unit, you would not need statistics at all.
The second point is relevant in factories, as machines and their components tend to deteriorate over time. This means that sampling today is not the same as sampling next week: the samples do not come from the same population, as the short sketch below illustrates.
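As a quick illustration (the weekly rates below are made up), pooling samples from several weeks of a drifting process overstates today's performance compared with using only the most recent week:

# Hypothetical weekly defect-free rates for a slowly deteriorating process
set.seed(1)
rates  <- c(0.99, 0.98, 0.96, 0.95)
weekly <- lapply(rates, function(p) rbinom(100, 1, p))
mean(unlist(weekly))   # pooled estimate over the whole month
mean(weekly[[4]])      # estimate from the latest week only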
The third point is fuzzier, but there are usually some production variables that are known in advance and should be taken into account when designing the sampling. For instance, I recall finding a 5% increase in the active power drawn by a machine whenever a scrap-melting oven was working at the same time.
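If you know such a variable in advance, record it alongside each sample so you can estimate its effect instead of letting it confound your results. A minimal sketch, with a hypothetical oven_on indicator and a fabricated 5% shift:

# Hypothetical example: a known production variable (oven on/off) shifts the
# measured active power by about 5%; recording it lets you compare strata
set.seed(2)
oven_on <- rbinom(200, 1, 0.3)                         # assumed oven schedule
power   <- rnorm(200, mean = 100 * (1 + 0.05 * oven_on), sd = 2)
tapply(power, oven_on, mean)                           # mean power with oven off / on
# or, equivalently, lm(power ~ oven_on) to estimate the shift directly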
All that said, if your process changes over time, I think the best thing to do is to sample periodically and as randomly as you can. Once you have set a sampling schedule, a simple Bayesian Beta-Bernoulli model handles incremental data quite efficiently and takes advantage of prior samples.
I post a simple implementation of the Beta-Bernoulli in R as an example. There is an excellent post on Bayesian statistics using a Beta-Binomial here.
# Process performance (probability of a good widget) at the beginning
init.process <- 0.99
# deterioration factor: the true rate decays a little every period
deterioration.factor <- 1.01
# alpha and beta priors
alpha <- 0.5
beta  <- 0.5
# forgetting rate, included so that new observations
# have more weight in the posterior distribution
lambda <- 1.5
# for reproducibility
set.seed(13)
# use the samples as they are collected at each time t
for (t in 0:10) {
  # real parameter
  p <- init.process / deterioration.factor^t
  q <- 1 - p
  # sample 100 widgets (1 = good, 0 = defective)
  sampt <- sample(2, 100, replace = TRUE, prob = c(q, p)) - 1L
  # posterior alpha and beta (conjugate Beta-Bernoulli update)
  alpha <- alpha + sum(sampt == 1L)
  beta  <- beta + sum(sampt == 0L)
  # get the 5th percentile of the posterior
  perc <- qbeta(0.05, alpha, beta)
  print(paste("The real parameter is", round(p, 3)))
  print(paste("Your parameter is larger than", round(perc, 3),
              "with 95% probability"))
  # reduce the weight of old observations before the next period
  alpha <- alpha / lambda
  beta  <- beta / lambda
}
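A note on the forgetting rate: dividing the counts by lambda each period means the pseudo-counts settle at roughly n / (lambda - 1), here 100 / 0.5 = 200, so the prior entering each new period carries about the weight of the last two sampling periods. Set lambda closer to 1 for a longer memory; with lambda = 1 you recover the standard conjugate update that never forgets.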