I have a data set of roughly 10,000 observations split across 64 categories. The mean of most categories is close to the mean of the entire data set, but some differ noticeably.
If I understand correctly, I can apply a t-test to determine whether a category's mean differs significantly from the rest of the data. However, some categories are very small relative to the whole (< 50 observations), which, again if I understand correctly, reduces the power of the t-test and makes the resulting p-value less reliable.
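To make the setup concrete, here is a minimal sketch of the comparison I have in mind, using simulated data (the group size of 40 and the distributions are hypothetical, just to illustrate the shape of the problem). I use Welch's t-test via `scipy.stats.ttest_ind` since I am unsure the variances match:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: ~10,000 observations, one small category of 40
rest = rng.normal(loc=0.0, scale=1.0, size=9960)   # all other categories
small_group = rng.normal(loc=0.5, scale=1.0, size=40)  # category under test

# Welch's t-test (equal_var=False), which does not assume equal variances
t_stat, p_value = stats.ttest_ind(small_group, rest, equal_var=False)
print(t_stat, p_value)
```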
One source suggests that a solution to this is to "monte carlo the data", which I interpret as repeatedly sampling from the 10k data set (excluding the category under test) to build synthetic samples of the same size as the category, and running the t-test against each. I presume I then take the mean of those p-values as a more reliable p-value. Is this the correct approach?
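For reference, here is a sketch of what I understand a Monte Carlo resampling test to look like in this situation. Note that this version compares the observed group mean directly against the means of many same-sized random subsamples, rather than averaging t-test p-values; I am not sure which of the two the source intends. All names and sizes here are my own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: one small category of 40 plus the remaining observations
data = rng.normal(size=10_000)
group = data[:40]   # the category under test (n = 40)
rest = data[40:]    # everything else

observed = group.mean()
n, n_iter = len(group), 2_000

# Repeatedly draw random subsamples of the same size from the rest of
# the data and record their means.
sim_means = np.array([rng.choice(rest, size=n, replace=False).mean()
                      for _ in range(n_iter)])

# Two-sided Monte Carlo p-value: the fraction of simulated means at
# least as far from the rest-of-data mean as the observed group mean.
center = rest.mean()
p_mc = np.mean(np.abs(sim_means - center) >= abs(observed - center))
print(p_mc)
```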
If so, there is also the question of establishing whether the variances are equivalent. Should I run Levene's test on the real sample vs. the synthetic sample and use that result to decide which form of the t-test to run?
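What I mean by "feed that result into the t-test" is sketched below: run `scipy.stats.levene` first, then use its outcome to choose between the pooled-variance Student's t-test and Welch's t-test. The data, sizes, and the 0.05 cutoff are illustrative assumptions on my part:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group = rng.normal(0.0, 2.0, size=40)    # small category, larger spread
rest = rng.normal(0.0, 1.0, size=1000)   # comparison sample

# Levene's test for equality of variances
lev_stat, lev_p = stats.levene(group, rest)

# Use the result to pick the t-test variant: pooled-variance Student's t
# if the variances look equal, Welch's t otherwise.
equal_var = bool(lev_p > 0.05)
t_stat, p = stats.ttest_ind(group, rest, equal_var=equal_var)
print(lev_p, equal_var, p)
```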
(I have read How should one interpret the comparison of means from different sample sizes?)