
I'm comparing two different populations with unequal variances and non-normal distributions.

For sample #1 I'm drawing a random sample of size $n=30$ from a population of 200. For sample #2 I'm drawing a random sample of size $n=30$ from a population of 840. Since the two samples have unequal variances, I'm using Welch's t-test (the unequal-variance t-test).
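For concreteness, here is a minimal sketch of the statistic behind that test, using only the standard library. The data are made-up skewed placeholders, not my real data, and `welch_t` is a name I chose for illustration. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` should give the same statistic along with a p-value.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # statistics.variance uses the n - 1 denominator (sample variance)
    se2_a = statistics.variance(a) / na
    se2_b = statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Placeholder populations (assumption: skewed, unequal variances)
rng = random.Random(0)
pop1 = [rng.expovariate(1 / 5) for _ in range(200)]
pop2 = [rng.expovariate(1 / 8) for _ in range(840)]

# One random sample of n = 30 from each population, without replacement
s1 = rng.sample(pop1, 30)
s2 = rng.sample(pop2, 30)

t_stat, df = welch_t(s1, s2)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The degrees of freedom are not an integer in general; with two samples of 30 they fall between 29 and 58.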

Is it a problem that the population behind sample #1 is only 200? Should I just use $n=20$ instead? I read that a random sample of $n=20$ should come from a population ten times that size.
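To make the "ten times" rule concrete, here is a small sketch that computes the sampling fraction $n/N$ and the standard finite population correction factor $\sqrt{(N-n)/(N-1)}$ for the standard error of a mean. It assumes simple random sampling without replacement from a truly finite population; the function names are mine.

```python
import math


def sampling_fraction(n, N):
    """Share of the population that ends up in the sample."""
    return n / N


def fpc(n, N):
    """Finite population correction factor for the standard error of a mean
    under simple random sampling without replacement."""
    return math.sqrt((N - n) / (N - 1))


# Population sizes from the question; n = 20 included for comparison
for n, N in [(30, 200), (20, 200), (30, 840)]:
    print(f"n={n}, N={N}: fraction={sampling_fraction(n, N):.3f}, fpc={fpc(n, N):.3f}")
```

A factor close to 1 means the finite population size barely changes the standard error; the "ten times" rule amounts to keeping the sampling fraction at or below 10%.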

I also read that normality of the underlying distribution is irrelevant when the sample size is $n>20$ (but I have also seen this stated for $n>30$), so I'm a bit lost.

What's a good rule of thumb for population size when I'm randomly drawing samples of $n=30$? I'm doing all of this in Python.

kjetil b halvorsen
  • Welcome to CV. Ignore what you have read: the stuff you quote is, to put it mildly, hardly even good rules of thumb. What matters most is [how skewed the population distributions are.](https://stats.stackexchange.com/questions/69898) What might be of greater concern is whether you truly have finite populations: the vast majority of questions we receive about finite populations turn out to be based on misconceptions. To avoid the possibility of that error, could you give us a brief explanation of what these populations are and how you have sampled them? – whuber May 08 '19 at 22:00
  • Wow, thanks for your response whuber! Essentially, I joined 3 datasets (health code violations, Yelp data and financial data). I inner joined the sets and ended up with 906 matching restaurants, so I have a population of 906 restaurants in Las Vegas. I parsed the reviews for the term "hole in the wall" (hitw) and divided the 906 restaurants into two groups: hitw (201 restaurants) and non-hitw (705 restaurants). I'm using bootstrapping to randomly draw, with replacement, 30 samples at a time, and I'm taking the average of the p-values I've calculated. My hypothesis question is "are hitw restaurants really less clean?" – Khalid Rahman May 08 '19 at 22:29
  • "I'm using bootstrapping to randomly draw and replace 30 samples at a time and I'm taking the average p value that I've calculated": you should post a question asking whether this is a good approach (and what you could do instead) – Glen_b May 09 '19 at 02:33
  • Hey thanks Glen! I'll post that instead. I think you're right. – Khalid Rahman May 09 '19 at 15:03
  • I found this population calculator (linked below) which appears to calculate population size. Very cool. https://select-statistics.co.uk/calculators/sample-size-calculator-population-proportion/ – Khalid Rahman May 09 '19 at 20:48
  • @Khalid Rahman: It doesn't (and cannot) calculate the population size; that is one of its inputs ... – kjetil b halvorsen Feb 03 '20 at 12:50
