0

Edited: The respondents in our research are deaf employees. Because a large sample is difficult to obtain, we are hoping for n = 40 respondents, and we would like to know whether that is enough to support reasonable statistical inference and prediction using simple linear regression. How many participants are needed for the sample to give a good prediction? As far as I know, n = 30 is said to be the smallest good sample size for a correlation (is that true?). What about for simple linear regression?

M. Cris
  • 300
  • 2
  • 8
A Awoo
  • 1
  • 4

2 Answers

1

The answer to your question is hotly debated, and it depends on the underlying distribution of the variable you are sampling.

If the variable has an underlying density with a tight, high peak and no outliers, then a smaller sample will likely capture the characteristics of the variable in question. Think of a mean-zero Gaussian with very low variance. However, if you have a heavy-tailed distribution with low-probability, high-impact events, you will need many more samples to capture that. S&P500 day-over-day changes are an example of this.
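To make the contrast concrete, here is a minimal simulation sketch using only the Python standard library. It repeatedly draws samples of size n = 40 and measures how much the sample standard deviation, relative to the true one, varies from sample to sample. The Student t distribution with 3 degrees of freedom is only an assumed stand-in for "heavy-tailed" (it is not S&P500 data), and the function names, seed, and replication count are illustrative choices.

```python
import math
import random
import statistics


def draw_gaussian(rng, n, sigma=0.1):
    """Mean-zero Gaussian with a small standard deviation (tight peak)."""
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def draw_student_t(rng, n, df=3):
    """Student t draws built as Z / sqrt(chi2 / df).

    A chi-square variable with df degrees of freedom is a Gamma variable
    with shape df / 2 and scale 2.
    """
    return [
        rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
        for _ in range(n)
    ]


def relative_sd_spread(draw, true_sd, n=40, reps=2000, seed=0):
    """Spread across replicates of (sample SD / true SD).

    A larger value means a single sample of size n pins down the
    variability of the distribution less reliably.
    """
    rng = random.Random(seed)
    ratios = [statistics.stdev(draw(rng, n)) / true_sd for _ in range(reps)]
    return statistics.stdev(ratios)


# True SD of the Gaussian is 0.1; true SD of t with 3 df is sqrt(3 / (3 - 2)).
gauss_spread = relative_sd_spread(draw_gaussian, true_sd=0.1)
heavy_spread = relative_sd_spread(draw_student_t, true_sd=math.sqrt(3.0))

print(f"Gaussian, n=40: spread of relative SD estimate = {gauss_spread:.3f}")
print(f"t(3),     n=40: spread of relative SD estimate = {heavy_spread:.3f}")
```

Dividing by the true standard deviation keeps the comparison scale-free, so the difference reflects tail behaviour rather than the overall size of the values.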

Here is a link to a "sample size calculator," in which I do not have much confidence, as I have not tested it and I do not believe in formulas for calculating a sample size. You may also want to look at this thread, which contains many good answers to your question: some analyze $R^2$ values, some use rules of thumb, and others use other analytical means.

ERT
  • 1,265
  • 3
  • 15
1

If you are using R, you can run a power analysis. Also, the "n = 30" rule is an oversimplification of the classical CLT, and you should not apply it blindly.
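As a rough illustration of what such a power analysis computes, here is a minimal Python sketch rather than R. It assumes the Fisher z approximation for a two-sided test of a Pearson correlation, which in simple linear regression is equivalent to testing whether the slope is zero. The function names and the defaults (alpha = 0.05, power = 0.80) are illustrative assumptions, not values taken from the question.

```python
import math
from statistics import NormalDist


def required_n(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a true correlation r.

    Fisher z approximation: n = ((z_{1-alpha/2} + z_{power}) / atanh(r))^2 + 3.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3)


def detectable_r(n, alpha=0.05, power=0.80):
    """Smallest correlation detectable with the given power at sample size n."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.tanh((z_alpha + z_power) / math.sqrt(n - 3))


for r in (0.1, 0.3, 0.5):
    print(f"true r = {r}: approximate n needed = {required_n(r)}")

print(f"n = 40: smallest detectable r = {detectable_r(40):.2f}")
```

Under this approximation, n = 40 gives 80% power only for correlations of roughly 0.43 or larger, so whether 40 is "enough" depends on how strong the relationship is expected to be.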

M. Cris
  • 300
  • 2
  • 8
  • I think "oversimplification" is too generous. No version of the CLT has anything to do with n=30. There's no mention of anything but $n\to\infty$ in any CLT. – Glen_b Jul 26 '18 at 02:06