I have a single variable that holds my population values (an excerpt of the data is shown below):
[1] 94.51 59.81 63.84 94.51 94.51 94.51 94.51 94.51 94.51 94.51
[11] 59.81 94.51 94.51 94.51 47.90 29.16 50.36 23.51 44.41 33.14
[21] 47.90 29.16 47.90 29.16 47.90 29.16 47.90 29.16 47.90 29.16
...
[331] 23.44 24.52 12.37 29.12 24.52 12.37 29.12 24.52 12.37 29.12
[341] 24.52 12.37 29.12 24.52 12.37 29.12 24.52 12.37 29.12 24.52
[351] 12.37 29.12 24.52 12.37 45.25 25.78 49.84 29.12 24.52 12.37
[361] 29.12 24.52 12.37 29.12 24.52 12.37
> summary(group$V1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   6.11   35.94   59.13   62.31   86.10  111.50
> mean(group$V1)
[1] 62.30546
> sd(group$V1)
[1] 29.55491
The corresponding histogram is:
And the Shapiro test of normality:
Shapiro-Wilk normality test
data: group$V1
W = 0.9466, p-value = 3.161e-10
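For anyone who wants to reproduce this kind of check, here is a minimal sketch of the calls involved. The real data are not included here, so `group` is filled with simulated, right-skewed values as a hypothetical stand-in; only the length (366) and the column name `V1` follow the output above, and the numbers it prints will not match mine.

```r
# Hypothetical stand-in for the real data frame `group`:
# 366 simulated right-skewed values (an assumption, not the actual data).
set.seed(1)
group <- data.frame(V1 = round(rgamma(366, shape = 4, scale = 15), 2))

# Descriptive statistics, as in the console output above.
print(summary(group$V1))
print(mean(group$V1))
print(sd(group$V1))

# Histogram of the variable.
hist(group$V1, main = "Histogram of V1", xlab = "V1")

# Shapiro-Wilk test of normality; a small p-value is evidence against normality.
res <- shapiro.test(group$V1)
print(res)
```

A normal Q-Q plot (`qqnorm(group$V1); qqline(group$V1)`) could be added as a visual complement to the test.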
Given the Shapiro-Wilk result, I conclude that the population is not normally distributed. The objective is to draw a sample from this population, but I am having trouble choosing a method to determine the sample size, because some methods assume that the population is normal (according to this reference). The sample is needed to compare this group with a random group of the same sample size, and the single variable to evaluate is the Bitscore.
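To make the goal concrete, this is roughly the comparison I have in mind, as a sketch only. Everything here is an assumption for illustration: `population` and `random_group` are simulated stand-ins for the two sets of Bitscore values, `n` is a placeholder (choosing it is exactly my question), and the Wilcoxon rank-sum test is just one possible comparison that does not assume normality.

```r
set.seed(42)

# Hypothetical stand-ins for the Bitscore values of the two groups.
population   <- round(rgamma(366, shape = 4, scale = 15), 2)
random_group <- round(rgamma(366, shape = 3, scale = 15), 2)

# Placeholder sample size; how to choose this value is the open question.
n <- 50

# Simple random samples of equal size, drawn without replacement.
s1 <- sample(population,   size = n, replace = FALSE)
s2 <- sample(random_group, size = n, replace = FALSE)

# Rank-based two-sample comparison (no normality assumption).
# exact = FALSE requests the normal approximation, which tolerates ties.
cmp <- wilcox.test(s1, s2, exact = FALSE)
print(cmp)
```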
Any references, suggestions, or approaches? Thanks in advance.