
Just like the author of this post (at the time), I am quite new to statistics. So I am not sure whether I am using the right words here, but I believe that our questions are quite different (despite the headlines being worded almost the same).

A (mini-)course on Chemistry took place at my school. It had only 4 students. The authors of the course gave an exam, which only half of the students (that is, two people) passed. They now want to show that a class of 4 people is statistically insignificant, that is, that this result provides no solid evidence that only half of the students in a bigger class would be able to handle the course.

I now wonder whether there is some mathematical model which would numerically prove this insignificance. So, is there such a thing?

  • This Wikipedia entry might be a good start: https://en.wikipedia.org/wiki/Sample_size_determination – jpmuc Dec 30 '20 at 11:00

1 Answer


'Significance' in statistics means 'low probability of obtaining results at least as extreme as actually obtained'. You need to make some assumptions before you can proceed. Here is one example:

You can require that in a valid course a certain fraction of students needs to pass, say 90%. You can also fix the so-called 'significance level' $\alpha$ to some low probability value, say 5%. In addition, you can assume that, across repeated courses, the number of students who pass follows the binomial distribution with the required probability $p$ from above (90% = 0.9). That is, you model each student's passing as a flip of an 'unfair' coin.

You can then use the CDF of the binomial distribution to show that, for a class of 4 students, the probability of two or fewer of them passing is greater than 5%, even though each student individually has a 90% chance of passing.
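Written out by hand, under the same assumptions ($n = 4$, $p = 0.9$, and $X$ denoting the number of students who pass), the calculation is:

$$
P(X \le 2) = \sum_{k=0}^{2} \binom{4}{k}\, 0.9^{k}\, 0.1^{4-k} = 0.0001 + 0.0036 + 0.0486 = 0.0523 > 0.05 = \alpha
$$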

In Python (this code is adapted from GeeksforGeeks):

from scipy.stats import binom 
# setting the values 
# of n and p 
n = 4
p = 0.9
# defining the list of k values 
k_values = list(range(n + 1)) 
# obtaining the mean and variance 
mean, var = binom.stats(n, p) 
# list of CDF values 
dist = [binom.cdf(k, n, p) for k in k_values ] 
# printing the table 
print("k\tCDF(k)") 
for i in range(n + 1): 
    print(str(k_values[i]) + "\t" + str(dist[i])) 
# printing mean and variance 
print("mean = "+str(mean)) 
print("variance = "+str(var))

results in:

k   CDF(k)
0   9.999999999999991e-05
1   0.003699999999999998
2   0.05229999999999998
3   0.34389999999999993
4   1.0
mean = 3.6
variance = 0.35999999999999993
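A complementary way to look at the same data, sketched here as an illustration rather than as part of the argument above: compute an exact (Clopper-Pearson) 95% confidence interval for the pass rate from 2 passes out of 4. The helper name `clopper_pearson` and the choice of a 95% level are assumptions made for this example; the interval is obtained from beta-distribution quantiles via `scipy.stats.beta.ppf`.

```python
from scipy.stats import beta, binom


def clopper_pearson(k, n, conf=0.95):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion.

    k: number of successes, n: number of trials, conf: confidence level.
    Hypothetical helper written for this illustration.
    """
    alpha = 1.0 - conf
    # The bounds are quantiles of beta distributions; handle the edge cases
    # k == 0 and k == n explicitly, where one bound is trivially 0 or 1.
    lower = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    upper = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return float(lower), float(upper)


n, passed = 4, 2

# Same quantity as CDF(2) in the table above: P(X <= 2) when p = 0.9
p_value = float(binom.cdf(passed, n, 0.9))

# Range of pass rates that are compatible with observing 2 passes out of 4
low, high = clopper_pearson(passed, n)

print(f"P(X <= {passed} | p = 0.9) = {p_value:.4f}")
print(f"95% CI for the pass rate: ({low:.3f}, {high:.3f})")
```

With only 4 students the interval spans most of the range from 0 to 1, so the data say very little about the pass rate a bigger class would have.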
Igor F.
  • Your initial characterization of significance is incorrect. It is *crucial* to mention that the probability is that of the *statistic* based on *assuming the null hypothesis.* – whuber Dec 30 '20 at 14:27