
I have the following values from an experiment:

   A   B
X 64  20
Y 62  11

I subject this to a Chi-square test using the following code:

from scipy.stats import chisquare
a, b = 64, 20  # row X
c, d = 62, 11  # row Y
pval = chisquare([a, b], [c, d])[1]
print(pval)

Output is:

0.006421123271652286

This seems clearly significant (<0.05).

I now calculate the odds ratio and its confidence interval from the above data using the following formulae:

import math
import numpy as np

OR = (a*d) / (b*c)
se = math.sqrt((1/a) + (1/b) + (1/c) + (1/d))
lower = np.exp(math.log(OR) - 1.96*se)
upper = np.exp(math.log(OR) + 1.96*se)
print(OR, lower, upper)

Output is:

0.5677  0.2514   1.2819

(The confidence interval agrees with the online calculator at https://select-statistics.co.uk/calculators/confidence-interval-calculator-odds-ratio/ )

So the confidence interval clearly overlaps 1, whereas I expected it to lie entirely on one side of 1, since the P value was clearly significant.
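For what it's worth, the same log(OR) and standard error imply an approximate two-sided P value via a normal (Wald) approximation. This is only a rough sanity check I added, not one of the formulae above; it reuses OR and se from the previous block:

from scipy.stats import norm

# Wald z statistic implied by the log odds ratio and its standard error (from above)
z = math.log(OR) / se
p_wald = 2 * norm.sf(abs(z))  # two-sided normal approximation
print(z, p_wald)              # well above 0.05, consistent with the interval overlapping 1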

I have the following questions:

  1. Where is the error and how can I correct it?

  2. Would you call these data statistically significant?

  3. What test can I use so that P value and confidence intervals match?

Thanks for your help.

rnso
  • Related (even possible duplicate): [Why do my p-values differ between logistic regression output, chi-squared test, and the confidence interval for the OR?](https://stats.stackexchange.com/q/144603/7290) – gung - Reinstate Monica Nov 05 '20 at 18:01
  • There the difference is very borderline; here it seems to be marked. Would you call these data statistically significant? – rnso Nov 05 '20 at 18:05
  • Are you sure you are applying the software correctly? The p-value should be around 24%. Isn't `chisquare` supposed to be a "one-way chi square test"? – whuber Nov 05 '20 at 18:24
  • You were right about the underlying incorrect use of the software and about the real P value (as shown in the accepted answer below). – rnso Nov 06 '20 at 07:32

1 Answer


The chisquare function tests given counts against expected counts (a one-way goodness-of-fit test). That's not what you intend: you're testing a contingency table. Use the chi2_contingency function, which takes a table (nested array) as input and returns:

chi2: float 
    The test statistic.

p: float
    The p-value of the test

dof: int
    Degrees of freedom

expected: ndarray, same shape as observed
    The expected frequencies, based on the marginal sums of the table.

(https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html)

The correct analysis gives a p-value of 0.24:

>>> from scipy.stats import chi2_contingency
>>> chi2_contingency([[64,20],[62,11]])
(1.3719790003937939, 0.24147215490328422, 1, array([[ 67.41401274,  16.58598726],
       [ 58.58598726,  14.41401274]]))
>>> 
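If you prefer named results over the raw tuple, a small sketch along these lines unpacks the four return values described above and checks the p-value against 0.05 (the table values are taken from the question):

from scipy.stats import chi2_contingency

table = [[64, 20],
         [62, 11]]  # the 2x2 table from the question

chi2, p, dof, expected = chi2_contingency(table)

print("chi-square statistic:", chi2)
print("p-value:", p)                   # about 0.24, as above
print("degrees of freedom:", dof)
print("expected counts:\n", expected)  # based on the marginal sums
print("significant at 0.05?", p < 0.05)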
abstrusiosity