1

18 people, who have been divided into 2 groups based on the semiquantitative expression of a protein (low vs high).

Group 1 (low protein levels) has 6 people, Group 2 (high protein levels) has 12 people. In Group 1, 5 of the 6 people developed a specific disease, whereas only 2 of 12 in Group 2 developed this disease.

I want to determine if there is a statistically significant correlation between low protein levels and the development of this specific disease.

If someone could point me towards the tests I should use, it would be great.

Jeromy Anglim
  • 42,044
  • 23
  • 146
  • 250
user25498
  • 13
  • 2

3 Answers3

2

You have a couple of options. One is to do a test (or confidence interval) for the difference of two proportions. Here is a link to a similar question with a solution in R: Change in binomial proportion confidence interval

Another option is to do a permutation test since your samples are so small.

soakley
  • 4,341
  • 3
  • 16
  • 27
1

To perform the permutation test that soakley and Jake recommended in R, use the package exactRankTests. The code in your case is:

library(exactRanktTests)

group1 <- c(rep(1, 5), rep(0, 1))
group2 <- c(rep(1, 2), rep(0, 10))

perm.test(group1, group2, exact=TRUE, conf.int=TRUE)

The $p$-value is $0.0128$ with a 95%-confidence interval ranging from $(0.25, 1.00)$. The $p$-value from prop.test is $0.0262$.

COOLSerdash
  • 25,317
  • 8
  • 73
  • 123
1

Taking a step back, why do you want to split the continuous measure of protein levels into a binary variable? Presumably you will lose a lot of information and most likely your predictive accuracy will be reduced.

Unless there are compelling reasons to the contrary, I'd treat protein level as a continuous predictor. You could perform a logistic regression predicting disease outcomes from a continuous measure of protein levels. If the protein coefficient is positive and significant you could make a claim about protein levels being a significant predictor of disease outcome. If you are concerned about the relationship being non-linear you could explore transformations of the protein level variable or explore non-linear predictive relationships (e.g., incorporating a quadratic effect).

Jeromy Anglim
  • 42,044
  • 23
  • 146
  • 250