
I've been trying to figure out the appropriate statistical approach for the following problem from work (simplified here):

I've got 5 manufacturers of a drug and each manufacturer makes their own testing kit. Each kit is positively biased for their manufacturer's drug. Results are binary. Error is unknown with each test.

I can run as many tests as I want, but at most two per recipient (the test from the treatment's manufacturer and one other).

How do I determine which manufacturer is the best? Or at a minimum... which manufacturer is different from the others? I've considered some kind of item total correlation, Kruskal-Wallis comparisons, etc. I assume there's some binomial logistic approach too... but I'm out of my depth here.

So data would look like:

Treatment:        Manu-A, Manu-B, Manu-C, Manu-D, Manu-E
User 1 (A-Treat)       
   Test A          1                            
   Test B          1                            
User 2 (B-Treat)       
   Test B                   1             
   Test C                   1  
User 3 (C-Treat)        
   Test C                           0             
   Test A                           0            

Or flattened:

UID, Treatment, TestA, TestB, TestC, TestD, TestE
1    A          1      1      --     --     --   
2    B          --     1      1      --     --   
3    C          0      --     0      --     --   
4    D          --     1      --     0      --   
5    E          --     --     1      --     1
6    A          0      --     --     0      --     

Any idea on what kind of approach I should use?

1 Answer


I think it helps to start with a statistical model in which the quantities of interest (the true success probabilities) appear as parameters. Assume two Bernoulli random variables: $X$ for the true result of the drug and $Y$ for the observed test result. Both are conditioned on the manufacturer, and $Y$ is also conditioned on $X$, so each observation $y_i$ is accompanied by an unobserved $x_i$, and together they have the joint distribution $P(Y|X, c_1,c_2)P(X|c_1)$, where $c_1$ and $c_2$ label the drug maker and the test maker respectively. This gives a model with between 25 and 55 parameters (the probabilities) and $N$ latent variables, which can be fit by maximum likelihood (expectation-maximization) or with a Bayesian approach. To handle the positive bias, use constraints (or priors) such that $P(Y=1|X,c_1=c_2)>P(Y=1|X,c_1\neq c_2)$.
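To make the EM route concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the function names `fit_em` and `simulate`, the array layout (one row per recipient, two test columns), the `pseudo` smoothing counts, and all the numbers used to generate fake data. It fits the full 55-parameter version and treats the two tests on a recipient as independent given $X$. It does not enforce the positive-bias constraint; it only starts from values where a truly positive sample is more likely to test positive, so the estimates may be sensitive to starting values and should be read as illustrative.

```python
import numpy as np

def simulate(n=2000, n_mfr=5, seed=1):
    """Made-up data: each recipient gets the own-brand test plus one other."""
    rng = np.random.default_rng(seed)
    true_theta = np.linspace(0.3, 0.7, n_mfr)            # hypothetical success rates
    drug = rng.integers(0, n_mfr, n)                     # c_1 for each recipient
    other = (drug + rng.integers(1, n_mfr, n)) % n_mfr   # always a different maker
    tests = np.stack([drug, other], axis=1)              # c_2 for each of the 2 tests
    x = rng.random(n) < true_theta[drug]                 # latent true result
    # Hypothetical error rates; the own-brand kit reads positive more often.
    p_pos = np.where(x[:, None], 0.85, 0.10) + 0.10 * (tests == drug[:, None])
    y = (rng.random((n, 2)) < p_pos).astype(int)
    return drug, tests, y

def fit_em(drug, tests, y, n_mfr=5, n_iter=200, pseudo=1.0, seed=0):
    """EM for theta[d] = P(X=1 | drug d) and q[x, d, t] = P(Y=1 | X=x, drug d, test t)."""
    rng = np.random.default_rng(seed)
    K = n_mfr
    theta = np.full(K, 0.5)
    q = np.empty((2, K, K))
    q[1] = rng.uniform(0.6, 0.9, (K, K))   # start with q[1] > q[0] so that
    q[0] = rng.uniform(0.1, 0.4, (K, K))   # "X=1" keeps its intended meaning
    d = np.broadcast_to(drug[:, None], tests.shape)
    for _ in range(n_iter):
        # E-step: posterior probability that each recipient's true result is 1.
        p1, p0 = q[1][d, tests], q[0][d, tests]
        like1 = theta[drug] * np.prod(np.where(y == 1, p1, 1 - p1), axis=1)
        like0 = (1 - theta[drug]) * np.prod(np.where(y == 1, p0, 1 - p0), axis=1)
        r = like1 / (like1 + like0)
        # M-step: weighted proportions, smoothed by pseudo-counts
        # (equivalent to a symmetric Beta prior on every probability).
        num = np.full((2, K, K), pseudo)
        den = np.full((2, K, K), 2 * pseudo)
        for x_val, w in ((1, r), (0, 1 - r)):
            ww = np.broadcast_to(w[:, None], tests.shape)
            np.add.at(num[x_val], (d, tests), ww * y)
            np.add.at(den[x_val], (d, tests), ww)
        q = num / den
        theta = (np.bincount(drug, weights=r, minlength=K) + pseudo) / (
            np.bincount(drug, minlength=K) + 2 * pseudo)
    return theta, q, r   # r is the posterior from the final E-step
```

Calling `theta, q, r = fit_em(*simulate())` returns the estimated success probability per manufacturer in `theta`, which is the quantity to compare across manufacturers. To impose the constraint from the paragraph above, one option would be to replace the closed-form M-step with a constrained optimizer, or to move to a Bayesian fit with priors that favor higher positive rates when the drug and test makers match.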

deasmhumnha