
I'm currently planning a study to show that my software is comparable to humans at some task. The setup is that I take an input $I$ and have both a human and a computer modify $I$ in the same way, producing $M_H$ and $M_C$. Later, I will present various experiment participants with both $M_C$ and $M_H$ and ask them which is better according to some qualitative metric, or whether they're the same. I then repeat this for many possible inputs.

At the end of the day, my data is: I have a set of labels representing various inputs, and, for each label, I have a set of judgments about whether a participant preferred the human or the computer output. My goal is to show that, for a random input, the probability of a random judge preferring the computer output is at least $0.5$ (or some slightly lower number).

How do I analyze this? The closest thing I found in my search was stuff on "ipsative" data, but I didn't find anything on how to analyze it, and those tests seem to all be trying to measure the human subject instead of the two choices in the question.

James Koppel
  • I am not sure I understood the question correctly, so I am not posting this as an answer, but this sounds like a job for a chi-squared test: "Pearson's chi-squared test (χ²) is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance." – rep_ho Oct 01 '16 at 13:56
  • But....I'm trying to show that they're the same! – James Koppel Oct 01 '16 at 14:10
  • Yes. Your null hypothesis is that they are the same and your alternative hypothesis is that they are different. You run a χ² test for the difference, find non-significant results, and conclude that you don't have enough evidence to say that the results are indeed different. – rep_ho Oct 01 '16 at 14:27
  • Uhhh.....no. The easiest way to do that is just to collect very few samples. Proving they're the same is different from failing to prove they're different. – James Koppel Oct 01 '16 at 19:14

1 Answer


What you are looking for is a test of equivalence or a test of non-inferiority. You don't have to find evidence against an H0 stating that your groups are the same. Instead, your H0 can be that the difference between your groups is at least 10 percent (for example), and you then try to reject this hypothesis. A nice introduction is in this chapter: http://ncss.wpengine.netdna-cdn.com/wp-content/themes/ncss/pdf/Procedures/PASS/Non-Inferiority_Tests_for_Two_Means_using_Differences.pdf. Some other resources are in the post "How to test hypothesis of no group differences?"
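
For the preference data in the question, the same idea can be applied to a single proportion. The following is only a minimal sketch under strong assumptions that are not from the question: every judgment is treated as an independent Bernoulli trial (ignoring that judgments are grouped by input and by judge), "same" responses are dropped before counting, and the margin of 0.1 is an arbitrary illustration. The function names and the counts are hypothetical. It tests H0: p <= 0.5 - margin against H1: p > 0.5 - margin with an exact one-sided binomial test, where p is the probability that a judge prefers the computer output.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def noninferiority_pvalue(prefer_computer, n_judgments, p_ref=0.5, margin=0.1):
    """One-sided exact p-value for H0: p <= p_ref - margin.

    Assumption: judgments are independent and ties were removed beforehand.
    A small p-value is evidence that the preference rate exceeds
    p_ref - margin, i.e. that the computer is non-inferior within the margin.
    """
    return binom_upper_tail(prefer_computer, n_judgments, p_ref - margin)


# Hypothetical counts: the observed preference rate is 50% in both cases.
p_small = noninferiority_pvalue(5, 10)      # few judgments
p_large = noninferiority_pvalue(250, 500)   # many judgments

print(f"n=10:  p-value = {p_small:.4f}")
print(f"n=500: p-value = {p_large:.2e}")
```

With the same observed rate, the small sample cannot reject H0 while the large one can, so collecting few samples no longer helps the claim. Because judgments on the same input or from the same judge are likely correlated, a model that accounts for this grouping (for example a mixed-effects logistic regression) would probably be more appropriate for the real analysis.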

rep_ho