I was thinking about how to reply to people who keep correcting others on the distinction between poisonous/toxic or venomous/toxic/poisonous. (Never mind that I get this wrong myself five seconds after reading the respective definitions, because I just don't care.)
This got me to the following problem:
Let's assume there is a 50% chance to get the answer right merely by chance (so assuming a binary question).
If I find, for example, that 70% of a sample got it right, how do I know how many got it right by chance and how many actually know what they are talking about?
*I had intended to argue that the vast majority of people don't know the distinction between those terms anyway, so it does not matter if they are used incorrectly in an Internet forum that is not aimed at subject experts, because the readers don't know the difference either. To support my argument I wanted to show that even if the terms are used correctly much of the time, most of that is due to chance, and that I can prove mathematically that very few people know what each of those words means, given a certain percentage of correct usages.*
Here are my thoughts:
$R$ — number of people who get it right because they actually know
$R_c$ — number of people who get it right by chance
$W_c$ — number of people who get it wrong by chance
$x$ — number of people who get it right in a sample
- $R$, $R_c$, and $W_c$ together make up 100% of the sample
- The number of people who get it right in a sample is the sum of those who get it right because they know and those who get it right by chance
- Given equal chances to get it right or wrong, the number of people who get it right by chance equals those who get it wrong by chance
$\begin{align} 100\% &= R + R_c + W_c \\ x\% &= R + R_c \\ R_c &= W_c \end{align}$
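Solving these three equations for a general $x$ (my own rearrangement of the system above, so treat it as a sketch to be checked): substituting $W_c = R_c$ into the first equation and $R_c = x\% - R$ from the second gives

$\begin{align} 100\% &= R + 2R_c \\ &= R + 2(x\% - R) \\ &= 2x\% - R \\ R &= 2x\% - 100\% \\ R_c = W_c &= 100\% - x\% \end{align}$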
So I tried an example, 70% got it right — can I definitely conclude how many (or how few!) actually know and used the correct words deliberately?
$\begin{align} 70\% &= R + R_c \\ 100\% &= 70\% + W_c \end{align}$
Therefore $W_c = 30\%$, and it follows that $R_c = 30\%$
Which means that $100\% = R + 30\% + 30\%$, so $R = 40\%$.
For an observed $x = 80\%$ I get $R = 60\%$.
For an observed $x = 20\%$ I get $R = -60\%$ (okay, because lower limit is 50% due to chance).
Which means that under the above conditions (two words, equal chances to use them correctly by chance), if I see those words used correctly 70% of the time, then in 40% of all cases people knew how to use them and in 30% of all cases the correct usage was by chance?
That seems too high to me. After all, if I see correct usage 50% of the time, it could all be explained by mere chance. So I might just have made a mistake... or not?
$\begin{align} 50\% &= R + R_c \\ 100\% &= 50\% + W_c \end{align}$
Therefore $W_c = 50\%$, and it follows that $R_c = 50\%$
Which means that $100\% = R + 50\% + 50\%$, so $R = 0\%$.
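The substitution steps I used for each example can be written as a small Python sketch. The function name `split_by_chance` and its return order are made up for illustration, and it assumes exactly the model above (a 50/50 guess, and everyone who knows gets it right).

```python
def split_by_chance(x_percent):
    """Split an observed percentage of correct usages into (R, R_c, W_c).

    Assumption (from the model in this post): people who do not know
    guess right or wrong with equal probability, so R_c == W_c.
    """
    w_c = 100 - x_percent   # everyone who got it wrong was guessing
    r_c = w_c               # equal chances: as many right guesses as wrong ones
    r = 100 - r_c - w_c     # whoever is left must have known
    return r, r_c, w_c


for x in (70, 80, 50, 20):
    r, r_c, w_c = split_by_chance(x)
    print(f"x = {x}%: R = {r}%, R_c = {r_c}%, W_c = {w_c}%")
```

For 70, 80, 50 and 20 this reproduces the values of $R$ worked out by hand: 40, 60, 0 and -60.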
Hmm. Later today I might try to plot a curve, with the number of people who "know" on the horizontal axis and the number of correct usages I would observe on the vertical axis.
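A rough sketch of what such a curve could look like, in Python, under the same model (those who know are always right, everyone else guesses with a 50% chance). The function names, the fixed seed and the 20,000 simulated people are arbitrary choices of mine for illustration, not part of the problem.

```python
import random


def expected_correct(knowers_pct, guess_rate=0.5):
    """Expected % of correct usages if knowers_pct % know and the rest guess."""
    guessers_pct = 100.0 - knowers_pct
    return knowers_pct + guess_rate * guessers_pct


def simulate_correct(knowers_pct, n_people=20_000, guess_rate=0.5, seed=0):
    """Simulate n_people and return the observed % of correct usages."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    correct = 0
    for _ in range(n_people):
        knows = rng.random() < knowers_pct / 100.0
        # A person is correct if they know, or if their guess happens to be right.
        if knows or rng.random() < guess_rate:
            correct += 1
    return 100.0 * correct / n_people


# Tabulate the curve: % who know (input) against % correct usages (output).
for k in range(0, 101, 10):
    print(f"{k:3d}% know -> expected {expected_correct(k):5.1f}% correct, "
          f"simulated {simulate_correct(k):5.1f}%")
```

If the model holds, the simulated column should stay close to the expected one, and 40% who know should show up as roughly 70% correct usages.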
EDIT
I have provided a specific method. I would just like to know if it is correct and if not what I overlooked. I am sure there are many much more complicated methods for much larger versions of this problem. But I want to develop MY OWN method for MY problem (such is learning), not just learn somebody else's solution. So my question is about MY solution, given here.
What, if anything, did I(!) do wrong? Or is it correct? (for this specific problem)
For example, let's assume I find out that the links given to me in comments lead to a different answer. Then I still don't know what I did wrong!