
I'm quite new to the statistics world, so the questions I ask might be naive.

So, I ran a study in which I did the following. I set 7 questions (Q1-Q7), each with a true/false answer. At the end of the study, I asked the participants whether they had known the answer to each question beforehand (Q1'-Q7'). These follow-up questions are also answered true/false.

The problem I encounter is that (1) some of them knew the answers beforehand and answered the questions correctly, but (2) some of them said they knew the answers beforehand and still gave the wrong answer to the respective questions.

What I'm trying to get at is some sort of value for each participant's responses as a whole (Q1-Q7).

An example: USER1 answered 5/7 questions correctly (Q1, Q2, Q3, Q4, Q5), but he said that he knew Q1 and Q6 beforehand. He got Q1 right but Q6 wrong. USER2 answered 2/7 questions correctly (Q1, Q2), but he said that he knew Q2, Q3, Q4, Q5, Q6, Q7 beforehand.

Is there any way to make USER1's score comparable to USER2's score?

Thanks.

octaviandd
  • “Equal” the users in what sense? – Tim Nov 20 '21 at 10:48
  • Hi Tim, thanks for the question. For example, USER1 knew Q1 beforehand, which is no good for me as data, since that initial question becomes redundant. So his essential data would be that he answered 4/6 questions correctly (1 question he knew beforehand is removed from the total analysis). Here, 0/6 would represent a WEAK score and 6/6 a STRONG score, with gradation in between. – octaviandd Nov 20 '21 at 10:57
  • In contrast, USER2 answered 2/7 questions correctly, but Q2 would be removed from the total analysis because he knew it beforehand, so he ends up with 1/6 questions. This example would be fine, since his score is still comparable to USER1's. – octaviandd Nov 20 '21 at 10:59
  • But, let's say USER3 answers 6/7 correctly (Q1, Q2, Q3, Q4, Q5, Q6) but knew 3 questions (Q1, Q2, Q3) beforehand. His essential data would be that he answered Q4, Q5, and Q6 correctly out of Q4, Q5, Q6, Q7 (the questions he did not know beforehand), so his final data would be that he answered 3/4 questions correctly. However, here it seems to me that having answered 1/6 cannot be 'equalized' with answering 3/4. – octaviandd Nov 20 '21 at 11:02
  • I hope I made myself understood @Tim. – octaviandd Nov 20 '21 at 11:03

1 Answer


You didn't say it explicitly, but I assume that all the questions measure the same thing. Without this assumption, the answer to one question wouldn't tell us anything about how the person would answer another question.

The second issue is how to treat answers to the questions the responder knew in advance. The simplest approach would be to just ignore such answers. Another approach would be to add a dummy variable that signals to the model that the responder knew the question.

You say you are quite new to statistics, and this is a non-trivial problem, so it would need some additional research and learning on your side. An answer on a Q&A site like this one will be far from exhaustive, but let me try.

The question falls into the area of psychometrics, where we have multiple models for treating similar problems. You could check out Item Response Theory (IRT) models, the Rasch model being the simplest. You would model the response $X_{ij}$ of the $i$-th person to the $j$-th question as

$$ P(X_{ij} = 1) = \frac{\exp(\theta_i - \beta_j)}{1+\exp(\theta_i - \beta_j)} $$

where the $\theta_i$ parameter tells you the estimated "ability" of the person, based on the available data. Notice that you don't need to know the answers to all the questions to fit the model, because it predicts the responses to the individual questions.
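To make this concrete, here is a minimal sketch of fitting the Rasch model by joint maximum likelihood with plain gradient ascent, in Python rather than the specialized packages mentioned below. The data matrix is invented for illustration; `None` marks a response that is ignored (e.g. the responder knew the question in advance), showing that the model can be fitted without answers to every question.

```python
import math

def fit_rasch(X, n_iter=3000, lr=0.1):
    """Joint maximum-likelihood fit of a Rasch model by gradient ascent.

    X[i][j] is 1 (person i answered question j correctly), 0 (answered
    wrongly), or None (response ignored, e.g. the person knew the
    question in advance)."""
    n, m = len(X), len(X[0])
    theta = [0.0] * n  # person "abilities"
    beta = [0.0] * m   # question difficulties
    for _ in range(n_iter):
        g_theta = [0.0] * n
        g_beta = [0.0] * m
        for i in range(n):
            for j in range(m):
                x = X[i][j]
                if x is None:
                    continue  # missing responses simply drop out of the likelihood
                p = 1.0 / (1.0 + math.exp(beta[j] - theta[i]))  # P(X_ij = 1)
                g_theta[i] += x - p  # d log-lik / d theta_i
                g_beta[j] += p - x   # d log-lik / d beta_j
        theta = [t + lr * g for t, g in zip(theta, g_theta)]
        beta = [b + lr * g for b, g in zip(beta, g_beta)]
    # The model depends only on theta_i - beta_j, so fix the scale by
    # shifting both so that the mean difficulty is zero.
    shift = sum(beta) / m
    beta = [b - shift for b in beta]
    theta = [t - shift for t in theta]
    return theta, beta

# Invented toy data: 3 participants x 7 questions; None marks a question
# the participant reported knowing in advance.
X = [
    [None, 1, 1, 1, 1, 0, 0],  # 4/6 correct on the remaining questions
    [0, 0, 0, 0, 0, 1, 1],     # 2/7 correct
    [1, 1, 1, 1, 1, 1, 0],     # 6/7 correct
]
theta, beta = fit_rasch(X)
```

On this toy data the estimated abilities order the participants as you would expect from their fraction of correct answers among the questions they did not know in advance. Real analyses should prefer the dedicated IRT software below, which handles identification and estimation much more carefully.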

The model could be made more complicated by also including the known questions but accounting for them differently, though you should probably start with something simpler.

To fit such models you would need specialized software for IRT models (e.g. the mirt or ltm R packages), or you could treat them as generalized mixed-effects models and use the appropriate software (e.g. the lme4 or nlme R packages, or MixedModels.jl for Julia).

If you would like a trivial solution instead, you could just ignore the questions the responders knew in advance and average the answers to the remaining questions for the final score. Unlike IRT models, this wouldn't correct for the different difficulties $\beta_j$ of the individual questions, but if you can assume the questions are equally difficult, this should not be a big issue.
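This trivial scoring rule can be sketched in a few lines of Python; the function name is made up, and the example data are taken from the users described in the question and comments above.

```python
def simple_score(answers, known):
    """Fraction of correct answers among the questions the participant
    did not know in advance.

    answers: dict mapping question label -> True/False (answered correctly)
    known:   set of question labels the participant reported knowing
    """
    fresh = [ok for q, ok in answers.items() if q not in known]
    return sum(fresh) / len(fresh)

# USER1: correct on Q1-Q5, knew Q1 and Q6 beforehand -> 4 of the 5
# remaining questions correct
user1 = simple_score(
    {"Q1": True, "Q2": True, "Q3": True, "Q4": True,
     "Q5": True, "Q6": False, "Q7": False},
    {"Q1", "Q6"},
)

# USER3: correct on Q1-Q6, knew Q1-Q3 beforehand -> 3 of the 4
# remaining questions correct
user3 = simple_score(
    {"Q1": True, "Q2": True, "Q3": True, "Q4": True,
     "Q5": True, "Q6": True, "Q7": False},
    {"Q1", "Q2", "Q3"},
)
```

Note that a participant who reports knowing every question leaves nothing to score (a division by zero here), which is another reason the IRT approach is more robust.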

Tim