I give my students an exam that has 8 questions on it. Each question is about a particular topic. The exam is made up on the fly by randomly selecting 1 question for each topic from a pool of questions for that particular topic. Each topic pool has 20 questions in it. I am worried that there might be a few outlier questions (i.e., they are much easier or harder than the other questions) in each pool.
I want to find out if the questions in each pool are essentially equivalent or if there is a particular question in the pool which is significantly harder or easier than the others in the pool. I have the scores for about 300 students.
Can anyone suggest a method that will allow me to rank the questions within each pool by difficulty, using how the students did on the other questions in their instance of the exam?
As requested in a comment, here is my current naive approach:
Let's say an exam is made up of $n$ questions. Each question is drawn from a specific pool. So, an exam is a set of elements of the form $q_{pi}$, where $p$ is the pool the question was drawn from and $i$ is the instance of the question from that pool. For ease of notation, let's assume each pool has $m$ instances. So each exam, $e$, is $\{q_{pi} \mid 0 \le p < n,\ 0 \le i < m\}$ with exactly one instance $i$ for each pool $p$, and there are $s$ students, so we have a universe of $s$ exams, $\{e_1, \ldots, e_s\}$. I want to make sure that the hardness of all $q_{pi}$ for a fixed $p$ and $0 \le i < m$ is roughly similar.
To determine the relative hardness of $q_{pj}$, I would look at all exams that include $q_{pj}$ and compare each student's score on $q_{pj}$ with their score on the rest of the exam they took, e.g., $r = \sum_{x=0,\, x \ne p}^{n-1} q_{x*}$, where $*$ represents the instance of pool $x$ that the particular student took. Then I would sum the differences $q_{pj} - r$ over all of those students to get $d_{pj}$. Finally, I would compare all the $d_{pj}$ for a particular pool. If a particular $|d_{pj}|$ is significantly larger (more than 1 std dev away?) than the others, I will modify its weight.
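To make this concrete, here is a rough Python sketch of the approach, not a finished implementation. The data layout (one list of `(instance, score)` pairs per student, indexed by pool) and the names `pool_offsets` and `flag_outliers` are made up for illustration. Note two assumptions that differ slightly from the description above: I average $q_{pj} - r$ instead of summing it, since each instance is seen by a different number of students, and I read "1 std dev away" as distance from the mean of the $d_{pj}$ within the pool.

```python
from collections import defaultdict
from statistics import mean, stdev


def pool_offsets(exams, pool):
    """Average d_pj = q_pj - r for each instance j of the given pool.

    exams: one entry per student; each entry is a list of
    (instance, score) pairs indexed by pool (hypothetical layout).
    """
    diffs = defaultdict(list)
    for exam in exams:
        instance, score = exam[pool]
        # r: this student's total score on all the other pools
        rest = sum(s for p, (_, s) in enumerate(exam) if p != pool)
        diffs[instance].append(score - rest)
    # Assumption: average rather than sum, so instances seen by
    # different numbers of students stay comparable.
    return {j: mean(d) for j, d in diffs.items()}


def flag_outliers(offsets, k=1.0):
    """Instances whose offset is more than k std devs from the pool mean."""
    values = list(offsets.values())
    if len(values) < 2:
        return []
    centre, spread = mean(values), stdev(values)
    return sorted(j for j, d in offsets.items() if abs(d - centre) > k * spread)
```

With the real data this would be called once per pool, e.g. `flag_outliers(pool_offsets(exams, p))` for each `p` in `range(n)`. With roughly 300 students and 20 instances per pool, each instance is only seen by about 15 students, so the averages will be noisy.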
Suggestions? Comments?