
I'm calibrating an item pool intended to measure a wide range of ability levels. The item pool is for an economics achievement test, so a broad range of items is needed, with difficulty levels scattering widely across the scale. I'm using the classical Rasch model for scaling.
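For reference, and with notation chosen here purely for illustration, the dichotomous Rasch model specifies the probability of a correct response of person $v$ on item $i$ as a function of the person's ability $\theta_v$ and the item's difficulty $\beta_i$:

$$P(X_{vi} = 1 \mid \theta_v, \beta_i) = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)}$$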

While testing for Rasch homogeneity I ran into a structural problem. With Andersen's Likelihood Ratio Test you can detect Differential Item Functioning (DIF), which normally arises between subgroups of the sample (e.g., women doing better than men on some items). You can avoid that unwanted effect (such items would not be fair) by removing them to reach Rasch homogeneity; so far, so good. The problem is that the test takers have different preparatory training, which of course affects their ability level. In fact, the item pool was created precisely for this reason (differing ability levels). But when I use preparatory training as the split criterion for the LR test, I find a lot of items with DIF that work better for the persons with preparatory training (in the middle difficulty range).
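To make the test concrete, here is a minimal sketch (in Python, assuming scipy is available) of how Andersen's LR statistic is computed once the conditional log-likelihoods of the overall fit and the separate subgroup fits are at hand. The function name and arguments are hypothetical; the actual conditional ML fitting (e.g., via LRtest() in R's eRm package) is not shown here.

```python
from scipy.stats import chi2

def andersen_lr_test(loglik_overall, logliks_groups, n_items):
    """Andersen's LR test from conditional log-likelihoods (hypothetical helper).

    loglik_overall : conditional log-likelihood of the Rasch model fitted
                     to the whole sample
    logliks_groups : list of conditional log-likelihoods, one per subgroup
                     defined by the split criterion (e.g., training yes/no)
    n_items        : number of items in the pool
    """
    # LR statistic: twice the gain in log-likelihood from estimating
    # separate item parameters in each subgroup
    lr = 2.0 * (sum(logliks_groups) - loglik_overall)
    # Degrees of freedom: (k - 1) free item parameters per additional group
    # (one parameter is fixed in each fit for identification)
    df = (n_items - 1) * (len(logliks_groups) - 1)
    p_value = chi2.sf(lr, df)
    return lr, df, p_value

# Example with made-up log-likelihood values
lr, df, p = andersen_lr_test(-4210.7, [-2071.3, -2115.9], n_items=30)
print(f"LR = {lr:.2f}, df = {df}, p = {p:.4f}")
```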

This structural problem now undermines the Rasch homogeneity of the item pool, even though the pool is supposed to cover both groups of persons (those with and those without training).

Now I'm not sure what to do. Should I keep the items even though Rasch homogeneity is impaired, or should I remove them? And if I remove them, how should I handle my hypothesis (H0: 'there is no difference between persons with and without training')?

I guess it's a tricky question. Maybe someone has an idea (or can point me to an author or a source).

Many thanks in advance and kind regards

Benzema