I ran an experiment in which tests were taken twice, as a pretest and a posttest. I suspect there is a ceiling effect, because the posttest average is close to the maximum possible test score. If I assume an IRT model, then as ability rises above the difficulty level of the items, the expected score distribution becomes negatively skewed (scores pile up near the top) and can never exceed the maximum possible score. So I think there might be a way to make use of the ceiling effect and the skewness of the score distribution when comparing the two average test scores.
But I wonder whether any research has already been done on this subject, which could be described as "comparing two groups' averages under an IRT model while accounting for a ceiling effect". I am thinking of a simulation study, possibly with MCMC. Any ideas or advice would be welcome!
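
To make the ceiling effect concrete, here is a minimal simulation sketch of what I have in mind. It assumes a Rasch (1PL) model, which is only one possible IRT model, and every number in it (20 items, difficulties spread from -1 to 1, mean abilities of 0 and 3, 5000 simulated test takers) is made up for illustration and does not come from my experiment. The function names `simulate_scores` and `skewness` are my own.

```python
import numpy as np

def simulate_scores(mean_ability, n_persons=5000, n_items=20, seed=0):
    """Simulate number-correct scores under an assumed Rasch (1PL) model."""
    rng = np.random.default_rng(seed)
    # Hypothetical abilities: normal with the given mean and unit variance.
    theta = rng.normal(mean_ability, 1.0, size=n_persons)
    # Hypothetical item difficulties, evenly spread between -1 and 1.
    b = np.linspace(-1.0, 1.0, n_items)
    # Rasch model: P(correct) = logistic(theta - b).
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    # Draw 0/1 responses and sum them into a total score per person.
    responses = rng.random((n_persons, n_items)) < p
    return responses.sum(axis=1)

def skewness(x):
    """Sample skewness: mean of the cubed standardized values."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

# Pretest: abilities centered on the item difficulties.
pre = simulate_scores(0.0, seed=1)
# Posttest: abilities well above the item difficulties (ceiling region).
post = simulate_scores(3.0, seed=2)

for name, scores in [("pretest", pre), ("posttest", post)]:
    print(name, "mean:", scores.mean(),
          "skewness:", skewness(scores),
          "max:", scores.max())
```

In this setup the posttest scores are bounded by the number of items and bunch up near that bound, so the raw difference in averages understates the difference in mean ability (3 units on the latent scale here). That compression is the effect I would like to account for.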