
I want to fit a 1-parameter IRT model on a questionnaire with 15 questions and about six million people. Given the large N, standard errors aren't essential. The IRT software landscape is somewhat dizzying, and I was wondering if there were any tips on what the proper software approach would be.

– DavidShor

1 Answer


This is possible in R with the mirt package, though it's still going to be a little slow (maybe 5-10 minutes) and you'll need a good amount of RAM (16+ GB, but with 6 million cases that should be expected). I just tested this and it seems to run okay:

library(mirt)
# simulate 6 million binary (0/1) responses to 15 items
dat <- matrix(sample(0:1, 6e6 * 15, TRUE), ncol = 15)
# fit a unidimensional Rasch model; skip computing the null model
mod <- mirt(dat, 1, itemtype = 'Rasch', D = 1, calcNull = FALSE)
## Iteration: 4, Log-Lik: -64486844, Max-Change: 0e-04

If standard errors aren't of interest, and neither is the comparison to the NULL model, then the above options should be fine. Since a large part of the run time is spent sorting the data, a large = TRUE argument can be passed so that the sorting isn't repeated on each run (a sketch of this follows).
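As a minimal sketch, assuming the two-step large interface described in ?mirt (compute the internal data object once, then pass it back on later fits; the exact behavior may vary across mirt versions):

# build the internal (sorted/collapsed) data object once
internaldat <- mirt(dat, 1, large = TRUE)
# reuse it on subsequent fits so the expensive sorting step is skipped
mod2 <- mirt(dat, 1, itemtype = 'Rasch', D = 1,
             calcNull = FALSE, large = internaldat)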

– philchalmers
  • This package is incredibly fast and exactly what I was looking for, thank you! If I can ask a dumb question though, how do I actually extract the estimated thetas out of the mirt object? – DavidShor May 07 '13 at 19:49
  • They aren't estimated as a by-product of fitting (mirt isn't a strict Rasch framework); you have to use the fscores() function, and perhaps change the method = '' argument if you are more interested in maximum-likelihood estimates than the default EAP (expected a posteriori) estimates (see the sketch after these comments). – philchalmers May 07 '13 at 19:52
  • Thanks for the example. I will go to bed less stupid tonight! – doug.numbers May 08 '13 at 23:48
  • Last question: Despite the fact that I have so much data, the scores produced aren't actually very robust to the estimation method. Any word on which technique is most appropriate and when? – DavidShor May 11 '13 at 20:13
  • The scores produced are more a function of the number of items that you have, rather than the number of subjects. A larger number of subjects only gives more accurate item parameter estimates. Hard to say what the most appropriate method would be since it is research-question dependent, but I think ML estimates are usually fine if the Bayesian 'shrinkage' is something that you don't want to worry about. – philchalmers May 12 '13 at 02:27
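
For reference, a minimal sketch of extracting person estimates with fscores(), assuming the mod object fitted above; the method names ('EAP' by default, 'ML' for maximum likelihood) follow the fscores() documentation:

# EAP (default) person estimates
theta_eap <- fscores(mod)
# maximum-likelihood person estimates
theta_ml <- fscores(mod, method = 'ML')
# compare the two sets of scores side by side
head(cbind(theta_eap, theta_ml))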