
I am trying to estimate user ability (success or failure on an item) using item response theory. I am re-implementing an algorithm from a paper, but I cannot understand the following line:

[image: the equation from the paper]

In the equation, a (the discrimination parameter, or slope), b (the difficulty parameter), and c (the guessing parameter) are estimated using item response theory; i refers to the user, j to the problem, and u to the user's response.

My question is: should I calculate the maximum likelihood for each (u, a, b, c) separately and then take the average, or does the maximum likelihood need to be calculated for all (u, a, b, c) together?

Thanks

1 Answer


For simplicity, let's assume a very simple model

$$ X \sim \mathcal{N}(\mu, \sigma^2) $$

and you are looking for

$$ \DeclareMathOperator*{\argmax}{arg\,max} \hat \theta = \argmax_{\mu,\sigma^2} \; \prod_{i=1}^n \mathcal{N}(x_i; \mu, \sigma^2) $$

The function has two parameters $\theta = (\mu,\sigma^2)$, how would you imagine maximizing it with one of the parameters being undefined? How would you evaluate normal density with undefined mean, or undefined variance?
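
To make the point concrete, here is a minimal sketch (my own illustration, not from the original answer) of maximizing this likelihood numerically with `scipy`: the optimizer searches over $\mu$ and $\sigma^2$ jointly, never one at a time.

```python
# Minimal sketch: joint maximum likelihood for (mu, sigma) of a normal model.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)  # simulated data

def neg_log_likelihood(params, data):
    mu, log_sigma = params            # optimize log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    # Negative sum of normal log-densities over all observations
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (data - mu)**2 / (2 * sigma**2))

# Both parameters are free; the optimizer varies them together.
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)  # should be close to the true values 2.0 and 1.5
```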

There are algorithms such as the Expectation-Maximization (EM) algorithm (in fact often used in IRT models) that let you estimate parameters when some of them are unknown, by filling them in with "temporary" values, but even then the likelihood is maximized over all of the parameters.

See the Maximum Likelihood Estimation (MLE) in layman terms thread to learn more about maximum likelihood estimation.
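
To connect this back to your setting, here is a minimal sketch (my own illustration, assuming the paper uses the standard three-parameter logistic model $P(u_{ij}=1) = c_j + \frac{1-c_j}{1+e^{-a_j(\theta_i - b_j)}}$, with made-up item parameter values) of estimating one user's ability. The key point is that there is a single likelihood, the product over all of that user's responses, and it is maximized as a whole rather than item by item and then averaged.

```python
# Minimal sketch: estimate one user's ability theta by maximizing the
# Bernoulli likelihood over ALL items at once (3PL response function assumed;
# item parameters a, b, c are hypothetical values, as if already estimated).
import numpy as np
from scipy.optimize import minimize_scalar

a = np.array([1.2, 0.8, 1.5, 1.0])    # discrimination (slope)
b = np.array([-0.5, 0.3, 1.0, -1.2])  # difficulty
c = np.array([0.2, 0.25, 0.2, 0.2])   # guessing
u = np.array([1, 0, 1, 1])            # the user's observed responses (1 = correct)

def neg_log_likelihood(theta):
    # 3PL probability of a correct response for each item
    p = c + (1 - c) / (1 + np.exp(-a * (theta - b)))
    # One likelihood over all items: the log of the product, not an average
    return -np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))

theta_hat = minimize_scalar(neg_log_likelihood, bounds=(-4, 4), method="bounded").x
print(theta_hat)
```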

Tim
  • Thanks, @Tim, for your reply. Now I understand the idea better. Let me check whether I understand it correctly: μ is the mean and σ² the variance of the normal distribution, and x_i refers to the user's response to item `i`. Here you assume a normal distribution, which in my case is a Bernoulli distribution. My understanding from your answer is to estimate `θ` for each `(u,a,b,c)` by assuming a type of distribution, and then take the `argmax` of `θ`. Is that correct? – user3233712 Jun 16 '17 at 17:29
  • @user3233712 no. The distribution is the likelihood function: it is the distribution assumed for your *data*. u, a, b, c are parameters of your model, and you maximise the likelihood by finding the combination of them that leads to the greatest likelihood. – Tim Jun 17 '17 at 05:14