
I have a large data set of pairwise comparisons.

In pairwise comparison data, each data point compares two alternatives.
For instance:
A > B (A is preferred to B, A and B are classes, not numbers)
A > B
B > A
B > C
A > C
etc ...

In short, we can summarize the preference counts in the data set:
A vs B 999:1
X vs A 500:500
X vs B 500:500

The Bradley-Terry model models pairwise preference by assigning one parameter to each class:

$ P(A > B\; |\; \vec{w} ) = \frac{w_A}{w_A + w_B} $

The parameters can be estimated from the data by maximum likelihood.
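
For concreteness, here is a minimal sketch of that maximum-likelihood fit using the standard fixed-point (MM) update for Bradley-Terry; the `wins` count matrix and the function name are illustrative, not part of the original post:

```python
import numpy as np

def fit_bradley_terry(wins, n_iter=1000, tol=1e-10):
    """Maximum-likelihood Bradley-Terry weights via the standard MM update.

    wins[i, j] = number of times class i was preferred to class j.
    Returns weights w with P(i > j) = w[i] / (w[i] + w[j]).
    """
    total_wins = wins.sum(axis=1)        # W_i: total wins of class i
    n_ij = wins + wins.T                 # total comparisons between i and j
    w = np.ones(wins.shape[0])           # uniform starting point
    for _ in range(n_iter):
        # MM update: w_i <- W_i / sum_j [ n_ij / (w_i + w_j) ]
        w_new = total_wins / (n_ij / np.add.outer(w, w)).sum(axis=1)
        w_new /= w_new.sum()             # w is identified only up to scale
        if np.abs(w_new - w).max() < tol:
            return w_new
        w = w_new
    return w

# Toy counts from the question (order: A, B, X)
wins = np.array([[  0, 999, 500],
                 [  1,   0, 500],
                 [500, 500,   0]], dtype=float)
w = fit_bradley_terry(wins)
print("P(A > B) =", w[0] / (w[0] + w[1]))
```

On these toy counts the fitted $P(A>B)$ comes out near 0.82 rather than 0.999, which is exactly the mismatch described next.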

I'm looking for an extension of the Bradley-Terry model (or a completely new model) that can represent situations like the one above, i.e. A is always strongly preferred to B, $P(A>B) = 0.999$, but $P(X<A) = P(X<B) = 0.5$.

The B-T model cannot represent that, as the short derivation below shows. Do you have any ideas how to create a better model?
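
Explicitly, the two ties force the weights to coincide:

$ P(X>A) = \frac{w_X}{w_X + w_A} = 0.5 \Rightarrow w_X = w_A, \qquad P(X>B) = \frac{w_X}{w_X + w_B} = 0.5 \Rightarrow w_X = w_B, $

so $w_A = w_B$ and therefore $P(A>B) = \frac{w_A}{w_A + w_B} = 0.5$, contradicting $P(A>B) = 0.999$.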

PS The model will be applied to data of size $10^8$, so a simple maximum-likelihood algorithm would be good.

Łukasz Lew
    How many classes are there? If there are substantially fewer than $10^4$, then why not estimate all the probabilities directly from the data as $\hat{P}(A>B)$ = # cases where A is preferred to B / # cases where A is compared to B? (This is ML and it's as simple as they get.) – whuber Jan 11 '11 at 20:56

1 Answer


The difficulty in specifying a model for this problem is how to interpret the strength-of-preference information. Does A vs B 999:1 mean that 999 times out of 1000 people will prefer A, or does it mean that a person prefers A by a large amount relative to B?

If we interpret the data as meaning that A is preferred to B 999 times out of 1000, then you can fit a Bradley-Terry(-Luce) model, but most people these days would instead estimate a logit model, or a generalization thereof, as their "choice model":

$ P(A > B\; |\; \vec{w} ) = \frac{e^{w_A}}{e^{w_A} + e^{w_B}} $

Maximum likelihood estimation with large data sets and aggregate data is straightforward, as the sample size enters the log-likelihood as a weight for each pair. Complications arise if one wants to take into account how people differ in their preferences, in which case some type of mixture model is required (see Train, Kenneth E. (2009), Discrete Choice Methods with Simulation (2nd ed.). Cambridge: Cambridge University Press).
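
To sketch how the counts enter as weights, here is a minimal maximum-likelihood fit of the logit model above on the question's aggregated counts; the `wins` matrix, the helper names, and the use of `scipy.optimize.minimize` are illustrative choices, not prescribed by the model:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(w, wins):
    """Weighted negative log-likelihood of the logit choice model.

    wins[i, j] = number of times i was preferred to j; each pair's
    count simply weights its log-probability term.
    """
    # log P(i > j) = w_i - log(e^{w_i} + e^{w_j})
    log_p = w[:, None] - np.logaddexp(w[:, None], w[None, :])
    return -(wins * log_p).sum()

# Aggregated counts from the question (order: A, B, X)
wins = np.array([[  0, 999, 500],
                 [  1,   0, 500],
                 [500, 500,   0]], dtype=float)

# Fix w_X = 0 for identifiability and optimize the remaining two weights.
res = minimize(lambda theta: neg_log_lik(np.append(theta, 0.0), wins),
               x0=np.zeros(2), method="BFGS")
w_A, w_B = res.x
print("P(A > B) =", 1.0 / (1.0 + np.exp(w_B - w_A)))
```

Because each ordered pair contributes a single weighted term, one likelihood evaluation costs $O(K^2)$ in the number of classes $K$, no matter how many of the $10^8$ raw comparisons lie behind the counts.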

It is not unknown for researchers to take this frequency interpretation when modeling even if they believe it is not an accurate characterization of the problem. This is because it is not straightforward to specify a good model that deals with degree of preference: you then have to work out what, precisely, 999:1 means and how it relates to 998:2, and so on. Many different models have been developed for this problem (e.g., models for constant-sum dependent variables, models designed to predict probabilities, diffusion models). It is impossible to say with any exactitude which is most appropriate, as that depends on how well each model's assumptions suit your data and how well it fits.

Tim