Conditional Logistic Regression in R

Question

As my first question on this matter was incomplete and unclear, I am making another attempt with an improved outline. I am currently working on a project in which I have a data set of the following form:

$$\begin{array}{ccccc}
id_A & id_M & x_1 & x_2 & Link \\
1 & 1 & 1 & 3 & 1 \\
1 & 2 & 3 & 2 & 0 \\
1 & 3 & 2 & 2 & 0 \\
2 & 1 & 4 & 6 & 0 \\
2 & 2 & 5 & 7 & 1 \\
2 & 3 & 4 & 3 & 0
\end{array}$$

Here, there are 3 alternatives M = 1,2,3 from which every individual A has to choose. The realized choice is stored in the binary variable $Link$. I want to model this choice as a conditional logit model (or perhaps probit; I would love to be able to implement both and see which one performs better). This means that, for each $id_A$, I would like to input a matrix (or data.frame subset) containing the interaction values $x_1,x_2$ for each $M$, and then predict the probability that this $A$ chooses each $M$ based on these $x_1,x_2$. In other words, for each $A$ that I input I want a vector of values $P_M = P(id_A \text{ } chooses \text{ } M)$, and these probabilities must sum to 1. However, I do not know how to implement this in R with existing packages. I could of course write the maximum likelihood optimization code myself, but I think that dedicated packages would be more efficient.

I looked at functions like multinom from nnet, but they seem to treat the different alternatives separately, whereas I would like to implement the model as described in the section Conditional Likelihood on this Wikipedia page:

https://en.wikipedia.org/wiki/Conditional_logistic_regression
To clarify: I want the coefficients to be the same for all alternatives $M$ (they should NOT vary between alternatives), so I am not looking for multinomial logistic regression as described here: https://en.wikipedia.org/wiki/Multinomial_logistic_regression

Two small remarks that might be important and should be taken into account:

In general, I will have many alternatives $M$, not just three. They are not ordinal.
After estimating the model, I want to be able to compute new probabilities with one of the alternatives $M$ left out, simply by dropping the corresponding rows from my data. It is not disastrous if this is not possible, but it is highly preferable.

score 2 · Accepted Answer · edited Aug 14 '19 at 19:33

A multinomial logit (MNL) model [or multinomial probit (MNP) if you prefer] is what you need, in the discrete-choice sense of the term: covariates that vary across alternatives and coefficients shared by all alternatives, which is also known as the conditional logit model. In R you could, for example, use the mlogit package (in Stata, you would use the clogit command and specify the right group variable). The key step is to create a variable identifying the rows of the data set that belong together (otherwise the software might naively assume that each row is a separate observation). This "group" or "observation" variable is obtained by combining your respondent and choice occasion variables. If you use the mlogit package, it takes care of this automatically, as long as you correctly identify the respondent and choice ID variables.
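As a rough illustration of what such a call might look like, here is a minimal sketch. It assumes the mlogit package is installed and uses simulated data with the question's column names (id_A, id_M, x1, x2, Link); the coefficient values 1 and -0.5 and the sample size are made up for the example. Simulated data is used because the six rows shown in the question are too few to estimate anything: the chosen alternative always has the largest x_2, so the likelihood has no finite maximum.

```r
library(mlogit)  # assumption: installed via install.packages("mlogit")

# Simulate long-format data: one row per (individual, alternative) pair
set.seed(1)
n_sets <- 300
d <- data.frame(id_A = rep(seq_len(n_sets), each = 3),
                id_M = rep(1:3, times = n_sets))
d$x1 <- rnorm(nrow(d))
d$x2 <- rnorm(nrow(d))

# Utility plus Gumbel noise; the alternative with the highest utility is chosen,
# which yields logit choice probabilities (coefficients 1 and -0.5 are made up)
u <- 1.0 * d$x1 - 0.5 * d$x2 - log(-log(runif(nrow(d))))
d$Link <- as.integer(u == ave(u, d$id_A, FUN = max))

# Declare which column identifies the choice set and which the alternative
md <- mlogit.data(d, choice = "Link", shape = "long",
                  alt.var = "id_M", chid.var = "id_A")

# "| 0" drops the alternative-specific intercepts, so x1 and x2 each get a
# single coefficient shared by all alternatives
fit <- mlogit(Link ~ x1 + x2 | 0, data = md)
coef(fit)

# One row per id_A, one column per alternative; each row sums to 1
probs <- fitted(fit, outcome = FALSE)
head(probs)
```

Newer versions of mlogit build on the dfidx package for declaring the indexes, so the data-preparation step may look different depending on the installed version.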

Programming the objective function for an MNL model is relatively straightforward:

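# Negative log-likelihood of a conditional (multinomial) logit model.
# Data must be in long format: one row per alternative within each choice set.
MNL <- function(Beta, Data, X, Y, G) {
  # Utility of each row: linear combination of the predictors
  V <- drop(as.matrix(Data[, X, drop = FALSE]) %*% Beta[seq_along(X)])
  num <- exp(V)
  # Sum of exp(V) within each choice set, returned row by row so that it
  # stays aligned with num even if the rows are not sorted by G
  den <- ave(num, Data[, G], FUN = sum)
  # Probability of the alternative actually chosen in each choice set
  prob <- (num / den)[Data[, Y] == 1]
  -sum(log(prob))  # negated because optim minimises
}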

Here X is the vector of predictor column names (product cost, etc.), Y is the name of the column holding the observed choices (a binary variable!) and G is the name of the column holding the observation ID.
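For reference, the quantity this function computes can be written out as follows. The notation is an assumption added for illustration: $x_{AM}$ is the predictor vector $(x_1, x_2)$ for individual $A$ and alternative $M$, $\beta$ is the coefficient vector shared by all alternatives, and $y_{AM}$ is the $Link$ indicator.

$$P(A \text{ chooses } M) = \frac{\exp(x_{AM}^\top \beta)}{\sum_{m} \exp(x_{Am}^\top \beta)}, \qquad -\ell(\beta) = -\sum_{A} \sum_{M} y_{AM} \log P(A \text{ chooses } M)$$

Because the denominator runs over the alternatives available to $A$, the probabilities sum to 1 within each $id_A$ by construction.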

You can then pass this function to optim to obtain the maximum-likelihood estimates.
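A minimal sketch of that optim step, under stated assumptions: the data is simulated with the question's column names, the true coefficients (1, -0.5) are made up, and the helper names choice_probs and neg_loglik are hypothetical. The likelihood is restated inside the block so that it runs on its own; it subtracts the per-set maximum utility before exponentiating, which is a common numerical safeguard and not part of the answer's code.

```r
# Simulated long-format data: 500 individuals, 3 alternatives each
set.seed(42)
n_sets <- 500
n_alt <- 3
d <- data.frame(id_A = rep(seq_len(n_sets), each = n_alt),
                id_M = rep(seq_len(n_alt), times = n_sets))
d$x1 <- rnorm(nrow(d))
d$x2 <- rnorm(nrow(d))
beta_true <- c(1, -0.5)  # made-up values for the illustration
u <- drop(as.matrix(d[, c("x1", "x2")]) %*% beta_true) - log(-log(runif(nrow(d))))
d$Link <- as.integer(u == ave(u, d$id_A, FUN = max))  # one choice per id_A

# Choice probability of every row, normalised within each choice set
choice_probs <- function(beta, data, x_cols, g_col) {
  v <- drop(as.matrix(data[, x_cols, drop = FALSE]) %*% beta)
  v <- v - ave(v, data[[g_col]], FUN = max)  # guard against overflow in exp()
  num <- exp(v)
  num / ave(num, data[[g_col]], FUN = sum)
}

# Negative log-likelihood: only the chosen rows contribute
neg_loglik <- function(beta, data, x_cols, y_col, g_col) {
  p <- choice_probs(beta, data, x_cols, g_col)
  -sum(log(p[data[[y_col]] == 1]))
}

fit <- optim(c(0, 0), neg_loglik, data = d, x_cols = c("x1", "x2"),
             y_col = "Link", g_col = "id_A", method = "BFGS", hessian = TRUE)
beta_hat <- fit$par
se_hat <- sqrt(diag(solve(fit$hessian)))  # approximate standard errors

# Predicted probabilities with all alternatives, then with alternative 3 dropped
d$p <- choice_probs(beta_hat, d, c("x1", "x2"), "id_A")
d_drop <- d[d$id_M != 3, ]
d_drop$p <- choice_probs(beta_hat, d_drop, c("x1", "x2"), "id_A")
```

Dropping the rows of one alternative and recomputing simply renormalises the remaining probabilities within each id_A, which is what the question's second remark asks for; this relies on the independence-of-irrelevant-alternatives property of the logit model. Note that the extra argument names passed through optim should not be abbreviations of its own arguments (a lowercase g, for instance, would be partially matched to gr).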

edited Aug 14 '19 at 19:33

Ken Williams

answered Feb 04 '19 at 13:38

Nicolas K

Thank you for your response. So, if I want to use the "mlogit" package, how should I identify the respondent and choice ID? What should my function call look like? The objective function is useful as well, thank you! – J. Dekker Feb 04 '19 at 13:43
You will find everything you need to know about the "mlogit" package in this document (https://pdfs.semanticscholar.org/8d40/143d338c298a4b5e6d421a730d54908c9eba.pdf) - I haven't used this package recently, but from what I remember you simply need to declare your respondent ID (in your case id_A) and possibly a choice occasion ID, if there are multiple choices per respondent. – Nicolas K Feb 04 '19 at 13:58
Thanks, this really helped me out. I managed to implement what I wanted, and it even seems to be much quicker than any optimization scheme I could have quickly written! – J. Dekker Feb 04 '19 at 19:34
Nicolas, in case you are still following this, the Semantic Scholar link is broken. I'd love it if you could add the paper title/authors! – Barry DeCicco Jan 10 '22 at 20:39
