MLE for the conditional logit model seems to have awful precision?

Question

I'm simulating the data underlying a conditional logit model and then doing my own MLE estimation using optim. However, even with an unreasonably large amount of data, the estimator seems to have awful precision. Is this expected, or am I making mistakes in my implementation?

Conditional Logit Model: The conditional logit is a discrete choice model, where a person chooses between multiple alternatives based on the alternatives' covariates. We want to estimate the person's preference for these covariates.

The person chooses the alternative that gives him the highest utility. For example, with 3 choices, the 3 utility functions are:

$$ U_{i1} = \alpha' X_1 + \varepsilon_{i1} \\ U_{i2} = \alpha' X_2 + \varepsilon_{i2} \\ U_{i3} = \alpha' X_3 + \varepsilon_{i3} $$

where $\alpha$ is the vector of preference parameters to be estimated, and $X_j$ is the vector of covariates that we observe for alternative $j$.

In the conditional logit model, we assume that the random utility components (i.e. the $\varepsilon$ terms) follow independent standard Gumbel distributions. With that assumption, the probability of choosing alternative $j$ is:

$$ \Pr(\text{choosing }j) = \Pr(j\text{ offers highest utility}) = \frac{\exp(\alpha'X_j)}{\sum_{\text{all }k} \exp(\alpha'X_k)} $$
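As a small illustration of the choice-probability formula, here is a minimal sketch in base R. The helper name `choice_probs` and the covariate values are made up for the example; subtracting the maximum utility before exponentiating is a common numerical safeguard and does not change the probabilities.

```r
# Hypothetical helper (not from any package): choice probabilities for one
# choice set, given preference parameters alpha and a covariate matrix X
# with one row per alternative.
choice_probs <- function(alpha, X) {
  v <- c(X %*% alpha)   # systematic utilities alpha'X_j
  v <- v - max(v)       # shift for numerical stability; ratios are unchanged
  exp(v) / sum(exp(v))
}

X <- rbind(c(1, 0), c(0, 1), c(1, 1))  # 3 alternatives, 2 covariates (made-up values)
alpha <- c(0.05, 0.1)
p <- choice_probs(alpha, X)
p
sum(p)  # the probabilities sum to 1
```

With parameters this small the probabilities are all close to 1/3, which already hints that choices carry little information about $\alpha$.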

The log likelihood is then

$$ LL = \sum_i \left( \alpha'X_{c_i} - \log\left( \sum_j \exp(\alpha' X_j)\right) \right) $$

where $c_i \in \{1, \dots, J\}$ indicates the choice that person $i$ makes and $J$ is the number of alternatives.

Implementation:

I simulate the data using known preference parameters and estimate them using MLE as below.

num_choices <- 100
xx <- mvtnorm::rmvnorm(num_choices, sigma = diag(2)) # Alternatives' covariates
alpha <- c(0.05, 0.1) # Preference parameters, to be estimated

# Matrix of utilities
mat <- matrix(NA, nrow = 1000, ncol = num_choices)
for (i in 1:num_choices) {
  mat[, i] <- sum(alpha * xx[i, ]) + evd::rgumbel(1000)
}

# Choosing the alternative that gives the highest utility
y <- max.col(mat)

# negative log likelihood
cl_nllik <- function(alpha) {
  xa <- c(xx %*% alpha)
  lse_xa <- log(sum(exp(xa)))
  - sum( xa[y] - lse_xa )
}

# MLE estimate -- does NOT produce the true alpha values!
optim(c(0, 0), cl_nllik)

Is there a reason the parameter $\alpha$ doesn't vary across categories, i.e. $\Pr(i \text{ chooses } j) = \frac{\exp(\alpha_j^T X_{ij})}{\sum_g \exp(\alpha_g^T X_{ig})}$? — probabilityislogic, Feb 10 '18 at 03:47
Because in [the conditional logit model](http://data.princeton.edu/wws509/notes/c6s3.html) the covariates $X_j$ vary only across choices $j$, not across choosers $i$. If the covariates vary across both $i$ and $j$, it's called a multinomial logit model (in the literature I'm familiar with) — Heisenberg, Feb 10 '18 at 18:35
More conceptually, consider the example where $X$ is the price of the alternatives. Customers' preference for price should then be the same across alternatives, hence $\alpha$ doesn't vary across categories. I don't see a theoretical reason why price should have a different effect on customers' utilities depending on which alternatives are being considered. — Heisenberg, Feb 10 '18 at 19:14

score 1 · Answer 1 · answered May 09 '19 at 16:29

This is a really old question, but I came across it and will answer based on my experience: I also found that conditional logit performed somewhat poorly for me.

I was working on a conditional logit problem, and three different software packages (Stata, MATLAB, Fortran) were giving me different coefficients. The Stata code was the canned clogit command, whereas in MATLAB and Fortran I had coded up the maximum likelihood myself.

First, and most importantly, you need sufficient variation in your data. With stronger variation in choices I noticed more stability.
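One way to see how much precision the data can support is to look at the standard errors implied by the Hessian at the optimum. The sketch below is an assumption-laden illustration, not the original poster's code: it re-simulates similar data in base R (standard Gumbel draws via the inverse CDF rather than evd::rgumbel), and `fit_se` and `x_sd` are hypothetical names. It compares covariates with standard deviation 1 against covariates with standard deviation 3.

```r
# Hypothetical helper: simulate n choosers over J alternatives and return the
# MLE together with Hessian-based standard errors. Base R only.
fit_se <- function(x_sd, n = 1000, J = 100, alpha = c(0.05, 0.1), seed = 1) {
  set.seed(seed)
  X <- matrix(rnorm(J * 2, sd = x_sd), J, 2)         # alternatives' covariates
  v <- c(X %*% alpha)                                 # systematic utilities
  gumbel <- matrix(-log(-log(runif(n * J))), n, J)    # standard Gumbel, inverse CDF
  y <- max.col(sweep(gumbel, 2, v, "+"), ties.method = "first")  # chosen alternative
  nll <- function(a) {                                # negative log likelihood
    xa <- c(X %*% a)
    m <- max(xa)                                      # log-sum-exp shift
    -sum(xa[y] - (m + log(sum(exp(xa - m)))))
  }
  fit <- optim(c(0, 0), nll, method = "BFGS", hessian = TRUE)
  list(par = fit$par, se = sqrt(diag(solve(fit$hessian))))
}

narrow <- fit_se(x_sd = 1)   # covariates as in the question
wide   <- fit_se(x_sd = 3)   # same design with more spread in the covariates
rbind(narrow = narrow$se, wide = wide$se)
```

When the covariates have roughly unit variance and the true parameters are small, the standard errors are on the order of $1/\sqrt{n}$, so about 0.03 for 1000 choosers. That is the same order of magnitude as the true values 0.05 and 0.1, so estimates that look far off can still be consistent with a correct implementation.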

Second, the optimization method may matter. With a conditional logit, the gradient has a closed form, so I believe the canned routine in Stata runs a Newton-Raphson optimization using the exact gradient. If you write your own code and call optim or some other solver without supplying a gradient, the solver either uses no derivatives at all (optim's default, Nelder-Mead) or approximates them by finite differences. Either way the search may be less accurate and can have trouble finding the proper solution. Since calculating the true gradient is relatively straightforward, it is best to take advantage of it and run a gradient-based method such as Newton-Raphson.
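For the log likelihood in the question, the gradient is $\partial LL / \partial \alpha = \sum_i \left( X_{c_i} - \sum_j p_j X_j \right)$, where $p_j$ is the choice probability of alternative $j$. Here is a minimal sketch of passing it to optim. The data are re-simulated in base R so the block runs on its own, `nll` and `nll_grad` are names chosen for the example, and BFGS stands in for Newton-Raphson because optim does not offer the latter.

```r
set.seed(42)
n <- 1000; J <- 100
X <- matrix(rnorm(J * 2), J, 2)                     # alternatives' covariates
alpha_true <- c(0.05, 0.1)
gumbel <- matrix(-log(-log(runif(n * J))), n, J)    # standard Gumbel, inverse CDF
util <- sweep(gumbel, 2, c(X %*% alpha_true), "+")  # n x J utilities
y <- max.col(util, ties.method = "first")           # chosen alternative per person

# Negative log likelihood with a log-sum-exp shift for stability
nll <- function(a) {
  xa <- c(X %*% a)
  m <- max(xa)
  -sum(xa[y] - (m + log(sum(exp(xa - m)))))
}

# Analytic gradient of nll: -(sum_i X_{c_i} - n * sum_j p_j X_j)
nll_grad <- function(a) {
  xa <- c(X %*% a)
  p <- exp(xa - max(xa))
  p <- p / sum(p)
  -(colSums(X[y, , drop = FALSE]) - length(y) * c(crossprod(X, p)))
}

# Sanity check against central finite differences at an arbitrary point
a0 <- c(0.2, -0.1)
h <- 1e-5
fd <- sapply(1:2, function(k) {
  e <- replace(numeric(2), k, h)
  (nll(a0 + e) - nll(a0 - e)) / (2 * h)
})
max(abs(fd - nll_grad(a0)))  # should be very small

fit <- optim(c(0, 0), nll, gr = nll_grad, method = "BFGS")
fit$par
```

Checking the analytic gradient against finite differences before trusting it is cheap and catches sign and indexing mistakes.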

Lastly, make sure the optimizer's convergence tolerance is small enough. If the tolerance is too large, the optimizer may stop before arriving at the solution.
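In R, optim exposes this through its `control` list. A rough sketch, again on re-simulated data in base R (the variable names are chosen for the example): `reltol` is the relative convergence tolerance, which defaults to `sqrt(.Machine$double.eps)`, and `maxit` caps the number of iterations.

```r
set.seed(7)
n <- 1000; J <- 100
X <- matrix(rnorm(J * 2), J, 2)                     # alternatives' covariates
alpha_true <- c(0.05, 0.1)
gumbel <- matrix(-log(-log(runif(n * J))), n, J)    # standard Gumbel, inverse CDF
y <- max.col(sweep(gumbel, 2, c(X %*% alpha_true), "+"), ties.method = "first")

nll <- function(a) {                                # negative log likelihood
  xa <- c(X %*% a)
  m <- max(xa)
  -sum(xa[y] - (m + log(sum(exp(xa - m)))))
}

# Same default method (Nelder-Mead), loose versus tight stopping rule
fit_loose <- optim(c(0, 0), nll, control = list(reltol = 1e-4))
fit_tight <- optim(c(0, 0), nll, control = list(reltol = 1e-12, maxit = 5000))

rbind(loose = fit_loose$par, tight = fit_tight$par)
c(loose = fit_loose$value, tight = fit_tight$value)
```

It is also worth checking `fit_tight$convergence`: a value of 0 means the tolerance was met, while 1 means the iteration limit was hit first.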
