
Consider the multinomial probit model where we observe $Y_i \in \{1, \dots, K + 1\}$ with

$$ \begin{align*} Y_i = l \Leftrightarrow Z_l&\geq \max(Z_1,\dots, Z_{K + 1}) \qquad l \in \{1, \dots, K + 1\} \\ \vec Z = (Z_1,\dots, Z_{K + 1})^\top \mid \vec X_i = \vec x &\sim N^{(K + 1)} \left(\vec\mu(\vec x), \Psi\right) \\ \vec\mu(\vec x) &= \begin{pmatrix}0 \\ B \vec x\end{pmatrix} \\ \Psi &= \begin{pmatrix}\xi & \vec 0^\top \\ \vec 0 & \Sigma\end{pmatrix} \\ \Sigma &= \begin{pmatrix} 1 & \sigma_{12} & \dots & \sigma_{1K} \\ \sigma_{21} & \sigma_{22} & \ddots & \vdots \\ \vdots &\ddots & \ddots & \sigma_{K - 1,K} \\ \sigma_{K1} &\cdots & \sigma_{K,K-1} & \sigma_{KK} \end{pmatrix}. \end{align*} $$

This follows one of the parameterisations from Bunch (1991) (https://doi.org/10.1016/0191-2615(91)90009-8) if we let $\xi\rightarrow 0^+$. My question is: when can we identify the covariance parameters in $\Sigma$, and what data are required to do so?
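For concreteness, the data-generating process above can be simulated directly. The sketch below uses a small $\xi$ as a stand-in for the limit $\xi \rightarrow 0^+$; the coefficient matrix `B` and covariate vector `x` are arbitrary illustrative choices, and the mvtnorm package is assumed to be available:

```r
# minimal sketch of the data-generating process
library(mvtnorm)
set.seed(1)
K <- 2L
xi <- 1e-14                       # small value standing in for xi -> 0+
B <- matrix(c(.5, 0, 0, -.5), K)  # hypothetical coefficient matrix
x <- c(1, 1)                      # hypothetical covariate vector
Sigma <- diag(K)                  # unit-diagonal covariance as in the model

# build the mean vector and the (K + 1) x (K + 1) covariance matrix Psi
mu <- c(0, drop(B %*% x))
Psi <- rbind(c(xi, numeric(K)), cbind(numeric(K), Sigma))

# draw latent utilities Z and take the argmax to obtain Y
Z <- rmvnorm(10L, mu, Psi)
Y <- apply(Z, 1L, which.max)
```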

Example

Let

$$ \Phi^{(L)}(\vec m, M) = \int_{-\infty}^0\cdots\int_{-\infty}^0 \phi^{(L)}(\vec u;\vec m, M) d \vec u $$

where $\phi^{(L)}(\vec u;\vec m, M)$ is the density of the $L$-dimensional multivariate normal distribution with mean $\vec m$ and covariance matrix $M$. Then we can show that

$$ P(Y_i = l \mid \vec X_i = \vec x) = \Phi^{(K)}\left(\vec\mu(\vec x)_{(-l)} - \mu_l(\vec x)\vec 1, \Psi_{(-l)(-l)} + \psi_{ll}\vec 1\vec 1^\top - \vec 1\Psi_{l(-l)} - \Psi_{(-l)l}\vec 1^\top\right) $$

where a subscript $(-l)$ means all but the $l$'th entry. The identity is easy to verify numerically:

# simulate Psi and the mean vector
set.seed(1)
K <- 3L
Sigma <- drop(rWishart(1L, 2 * K, diag(1/2/K, K)))
Psi <- matrix(0., K + 1, K + 1)
Psi[1, 1] <- 1e-14
Psi[2:(K + 1), 2:(K + 1)] <- Sigma
mu <- c(0, rnorm(K, sd = .25))

# compute marginal probabilities using simulation
library(mvtnorm)
table(apply(rmvnorm(1000000L, mu, Psi), 1L, which.max)) / 1000000L
#R>
#R>        1        2        3        4
#R> 0.024331 0.140776 0.302731 0.532162

# compute marginal probabilities using the identity
p_func <- function(l, mu)
  pmvnorm(upper = numeric(K), mean = mu[-l] - mu[l],
          sigma = Psi[-l, -l] + Psi[l, l] - outer(rep(1, K), Psi[l, -l]) -
            outer(Psi[-l, l], rep(1, K)),
          algorithm = GenzBretz(maxpts = 1e7, releps = 1e-8, abseps = -1))
sapply(1:(K + 1), p_func, mu = mu)
#R> [1] 0.02427415 0.14078773 0.30258305 0.53235506

Similar results follow with other seeds. Now consider the three-dimensional case, $K = 2$, with only intercepts and restricted to uncorrelated errors

$$ \begin{align*} B &= \begin{pmatrix} b_1 & 0 \\ 0 & b_2 \end{pmatrix} \\ \vec X_i &= (1, 1) \\ \Sigma &= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2 \end{align*} $$

Let $\vec b = (0, b_1, b_2)^\top$. Then the three probabilities with $\xi \rightarrow 0^+$ are

$$ \begin{align*} P(Y_i = 1) &= \Phi^{(2)}\left(\vec b_{(-1)}, I_2\right) = \Phi(-b_1)\Phi(-b_2) \\ P(Y_i = 2) &= \Phi^{(2)}\left( \begin{pmatrix}-b_1 \\ b_2 - b_1\end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} + \vec 1\vec 1^\top\right) \\ P(Y_i = 3) &= \Phi^{(2)}\left( \begin{pmatrix}-b_2 \\ b_1 - b_2\end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} + \vec 1\vec 1^\top\right). \end{align*} $$
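The product form of $P(Y_i = 1)$ is straightforward to check numerically. In the sketch below the values of $b_1$ and $b_2$ are arbitrary, and mvtnorm is assumed to be available:

```r
library(mvtnorm)
b <- c(0.4, -0.7)  # arbitrary values of b_1 and b_2

# P(Y = 1) as the bivariate orthant probability with mean b and identity covariance
p_orthant <- pmvnorm(upper = c(0, 0), mean = b, sigma = diag(2))

# ... and as the product of the two univariate probabilities
p_product <- pnorm(-b[1]) * pnorm(-b[2])

# the two agree up to the integration tolerance of pmvnorm
```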

It is now quite easy to verify that we can attain any vector of probabilities using only $b_1$ and $b_2$:

# setup data
K <- 2
Sigma <- diag(K)
Psi <- matrix(0., K + 1, K + 1)
Psi[1, 1] <- 1e-16
Psi[2:(K + 1), 2:(K + 1)] <- Sigma

# setup a grid of probabilities to try to match
pvals <- expand.grid(p2 = seq(1e-3, 1 - 1e-3, length.out = 50),
                     p3 = seq(1e-3, 1 - 1e-3, length.out = 50))
pvals <- as.matrix(subset(pvals, p2 + p3 < 1))

# assign function to optimize
logml <- function(x, pvals){
  cons <- sapply(1:3, function(z) p_func(z, mu = c(0, x)))
  -sum(pvals * log(cons))
}

# optimize to find parameters and compute the KL divergence
opt_out <- apply(pvals, 1L, function(pvals){
  # optimize
  pvals_full <- c(1 - sum(pvals), pvals)
  opt <- optim(numeric(2), logml, pvals = pvals_full,
               control = list(reltol = 1e-12))

  # compute the KL divergence
  cons <- sapply(1:3, function(z) p_func(z, mu = c(0, opt$par)))
  sum(pvals_full * log(pvals_full / cons))
})
range(opt_out) # essentially zero
#R> [1] 1.220835e-15 1.350247e-10

Thus, with only intercepts, deviations of $\Sigma$ from the identity cannot be identified: the restricted model with $\Sigma = I_2$ already attains any vector of probabilities. Presumably, something similar holds with covariates, since $\vec\mu$ is linear in $\vec X_i$. This seems to be in contrast with this answer, which suggests that allowing a free $\Sigma$ is more general and identifiable after imposing restrictions like those at the beginning of my question.
