Based on my data generated by a complex process and the problem below detailed, I have tried various approaches, to no avail.
I am trying to answer one or more of the following questions:
a) Has someone solved my problem already? Are there other clever approaches I could use instead (perhaps something like PLS or CCA)?
b) What is the right literature to look for previous work addressing my problem?
c) If this problem is novel (which I personally doubt), what is the right approach to solve it?
d) In my proposed approach below described, is there something obvious I am missing?
Here is my problem:
For $N$ individuals, with $M$ associated covariates, I observe outcomes in $J$ possible classes; thus my design matrix $X$ is of dimension $M$ x $N$ and my response matrix $Y$ is of dimension $N \times J$. For a given class $j$ there is a latent variable $U_{i,j}$ s.t. when $U_{i,j} < 0$, $Y_{i,j}=0$ and $Y_{i,j}=1$, otherwise . Up to this point, if $U \sim MVN(\beta X, \Sigma)$, I have described a standard multivariate probit regression scenario. However, if a given $U_{i,j}$ is greater than all other classes $U_{i,-j}$ and greater than some constant $c_1$, then $Y_{i,j} =t$ if $c_{t-1} < U_{i,j} < c_t$ where $t\in \{1,2,...,T-1\}$; $Y_{i,j}=T$ if $U_{i,j} > c_{T-1}$. Here the cutpoints $c_t$ are unknown. On its own, the latter half corresponds to an ordinal regression problem. However, together this problem seems to be more complex than either probit or ordinal regression.
Below is the R code to generate data from this process.
The values of $U$ are generated directly without specifying a linear relationship with $X$:
library(MASS) # for mvrnorm
N <- 1e3 # num. of observations
J <- 12 # num. of classes
cuts <- c(1.5, 2, 2.25, 2.5, 2.7, 3) # T=6
S <- rWishart(n=1, Sigma=diag(J), df=20) # sample cov. mat.
U <- mvrnorm(n=N, mu=rep(0, J), Sigma=S[,,1]) # generate U
Y <- apply(U, 1, function(u) { # generate Y from U
u <- as.numeric(scale(u)) # s.d. is 1 for identifiability
y <- ifelse(u < 0, 0, 1) # U's extreme enough are observed
cond <- u > cuts[1] & u == max(u) # most extreme U's are ranked
if(any(cond))
y[which(cond)] <- findInterval(u[which(cond)], cuts)
return(y)
}
)
In my case, $N$ is approximately $3500$, $J=12$ and $M=30$, and $T=6$. The outcomes of the classes are correlated and the individuals are hierarchically related.
Here are the solutions I have tried so far:
Initially, I thought I could use JAGS to solve this problem. Before implementing the full model, I tried a hierarchical multivariate probit regression. However, JAGS seemed to take an inordinately long time to sample from the posterior.
Thus, I have since been searching through the simulated likelihood literature e.g. GHK, but found no work reproducing the model described above. Specifically, I am exploring approaches such as approximate Bayesian computation (ABC), indirect inference, and simulated method of moments.
Any help would therefore be much appreciated.