
I am using Spotfire S+ for statistical analysis in my project, and I have to run a multinomial logistic regression on a large data set. I know the best choice would have been mlogit, but unfortunately that is not available in S+. However, I do have the option of using the glm algorithm for this regression. I want to clarify two things here:

1. Is my understanding correct that glm can also be used to run a multinomial logistic regression?

2. If the answer to the previous question is yes, then what parameters should be used in the glm algorithm?

Thanks,

Raghvendra

2 Answers


Yes, with a Poisson GLM (a log-linear model) you can fit multinomial models: the multinomial logistic model and the log-linear Poisson model are equivalent.

You need to view the random counts $y_{ij}$ as Poisson random variables with means $\mu_{ij}$ and specify the following log-linear model:

$\log(\mu_{ij}) = o + p_i + c_j + x_i\beta_j$

Here $o$ is an overall intercept. To get a multinomial logit model, the remaining parameters are:

A parameter $p_i$ for each multinomial observation, e.g. each individual or group. This ensures exact reproduction of the multinomial denominators and is what actually establishes the equivalence of the Poisson and multinomial models. The denominators are fixed in the multinomial likelihood, but random in the Poisson likelihood.

A parameter $c_j$ for each response category. This way the counts can be different for each response category and the margins can be non-uniform.

What you are really interested in are the interaction terms $x_iβ_j$ that represent the effects of $x_i$ on the log-odds of response $j$.

The log-odds then follow directly as $\log(\mu_{ij}/\mu_{ik}) = (c_j - c_k) + x_i(\beta_j - \beta_k)$: the log-odds that observation $i$ falls in response category $j$ rather than in response category $k$.

Then the parameters of the multinomial logit model (denoted with Latin letters) are obtained as differences between the parameters of the corresponding log-linear model, i.e. $a_j = c_j - c_k$ and $b_j = \beta_j - \beta_k$.
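To make this concrete, here is a minimal sketch in R (S+ syntax is essentially the same); the data frame `d` and the simulated variables are illustrative, not from the question. It fits the log-linear Poisson model above with `glm` and checks the estimates against a direct multinomial fit from `nnet::multinom`:

```r
set.seed(1)
n <- 200                                   # multinomial observations
x <- rnorm(n)                              # one covariate
eta <- cbind(0, 0.5 + 1.0 * x, -0.2 - 0.5 * x)   # true category scores
prob <- exp(eta) / rowSums(exp(eta))
y <- apply(prob, 1, function(p) sample(3, 1, prob = p))

# One row per (observation, category) pair, with 0/1 counts
d <- data.frame(obs   = factor(rep(1:n, each = 3)),
                cat   = factor(rep(1:3, times = n)),
                x     = rep(x, each = 3),
                count = as.integer(rep(y, each = 3) == rep(1:3, times = n)))
# Interaction columns only for the non-reference categories (the reference
# category's slope is absorbed by the `obs` parameters, avoiding aliasing)
d$cat2x <- (d$cat == 2) * d$x
d$cat3x <- (d$cat == 3) * d$x

# obs -> p_i, cat -> c_j, cat2x/cat3x -> x_i * (beta_j - beta_1)
pois <- glm(count ~ obs + cat + cat2x + cat3x,
            family = poisson(link = "log"), data = d)
coef(pois)[c("cat2", "cat3", "cat2x", "cat3x")]   # a_j and b_j vs category 1

library(nnet)                              # direct multinomial fit, for comparison
coef(multinom(factor(y) ~ x))              # should match the four numbers above
```

The `Family = "poisson"`, `Link = "log"` settings mentioned in the comment below correspond exactly to the `family = poisson(link = "log")` argument here.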

Momo
  • Thanks Momo. This is really helpful. My software gives me the option of choosing Family as "poisson" and Link as "log" when running the GLM algorithm, so I think that's exactly what is required here. – Raghvendra Mar 15 '12 at 22:45

Yes, you can; in fact this is precisely what the R package GLMNET does for multinomial logistic regression. Writing the log-likelihood function as:

$$\log L=\sum_i\sum_c n_{ic}\log(p_{ic})$$

where $i$ indexes observations, $c$ indexes the multinomial categories, and $n_{ic}$ is the observed count for observation $i$ in category $c$. The observations are defined by their unique covariate combinations; alternatively, we can allow duplicates and set each $n_{ic}=1$, so that we have categorical "binary" data (whatever the plural of "binary" is). For multinomial logistic regression the probabilities are defined as:

$$p_{ic}=\frac{\exp\left(x_{i}^T\beta_{c}\right)}{\sum_{c'}\exp\left(x_{i}^T\beta_{c'}\right)}$$

This is a less-than-full-rank parameterisation, which can be useful if you are using a penalised likelihood (as GLMNET does). We could in principle use IRLS/Newton-Raphson on the full beta matrix $(\beta_1,\dots,\beta_{C})$, but you end up with non-diagonal weight matrices. Alternatively, we can optimise "Gibbs-style": fix the betas of all categories except one, optimise over that one category, then proceed to the next category, and so on. Because the probabilities have the form

$$p_{ic}=\frac{\exp\left(x_{i}^T\beta_{c}\right)}{\exp\left(x_{i}^T\beta_{c}\right)+A}\text{ where }\frac{\partial A}{\partial \beta_c}=0$$ $$p_{ic'}=\frac{B}{\exp\left(x_{i}^T\beta_{c}\right)+A}\text{ where }\frac{\partial B}{\partial \beta_c}=0$$

it follows that the quadratic expansion about $\beta_c$ has the same form as for logistic regression, only with the IRLS weights calculated differently; we still have $W_{ii,c}=n_{ic}p_{ic}(1-p_{ic})$ in the usual $(X^TWX)^{-1}X^TWY$ update of beta.
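As an illustration, here is a rough sketch of that blockwise scheme in R (my sketch, not the author's or GLMNET's code; the function name and the unpenalised, one-hot setup with $n_{ic}\in\{0,1\}$ are assumptions for the example):

```r
# Blockwise ("Gibbs-style") IRLS for the multinomial logit: one-hot Y (n x C),
# design X (n x p, including an intercept column). Unpenalised, so only
# differences of betas are identified; results are reported vs category 1.
blockwise_irls <- function(X, Y, iters = 30) {
  B <- matrix(0, ncol(X), ncol(Y))          # full beta matrix (beta_1, ..., beta_C)
  for (it in seq_len(iters)) {
    for (cc in seq_len(ncol(Y))) {          # hold all other categories' betas fixed
      P <- exp(X %*% B)
      P <- P / rowSums(P)                   # softmax probabilities
      w <- pmax(P[, cc] * (1 - P[, cc]), 1e-8)       # diagonal weights W_ii,c
      z <- X %*% B[, cc] + (Y[, cc] - P[, cc]) / w   # working response
      B[, cc] <- solve(crossprod(X, w * X), crossprod(X, w * z))
    }
  }
  B - B[, 1]                                # identified contrasts vs category 1
}
```

With the simulated `x` and `y` from the sketch in the first answer, `blockwise_irls(cbind(1, x), diag(3)[y, ])` should reproduce the `multinom` coefficients up to convergence tolerance. Each inner step is an ordinary weighted least-squares solve, which is the point: the per-block weight matrix stays diagonal.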

probabilityislogic
  • I am trying to implement multinomial logistic regression using the IRLS QR Newton variant. The code works for other GLM models, but I haven't been able to get mlogit to work. Would $W$ be the Jacobian of the softmax function, which would allow me to compute the Cholesky only once per iteration rather than $k$ times to solve for each set of weights per outcome? – José Bayoán Santiago Calderón Apr 09 '18 at 03:34
  • Given that it wouldn't be diagonal, it would not scale well with a large number of observations, no? If going "Gibbs-style", would subtracting the base-category parameters from the $\beta$ matrix be done before or after prediction? – José Bayoán Santiago Calderón Apr 09 '18 at 03:42
  • When you talk about "Cholesky once" vs "Cholesky $k$ times", note that the matrices have different dimensions: if there are $p$ columns in $X$, then the "once" is for a matrix of size $pk$ and the "$k$ times" is for matrices of size $p$. – probabilityislogic Apr 12 '18 at 08:33