
I have a multinomial classification problem where I have > 2 classes, and for each observation I have i) the class the observation is assigned to, and ii) the probability of it belonging to a class (you could interpret this as the certainty of the observation belonging to that class).

Is there a regression technique that takes not just the class label but also these probabilities into account? I know I can easily convert the probabilities into a categorical target and run a multinomial logistic regression, but was wondering if there's a way to preserve this information?

I guess what I'm looking for here is a combination of a multinomial logistic regression and a beta regression. Any pointers to what literature there is out there or any relevant R/python packages would be appreciated!

Nigel Ng
    So you have response data with *uncertain class labels*. For each observation, do you have a probability vector $(\pi_1, \dots,\pi_k)$ summing to one? Then we could write down a likelihood to optimize ... – kjetil b halvorsen Feb 24 '17 at 11:32
  • In my current case, no, I only have a scalar probability for how 'certain' an observation belongs to a class. But in the future I may need a model that can generalise to a full probability vector, so would love to hear a suggestion for either scenario. – Nigel Ng Feb 24 '17 at 13:35
  • So, say, if observation 3 belongs to class C with probability 0.8, you know nothing about the missing 0.2 probability; it could be any of the other classes? We could then distribute it equally, or maybe better, distribute it proportionally to the marginal distribution over classes? – kjetil b halvorsen Feb 24 '17 at 13:50
  • Yes. You are correct. Let's assume we distribute it proportionally according to the marginal distribution over classes. – Nigel Ng Feb 24 '17 at 14:03
  • Alternatively, I was thinking we could also do a hierarchical beta regression? $$ g(\mu_i) = {x_i}^T\beta_j $$ Where $\beta_j$ are coefficients for each class. And we assume partial pooling, i.e. $\beta_j \sim Normal(\beta, \sigma)$. Would this be a solution? Would love to hear what you had in mind as well. – Nigel Ng Feb 24 '17 at 14:12
  • http://www.cs.bris.ac.uk/~flach/ECMLPKDD2012papers/1125762.pdf – kjetil b halvorsen Feb 24 '17 at 22:20
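The redistribution agreed on in the comments above can be sketched in a few lines. This is an illustrative Python snippet (the class labels, certainties, and number of classes are made up): each observation's stated certainty stays on its assigned class, and the leftover mass is split over the other classes in proportion to the empirical marginal distribution.

```python
import numpy as np

# Hypothetical data: assigned class per observation and the scalar
# certainty that the observation belongs to that class.
labels = np.array([0, 2, 1, 0])
certainty = np.array([0.8, 0.6, 0.9, 0.7])
k = 3  # number of classes

# Empirical marginal distribution of the assigned classes
marginal = np.bincount(labels, minlength=k) / len(labels)

# Build a full probability vector per observation: the assigned class keeps
# its stated certainty; the remaining (1 - p) mass is distributed over the
# other classes proportionally to their marginal frequencies.
P = np.zeros((len(labels), k))
for i, (c, p) in enumerate(zip(labels, certainty)):
    others = np.delete(np.arange(k), c)
    w = marginal[others]
    w = w / w.sum() if w.sum() > 0 else np.full(k - 1, 1.0 / (k - 1))
    P[i, c] = p
    P[i, others] = (1 - p) * w

print(P.round(3))  # each row sums to one
```

The resulting matrix `P` is a set of probability vectors $(\pi_1, \dots, \pi_k)$ summing to one per observation, which is exactly the input a Dirichlet-type model expects.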

1 Answer


The polytomous extension of beta regression is Dirichlet regression. For beta regression you have just one proportion $y_1$, which you could also see as a composition $(y_1, 1 - y_1)$. More generally, one could have $(y_1, y_2, \dots, y_{k-1}, 1 - \sum_{j = 1}^{k-1} y_j)$ with the additional restriction that $0 < y_j < 1$ for all $j$.

The Dirichlet distribution then provides a probabilistic model for this kind of data. And there are different parameterizations that could be employed in a regression setup. The R package DirichletReg at https://CRAN.R-project.org/package=DirichletReg implements two possible parameterizations. See http://epub.wu.ac.at/4077/ for an introduction.
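I am not aware of a direct Python port of DirichletReg, but the underlying idea can be sketched by maximum likelihood with `scipy`. The snippet below is an illustration under one common parameterization, a log link on the Dirichlet concentration parameters, $\alpha_{ij} = \exp(x_i^\top \beta_j)$ (the variable names and simulated data are my own, not the package's API):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import dirichlet

rng = np.random.default_rng(1)

# Simulate compositional responses from the assumed model:
# alpha_ij = exp(x_i' beta_j), one coefficient vector per class.
n, k = 300, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])        # intercept + covariate
true_beta = np.array([[0.5, 1.0], [1.0, -0.5], [0.2, 0.3]])  # k x p
alpha = np.exp(X @ true_beta.T)                              # n x k
Y = np.array([rng.dirichlet(a) for a in alpha])
Y = np.clip(Y, 1e-9, None)
Y /= Y.sum(axis=1, keepdims=True)                            # keep strictly inside the simplex

def neg_loglik(theta):
    """Negative Dirichlet log-likelihood with a log link on alpha."""
    beta = theta.reshape(k, X.shape[1])
    a = np.exp(X @ beta.T)
    return -sum(dirichlet.logpdf(y, av) for y, av in zip(Y, a))

fit = minimize(neg_loglik, np.zeros(k * X.shape[1]), method="BFGS")
beta_hat = fit.x.reshape(k, X.shape[1])
print(beta_hat.round(2))
```

With enough observations the estimates land close to `true_beta`. In practice the DirichletReg package linked above is preferable in R, since it also implements the alternative mean/precision parameterization and proper inference.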

Achim Zeileis
  • It may well be that it doesn't, because I'm not sure whether your dependent variable sums to one. But even if it does not work, it might help you to say more precisely what you need (and what you don't). – Achim Zeileis Feb 25 '17 at 19:22
  • In my case it doesn't sum to 1, but I think I will incorporate @kjetil's suggestion from the comments on my question above by distributing the remaining probability according to the marginal distribution of the classes so they sum to 1. Accepted, thanks! – Nigel Ng Feb 26 '17 at 11:41