
Is there a way to use logistic regression to classify multi-labeled data? By multi-labeled, I mean data that can belong to multiple categories simultaneously.

I would like to use this approach to classify some biological data.

user721975

3 Answers


In principle, yes - I'm not sure these techniques are still called logistic regression, though.

Your question can actually refer to two independent extensions of the usual classifiers:

  1. You can require the sum of all memberships for each case to be one ("closed world" = the usual case),
    or drop this constraint (sometimes called "one-class classifiers").
    The latter could be trained as multiple independent LR models (see the sketch after this list), although one-class problems are often ill-posed (this class vs. all kinds of exceptions, which could lie in all directions), and then LR is not particularly well suited.

  2. partial class memberships: each case has a membership $\in [0, 1]$ in each class, so its label is a membership vector $\in [0, 1]^{n_{classes}}$, similar to memberships in fuzzy cluster analysis:
    Assume there are 3 classes A, B, C. Then a sample may be labelled as belonging to class B. This can also be written as membership vector $[A = 0, B = 1, C = 0]$. In this notation, the partial memberships would be e.g. $[A = 0.05, B = 0.95, C = 0]$ etc.

    • different interpretations can apply, depending on the problem (fuzzy memberships or probabilities):

      • fuzzy: a case can belong half to class A and half to class C: [0.5, 0, 0.5]
      • probability: the reference (e.g. an expert classifying samples) is 80 % certain that it belongs to class A but says a 20 % chance exists that it is class C while being sure it is not class B (0 %): [0.8, 0, 0.2].
      • another probability: expert panel votes: 4 out of 5 experts say "A", 1 says "C": again [0.8, 0, 0.2]
    • for prediction, such soft outputs (e.g. the posterior probabilities) are not only possible but actually fairly common

    • it is also possible to use this for training
    • and even validation

    • The whole idea of this is that for borderline cases it may not be possible to assign them unambiguously to one class.

    • Whether and how you want to "harden" a soft prediction (e.g. posterior probability) into a "normal" class label that corresponds to 100% membership to that class is entirely up to you. You may even return the result "ambiguous" for intermediate posterior probabilities. Which is sensible depends on your application.
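
As a minimal sketch of option 1 (all data below are invented for illustration), one can fit one independent binomial LR model per class with glm, so that the predicted memberships need not sum to one:

    ## One independent binary LR model per class ("open world"):
    ## a case may belong to several classes at once. Data are made up.
    set.seed(42)
    dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
    Y   <- data.frame(A = rbinom(100, 1, 0.4),   # each case may carry
                      B = rbinom(100, 1, 0.3),   # several labels at once
                      C = rbinom(100, 1, 0.2))

    models <- lapply(names(Y), function(cls)
      glm(Y[[cls]] ~ x1 + x2, data = dat, family = binomial))
    names(models) <- names(Y)

    ## Per-class membership predictions; rows need not sum to 1.
    probs <- sapply(models, predict, newdata = dat, type = "response")
    head(probs)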

In R, e.g., nnet::multinom (from the nnet package, which accompanies MASS) accepts such data for training. Behind the scenes, an ANN with logistic sigmoid and no hidden layer is used.
I developed the package softclassval for the validation part.
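
As a sketch of how soft labels can be handed to nnet::multinom (the response may be given as a matrix with one column per class; the data below are invented), together with one possible way of hardening the soft predictions afterwards:

    library(nnet)

    set.seed(1)
    dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
    ## Invented soft reference labels: each row is a membership vector
    ## in [0, 1]^3 summing to 1 (probability-type memberships).
    Y <- t(apply(matrix(runif(300), ncol = 3), 1, function(m) m / sum(m)))
    colnames(Y) <- c("A", "B", "C")

    fit   <- multinom(Y ~ x1 + x2, data = dat)
    probs <- predict(fit, newdata = dat, type = "probs")

    ## One possible hardening rule: accept the winning class only if
    ## its posterior exceeds a threshold, otherwise say "ambiguous".
    harden <- function(p, threshold = 0.7) {
      winner <- which.max(p)
      if (p[winner] >= threshold) names(p)[winner] else "ambiguous"
    }
    apply(probs, 1, harden)

The 0.7 threshold is of course arbitrary; as said above, which hardening rule (if any) is sensible depends on the application.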

One-class classifiers are nicely explained in Richard G. Brereton: Chemometrics for Pattern Recognition, Wiley, 2009.

We give a more detailed discussion of the partial memberships in this paper: Claudia Beleites, Kathrin Geiger, Matthias Kirsch, Stephan B Sobottka, Gabriele Schackert & Reiner Salzer: Raman spectroscopic grading of astrocytoma tissues: using soft reference information. Anal Bioanal Chem, 2011, Vol. 400(9), pp. 2801-2816

cbeleites unhappy with SX
  • Can you elaborate? – user721975 Jul 10 '12 at 22:10
  • @user721975: Was still doing this... – cbeleites unhappy with SX Jul 10 '12 at 22:13
  • Thanks for your answer. If I understand you right, option 1 means you build a series of binary (1-vs-all) LR classifiers. I don't think I get option 2. Are you asking me to build a single LR that gives a probability distribution over all classes? The question then is how do I decide which classes to assign the data to? Some sort of thresholding? Which/how? – user721975 Jul 11 '12 at 19:17
  • @user721975: part 1: yes. part 2: I'll edit the answer to make it clearer. – cbeleites unhappy with SX Jul 11 '12 at 19:21
  • @user721975: (2) "single" LR is a bit ambiguous: at least if there are more than 2 classes you'd have a multinomial model. Maybe you need to tell us more about your application in order to get more detailed answers. – cbeleites unhappy with SX Jul 11 '12 at 19:32
  • Well, the spirit of my question is whether and how LR can be used to predict multiple categories (in a multinomial setting, of course), and I am not so much looking for an application-specific approach. And yes, "hardening" the predictions does make sense, at least in my case (not sure how a probability distribution over, say, 100 classes would be interpreted - can one even draw any meaningful conclusions from something like that?). Any pointers on how this can be achieved? – user721975 Jul 11 '12 at 19:43
  • @user721975: just to be clear: the "soft" approach does not change the number of classes. If you have 3 classes a,b,c and a case is labelled class b, then you can write this also as membership [a = 0, b = 1, c = 0] (still very normal hard classification). But from this notation you can go to the soft membership: another sample may have membership [a = 0.2, b = 0.8, c = 0] - which are still 3 classes. Are we talking of the same thing? Whether hardening = rounding does make sense will depend on the application I think. – cbeleites unhappy with SX Jul 12 '12 at 00:49
  • I am only looking at obtaining (multiple) hard classification. – user721975 Jul 13 '12 at 16:55

One straightforward way to do multi-label classification with a multi-class classifier (such as multinomial logistic regression) is to map each possible combination of labels to its own class. For example, if you were doing binary multi-label classification and had 3 labels, you could assign

[0 0 0] = 0
[0 0 1] = 1
[0 1 0] = 2

and so on, resulting in $2^3 = 8$ classes.
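
A rough sketch of this power-set encoding in R (predictors and label matrix are made up; multinom from the nnet package is just one multi-class classifier you could plug in):

    library(nnet)

    set.seed(7)
    dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
    L   <- matrix(rbinom(600, 1, 0.3), ncol = 3)  # binary multi-label matrix

    ## Each distinct label combination becomes one class, e.g. "010" = 2.
    combo <- factor(apply(L, 1, paste, collapse = ""))

    fit  <- multinom(combo ~ x1 + x2, data = dat)
    pred <- predict(fit, newdata = dat)           # predicted combinations

    ## Decode a predicted combination back into its label vector.
    decode <- function(code) as.integer(strsplit(as.character(code), "")[[1]])
    decode(pred[1])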

The most obvious problem with this approach is that you can end up with a huge number of classes even with a relatively small number of labels (if you have $n$ labels you'll need $2^n$ classes). You also won't be able to predict label combinations that aren't present in your dataset, and you'll be making rather poor use of your data; but if you have a lot of data and good coverage of the possible label combinations, these things may not matter.

Moving beyond this and what was suggested by others, you'll probably want to look at structured prediction algorithms such as conditional random fields.

alto

This problem is also related to cost-sensitive learning, where predicting a label for a sample incurs a cost. For a multi-label sample, the cost of predicting its true labels is low, while the cost of predicting the other labels is higher.
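
As a hypothetical illustration of this view (all cost values below are invented): given posterior class probabilities and a cost matrix, you predict the label with the lowest expected cost.

    ## Invented example: posterior probabilities for one sample, plus a
    ## cost matrix costs[truth, prediction] - cheap along the diagonal
    ## (the true label), expensive elsewhere.
    p <- c(A = 0.5, B = 0.3, C = 0.2)
    costs <- matrix(c(0, 2, 4,   # truth A: predicting B or C is costly
                      1, 0, 1,   # truth B: mistakes are milder
                      4, 2, 0),  # truth C
                    nrow = 3, byrow = TRUE,
                    dimnames = list(truth = c("A", "B", "C"),
                                    pred  = c("A", "B", "C")))

    expected_cost <- drop(p %*% costs)  # expected cost of each prediction
    names(which.min(expected_cost))     # minimum-expected-cost label: "A"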

You can take a look at this tutorial; the corresponding slides can be found here.

Ash