
I know that a similar subject was treated here, but my question is a little bit different.

I have a result of multilabel classification, like this (2 observations, 3 labels in the example, in practice I have 10k observations and 300 labels):

> pred_df
      truth.label1  truth.label2  truth.label3  pred.label1  pred.label2  pred.label3
    1 TRUE          FALSE         FALSE         TRUE         TRUE         FALSE
    2 FALSE         FALSE         TRUE          FALSE        FALSE        TRUE

I know that a confusion matrix measures the accuracy of class/label predictions, but I was wondering whether it is still meaningful when applied to the observations instead. My idea is to transpose my results and compute a confusion matrix for each observation:

  > t(pred_df)
                      1        2
    truth.label1    TRUE     FALSE
    truth.label2    FALSE    FALSE
    truth.label3    FALSE    TRUE
    pred.label1     TRUE     FALSE
    pred.label2     TRUE     FALSE
    pred.label3     FALSE    TRUE

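library(caret)  # assuming confusionMatrix() is caret's confusionMatrix(data, reference)
lv <- c(FALSE, TRUE)  # fix the levels so both factors match even if a value is absent
#confusion matrix for observation 1 (predictions first, then truth):
cm1 <- confusionMatrix(factor(t(pred_df)[4:6,1], levels = lv), factor(t(pred_df)[1:3,1], levels = lv))
#confusion matrix for observation 2 :
cm2 <- confusionMatrix(factor(t(pred_df)[4:6,2], levels = lv), factor(t(pred_df)[1:3,2], levels = lv))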

It seems to me that this would measure the accuracy of my model for each observation, and I could then summarize all the confusion matrices to get a good metric for the whole multilabel classification. But I am not sure this is still meaningful, practically and theoretically speaking. Is it?

Tau
  • You might want to follow the conversation [here](https://github.com/scikit-learn/scikit-learn/issues/3452) in the sklearn issues. There is also a page in the [documentation](https://scikit-learn.org/dev/modules/generated/sklearn.metrics.multilabel_confusion_matrix.html). I'm still not sure how to plot it, though, e.g. as an N by N heat map or something. The function is `sklearn.metrics.multilabel_confusion_matrix`. As of this writing, 0.21 is not in a stable release, so you will need to install the development version. [here](https://scikit-learn.org/stable/developers/advanced_installation.html#install-bl – Omid S. Jan 14 '19 at 22:59
  • Sklearn has published its latest version, v0.21, which contains a multilabel confusion matrix. You can refer to that. – user233953 Jan 14 '19 at 03:15
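
To make the per-observation idea from the question concrete, here is a minimal pure-Python sketch on the two-observation toy data. The helper name `sample_confusion` and the `[[tn, fp], [fn, tp]]` layout are assumptions chosen for illustration; the layout mirrors the one used by `sklearn.metrics.multilabel_confusion_matrix`, which, as far as I know, accepts `samplewise=True` to compute one matrix per observation. This is not a definitive implementation, just a way to see what is being counted.

```python
# Toy data from the question: rows are observations, columns are labels.
y_true = [[True, False, False],
          [False, False, True]]
y_pred = [[True, True, False],
          [False, False, True]]


def sample_confusion(truth_row, pred_row):
    """Hypothetical helper: [[tn, fp], [fn, tp]] counted over one observation's labels."""
    tn = fp = fn = tp = 0
    for t, p in zip(truth_row, pred_row):
        if t and p:
            tp += 1
        elif t:
            fn += 1
        elif p:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


# One 2x2 matrix per observation.
per_sample = [sample_confusion(t, p) for t, p in zip(y_true, y_pred)]

# Summing the per-observation matrices cell by cell gives pooled counts.
pooled = [[sum(m[i][j] for m in per_sample) for j in range(2)] for i in range(2)]

# Accuracy over all (observation, label) pairs.
accuracy = (pooled[0][0] + pooled[1][1]) / sum(sum(row) for row in pooled)
```

Note that the pooled counts are the same whether the 2x2 matrices are summed over observations or over labels, because both cover every (observation, label) pair exactly once. The per-observation matrices only differ from the per-label ones when they are averaged or inspected individually rather than summed.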

0 Answers