If you have performance measures per individual class (and per individual model, in your case), you could report those alongside of, or instead of, the prediction performance over all classes. For example, you could report the ROC curve, the AUC, the EER, etc. for each class individually; this does not necessarily require underlying individual models. The first two metrics have the advantage of capturing the model's power without employing a concrete threshold yet, and for the EER the threshold is still determined automatically. Reporting the FRR and FAR per individual class (using a specifically chosen threshold) would be an option too.
In case you have too many classes in your data to report performance at the class level, you could instead report meta-statistics about those performances, like the mean, standard deviation, and quantiles of AUC, EER, etc. An example would be boxplots showing AUC, EER, etc. over all classes; even a boxplot over FAR and FRR might make sense in some cases. While not giving details about which classes are predicted well or poorly, this still captures the variation of performance across classes.
Most ML tools already provide the functionality to compute such statistics alongside the evaluation, so you don't necessarily need to do this on your own. Here's a small example in R with the caret package, using an SVM on a 3-class problem and computing statistics on a per-class basis:
library(caret)
library(plyr)
library(pROC)
# example problem: 3-class classification on the iris data set
model <- train(x = iris[, 1:4],
               y = iris[, 5],
               method = 'svmLinear',
               metric = 'Kappa',
               trControl = trainControl(method = 'repeatedcv',
                                        number = 10,
                                        repeats = 20,
                                        returnData = FALSE,
                                        returnResamp = 'final',
                                        savePredictions = TRUE, # keep CV predictions for per-class ROC later
                                        classProbs = TRUE),     # keep class probabilities for per-class ROC later
               tuneGrid = expand.grid(C = 3^(-3:3)))
# tuning results over the cost parameter C
plot(model, scales = list(x = list(log = 3)))
The overall confusion matrix already gives some insight into class confusion (you might want to use the relative representation instead):
# confusion across partitions and repeats
conf <- confusionMatrix(data = model$pred$pred, reference = model$pred$obs)
print(conf)
# absolute confusion over partitions and repeats
levelplot(conf$table, col.regions = gray(100:0/100))
# relative confusion over partitions and repeats (columns normalized by reference class totals)
levelplot(sweep(conf$table, MARGIN = 2, STATS = colSums(conf$table), FUN = '/'), col.regions = gray(100:0/100))
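Since caret's confusionMatrix also reports per-class sensitivity and specificity (one-vs-all), the FRR and FAR implied by the argmax decision can be derived from it directly; a minimal sketch (assuming the byClass columns provided by confusionMatrix):
# per-class FRR/FAR implied by the argmax decision (one-vs-all view of the confusion matrix)
perClass <- as.data.frame(conf$byClass[, c('Sensitivity', 'Specificity')])
perClass$FRR <- 1 - perClass$Sensitivity # false rejection rate = 1 - sensitivity
perClass$FAR <- 1 - perClass$Specificity # false acceptance rate = 1 - specificity
print(perClass)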

ROC curves can be calculated for the individual classes (in this example only possible because the predictions and class probabilities have been preserved during cross validation):
# compute a one-vs-rest ROC curve for each individual class
classes <- levels(model$pred$obs)
rocs <- llply(classes, function(cls) {
  # current class' probability as predictor, current class vs. rest as binary response
  roc(predictor = model$pred[[cls]], response = model$pred$obs == cls)
})
names(rocs) <- classes
# report ROCs for individual classes in one figure
plot(rocs[[1]])
lines(rocs[[2]], col=2, lty=2)
lines(rocs[[3]], col=3, lty=3)
# ...
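To make clear which ROC curve belongs to which class, you might add a legend with the per-class AUC (a small sketch using base R's legend(); colors and line types are chosen to match the plot()/lines() calls above):
# label the curves with class names and their AUC values
legend('bottomright', lty = 1:3, col = 1:3,
       legend = paste0(names(rocs), ' (AUC = ',
                       round(sapply(rocs, function(r) as.numeric(r$auc)), 3), ')'))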

# compute some statistics per class
statistics <- ldply(rocs, function(cROC) {
  cAUC <- as.numeric(cROC$auc)
  # EER: error rate at the threshold where sensitivity and specificity are (approximately) equal
  cEER <- 1 - cROC$sensitivities[which.min(abs(cROC$sensitivities - cROC$specificities))]
  # you could add further metrics here, like FAR and FRR for specific thresholds (see the sketch below)
  # ...
  data.frame(auc = cAUC, eer = cEER)
})
print(statistics)
Numeric statistics can be reported per class as well; the ones computed above look like this:
         .id       auc       eer
1     setosa 1.0000000 0.0000000
2 versicolor 0.9839534 0.0708571
3  virginica 0.9852129 0.0661429
...
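As hinted at by the comment in the snippet above, FAR and FRR for a specific decision threshold can be extracted from the ROC objects as well, for example with pROC's coords(); a small sketch, assuming an arbitrarily chosen probability threshold of 0.5 for every class:
# FAR/FRR per class at a fixed probability threshold (0.5 is an arbitrary choice here)
farFrr <- ldply(rocs, function(cROC) {
  cc <- coords(cROC, x = 0.5, input = 'threshold', ret = c('sensitivity', 'specificity'))
  data.frame(frr = 1 - cc[['sensitivity']], far = 1 - cc[['specificity']])
})
print(farFrr)
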
Finally, meta-statistics of those metrics over all classes could be calculated and displayed (in case there are too many classes to be reported individually):
# compute some meta-statistics in case there are too many classes
print(summary(statistics))
boxplot(statistics[-1])

I hope this in fact addresses your questions. BTW: if you struggle to get good results with certain individual models because the associated training set contains few positive but many negative samples, you could consider adding sample/error weights during model training, or using e.g. up-/downsampling.
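As a minimal sketch of the resampling option: newer caret versions accept a sampling argument in trainControl, which is applied inside each resampling iteration (please check availability in your caret version); the class-weight route is indicated as a comment with placeholder class names:
# up-sample the minority class within each resampling iteration ('down' and 'smote' are alternatives)
ctrl <- trainControl(method = 'repeatedcv', number = 10, repeats = 20,
                     classProbs = TRUE, savePredictions = TRUE,
                     sampling = 'up')
# ... then pass ctrl as trControl to train() as above
# alternatively, kernlab's ksvm accepts class weights passed through train's '...',
# e.g. class.weights = c(rare = 10, frequent = 1), with your own (placeholder) class names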