
So, I have a binary classification problem. The classes are fairly balanced and I have separate training and test sets. No matter what I try, both classification accuracy and the F1 score are always hovering around 45–55%, which suggests the model is just guessing at random and not learning anything. I suspected this could be because the two classes are so similar that they are effectively inseparable.

Therefore, I ran a Kolmogorov–Smirnov test and got a p-value of 0.87 between the two classes on the training set and a staggering 0.98 on the test set. I have 20 independent variables, and I know that the K-S test works only for 1-D data, so I simply flattened the n-d array into a 1-D array before running the test.
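
For reference, this is roughly what I did (a minimal sketch with made-up placeholder data standing in for my real training set; `scipy.stats.ks_2samp` is the two-sample K-S test I used):

```python
import numpy as np
from scipy.stats import ks_2samp

# Placeholder data standing in for the real training set:
# 20 independent variables, binary labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 20))
y_train = rng.integers(0, 2, size=1000)

# Flatten all 20 features of each class into one 1-D sample per class,
# then run a two-sample K-S test between the two flattened samples.
class0 = X_train[y_train == 0].ravel()
class1 = X_train[y_train == 1].ravel()

stat, p_value = ks_2samp(class0, class1)
print(f"K-S statistic = {stat:.3f}, p-value = {p_value:.3f}")
```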

My question is this: can I now conclude that these two classes are statistically so similar that they are inseparable by machine learning algorithms? Is this claim too naive? Is the Kolmogorov–Smirnov test sufficient evidence for it?

Can two classes be truly inseparable even in a higher dimension, i.e., via a kernel transformation or neural networks?

I also tried different regularization techniques, just to make sure this is not due to overfitting.

Ambarish
  • What are your K-S tests actually testing? – jbowman Nov 02 '18 at 18:55
  • The similarity between the two classes – Ambarish Nov 02 '18 at 18:56
  • Like @jbowman, I don't understand how you use K-S to assess the similarity between the two classes. Do you have only a single predictor, and do you run K-S on the distributions of this single predictor in the two classes? Also: [Why is accuracy not the best measure for assessing classification models?](https://stats.stackexchange.com/q/312780/1352) and [Is accuracy an improper scoring rule in a binary classification setting?](https://stats.stackexchange.com/q/359909/1352) – Stephan Kolassa Nov 02 '18 at 21:53
  • I know K-S only works with 1-D data, so I just flattened the n-d array. I might be wrong here, but it was just an idea that I thought might work. The problem is not just with accuracy; even the F1 score is not promising. – Ambarish Nov 02 '18 at 22:03
  • Your question is a bit unclear. My blog post [how to ask a statistics question](http://www.statisticalanalysisconsulting.com/how-to-ask-a-statistics-question/) may help. But one thing you might want to check is tests of equivalence. – Peter Flom Nov 03 '18 at 13:17
  • Flattening the array does not preserve its information; it loses all the information about correlations between the columns of the array. The KS test you performed is not appropriate if the elements of each of the arrays are not independent of each other. Furthermore, different columns of the array will likely have different distributional characteristics; KS assumes elements of the array are all identically distributed, which is almost certainly not the case here. – jbowman Nov 04 '18 at 17:58
  • Don't we use flattening in CNNs? If this is not the right way, can you suggest a way to compare two multivariate datasets? I need to check whether they are similar or not. – Ambarish Nov 04 '18 at 18:08
  • A CNN is not performing a formal statistical test of a hypothesis, which is a very different thing from just fitting a model. – jbowman Nov 07 '18 at 02:19

1 Answer


Your final question is "Can two classes be truly inseparable even in a higher dimension?"

It is certainly possible that two classes can't be separated by any algorithm. Two classes could, at least in theory, be absolutely identical, except for class. More likely, they could be almost identical.
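
As a quick illustration (a minimal sketch, assuming scikit-learn is available; all names and numbers here are made up for the demonstration), if both classes are drawn from exactly the same distribution, any classifier will hover around chance accuracy:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Both "classes" are drawn from the same 20-dimensional distribution,
# so the labels carry no information and there is nothing to learn.
X = rng.normal(size=(2000, 20))
y = rng.integers(0, 2, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))  # roughly 0.5, i.e. chance level
```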

Indeed, you want algorithms not to show a difference where the differences are so small as to be trivial; you don't want to distinguish samples whose distinguishing characteristics are due to random chance. That is what p-values are about. But even if you have N = 10 billion, so that any difference is statistically significant, you usually don't want to make distinctions where there is no appreciable difference.

For your earlier questions, I still don't see exactly what you are doing with the K-S tests. There are tests of equivalence that flip the usual roles of the null and alternative hypotheses. One such test is TOST (two one-sided t-tests). You will need to specify a difference that you consider so small as to be essentially equivalent.
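
For a single predictor, a rough sketch of TOST might look like this (assuming statsmodels' `ttost_ind` and made-up placeholder data; the equivalence margins `low` and `upp` are bounds you must choose yourself as the largest difference you would still call negligible):

```python
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

# Placeholder data: values of one predictor in class 0 and class 1.
rng = np.random.default_rng(0)
x0 = rng.normal(0.00, 1.0, size=500)
x1 = rng.normal(0.05, 1.0, size=500)

# Treat the classes as "equivalent" on this predictor if their mean
# difference lies within (-0.2, 0.2); that margin is a subject-matter choice.
p_value, lower_test, upper_test = ttost_ind(x0, x1, low=-0.2, upp=0.2)
print("TOST p-value:", p_value)  # small p-value => evidence of equivalence
```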

I think that when you did your original analysis and found that "no matter what I try, both classification accuracy and the F1 score are always hovering around 45–55%", you should have concluded that, indeed, none of the methods you were using worked. When nothing you try works to (in this case) separate the classes, the two most common reasons are a) your sample is too small, or b) there just really isn't anything going on.

Peter Flom
  • Thanks Peter for your clear explanation. As per my understanding, the K-S test is like any other hypothesis test, right? My idea was to use a nonparametric test to check the equality of two distributions; in this case, those are my two classes. Let's say I set the confidence level at 80 percent; then if the p-value is more than that, I can accept my null hypothesis that the two classes follow the same distribution. I can't set a distance metric here because I don't know it; I can only use the information from the p-value. – Ambarish Nov 04 '18 at 12:32
  • But ... distribution of what? Usually the K-S test compares the distributions of two continuous variables (or of the same continuous variable in two groups). You haven't got that, and I am not sure what you are getting when you flatten the data, or whether that is suitable for K-S. My intuition is that it is not, but I don't know for sure. – Peter Flom Nov 04 '18 at 12:36
  • Yes, exactly. Here the continuous variables are my independent variables. Flattening an array preserves its information, so I have the same continuous variable and two groups, namely my two classes. – Ambarish Nov 04 '18 at 12:42
  • OK, you may be right. It seems wrong, somehow, to me, but, as I said, I'm not sure. – Peter Flom Nov 04 '18 at 13:06