I have a dataset with few (20,000) data points and many (100) features, each ranging from 0 to 1. The dataset is split evenly between two classes. I'm doing a classification task on it and want to compare the result against a reasonable baseline.
One option would be a random guessing baseline, which uses the same classes but ignores the features. That would give an accuracy of 0.5, because the classes are evenly distributed.
As the ratio of features to data points increases, spurious correlations make it easier to classify the data points. I therefore use a second baseline: I keep the same data but randomly reassign the class labels to the data points. Then I build the same classifier on that data (B) and compare it to the classifier built on the actual data (A). The difference between (A) and (B) tells me how much I improve over 'seeing patterns in randomness'; the difference between (B) and the random guessing baseline tells me how easy it is to find spurious correlations in the dataset.
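To make the procedure concrete, here is a minimal sketch of how (A) and (B) could be computed. Everything in it is an assumption for illustration, not my actual setup: the toy data (200 points, 50 features, one informative feature), the nearest-centroid classifier, the helper names, and the choice to measure accuracy on the same data the classifier was fit on, which is where spurious correlations show up most directly.

```python
import random

def make_toy_data(n_points, n_features, signal, rng):
    """Hypothetical toy data: features in [0, 1], two evenly sized classes.
    Only feature 0 carries class information (shifted by `signal` for class 1)."""
    labels = [i % 2 for i in range(n_points)]
    rows = []
    for y in labels:
        row = [rng.random() for _ in range(n_features)]
        row[0] = (1 - signal) * row[0] + signal * y  # stays within [0, 1]
        rows.append(row)
    return rows, labels

def centroid(rows):
    """Per-feature mean of a list of rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def in_sample_accuracy(rows, labels):
    """Fit a nearest-centroid classifier and score it on the same data."""
    c0 = centroid([r for r, y in zip(rows, labels) if y == 0])
    c1 = centroid([r for r, y in zip(rows, labels) if y == 1])
    hits = sum(
        (sq_dist(r, c1) < sq_dist(r, c0)) == (y == 1)
        for r, y in zip(rows, labels)
    )
    return hits / len(rows)

def shuffled_label_baseline(rows, labels, n_repeats, rng):
    """Baseline (B): same features, labels randomly reassigned.
    Averaged over several shuffles to reduce noise."""
    scores = []
    for _ in range(n_repeats):
        shuffled = labels[:]   # copy, so the true labels stay intact
        rng.shuffle(shuffled)  # shuffling preserves the class balance
        scores.append(in_sample_accuracy(rows, shuffled))
    return sum(scores) / n_repeats

rng = random.Random(0)
rows, labels = make_toy_data(n_points=200, n_features=50, signal=0.5, rng=rng)

acc_a = in_sample_accuracy(rows, labels)                            # (A)
acc_b = shuffled_label_baseline(rows, labels, n_repeats=20, rng=rng)  # (B)

print(f"(A) true labels:     {acc_a:.3f}")
print(f"(B) shuffled labels: {acc_b:.3f}")
print("random guessing:     0.500")
```

In this sketch, the gap between (B) and 0.5 reflects how much the classifier can fit pure noise, and the gap between (A) and (B) reflects what the real labels add beyond that. If accuracy were measured on held-out data instead, (B) would be expected to fall back to roughly 0.5.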
Is there a name for a baseline like (B) described here?