
I have a given set of numbers and I want to divide it into n equally sized subsets, maximising the sum of the subsets' variances.

Can I do better than creating each subset by random sampling?
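
For concreteness, here is a minimal sketch (my own, in Python) of the random-sampling baseline the question describes. The helper names `objective`, `random_partition`, and `best_of_random` are mine, and it assumes the values split evenly into n subsets and that the objective is the sum of population variances; a sample variance would work the same way.

```python
# A minimal sketch of the random-sampling baseline described above.
# Assumptions (mine, not from the question): the values split evenly into
# n subsets and the objective is the sum of population variances.
import random
import statistics


def objective(subsets):
    """Sum of each subset's (population) variance."""
    return sum(statistics.pvariance(s) for s in subsets)


def random_partition(values, n):
    """Shuffle the values and split them into n equally sized subsets."""
    shuffled = list(values)
    random.shuffle(shuffled)
    size = len(shuffled) // n
    return [shuffled[i * size:(i + 1) * size] for i in range(n)]


def best_of_random(values, n, trials=1000):
    """Keep drawing random partitions and remember the best one seen."""
    best, best_score = None, float("-inf")
    for _ in range(trials):
        parts = random_partition(values, n)
        score = objective(parts)
        if score > best_score:
            best, best_score = parts, score
    return best, best_score
```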

Torkoal
  • Of course you can do better, because (almost by definition) random sampling will produce variable results, so if you carry it out long enough you can get ever better values of your objective function, up to some maximum value that (by its very construction) is unlikely to appear in most random samples. Since your question is equivalent to *minimizing* a weighted variance of the subset means, which will be least when those means are made to be as close to one another as possible, it looks like this problem will be NP-hard to solve. – whuber Aug 23 '17 at 12:55 (the variance decomposition behind this equivalence is restated after these comments)
  • Ah, that makes sense. Can you think of any heuristics to try? Currently I am trying to create subsets by taking indices at a fixed stride with different offsets. For example, `{1, 5, 9, 3}` would be broken into `{1, 9}` and `{5, 3}`. – Torkoal Aug 23 '17 at 16:00
  • It depends on the nature and distribution of your numbers. A greedy algorithm might work well in many cases: match some of the largest with some of the smallest values so that their mean is close to the mean of all values. Remove the values you have taken and repeat. That could be the starting point of a genetic or simulated annealing solution. You could also try to adapt approaches to clustering, but with a different objective function: this is a kind of "anti-clustering" problem. – whuber Aug 23 '17 at 16:10 (a sketch of this greedy pairing appears below)
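
For reference, the equivalence whuber mentions follows from the usual within/between sum-of-squares decomposition. This restatement is my own, assuming n equally sized subsets $S_1, \dots, S_n$ of size $k$, grand mean $\bar x$, and subset means $\bar x_j$:

$$\sum_{j=1}^{n}\sum_{i\in S_j}(x_i-\bar x)^2 \;=\; \underbrace{\sum_{j=1}^{n}\sum_{i\in S_j}(x_i-\bar x_j)^2}_{\text{within subsets}} \;+\; \underbrace{k\sum_{j=1}^{n}(\bar x_j-\bar x)^2}_{\text{between subsets}}$$

The left-hand side depends only on the data, so maximising the sum of subset variances (the within term divided by $k$) is the same as minimising the between term, i.e. making the subset means as close to one another as possible.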
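
Here is a rough Python sketch (again my own) of the greedy pairing whuber describes; `greedy_partition` is a hypothetical helper name, and it assumes `len(values)` is divisible by n.

```python
# A rough sketch (mine) of the greedy pairing heuristic from the comment above.
# Assumes len(values) is divisible by n; works best when each subset size is even.
def greedy_partition(values, n):
    ordered = sorted(values)
    k = len(ordered) // n            # size of each subset
    lo, hi = 0, len(ordered) - 1     # pointers into the sorted values
    subsets = []
    for _ in range(n):
        subset = []
        # Pair the smallest remaining value with the largest remaining value:
        # each pair's mean is close to the overall mean, so the subset keeps a
        # wide spread (high variance) without dragging its mean away from it.
        for _ in range(k // 2):
            subset.append(ordered[lo]); lo += 1
            subset.append(ordered[hi]); hi -= 1
        if len(subset) < k:          # odd subset size: add the smallest remaining value
            subset.append(ordered[lo]); lo += 1
        subsets.append(subset)
    return subsets
```

On the example `{1, 5, 9, 3}` with n = 2 this returns `{1, 9}` and `{3, 5}`; the result could then seed a simulated annealing or genetic search as suggested.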

0 Answers