While doing exploratory K-Means clustering on agents with varying numbers of events, I built two sets of models for K in {2, ..., 9}. In one set, each model is fit on raw counts of five kinds of events for a given agent, three of which are mutually exclusive; in the other set, the features are four percentages of an agent's total events (including two of the three mutually exclusive kinds). Both feature sets are MinMax-scaled to the range [-1, 1].
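For concreteness, here is a sketch of the feature preparation. Everything here is a hypothetical stand-in: the data is synthetic, the event-type mix and column choices are invented, and scikit-learn's MinMaxScaler is used locally in place of Spark's scaler:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Hypothetical raw counts for 1000 agents across five event types
# (stand-in for the real agent data).
counts = rng.poisson(lam=[5, 3, 2, 4, 1], size=(1000, 5)).astype(float)

# Guard against any agent with zero total events.
totals = np.maximum(counts.sum(axis=1, keepdims=True), 1.0)

# Percentage features: four of the five event types as a share of
# the agent's total events.
pcts = counts[:, [0, 1, 3, 4]] / totals

# Both feature sets MinMax-scaled to [-1, 1], mirroring the question's setup.
counts_scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(counts)
pcts_scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(pcts)
```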
I fit the models using PySpark's implementation of K-Means and was surprised to find that the inertia (WSSSE), calculated with the computeCost method, was two orders of magnitude higher for every cluster solution built on percentages than for the corresponding solution built on counts. I had expected little difference, since both feature sets use the same scaling, but it is almost as if the clusters built on percentages are somehow more diffuse than those built on counts.
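The fitting loop looks roughly like this, sketched locally with scikit-learn's KMeans (its inertia_ attribute plays the role of Spark's computeCost). The data is a synthetic stand-in, so the magnitude gap may or may not reproduce here:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in data, prepared as described in the question.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=[5, 3, 2, 4, 1], size=(1000, 5)).astype(float)
totals = np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
pcts = counts[:, [0, 1, 3, 4]] / totals

counts_scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(counts)
pcts_scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(pcts)

# Fit one model per K on each feature set and compare inertia (WSSSE).
for k in range(2, 10):
    km_counts = KMeans(n_clusters=k, n_init=10, random_state=0).fit(counts_scaled)
    km_pcts = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pcts_scaled)
    print(k, round(km_counts.inertia_, 2), round(km_pcts.inertia_, 2))
```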
What should I look for to understand why the inertia is so much higher when the model is fitted to percentages rather than to counts?