
A very simple model:

Given N labeled training examples from two classes (say red and green), we use the distance-from-means approach to learn a classifier that classifies new points as either red or green. The idea is to predict the class whose mean is closer to the new point.
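
For concreteness, here is a minimal sketch of the model I mean (my own toy code; the class labels, variable names, and toy data are just placeholders):

```python
import numpy as np

def fit_means(X, y):
    """Compute the mean of each class from training data X (N x d) and labels y."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(x, means):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

# Toy example: two 2-D classes, "red" around (0, 0) and "green" around (3, 3).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, (50, 2)), rng.normal([3, 3], 1, (50, 2))])
y = np.array(["red"] * 50 + ["green"] * 50)

means = fit_means(X, y)
print(predict(np.array([2.5, 2.5]), means))  # -> "green"
```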

Mathematically, it can be shown that this model learns a hyperplane decision boundary.
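
For concreteness, here is the short derivation I have in mind (my own notation: $\mu_r$ and $\mu_g$ are the two class means). A point $x$ is classified as red whenever

$$\|x - \mu_r\|^2 \le \|x - \mu_g\|^2
\;\Longleftrightarrow\;
\|x\|^2 - 2\mu_r^\top x + \|\mu_r\|^2 \le \|x\|^2 - 2\mu_g^\top x + \|\mu_g\|^2
\;\Longleftrightarrow\;
2(\mu_g - \mu_r)^\top x \le \|\mu_g\|^2 - \|\mu_r\|^2,$$

which is linear in $x$: the decision boundary is the hyperplane perpendicular to $\mu_g - \mu_r$, i.e., the perpendicular bisector of the segment joining the two means.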

The lecture notes for one of my university's courses remark: "This simple approach, if using Euclidean distances, can only learn linear decision boundaries. A reason: The basic approach implicitly assumes that classes are roughly spherical and equi-sized."

What is the reason and intuition behind the assumption that the classes are roughly spherical and equi-sized? Could someone please explain? Thanks a lot!

  • The claim, as you write it, is: "The distance from means approach can only learn linear boundaries *because* it assumes sphericity and equal sizes." This claim is false. Yes, it can only learn linear boundaries, and spherical equally-sized groups are better. But these are two independent claims. So: are you interested in the two separate claims? (If so, I would recommend you ask two separate questions. One question per post is a better fit for us here.) – Stephan Kolassa Apr 03 '20 at 12:02
  • No, I'm not interested in two separate claims - I'm only interested in knowing why spherically "equally sized" groups are better. Why is that? @StephanKolassa – connected-subgroup Apr 03 '20 at 12:07

1 Answer


Spherical groups are better because distance-to-means relies only on the group means to classify new data, and group means (i.e., centers) are good summaries of groups only when the groups are spherical. For instance, suppose your two groups are concentric annuli (as in this answer). Since they are concentric, their means are identical. Of course, in a real-world situation the sample means will differ slightly, but which group a new data point gets assigned to will be highly variable.
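
As a quick toy illustration (my own simulation, sampling points on two concentric rings rather than full annuli):

```python
import numpy as np

rng = np.random.default_rng(1)

def ring(radius, n):
    """Sample n points on a circle of the given radius, centred at the origin."""
    theta = rng.uniform(0, 2 * np.pi, n)
    return np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

inner, outer = ring(1.0, 500), ring(5.0, 500)
print(inner.mean(axis=0))  # both means land close to (0, 0) ...
print(outer.mean(axis=0))  # ... so "closest mean" is essentially a coin flip
```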

Alternatively, suppose your two groups are ellipsoids. Then points far out along the major axis will have a much larger impact on the group means (and therefore on the classifications of new data points) than points far out along the minor axis.
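
A rough back-of-the-envelope sketch of that effect (the group size and standard deviations are arbitrary numbers I picked): adding one point shifts the group mean by its distance from the current mean divided by the new group size, so a point a few standard deviations out along the major axis moves the mean much further than one equally far out, in standard deviations, along the minor axis.

```python
import numpy as np

mean, n = np.zeros(2), 200          # current group mean and size (illustrative numbers)
sd = np.array([5.0, 1.0])           # elongated group: major-axis SD 5, minor-axis SD 1

def shift_if_added(z):
    """How far the group mean moves if one extra point z joins the group."""
    return np.linalg.norm((z - mean) / (n + 1))

print(shift_if_added(3 * sd * np.array([1, 0])))  # 3 SDs out on the major axis: ~0.075
print(shift_if_added(3 * sd * np.array([0, 1])))  # 3 SDs out on the minor axis: ~0.015
```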

Equal sizes are good because otherwise the impact of individual training points is disparate across groups. If one of your groups is very small, then a change in a single data point of that group can move the group's mean a lot, and therefore move the classification boundary a lot. Higher variance means higher error.
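
A quick simulation sketch of that variance effect (the group sizes 5 and 500 and the standard-normal data are arbitrary choices): resampling the training data over and over shows how much more the small group's mean, and hence the boundary, jumps around.

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_spread(group_size, n_resamples=1000):
    """Std. dev. of the sample mean (first coordinate) over repeated training samples."""
    means = [rng.normal(0, 1, (group_size, 2)).mean(axis=0) for _ in range(n_resamples)]
    return np.std([m[0] for m in means])

print(mean_spread(5))    # large spread: the boundary moves a lot between samples
print(mean_spread(500))  # small spread: the boundary is much more stable
```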

I very much recommend the thread How to understand the drawbacks of K-means, from which I took the link above. Many of the problematic examples there straightforwardly suggest cases where distance-to-means also has problems.

Stephan Kolassa
  • *If one of your groups is very small, then a **change in a single data point** of that group can move the group's mean a lot.* What is meant by change in a data point? Why would I change the training data? Please clarify! – connected-subgroup Apr 03 '20 at 13:35
  • Also, what is meant by the disparate impact of data points, if group sizes differ largely? *Impact on what?* and *Why is it something we consider?* – connected-subgroup Apr 03 '20 at 13:37
  • You will typically not *change* the training data. But your training data will usually be a sample from an underlying population. And then the variance comes from whether you include one observation in your sample rather than another. You might be lucky and use a training sample that gives you a good model. Or you might be unlucky and happen to use a sample that gives you a bad model. This dependence on luck is the variance in your model, and therefore in your classifications of new data. Variance translates directly into higher error. – Stephan Kolassa Apr 03 '20 at 14:44
  • And that is the impact I was discussing. And the error is why we are interested in it. – Stephan Kolassa Apr 03 '20 at 14:44
  • Thanks a lot - makes sense now! – connected-subgroup Apr 03 '20 at 15:02