14

I would like to train an SVM to classify cases (TRUE/FALSE) based on 20 attributes. I know that some of those attributes are highly correlated. Therefore my question is: is SVM sensitive to the correlation, or redundancy, between the features? Any reference?

Danica
user7064
  • My guess would be no, since building a separation on one variable leaves the other, correlated variables with little additional separating power. There might be some instability in which variable gets chosen, however. – mandata May 04 '15 at 14:28
  • Are you talking about a linear SVM, or RBF kernel, or...? – Danica May 05 '15 at 05:26
  • Hmmmm, I don't know... does the answer depend on that? – user7064 May 05 '15 at 05:27
  • Yes, absolutely. You can design a kernel to explicitly deal with the correlations, if you'd like. – Danica May 05 '15 at 05:40
  • @Dougal: If there are methods to eliminate the effect of correlation, doesn't that imply that standard SVM is sensitive to correlation? – cfh May 05 '15 at 10:48

1 Answer

14

Linear kernel: The effect here is similar to that of multicollinearity in linear regression. Your learned model may not be particularly stable against small variations in the training set, because different weight vectors will have similar outputs. The training set predictions, though, will be fairly stable, and so will test predictions if they come from the same distribution.
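A small sketch of the effect, with made-up data and assuming scikit-learn is available: refit a linear SVM on bootstrap resamples of a dataset with two nearly identical features, and compare the weight vectors with the decision values on a fixed test set.

```python
# Illustration only: with two almost identical features, the individual weights
# are only weakly pinned down by the data, while the decision values on a fixed
# set of test points tend to move much less between resamples.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)          # almost a copy of x1
X = np.column_stack([x1, x2])
y = (x1 + rng.normal(scale=0.3, size=n) > 0).astype(int)

x_test = rng.normal(size=50)
X_test = np.column_stack([x_test, x_test])        # same correlation structure

for _ in range(3):
    idx = rng.choice(n, size=n)                   # bootstrap resample
    clf = SVC(kernel="linear", C=1.0).fit(X[idx], y[idx])
    print("w =", clf.coef_.ravel().round(3),
          " decision values:", clf.decision_function(X_test)[:3].round(3))
```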

RBF kernel: The RBF kernel only looks at distances between data points. Thus, imagine you actually have 11 attributes, but one of them is repeated 10 times (a pretty extreme case). Then that repeated attribute will contribute 10 times as much to the distance as any other attribute, and the learned model will probably be much more impacted by that feature.
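A quick numerical check of that claim (NumPy only, made-up vectors):

```python
# With 11 underlying attributes, repeating the first one 10 times (20 columns
# total) makes it contribute 10x as much to the squared Euclidean distance
# that the RBF kernel is based on.
import numpy as np

rng = np.random.default_rng(1)
x, z = rng.normal(size=11), rng.normal(size=11)

d2_plain = np.sum((x - z) ** 2)                       # each attribute counted once

x_dup = np.concatenate([np.repeat(x[0], 10), x[1:]])  # first attribute repeated 10x
z_dup = np.concatenate([np.repeat(z[0], 10), z[1:]])
d2_dup = np.sum((x_dup - z_dup) ** 2)

print(d2_dup - d2_plain)           # extra contribution of the repeated attribute
print(9 * (x[0] - z[0]) ** 2)      # same number: 9 additional copies of that term
```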

One simple way to discount correlations with an RBF kernel is to use the Mahalanobis distance: $d(x, y) = \sqrt{ (x - y)^T S^{-1} (x - y) }$, where $S$ is an estimate of the covariance matrix. Equivalently, map all your vectors $x$ to $C x$ and then use the regular RBF kernel, where $C$ is any matrix with $S^{-1} = C^T C$, e.g. $C = L^T$ from the Cholesky decomposition $S^{-1} = L L^T$.
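A minimal sketch of that whitening step, assuming NumPy and scikit-learn (the helper name and the toy data are made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

def mahalanobis_whitener(X_train):
    """Return a map x -> C x with C^T C = S^{-1}, S the sample covariance.

    Assumes S is invertible and reasonably well conditioned; otherwise
    regularize it (e.g. add a small ridge) before inverting.
    """
    S = np.cov(X_train, rowvar=False)
    S_inv = np.linalg.inv(S)
    L = np.linalg.cholesky(S_inv)        # S^{-1} = L L^T
    C = L.T                              # so ||C x - C z||^2 = (x - z)^T S^{-1} (x - z)
    return lambda X: X @ C.T

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=300)   # two strongly correlated columns
y = (X[:, 0] > 0).astype(int)

whiten = mahalanobis_whitener(X)                 # fit the map on training data only
clf = SVC(kernel="rbf", gamma="scale").fit(whiten(X), y)
print(clf.score(whiten(X), y))                   # apply the same map to any test data
```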

Danica
  • This is a very interesting answer; I'd like to read more about how to mitigate these kinds of problems. Can you add a reference or two? – Sycorax May 05 '15 at 17:45
  • I don't know a good one off-hand, but I'll look around a bit for one, perhaps tonight. – Danica May 05 '15 at 17:45
  • Awesome! Inbox me if you happen to find a cool article. I'm glad that my (+1) could put you over 3k. (-: – Sycorax May 05 '15 at 17:47
  • The inverse of the covariance matrix in the Mahalanobis distance is key. If you can estimate it reliably, this effect can be accounted for. – Vladislavs Dovgalecs May 05 '15 at 19:00