
Do you know of any techniques that allow one to avoid or get rid of multicollinearity in SVM input data? We all know that if multicollinearity exists, the explanatory variables have a high degree of correlation among themselves, which is problematic in all regression models (the data matrix is not invertible, and so on).

My question is actually a bit more subtle, as relevant data has to be selected. In other words, multicollinearity must be avoided while keeping relevant input data. So in a way, I guess my question is also about reducing the dimension of the input data matrix.

So, I've thought about PCA, which allows both reducing the dimension and obtaining uncorrelated variables (the PCs), but then I don't know how to deal with the eigenvectors' signs.
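
For concreteness, here is a minimal sketch of the kind of preprocessing I have in mind (plain numpy; the sign convention at the end is just one arbitrary choice, not something I'm sure is the right way to handle it):

```python
import numpy as np

def pca_decorrelate(X, n_components):
    """Centre X and project it onto the leading principal components.

    Returns the uncorrelated scores and the loading matrix needed to
    transform new data in exactly the same way.
    """
    Xc = X - X.mean(axis=0)                 # centre each column
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # sort eigenvalues, largest first
    eigvecs = eigvecs[:, order[:n_components]]

    # Each eigenvector is only defined up to its sign; one arbitrary
    # convention is to flip it so its largest-magnitude loading is positive.
    flip = np.sign(eigvecs[np.abs(eigvecs).argmax(axis=0),
                           np.arange(eigvecs.shape[1])])
    eigvecs = eigvecs * flip

    return Xc @ eigvecs, eigvecs
```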

asked by marino89
  • If multicollinearity is high, then when you plot the sorted eigenvalues of the SVD (of the standardized covariance matrix) you should see a number of small values near the end. You can zero these out and form a basis with reduced dimension. I don't know why the eigenvectors' signs would be a problem. – muratoa Sep 05 '12 at 09:16
  • Also, I don't think SVMs are too sensitive to multicollinearity, especially when used with kernels. If some dimensions are collinear, they will contribute identically to the pairwise kernel distances, but the SVM will just separate on the dimensions that do contribute. Multicollinearity is more of an issue when $(X^tX)^{-1}$ is necessary for the fit/inference. – muratoa Sep 05 '12 at 09:20
  • OK, so if I consider a given dataset, I carry out a PCA on its covariance matrix, then extract the most significant PCs and use them as input data for my SVM. Am I right? – marino89 Sep 05 '12 at 09:20
  • Yeah, you can. But try using the full data and then try reducing with different (sum total) eigenvalue thresholds and see if there is a significant difference, because if you fit an SVM with a transformed basis you have to apply the same transform to each new data point at prediction time (a sketch of this pipeline follows these comments). – muratoa Sep 05 '12 at 09:22
  • I'm using the SVM to perform C-classification. If, for example, I choose two indicators that are correlated with each other, maybe this correlation will have an impact on the classification. The correlation could overweight a given feature and then lead to misclassification. I don't know if I'm right in saying that... – marino89 Sep 05 '12 at 09:24
  • Multicollinearity reduces the effective dimensionality of your data. For an SVM this means it has less space in which to separate the classes. However, the "kernel trick" can take a relatively low-dimensional space and map it into a much higher one where separation is simpler, so a modest-rank input matrix may still be fine. In some models (e.g. linear regression) $(X^tX)^{-1}$ is necessary for both the fit and the inference, so you can get into trouble; the SVM, however, only depends on kernel distances. – muratoa Sep 05 '12 at 09:29
  • Just a note that multicollinearity can exist even if there is no pair of variables that are highly correlated. Multicollinearity is a trait of the data set as a whole. E.g., if you have 11 variables and one is the sum of the other 10, you will have perfect collinearity but no high pairwise correlations. – Peter Flom Sep 05 '12 at 10:53
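
A minimal sketch of the PCA-then-SVM pipeline described in the comments above (scikit-learn assumed; the data and the 95% variance threshold are purely illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative data only: 200 samples, 10 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# PCA keeps enough components to explain 95% of the variance (an arbitrary
# threshold); wrapping it in a pipeline guarantees that exactly the same
# centring/rotation is applied to new points at prediction time, as muratoa
# notes above.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.95),
                      SVC(kernel="rbf", C=1.0))
model.fit(X, y)
print(model.predict(X[:5]))
```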

1 Answer


Multicollinearity is not generally a problem for SVMs. Ridge regression is often used where multicollinearity is an issue, as the regularisation term resolves the invertibility problem by adding a ridge, $\lambda I$, to $X^tX$. The SVM uses the same regularisation term as ridge regression, but with the hinge loss in place of the squared error.
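
In symbols (a common textbook formulation; the notation is mine rather than anything taken from the question, with targets $y_i$ coded as $\pm 1$ for the SVM): ridge regression minimises

$$\sum_{i=1}^{n}\big(y_i - w^t x_i - b\big)^2 + \lambda \lVert w \rVert^2,$$

whereas the linear SVM minimises

$$\sum_{i=1}^{n}\max\big(0,\, 1 - y_i(w^t x_i + b)\big) + \lambda \lVert w \rVert^2,$$

so the only difference is the loss term; the $\lambda \lVert w \rVert^2$ ridge penalty that copes with collinearity is common to both.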

Ridge regression has a link with PCA, as explained by Tamino: essentially it penalises principal components with large eigenvalues less heavily than components with small eigenvalues, so it is a bit like a soft selection of the most important PCs rather than a hard (binary) selection.
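
To make that concrete (this is the standard SVD view of ridge regression rather than anything stated in the post linked above): writing the centred input matrix as $X = U D V^t$, the ridge fit is

$$X\hat{\beta}_{\text{ridge}} = \sum_j u_j \,\frac{d_j^2}{d_j^2 + \lambda}\, u_j^t y,$$

so the contribution of the $j$-th principal direction is shrunk by the factor $d_j^2/(d_j^2+\lambda)$: close to 1 for high-variance components (large $d_j$) and close to 0 for low-variance ones, which is the "soft" version of keeping only the top PCs.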

The important thing is to make sure that the regularisation parameter, C, and any kernel parameters are tuned correctly and carefully, preferably by minimising an appropriate cross-validation-based model selection criterion. This really is the key to getting good results with an SVM (and kernel methods in general).
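
For instance, a minimal sketch using scikit-learn (an assumption of mine rather than anything specified in the question; the data and parameter grids are purely illustrative):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative data only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# Tune C and the RBF kernel width gamma jointly by cross-validation.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [1e-3, 1e-2, 1e-1, 1],
}
search = GridSearchCV(pipe, grid, cv=5)  # 5-fold cross-validation
search.fit(X, y)
print(search.best_params_, search.best_score_)
```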

answered by Dikran Marsupial