Why does increasing the number of features reduce performance?

Question

I'm trying to gain an intuition as to why increasing the number of features could reduce performance. I'm currently using an LDA classifier which performs better bivariately on certain features but worse when more features are included. Classification accuracy is estimated using stratified 10-fold cross-validation.

Is there a simple case in which a classifier works better univariately than bivariately, one that gives a somewhat physical or spatial intuition of what is happening in these higher dimensions?

As a quick comment, adding irrelevant predictors can worsen performance on new data by increasing the variance of the prediction (overfitting). This is because you end up fitting to noise and diluting the "true signal". – B_Miner, Nov 01 '12 at 13:48
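To make the comment above concrete, here is a minimal sketch, not the asker's actual setup. It assumes one informative feature (class means at +1 and -1, unit variance) plus a variable number of pure-noise features, and uses a nearest-centroid classifier as a stand-in for LDA with identity covariance and equal priors. The function name `nearest_centroid_error` and all parameter values are hypothetical.

```python
import random


def nearest_centroid_error(n_noise, n_train=20, n_test=2000, seed=0):
    """Test error of a nearest-centroid classifier with n_noise irrelevant features."""
    rng = random.Random(seed)

    def draw(label):
        # Feature 0 carries the signal; the remaining features are pure noise.
        informative = label + rng.gauss(0.0, 1.0)
        return [informative] + [rng.gauss(0.0, 1.0) for _ in range(n_noise)]

    dim = 1 + n_noise
    per_class = n_train // 2
    centroids = {+1: [0.0] * dim, -1: [0.0] * dim}
    # Estimate each class centroid from a small training sample.
    for label in (+1, -1):
        for _ in range(per_class):
            x = draw(label)
            for j in range(dim):
                centroids[label][j] += x[j] / per_class

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    errors = 0
    for k in range(n_test):
        label = 1 if k % 2 == 0 else -1
        x = draw(label)
        predicted = min((+1, -1), key=lambda c: sq_dist(x, centroids[c]))
        errors += predicted != label
    return errors / n_test


if __name__ == "__main__":
    for n_noise in (0, 1, 10, 50, 200):
        print(n_noise, nearest_centroid_error(n_noise))
```

With a training sample this small, the estimated centroids differ along the noise dimensions purely by chance, so the decision boundary tilts away from the informative axis and the test error should drift upward as noise features are added.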

score 10 · Accepted Answer · answered Nov 01 '12 at 16:27

See "A problem of dimensionality: A simple example", a very short and very old article by G. V. Trunk. He considers a two-class problem with Gaussian class-conditional distributions in which every feature is relevant, but each additional feature is less relevant than the last. He shows that, as the number of features increases, the error rate of a classifier trained on a finite sample converges to 0.5, whereas the Bayes error approaches 0.
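The effect can be reproduced with a small simulation. This is a sketch in the spirit of Trunk's example rather than a transcription of the paper: it assumes class means at +μ and -μ with μ_i = 1/√i and identity covariance, and compares a linear classifier that knows μ with one that estimates it from a handful of training points. The function name `trunk_error_rates` and the sample sizes are hypothetical.

```python
import math
import random


def trunk_error_rates(n_features, n_train=10, n_test=1000, seed=0):
    """Return (error with known mean, error with estimated mean)."""
    rng = random.Random(seed)
    # Assumed setup: every feature is relevant, with decreasing relevance.
    mu = [1.0 / math.sqrt(i) for i in range(1, n_features + 1)]

    def draw(label):
        # label is +1 or -1; the class mean is label * mu, covariance is identity.
        return [label * m + rng.gauss(0.0, 1.0) for m in mu]

    # Estimate mu from a small labelled sample (flip class -1 points so
    # that both classes contribute to the same estimate).
    mu_hat = [0.0] * n_features
    for k in range(n_train):
        label = 1 if k % 2 == 0 else -1
        x = draw(label)
        for j in range(n_features):
            mu_hat[j] += label * x[j] / n_train

    errors_known = 0
    errors_est = 0
    for k in range(n_test):
        label = 1 if k % 2 == 0 else -1
        x = draw(label)
        score_known = sum(m * v for m, v in zip(mu, x))
        score_est = sum(m * v for m, v in zip(mu_hat, x))
        # Predict class +1 when the score is positive.
        errors_known += (score_known > 0) != (label > 0)
        errors_est += (score_est > 0) != (label > 0)
    return errors_known / n_test, errors_est / n_test


if __name__ == "__main__":
    for p in (1, 10, 100, 1000):
        known, estimated = trunk_error_rates(p)
        print(p, known, estimated)
```

Under these assumptions the known-mean error should keep falling as features are added, while the estimated-mean error should fall at first and then rise again, because the noise accumulated in the estimate grows faster than the signal gained from each weaker feature.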

Innuo

(+1) That's a cute little reference. – cardinal Feb 25 '13 at 15:08

score 3 · Answer 2 · answered Nov 01 '12 at 23:25

This is known as the "curse of dimensionality". I don't know whether there is a reason specific to LDA, but in general a higher-dimensional feature vector tends to call for more complex decision boundaries. More complex boundaries in turn raise the question "how complex?", since overfitting also has to be considered. In addition, the complexity of the learning algorithm grows with the number of dimensions, so running a relatively slow learning algorithm on a huge feature vector makes the job even worse. Finally, as the dimension grows you are more likely to have correlated features, which is bad for many learning algorithms, such as neural networks.

Other reasons also fall under the "curse of dimensionality", but the practical point is to have enough instances for a concise feature vector, one that has been pared down by some feature selection routine.
