1. How do I measure whether overfitting has happened or not?
You get a hint that a model is probably overfitted when its performance on test data is unreasonably low compared to its performance on training data, or even compared to a no-information model. Keep in mind, though, that algorithms are always expected to perform somewhat better on training data.
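As a rough sketch of that check (scikit-learn assumed; the dataset and the unpruned decision tree are purely illustrative, not a recommendation):

```python
# Hypothetical sketch: quantify the train/test gap and compare the test
# performance against a no-information baseline.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.dummy import DummyClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# An unpruned tree will typically fit the training data perfectly.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# "No-information" model: always predicts the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)

train_acc = tree.score(X_tr, y_tr)
test_acc = tree.score(X_te, y_te)
no_info_acc = baseline.score(X_te, y_te)

# A large train-test gap, with test accuracy close to the no-information
# rate, hints at overfitting; some gap is always expected.
print(f"train={train_acc:.2f} test={test_acc:.2f} baseline={no_info_acc:.2f}")
```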
2. After using some additional techniques to overcome/avoid this problem (such as cross-validation, regularization, early stopping, ...), how do I know how much these extra methods helped me avoid the overfitting problem?
The nearer your test performance gets to train performance, the less overfitting there is. Caution is warranted, because you may leave overfitting only to enter underfitting, i.e. train and test performance become reasonably similar yet both are bad.
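To make that concrete, here is a hedged sketch using a ridge penalty as the regularizer (the data and the penalty strengths are made up for illustration): an almost-zero penalty gives a large train-test gap, a huge penalty closes the gap but leaves both scores bad, and the middle ground is what you are after.

```python
# Hypothetical sketch: the train/test gap as regularization strength varies.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 60, 50  # few observations relative to features -> prone to overfit
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for alpha in (1e-6, 1.0, 1e4):  # almost no penalty, moderate, far too much
    m = Ridge(alpha=alpha).fit(X_tr, y_tr)
    results[alpha] = (m.score(X_tr, y_tr), m.score(X_te, y_te))
    print(f"alpha={alpha}: train R2={results[alpha][0]:.2f} "
          f"test R2={results[alpha][1]:.2f}")
```

With the tiny penalty the model essentially interpolates the training data (train R² near 1) while test R² collapses; with the huge penalty both scores are near zero — similar, but uselessly so.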
3. When dealing with nearly the same number of features and observations, what is the best extra method one can use to prevent overfitting?
That is quite difficult to answer without relying on opinion. Have you tried reducing the number of features a bit, e.g. eliminating linear combinations or features with near-zero variance? (This is part of model optimization and so should be done inside the cross-validation.) Also, embedded regularization methods like the lasso are worth a check (I see you mentioned it). Search-type feature selection methods might actually make overfitting worse, i.e. the feature selection itself might be overfitted to the training data.
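For example, a sketch of keeping the filtering inside the cross-validation with a scikit-learn `Pipeline` (the variance threshold and penalty strength are arbitrary illustrative choices, not recommendations):

```python
# Hypothetical sketch: feature filtering + L1 (lasso-style) regularization
# fitted inside each CV fold via a Pipeline, so the selection never sees
# the held-out fold.
from sklearn.datasets import make_classification
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Nearly as many features as observations, as in the question.
X, y = make_classification(n_samples=100, n_features=90, n_informative=10,
                           random_state=0)

pipe = make_pipeline(
    VarianceThreshold(threshold=0.0),           # drop zero-variance features
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
)
scores = cross_val_score(pipe, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Fitting the filter outside the CV loop, on the full data, would leak information from the test folds and make the estimate optimistic.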
4. And last but not least: in my case, does increasing the number $K$ help me prevent overfitting?
The choice of $K$ must take the bias-variance tradeoff into account. A good read about it is Chapter 3, with emphasis on Sections 3.3-3.5, of Kohavi, R. (1995), Wrappers for performance enhancement and oblivious decision graphs (doctoral dissertation, Stanford University). The point is that a large optimistic bias leads to overfitting. Increasing $K$ reduces the bias, but might increase the variance to the point of uselessness. Repeated cross-validation can be used to reduce the variance, but repeating it too much leads to underestimation of the variance. Too small a $K$, like $2$-fold CV, also has large variance; $10$-fold is usually considered a good compromise.
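A sketch of comparing $K$ values with repeated cross-validation (the model and data are illustrative; scikit-learn assumed) — the spread of scores across repetitions gives a feel for the variance of the estimate at each $K$:

```python
# Hypothetical sketch: repeated K-fold CV at two values of K, to eyeball
# the mean estimate and its spread.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=150, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

results = {}
for k in (2, 10):
    cv = RepeatedStratifiedKFold(n_splits=k, n_repeats=5, random_state=0)
    results[k] = cross_val_score(model, X, y, cv=cv)
    print(f"K={k}: mean={results[k].mean():.3f} sd={results[k].std():.3f}")
```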
This answer proposes a heuristic to estimate overfitting, but I have never tried it, so I can't really comment on it.