I have a dataset of 800,000 observations and 11 features that I am using for a classification problem. I have tried to optimize my model many times, but in vain. The one thing I haven't tried is an SVM. I have heard that SVMs are only practical with smaller datasets. My concern is: should I give it a try, or is it going to take days to train the model?
[This](https://stats.stackexchange.com/questions/314329/can-support-vector-machine-be-used-in-large-data) thread may be of interest – user20160 Apr 19 '19 at 13:10
2 Answers
You can try the SVMlight implementation of support vector machines. For me it worked blazingly fast with about 10,000 observations and several hundred features, giving good results. They claim it is fast for several hundred thousand samples, too. In addition, there is a Python binding for SVMlight, which I haven't tried.
With this implementation you can try different kernels (e.g. polynomial or RBF) and see whether an SVM helps with your classification problem. Maybe start with a subsample of your data first, though.
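I haven't used the SVMlight Python binding, so as a rough illustration of the same workflow (comparing several kernels on a subsample before committing to the full dataset), here is a sketch using scikit-learn's SVC instead; `X` and `y` are placeholders standing in for your 800k x 11 data, and the kernel choices and C value are just defaults, not tuned settings.

```python
# Illustrative sketch (scikit-learn, not SVMlight): compare SVM kernels
# on a random subsample. X and y are placeholders for your real data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(800_000, 11))           # placeholder feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # placeholder labels

# Work on a manageable subsample first (here 10,000 rows).
idx = rng.choice(len(X), size=10_000, replace=False)
X_train, X_test, y_train, y_test = train_test_split(
    X[idx], y[idx], test_size=0.2, random_state=0, stratify=y[idx]
)

for kernel in ("linear", "poly", "rbf"):
    # Scaling matters a lot for RBF/poly kernels, hence the pipeline.
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    clf.fit(X_train, y_train)
    print(f"{kernel:>6}: test accuracy = {clf.score(X_test, y_test):.3f}")
```

If one kernel clearly wins on the subsample, that is the one worth scaling up (whether in SVMlight or elsewhere).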

Training a non-linear SVM on 800k samples will take many days. Start with smaller subsets of samples and work upwards, e.g. 1,000 to 10,000. Look at how your target metric improves as the number of samples increases (a learning curve). With only 11 features, you will likely hit diminishing returns well before 800k samples. Remember that you will also need to tune hyperparameters; that can usually be done on a smaller subset as well.
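One hedged way to build that learning curve, again with scikit-learn's SVC as a stand-in for whatever SVM implementation you end up using; `X`, `y`, the subsample sizes, and the C/gamma values are all placeholders, not recommendations:

```python
# Sketch of a learning curve: RBF-SVM accuracy as the training subsample grows.
# X and y are placeholder data; hyperparameters are illustrative defaults.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 11))           # placeholder feature matrix
y = (X.sum(axis=1) > 0).astype(int)          # placeholder labels

# Hold out a fixed test set so the curve is comparable across subset sizes.
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=5_000, random_state=0, stratify=y
)

for n in (1_000, 2_000, 5_000, 10_000, 20_000):
    idx = rng.choice(len(X_pool), size=n, replace=False)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    clf.fit(X_pool[idx], y_pool[idx])
    print(f"n={n:>6}  test accuracy = {clf.score(X_test, y_test):.3f}")
    # Stop scaling up once the metric stops improving noticeably.
```

Once the curve flattens, there is little point in paying the (roughly quadratic-or-worse) cost of fitting a kernel SVM on all 800k rows.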
