I'm looking for methods of variable selection on large datasets. There are only around 30-40 variables, but the number of observations is very large (around 36,000,000).
The methods I know of, such as stepwise regression or orthogonal matching pursuit (or rather their implementations in libraries like scikit-learn), can't really handle data of this size. The first problem is that they always try to load all of the data into memory. I didn't get past that, so I don't know whether there will be further problems.
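With only 30-40 predictors, one possible way around the memory problem (a sketch under stated assumptions, not something proposed in the question) is that an OLS fit depends on the data only through `X'X`, `X'y` and `y'y`. These are at most about 41 x 41 and can be accumulated one chunk at a time, after which forward stepwise selection runs entirely on the small summaries. The helper names below are hypothetical, `synthetic_chunks` is made-up stand-in data, and the BIC stopping rule is an assumed choice; on real files, an iterator such as `pandas.read_csv(..., chunksize=...)` could feed `accumulate_gram` instead.

```python
import numpy as np

def accumulate_gram(chunks):
    """One pass over (X, y) chunks, keeping only X'X, X'y, y'y and the row count."""
    xtx = xty = None
    yty, n = 0.0, 0
    for X, y in chunks:
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        if xtx is None:
            p = X.shape[1]
            xtx, xty = np.zeros((p, p)), np.zeros(p)
        xtx += X.T @ X
        xty += X.T @ y
        yty += float(y @ y)
        n += len(y)
    return xtx, xty, yty, n

def rss(xtx, xty, yty, cols):
    """Residual sum of squares of the OLS fit on `cols`, from the summaries alone."""
    if len(cols) == 0:
        return yty
    idx = np.asarray(cols, dtype=int)
    beta = np.linalg.solve(xtx[np.ix_(idx, idx)], xty[idx])
    return yty - xty[idx] @ beta  # equals y'y - b'beta because (X'X) beta = X'y

def bic(rss_value, n, k):
    """Gaussian BIC up to an additive constant, for k fitted coefficients."""
    return n * np.log(rss_value / n) + k * np.log(n)

def forward_select_bic(xtx, xty, yty, n, forced=(0,)):
    """Greedy forward selection that stops when no candidate lowers BIC.
    Columns in `forced` (here column 0, the intercept) are always kept."""
    selected = list(forced)
    candidates = [j for j in range(xtx.shape[0]) if j not in selected]
    best = bic(rss(xtx, xty, yty, selected), n, len(selected))
    while candidates:
        scores = {j: rss(xtx, xty, yty, selected + [j]) for j in candidates}
        j = min(scores, key=scores.get)
        new = bic(scores[j], n, len(selected) + 1)
        if new >= best:
            break
        best = new
        selected.append(j)
        candidates.remove(j)
    return selected

def synthetic_chunks(n_chunks=20, rows=5000, seed=0):
    """Hypothetical stand-in for reading a large file in pieces.
    Column 0 is an intercept; y depends only on columns 1 and 3."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = np.column_stack([np.ones(rows), rng.standard_normal((rows, 5))])
        y = 1.0 + 2.0 * X[:, 1] - 3.0 * X[:, 3] + rng.standard_normal(rows)
        yield X, y

xtx, xty, yty, n = accumulate_gram(synthetic_chunks())
print(forward_select_bic(xtx, xty, yty, n))
```

Only one pass over the data is needed; every later step works on the small matrices. Note that forming `X'X` squares the condition number of `X`, so strongly collinear predictors may call for standardizing first or for a QR-based update instead.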
Does anyone know of libraries that can handle datasets like this?
Broader methods of selection are fine too (not necessarily stepwise regression).
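As one broader, non-stepwise option, the lasso can also be fit from running sums alone: after centering and standardizing (both computable from accumulated moments), its coordinate-descent updates touch only the p x p correlation matrix `R` and the vector `r` of standardized predictor-response covariances. This is a minimal sketch assuming squared-error loss, an unpenalized intercept handled by centering, and no constant predictor columns; the helper names and the synthetic data are hypothetical. Sweeping `lam` downward from `max(abs(r))` shows the order in which variables enter the model.

```python
import numpy as np

def accumulate_moments(chunks):
    """One pass over (X, y) chunks; X has no intercept column."""
    n, sx, sy, sxx, sxy = 0, None, 0.0, None, None
    for X, y in chunks:
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        if sx is None:
            p = X.shape[1]
            sx, sxx, sxy = np.zeros(p), np.zeros((p, p)), np.zeros(p)
        n += len(y)
        sx += X.sum(axis=0)
        sy += float(y.sum())
        sxx += X.T @ X
        sxy += X.T @ y
    return n, sx, sy, sxx, sxy

def standardized_gram(n, sx, sy, sxx, sxy):
    """Correlation matrix R of the predictors and r = cov(x_j, y) / sd(x_j)."""
    mx, my = sx / n, sy / n
    cov_xx = sxx / n - np.outer(mx, mx)
    cov_xy = sxy / n - mx * my
    sd = np.sqrt(np.diag(cov_xx))  # assumes no constant columns
    return cov_xx / np.outer(sd, sd), cov_xy / sd

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_gram(R, r, lam, max_iter=1000, tol=1e-10):
    """Coordinate descent for 0.5 * b'Rb - r'b + lam * ||b||_1,
    i.e. the lasso on centered, standardized data (coefficients on that scale)."""
    beta = np.zeros(len(r))
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(len(r)):
            rho = r[j] - R[j] @ beta + R[j, j] * beta[j]  # leave out coordinate j
            new = soft_threshold(rho, lam) / R[j, j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta

def synthetic_chunks(n_chunks=20, rows=5000, seed=0):
    """Hypothetical data: y depends only on columns 0 and 2."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = rng.standard_normal((rows, 5))
        y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.standard_normal(rows)
        yield X, y

R, r = standardized_gram(*accumulate_moments(synthetic_chunks()))
print("lambda_max:", np.max(np.abs(r)))  # at or above this, all coefficients are zero
for lam in (2.5, 0.1):
    print(lam, np.flatnonzero(lasso_gram(R, r, lam)).tolist())
```

As with the stepwise sketch, the data are read once and the fitting cost depends only on the number of predictors, not on the 36 million rows.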
Related to: Variable selection in large datasets
(As Richard pointed out, the terms may cause confusion, so I have reworded the question a little.)