I'm trying to train a regression tree on a very large dataset: approximately 3 TB.
I'm using scikit-learn, and of course there is no way I can load that much data into memory. After some online research, I found that some scikit-learn estimators have a partial_fit method that can be used for this purpose. Unfortunately, scikit-learn decision trees don't have a partial_fit method.
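For reference, this is the kind of incremental pattern I mean. It is only a minimal sketch: SGDRegressor stands in as an example of an estimator that does support partial_fit, and iter_chunks is a hypothetical generator that yields small synthetic batches in place of my real data.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def iter_chunks(n_chunks=20, chunk_size=500):
    """Hypothetical stand-in for reading the real data one piece at a time."""
    true_coef = np.array([1.0, -2.0, 0.5])  # made-up coefficients for the demo
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, 3))
        y = X @ true_coef + rng.normal(scale=0.1, size=chunk_size)
        yield X, y

# A linear model that can be updated chunk by chunk.
model = SGDRegressor(random_state=0)
for X_chunk, y_chunk in iter_chunks():
    # Each call updates the existing weights using only this chunk.
    model.partial_fit(X_chunk, y_chunk)

# Decision trees do not expose the same method.
tree_has_partial_fit = hasattr(DecisionTreeRegressor(), "partial_fit")
```

Only one chunk has to be in memory at a time with this approach, which is what I would need for a tree as well.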
Has anyone run into this problem, and is there an alternative way to handle it?
By the way, my data is stored in a pandas DataFrame.