I am curious how data scientists approach exceedingly large datasets when building a regression model for y.
How does one decide where to start? How do you reduce a large number of columns without the benefit of domain knowledge? Aside from basic checks, such as removing columns that are mostly null or that contain only a single value, what other steps do data scientists usually take?
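
For concreteness, the basic checks I mean look roughly like the sketch below. It is only an illustration in plain Python: the function name `screen_columns`, the `max_null_fraction` threshold of 0.5, and the toy data are all assumptions made up for this example, not a standard recipe.

```python
from typing import Any, Dict, List


def screen_columns(
    columns: Dict[str, List[Any]],
    max_null_fraction: float = 0.5,  # hypothetical cutoff, tune per dataset
) -> List[str]:
    """Return the names of columns that pass two basic checks."""
    kept = []
    for name, values in columns.items():
        if not values:
            continue  # an empty column carries no information
        non_null = [v for v in values if v is not None]
        null_fraction = 1 - len(non_null) / len(values)
        if null_fraction > max_null_fraction:
            continue  # drop: mostly missing
        if len(set(non_null)) <= 1:
            continue  # drop: a single distinct value, no variance
        kept.append(name)
    return kept


# Toy data, invented for illustration only.
data = {
    "mostly_null": [None, None, None, 4.0],
    "constant": [1, 1, 1, 1],
    "useful": [0.2, 1.5, 3.1, 4.8],
}
print(screen_columns(data))
```

With pandas, the same two checks are commonly written with `DataFrame.isna().mean()` and `DataFrame.nunique()`. What I am asking about is what comes after this step.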