
I am curious how data scientists approach exceedingly large datasets when building a regression model for a target variable y.

How does one decide where to start? How can a large number of columns be reduced without the benefit of domain knowledge? Aside from basic statistics, such as removing columns with a large number of nulls and columns that hold a single value, what other steps do data scientists usually take?

  • 200 variables is not that many given your sample size. What do you want: an explanatory or a predictive model? You can simply use regularized regression. – Tim Jan 29 '19 at 05:57
  • How does regularized regression help reduce the manual work of going through 200 columns? – Adurthi Ashwin Swarup Jan 29 '19 at 07:27
  • Regularization does automatic feature selection https://stats.stackexchange.com/q/4272/35989 – Tim Jan 29 '19 at 07:36
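To illustrate the "automatic feature selection" point in the comments, here is a minimal pure-Python sketch of L1-regularized (lasso) regression fitted by coordinate descent. It assumes the lasso is the regularizer meant (the comments do not name one; ridge regression shrinks coefficients but does not set them exactly to zero). The function names `soft_threshold` and `lasso_cd` are hypothetical. The objective minimized is (1/(2n))·‖y − b0 − Xb‖² + alpha·‖b‖₁, and the soft-thresholding step is what drives weak coefficients to exactly zero, which effectively drops those columns.

```python
from typing import List, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], alpha: float,
             n_iter: int = 1000, tol: float = 1e-10) -> Tuple[float, List[float]]:
    """Fit (1/(2n))||y - b0 - X b||^2 + alpha * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    # Center features and target so the (unpenalized) intercept is handled separately.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y - X b; b starts at zero
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: coefficient stays at zero
            # Correlation of column j with the partial residual (excluding j's own term).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta  # keep residual in sync with beta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(x_mean[j] * beta[j] for j in range(p))
    return intercept, beta


# Column 0 drives y; column 1 is unrelated; column 2 is constant.
X = [[1, 1, 1], [2, -1, 1], [3, -1, 1], [4, 1, 1]]
y = [4, 7, 10, 13]  # y = 3 * x0 + 1
intercept, beta = lasso_cd(X, y, alpha=0.1)
print(intercept, beta)  # coefficients for columns 1 and 2 are exactly 0.0
```

Because the L1 penalty depends on the scale of each column, features are normally standardized before fitting, and alpha is chosen by cross-validation rather than by hand (scikit-learn's `LassoCV` does this). Larger alpha values zero out more columns.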

0 Answers