My goal is to predict taxi demand in NYC as a function of location and time. I have a dataset with ~18 million observations, so in principle I could add a large number of predictors.

But at what point would I run into the curse of dimensionality? For example, adding dummy variables for all tracts in NYC would result in 2165 (n-1) additional predictors.

vranjes
  • With 18 million data points, practically speaking you need not worry about having too many predictors. BTW, I do not understand the tracts comment. What do you mean by n? Is 2165 the number of tracts? Why would you multiply the number of tracts by n? – Joel W. Aug 31 '18 at 13:05
  • Thank you for your input Joel. There are 2166 tracts in NYC. Adding them to my model through dummies would lead to 2165 additional predictors. – vranjes Aug 31 '18 at 13:59
  • That number of variables should be fine, given your sample size. See a related discussion here: https://stats.stackexchange.com/questions/10079/rules-of-thumb-for-minimum-sample-size-for-multiple-regression – Joel W. Aug 31 '18 at 17:53
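
The arithmetic behind the comments above can be sketched in a few lines. This is only an illustration, not anyone's actual model: the counts (roughly 18 million rows, 2166 tracts) are taken from the thread, and the helper names `dummy_columns`, `observations_per_predictor`, and `dummy_encode` are hypothetical, introduced here for the example.

```python
# Counts quoted in the thread; the row count is approximate ("~18 million").
N_OBSERVATIONS = 18_000_000
N_TRACTS = 2166


def dummy_columns(n_levels: int) -> int:
    """Dummy columns needed for a categorical variable when one level
    is dropped as the reference category (n - 1 coding)."""
    return n_levels - 1


def observations_per_predictor(n_obs: int, n_predictors: int) -> float:
    """Average number of observations available per predictor."""
    return n_obs / n_predictors


def dummy_encode(values, levels):
    """Encode each value as a 0/1 row, dropping the first level as reference."""
    kept = levels[1:]
    return [[1 if value == level else 0 for level in kept] for value in values]


n_dummies = dummy_columns(N_TRACTS)
ratio = observations_per_predictor(N_OBSERVATIONS, n_dummies)

print(n_dummies)     # 2165
print(round(ratio))  # 8314

# Toy example with three hypothetical tract labels; "A" is the reference.
print(dummy_encode(["A", "B", "C"], ["A", "B", "C"]))  # [[0, 0], [1, 0], [0, 1]]
```

With these numbers there are on average about 8,300 observations per tract dummy. That is an average only: if some tracts have very few pickups, their coefficients would be estimated from far fewer rows, so it may be worth checking the per-tract counts as well.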
