I have a dataset with 200,000 rows and four columns (time_of_day, order_size, time_taken, shop_number). I need to build a model that predicts time_taken from the other three variables. There are more than 30,000 distinct shop IDs.
So far I have tried two approaches:
1. One-hot encode shop_number. However, this leads to a very large number of columns in my dataset (one per shop).
2. Build a separate model for each shop, assuming that every shop has its own efficiency, independent of the others.
Are there other approaches I could try? Also, what type of model should I use for this data? So far I have tried simple linear regression and Bayesian regression.
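For reference, the two approaches described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not my actual code: the sizes (2,000 rows, 50 shops), the coefficients, and the `shop_effect` term are all made up, and plain least squares via NumPy stands in for the linear regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data; every value here is an assumption.
n_rows, n_shops = 2000, 50
shop_number = rng.integers(0, n_shops, size=n_rows)
time_of_day = rng.uniform(0, 24, size=n_rows)
order_size = rng.integers(1, 20, size=n_rows).astype(float)
shop_effect = rng.normal(0, 5, size=n_shops)  # hypothetical per-shop efficiency
time_taken = (10 + 0.2 * time_of_day + 1.5 * order_size
              + shop_effect[shop_number] + rng.normal(0, 1, size=n_rows))

# Approach 1: one-hot encode shop_number, giving one column per shop.
# The one-hot block acts as a per-shop intercept, so no extra intercept column.
one_hot = np.zeros((n_rows, n_shops))
one_hot[np.arange(n_rows), shop_number] = 1.0
X_pooled = np.column_stack([time_of_day, order_size, one_hot])
coef_pooled, *_ = np.linalg.lstsq(X_pooled, time_taken, rcond=None)
print("pooled design matrix shape:", X_pooled.shape)

# Approach 2: fit an independent linear model for each shop.
per_shop_coef = {}
for shop in np.unique(shop_number):
    mask = shop_number == shop
    X_shop = np.column_stack([np.ones(mask.sum()),
                              time_of_day[mask],
                              order_size[mask]])
    coef, *_ = np.linalg.lstsq(X_shop, time_taken[mask], rcond=None)
    per_shop_coef[int(shop)] = coef  # [intercept, time_of_day, order_size]
print("number of per-shop models:", len(per_shop_coef))
```

On the real data the one-hot matrix would have more than 30,000 columns, so a dense array like the one above would not be practical. The per-shop models would each see only about 200,000 / 30,000 ≈ 7 rows on average, which is very little data per model.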