I have a dataset with 200,000+ data points and 20 features, about 10 of which are categorical. The categorical columns are countries, states, and localities, with more than 150 distinct country names, so one-hot encoding them would add a large number of columns and increase the computational cost. Is there a feasible way to encode them?
I am using SageMaker's built-in XGBoost algorithm. Does it handle categorical features by default, or do I have to convert them into some other form first?
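To make the dimensionality concern concrete, here is a minimal sketch on synthetic data. The column names, cardinalities, and sample size are made up for illustration and are not my real data. It compares the number of columns produced by one-hot encoding with the number produced by plain integer codes.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 1_000  # small synthetic sample; the real dataset has 200,000+ rows

# Hypothetical categorical columns with assumed cardinalities.
df = pd.DataFrame({
    "country": rng.integers(0, 150, n_rows).astype(str),
    "state": rng.integers(0, 300, n_rows).astype(str),
    "locality": rng.integers(0, 500, n_rows).astype(str),
})

# One-hot encoding: one new column per distinct value in each column.
one_hot = pd.get_dummies(df)

# Integer codes: one column per original feature, but the codes
# impose an arbitrary ordering on the categories.
codes = df.apply(lambda col: col.astype("category").cat.codes)

one_hot_width = one_hot.shape[1]
codes_width = codes.shape[1]
print(f"one-hot columns: {one_hot_width}, integer-code columns: {codes_width}")
```

With only three such columns the one-hot frame already has hundreds of columns, while the integer-coded frame keeps three. Whether integer codes (or some other compact encoding) are appropriate for the built-in XGBoost algorithm is exactly what I am unsure about.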