I am working on a binary classification problem with input variables like `country`, `state`, `city`, `product`, `product type`, `product segment`, etc. I have many more hierarchical categorical variables like these.
As you can see, the variable `city` is a more granular level of the variable `country`. The same holds for the other hierarchical variables.
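
To illustrate what I mean by "hierarchical", here is a minimal sketch with pandas. The values are invented toy data and `is_nested` is a hypothetical helper I wrote for this post, not something from a library.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are made up.
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "state":   ["A1", "A1", "A2", "B1", "B1", "B2"],
    "city":    ["A1x", "A1y", "A2x", "B1x", "B1x", "B2x"],
})

def is_nested(frame, child, parent):
    """Return True if every value of `child` maps to exactly one `parent` value."""
    return bool((frame.groupby(child)[parent].nunique() == 1).all())

print(is_nested(df, "city", "state"))     # each city belongs to a single state
print(is_nested(df, "state", "country"))  # each state belongs to a single country
print(is_nested(df, "country", "city"))   # a country contains several cities
```

In other words, knowing the granular level fully determines every level above it, but not the other way around.
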
My questions are as follows:
a) We want our ML model to identify factors such as `state`, `country`, `city`, etc.
For example, we would like to predict in which country, state and city our product has a high likelihood of selling, e.g. `Product A has a 90% likelihood of selling in Country A, State A and City A`.
b) How do we run a correlation analysis between hierarchical variables? Should we retain the top-level variable or the bottom (granular) level variable?
c) Does it make sense to feed all these hierarchical variables into the ML model? How should we decide on feature selection here?
d) Any other suggestions on how to handle hierarchical variables during feature engineering and ML model building?
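
To make question (b) concrete, this is roughly what I mean by correlation between two categorical levels. It is only a sketch: I am assuming Cramér's V as the association measure (my own choice for illustration, it may not be the right one), the data is invented, and `cramers_v` is a hypothetical helper built on `pandas.crosstab` and `scipy.stats.chi2_contingency`.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Cramér's V between two categorical series (no bias correction)."""
    table = pd.crosstab(x, y)
    # First element of the result is the chi-squared statistic.
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    r, k = table.shape
    denom = n * (min(r, k) - 1)
    return float(np.sqrt(chi2 / denom)) if denom > 0 else float("nan")

# Hypothetical toy data: every city belongs to exactly one country.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "city":    ["A1x", "A1x", "A2x", "B1x", "B1x", "B2x"],
})

print(cramers_v(toy["country"], toy["city"]))
```

For a strictly nested pair like this one the value comes out as 1, because `city` fully determines `country`. That is why I am unsure whether keeping both levels adds anything, or whether such a measure is even informative for hierarchical variables.
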
Can you guide me on this?