I am working on a binary classification problem with 1,000 rows and several high-cardinality categorical variables.
So I decided to use a hashing encoder to avoid the curse of dimensionality.
However, after feeding in my columns as shown below,
import category_encoders as ce

# Hash-encode the high-cardinality categorical columns
encoder = ce.HashingEncoder(cols=['market', 'Segment', 'Application',
                                  'Product Classification', 'State', 'Pincode',
                                  'Project Status', 'Country', 'line', 'DIV'],
                            return_df=True)
categorical_data_transformed = encoder.fit_transform(categorical_data)
I got an output whose columns are named col1, col2, col3, etc.
My questions are:
a) How do I know which hash value corresponds to which category or column?
b) How do I get the original column names instead of col1, col2, col3, etc.?
c) How do we use this if we wish to explain our predictions to business users?
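
For reference, here is my rough understanding of the hashing trick as a minimal pure-Python sketch. It is not the actual category_encoders code: the function name hash_encode, the choice of MD5, the 8 output buckets, and the sample values are all my own assumptions for illustration. It shows why I am unsure about (a) and (b): each value only increments a bucket, so the bucket index alone does not say which column or category it came from.

```python
import hashlib

def hash_encode(row, n_components=8):
    """Toy hashing trick: each value in the row increments one of n_components buckets."""
    buckets = [0] * n_components
    for value in row.values():
        digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
        # Only the bucket index is kept; the original value and column name are discarded.
        buckets[int(digest, 16) % n_components] += 1
    return buckets

# Hypothetical sample row; the column names are from my data, the values are made up.
row = {"market": "EMEA", "State": "Karnataka", "Country": "India"}
encoded = hash_encode(row)
print(encoded)

# With more distinct categories than buckets, some categories must share a bucket.
categories = [f"cat_{i}" for i in range(9)]
bucket_ids = [int(hashlib.md5(c.encode("utf-8")).hexdigest(), 16) % 8 for c in categories]
print(len(set(bucket_ids)))
```

If this picture is roughly right, the output always has a fixed number of columns regardless of how many categories there are, which is what I wanted, but it also seems to make the mapping back to categories lossy.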