How to handle the dummy variables with overlapping categories?

Question

Background of The Question

Let's say I have four categories (A, B, C, D). Taking one (D) as the reference category leaves three categories to encode. The problem is that a participant can belong to several categories within a single observation. For example, in a single observation a participant can be in both A and C, which violates the rules for creating dummy variables as described here.

Then, My Question

Which type of variable (like a dummy variable) can I use that will allow me to keep a participant in two or more categories for a single observation?

Notes

I am aware of interaction variables. None of the categories in my problem can be of that type.
I know there are lots of questions on CV regarding dummy variables. However, I did not find the answer to my question. Instead, I have mostly learned from those questions what should and should not be done with dummy variables.
My question is similar to this one which is unanswered.

If you are okay with other encoding strategies, you may use Base-N Encoding (I'd prefer it in such a case). Or you may try with Binary Encoding and/or Hash Encoding. — Anant Kumar, Dec 09 '20 at 07:12
@AnantKumar Thanks for your response. Can you please answer with an example? — Md. Sabbir Ahmed, Dec 09 '20 at 07:23

score 1 · Answer 1 · answered Dec 09 '20 at 07:59

Here is an example of Base-N encoding in Python. Consider the following example data:

import pandas as pd

# Single categorical column; 'ab', 'bc' and 'bd' are combined labels
df = pd.DataFrame({"A": ['a', 'b', 'c', 'd', 'e', 'ab', 'bc', 'bd']})

Applying the Base-N encoder:

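import category_encoders as ce

# Encode column "A" as base-5 digits
encoder = ce.BaseNEncoder(cols=['A'], return_df=True, base=5)
data = encoder.fit_transform(df)
# Re-attach the original labels for side-by-side comparison
data["A"] = df["A"]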

Base-N Encoder Data Output

    A_0 A_1 A_2 A
0   0   0   1   a
1   0   0   2   b
2   0   0   3   c
3   0   0   4   d
4   0   1   0   e
5   0   1   1   ab
6   0   1   2   bc
7   0   1   3   bd
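
The digits in this table can be reproduced by hand. The sketch below is not the library's implementation. It assumes each distinct value receives an ordinal code starting at 1, in order of first appearance, and that the code is then written as fixed-width digits in the chosen base. The helper `base_n_digits` and its `width` parameter are hypothetical names chosen for illustration.

```python
import pandas as pd


def base_n_digits(series, base, width):
    """Write each value's ordinal code as `width` digits in the given base."""
    # Assumption: codes start at 1 and follow order of first appearance.
    codes = pd.factorize(series)[0] + 1
    rows = []
    for code in codes:
        digits = []
        for _ in range(width):
            digits.append(int(code % base))  # least significant digit first
            code //= base
        rows.append(digits[::-1])  # reverse: most significant digit first
    cols = [f"{series.name}_{i}" for i in range(width)]
    return pd.DataFrame(rows, columns=cols, index=series.index)


df = pd.DataFrame({"A": ['a', 'b', 'c', 'd', 'e', 'ab', 'bc', 'bd']})
base5 = base_n_digits(df["A"], base=5, width=3)
base5["A"] = df["A"]
print(base5)
```

For instance, 'e' is the fifth distinct value, and 5 in base 5 is written 010. Calling the same helper with base=2 and width=4 gives the digits 0001 through 1000 for the eight rows.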

Binary Encoder Strategy

# Same data, encoded as binary (base-2) digits
encoder = ce.BinaryEncoder(cols=['A'], return_df=True)
data = encoder.fit_transform(df)
data["A"] = df["A"]

Binary Encoder Data Output

    A_0 A_1 A_2 A_3 A
0   0   0   0   1   a
1   0   0   1   0   b
2   0   0   1   1   c
3   0   1   0   0   d
4   0   1   0   1   e
5   0   1   1   0   ab
6   0   1   1   1   bc
7   1   0   0   0   bd
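
As a point of comparison, both encoders above treat 'ab' as a level of its own, unrelated to 'a' and 'b'. If each character in a label is instead read as membership in one category (an assumption about the example data, not something the encoders do), overlapping membership can be written as one 0/1 indicator column per category, with more than one 1 allowed per row. A minimal sketch using plain pandas, with hypothetical column names `A_a` to `A_e`:

```python
import pandas as pd

df = pd.DataFrame({"A": ['a', 'b', 'c', 'd', 'e', 'ab', 'bc', 'bd']})

# Assumption: each character in a label names one category the row belongs to.
categories = sorted(set("".join(df["A"])))

# One 0/1 column per category; a row may have several 1s.
indicators = pd.DataFrame(
    {f"A_{c}": df["A"].apply(lambda s, c=c: int(c in s)) for c in categories}
)
indicators["A"] = df["A"]
print(indicators)
```

Here the row labelled 'ab' has a 1 in both `A_a` and `A_b`, so the information that it shares a category with the rows 'a' and 'b' is kept.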
