The title isn't clear because the problem itself is unusual to me. I was asked to create categories that will later be used in statistical learning. In other words, the classification algorithms applied later will assign each observation to one of the categories I create.
Here is an example of the data I have:
head(table_final[,28:35])
# A tibble: 6 × 8
  DS_DTMALTA_SINIESTRO DS_COMPANIA    TIPO_HAB_D DS_PARTE_D DS_ESTETICO_PARTIDA DS_PROFESIONAL          DS_PROVINCIA_SINIESTRO
  <chr>                <chr>          <chr>      <chr>      <chr>               <chr>                   <chr>
1 Abril/2016           LBPAI          DOR        1 SOL      Directos            WW , RENOVATECH         Ain
2 Marzo/2016           ALLIANZ C2S    COC        1 SOL      Directos            X, PEINTURES DECORATION Ain
3 Julio/2016           GMF ASSURANCES GRE        1 SOL      Directos            X , JEREMY              Aisne
4 Julio/2016           GMF ASSURANCES ASE 2      1Mur       Directos            Y , JEREMY              Aisne
5 Julio/2015           PACIFICA       CAV        1Mur       Directos            Z , ERARD               Aisne
6 Abril/2016           PACIFICA       DOR        1 SOL      Esteticos           W , ANNICK              Aisne
It is from these columns that I need to create classes. For example, DOR_1_SOL would be one class, containing both kinds of DS_ESTETICO_PARTIDA (Directos and Esteticos, i.e. direct and aesthetic); ASE_2_1Mur would be another, and so on. My question is: how should I go about creating these categories? Is there a particular field of study that deals with this?
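To make the goal concrete, here is a minimal sketch of the kind of label I have in mind. It assumes that a class is simply the combination of TIPO_HAB_D and DS_PARTE_D, and that the sample rows split into columns as "DOR" / "1 SOL" and "ASE 2" / "1Mur"; both are assumptions. The data frame `toy` and the column name `clase` are hypothetical names used only for illustration, with values retyped by hand from the sample above.

```r
# Toy copy of the three relevant columns from the sample above
# (retyped by hand; the real data lives in table_final).
toy <- data.frame(
  TIPO_HAB_D          = c("DOR", "COC", "GRE", "ASE 2", "CAV", "DOR"),
  DS_PARTE_D          = c("1 SOL", "1 SOL", "1 SOL", "1Mur", "1Mur", "1 SOL"),
  DS_ESTETICO_PARTIDA = c("Directos", "Directos", "Directos",
                          "Directos", "Directos", "Esteticos"),
  stringsAsFactors = FALSE
)

# Build the label: join room and part with a space, then replace
# every run of spaces with a single underscore.
# 'clase' is a hypothetical column name.
toy$clase <- gsub(" +", "_", paste(toy$TIPO_HAB_D, toy$DS_PARTE_D))
toy$clase <- factor(toy$clase)

# Cross-tabulate classes against DS_ESTETICO_PARTIDA to check that
# a single class can hold both kinds of entries.
tab <- table(toy$clase, toy$DS_ESTETICO_PARTIDA)
print(tab)
```

On these six rows, DOR_1_SOL ends up with one Directos and one Esteticos entry, which is the behaviour I described. Whether a plain concatenation like this is a sensible way to define categories for learning is exactly what I am unsure about.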