The title isn't clear because the problem itself is unusual to me. I was asked to create categories that will later be used in statistical learning. That is, the classification algorithms will assign feature occurrences to one of the categories I create.
Here is an example of the data I have:

head(table_final[,28:35])
# A tibble: 6 × 8
  DS_DTMALTA_SINIESTRO    DS_COMPANIA TIPO_HAB_D DS_PARTE_D DS_ESTETICO_PARTIDA                             DS_PROFESIONAL DS_PROVINCIA_SINIESTRO
                 <chr>          <chr>      <chr>      <chr>               <chr>                                      <chr>                  <chr>
1           Abril/2016          LBPAI      DOR 1        SOL            Directos                         WW , RENOVATECH                       Ain
2           Marzo/2016    ALLIANZ C2S      COC 1        SOL            Directos                         X, PEINTURES DECORATION               Ain
3           Julio/2016 GMF ASSURANCES      GRE 1        SOL            Directos                         X , JEREMY                          Aisne
4           Julio/2016 GMF ASSURANCES      ASE 2       1Mur            Directos                         Y , JEREMY                          Aisne
5           Julio/2015       PACIFICA        CAV       1Mur            Directos                         Z , ERARD                           Aisne
6           Abril/2016       PACIFICA      DOR 1        SOL           Esteticos                        W  , ANNICK                          Aisne

It is from these data columns that I need to create classes. For example, DOR_1_SOL would be one class, containing both kinds of ESTETICO_PARTIDA (direct and aesthetic), ASE_2_1Mur would be another, and so on. My question is: how should I proceed in creating those categories? Is there a particular field of study for this?

nidabdella

1 Answer

Cluster analysis may serve your purpose.

There are many R packages that implement it.
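As a rough illustration of what that could look like, here is a minimal sketch using the `cluster` package, which is one of several options and not necessarily the best fit for this data. Everything specific in it is an assumption: the toy data frame copies three columns from the sample in the question, Gower dissimilarity with average-linkage hierarchical clustering is just one reasonable choice for categorical columns, and `k = 2` is an arbitrary number of groups picked for the example.

```r
library(cluster)  # provides daisy(); a recommended package bundled with most R installs

# Toy data copied from three columns of the sample in the question (assumption:
# these are the columns that should drive the grouping)
toy <- data.frame(
  TIPO_HAB_D          = c("DOR 1", "COC 1", "GRE 1", "ASE 2", "CAV", "DOR 1"),
  DS_PARTE_D          = c("SOL", "SOL", "SOL", "1Mur", "1Mur", "SOL"),
  DS_ESTETICO_PARTIDA = c("Directos", "Directos", "Directos",
                          "Directos", "Directos", "Esteticos"),
  stringsAsFactors = TRUE
)

# Gower dissimilarity works on factor (categorical) columns
d <- daisy(toy, metric = "gower")

# Agglomerative hierarchical clustering on the dissimilarities
hc <- hclust(d, method = "average")

# Cut the tree into k groups; k = 2 is an arbitrary choice for this sketch
k <- 2
toy$cluster <- cutree(hc, k = k)

print(toy)
```

On the real table, the column selection and the value of `k` would need to be chosen from domain knowledge or by inspecting the dendrogram with `plot(hc)`.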

mzuba