I am working on a genomics project and have ended up with a large table: around 800 measurements (cases/rows), around 200 channels (columns, all continuous variables), and one categorical column with 5 categories.
I would like to do two things:
- Find sub-groups within each level of the categorical variable that I already have
- Create a new classification of these 800 measurements based only on the information
I have been doing my homework and have read about strategies such as k-means and PCA, and I have found that it is very useful to get rid of redundant variables first. How can I choose which variables to drop properly?
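To make the question concrete, here is a minimal sketch of the kind of pipeline I have in mind: drop highly correlated channels, run PCA, then k-means on the leading components. It uses simulated data standing in for my table, and everything named here (`X`, `category`, the helper `drop_correlated`, the 0.9 cutoff, 10 components, 5 clusters) is a placeholder assumption, not a tuned choice.

```r
set.seed(1)
n <- 800
p <- 200

# Simulated stand-in for the real table: 800 rows, 200 continuous channels
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X) <- paste0("ch", seq_len(p))

# Make two channels deliberately redundant so the filter has something to find
X[, 2] <- X[, 1] + rnorm(n, sd = 0.05)
X[, 4] <- X[, 3] + rnorm(n, sd = 0.05)

# Simulated stand-in for the existing 5-level categorical column
category <- factor(sample(c("A", "B", "C", "D", "E"), n, replace = TRUE))

# Hypothetical helper: drop any column whose absolute correlation with an
# earlier column exceeds the cutoff (a simple, slightly aggressive filter)
drop_correlated <- function(X, cutoff = 0.9) {
  cm <- abs(cor(X))
  cm[lower.tri(cm, diag = TRUE)] <- 0  # keep only pairs (i, j) with i < j
  redundant <- apply(cm, 2, function(col) any(col > cutoff))
  X[, !redundant, drop = FALSE]
}

X_reduced <- drop_correlated(X, cutoff = 0.9)

# PCA on centred and scaled channels, keeping the first 10 components
pca <- prcomp(X_reduced, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:10]

# k-means on the PCA scores; nstart > 1 reduces sensitivity to the random start
km <- kmeans(scores, centers = 5, nstart = 25)

# Cross-tabulate the new clusters against the existing categories
print(table(cluster = km$cluster, category = category))
```

For the first goal, I assume the same steps could be run on the rows of a single category at a time, for example `X[category == "A", ]`.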
Someone recommended that I use multinomial regression. Can you recommend any good resources to get started with it?
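As far as I understand it, the suggestion would look roughly like the sketch below, which fits a multinomial model with `multinom()` from the `nnet` package. The data are simulated, and the names `scores`, `category` and `dat`, as well as the choice of 5 predictors, are only my assumptions for illustration.

```r
library(nnet)

set.seed(1)
n <- 800

# Simulated predictors (e.g. a handful of PCA scores) and a 5-level outcome
scores <- matrix(rnorm(n * 5), nrow = n,
                 dimnames = list(NULL, paste0("PC", 1:5)))
category <- factor(sample(c("A", "B", "C", "D", "E"), n, replace = TRUE))
dat <- data.frame(category = category, scores)

# Multinomial logistic regression of the category on all predictors
fit <- multinom(category ~ ., data = dat, trace = FALSE)

# Predicted class for each row, compared with the observed category
pred <- predict(fit, newdata = dat, type = "class")
conf_mat <- table(observed = dat$category, predicted = pred)
accuracy <- mean(pred == dat$category)

print(conf_mat)
print(accuracy)
```

Since this evaluates on the same rows used for fitting, I assume a real analysis would need a held-out set or cross-validation.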
I am using R. Many thanks.