I want to apply machine learning and deep learning.
I have categorical data on string. My first option was to perform dummy encoding on the columns (scikitlearn
). But there are some columns that have thousands of categorical values, if i use dummy encoding, this will expand the dataset enormously.
What other alternative do I have? If I simply perform a label encoder and then scale everything between 0 and 1 it could work?