I built a web scraper that pulled in a lot of data, and it turned out to contain more qualitative (categorical) variables than I expected. I had originally intended to consider only a few quantitative variables, but from a statistical standpoint I understand that dropping variables from the design matrix out of convenience or personal preference could introduce bias.
The problem is that many columns in my data matrix have 20 to 40 unique values. What would you do in this situation? Would you create the dummy variables and expand the design matrix, or is there a more efficient way to handle this?
Note that these values are not ordinal. For example, one of the columns is a vehicle's 'Front Suspension Type' and another is 'Rear Suspension Type'.
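For scale: dummy coding a nominal column with k levels adds k - 1 indicator columns (one level is kept as the reference category), so a single 40-level column adds 39 columns to the design matrix. Below is a minimal standard-library sketch of that expansion; in pandas, `pd.get_dummies(df, columns=[...], drop_first=True)` does the same job. The function name `dummy_code` and the sample suspension values are illustrative assumptions, not the scraped data.

```python
def dummy_code(values, drop_first=True):
    """Dummy-code one nominal column (hypothetical helper, stdlib only).

    Returns (column_names, rows), where each row is a list of 0/1 indicators.
    With k distinct levels and drop_first=True, this yields k - 1 columns;
    the alphabetically first level becomes the reference category, which
    avoids perfect collinearity with the intercept.
    """
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]  # drop the reference level
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Made-up example values for a 'Rear Suspension Type' column.
rear = ["multi-link", "double wishbone", "torsion beam", "multi-link", "leaf spring"]
cols, X = dummy_code(rear)
print(cols)
for row in X:
    print(row)
```

Here "double wishbone" is the reference level, so its row is all zeros and each coefficient on the remaining columns is read relative to it.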
Thoughts? Please let me know if you need additional info; thanks in advance for any feedback.
Edit: Additionally, most levels are sparsely populated. Most values have only around 10 entries, while for rear suspension type the most frequent value (multi-link) has around 700.
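With counts that lopsided, one common option is to collapse rare levels into a single "Other" category before dummy coding, which shrinks the design matrix and avoids coefficients estimated from only about 10 rows. This is a swapped-in technique, not something the post has settled on, and it trades away some detail. A minimal sketch; `collapse_rare` and the threshold of 20 are hypothetical choices, and the counts are invented to loosely mirror the numbers above.

```python
from collections import Counter


def collapse_rare(values, min_count=20, other="Other"):
    """Replace levels seen fewer than min_count times with a single 'Other' level.

    Hypothetical helper; min_count is an arbitrary illustrative threshold.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Invented counts: one dominant level (~700) and several levels with ~10 entries.
rear = (["multi-link"] * 700 + ["torsion beam"] * 10
        + ["leaf spring"] * 12 + ["trailing arm"] * 9)
collapsed = collapse_rare(rear)
print(Counter(collapsed))
```

After collapsing, this column would need only one dummy (multi-link vs. Other) instead of three.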