I want to classify U.S. states into two groups, rich and poor, and then compare their socioeconomic factors using hypothesis testing. In addition to median income, I'm hoping to use other factors, such as the unemployment rate, for the initial classification.
For this initial classification step, before conducting any hypothesis testing, which of the following would you recommend?
- Manually classify them (from manual inspection, some states fall in between the two groups and are quite difficult to assign)
- Use an unsupervised machine learning technique to create two clusters
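To make the second option concrete, here is a minimal sketch of what I have in mind, assuming k-means with two clusters on standardized features. The state names and numbers are made-up placeholders, not real data, and the helper names (`standardize`, `two_means`) are my own for illustration.

```python
import statistics

# Hypothetical placeholder rows: (median income in USD, unemployment rate in %).
# These are NOT real figures; substitute actual state-level data.
data = {
    "state_a": (48000, 6.5),
    "state_b": (50000, 6.0),
    "state_c": (52000, 5.8),
    "state_d": (75000, 3.5),
    "state_e": (78000, 3.2),
    "state_f": (80000, 3.0),
}

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(points, max_iter=100):
    """Plain k-means with k=2 and a deterministic start.

    The initial centers are the points with the lowest and highest value
    of the first feature, so cluster 0 starts at the lowest-income state.
    """
    centers = [min(points, key=lambda p: p[0]), max(points, key=lambda p: p[0])]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centers[k] = [statistics.mean(c) for c in zip(*members)]
    return labels

names = list(data)
points = standardize([data[n] for n in names])
labels = two_means(points)

for name, label in zip(names, labels):
    print(name, "cluster", label)
```

The features are standardized first because income and unemployment rate are on very different scales, and otherwise income would dominate the distance calculation.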
Thank you for your suggestions.