How to classify data into two groups

Asked Nov 02 '20 at 18:57

Active Nov 03 '20 at 07:04

Viewed 64 times

I want to classify U.S. states into two groups, rich and poor, and then compare their socioeconomic factors using hypothesis tests. In addition to median income, I hope to use other factors, such as the unemployment rate, for the initial classification.

For this initial classification, done before any hypothesis testing, which of the following would you recommend?

1. Classify them manually (by manual inspection, it is quite difficult to classify some states that fall between the two groups).
2. Use an unsupervised machine learning technique to create two clusters.
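
To make the second option concrete, here is a minimal sketch of what it might look like: standardize each factor, then run plain k-means with two clusters. The numbers are made up for illustration (they are not real state data), and the helper names `standardize` and `two_means` are hypothetical, written out in pure Python so nothing beyond the standard library is assumed.

```python
import random
import statistics

def standardize(column):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    mu = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(x - mu) / sd for x in column]

def two_means(points, n_iter=100, seed=0):
    """Plain k-means with k=2; returns a 0/1 label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)  # two distinct points as starting centers
    labels = None
    for _ in range(n_iter):
        # Assign each point to the nearest center (squared Euclidean distance).
        new_labels = [
            min((0, 1), key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each center to the mean of its current members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = tuple(statistics.fmean(dim) for dim in zip(*members))
    return labels

# Made-up illustrative values, NOT real state data:
# median income in $1000s and unemployment rate in %.
income = [45, 48, 50, 52, 70, 74, 78, 80]
unemployment = [7.5, 7.0, 6.8, 6.5, 4.0, 3.8, 3.5, 3.2]

points = list(zip(standardize(income), standardize(unemployment)))
labels = two_means(points)
print(labels)
```

Note that the cluster numbers 0 and 1 are arbitrary: the algorithm does not know which cluster is "rich", so that interpretation still has to come from looking at the cluster means.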

Thank you for your suggestions.

edited Nov 03 '20 at 07:04

Adrian Keister

asked Nov 02 '20 at 18:57

golden

How would you apply an unsupervised technique without first specifying, quantitatively, what "rich" and "poor" mean? This sounds circular. – whuber Nov 02 '20 at 19:06
[Don't bin your continuous data](https://stats.stackexchange.com/q/68834/1352). Feed them into your algorithm as-is; potentially transform them using (e.g.) restricted cubic splines (see, e.g., Frank Harrell's *Regression Modeling Strategies*) to capture any nonlinearity. – Stephan Kolassa Nov 02 '20 at 19:28
Let's say you use an algorithm to classify the states into poor/rich. OK, but then similar techniques would test hypotheses based on the same data? The labels should be given from outside the algorithm... – quester Nov 02 '20 at 19:53
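
The concern about deriving labels and testing hypotheses on the same data can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "states" are 50 draws of pure noise, a median split stands in for the clustering step, and `welch_t` is a hypothetical helper that computes Welch's t statistic by hand.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

rng = random.Random(42)
# 50 simulated "states" with one structureless indicator (pure noise).
values = [rng.gauss(0, 1) for _ in range(50)]

# Data-driven labels: split at the median of the very variable tested later.
cut = statistics.median(values)
high = [v for v in values if v > cut]
low = [v for v in values if v <= cut]
t_data_driven = welch_t(high, low)

# Labels assigned independently of the values, for comparison.
shuffled = values[:]
rng.shuffle(shuffled)
t_external = welch_t(shuffled[:25], shuffled[25:])

print(f"t with labels derived from the data: {t_data_driven:.2f}")
print(f"t with labels assigned independently: {t_external:.2f}")
```

Even though the data contain no real groups, the data-driven split produces a very large t statistic for the variable used to form the groups, because the groups were constructed to differ on it. A test on that variable therefore says nothing about whether "rich" and "poor" states genuinely differ.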

0 Answers