I have a large dataset with mixed-type data, for example:
Age | Price | Town Size: Small | Town Size: Middle | Town Size: Big | Interests: Traveling | Interests: Cooking | Interests: TV
---|---|---|---|---|---|---|---
21 | 0 | 1 | 0 | 0 | 1 | 1 | 1
34 | 100 | 0 | 1 | 0 | 0 | 1 | 0
81 | 200 | 0 | 0 | 1 | 1 | 1 | 0
54 | 0 | 0 | 0 | 1 | 1 | 0 | 1
I want to perform a hierarchical cluster analysis on it, but I am not sure which metric to use. From what I have found, the Gower metric may be an option (Hierarchical clustering with mixed type data - what distance/similarity to use?). However, I want to weight the variables so that age, price, town size, and interests each contribute equally to the final result, rather than having age, small town, and big town all treated as variables on the same level. Is the Gower distance the right metric for this? Is there a function in Python that computes the Gower distance with adjustable variable weights? Is there anything else I can do, such as modifying the dataset?
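To make the goal concrete, here is a rough sketch of the kind of weighting I have in mind, written with plain NumPy and SciPy. The function name `weighted_gower` and the weight vector are my own assumptions for illustration, not an existing library API. It treats the dummy columns as symmetric binary variables (simple matching) and gives each of the four groups a total weight of 1.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Example rows from the table above.
# Columns: Age, Price, Small, Middle, Big, Traveling, Cooking, TV
X = np.array([
    [21,   0, 1, 0, 0, 1, 1, 1],
    [34, 100, 0, 1, 0, 0, 1, 0],
    [81, 200, 0, 0, 1, 1, 1, 0],
    [54,   0, 0, 0, 1, 1, 0, 1],
], dtype=float)

# Assumed weights: each group (age, price, town size, interests) sums to 1,
# so the three dummy columns of a group share that group's weight.
weights = np.array([1, 1, 1/3, 1/3, 1/3, 1/3, 1/3, 1/3])

def weighted_gower(X, weights):
    """Weighted Gower-style distance (hypothetical helper).

    Each column contributes |x_i - x_j| / range, which for 0/1 columns
    is a simple mismatch indicator. Builds an (n, n, p) array, so it is
    only suitable for small n as written.
    """
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns contribute 0, avoid 0/0
    diffs = np.abs(X[:, None, :] - X[None, :, :]) / ranges
    return (diffs * weights).sum(axis=2) / weights.sum()

D = weighted_gower(X, weights)

# Hierarchical clustering on the precomputed distances (average linkage
# is an arbitrary choice here).
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
```

One caveat with this sketch: two rows with different town sizes differ in two of the three dummy columns, so the town-size group contributes 2/3 rather than 1 to their distance. I am not sure whether that is acceptable or whether the town size should be recoded as a single categorical variable instead.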