Updated answer to "Data Preparation for Cluster Analysis":
Based on the discussions, normalizing the data and removing correlation among variables are often recommended. Reference posts:
1) Are mean normalization and feature scaling needed for k-means clustering?
2) Why vector normalization can improve the accuracy of clustering and classification?
3) Is it OK to use correlated variables for cluster analysis?
4) Correlated variables in kmeans clustering
----------------------------------------------------------------------------
Original question: I see many threads discussing data standardization as preparation for PCA. I guess PCA and cluster analysis are interconnected in nature (correct me if I am wrong), which would explain why standardizing the data is often the first step for both (see Quick-R: Clustering analysis). I could probably follow the data preparation steps used for PCA, but it would still be helpful to clarify these questions:
1) What are the recommended data preparation steps for Cluster Analysis?
2) What are the characteristics of the data sets that are likely to have good clustering results?
Example datasets:
Suppose I want to run a cluster analysis on a variety of socio-economic factors, including continuous and discrete variables (e.g., housing unit density, population density, green space area, counts of schools and health centers, etc.).
My understanding of the questions:
Question 1): Removing missing data and rescaling variables are often necessary, so I used the scale() function to standardize the data. Does scale() work for both continuous and discrete variables?
Question 2): PCA indicated that 9 principal components would explain 90% of the variation. I feel that this does not point to a successful clustering result. Any suggestions on how to reformat or reorganize the data to better reveal meaningful clusters? And what kind of data is actually likely to give successful clustering results?
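To make the two computations mentioned above concrete, here is a minimal sketch in plain Python (standard library only). It assumes that standardizing means z-scoring each column with the sample standard deviation (n - 1 denominator), which is what R's scale() does with its default arguments. It also assumes that "explaining 90%" refers to the cumulative share of the PCA eigenvalues. The function names, sample values, and eigenvalues are hypothetical and are not taken from my data. Arithmetically, the same formula applies to a count variable as to a continuous one; whether that is statistically appropriate is a separate matter.

```python
from statistics import mean, stdev


def standardize(column):
    """Z-score a numeric column: subtract the mean, divide by the sample SD."""
    m = mean(column)
    s = stdev(column)  # sample SD with n - 1 in the denominator
    if s == 0:
        raise ValueError("a constant column cannot be standardized")
    return [(x - m) / s for x in column]


def components_needed(eigenvalues, threshold=0.90):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches the threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical values, for illustration only.
school_counts = [0, 2, 3, 5, 10]                       # a discrete count variable
housing_density = [120.5, 340.0, 95.2, 410.8, 220.1]   # a continuous variable

z_counts = standardize(school_counts)
z_density = standardize(housing_density)

# Hypothetical eigenvalues from a PCA on standardized variables.
k = components_needed([5, 3, 1, 1], threshold=0.85)
```

After standardization, each column has mean 0 and sample standard deviation 1, so no variable dominates a distance calculation merely because of its units. If `components_needed` returns a number close to the number of variables, the variance is spread fairly evenly across components.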