I'm trying to run a cluster analysis on a large dataset (70k+ observations) with mixed variable types (numeric, ordinal, binary, and nominal). I don't think SAS can build the pairwise distance matrix for the entire dataset, so I have instead run hierarchical clustering with Gower's distance on a subsample of the data. I have some questions.
If the above method (hierarchical clustering of a subsample) is appropriate, how can I then score the rest of the observations and assign (classify) them to the clusters obtained?
If the above method isn't appropriate, what other methods are recommended for clustering a large dataset with mixed variables? (Preferably available in SAS.)
How can I check for correlations/multicollinearity among mixed variables? I don't know whether running something like PCA or factor analysis makes sense with categorical data.
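For context, here is a minimal Python sketch of the Gower's distance computation referred to above. It is not my SAS code. The function name `gower_distance_matrix` and the toy data are hypothetical, and it assumes ordinal variables have already been rank-coded and are handled like numeric ones, which is only one of several possible conventions.

```python
import numpy as np

def gower_distance_matrix(numeric, categorical):
    """Pairwise Gower distances between the n rows of a small (sub)sample.

    numeric:     (n, p) array of numeric or rank-coded ordinal columns (assumption)
    categorical: (n, q) array of binary or nominal category codes
    """
    numeric = np.asarray(numeric, dtype=float)
    categorical = np.asarray(categorical)

    # Range-normalise each numeric column; a constant column contributes 0.
    ranges = numeric.max(axis=0) - numeric.min(axis=0)
    ranges[ranges == 0] = 1.0

    # |x_i - x_j| / range, summed over the numeric columns -> shape (n, n)
    num_part = (np.abs(numeric[:, None, :] - numeric[None, :, :]) / ranges).sum(axis=2)

    # Simple mismatch (0 if equal, 1 otherwise), summed over categorical columns
    cat_part = (categorical[:, None, :] != categorical[None, :, :]).sum(axis=2)

    # Unweighted average over all variables
    n_vars = numeric.shape[1] + categorical.shape[1]
    return (num_part + cat_part) / n_vars

# Toy data: one numeric column and one nominal column (made up for illustration)
numeric = [[0.0], [5.0], [10.0]]
categorical = [["a"], ["a"], ["b"]]
D = gower_distance_matrix(numeric, categorical)
```

The result is a dense n × n matrix, which is why this only seems practical on a subsample: for the full dataset it would need more than 70,000 × 70,000 entries.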