Can I use log-likelihood distance on data of only continuous variables?

Question

I have to run an SPSS TwoStep cluster analysis. All 4 of my variables are continuous (scale), standardized, and normally distributed. The dataset includes 10,000 cases.

SPSS suggests using Euclidean distance with such a dataset, but the results are not meaningful (2 clusters: 99% and 1%), whereas with the log-likelihood distance option the clusters seem much more meaningful (both when I specify a fixed number of clusters and when I do not).

Question:

What might be the reason for such meaningless results with Euclidean distance? Maybe noise handling? And is it incorrect to use the log-likelihood distance even though my variables are all continuous?

score 1 · Accepted Answer · edited Apr 13 '17 at 12:44

You can use the log-likelihood distance when all of your variables are continuous; in fact, it is the default.
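
To see why nothing stops the log-likelihood distance from working on purely continuous data, here is a minimal Python sketch of the distance as I understand it from IBM's published TwoStep algorithm notes (treat the exact formula as an assumption to check against that documentation). For clusters j and s, d(j, s) = ξ_j + ξ_s − ξ_<j,s>, where <j,s> is the merged cluster and, for continuous variables only, ξ_v = −N_v · Σ_k ½·log(σ²_k + σ²_vk), with σ²_k the variance of variable k over the whole dataset and σ²_vk its variance within cluster v. The function names `xi` and `loglik_distance` are hypothetical.

```python
import math


def xi(cluster, overall_var):
    """Log-likelihood term xi_v for one cluster of continuous records.

    cluster: list of records, each a list of K floats
    overall_var: list of K variances of each variable over the whole dataset
    """
    n = len(cluster)
    total = 0.0
    for k, var_k in enumerate(overall_var):
        values = [row[k] for row in cluster]
        mean = sum(values) / n
        # Assumption: within-cluster variance uses the divide-by-n (ML) estimate.
        within_var = sum((x - mean) ** 2 for x in values) / n
        # Adding the overall variance keeps log() finite for one-record clusters.
        total += 0.5 * math.log(var_k + within_var)
    return -n * total


def loglik_distance(a, b, overall_var):
    """Hypothetical helper: d(a, b) = xi_a + xi_b - xi_(a merged with b)."""
    return xi(a, overall_var) + xi(b, overall_var) - xi(a + b, overall_var)


if __name__ == "__main__":
    overall_var = [1.0, 1.0]  # standardized variables
    a = [[0.0, 0.0], [0.1, 0.1]]
    near = [[0.2, 0.2], [0.3, 0.3]]
    far = [[5.0, 5.0], [5.1, 5.1]]
    print(loglik_distance(a, near, overall_var))  # small: merging costs little
    print(loglik_distance(a, far, overall_var))   # large: merging costs a lot
```

The distance is simply the drop in log-likelihood caused by merging two clusters, so it needs only means and variances, which every continuous variable has. The real procedure computes these from summary statistics stored in its cluster-feature tree rather than from raw records.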

It is difficult to say why your Euclidean results seem poor without seeing the data. Automatic detection of the number of clusters with the BIC or AIC criteria is probably somewhat better suited to the log-likelihood distance, because those criteria rest on the same likelihood-based paradigm. With Euclidean distance, I recommend specifying several fixed numbers of clusters and checking whether the clusters are meaningful to you. Also, check whether your 4 variables are highly correlated (the TwoStep method assumes no or only weak correlation).
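
A quick way to run the correlation check outside SPSS is to compute the pairwise Pearson correlations and flag strong ones. This is a minimal Python sketch; the function names and the 0.7 cut-off are hypothetical choices, not a rule from SPSS.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strongly_correlated_pairs(columns, threshold=0.7):
    """Return (i, j, r) for every pair of variables with |r| >= threshold.

    columns: one list of values per variable
    threshold: hypothetical cut-off for "highly correlated"
    """
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            r = pearson(columns[i], columns[j])
            if abs(r) >= threshold:
                pairs.append((i, j, r))
    return pairs


if __name__ == "__main__":
    columns = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
    print(strongly_correlated_pairs(columns))
```

An empty result suggests the weak-correlation assumption is reasonable; any flagged pair is a candidate for dropping or combining before clustering.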

answered Nov 13 '11 at 04:21

ttnphns

The variables are not correlated. Thank you for the suggestion: I will compare several clusterings. – en. Nov 13 '11 at 12:59
You might be interested in various clustering criteria that suggest an optimal number of clusters. You can find them on my web page. – ttnphns Nov 13 '11 at 13:43

score 1 · Answer 2 · answered Nov 13 '11 at 05:51

You may want to use silhouette plots (available with the STATS CLUS SIL extension command) to get some graphical insight into the quality of your clusters. You can get this command and the prerequisite Python Essentials from the SPSS Community website at www.ibm.com/developerworks/spssdevcentral.
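
The STATS CLUS SIL command itself is not reproduced here. As a rough illustration of what a silhouette plot summarizes, this Python sketch computes the standard silhouette value s(i) = (b − a) / max(a, b) for each case, where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster. It uses Euclidean distance, requires at least two clusters, and `silhouette_values` is a hypothetical name.

```python
import math


def silhouette_values(points, labels):
    """Silhouette value for every point (Euclidean distance, >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # usual convention for a singleton cluster
            continue
        # Summing over all of `own` includes p itself at distance 0,
        # so divide by the number of *other* members.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append(0.0 if denom == 0 else (b - a) / denom)
    return values


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (10, 0), (10, 1)]
    good = silhouette_values(points, [0, 0, 1, 1])
    bad = silhouette_values(points, [0, 1, 0, 1])
    print(sum(good) / len(good), sum(bad) / len(bad))
```

Values near 1 mean a case sits well inside its cluster; values near 0 or below suggest it is borderline or misassigned. This pure-Python version is O(n²), so with 10,000 cases it is more practical to run it on a random sample.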

JKP

Yes, I am also checking the silhouette plots, and the best results are still with the log-likelihood distance. (The IBM link you provided seems not to work; is there a misspelling, maybe?) – en. Nov 13 '11 at 12:57