My data set has 821,000 rows and 18 columns. It describes online clickstream behavior. My variables include the number of shopping baskets, the number of items in the shopping basket, the number of product pages viewed, the number of category pages viewed, existing customer, new customer, buy/cancel shopping basket...
The descriptive statistics show that many of the variables are right-skewed and have different variances, so I applied z-standardization. Since the ranges of the variables differ a lot, I wonder whether this is a problem when K-Means calculates distances. Should the variables additionally be normalized (min-max normalization) after z-standardization?
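Whether min-max normalization after z-standardization changes anything can be checked directly. Both transforms are positive affine maps applied column by column, and min-max normalization cancels any such map, so min-max applied after `scale()` should give exactly the same values as min-max applied to the raw data. A minimal sketch, assuming simulated right-skewed data; the data frame `raw`, its columns `baskets` and `quantity`, and the helper `min_max` are hypothetical and are not the variables from my data set:

```r
# Hypothetical right-skewed example data (NOT the real clickstream data)
set.seed(1)
raw <- data.frame(
  baskets  = rpois(1000, lambda = 2),   # count-like variable
  quantity = rexp(1000, rate = 0.2)     # long right tail
)

# z-standardization: each column gets mean 0 and standard deviation 1
z <- scale(raw)

# Hypothetical helper: min-max normalization of one column to [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

mm_raw <- apply(raw, 2, min_max)  # min-max on the raw columns
mm_z   <- apply(z, 2, min_max)    # min-max after z-standardization

# TRUE (up to floating-point tolerance): the z-step is cancelled by min-max
all.equal(unname(mm_raw), unname(mm_z))
```

The check only shows that the two steps do not compound; it says nothing about which of the two scalings suits K-Means better for skewed variables.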
```
summary(Baur_WKA_scale)

   BASKETS_NZ            PIS              PIS_AP             PIS_DV            PIS_PL            PIS_SDV
 Min.   :-8.7663   Min.   :-0.7741   Min.   :-0.48168   Min.   :-0.45676   Min.   :-0.3508   Min.   :-0.3565
 1st Qu.: 0.1139   1st Qu.:-0.5921   1st Qu.:-0.48168   1st Qu.:-0.45676   1st Qu.:-0.3508   1st Qu.:-0.3565
 Median : 0.1139   Median :-0.3736   Median :-0.48168   Median :-0.45676   Median :-0.3508   Median :-0.3565
 Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.0000
 3rd Qu.: 0.1139   3rd Qu.: 0.2089   3rd Qu.:-0.02117   3rd Qu.: 0.07832   3rd Qu.:-0.1749   3rd Qu.:-0.1012
 Max.   : 8.9942   Max.   :17.8668   Max.   :32.21453   Max.   :26.29717   Max.   :24.1894   Max.   :35.9036
    PIS_SHOPS           PIS_SR            QUANTITY
 Min.   :-0.43738   Min.   :-0.3764   Min.   :-0.54754
 1st Qu.:-0.43738   1st Qu.:-0.3764   1st Qu.:-0.54754
 Median :-0.38040   Median :-0.3764   Median :-0.26601
 Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.00000
 3rd Qu.:-0.03852   3rd Qu.:-0.1092   3rd Qu.: 0.01552
 Max.   :22.63957   Max.   :29.2868   Max.   :39.42954
```
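The spread visible in this summary can be put into numbers per column: after `scale()` every column has standard deviation 1, yet the range (max - min) still differs, here from about 17.8 for BASKETS_NZ to about 40.0 for QUANTITY. A minimal sketch on a hypothetical stand-in matrix `demo` (one simulated log-normal column and one normal column, not the real data) shows that pattern:

```r
# Hypothetical stand-in for the scaled data (NOT the real clickstream data):
# one right-skewed column and one symmetric column, z-standardized with scale()
set.seed(42)
demo <- scale(cbind(
  skewed    = rlnorm(5000),
  symmetric = rnorm(5000)
))

# Per-column spread after scaling: SD is 1 for every column by construction,
# while the range (max - min) grows with the length of the right tail
spread <- rbind(
  sd    = apply(demo, 2, sd),
  range = apply(demo, 2, function(col) diff(range(col)))
)
round(spread, 2)
```

On the real data the corresponding call would presumably be `apply(Baur_WKA_scale, 2, function(col) diff(range(col)))`, assuming all summarized columns are numeric.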