Imagine that I have 100k users and 1k categories. For each user, I know how much money they have spent in at most 5 of those categories, so my data is very sparse.
Now I want to group users by the money they spend on different categories. This way, I could group together users who are 'cheap' in certain categories and 'snobby' in others.
I standardized the values by expressing each amount as the number of standard deviations it lies from its category mean, and then tried k-means clustering. However, as the number of k-means iterations increases, one cluster keeps growing while the others shrink until they contain only a few users each.
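To make the standardization step concrete, here is a minimal sketch. It assumes the spend data is held as a mapping from user to {category: amount}, and that each category's mean and standard deviation are computed over observed entries only. Both are assumptions made for illustration. The function name `standardize_observed` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_observed(spend):
    """Z-score every observed (user, category) amount against that
    category's mean and population standard deviation.

    Assumption: statistics are computed over observed entries only;
    missing entries are NOT treated as zero spend.
    """
    # Collect all observed amounts per category.
    by_category = defaultdict(list)
    for categories in spend.values():
        for category, amount in categories.items():
            by_category[category].append(amount)

    # Per-category (mean, standard deviation).
    stats = {c: (mean(v), pstdev(v)) for c, v in by_category.items()}

    z = {}
    for user, categories in spend.items():
        z[user] = {}
        for category, amount in categories.items():
            mu, sigma = stats[category]
            # A category with zero spread cannot be scaled; fall back to 0.0.
            z[user][category] = (amount - mu) / sigma if sigma > 0 else 0.0
    return z


# Hypothetical toy data: 3 users, 2 categories, not every pair observed.
spend = {
    "u1": {"books": 10.0, "coffee": 4.0},
    "u2": {"books": 30.0},
    "u3": {"books": 20.0, "coffee": 8.0},
}
z_scores = standardize_observed(spend)
```

Unobserved categories stay absent in the result. How they are filled in before running k-means (for example with 0, which after standardization means "exactly at the category mean") is a separate choice that this sketch does not make.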
How can I tackle the problem of clustering sparse data? Any pointers, suggestions, or ideas are appreciated.