This is more of a conceptual question than a methodological one, I guess.
Let's assume we have a dataset coming from a questionnaire, and after some feature scaling we run a PCA to reduce the dimensionality. The results indicate that we need "a lot of" principal components (say, close to the original number of questions) to capture about 80% of the variability (see the sketch after the list below). What would that indicate? What are the most probable scenarios?
- The questionnaire needs smarter questions so that more info is captured by fewer questions?
- The population consists of very complicated individuals?
- Some questions (let's say the ordinal ones) require more levels?
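For concreteness, the kind of check I'm describing looks roughly like this (a minimal sketch in Python/scikit-learn with random placeholder data standing in for the scaled questionnaire answers):

```python
# Minimal sketch: scale the (numerically coded) answers, fit a full PCA,
# and count how many components are needed to reach 80% of the variance.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.rand(4200, 80)  # placeholder for the coded questionnaire answers

X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)

cumulative = np.cumsum(pca.explained_variance_ratio_)
n_for_80 = int(np.argmax(cumulative >= 0.80)) + 1
print(f"Components needed for 80% of the variance: {n_for_80}")
```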
Edit: Some additional details about the project
The population we are investigating is all the existing customers of a specific platform. Our random sample consists of approximately 4.2K individuals from that population who answered a questionnaire of ~80 questions (behavioural, personality, preferences, etc.). The objectives are to (i) understand the persona/behavioural groups that exist in our customer database, and (ii) identify a few "golden questions" so that we can classify future users without having to ask them all 80 questions again. Most of the questions are ordinal and some are categorical.
I've already done an initial clustering using PAM on Gower's distance, but I wanted to dig deeper and try a few more approaches. My plan is to run a hierarchical k-means clustering after a PCA and then try some SOMs as well (roughly as sketched below), and afterwards to train a classification model so I can classify future users.
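In case it helps to see the intended pipeline, here is a rough sketch (Python; the `gower` and `scikit-learn-extra` packages, k = 5 clusters and 10 retained PCs are just assumptions for illustration, not final choices):

```python
# Rough sketch of the pipeline: PAM on a Gower distance matrix (already done),
# then hierarchical k-means on PCA scores (planned). All parameters are placeholders.
import numpy as np
import pandas as pd
import gower
from sklearn_extra.cluster import KMedoids
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering, KMeans

# Placeholder answers (smaller than the real 4.2K sample, just to keep the sketch fast)
df = pd.DataFrame(np.random.randint(1, 6, size=(500, 80)))
k = 5

# Already done: PAM on Gower distances
dist = gower.gower_matrix(df)
pam = KMedoids(n_clusters=k, metric="precomputed", method="pam", random_state=0).fit(dist)

# Planned: hierarchical k-means -- cut a dendrogram into k groups,
# then use the group means as initial centres for k-means on the PCA scores.
scores = PCA(n_components=10).fit_transform(StandardScaler().fit_transform(df))
tree_labels = AgglomerativeClustering(n_clusters=k).fit_predict(scores)
centres = np.vstack([scores[tree_labels == c].mean(axis=0) for c in range(k)])
hkmeans = KMeans(n_clusters=k, init=centres, n_init=1, random_state=0).fit(scores)
```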
When I ran the PCA within each category of questions (8 in total, I think), I saw that in most cases the first PC explained only about 12% of the variability, which struck me as low and made me curious. Hence the question.
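The per-category check I mean is roughly this (another small sketch; the even split of the 80 questions into 8 groups is hypothetical, the real categories are thematic):

```python
# Sketch of the per-category check: run a PCA within each question category
# and look at how much variance the first PC explains.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.rand(4200, 80)                   # placeholder answers
categories = np.array_split(np.arange(80), 8)  # stand-in for the real thematic groupings

for i, cols in enumerate(categories, start=1):
    scaled = StandardScaler().fit_transform(X[:, cols])
    first_pc = PCA().fit(scaled).explained_variance_ratio_[0]
    print(f"Category {i}: first PC explains {first_pc:.1%} of the variance")
```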