I have household survey data with 32 questions about assets the household does or does not own. I assume that, taken together, the answers to these asset questions (e.g. how many televisions the household owns) indicate wealth and could be combined into a good wealth index, for example by using the first component from a principal components analysis (PCA).
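To make the "first component" idea concrete, here is a minimal sketch in Python/NumPy rather than Stata. It assumes all 32 items are numeric (counts or 0/1 indicators) with no missing values and no constant columns, and it standardizes them so that PCA runs on the correlation matrix. The data are simulated and the function name `pca_wealth_index` is hypothetical.

```python
import numpy as np

def pca_wealth_index(X):
    """Hypothetical helper: first-principal-component scores of standardized asset items."""
    X = np.asarray(X, dtype=float)
    # Standardize each item (mean 0, sd 1) so counts and binary items are on a common scale
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Eigen-decomposition of the correlation matrix; eigh returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    w = eigvecs[:, -1]  # weights of the first (largest-variance) component
    # An eigenvector's sign is arbitrary; flip it so that owning more assets raises the index
    if w.sum() < 0:
        w = -w
    share = eigvals[-1] / eigvals.sum()  # proportion of total variance explained by PC1
    return Z @ w, w, share

# Simulated (hypothetical) data: 500 households, 32 items driven by one latent wealth factor
rng = np.random.default_rng(0)
wealth = rng.normal(size=500)
loadings = rng.uniform(0.3, 1.0, size=32)
X = wealth[:, None] * loadings + rng.normal(size=(500, 32))

index, weights, share = pca_wealth_index(X)
print(index.shape, round(share, 3))
```

Here `share` is the proportion of total item variance captured by the first component. It is a rough check on how well a single index summarizes the 32 items.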
What I want to do, however, is to choose 10 of these variables that jointly explain the largest possible proportion of the variation in wealth and use those as the questions in a shorter questionnaire that I am developing. What is the best way of doing this?
One possibility that has occurred to me is to calculate the wealth index using PCA, then regress it on every possible combination of 10 variables from the 32 (C(32,10) = 64,512,240, so roughly 64.5 million regressions) and see which combination gives the highest R-squared. I'm hoping there's an easier way.
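The brute-force search described above can be sketched as follows, again in Python/NumPy rather than Stata. It runs one OLS regression (with an intercept) per subset and keeps the subset with the highest R-squared. In the setting above, `y` would be the PCA wealth index and `X` the 32 asset items. The sketch only runs a toy version (4 of 12 simulated items, 495 regressions), because the loop grows combinatorially. The helper names `r_squared` and `best_subset` are hypothetical.

```python
import math
from itertools import combinations

import numpy as np

def r_squared(y, X):
    """R-squared from an OLS regression of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def best_subset(y, X, k):
    """Exhaustively try every k-column subset of X; return (best R-squared, column indices)."""
    best_r2, best_cols = -np.inf, None
    for cols in combinations(range(X.shape[1]), k):
        r2 = r_squared(y, X[:, list(cols)])
        if r2 > best_r2:
            best_r2, best_cols = r2, cols
    return best_r2, best_cols

# Size of the full search: 10 items chosen from 32
n_subsets = math.comb(32, 10)
print(n_subsets)  # 64512240

# Toy version with simulated (hypothetical) data: choose 4 of 12 items
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 12))
y = X[:, [0, 3, 5, 7]].sum(axis=1) + 0.5 * rng.normal(size=300)
r2, cols = best_subset(y, X, 4)
print(cols, round(r2, 3))
```

The toy run should recover columns (0, 3, 5, 7). At full scale the same loop would run about 64.5 million regressions, which is why this exhaustive approach becomes expensive.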
Ideally, I'd like to implement this in Stata.