To ask my question, consider the following example. I have data on the prices of 100 stocks over time. I want good indicators of the prices of my stocks without having to look at the data for each stock every day.
The first thing I can do is average the prices of all the stocks at each time, but with this data set the average of all the prices is not well correlated with any of the individual stocks.
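To make the averaging step concrete, here is a minimal sketch of how one might check it. It assumes the prices sit in a NumPy array with one row per time and one column per stock; the function name `correlation_with_average` and the two-stock toy data are made up for illustration and are not my real data.

```python
import numpy as np

def correlation_with_average(prices):
    """Pearson correlation of each stock with the equal-weighted average.

    prices: array of shape (n_times, n_stocks), assumed layout.
    """
    average = prices.mean(axis=1)  # one average price per time
    return np.array([
        np.corrcoef(prices[:, j], average)[0, 1]
        for j in range(prices.shape[1])
    ])

# Toy data (hypothetical): one rising and one falling stock.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 300)
rising = 100 + 20 * t + rng.normal(0, 0.5, t.size)
falling = 100 - 20 * t + rng.normal(0, 0.5, t.size)
toy_prices = np.column_stack([rising, falling])

avg_corr = correlation_with_average(toy_prices)
print(avg_corr)
```

In this toy case the two trends cancel in the average, so the average tracks neither stock, which is the kind of behaviour I see with the full data set.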
The next thing I could do is group the stocks. Let's say I have no a priori knowledge of a natural grouping for the stocks. My thought would be to build the correlation matrix. I would then look at stock A and select all the stocks that have a correlation >0.90 with stock A. I would then check that they all have a mutual correlation >0.85 and eliminate those that don't. Finally, I would average the prices of the remaining stocks at each time, and the resulting series would be reasonably well correlated with every stock in the group.
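The manual procedure above could be sketched roughly as follows. This is only one reading of it: the name `group_around`, the array layout (rows are times, columns are stocks), and the rule for which stock to eliminate (repeatedly drop the member with the most pairs at or below the mutual threshold) are my assumptions, and the toy data is synthetic.

```python
import numpy as np

def group_around(prices, anchor, seed_threshold=0.90, mutual_threshold=0.85):
    """Build one group around the stock in column `anchor`.

    prices: array of shape (n_times, n_stocks), assumed layout.
    Assumes no price series is constant (otherwise correlations are NaN).
    Returns (member column indices, average price series of the group).
    """
    corr = np.corrcoef(prices, rowvar=False)  # stock-by-stock correlation matrix
    # Step 1: every stock correlated > seed_threshold with the anchor
    # (the anchor itself qualifies because its self-correlation is 1).
    members = [j for j in range(corr.shape[0]) if corr[anchor, j] > seed_threshold]
    # Step 2: enforce the mutual threshold. Assumed rule: drop the member
    # with the most failing pairs and re-check until every pair passes.
    while True:
        sub = corr[np.ix_(members, members)]
        failures = (sub <= mutual_threshold).sum(axis=1)
        if failures.max() == 0:
            break
        members.pop(int(failures.argmax()))
    # Step 3: average the surviving stocks at each time.
    return members, prices[:, members].mean(axis=1)

# Toy data (hypothetical): three stocks on a trend, two on a cycle.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 300)
trend = 10 * t
cycle = 3 * np.sin(10 * np.pi * t)
toy_prices = np.column_stack(
    [50 + trend + rng.normal(0, 0.1, t.size) for _ in range(3)]
    + [50 + cycle + rng.normal(0, 0.1, t.size) for _ in range(2)]
)

members, group_series = group_around(toy_prices, anchor=0)
print(members)
```

Even automated like this, the result depends on which stock is picked as stock A and on the two thresholds, which is where the trial and error comes from.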
This would involve a lot of trial and error and manual work though. Is there a more procedural way to go about it?
Very broadly speaking, I am looking for a set of variables, each of which maximizes the number of stocks that are correlated >0.90 with it.
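To pin down what I mean by that objective, here is a rough sketch of one possible reading. It assumes the candidate variables are restricted to the stocks themselves and picks them greedily, which is my simplification and not necessarily the right procedure; `greedy_representatives` and the hand-written correlation matrix are hypothetical.

```python
import numpy as np

def greedy_representatives(corr, threshold=0.90):
    """Greedily pick stocks that each cover as many not-yet-covered
    stocks as possible, where "cover" means correlation > threshold.

    corr: square correlation matrix (assumed to have 1 on the diagonal).
    Returns a list of (representative index, covered stock indices).
    """
    covers = corr > threshold                 # covers[i, j]: j is covered by i
    uncovered = np.ones(corr.shape[0], dtype=bool)
    chosen = []
    while uncovered.any():
        gains = (covers & uncovered).sum(axis=1)  # new stocks each candidate adds
        best = int(gains.argmax())
        if gains[best] == 0:                  # nothing left that can be covered
            break
        chosen.append((best, np.flatnonzero(covers[best] & uncovered).tolist()))
        uncovered &= ~covers[best]
    return chosen

# Hand-written toy correlation matrix (hypothetical values).
toy_corr = np.array([
    [1.00, 0.95, 0.92, 0.00],
    [0.95, 1.00, 0.80, 0.00],
    [0.92, 0.80, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
])

picked = greedy_representatives(toy_corr)
print(picked)
```

Here stock 0 covers stocks 0, 1 and 2, and stock 3 is left to represent itself. A variable that is not one of the stocks (for example an average of a group) might do better, which is part of what I am asking.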