I would like to have a scholarly discussion, if that is allowed on this site. If it is not allowed, please direct me to the relevant site, or better, please migrate this question.
My question is embarrassingly fundamental. I would like to know, in general, does correlation help or hurt? In what way is a data analysis/statistical problem harder when you have correlation? What are some ways in which correlation can help?
Well, to start off, we have the central limit theorem (and the laws of large numbers) for independent random variables. Would you say that having correlated random variables makes a limit law harder to obtain? Yes, if we look at it from a mathematician's point of view: the analysis will be harder. But doesn't correlation actually help us, because the variables can no longer behave erratically relative to one another? Shouldn't it then be easier to expect a limit, and even to compute it, perhaps by looking at suitable blocks of observations?
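To make the limit-law point concrete, here is a small sketch of my own (not a general result): it computes the exact variance of the sample mean for unit-variance variables whose correlation decays as phi**|i-j|. The AR(1)-style covariance, the helper names `var_of_mean` and `ar1_cov`, and the values n = 200 and phi = ±0.5 are all assumptions chosen for illustration.

```python
import numpy as np

def var_of_mean(cov):
    """Exact variance of the sample mean: (1' cov 1) / n^2."""
    n = cov.shape[0]
    ones = np.ones(n)
    return ones @ cov @ ones / n**2

def ar1_cov(n, phi):
    """Assumed covariance with unit variances and entries phi**|i-j|."""
    idx = np.arange(n)
    return phi ** np.abs(idx[:, None] - idx[None, :])

n = 200
v_indep = var_of_mean(np.eye(n))        # independent case: exactly 1/n
v_pos = var_of_mean(ar1_cov(n, 0.5))    # positive correlation
v_neg = var_of_mean(ar1_cov(n, -0.5))   # negative correlation

print(v_indep, v_pos, v_neg)
```

Under this assumed structure the mean still concentrates, but its variance is roughly (1 + phi)/(1 - phi) times the independent value for large n. So positive correlation inflates it and negative correlation shrinks it, which suggests the answer to "help or hurt" depends on the sign and structure of the dependence.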
Suitable correlation helps separate signal from noise. If the eigenvalues of the population dispersion matrix are all very close to each other, then the dispersion matrix is close to a multiple of the identity, which means the coordinates are nearly uncorrelated (and, under normality, nearly independent). In such a case, no direction is preferred: every coordinate carries about the same amount of variance. But with correlation among standardized variables, the eigenvalues must differ in magnitude, and if the largest eigenvalue is large enough, the leading principal axis explains a sizeable part of the variation in the data. This is the well-known basis of PCA.
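As a rough illustration of the eigenvalue argument, the sketch below compares the share of total variance carried by the leading eigenvalue for an identity correlation matrix and for an equicorrelated one. The equicorrelation structure, the function name `explained_by_first_pc`, and the values p = 10 and rho = 0.8 are assumptions picked only to keep the example small.

```python
import numpy as np

def explained_by_first_pc(rho, p=10):
    """Share of total variance on the leading eigenvalue of a
    p x p correlation matrix with all off-diagonal entries equal to rho."""
    corr = np.full((p, p), float(rho))
    np.fill_diagonal(corr, 1.0)
    eigvals = np.linalg.eigvalsh(corr)  # returned in ascending order
    return eigvals[-1] / eigvals.sum()

share_uncorrelated = explained_by_first_pc(0.0)  # all eigenvalues equal
share_correlated = explained_by_first_pc(0.8)    # one dominant eigenvalue

print(share_uncorrelated, share_correlated)
```

For this particular matrix the leading eigenvalue is 1 + (p - 1) * rho and the remaining ones are 1 - rho, so with no correlation every direction gets the same share 1/p, while strong correlation concentrates most of the variance on a single axis.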
Practitioners complain that a lot of theoretical results deal only with independent observations. But it seems to me that this theory tackles the harder case, one about which nothing can really be predicted until someone comes along with a result. Correlation should make our lives easier. What are your thoughts on this?