Pearson correlation of variables pooled across all samples vs mean correlation coefficients of samples in factor levels

Question

I have a multivariate dataset with 12 variables. Each sample is a Site (a factor with 10 levels) in a given Year (a factor with 13 levels), so pooled together I have 130 samples.

I want to test for correlation between variables, but I am not sure whether I should correlate the variables with all sites pooled together (n = 130), or correlate the variables within each Site (n = 13) and then calculate the mean of the correlation coefficients across all Sites. I find the latter hard to interpret, but I am not sure that it is the wrong way to go.

In general, I want to be able to say something about the relationships between variables, but these vary a lot between sites.

In which cases should I use each method?
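
To make the two options concrete, here is a minimal sketch in Python using pandas. It is not my actual analysis: the data are synthetic, and the column names `Site`, `Year`, `x` and `y`, as well as the shared site-level offset, are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (assumption): 10 sites x 13 years,
# two hypothetical variables "x" and "y" that share a site-level offset.
rng = np.random.default_rng(0)
n_sites, n_years = 10, 13

rows = []
for site in range(n_sites):
    site_offset = rng.normal(scale=3.0)  # level difference between sites
    x = rng.normal(size=n_years)
    y = 0.5 * x + rng.normal(scale=0.5, size=n_years)
    for year in range(n_years):
        rows.append({
            "Site": f"S{site}",
            "Year": 2000 + year,
            "x": x[year] + site_offset,
            "y": y[year] + site_offset,
        })
df = pd.DataFrame(rows)

# Option 1: Pearson correlation with all samples pooled (n = 130).
pooled_r = df["x"].corr(df["y"])

# Option 2: Pearson correlation within each Site (n = 13 each),
# followed by the plain mean of the 10 coefficients.
per_site_r = pd.Series(
    {site: g["x"].corr(g["y"]) for site, g in df.groupby("Site")}
)
mean_r = per_site_r.mean()

print(f"pooled r:           {pooled_r:.3f}")
print(f"mean of per-site r: {mean_r:.3f}")
print(per_site_r.round(3))
```

In a setup like this one, the pooled coefficient mixes differences in level between sites with co-variation across years, whereas the per-site coefficients only reflect co-variation across years within a site, so the two summaries need not agree.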

As an environmental scientist (of sorts) I know what you mean when you say 130 samples. As a statistical person (of sorts) I advise that statistical people expect you to say that you have a sample of size 130. That's trivial, but the big deal here is graphics, graphics, graphics. Combining different sites and years is natural scientifically, but all sorts of situations can arise. The check on whether an overall correlation makes sense is whether sites behave in similar ways across years and (exaggerating slightly) only graphics that show trajectories for each site will tell you that. — Nick Cox, Apr 29 '20 at 10:39
https://stats.stackexchange.com/questions/190152/visualising-many-variables-in-one-plot contains some ideas that may not be obvious: separate scatter plots and time series plots for each site, but with backdrop data for the other sites. — Nick Cox, Apr 29 '20 at 11:26
@NickCox, your advice is absolutely right. I've already done that and plotted the different variables for the different sites, and I know that Sites can vary a lot in their temporal trends for some variables, while other variables are more similar across sites. However, if I want to say something about the relationships between variables in the study area, which approach is more correct: pooling all samples together, or averaging the correlation coefficients across sites? The results of these approaches differ slightly. — opel, Apr 30 '20 at 01:32
Neither is more correct. The science of the problem should tell you what makes sense. I've calculated thousands of correlations and never yet regarded an average correlation as an answer, but projects vary. — Nick Cox, Apr 30 '20 at 07:43
@NickCox, this makes a lot of sense. I guess it is case dependent. I will have to give this issue some more thought in the context of my study. Appreciate your help! Thanks! — opel, Apr 30 '20 at 10:24
