Correlation analysis on two different groups of continuous heterogeneous variables with different range/scales in R

Question

I would like to perform an initial, simple correlation analysis in R between a gene signature that I have identified and some continuous clinical parameters measured on the same patients, to identify any interesting correlation patterns. However, my main problem is the following:

My gene expression microarray data have already been preprocessed and normalized (RMA in R, log2, etc.), whereas my clinical variables are continuous but have not been normalized or transformed. Moreover, some of the 8 variables in total are highly skewed towards zero, and they also have different ranges and "outlier values". For example, one variable ranges from 0.002518 to 27.300000, whereas another ranges from 0.971517 to 1.432967.

Thus, in your opinion:

1) Would a Spearman correlation be a reasonable choice, since it is non-parametric and more robust to the possible outlier values of the clinical variables described above?

2) If so, should I still apply a separate transformation to the clinical variables only, before merging them with the gene expression data? Or could I, for example, just scale all the variables together and then compute the correlations?

For example, a log2 transformation might not be so useful, because it produces relatively large negative values for some of these variables.
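To make the transformation question concrete, here is a minimal sketch on simulated data. The matrices `genes` and `clin`, their column names, and the distributions are hypothetical stand-ins, not the actual data set. It illustrates that Spearman's rho depends only on ranks, so a strictly increasing transformation (log2, or centring and scaling with `scale()`) should leave it unchanged, while Pearson's r is unchanged by `scale()` but generally changes under log2.

```r
# Simulated stand-ins for the real data (hypothetical names and distributions).
set.seed(1)
n <- 50

# Three "genes" on a log2-like expression scale.
genes <- matrix(rnorm(n * 3, mean = 8), ncol = 3,
                dimnames = list(NULL, c("geneA", "geneB", "geneC")))

# Two strictly positive "clinical" variables: one right-skewed with values
# near zero, one with a narrow range just above 1.
clin <- cbind(
  clinX = exp(0.8 * genes[, "geneA"] + rnorm(n, sd = 1.5) - 8),
  clinY = 1 + abs(rnorm(n, sd = 0.1))
)

# cor(x, y) with two matrices returns the genes-by-clinical correlation
# matrix directly, so the two blocks do not have to be merged first.
rho_raw    <- cor(genes, clin, method = "spearman")
rho_log2   <- cor(genes, log2(clin), method = "spearman")
rho_scaled <- cor(scale(genes), scale(clin), method = "spearman")

# Pearson for comparison: invariant to scale(), but not to log2 in general.
r_raw    <- cor(genes, clin)
r_scaled <- cor(scale(genes), scale(clin))
r_log2   <- cor(genes, log2(clin))

# Rank-based results agree across the transformations.
all.equal(rho_raw, rho_log2)
all.equal(rho_raw, rho_scaled)
all.equal(r_raw, r_scaled)

# Large negative log2 values are expected for values close to zero,
# e.g. the smallest value quoted above:
log2(0.002518)  # about -8.63
```

Under these assumptions, the choice of transformation matters for Pearson but not for Spearman, and the large negative log2 values simply reflect how close some observations are to zero.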

Please excuse my many questions, but in my extensive search I found many different opinions, so any ideas on this issue would be greatly appreciated!
