My data looks like this:
Color | X_1  X_2  ...  X_n
--------------------------
red   | 0.5  0.9  ...  0.2
green | 0.7  0.7  ...  0.3
red   | 0.8  0.3  ...  0.2
blue  | 0.7  0.4  ...  0.2
...
I want to test for a correlation between the categorical variable Color and each interval variable X_i, one X_i at a time. What is the best way to calculate this in R?
For the sake of full disclosure, I'm trying to generate a simulation showing how easy it is to find spurious correlations when you have a small number of data points with a large number of features.
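To make the setup concrete, here is a rough sketch of the kind of simulation I have in mind. Everything in it is an assumption for illustration only: the sizes `n_obs` and `n_feat`, the uniform draws for the X_i, the three color levels, and above all the use of a one-way ANOVA F-test per column as a stand-in for the test I'm asking about. I'm not claiming that is the right test.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

n_obs  <- 12   # small number of data points (assumed value)
n_feat <- 200  # large number of features (assumed value)

# Color is assigned independently of every X_i, so any apparent
# association between Color and a feature is spurious by construction.
# rep() + sample() guarantees all three levels are present.
color <- factor(sample(rep(c("red", "green", "blue"), length.out = n_obs)))

# Features: independent uniform noise, one column per X_i
x <- matrix(runif(n_obs * n_feat), nrow = n_obs,
            dimnames = list(NULL, paste0("X_", seq_len(n_feat))))

# Placeholder test: one-way ANOVA F-test p-value of each column against Color
p_values <- apply(x, 2, function(col) {
  anova(lm(col ~ color))[["Pr(>F)"]][1]
})

# How many features look "significant" at the 5% level purely by chance
n_spurious <- sum(p_values < 0.05)
n_spurious
```

With no multiple-testing correction, I would expect roughly 5% of the columns to fall below 0.05 even though none of them is related to Color, which is the effect I want the simulation to show.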