I'm fairly new to statistics and R, and I hope to get your help on this issue. I have a dataset from an experiment which consists of the following variables:

  • IV1: Age (interval)
  • IV2: Gender (factor)
  • IV3: Condition (factor)
  • IV4: Trait Score (ordinal 10-50)
  • DV1: Reported Happiness (ordinal 0-8)
  • DV2: Reported Intimacy (ordinal 0-9)

#Creating variables
Age <- c(28, 33, 23, 65, 43, 22, 19, 20, 20, 18)
Gender <- c(1, 2, 2, 2, 1, 1, 2, 2, 2, 1)
Condition <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
TraitScore <- c(16, 48, 43, 33, 32, 31, 25, 26, 28, 37)
ReportedHappiness <- c(8, 0, 0, 4, 1, 7, 3, 3, 4, 4)
ReportedIntimacy <- c(9, 9, 0, 4, 2, 8, 8, 0, 5, 2) 

#Changing a few classes of variables
Gender <- as.factor(Gender)
Condition <- as.factor(Condition)

#Creating dataframe
Data <- data.frame(Age, Gender, Condition, TraitScore, ReportedHappiness, ReportedIntimacy)

So, my issue is that I would like to compute something that corresponds to a correlation matrix between all IVs and DVs in the dataset, but how do I do that when I have a mixture of different types of variables?

PS: Dumbing down is greatly appreciated!

kjetil b halvorsen
Marc Andersen
  • You don't, since correlation does not work for categorical variables; you have to do something else with those, such as t-tests. – user2974951 Oct 02 '18 at 09:24
  • Sure, that's why I wrote "...what corresponds to a..." –  Oct 02 '18 at 09:46
  • Nice example. So basically you would like to vary correlation method (pearson, spearman etc) depending on the type of variable? If you could be more precise in what methods you want to correspond to your type of variables it would be easier to answer this question programmatically. If this is a statistical question I would suggest StackExchange. – FilipW Oct 02 '18 at 10:26
  • Perhaps this Q&A https://stats.stackexchange.com/questions/73065/correlation-coefficient-for-non-dichotomous-nominal-variable-and-ordinal-numeric/73118#73118 might help you. – mdewey Oct 02 '18 at 16:03

1 Answer

First, there are already many posts here on correlation coefficients suitable for different variable types, so I will only link some: continuous/categorical, continuous/ordinal, binary/ordinal, categorical/categorical and others (just search this site).
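As a rough illustration of picking a coefficient by variable type, here is a minimal sketch in base R using the data from the question. The helper `assoc()` is hypothetical (my own naming), and the rule it applies is only an assumption for illustration: Cramér's V for two factors, Spearman's rank correlation for everything else, with the two-level factors coded as 1/2. That coding would not be meaningful for a nominal factor with more than two levels.

```r
# Example data from the question (10 observations)
Data <- data.frame(
  Age               = c(28, 33, 23, 65, 43, 22, 19, 20, 20, 18),
  Gender            = factor(c(1, 2, 2, 2, 1, 1, 2, 2, 2, 1)),
  Condition         = factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)),
  TraitScore        = c(16, 48, 43, 33, 32, 31, 25, 26, 28, 37),
  ReportedHappiness = c(8, 0, 0, 4, 1, 7, 3, 3, 4, 4),
  ReportedIntimacy  = c(9, 9, 0, 4, 2, 8, 8, 0, 5, 2)
)

# Hypothetical helper: choose a measure from the classes of x and y
assoc <- function(x, y) {
  if (is.factor(x) && is.factor(y)) {
    # Two factors: Cramer's V from the uncorrected chi-squared statistic
    # (warnings about small expected counts are suppressed for this tiny sample)
    tab  <- table(x, y)
    chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
    unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
  } else {
    # Otherwise: Spearman rank correlation; two-level factors become codes 1/2
    cor(as.numeric(x), as.numeric(y), method = "spearman")
  }
}

# Apply the helper to every pair of columns and collect the results in a matrix
vars <- names(Data)
M <- sapply(vars, function(a) sapply(vars, function(b) assoc(Data[[a]], Data[[b]])))
round(M, 2)
```

Note that the entries are not on a common scale: Cramér's V lies between 0 and 1 and has no sign, while Spearman's coefficient lies between -1 and 1. With only 10 observations none of these numbers should be over-interpreted.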

Then, if you want, you could put these various correlation coefficients into a matrix, as a kind of covariance matrix (you would also have to decide how to generalize the variances that go on the diagonal). This could be just fine as a way of presenting the information compactly. But is it really a covariance matrix? That is, does it have the usual properties of a covariance matrix? The answer is no. It is not necessarily positive semidefinite, so using it in any procedure that requires a covariance matrix as input would be problematic, to say the least.
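To see what failing this property looks like, here is a small sketch. The matrix `R` is hypothetical: its values are invented for illustration and are not computed from the question's data. Each off-diagonal entry is a perfectly valid coefficient on its own, yet the three together cannot come from any real joint distribution, which shows up as a negative eigenvalue.

```r
# Hypothetical matrix of pairwise coefficients (invented values, not from the data).
# Variable 1 is strongly positively related to variable 2, variable 2 strongly
# positively related to variable 3, yet variables 1 and 3 are strongly negatively related.
R <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

# A symmetric matrix is positive semidefinite only if all eigenvalues are >= 0
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values

# Allow a small tolerance for floating-point error
is_psd <- all(ev >= -1e-8)

ev      # the smallest eigenvalue is negative
is_psd  # FALSE
```

The same check can be run on any matrix assembled from mixed coefficients before passing it to a procedure that expects a covariance or correlation matrix.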

So if you want more than just a compact presentation of some coefficients, you are better off telling us what your real analytical goal is, and then searching for some way of answering that directly. You could ask that as a new question (linking back to this one).

kjetil b halvorsen