I am trying to extract the dependence structure connecting some Categorical (CV), Ordinal (OV) and Interval (IV) variables (definitions). My goal is to encode this structure in a copula, so that I can apply various marginals (mostly obtained from KDE) and generate correlated samples. Here is my homework so far:
- The obvious read about how to find the strength of association between CV, OV and IV (refs: 1, 2, 3, 4)
- A good and thorough read about copulas (refs: 5, 6, Parts I and II of 7, 8, 9) [I would recommend references 7, 8 and 9 as a great starting point.]
- Required relevant statistical background.
Now I have come to the point of turning theory into practice, and I promptly got stuck, so I am seeking help. I have Cramer's V computed for all pairs of variables. How do I use it to build the copula? For example:
I have education categories and employment industry. Education is OV, and employment is CV. The Cramer's V between them is 0.42. Some of the variables I have are: Age (IV), Education (OV), Employment (CV), Unemployment (CV), Disability (CV), Income (IV), Gender (CV), Nationality (CV), House Type (OV), ... I have Cramer's V for all of them, pairwise: Age and Education, Age and Employment, etc.
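For concreteness, here is a minimal sketch of one way a pairwise Cramer's V can be computed from a contingency table. The function name `cramers_v` and the counts are made up for illustration (they are not my data and do not reproduce the 0.42 above), and no bias correction is applied. The last lines also show a property that matters for the question: reordering the categories leaves V unchanged, so V carries no sign or direction.

```python
import numpy as np

def cramers_v(table):
    """Cramer's V for a 2-D contingency table of counts (no bias correction)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence: outer product of the margins / n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    r, c = table.shape
    return np.sqrt(chi2 / (n * (min(r, c) - 1)))

# Hypothetical counts: rows = education level, columns = employment industry.
counts = np.array([[30, 10, 5],
                   [10, 30, 10],
                   [5, 10, 30]])
v = cramers_v(counts)

# Relabelling (permuting) the industry columns does not change V.
v_permuted = cramers_v(counts[:, [2, 0, 1]])
print(v, v_permuted)
```

Because V lies between 0 and 1 by construction, it cannot distinguish a positive from a negative association the way a correlation coefficient does.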
I can put them in a matrix similar to a correlation matrix, but I doubt whether it behaves the same way as a correlation matrix. According to this QnA, there is a difference between correlation and Cramer's V. If the Cramer's V matrix is analogous to a correlation matrix, my task is done: I would simply sample continuous random variables whose correlations equal the Cramer's V values, build my copula, and then transform each variable back using the marginal that best describes it.
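The procedure described above can be sketched as a Gaussian copula, assuming a valid correlation matrix is already available. This is only an illustration under stated assumptions: the function name `gaussian_copula_sample`, the 0.6 correlation, and the synthetic age and income data are all made up, and empirical quantiles stand in for the KDE-based marginals. Note that the Cholesky step requires a symmetric positive definite matrix, which a matrix of pairwise Cramer's V values is not guaranteed to be.

```python
import math
import numpy as np

def gaussian_copula_sample(corr, marginal_samples, n, seed=0):
    """Draw n correlated rows; column j follows the empirical marginal j."""
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    # Cholesky raises LinAlgError unless corr is symmetric positive definite.
    chol = np.linalg.cholesky(corr)
    # Correlated standard normals with correlation matrix corr.
    z = rng.standard_normal((n, corr.shape[0])) @ chol.T
    # The standard normal CDF maps each column to Uniform(0, 1).
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    # Inverse empirical CDF (quantile function) of each marginal.
    cols = [np.quantile(m, u[:, j]) for j, m in enumerate(marginal_samples)]
    return np.column_stack(cols)

# Synthetic stand-ins for two interval variables (hypothetical data).
rng = np.random.default_rng(1)
age = rng.uniform(18, 80, size=2000)
income = rng.lognormal(mean=10, sigma=0.5, size=2000)

corr = [[1.0, 0.6],
        [0.6, 1.0]]  # assumed correlation, not derived from Cramer's V
sample = gaussian_copula_sample(corr, [age, income], n=5000)
print(np.corrcoef(sample[:, 0], sample[:, 1])[0, 1])
```

The correlation fed into the copula lives on the latent normal scale, so the correlation observed between the back-transformed variables will generally differ from it somewhat.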
However (and I feel strongly about this), Cramer's V is not analogous to correlation, although both measure the strength of association. I may be wrong, hence I seek guidance. Also, if I am correct, how can I transform the Cramer's V matrix into a correlation matrix?
P.S. Please comment on the proposed technique for building a copula if you think it is incorrect (i.e. generating random variables with the same correlation and using them to build the copula).