Let's say I want to generate data with a particular association matrix. I am using the phi coefficient as the measure of association.
Here is an example using R.
require(psych)
var1 <- sample(c("P", "A"), 10000, replace = TRUE)
var2 <- sample(c("P", "A"), 10000, replace = TRUE)
mydf <- data.frame(var1, var2)
# Degree of association (phi coefficient)
# No association case:
# the two variables are sampled independently, so an association near 0 is expected
phi(table(var1, var2))
[1] -0.01
# Copy of the same variable: an association of 1 is expected.
var3 <- var1
phi(table(var1, var3))
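For reference, the phi coefficient can also be computed by hand from the 2 x 2 table, and it equals the Pearson correlation of the 0/1-coded variables, which is why I think of the association matrix as a kind of correlation matrix. A minimal sketch in base R, where `phi_manual` is a hypothetical helper name (not part of psych) and the seed is arbitrary:

```r
# Hypothetical helper: phi coefficient from the 2 x 2 table of x and y.
phi_manual <- function(x, y) {
  tab <- table(x, y)
  stopifnot(all(dim(tab) == c(2, 2)))
  # Convert counts to double so the products cannot overflow integers.
  n11 <- as.numeric(tab[1, 1]); n12 <- as.numeric(tab[1, 2])
  n21 <- as.numeric(tab[2, 1]); n22 <- as.numeric(tab[2, 2])
  num <- n11 * n22 - n12 * n21
  den <- sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
  num / den
}

set.seed(1)  # arbitrary seed, only for reproducibility
var1 <- sample(c("P", "A"), 10000, replace = TRUE)
var2 <- sample(c("P", "A"), 10000, replace = TRUE)

phi_indep <- phi_manual(var1, var2)  # independent variables: close to 0
phi_copy  <- phi_manual(var1, var1)  # identical variables: 1

# Same quantity as the Pearson correlation of the 0/1 indicators.
r_indep <- cor(as.numeric(var1 == "P"), as.numeric(var2 == "P"))
```

Here `phi_indep` and `r_indep` should agree up to floating-point error.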
Suppose I have a 4 x 4 matrix of phi coefficients between four categorical variables. Say the following is the association matrix (analogous to a correlation matrix):
amat <- matrix(c(1,   0.5, 0.4, 0.3,
                 0.5, 1,   0.5, 0.3,
                 0.4, 0.5, 1,   0.2,
                 0.3, 0.3, 0.2, 1), nrow = 4)
rownames(amat) <- colnames(amat) <- c("VarA", "VarB", "VarC", "VarD")
amat
     VarA VarB VarC VarD
VarA  1.0  0.5  0.4  0.3
VarB  0.5  1.0  0.5  0.3
VarC  0.4  0.5  1.0  0.2
VarD  0.3  0.3  0.2  1.0
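Since I am treating this like a correlation matrix, a quick sanity check is that it is symmetric, has a unit diagonal, and is positive definite. A minimal sketch in base R, assuming these are the relevant conditions (binary variables impose further constraints on attainable phi values, depending on the marginal proportions, which this does not check):

```r
# Same target matrix as above, rebuilt so the snippet is self-contained.
amat <- matrix(c(1,   0.5, 0.4, 0.3,
                 0.5, 1,   0.5, 0.3,
                 0.4, 0.5, 1,   0.2,
                 0.3, 0.3, 0.2, 1), nrow = 4)

is_sym   <- isSymmetric(amat)       # symmetric?
unit_dia <- all(diag(amat) == 1)    # ones on the diagonal?

# Positive definite if all eigenvalues are strictly positive.
ev     <- eigen(amat, symmetric = TRUE, only.values = TRUE)$values
is_pd  <- all(ev > 0)

c(symmetric = is_sym, unit_diagonal = unit_dia, positive_definite = is_pd)
```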
Is there a way to generate data with four variables and, say, 10000 observations that approximately reproduce the above association matrix?
I know from the post how to do something similar for quantitative variables. The examples do not need to be R-specific; I only want to know the idea, which can be translated into any programming language.
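To make the quantitative analogue concrete, here is a minimal sketch in base R of what I understand that approach to be: draw independent standard normals and multiply by the Cholesky factor of the target matrix. The seed and the names `z`, `x` and `b` are my own choices for illustration. The last lines show why this does not carry over directly: cutting each column at 0 gives binary variables, but their phi coefficients come out smaller than the target (for a bivariate normal cut at the median, phi = (2/pi) * asin(rho)).

```r
set.seed(42)  # arbitrary seed, only for reproducibility

vars <- c("VarA", "VarB", "VarC", "VarD")
amat <- matrix(c(1,   0.5, 0.4, 0.3,
                 0.5, 1,   0.5, 0.3,
                 0.4, 0.5, 1,   0.2,
                 0.3, 0.3, 0.2, 1), nrow = 4,
               dimnames = list(vars, vars))

n <- 10000
# Independent standard normal columns.
z <- matrix(rnorm(n * 4), nrow = n)
# chol() returns upper-triangular R with t(R) %*% R == amat,
# so the columns of z %*% R have covariance (approximately) amat.
x <- z %*% chol(amat)
colnames(x) <- vars

round(cor(x), 2)  # close to amat for quantitative variables

# Naive dichotomization at 0: phi is the Pearson correlation of 0/1 columns.
b <- (x > 0) * 1
round(cor(b), 2)  # noticeably smaller than amat off the diagonal
```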