How to simulate data for multinomial logistic regression?

For example, I want to generate a high-dimensional data set with 90 subjects and 500 independent predictors. The class ratio should be 30:30:30.

That is, Class 1, Class 2 and Class 3 should appear in equal proportion.

This is what I have done so far, but I don't think it is correct; what I want is something different.

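# High-dimensional data: 90 subjects, 500 predictors
mX = matrix(rnorm(45000), 90, 500)
vCoef1 = rep(0, 500)   # all-zero coefficients: class 1 acts as the reference class
vCoef2 = rnorm(500)
vCoef3 = rnorm(500)

# matrix of unnormalized class probabilities, one row per subject
# (rmultinom rescales each row internally so that it sums to 1)
vProb = cbind(exp(mX%*%vCoef1), exp(mX%*%vCoef2), exp(mX%*%vCoef3))

# one multinomial draw per subject
mChoices = t(apply(vProb, 1, rmultinom, n = 1, size = 1))
y = apply(mChoices, 1, function(x) which(x==1))
dfM = cbind.data.frame(y = y, mX)
table(y)   # class counts in base R; count() is not base R (plyr provides one)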

The counts of y differ from class to class. I am struggling to get an equal class ratio.

Specifically, I want 30 of the 500 predictor variables to be significant, generated from three different normal distributions.

The predictors are to be generated from normal distributions with a standard deviation $\sigma$ of 1, 2 or 3.

Among the 30 significant variables:
The first 10 variables are to be generated from $N(0,\sigma^2)$ for class 1, $N(1,\sigma^2)$ for class 2 and $N(2,\sigma^2)$ for class 3.
The next 10 variables are to be generated from $N(0,\sigma^2)$ for class 1 and $N(1,\sigma^2)$ for classes 2 and 3.
The last 10 variables are to be generated from $N(0,\sigma^2)$ for classes 1 and 2 and $N(1,\sigma^2)$ for class 3.

The remaining 470 predictor variables are to be generated from a single normal distribution, $N(0,\sigma^2)$, for all classes.
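For illustration, the design described above could be sketched in R as follows. This is only a minimal sketch under some assumptions, not a definitive implementation: the class labels are fixed in advance (so the counts are exactly 30:30:30 rather than random), the predictors are drawn conditionally on the class, sigma = 1 is picked from the three candidate values, and the names n_per_class, mMean and the seed are hypothetical choices. Because all classes share the same variance, the class probabilities given the predictors still follow a multinomial logistic form.

```r
set.seed(1)                      # assumption: arbitrary seed, for reproducibility

n_per_class <- 30                # 30:30:30 class ratio
n_class     <- 3
p           <- 500               # total number of predictors
sigma       <- 1                 # assumption: one of the values 1, 2 or 3

# Fix the labels first so that each class has exactly 30 subjects
y <- rep(seq_len(n_class), each = n_per_class)
n <- length(y)

# Class-specific means: one row per class, one column per predictor.
# A length-3 vector is recycled down each column (classes 1, 2, 3).
mMean <- matrix(0, nrow = n_class, ncol = p)
mMean[, 1:10]  <- c(0, 1, 2)     # variables 1-10:  means 0, 1, 2
mMean[, 11:20] <- c(0, 1, 1)     # variables 11-20: means 0, 1, 1
mMean[, 21:30] <- c(0, 0, 1)     # variables 21-30: means 0, 0, 1
# variables 31-500 keep mean 0 in every class (noise predictors)

# Each subject gets its class mean plus N(0, sigma^2) noise
mX <- mMean[y, ] + matrix(rnorm(n * p, sd = sigma), nrow = n, ncol = p)
colnames(mX) <- paste0("x", seq_len(p))

dfM <- data.frame(y = factor(y), mX)
table(dfM$y)                     # 30 subjects in each of the three classes
```

Rerunning the sketch with sigma set to 2 or 3 gives the other two noise levels; the class counts stay balanced because they do not depend on a random draw.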

botloggy