Say I have a linear model $Y = a + bX$, where $X$ represents subject gender.

I've been asked to simulate $Y$ for different groups of subjects (in this example, males and females) using the fitted model estimates.

The aim is to explore the distribution of this simulated data graphically.

Does it make sense to do this? What is gained by simulating the outcome $Y$ from the fitted model estimates, compared to exploring the observed outcome?

Ferdi
edstatsuser
  • I think the revised question has changed enough that a) it's answerable and b) the two comments above might be headed down the wrong path. – Matt Krause Nov 09 '17 at 08:47
  • Check https://stats.stackexchange.com/questions/115157/what-are-posterior-predictive-checks-and-what-makes-them-useful/125576#125576 – Tim Nov 09 '17 at 10:24

1 Answer

Simulating from the fitted model amounts to investigating the underlying distributions. More precisely, it amounts to predicting what you would observe if you were to recruit new participants. You can learn a lot of interesting things this way. For instance:

  • Suppose you recruit a new male or a new female. How probable is it that his or her outcome satisfies $Y<Y_0$ for some threshold $Y_0$? Or $Y>Y_0$?
  • Suppose you recruit one new male and one new female. How probable is it that their outcomes satisfy $Y_m<Y_f$? Or $Y_m>Y_f$?
  • Or invert the question. Suppose you sample people at random from the top or bottom 2% of the unconditional distribution. What mix of males and females can you expect? (Hint: tiny differences in means, with equal variances and normal distributions, can translate into large differences in representation in the tails.)
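
The three questions above can be approximated by simulation. The following is a minimal sketch, not a definitive implementation. It assumes a normal error term (which the model as written in the question does not specify), the coding $X=0$ for females and $X=1$ for males, and made-up values for the estimates (`A_HAT`, `B_HAT`, `SIGMA_HAT`) and for the threshold `y0`. It also plugs in point estimates and so ignores the uncertainty in the estimates themselves.

```python
import random

# Hypothetical fitted estimates (assumed for illustration, not from the question).
A_HAT = 170.0      # intercept: mean of Y when X = 0 (coded here as female)
B_HAT = 2.0        # slope: shift in the mean of Y when X = 1 (coded here as male)
SIGMA_HAT = 10.0   # residual standard deviation (normal errors are an assumption)


def simulate(x, n, rng):
    """Draw n new outcomes Y = a + b*x + e, with e ~ Normal(0, sigma)."""
    mean = A_HAT + B_HAT * x
    return [rng.gauss(mean, SIGMA_HAT) for _ in range(n)]


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 100_000
y_f = simulate(0, n, rng)
y_m = simulate(1, n, rng)

# 1. Probability that a new subject falls below a threshold Y_0.
y0 = 160.0
p_f_below = sum(y < y0 for y in y_f) / n
p_m_below = sum(y < y0 for y in y_m) / n

# 2. Probability that a new male has a lower outcome than a new female.
#    The two samples are independent, so pairing them by index is fine.
p_m_lt_f = sum(m < f for m, f in zip(y_m, y_f)) / n

# 3. Share of males in the top 2% of the pooled (unconditional) distribution,
#    assuming both groups are equally large.
pooled = sorted([(y, "m") for y in y_m] + [(y, "f") for y in y_f], reverse=True)
top = pooled[: len(pooled) * 2 // 100]
share_m_top = sum(group == "m" for _, group in top) / len(top)

print(f"P(Y_f < {y0}) ~ {p_f_below:.3f}")
print(f"P(Y_m < {y0}) ~ {p_m_below:.3f}")
print(f"P(Y_m < Y_f)  ~ {p_m_lt_f:.3f}")
print(f"share of males in top 2% ~ {share_m_top:.3f}")
```

With these assumed numbers the group means differ by only a fifth of a standard deviation, yet the share of males in the top 2% should come out clearly above one half, which is the point of the hint in the last bullet.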

This gets even more interesting if the two groups differ not only in their means, but also in their variances (or higher moments).
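
To illustrate the effect of unequal variances, here is a small sketch that uses exact normal tail probabilities instead of simulation. The two groups, their common mean, and their standard deviations are hypothetical values chosen only for illustration, and equal group sizes are assumed.

```python
from statistics import NormalDist

# Hypothetical groups: identical means, group_b has a 10% larger SD (assumed values).
group_a = NormalDist(mu=100.0, sigma=15.0)
group_b = NormalDist(mu=100.0, sigma=16.5)


def tail_share(cutoff):
    """Share of group_b among subjects above the cutoff, assuming equal group sizes."""
    tail_a = 1.0 - group_a.cdf(cutoff)
    tail_b = 1.0 - group_b.cdf(cutoff)
    return tail_b / (tail_a + tail_b)


# The further out the cutoff, the more the higher-variance group dominates.
for cutoff in (100.0, 115.0, 130.0, 145.0):
    print(f"cutoff {cutoff:5.0f}: share of group_b above = {tail_share(cutoff):.3f}")
```

At the common mean the two groups are equally represented, and the share of the higher-variance group grows as the cutoff moves further into the tail, even though the means are identical.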

Stephan Kolassa