
Consider a population-generation problem in which we are trying to generate couples that conform to a local area's demographics. We know the age distribution for Partner 1, $x_1\sim D_1$, and for Partner 2, $x_2\sim D_2$, and in addition we know that the difference in age is normally distributed, that is, $x_1 - x_2 \sim \mathcal{N}(0,\sigma^2)$. I want to simulate from a joint distribution on $(x_1,x_2)$ that hews as closely to all three distributions as possible.

I know that there are many joint distributions that give the same marginals and I imagine that there exist $D_1$ and $D_2$ such that the problem is impossible, but I just need it to be close (and I would appreciate any insight on the constraints).
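One way to make part of the constraint concrete, as a sketch that assumes both marginals have finite variance: write $\mu_i$ and $s_i$ for the mean and standard deviation of $D_i$ (notation introduced here, not given above) and $\rho$ for the correlation of $x_1$ and $x_2$. Then any joint distribution with these marginals satisfies

$$
\mathbb{E}[x_1 - x_2] = \mu_1 - \mu_2,
\qquad
\operatorname{Var}(x_1 - x_2) = s_1^2 + s_2^2 - 2\rho\, s_1 s_2, \quad \rho \in [-1, 1],
$$

so an exact solution with $x_1 - x_2 \sim \mathcal{N}(0,\sigma^2)$ requires at least

$$
\mu_1 = \mu_2
\qquad\text{and}\qquad
(s_1 - s_2)^2 \le \sigma^2 \le (s_1 + s_2)^2 .
$$

These conditions are necessary but not sufficient: they only match the first two moments of the difference and say nothing about whether its shape can be normal.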

Is there a computationally good way to do this? Are there any recommended resources on such problems, either mathematically or computationally?

Nate
  • Look up *copulas*, and try some out to see which ones give results you find reasonable. But the whole set-up is fishy: in the most obvious case, where the Partner 1’s are husbands and the Partner 2’s are wives, the distribution of age differences is probably not normal. There are actual curves from 2003 for England and Wales in Figure 1 of Wilson and Smallwood’s “Age Differences at Marriage and Divorce”, available at https://www.researchgate.net/profile/Steve-Smallwood/publication/23168277_Age_differences_at_marriage_and_divorce/ – Matt F. Jun 24 '21 at 20:23
  • @MattF. Thank you for the nod towards copulas; those look like exactly what I was interested in. Thank you as well for the reference. I am indeed looking at more complicated pairings and will probably use something more like the demographic study you linked to when this gets more developed. – Nate Jun 24 '21 at 22:57
  • @PeterO, those are just comments: I find the stated setup of arbitrary or empirical distributions with a nicely normal difference too implausible for the question to have a good answer. – Matt F. Jun 25 '21 at 08:43
  • @MattF. I suspect that the answer (especially in the more arbitrary cases I will eventually have to consider...) is going to involve some optimization, say linearly interpolating between the distributions and trying to minimize some joint error. But I want to make as good a crack as I can at understanding any relevant analytic framework before that. – Nate Jun 25 '21 at 21:08
  • I do not think the question has an answer considering that (a) three arbitrary marginal distributions are most likely incompatible and (b) the approximation error or utility is not spelled out. – Xi'an Jul 14 '21 at 13:21
  • Possibly related is [this answer](https://stats.stackexchange.com/a/28808/6633) which cites a theorem of H. Cramér that says that $X_1$ and $X_2$ certainly can't be independent unless $D_1$ and $D_2$ themselves happen to be normal distributions. – Dilip Sarwate Jul 14 '21 at 16:03
  • Thank you all for your help. Yes, the problem has simple counterexamples if you're trying to find an analytic solution. In this case I'm looking for a computational approximation. We've actually had quite a bit of success iteratively fitting to the marginals; there are also simulated annealing methods and, as mentioned above, interpolating between marginals, which seems promising. I'd still be interested in any information about this problem space anyone might have. – Nate Jul 15 '21 at 18:47
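
The "iteratively fitting to the marginals" idea from the last comment can be sketched as iterative proportional fitting on a discrete age grid. This is only an illustration under stated assumptions, not the method anyone in the thread actually used: the age range, the two bell-shaped marginals `p1` and `p2`, the value of `sigma`, and the function name `fit_joint` are all made up for the example. The seed table encodes the normal age-difference preference, and the loop rescales rows and columns until both marginals are reproduced. Both marginals must be strictly positive on the grid for the divisions to be safe.

```python
import numpy as np

def fit_joint(p1, p2, ages, sigma, n_iter=2000, tol=1e-9):
    """Iterative proportional fitting: start from a table that favours
    normally distributed age differences, then rescale rows and columns
    until the table's marginals match p1 and p2 (both strictly positive)."""
    diff = ages[:, None] - ages[None, :]
    joint = np.exp(-0.5 * (diff / sigma) ** 2)  # seed: N(0, sigma^2) kernel on x1 - x2
    joint /= joint.sum()
    for _ in range(n_iter):
        joint *= (p1 / joint.sum(axis=1))[:, None]  # match Partner 1 marginal
        joint *= (p2 / joint.sum(axis=0))[None, :]  # match Partner 2 marginal
        if np.abs(joint.sum(axis=1) - p1).max() < tol:
            break
    return joint

# Hypothetical inputs, chosen only for illustration.
ages = np.arange(18, 91)
p1 = np.exp(-0.5 * ((ages - 40) / 12.0) ** 2)
p1 /= p1.sum()
p2 = np.exp(-0.5 * ((ages - 38) / 11.0) ** 2)
p2 /= p2.sum()

joint = fit_joint(p1, p2, ages, sigma=5.0)

# Draw couples by sampling cells of the fitted table.
rng = np.random.default_rng(0)
flat = rng.choice(joint.size, size=10_000, p=joint.ravel() / joint.sum())
i, j = np.unravel_index(flat, joint.shape)
couples = np.column_stack([ages[i], ages[j]])
```

The design choice here is to treat the two marginals as hard constraints and the normal difference as a soft one: the fitted table matches `p1` and `p2` up to the tolerance, while the distribution of `couples[:, 0] - couples[:, 1]` is only as close to normal as the marginals allow. In particular, its mean is fixed by the marginals, so it will not be zero if they have different means.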

0 Answers