> How do I know the relative probability that the process generating the data is a Gaussian mixture model with this particular parameter combination rather than, say, a neural network with that parameter configuration?
Your $\theta$ is the set of parameters in your model. For a Gaussian mixture model these are the means, covariances, and mixing weights; in a neural network they are the weights and biases. These are totally different sets of quantities, so there's no reason to expect the $P(\theta)$ in one case to be related to the $P(\theta)$ in the other, either a priori or after seeing $D$.
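To make the contrast concrete, here is a minimal sketch (every shape and value below is made up for illustration) of what $\theta$ contains in each case; notice there is no common space holding both:

```python
import numpy as np

rng = np.random.default_rng(0)

# theta for a 2-component Gaussian mixture on 3-dimensional data:
# means, covariances, and mixing weights
theta_gmm = {
    "means": rng.normal(size=(2, 3)),          # one mean vector per component
    "covariances": np.stack([np.eye(3)] * 2),  # one covariance matrix per component
    "weights": np.array([0.4, 0.6]),           # mixing weights, sum to 1
}

# theta for a one-hidden-layer neural network: weights and biases
theta_nn = {
    "W1": rng.normal(size=(3, 8)),  # input -> hidden weights
    "b1": np.zeros(8),
    "W2": rng.normal(size=(8, 1)),  # hidden -> output weights
    "b2": np.zeros(1),
}

# A prior P(theta) is a distribution over one of these parameter spaces;
# there is no single prior that covers both at once.
```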
$P(D \mid \theta)$ is the part of the formula that gets realised as a mixture model, or a network, or whatever. But you have to pick one, otherwise your prior is over the wrong quantities, which makes no sense.
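For instance, once you decide on a two-component 1-D Gaussian mixture, $P(D \mid \theta)$ can be written down directly. A sketch, with hypothetical names throughout:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def gmm_log_likelihood(D, weights, means, sds):
    """log P(D | theta) for i.i.d. 1-D data under a Gaussian mixture."""
    # per data point: log of sum_k weight_k * N(x | mean_k, sd_k),
    # computed stably in log space
    per_point = logsumexp(
        np.log(weights) + norm.logpdf(D[:, None], loc=means, scale=sds),
        axis=1,
    )
    return per_point.sum()
```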
> Further, it is intuitive to think of one process generating the data, whose parameters we are guessing. But here we instead have multiple processes generating the data in tandem, i.e. any sense of a true model is lost.
You already think of the data as potentially generated by different values of $\theta$ before any Bayesian questions arise: the likelihood tells you how probable the data would have been under different parameter values. But your 'in tandem' idea suggests you think that in the Bayesian case they all generate it 'all at once', so there is no sense of 'one true model'. That's a mistake. Maybe think of it like this:
Call the 'true model parameters' $\theta_0$. Bayesians and everybody else can agree that these are the things we want to know about. Then $D$ is actually a sample from $P(D \mid \theta_0)$. We just don't happen to know what $\theta_0$ is.
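Continuing the mixture sketch above with a made-up $\theta_0$: $D$ is drawn once from $P(D \mid \theta_0)$, and the likelihood can then be evaluated at any candidate $\theta$ you like; it just tends to be larger near $\theta_0$.

```python
rng = np.random.default_rng(1)

# a made-up 'true' theta_0
weights0 = np.array([0.3, 0.7])
means0 = np.array([-2.0, 2.0])
sds0 = np.array([1.0, 1.0])

# D is a sample from P(D | theta_0)
components = rng.choice(2, size=500, p=weights0)
D = rng.normal(means0[components], sds0[components])

# the likelihood is defined for every theta, known-true or not
print(gmm_log_likelihood(D, weights0, means0, sds0))                 # at theta_0
print(gmm_log_likelihood(D, weights0, np.array([-5.0, 5.0]), sds0))  # at a worse theta
```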
Our $P(D \mid \theta)$, where $\theta$ is any setting of the parameters, just specifies the mechanism by which $D$ is assumed to be generated if we knew what the parameters were - a 'forward model', if you like. Often it's straightforwardly physical: think of $\theta$ as settings on a control panel. Bayesian methods start with $P(\theta)$ - your opinions or knowledge about what $\theta_0$ might be before seeing $D$ - and then condition on $D$ to get $P(\theta \mid D)$ - your new opinions or knowledge about what $\theta_0$ is after seeing $D$.
The sum you present above, $\sum_{\theta} P(D \mid \theta)\,P(\theta)$, is mostly useful just as a normalising constant on the way to $P(\theta \mid D)$, which is the genuinely useful quantity: our updated beliefs about $\theta_0$. The sum has some other roles, as the 'evidence', but for the purposes of your question these aren't relevant.
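Putting the pieces together in the same sketch, treating the first component mean as the only unknown: the sum shows up exactly once, as the normaliser that turns prior times likelihood into $P(\theta \mid D)$.

```python
# candidate values of the unknown mean
mu_grid = np.linspace(-6.0, 2.0, 401)

# P(theta): opinions about theta_0 before seeing D (a broad prior)
log_prior = norm.logpdf(mu_grid, loc=0.0, scale=3.0)

# P(D | theta) at every grid point
log_lik = np.array([
    gmm_log_likelihood(D, weights0, np.array([mu, means0[1]]), sds0)
    for mu in mu_grid
])

# sum over theta of P(D | theta) P(theta): just the normalising constant
log_joint = log_prior + log_lik
log_evidence = logsumexp(log_joint)

# P(theta | D): opinions about theta_0 after seeing D
posterior = np.exp(log_joint - log_evidence)
print(mu_grid[posterior.argmax()])  # concentrates near the true value -2.0
```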