
Let's say I have two datasets, $y_0,\dots,y_T$ and $y^*_0,\dots,y^*_T$, each with about 300 observations (point estimates), and the observations are roughly normally distributed.

$y$ is observed from the "Real Generating Process", while $y^*$ contains the estimates from model $M^*$.

I am struggling to find a way to calculate the marginal likelihoods directly from these distributions. Shouldn't it be possible to estimate the probability that $M^*$ is the "Real Generating Process", given the data $y$?
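For concreteness, the calculation I have in mind looks something like the sketch below. I treat $y^*$ as fixed point predictions from $M^*$ and simply assume independent Gaussian errors with some scale $\sigma$ that I would still have to choose or estimate; the simulated series are just placeholders for my actual data.

```python
import numpy as np
from scipy.stats import norm

# Placeholder data standing in for the two series described above.
rng = np.random.default_rng(1)
T = 300
y_star = rng.normal(loc=0.0, scale=1.0, size=T)      # estimates from M*
y = y_star + rng.normal(loc=0.0, scale=0.5, size=T)  # observed series

# Log-likelihood of the observations given the model's point estimates,
# under assumed independent Gaussian errors with scale sigma.
sigma = 0.5
log_lik = norm.logpdf(y, loc=y_star, scale=sigma).sum()
print(f"log p(y | M*, sigma) = {log_lik:.1f}")
```

Whether a quantity like this can be turned into something like $P(M^* \text{ is the "Real Generating Process"} \mid y)$ is exactly what I am unsure about.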

Plazi

1 Answer

I assume $y_{i}^{*}$, $i=1,\dots,T$, are predicted values from a model rather than estimates of the parameters in the model? If so, then say you entertain the model $Y_{i}=\beta_{0}+\beta_{1}x_{i1}+\dots+\beta_{p}x_{ip}+\epsilon_{i}$ for the $i^{th}$ person as the DGP (data generating process) for your observed data $\{y_{i}\}_{i}$. Then of course you can test the hypothesis

$H_{0}:E[Y_{i}]=\beta_{0}+\beta_{1}x_{i1}+\dots+\beta_{p}x_{ip}\hspace{50pt}i=1,\dots,T$

subject to an assumption like $\epsilon_{i}\sim N(0,\sigma^2)$, all independent - i.e. with an F-test. Say your observed F statistic has a p-value of 0.01. As you may know, this is to be interpreted as follows: given that $H_{0}$ is true, and subject to the distributional assumption on the errors, less than 1% of the time (if we repeated our data collection over and over) would we expect an F statistic at least as extreme as the one observed. Is this what you were getting at mathematically?
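For concreteness, here is a minimal sketch in Python of how such a fit and its F-test might look with statsmodels; the two predictors and all the numbers are simulated by me purely for illustration. Strictly speaking, the overall F-test reported below tests the null that all the slope coefficients are zero (i.e. the model against an intercept-only one), which is the F-test standard software gives you, rather than a test of the full specification written above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated stand-in data: T = 300 observations, two hypothetical
# predictors, and a linear DGP plus Gaussian noise (illustration only).
T = 300
X = rng.normal(size=(T, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=1.5, size=T)

# Fit the entertained linear model E[Y_i] = b0 + b1*x_i1 + b2*x_i2.
fit = sm.OLS(y, sm.add_constant(X)).fit()

# Overall F statistic and its p-value: the test that all slope
# coefficients are zero, under the independent Gaussian error assumption.
print(f"F = {fit.fvalue:.2f}, p-value = {fit.f_pvalue:.4g}")
```

A small p-value here is read exactly as described above: if the predictors really had no effect, an F statistic this large would be a rare event.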

This tells you your model is pretty good, but is it the true DGP? Clearly, to suggest there is some random number generator out there adding random Gaussian noise onto some linear function is ludicrous. This is a philosophical point, but an important one. See these links for a far more eloquent explanation than I can give:

What is the meaning of "All models are wrong, but some are useful"

In regression analysis what's the difference between data-generation process and model?

dandar