
I'm watching this video on Bayesian modelling for the stock market by Thomas Wiecki. He has a slide with two posterior distributions over the mean parameter in his stock return model. Around 18:26 we see this slide:

this slide

The phrases “in-sample” and “out-of-sample”, applied to posterior distributions of the same latent variable, confuse me.

If one wants to look at the latent variables, the way I understand it, one can:

  • Look at the prior of a latent variable, which is designed by the user.
  • Look at the posterior of a latent variable, which one can obtain after observing some data.

At no point in the Bayesian workflow does a distinction between an “in-sample posterior” and an “out-of-sample posterior” come up, just posterior vs prior.

Question: What does “in-sample posterior” vs “out-of-sample posterior” mean?

MarianD
user27886
  • I haven't watched the video, but I'd imagine he is referring to a posterior distribution estimated under training data ("in-sample" in finance) and test data ("out-of-sample"). – Forgottenscience Oct 30 '20 at 20:09
  • @Forgottenscience One might think, but in both cases one would normally use the same posterior distribution of a latent variable for in-sample and out-of-sample predictions and in his presentation they are clearly different. Look at the image of the slide I posted if you haven't already. Updating the posterior using out-of-sample data would be inappropriate, otherwise it would be called in-sample data. – user27886 Oct 31 '20 at 02:28

1 Answer


This is explained earlier in the talk, from around 3:12 (link).


Quantopian users train financial models on historical data ("in-sample period"), and test them on new data ("out-of-sample period"). Due to overfitting, the models usually do better on the in-sample data.

The speaker's job is to evaluate how well these financial models perform, by considering both their in-sample and out-of-sample performance. He fits a model to evaluate the performance data generated by the users' models. This is where the terminology gets confusing!
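To make this concrete, here is a minimal sketch of the idea, not the speaker's actual model. It assumes a simple conjugate normal model for the mean daily return with the noise standard deviation plugged in from the data, and the returns themselves are simulated (all names and numbers here are hypothetical). The same evaluation model is fitted twice, once to the strategy's in-sample returns and once to its out-of-sample returns, which gives two different posteriors over the same parameter:

```python
import math
import random
import statistics


def posterior_of_mean(returns, prior_mean=0.0, prior_sd=1.0):
    """Normal-normal conjugate posterior for the mean of `returns`.

    Simplification (an assumption of this sketch): the noise standard
    deviation is treated as known and set to the sample standard deviation.
    """
    n = len(returns)
    noise_sd = statistics.stdev(returns)
    prior_precision = 1.0 / prior_sd**2
    data_precision = n / noise_sd**2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * prior_mean
        + data_precision * statistics.fmean(returns)
    )
    return post_mean, math.sqrt(post_var)


rng = random.Random(0)

# Hypothetical daily returns produced by one user's trading strategy.
# In-sample: the period the strategy was developed on (possibly overfit).
in_sample = [rng.gauss(0.002, 0.01) for _ in range(500)]
# Out-of-sample: a later, shorter period the strategy never saw.
out_of_sample = [rng.gauss(0.0, 0.01) for _ in range(120)]

# Same evaluation model, two different data sets, hence two posteriors.
is_mean, is_sd = posterior_of_mean(in_sample)
oos_mean, oos_sd = posterior_of_mean(out_of_sample)

print(f"in-sample posterior of mean return:     {is_mean:.5f} +/- {is_sd:.5f}")
print(f"out-of-sample posterior of mean return: {oos_mean:.5f} +/- {oos_sd:.5f}")
```

In this sketch both posteriors are ordinary posteriors in the usual Bayesian sense. The labels "in-sample" and "out-of-sample" only say which period of the strategy's returns was used as the observed data.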

Eoin