
Bayes' theorem, as applied in a machine learning setting, is $$ p(\theta|D) = \frac{ p(D|\theta)\, p(\theta) }{ p(D) } $$ where $D$ is the data, $\theta$ are the model parameters, $p(\theta)$ is the prior, $p(\theta|D)$ is the posterior, and $p(D|\theta)$ is the likelihood.

My question is about $D$. Typically, the machine learning (ML) model is fit to a collection of $N$ training data "points" $d_k, k=1\ldots N$, and the likelihood factors over the data.
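(By "factors over the data" I mean the usual i.i.d. assumption, under which

$$ p(D|\theta) = \prod_{k=1}^{N} p(d_k|\theta). $$)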

In the typical ML scenario, does $D$ refer to a single data point, or to the collection of all $N$ data points? Or can it be either, or even refer to a subset of the data points?

In other words, is the theorem used and valid in each of these cases?

1. $D \equiv d_3$ (a particular data point),
2. $D \equiv \{ d_1, d_5, d_6 \}$ (a subset of the data points),
3. $D \equiv \{ d_k, k = 1 \ldots N \}$ (all data points).

Steffen Moritz
basicidea

1 Answer

What Bayes' theorem

$$ p(\theta|D) = \frac{ p(D|\theta) \,p(\theta) }{ p(D) } $$

says is that, given the prior $p(\theta)$ and data $D$, you can obtain the posterior. So if what you have is a single data point, $D$ is that data point; if you use a larger dataset, $D$ is the larger dataset (in fact, it works out the same whether you do this all at once or sequentially). So $D$ is whatever data you use for the update.
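To make the all-at-once vs. sequential point concrete, here is a minimal sketch (my own illustrative example, not from the question) using a conjugate Beta–Bernoulli model, where the posterior has a closed form and the equivalence is easy to verify:

```python
# Beta-Bernoulli conjugate updating: the Beta(alpha, beta) prior on a coin's
# bias becomes Beta(alpha + #successes, beta + #failures) after observing
# 0/1 data. Updating on all N points at once, or on one point at a time
# (feeding each posterior back in as the next prior), gives the same result.

def update_beta(alpha, beta, data):
    """Update a Beta(alpha, beta) prior with a list of 0/1 observations."""
    for d in data:
        alpha += d        # each success (d = 1) adds 1 to alpha
        beta += 1 - d     # each failure (d = 0) adds 1 to beta
    return alpha, beta

data = [1, 0, 1, 1, 0, 1]  # N = 6 Bernoulli observations

# Case 3 from the question: D is all N data points at once.
batch = update_beta(2.0, 2.0, data)

# Sequential updating: D is a single data point at each step,
# and the posterior from one step is the prior for the next.
a, b = 2.0, 2.0
for d in data:
    a, b = update_beta(a, b, [d])

print(batch, (a, b))  # identical posteriors: (6.0, 4.0) (6.0, 4.0)
```

The same equivalence holds in general (not just for conjugate models), since $p(\theta|d_1, d_2) \propto p(d_2|\theta)\, p(d_1|\theta)\, p(\theta)$ regardless of the order in which the likelihood factors are applied.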

Tim
  • So I assume then that $D$ could also be any subset of the data? Seems obvious, but since I am unsure on this I do not want to assume anything. – basicidea Nov 29 '18 at 02:11
  • @basicidea What I'm saying is that it is an abstract formula; it has nothing to do with what exactly your data is. – Tim Nov 29 '18 at 05:50