Bayes' theorem, as applied in a machine learning setting, is $$ p(\theta|D) = \frac{ p(D|\theta)\, p(\theta) }{ p(D) } $$ where $D$ is the data, $\theta$ are the model parameters, $p(\theta)$ is the prior, $p(\theta|D)$ is the posterior, and $p(D|\theta)$ is (I believe) the likelihood.
My question is about $D$. Typically, a machine learning (ML) model is fit to a collection of $N$ training data "points" $d_k,\ k = 1, \ldots, N$, and the likelihood factors over the data.
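By "factors" I mean the usual conditional-independence (i.i.d.) assumption, under which the likelihood of the whole collection is the product of the per-point likelihoods:

$$ p(D|\theta) = \prod_{k=1}^{N} p(d_k|\theta). $$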
In the typical ML scenario, does $D$ refer to a single data point, or to the collection of all $N$ data points? Or can it be either, and even refer to a subset of the data points?
In other words, is the theorem used, and valid, in each of these cases?

1. $D \equiv d_3$ (a particular data value),
2. $D \equiv \{ d_1, d_5, d_6 \}$ (a subset of values),
3. $D \equiv \{ d_k,\ k = 1, \ldots, N \}$ (all values).
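To make case 2 concrete, here is how I would write the theorem for that subset (the indices $1, 5, 6$ are just an arbitrary example, and the product form again assumes conditional independence):

$$ p(\theta | d_1, d_5, d_6) = \frac{ p(d_1, d_5, d_6 | \theta)\, p(\theta) }{ p(d_1, d_5, d_6) } = \frac{ p(\theta) \prod_{k \in \{1,5,6\}} p(d_k|\theta) }{ p(d_1, d_5, d_6) }. $$

Is this a valid application of the theorem?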