
In machine learning, the i.i.d. assumption means that the examples in a dataset are independent and drawn from the same probability distribution (i.e., identically distributed).

Here, the probability distribution is denoted by $p(x,y)$, where $x$ is a vector and $y$ is a scalar. I am confused about $p(x,y)$. Are both $x$ and $y$ random variables? When people say i.i.d., are they referring to $x$, to $y$, or to both? Or is there a single random variable here?

kjetil b halvorsen
Sanyo Mn
  • $x$ is a realisation of a random vector $\boldsymbol{X}$ (i.e. a multivariate random variable) with (say) $k$ components $x_1,\dots,x_k$, and $y$ is a realisation of a random variable $Y$ (i.e. a random vector of size $1$). $(x,y)=(x_1,\dots,x_k,y)$ is a realisation of the random vector $(\boldsymbol{X},Y)$, which has $k+1$ components, the first $k$ of which form a realisation of $\boldsymbol{X}$. This Wikipedia article will help you become acquainted with random vectors: https://en.wikipedia.org/wiki/Multivariate_random_variable. – Mickybo Yakari Jan 20 '20 at 10:28
  • @MickyboYakari Can we say that both X and Y are i.i.d.? – Sanyo Mn Jan 20 '20 at 11:08
  • No, because being i.i.d. is a property that applies to samples. One would have to say that $(\boldsymbol{X}_1,Y_1),(\boldsymbol{X}_2,Y_2),\dots,(\boldsymbol{X}_n,Y_n)$ are i.i.d., where $n$ is the sample size and each index pertains to one observation in the sample. We basically sampled $n$ realisations of the random vector $(\boldsymbol{X},Y)$ independently. – Mickybo Yakari Jan 20 '20 at 11:19
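
To make the last comment concrete, here is a minimal Python sketch of what "the pairs are i.i.d." can look like in practice. Everything specific in it is an assumption made for illustration and is not taken from the question: the joint distribution $p(x,y)$ (standard normal features, a linear relation plus noise), the weights, the helper name `sample_pair`, and the values of `k` and `n`. The point is only that each call draws one realisation of $(\boldsymbol{X},Y)$ from the same distribution, independently of the other calls, while $y$ may still depend on $x$ within a pair.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is reproducible


def sample_pair(rng, k=3):
    """Draw one realisation (x, y) from a hypothetical joint distribution p(x, y)."""
    x = rng.normal(size=k)                     # realisation of the random vector X
    w = np.arange(1, k + 1)                    # hypothetical fixed weights
    y = float(x @ w + rng.normal(scale=0.1))   # y depends on x within the pair
    return x, y


n = 5  # sample size
# Each pair comes from the same p(x, y) and is drawn independently of the
# others: this is the sense in which the sample of pairs is i.i.d.
sample = [sample_pair(rng) for _ in range(n)]

X = np.stack([x for x, _ in sample])  # shape (n, k): one row per observation
y = np.array([t for _, t in sample])  # shape (n,): one target per observation
```

Note that independence here is a statement across the rows of the dataset (pair $i$ versus pair $j$), not between $x$ and $y$ inside a single row.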
