What exactly is $p(x,y)$ in the context of iid assumption in machine learning?

Asked Jan 20 '20 at 09:33

Active Jan 12 '21 at 13:45


In machine learning, the i.i.d. assumption means that the examples in a dataset are independent of one another and drawn from the same probability distribution (i.e., identically distributed).

Here, the probability distribution is denoted by $p(x,y)$, where $x$ is a vector and $y$ is a scalar. I am confused about what $p(x,y)$ represents. Are both $x$ and $y$ random variables? When people say i.i.d., are they referring to $x$, to $y$, or to both? Or is there a single random variable here?
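To make the setting concrete, here is a minimal Python sketch of drawing i.i.d. examples from a joint distribution $p(x,y)$. The model is a hypothetical assumption chosen only for illustration: each component of $x$ is standard normal, and $y$ given $x$ is normal with mean equal to the sum of the components and standard deviation $0.5$. The function names `sample_pair` and `sample_iid` are likewise made up. Each draw produces a whole pair $(x, y)$, and it is these pairs that are independent and identically distributed.

```python
import random

def sample_pair(k, rng):
    """Draw one (x, y) pair from a hypothetical joint p(x, y) = p(x) p(y | x).

    Assumption for illustration only: each x_j ~ N(0, 1) independently,
    and y | x ~ N(sum(x), 0.5**2).
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(k)]  # realisation of the random vector X
    y = rng.gauss(sum(x), 0.5)                   # realisation of the scalar Y given X = x
    return x, y

def sample_iid(n, k, seed=0):
    """Draw n pairs independently, each from the same joint distribution."""
    rng = random.Random(seed)
    return [sample_pair(k, rng) for _ in range(n)]

data = sample_iid(n=5, k=3)
for x, y in data:
    print([round(v, 3) for v in x], round(y, 3))
```

Within a single pair, $x$ and $y$ are strongly dependent (that dependence is what a model learns); the independence in "i.i.d." holds across pairs.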

edited Jan 12 '21 at 13:45 by kjetil b halvorsen

asked by Sanyo Mn

$\boldsymbol{x}$ is a realisation of a random vector $\boldsymbol{X}$ (i.e. a multivariate random variable) with (say) $k$ components $x_1,\ldots,x_k$, and $y$ is a realisation of a random variable $Y$ (i.e. a random vector of size $1$). $(\boldsymbol{x},y)=(x_1,\ldots,x_k,y)$ is then a realisation of the random vector $(\boldsymbol{X},Y)$ with $k+1$ components, the first $k$ of which form a realisation of $\boldsymbol{X}$, and $p(\boldsymbol{x},y)$ is the joint distribution (density or mass function) of $(\boldsymbol{X},Y)$ evaluated at that point. This Wikipedia article will help you become acquainted with random vectors: https://en.wikipedia.org/wiki/Multivariate_random_variable. – Mickybo Yakari Jan 20 '20 at 10:28
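In symbols, the $k+1$ components are stacked into one random vector, and $p(\boldsymbol{x},y)$ is its joint density. By the definition of a conditional density (wherever $p(\boldsymbol{x})>0$), it splits into the marginal of $\boldsymbol{X}$ and the conditional of $Y$ given $\boldsymbol{X}$:

$$
(\boldsymbol{X}, Y) = (X_1, \ldots, X_k, Y), \qquad
p(\boldsymbol{x}, y) = p(x_1, \ldots, x_k, y) = p(\boldsymbol{x})\, p(y \mid \boldsymbol{x}).
$$

Supervised learning typically targets the conditional factor $p(y \mid \boldsymbol{x})$, while the i.i.d. assumption is a statement about the whole joint $p(\boldsymbol{x}, y)$.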
@MickyboYakari Can we say that both $\boldsymbol{X}$ and $Y$ are i.i.d.? – Sanyo Mn Jan 20 '20 at 11:08

No, because being i.i.d. is a property of a sample, i.e. of a collection of random variables, not of a single random variable. One would have to say that $(\boldsymbol{X}_1,Y_1),(\boldsymbol{X}_2,Y_2),\ldots,(\boldsymbol{X}_n,Y_n)$ are i.i.d., where $n$ is the sample size and each index pertains to one observation in the sample. In other words, we sample $n$ realisations of the random vector $(\boldsymbol{X},Y)$ independently, all from the same joint distribution $p(\boldsymbol{x},y)$. – Mickybo Yakari Jan 20 '20 at 11:19
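A practical consequence of the pairs being i.i.d. is that the joint density of the whole sample is the product of $n$ copies of the same $p(\boldsymbol{x},y)$, so the log-likelihood is a sum over observations. The minimal sketch below assumes the same hypothetical model as a purely illustrative choice ($x_j \sim N(0,1)$ independently, $y \mid \boldsymbol{x} \sim N(\sum_j x_j, 0.5^2)$); the names `log_p_pair`, `log_likelihood`, and `SIGMA_Y`, and the small sample, are made up.

```python
import math

SIGMA_Y = 0.5  # hypothetical noise scale of Y given X (illustration only)

def normal_logpdf(v, mu, sigma):
    """Log density of N(mu, sigma**2) at v."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (v - mu) ** 2 / (2 * sigma**2)

def log_p_pair(x, y):
    """log p(x, y) = log p(x) + log p(y | x) under the hypothetical model:
    x_j ~ N(0, 1) independently, y | x ~ N(sum(x), SIGMA_Y**2)."""
    log_px = sum(normal_logpdf(xj, 0.0, 1.0) for xj in x)
    log_py_given_x = normal_logpdf(y, sum(x), SIGMA_Y)
    return log_px + log_py_given_x

def log_likelihood(sample):
    """Under the i.i.d. assumption the joint density of the sample is the
    product of p(x_i, y_i), so its logarithm is a sum over observations."""
    return sum(log_p_pair(x, y) for x, y in sample)

# Made-up sample of n = 3 pairs, each with k = 2 features.
sample = [([0.1, -0.3], -0.1), ([1.2, 0.4], 1.5), ([-0.7, 0.2], -0.6)]
print(log_likelihood(sample))
# Same quantity computed as the log of the product of per-pair densities:
print(math.log(math.prod(math.exp(log_p_pair(x, y)) for x, y in sample)))
```

Summing logs rather than multiplying densities is the usual choice because products of many small densities underflow; the two printed values agree up to floating-point rounding.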
