Clarifications on I.I.D. assumption in machine learning

Question

In this question, it was stated that the i.i.d. assumption for data takes the form $$(X_i,y_i)\sim P(X,y),\;\forall i=1,\dots,N \\ (X_i,y_i)\text{ independent of }(X_j,y_j),\;\forall i\neq j\in\{1,\dots,N\}$$ I understand the definition of i.i.d. and its concepts; however, it is still unclear to me how this assumption applies in practice.

To illustrate my confusion with an example, say we are looking at a classification problem, where $X$ is the input feature and $y$ is the label.

When we generate $n$ samples for training, I think of it as drawing $(X_i,y_i)$ from the joint distribution of $X$ and $y$. How is the concept of being independent and identically distributed relevant here, then? Aren't $(X_i,y_i)$, for $i=1,\dots,n$, all being drawn from the same distribution of $X$ and $y$?

The [related answer](https://stats.stackexchange.com/questions/445453/445477#445477) might also help. — Ben, Sep 30 '21 at 10:14

score 0 · Accepted Answer · answered Sep 30 '21 at 06:23

The details are discussed in the thread "On the importance of the i.i.d. assumption in statistical learning", but to answer your question: the $n$ samples you observed are considered to be random variables. So the $(X_i, y_i)$ pairs are thought of as $n$ random variables. Only random variables can be independent or have probability distributions, so if they are "independent and identically distributed", we must be talking about random variables. To be able to think about your data in probabilistic terms, you need to think of them as random variables.
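To make the "samples as random variables" view concrete, here is a minimal simulation sketch. It assumes a toy joint distribution that is not from the question (a fair-coin label and a Gaussian feature), and the names `draw_pair` and `draw_dataset` are purely illustrative. The idea is that the dataset you actually observed is one realization; if you imagine repeating the data collection many times, the value in position 1 varies from repetition to repetition, which is what makes $X_1$ a random variable.

```python
import random

def draw_pair(rng):
    """Draw one (x, y) from a toy joint distribution P(X, y) (an assumption)."""
    y = 1 if rng.random() < 0.5 else 0   # label: fair coin
    x = rng.gauss(2.0 * y, 1.0)          # feature: Gaussian centred at 0 or 2
    return x, y

def draw_dataset(n, rng):
    """Draw n pairs, each with a fresh call, so the pairs are i.i.d. by construction."""
    return [draw_pair(rng) for _ in range(n)]

def mean(values):
    return sum(values) / len(values)

def correlation(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(0)
n, repeats = 5, 20000

# Each element of `datasets` is one hypothetical repetition of the data collection.
datasets = [draw_dataset(n, rng) for _ in range(repeats)]

# Values taken by X_1 and X_2 (first and second sample) across the repetitions.
x1 = [d[0][0] for d in datasets]
x2 = [d[1][0] for d in datasets]

mean_x1, mean_x2 = mean(x1), mean(x2)   # "identically distributed": similar summaries
corr_12 = correlation(x1, x2)           # "independent": correlation close to zero

print(f"mean of X_1: {mean_x1:.3f}, mean of X_2: {mean_x2:.3f}")
print(f"correlation between X_1 and X_2: {corr_12:.3f}")
```

Zero correlation is only a necessary consequence of independence, not a proof of it; here independence holds by construction because every pair comes from a separate call to `draw_pair`.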

Tim


The main thing I am confused about is this: say we are doing an image classification task for animals, so $X$ is the input feature and therefore ranges over all possible image vectors that contain animals. Isn't each training example just a sample from this population? Why do we need to consider each sample as a random variable in its own right? – tangolin Sep 30 '21 at 07:02
@tangolin because otherwise you cannot build a probabilistic model for it. If those are "just numbers", they don't have probability distributions, etc. You need to think of them as random variables so you can put them into the probabilistic framework. – Tim Sep 30 '21 at 07:05
