Question: In layman's terms, what does it mean for data (say, $n$ samples of $m$ variables) to be identically distributed, and how is that property practically achieved when doing machine learning?
So let's say each sample measures two predictors, $X$ and $Y$, and I want to predict an outcome $Z$.
To satisfy the "identically distributed" part of IID, should I be aiming for $X$ and $Y$ to look the same when I plot their estimated density functions? Or does "identically distributed" mean something else entirely?
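To make that concrete, this is roughly how I have been eyeballing the predictors' distributions (the data frame and values here are placeholders, not my real data):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data standing in for my real samples.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "X": rng.normal(loc=0.0, scale=1.0, size=1000),
    "Y": rng.exponential(scale=2.0, size=1000),
})

# Kernel-density estimates of each predictor, overlaid for comparison.
df["X"].plot(kind="kde", label="X")
df["Y"].plot(kind="kde", label="Y")
plt.legend()
plt.title("Estimated densities of the predictors")
plt.show()
```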
I have also read about "normalizing" (standardizing) sample data by subtracting the mean and dividing by the standard deviation, i.e. $z = (x - \bar{x}) / s$. Is the purpose of this transformation to make the data identically distributed?
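In code, I understand that transformation to mean something like the following (again a sketch with placeholder data):

```python
import numpy as np
import pandas as pd

# Placeholder predictors with different means and spreads.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "X": rng.normal(loc=5.0, scale=3.0, size=1000),
    "Y": rng.exponential(scale=2.0, size=1000),
})

# z-score standardization: subtract each column's mean, divide by its std.
standardized = (df - df.mean()) / df.std()

print(standardized.mean())  # roughly 0 for each column
print(standardized.std())   # roughly 1 for each column
```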
The context of my question is machine learning, specifically XGBoost on a regression problem. The model is performing in a mediocre to poor fashion and I want to understand why. Perhaps my data is not IID; I thought that didn't matter for tree-based models, but I am obviously missing something.
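For reference, my setup looks roughly like this (placeholder data and off-the-cuff hyperparameters; the real features and target are different):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from xgboost import XGBRegressor

# Placeholder data standing in for my real predictors X, Y and outcome Z.
rng = np.random.default_rng(0)
n = 1000
features = pd.DataFrame({
    "X": rng.normal(size=n),
    "Y": rng.exponential(scale=2.0, size=n),
})
z = 2.0 * features["X"] - 0.5 * features["Y"] + rng.normal(scale=0.5, size=n)

X_train, X_test, z_train, z_test = train_test_split(
    features, z, test_size=0.2, random_state=0
)

# Fit a plain XGBoost regressor and check held-out performance.
model = XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=4)
model.fit(X_train, z_train)

print("test R^2:", r2_score(z_test, model.predict(X_test)))
```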