How can I generate data that satisfies the i.i.d. assumption made in many machine learning applications?

Asked Oct 04 '19 at 15:15

Active Oct 04 '19 at 17:35

Viewed 136 times

In many examples in data science and machine learning, the training data and the targets are assumed to be generated in an i.i.d. (independent and identically distributed) fashion.

Example:

I'm not curious as to why we need the data to be i.i.d. This is obviously to simplify the math.

But how can this assumption be satisfied in real life? How do I ensure that the data set I generate and use for training is i.i.d.?

edited Oct 04 '19 at 17:35

Curaçao Hajek

It's hard to see what you're looking for in an answer. Ultimately we use either a computer pseudo random number generator or physical devices like coin flips, dice, radioactivity measurements, and so on to generate random data. Are you asking about that? Or perhaps asking how we can check that these devices produce data that are sufficiently independent for simulation? Or something else? – whuber Oct 04 '19 at 17:39
@whuber I'm genuinely confused as to why this question is unclear. We train a neural network with some data (along with targets), and in many analyses it is assumed that the data were generated by an independent and identically distributed process. These data range from scalar variables, video frames, images of people, images of stars in the sky, sequences of voices, and alphabets to control signals to a car: basically anything. I am merely asking how to generate data in an i.i.d. fashion. How do you do it physically? (See the first post in the Linked) – Curaçao Hajek Oct 04 '19 at 22:08
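As a rough illustration of the pseudorandom-generator route mentioned in the first comment, here is a minimal Python sketch. It makes several assumptions. A fixed pool of already-collected values stands in for the "population". Drawing uniformly with replacement from that pool gives draws that are independent and identically distributed with respect to the pool's empirical distribution, up to the quality of the generator. A near-zero lag-1 autocorrelation is used only as a crude symptom check, not as a proof of independence. The names `draw_iid_sample`, `lag1_autocorrelation`, and the toy `ordered` data are hypothetical and do not come from the question.

```python
import random


def draw_iid_sample(population, n, seed=0):
    """Draw n items uniformly at random WITH replacement.

    Each draw uses the same distribution (the empirical distribution of
    `population`) and does not depend on earlier draws, apart from the
    limitations of the pseudorandom generator itself.
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return [rng.choice(population) for _ in range(n)]


def lag1_autocorrelation(xs):
    """Sample correlation between each value and its successor."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


# Hypothetical data recorded in order: a slow drift, so neighbouring
# values are strongly dependent (think consecutive video frames).
ordered = [i / 1000 for i in range(1000)]

# Resampling at random breaks the dependence on recording order.
sample = draw_iid_sample(ordered, 1000, seed=0)

print("lag-1 autocorrelation, recorded order:", lag1_autocorrelation(ordered))
print("lag-1 autocorrelation, random draws:  ", lag1_autocorrelation(sample))
```

The first value should come out close to 1 and the second close to 0. This says nothing about whether the pool itself is representative of the real-world process. That depends on how the data were collected, which no amount of resampling can repair.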

0 Answers