What does it mean to obtain a sample $S$ of size $n$ according to a distribution $D$ over a set $X$ in machine learning?

Question

What @user2974951 says. I don't understand the role of $S$ in particular. Are you asked to sample according to $D$ and call the resulting set of numbers $S$? Or is $S$ an integer, and you are supposed to sample $S$ realizations of $D$? — Stephan Kolassa, Sep 19 '19 at 09:01
Re the edit that introduced "X:" you have changed the terminology in the question (but not in its title!) without really changing the question. Now "X" refers to the population being sampled whereas originally you used "S" to refer to it, and now "S" refers to the sample. This inconsistency will render both your question and my answer unintelligible, so I am taking the liberty of restoring your original notation while trying to respect the additional precision of the edit. — whuber, Sep 20 '19 at 14:50

score 3 · Answer 1 · answered Sep 19 '19 at 14:45

When the sampling is termed "random" it usually means you do the equivalent of the following, as described in more detail at https://stats.stackexchange.com/a/54894/919 and https://stats.stackexchange.com/a/96000/919:

1. Write down the name of each element of the set $S$ on one or more slips of paper (the "tickets").
2. Place these tickets into a box in such quantities that the proportion of tickets identified by any $\omega\in S$ equals the probability of $\omega$ assigned by the distribution $D.$
3. Repeatedly perform the following operation as many times as required $(n):$
   - Thoroughly mix the tickets.
   - Blindly withdraw one.
   - Record the element of $S$ indicated on that ticket.
   - Replace the ticket in the box.
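
The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-element set, its probabilities, and the helper names `build_box` and `draw_sample` are all made up for the example, and the probabilities are taken to be rational so that a finite box of tickets can reproduce them exactly.

```python
import random
from fractions import Fraction
from math import lcm  # multi-argument lcm needs Python 3.9+

# Hypothetical distribution D over a hypothetical set S = {"a", "b", "c"}.
D = {"a": Fraction(1, 2), "b": Fraction(3, 10), "c": Fraction(1, 5)}

def build_box(D):
    """Fill the box so each element's share of tickets equals its probability."""
    total = lcm(*(p.denominator for p in D.values()))
    box = []
    for element, p in D.items():
        box.extend([element] * int(p * total))
    return box

def draw_sample(box, n, rng):
    """Draw n tickets with replacement, mixing before every draw."""
    sample = []
    for _ in range(n):
        rng.shuffle(box)       # thoroughly mix the tickets
        ticket = box[0]        # blindly withdraw one
        sample.append(ticket)  # record the element it names
        # the ticket never left the list, so it is already "replaced"
    return sample

rng = random.Random(0)         # seeded only to make the run repeatable
box = build_box(D)
sample = draw_sample(box, 1000, rng)
```

With these made-up probabilities the box holds ten tickets (five "a", three "b", two "c"), and `sample` is an ordered list of 1000 elements of the set.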

The result of this process, the sample, is an ordered list of elements of $S.$ In order to proceed with analysis, usually some array of numbers, known in ML as "features," is associated with each element. Consequently (when every element of $S$ has the same $k$ features) the sample can be represented as an $n\times k$ array of numbers.
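
As a rough illustration of that $n\times k$ representation, the sketch below attaches $k=2$ invented features to each element of a hypothetical three-element set and stacks the features of the sampled elements into rows. The names `features`, `weights`, and `array`, and all the numbers, are assumptions made for the example.

```python
import random

# Hypothetical set with k = 2 made-up features per element.
features = {
    "a": [1.0, 0.5],
    "b": [2.0, 1.5],
    "c": [3.0, 2.5],
}
# Hypothetical probabilities assigned by D to each element.
weights = {"a": 0.5, "b": 0.3, "c": 0.2}

rng = random.Random(0)
n = 5
elements = list(features)

# Draw n elements with replacement according to the weights.
sample = rng.choices(elements, weights=[weights[w] for w in elements], k=n)

# One row per sampled element, one column per feature: an n x k array.
array = [features[w] for w in sample]
```

Row $i$ of `array` holds the features of the $i$-th element drawn, so repeated elements in the sample produce repeated rows.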

Random sampling can be analyzed mathematically by considering the properties of the ticket-mixing process. This is the basis for applying probability theory to statistics. When the sample is not called "random," the tickets may have been withdrawn from the box without mixing them, or even selected after peeking at them. Non-random samples are difficult or impossible to analyze mathematically.
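
One simple consequence of the mixing-and-replacing procedure, for the random case: each draw is independent of the others and has the same distribution, so the chance of any particular ordered sample is a product. The notation $D(\omega)$ for the probability assigned to $\omega$ is an assumption made here, and the formula is written for a discrete set.

$$\Pr\big((\omega_1,\ldots,\omega_n)\big) = \prod_{i=1}^{n} D(\omega_i)$$

For instance, with hypothetical probabilities $D(a)=1/2$ and $D(b)=3/10,$ the ordered sample $(a,b,a)$ has probability $\tfrac12\cdot\tfrac{3}{10}\cdot\tfrac12=\tfrac{3}{40}.$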

The physical mechanism used to implement these steps may vary. In modern lotteries the "tickets" are balls in a ball machine; Francis Galton invented a set of dice; but most often pseudo-random number generators in computers are exploited to conduct the sampling.
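
To illustrate the pseudo-random case, here is a minimal sketch using Python's standard `random` module. The population, weights, and the helper name `draw` are invented for the example. Because the generator is deterministic given its seed, two runs with the same seed produce the same sample.

```python
import random

population = ["a", "b", "c"]   # hypothetical set
weights = [0.5, 0.3, 0.2]      # hypothetical probabilities from D

def draw(seed, n):
    """Draw n elements with replacement using a generator seeded with `seed`."""
    rng = random.Random(seed)
    return rng.choices(population, weights=weights, k=n)

first = draw(seed=42, n=10)
second = draw(seed=42, n=10)   # same seed, so the same sequence of draws
```

Here `first == second` holds, which is what makes such computer-generated samples reproducible.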
