The difference between sample and case in machine learning and statistics?

Question

In this question and in this Keras API, a "sample" seems to mean what statistics calls a case. The documentation of that API states:

Optional Numpy array of weights for the test samples, used for weighting the loss function. You can either pass a flat (1D) Numpy array with the same length as the input samples (1:1 mapping between weights and samples), or in the case of temporal data, you can pass a 2D array with shape (samples,sequence_length), to apply a different weight to every timestep of every sample. This argument is not supported when x is a dataset, instead pass sample weights as the third element of x.

As I understand it, "sample" here is used in the same way as in the aforementioned question. So my question is: why does machine learning call a sample what statistics calls a case? In statistics, a sample comprises multiple cases and is part of a population.
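To make the two usages concrete, here is a minimal NumPy sketch of the shapes the quoted documentation describes. The data, the variable names, and the weighted-average reduction are all illustrative assumptions; this is not Keras's internal implementation, and Keras may normalize the weighted loss differently.

```python
import numpy as np

# Hypothetical toy data: 4 rows and 3 features.
# Keras terminology: 4 "samples". Statistics terminology: one sample of 4 cases.
x = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6],
              [0.7, 0.8, 0.9],
              [1.0, 1.1, 1.2]])
y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_pred = np.array([0.5, 0.5, 1.0, 0.0])  # made-up predictions

n_cases = x.shape[0]

# Flat (1D) weights: one weight per row, i.e. a 1:1 mapping between
# weights and "samples" in the Keras sense.
sample_weight = np.array([1.0, 2.0, 1.0, 0.0])
assert sample_weight.shape == (n_cases,)

# Squared error per row, then a weighted average over rows.
# Dividing by the sum of the weights is an assumption made for illustration.
per_case_loss = (y_true - y_pred) ** 2
weighted_loss = np.sum(sample_weight * per_case_loss) / np.sum(sample_weight)

# Temporal data: a 2D weight array of shape (samples, sequence_length),
# one weight per timestep of every row.
sequence_length = 5  # hypothetical
temporal_weight = np.ones((n_cases, sequence_length))

print(weighted_loss)
print(temporal_weight.shape)
```

In both layouts the first axis indexes the individual rows, which is why a "sample weight" in the documentation corresponds to a weight per case in statistical language.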

score 2 · Accepted Answer · answered Aug 15 '20 at 10:07

I think the main explanation is simply different traditions. The fields started out, more or less arbitrarily, with different terminology, and each has kept its own. I do not think there is any deep, philosophical explanation. By the way, the terminology in applications of statistics and machine learning also varies across fields; some (biology?) probably use "sample" instead of "case" for statistical problems as well, as they just copy the term(s) from their domain.

Richard Hardy

Elsewhere I wrote: In statistics, a sample includes several values, and repeated sampling is a high theoretical virtue, but one rarely practised, except by simulation ... In many sciences, a sample is a single object, consisting of a lump, chunk or dollop of water, soil, sediment, rock, blood, tissue, or other substances ...; far from being exceptional, taking many samples may be essential for any serious analysis. Here every field's terminology makes perfect sense to its people, but translation is sometimes needed. – Nick Cox Aug 15 '20 at 12:03
https://stats.stackexchange.com/questions/202879/what-misused-statistical-terms-are-worth-correcting/202886#202886 – Nick Cox Aug 15 '20 at 12:03
