Questions tagged [sample]

A sequence of objects or individuals collected from a larger (possibly infinite) population or process.

In practice it is rarely possible to take measurements from every object composing the population of interest. Hence, statistics, in general, is concerned with using samples to make inferences about the parameters governing the population.

There are different strategies for selecting representative samples from the population, such as random sampling, stratified sampling, and longitudinal sampling.
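
As a minimal sketch of two of these strategies, the Python snippet below draws a simple random sample and a proportionally allocated stratified sample from a made-up population. The population, the stratum labels, and the helper name `stratified_sample` are illustrative assumptions, not part of the tag description.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, rng):
    """Draw the same fraction from every stratum (proportional allocation)."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))  # without replacement within a stratum
    return sample

rng = random.Random(0)
# Hypothetical population: 900 units labelled "urban", 100 labelled "rural".
population = [("urban", i) for i in range(900)] + [("rural", i) for i in range(100)]

srs = rng.sample(population, 100)  # simple random sample without replacement
strat = stratified_sample(population, lambda u: u[0], 0.10, rng)

rural_in_srs = sum(1 for label, _ in srs if label == "rural")
rural_in_strat = sum(1 for label, _ in strat if label == "rural")
print(len(srs), rural_in_srs)      # rural count varies from draw to draw
print(len(strat), rural_in_strat)  # rural count is fixed by the allocation
```

The stratified draw always contains exactly 10% of each stratum, whereas the share of "rural" units in the simple random sample fluctuates around 10%.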

917 questions
103 votes, 25 answers

Locating freely available data samples

I've been working on a new method for analyzing and parsing datasets to identify and isolate subgroups of a population without foreknowledge of any subgroup's characteristics. While the method works well enough with artificial data samples (i.e.…
EAMann
45 votes, 5 answers

What is the difference between a population and a sample?

What is the difference between a population and a sample? What common variables and statistics are used for each one, and how do those relate to each other?
Baltimark
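
For orientation, the usual textbook notation pairs each population parameter with the sample statistic that estimates it. The block below is a sketch assuming a finite population of size $N$ and a sample of size $n$; the symbols are the conventional ones rather than anything taken from the question, and $x_i$ is used loosely for both population values and sampled values.

```latex
\begin{aligned}
\text{Population mean: } \mu &= \frac{1}{N}\sum_{i=1}^{N} x_i
& \text{Sample mean: } \bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\text{Population variance: } \sigma^2 &= \frac{1}{N}\sum_{i=1}^{N} (x_i-\mu)^2
& \text{Sample variance: } s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i-\bar{x})^2
\end{aligned}
```

The left column describes fixed but usually unknown quantities; the right column describes quantities computed from data, which change from sample to sample.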
28 votes, 3 answers

What if your random sample is clearly not representative?

What if you take a random sample and you can see it is clearly not representative, as in a recent question? For example, what if the population distribution is supposed to be symmetric around 0 and the sample you draw randomly has unbalanced…
Joel W.
26 votes, 7 answers

Do we need hypothesis testing when we have all the population?

From what I understand, hypothesis testing is done to determine whether a finding in the sample population is statistically significant. But if I have census data, do we really need hypothesis tests? I was thinking maybe I should perform multiple…
22 votes, 9 answers

How do I figure out what kind of distribution represents this data on ping response times?

I've sampled a real-world process: network ping times. The "round-trip time" is measured in milliseconds. Results are plotted in a histogram: Ping times have a minimum value, but a long upper tail. I want to know what statistical distribution this…
20 votes, 2 answers

What is the difference between random variable and random sample?

These two expressions confused me a lot when I was learning statistics. It seems to me that they are totally different things. Taking a random sample means randomly drawing a sample from a population, whereas a random variable is like a function that maps the…
18 votes, 1 answer

Why do we use term “population” instead of “Data-generating process”?

I have always been confused about the use of the term “population” in statistics. In my first statistics course I was taught that we need a sample, because surveying the whole population is too costly. So there is the whole population and there is…
18 votes, 2 answers

How do sample weights work in classification models?

What does it mean to provide weights to each sample in a classification algorithm? How does a classification algorithm (e.g., logistic regression, SVM) use weights to give more emphasis to certain examples? I would love to go into the details to…
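
One common way weights enter a classifier is as multipliers on each example's term in the loss. This is an assumption about the typical setup, not a statement about any particular library. The sketch below uses a one-feature logistic model with hypothetical coefficients `w` and `b` and shows that giving an example weight 2 yields the same loss as listing it twice.

```python
import math

def weighted_log_loss(w, b, xs, ys, weights):
    """Weighted negative log-likelihood of a one-feature logistic model."""
    total = 0.0
    for x, y, sw in zip(xs, ys, weights):
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted P(y = 1 | x)
        total += -sw * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total

xs = [0.5, -1.0, 2.0]   # made-up feature values
ys = [1, 0, 1]          # made-up labels

# Weight 2 on the first example ...
loss_weighted = weighted_log_loss(0.3, -0.1, xs, ys, [2.0, 1.0, 1.0])
# ... matches the loss obtained by listing that example twice with weight 1.
loss_duplicated = weighted_log_loss(0.3, -0.1, [0.5] + xs, [1] + ys, [1.0] * 4)

print(loss_weighted, loss_duplicated)
```

Because the fitted coefficients minimize this loss, a heavily weighted example pulls the decision boundary more strongly than a lightly weighted one.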
17 votes, 2 answers

What is the difference between sample variance and sampling variance?

What is the difference between sample variance and sampling variance? They seem the same. Aren't they?
ilhan
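
A small simulation can make the two terms concrete. In the sketch below, the normal population, its standard deviation `sigma`, and the sample size `n` are assumptions chosen for illustration. The sample variance describes the spread of observations within one sample, while the sampling variance of a statistic (here the mean) describes how that statistic varies across repeated samples.

```python
import random
import statistics

rng = random.Random(1)
n = 25        # size of each sample (assumed)
sigma = 2.0   # population standard deviation (assumed)

def draw_sample():
    return [rng.gauss(0.0, sigma) for _ in range(n)]

# Sample variance: spread of the observations within ONE sample.
one_sample = draw_sample()
sample_variance = statistics.variance(one_sample)  # estimates sigma**2 = 4

# Sampling variance of the mean: spread of the mean ACROSS repeated samples.
means = [statistics.fmean(draw_sample()) for _ in range(5000)]
sampling_variance_of_mean = statistics.variance(means)  # theory: sigma**2 / n = 0.16

print(sample_variance, sampling_variance_of_mean)
```

The first number should land somewhere near 4 and the second near 0.16, which shows that the two quantities measure different things.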
16 votes, 3 answers

Bootstrap: the issue of overfitting

Suppose one performs the so-called non-parametric bootstrap by drawing $B$ samples of size $n$ each from the original $n$ observations with replacement. I believe this procedure is equivalent to estimating the cumulative distribution function by the…
James
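
The resampling step described in the excerpt can be sketched as follows: draw $B$ resamples of size $n$ with replacement from the observed values and recompute a statistic on each. The data, the choice of the mean as the statistic, and the helper name `bootstrap_means` are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_means(data, B, rng):
    """Return B bootstrap replicates of the sample mean."""
    n = len(data)
    replicates = []
    for _ in range(B):
        resample = rng.choices(data, k=n)  # n draws with replacement
        replicates.append(statistics.fmean(resample))
    return replicates

rng = random.Random(42)
data = [rng.expovariate(1.0) for _ in range(50)]  # hypothetical observations
reps = bootstrap_means(data, B=2000, rng=rng)

boot_se = statistics.stdev(reps)  # bootstrap standard error of the mean
# For the mean, the ideal (B -> infinity) bootstrap SE has a closed form:
plugin_se = statistics.pstdev(data) / len(data) ** 0.5

print(boot_se, plugin_se)
```

The two standard errors should agree closely. For the mean this closed form exists, which is why it serves as a sanity check here; for most other statistics only the simulated value is available.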
15 votes, 1 answer

Large sample asymptotic/theory - Why to care about?

I hope that this question does not get marked as "too general" and that it starts a discussion that benefits all. In statistics, we spend a lot of time learning large sample theories. We are deeply interested in assessing asymptotic properties of…
Sam
15 votes, 2 answers

How can the central limit theorem hold for distributions which have limits on the random variable?

I've always taken issue with, and never been given a good answer for, how it is possible that the central limit theorem - the classical version where the distribution of sample means approaches normality - can apply to, say, a Poisson or Gamma…
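
A quick simulation can illustrate the point in question. The sketch below assumes a Gamma distribution with shape 2 and scale 1, and a sample size of 100; both are arbitrary choices. The sample means remain bounded below by 0, yet their distribution is far less skewed than the raw draws, because the bound sits many standard deviations away from the centre of the distribution of means.

```python
import random
import statistics

def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(7)

# Gamma(shape=2, scale=1): bounded below by 0 and right-skewed.
raw = [rng.gammavariate(2.0, 1.0) for _ in range(20000)]

def mean_of_n(n):
    return statistics.fmean([rng.gammavariate(2.0, 1.0) for _ in range(n)])

means = [mean_of_n(100) for _ in range(5000)]

print(skewness(raw), skewness(means))  # skewness of the means is much closer to 0
print(min(means))                      # still positive: the bound has not gone away
```

The theorem concerns convergence in distribution of the standardized mean, so a hard bound on the underlying variable does not contradict it; the approximation is simply poorest in the far tails.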
13 votes, 4 answers

Is any quantitative property of the population a "parameter"?

I'm relatively familiar with the distinction between the terms statistic and parameter. I see a statistic as the value obtained from applying a function to the sample data. However, most examples of parameters relate to defining a parametric…
Jeromy Anglim
13 votes, 5 answers

If not a Poisson, then what distribution is this?

I have a data set containing the number of actions performed by individuals over the course of 7 days. The specific action shouldn't be relevant for this question. Here are some descriptive statistics for the data set: $$ \begin{array}{|c|c|}…
Dcook
13 votes, 4 answers

How to take many samples of 10 from a large list, without replacement overall

I've got a large set of data (20,000 data points), from which I want to take repeated samples of 10 data points. However, once I've picked those 10 data points, I want them to not be picked again. I've tried using the sample function, but it doesn't…
robintw
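
One way to get repeated samples of 10 with no point reused is to shuffle the data once and cut it into consecutive blocks. The sketch below is in Python rather than the `sample` function the question refers to, and the helper name `disjoint_samples` and the stand-in data are assumptions.

```python
import random

def disjoint_samples(items, size, rng):
    """Shuffle a copy once, then cut it into consecutive blocks of `size`."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    usable = len(shuffled) - len(shuffled) % size  # drop any incomplete final block
    return [shuffled[i:i + size] for i in range(0, usable, size)]

rng = random.Random(3)
data = list(range(20000))  # stand-in for the 20,000 data points
samples = disjoint_samples(data, 10, rng)

print(len(samples))  # number of non-overlapping samples of 10
print(samples[0])    # the first sample
```

Each block is a simple random sample of the data, and because the blocks partition a single permutation, no data point can appear in more than one of them.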