Questions tagged [small-sample]

Refers to statistical complications or problems that arise from having few data. If your question is about a small sample relative to the number of variables, please use the [underdetermined] tag instead.

Having access to only a small sample can cause several problems, for example: statistical tests may be underpowered; assumptions may be more difficult to verify; and if the data are far from normal, the sample may be too small for the Central Limit Theorem to make the relevant sampling distributions approximately normal.
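As a rough illustration of the first point, the Python sketch below estimates by simulation how often a pooled two-sample t-test detects a true difference of half a standard deviation. The group sizes (10 and 64 per group), the effect size, the number of replications and the hardcoded critical values are illustrative assumptions, not taken from any question on this page.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t from a standard t table,
# assumed here so the sketch needs only the standard library.
# Keys are the per-group sample size n; df = 2n - 2 for the pooled test.
T_CRIT = {10: 2.101, 64: 1.979}


def pooled_t(x, y):
    """Pooled-variance (equal-variance) two-sample t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def simulated_power(n, effect=0.5, reps=2000, seed=1):
    """Fraction of simulated experiments in which the t-test rejects H0.

    Both groups are drawn from normal distributions with SD 1; the first
    group's mean is shifted by `effect`, so H0 is false in every replicate.
    """
    rng = random.Random(seed)
    crit = T_CRIT[n]
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(effect, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(pooled_t(x, y)) > crit:
            rejections += 1
    return rejections / reps


for n in (10, 64):
    print(f"n = {n} per group: estimated power {simulated_power(n):.2f}")
```

With 10 observations per group the estimated power should come out well below 0.3, whereas with 64 per group it should be close to 0.8, the figure usually quoted for a medium effect at that sample size. If SciPy is available, `scipy.stats.ttest_ind` could replace the hand-written `pooled_t` and the table lookup.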

603 questions
124 votes · 7 answers

How to choose between a t-test and a non-parametric test (e.g. Wilcoxon) in small samples

Certain hypotheses can be tested using Student's t-test (maybe using Welch's correction for unequal variances in the two-sample case), or by a non-parametric test like the Wilcoxon paired signed rank test, the Wilcoxon-Mann-Whitney U test, or the…
81 votes · 4 answers

Can bootstrap be seen as a "cure" for the small sample size?

This question has been triggered by something I read in this graduate-level statistics textbook and also (independently) heard during this presentation at a statistical seminar. In both cases, the statement was along the lines of "because the sample…
James
58 votes · 6 answers

Warning in R - Chi-squared approximation may be incorrect

I have data showing firefighter entrance exam results. I am testing the hypothesis that exam results and ethnicity are not mutually independent. To test this, I ran a Pearson chi-square test in R. The results show what I expected, but it gave a…
49 votes · 6 answers

Best method for short time-series

I have a question related to modeling short time-series. The question is not whether to model them, but how. What method would you recommend for modeling (very) short time-series (say of length $T \leq 20$)? By "best" I mean here the most robust one,…
Tim
48 votes · 5 answers

What can we say about population mean from a sample size of 1?

I am wondering what we can say, if anything, about the population mean $\mu$ when all I have is one measurement, $y_1$ (sample size of 1). Obviously, we'd love to have more measurements, but we can't get them. It seems to me that since the sample…
thedu
29 votes · 2 answers

How to cope with exploratory data analysis and data dredging in small-sample studies?

Exploratory data analysis (EDA) often leads one to explore other "tracks" that do not necessarily belong to the initial set of hypotheses. I face such a situation in the case of studies with a limited sample size and a lot of data gathered through…
chl
28 votes · 4 answers

How to perform Student's t-test when only the sample size, sample average and population average are known?

Student's $t$-test requires the sample standard deviation $s$. However, how do I compute $s$ when only the sample size and sample average are known? For example, if sample size is $49$ and sample average is $112$, I will then attempt to create a…
Kit
25 votes · 1 answer

Using bootstrap under H0 to perform a test for the difference of two means: replacement within the groups or within the pooled sample

Suppose that I have data with two independent groups: g1.lengths <- c (112.64, 97.10, 84.18, 106.96, 98.42, 101.66) g2.lengths <- c (84.44, 82.10, 83.26, 81.02, 81.86, 86.80, 85.84, 97.08, 79.64, 83.32, 91.04, 85.92, …
24 votes · 2 answers

Breast milk of 6 Corona-positive (COVID-19) women does not contain the virus: can we make a confidence statement about this?

I am asking this question because I believe it would be great if the statistics community could make a contribution to solving this serious puzzle until more evidence is available. The UK Royal College of Obstetricians and Gynaecologists publishes…
24 votes · 7 answers

Appropriate normality tests for small samples

So far, I've been using the Shapiro-Wilk statistic in order to test normality assumptions in small samples. Could you please recommend another technique?
23 votes · 2 answers

Can a small sample size cause type 1 error?

I've learnt that a small sample size may lead to insufficient power and type 2 error. However, I have the feeling that small samples may simply be unreliable in general and may produce any kind of result by chance. Is that true?
even
23 votes · 2 answers

Topic stability in topic models

I am working on a project where I want to extract some information about the content of a series of open-ended essays. In this particular project, 148 people wrote essays about a hypothetical student organization as part of a larger experiment. …
22 votes · 11 answers

Does machine learning really need data-efficient algorithms?

Deep learning methods are often said to be very data-inefficient, requiring 100-1000 examples per class, where a human needs 1-2 to reach comparable classification accuracy. However, modern datasets are huge (or can be made huge), which begs the…
20 votes · 2 answers

Is Random Forest suitable for very small data sets?

I have a data set comprising 24 rows of monthly data. The features are GDP, airport arrivals, month, and a few others. The dependent variable is the number of visitors to a popular tourism destination. Would Random Forest be suitable for such a…
hughesdan
19 votes · 2 answers

Mean(scores) vs Score(concatenation) in cross validation

TL;DR: My dataset is pretty small (120 samples). While doing 10-fold cross validation, should I: Collect the outputs from each test fold, concatenate them into a vector, and then compute the error on this full vector of predictions (120…
user13420