Questions tagged [small-sample]

Refers to statistical complications or problems due to having few data. If your question is about a small sample relative to the number of variables, please use the [underdetermined] tag instead.

Having access to only a small sample can cause several problems: statistical tests may be underpowered; assumptions may be harder to verify; and if the data are far from normal, the Central Limit Theorem may not yet guarantee that the relevant sampling distributions are approximately normal.
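
As a rough illustration of the point about underpowered tests, the Python sketch below computes the exact power of a one-sided binomial test at a few sample sizes. The null proportion (0.5), the true proportion (0.7), the 5% level, and the function names are all assumptions chosen for illustration; none of them come from a question under this tag.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); returns 0 when k > n."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def exact_test_power(n, p_null=0.5, p_true=0.7, alpha=0.05):
    """Power of the exact one-sided test of H0: p = p_null vs H1: p > p_null.

    The test rejects when X >= c, where c is the smallest cutoff whose
    tail probability under H0 does not exceed alpha.
    """
    for c in range(n + 2):  # c = n + 1 always qualifies (empty tail)
        if upper_tail(c, n, p_null) <= alpha:
            break
    # Power is the probability of landing in the rejection region
    # when the true proportion is p_true.
    return upper_tail(c, n, p_true)


if __name__ == "__main__":
    for n in (10, 20, 50, 100):
        print(f"n = {n:3d}  power = {exact_test_power(n):.3f}")
```

With these assumed values the power at n = 10 is low and grows as n increases, which is the sense in which a small-sample test is "underpowered". A different effect size or significance level would change the numbers but not the pattern.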

603 questions

124 votes

7 answers

How to choose between t-test or non-parametric test e.g. Wilcoxon in small samples

Certain hypotheses can be tested using Student's t-test (maybe using Welch's correction for unequal variances in the two-sample case), or by a non-parametric test like the Wilcoxon paired signed rank test, the Wilcoxon-Mann-Whitney U test, or the…

asked Oct 29 '14 at 03:02

Silverfish

4 answers

Can bootstrap be seen as a "cure" for the small sample size?

This question has been triggered by something I read in this graduate-level statistics textbook and also (independently) heard during this presentation at a statistical seminar. In both cases, the statement was along the lines of "because the sample…

bootstrap small-sample

asked Aug 16 '14 at 20:23

James

6 answers

Warning in R - Chi-squared approximation may be incorrect

I have data showing fire fighter entrance exam results. I am testing the hypothesis that exam results and ethnicity are not mutually independent. To test this, I ran a Pearson chi-square test in R. The results show what I expected, but it gave a…

r categorical-data chi-squared-test small-sample error-message

asked Jan 07 '14 at 12:00

ferrelwill

6 answers

Best method for short time-series

I have a question related to modeling short time-series. It is not a question of whether to model them, but how. What method would you recommend for modeling (very) short time-series (say of length $T \leq 20$)? By "best" I mean here the most robust one,…

time-series forecasting small-sample

asked Jan 26 '15 at 21:03

Tim

5 answers

What can we say about population mean from a sample size of 1?

I am wondering what we can say, if anything, about the population mean, $\mu$ when all I have is one measurement, $y_1$ (sample size of 1). Obviously, we'd love to have more measurements, but we can't get them. It seems to me that since the sample…

mean sample-size small-sample unbiased-estimator

asked Jun 18 '15 at 15:21

thedu

2 answers

How to cope with exploratory data analysis and data dredging in small-sample studies?

Exploratory data analysis (EDA) often leads one to explore other "tracks" that do not necessarily belong to the initial set of hypotheses. I face such a situation in the case of studies with a limited sample size and a lot of data gathered through…

multiple-comparisons epidemiology small-sample exploratory-data-analysis

asked Oct 01 '10 at 21:52

chl

4 answers

How to perform Student's t-test having only sample size, sample average and population average are known?

Student's $t$-test requires the sample standard deviation $s$. However, how do I compute $s$ when only the sample size and sample average are known? For example, if sample size is $49$ and sample average is $112$, I will then attempt to create a…

t-test standard-deviation small-sample

asked Aug 18 '10 at 01:39

Kit

1 answer

Using bootstrap under H0 to perform a test for the difference of two means: replacement within the groups or within the pooled sample

Suppose that I have data with two independent groups: g1.lengths <- c(112.64, 97.10, 84.18, 106.96, 98.42, 101.66) g2.lengths <- c(84.44, 82.10, 83.26, 81.02, 81.86, 86.80, 85.84, 97.08, 79.64, 83.32, 91.04, 85.92, …

r hypothesis-testing bootstrap small-sample permutation-test

asked Feb 07 '15 at 00:56

Newbie_R

2 answers

Mother milk of 6 Corona-positive (COVID-19) women does not contain the virus - can we make a confidence statement about this?

I am asking this question because I believe it would be great if the statistics community could make a contribution to solving this serious puzzle until more evidence is available. The UK Royal Office of Obstetricians and Gynecologists publishes…

mathematical-statistics confidence-interval binomial-distribution small-sample

asked Mar 18 '20 at 19:20

tomka

7 answers

Appropriate normality tests for small samples

So far, I've been using the Shapiro-Wilk statistic in order to test normality assumptions in small samples. Could you please recommend another technique?

hypothesis-testing goodness-of-fit normality-assumption small-sample

asked Aug 13 '10 at 12:42

aL3xa

2 answers

Can a small sample size cause type 1 error?

I've learnt that small sample size may lead to insufficient power and type 2 error. However, I have the feeling that small samples just may be generally unreliable and may lead to any kind of result by chance. Is that true?

hypothesis-testing small-sample

asked Apr 17 '11 at 21:55

even

2 answers

Topic stability in topic models

I am working on a project where I want to extract some information about the content of a series of open-ended essays. In this particular project, 148 people wrote essays about a hypothetical student organization as part of a larger experiment. …

machine-learning model-selection small-sample topic-models dirichlet-process

asked Jul 01 '13 at 15:07

Patrick S. Forscher

11 answers

Does machine learning really need data-efficient algorithms?

Deep learning methods are often said to be very data-inefficient, requiring 100-1000 examples per class, where a human needs 1-2 to reach comparable classification accuracy. However, modern datasets are huge (or can be made huge), which begs the…

machine-learning neural-networks sample-size small-sample efficiency

asked Jun 08 '21 at 22:08

MWB

2 answers

Is Random Forest suitable for very small data sets?

I have a data set comprising 24 rows of monthly data. The features are GDP, airport arrivals, month, and a few others. The dependent variable is the number of visitors to a popular tourism destination. Would Random Forest be suitable for such a…

random-forest small-sample

asked Jan 25 '16 at 06:53

hughesdan

2 answers

Mean(scores) vs Score(concatenation) in cross validation

TLDR: My dataset is pretty small (120 samples). While doing 10-fold cross validation, should I: Collect the outputs from each test fold, concatenate them into a vector, and then compute the error on this full vector of predictions (120…

classification cross-validation small-sample

asked Aug 19 '12 at 05:21

user13420
