Questions tagged [multiple-comparisons]

Use this tag for situations where one is concerned about achieving the intended power and size when more than one hypothesis test is performed.

In statistical hypothesis testing, the size is the largest chance of rejecting the null when the null is true (a "false positive" error). The power is the chance of rejecting the null when it is false; it depends on the "effect size" (a measure of how far reality actually departs from the null). Ceteris paribus, power and size move together: decreasing the size of a test (making it more stringent) also decreases its power. Considerations therefore often focus on size, which is simpler to analyze.

When more than one hypothesis test is performed to make a binary decision, the chance of a false positive is usually greater than the size of any single test used for that decision. For example, suppose groups of "control" and "treatment" subjects are randomly selected from the same population and each subject is given a questionnaire comprising 20 yes-no questions. Let the groups be compared separately on each question using a test of size 0.05. If the comparisons are independent, then the chance that at least one of them rejects the null equals $1 - (1 - 0.05)^{20} \approx 0.64$. Thus a nominal false positive rate of 0.05 in each test is inflated to a decision-level false positive rate of 0.64.
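As a quick check of that arithmetic, here is a minimal R sketch (the 20 tests and size 0.05 come from the example above; everything else is illustrative):

```r
# Family-wise error rate for m independent tests, each of size alpha
alpha <- 0.05
m     <- 20
1 - (1 - alpha)^m        # 0.6415... -- the 0.64 quoted above

# Confirm by simulation: null p-values are Uniform(0,1), and the decision
# is a false positive if any of the m p-values falls below alpha
set.seed(1)
mean(replicate(1e5, any(runif(m) < alpha)))  # ~0.64
```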

To avoid unacceptably large chances of reaching mistaken conclusions in such "multiple comparisons" situations, either an overall test of significance is conducted first, or the sizes of the individual tests leading to the decision are decreased (that is, the tests are made more stringent). An example of the former is the F-test in an ANOVA setting; examples of the latter are the Bonferroni correction and Tukey's HSD test, which make the individual pairwise comparisons more stringent so as to control the family-wise error rate.
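In R, the Bonferroni correction (along with several alternatives) is available through the built-in p.adjust; a minimal sketch on invented p-values:

```r
# Twenty hypothetical raw p-values, one per yes-no question in the example
set.seed(2)
p_raw <- round(c(0.001, 0.012, 0.034, runif(17)), 3)

# Bonferroni: each p-value is multiplied by the number of tests (capped at 1),
# which is equivalent to testing each hypothesis at size alpha / m
p_adj <- p.adjust(p_raw, method = "bonferroni")
sum(p_raw < 0.05)   # "significant" before correction
sum(p_adj < 0.05)   # still significant after correction
```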

1632 questions
68 votes • 4 answers

Look and you shall find (a correlation)

I have several hundred measurements. Now, I am considering using some kind of software to correlate every measure with every other measure. This means there are thousands of correlations. Among these there should (statistically) be a high…
David • 855 • 1 • 8 • 7
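The phenomenon this question describes is easy to reproduce: even when every variable is pure noise, the largest of thousands of pairwise correlations will look impressive. A minimal sketch (the dimensions are invented for illustration):

```r
# 100 observations of 50 mutually independent noise variables
set.seed(42)
x <- matrix(rnorm(100 * 50), nrow = 100)

r     <- cor(x)             # 50 x 50 correlation matrix
r_off <- r[upper.tri(r)]    # the choose(50, 2) = 1225 distinct pairs
max(abs(r_off))             # sizeable even though every true correlation is 0
```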
66 votes • 1 answer

40,000 neuroscience papers might be wrong

I saw this article in the Economist about a seemingly devastating paper [1] casting doubt on "something like 40,000 published [fMRI] studies." The error, they say, is because of "erroneous statistical assumptions." I read the paper and see it's…
60 votes • 5 answers

Is adjusting p-values in a multiple regression for multiple comparisons a good idea?

Let's assume you are a social science researcher/econometrician trying to find relevant predictors of demand for a service. You have 2 outcome/dependent variables describing the demand (using the service yes/no, and the number of occasions). You have…
57 votes • 3 answers

When combining p-values, why not just average?

I recently learned about Fisher's method for combining p-values. It is based on the fact that, under the null, a p-value follows a uniform distribution, and that $$-2\sum_{i=1}^n \log X_i \sim \chi^2(2n), \text{ given } X_i \overset{\text{iid}}{\sim} \text{Unif}(0,1),$$ which I…
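For reference, the combined test in that excerpt is straightforward to compute directly; a minimal sketch with hypothetical p-values:

```r
# Fisher's method: under the global null, -2 * sum(log(p)) ~ chi-squared(2n)
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

fisher_combine(c(0.08, 0.06, 0.11))  # ~0.02: jointly stronger than any one p-value
```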
46 votes • 5 answers

Why is multiple comparison a problem?

I find it hard to understand what the issue with multiple comparisons really is. With a simple analogy, it is said that a person who makes many decisions will make many mistakes. So a very conservative precaution is applied, like Bonferroni…
AgCl • 603 • 5 • 6
40 votes • 3 answers

Significance contradiction in linear regression: significant t-test for a coefficient vs non-significant overall F-statistic

I'm fitting a multiple linear regression model between 4 categorical variables (with 4 levels each) and a numerical output. My dataset has 43 observations. Regression gives me the following $p$-values from the $t$-test for every slope coefficient:…
39 votes • 5 answers

The meaning of "positive dependency" as a condition to use the usual method for FDR control

Benjamini and Hochberg developed the first (and still most widely used, I think) method for controlling the false discovery rate (FDR). I want to start with a bunch of P values, each for a different comparison, and decide which ones are low enough…
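The Benjamini–Hochberg step-up procedure this question refers to is also built into R's p.adjust; a minimal sketch with hypothetical p-values (method = "BY" gives the Benjamini–Yekutieli variant, which is valid under arbitrary dependence):

```r
# BH-adjusted p-values: declare a discovery where the adjusted value < q
p_raw <- c(0.0001, 0.003, 0.02, 0.04, 0.25, 0.60)
p_bh  <- p.adjust(p_raw, method = "BH")
which(p_bh < 0.05)  # discoveries while controlling the FDR at 5% (here 1, 2, 3)
```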
36 votes • 1 answer

Multiple comparisons on a mixed effects model

I am trying to analyse some data using a mixed effects model. The data I collected represent the weights of some young animals of different genotypes over time. I am using the approach proposed…
nico • 4,246 • 3 • 28 • 42
35 votes • 2 answers

Should we address multiple comparisons adjustments when using confidence intervals?

Suppose we have a multiple comparisons scenario, such as post hoc inference on pairwise statistics or a multiple regression, where we are making a total of $m$ comparisons. Suppose also that we would like to support inference in these…
Alexis • 26,219 • 5 • 78 • 131
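One conventional answer (just one of several discussed under this question) is to widen each interval to confidence level $1 - \alpha/m$, the interval analogue of the Bonferroni correction. A minimal sketch for the means of $m$ invented groups:

```r
# Bonferroni-adjusted confidence intervals for m group means (toy data)
set.seed(7)
m      <- 4
alpha  <- 0.05
groups <- replicate(m, rnorm(30, mean = 5), simplify = FALSE)

ci <- function(x, level) {
  se <- sd(x) / sqrt(length(x))
  mean(x) + c(-1, 1) * qt(1 - (1 - level) / 2, df = length(x) - 1) * se
}

lapply(groups, ci, level = 1 - alpha)      # unadjusted: 95% each
lapply(groups, ci, level = 1 - alpha / m)  # adjusted: >=95% simultaneous coverage
```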
34 votes • 5 answers

Data "exploration" vs data "snooping"/"torturing"?

Many times I have come across informal warnings against "data snooping" (here's one amusing example), and I think I have an intuitive idea of roughly what that means, and why it may be a problem. On the other hand, "exploratory data analysis" seems…
kjo • 1,817 • 1 • 16 • 24
30 votes • 6 answers

Variable selection procedure for binary classification

What variable/feature selection procedures do you prefer for binary classification when there are many more variables/features than observations in the learning set? The aim here is to discuss which feature selection procedure reduces the…
29 votes • 2 answers

How to cope with exploratory data analysis and data dredging in small-sample studies?

Exploratory data analysis (EDA) often leads one to explore other "tracks" that do not necessarily belong to the initial set of hypotheses. I face such a situation in the case of studies with a limited sample size and a lot of data gathered through…
chl • 50,972 • 18 • 205 • 364
29 votes • 4 answers

Correcting p values for multiple tests where tests are correlated (genetics)

I have p values from a lot of tests and would like to know whether there is actually something significant after correcting for multiple testing. The complication: my tests are not independent. The method I am thinking about (a variant of Fisher's…
29 votes • 4 answers

Why don't Bayesian methods require multiple testing corrections?

Andrew Gelman wrote an extensive article on why Bayesian A/B testing doesn't require multiple hypothesis correction: Why We (Usually) Don't Have to Worry About Multiple Comparisons, 2012. I don't quite understand: why don't Bayesian methods require…
user46925
27 votes • 1 answer

Comparing levels of factors after a GLM in R

Here is a little background about my situation: my data refer to the number of prey successfully eaten by a predator. As the number of prey is limited (25 available) in each trial, I had a column "Sample" representing the number of available prey…
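A common R route for this kind of post hoc comparison of factor levels after a GLM (not necessarily the approach the question refers to) is multcomp::glht, which supplies multiplicity-adjusted p-values. A minimal sketch with invented data, keeping the 25 available prey per trial from the question:

```r
# Tukey-style all-pairwise comparisons of a factor after a binomial GLM
# (data and model are invented for illustration; requires the multcomp package)
library(multcomp)

set.seed(3)
d <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 20)),
  eaten     = rbinom(60, size = 25, prob = rep(c(0.3, 0.5, 0.6), each = 20))
)

fit <- glm(cbind(eaten, 25 - eaten) ~ treatment, family = binomial, data = d)
summary(glht(fit, linfct = mcp(treatment = "Tukey")))  # adjusted p-values
```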