Questions tagged [multiple-comparisons]

Use this tag for situations where one is concerned about achieving the intended power and size when more than one hypothesis test is performed.

In statistical hypothesis testing, the size is the largest chance of rejecting the null when the null is true (a "false positive" error). The power is the chance of rejecting the null when it is false; it depends on the "effect size" (a measure of how far reality actually departs from the null). Ceteris paribus, power and size move together: decreasing the size of a test (making it more stringent) also decreases its power. Considerations therefore often focus on size, which is simpler to analyze.

When more than one hypothesis test is performed to make a binary decision, the chance of a false positive is usually greater than the size of any single test used for that decision. For example, suppose groups of "control" and "treatment" subjects are randomly selected from the same population and each subject is given a questionnaire comprising 20 yes-no questions. Let the groups be compared separately on each question using a test of size 0.05. If the comparisons are independent, then the chance that at least one of them rejects the null equals $1 - (1 - 0.05)^{20} \approx 0.64$. Thus a nominal false positive rate of 0.05 in each test is inflated to a decision-level false positive rate of 0.64.
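As a quick check of that arithmetic, here is a minimal R sketch (the 20 tests and size 0.05 come from the example above; everything else is illustrative):

```r
# Family-wise error rate for m independent tests, each of size alpha
alpha <- 0.05
m     <- 20
1 - (1 - alpha)^m        # 0.6415... -- the 0.64 quoted above

# Confirm by simulation: null p-values are Uniform(0,1), and the decision
# is a false positive if any of the m p-values falls below alpha
set.seed(1)
mean(replicate(1e5, any(runif(m) < alpha)))  # ~0.64
```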

To avoid unacceptably large chances of reaching mistaken conclusions in such "multiple comparisons" situations, either an overall test of significance is conducted first, or the sizes of the individual tests leading to the decision are decreased (that is, the tests are made more stringent). An example of the former is the F-test in an ANOVA setting; examples of the latter are the Bonferroni correction and Tukey's HSD test, which make the individual pairwise comparisons more stringent so as to control the family-wise error rate.
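In R, the Bonferroni correction (along with several alternatives) is available through the built-in p.adjust; a minimal sketch on invented p-values:

```r
# Twenty hypothetical raw p-values, one per yes-no question in the example
set.seed(2)
p_raw <- round(c(0.001, 0.012, 0.034, runif(17)), 3)

# Bonferroni: each p-value is multiplied by the number of tests (capped at 1),
# which is equivalent to testing each hypothesis at size alpha / m
p_adj <- p.adjust(p_raw, method = "bonferroni")
sum(p_raw < 0.05)   # "significant" before correction
sum(p_adj < 0.05)   # still significant after correction
```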

1632 questions
68 votes • 4 answers

Look and you shall find (a correlation)

I have several hundred measurements. Now, I am considering using some kind of software to correlate every measure with every other measure. This means there are thousands of correlations. Among these there should (statistically) be a high…
David • 855 • 1 • 8 • 7
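The phenomenon this question describes is easy to reproduce: even when every variable is pure noise, the largest of thousands of pairwise correlations will look impressive. A minimal sketch (the dimensions are invented for illustration):

```r
# 100 observations of 50 mutually independent noise variables
set.seed(42)
x <- matrix(rnorm(100 * 50), nrow = 100)

r     <- cor(x)             # 50 x 50 correlation matrix
r_off <- r[upper.tri(r)]    # the choose(50, 2) = 1225 distinct pairs
max(abs(r_off))             # sizeable even though every true correlation is 0
```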
66 votes • 1 answer

40,000 neuroscience papers might be wrong

I saw this article in the Economist about a seemingly devastating paper [1] casting doubt on "something like 40,000 published [fMRI] studies." The error, they say, is because of "erroneous statistical assumptions." I read the paper and see it's…
60 votes • 5 answers

Is adjusting p-values in a multiple regression for multiple comparisons a good idea?

Let's assume you are a social science researcher/econometrician trying to find relevant predictors of demand for a service. You have 2 outcome/dependent variables describing the demand (using the service yes/no, and the number of occasions). You have…
57 votes • 3 answers

When combining p-values, why not just average?

I recently learned about Fisher's method for combining p-values. It is based on the fact that, under the null, a p-value follows a uniform distribution, and that $$-2\sum_{i=1}^n \log X_i \sim \chi^2(2n), \text{ given } X_i \overset{\text{iid}}{\sim} \text{Unif}(0,1),$$ which I…
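For reference, the combined test in that excerpt is straightforward to compute directly; a minimal sketch with hypothetical p-values:

```r
# Fisher's method: under the global null, -2 * sum(log(p)) ~ chi-squared(2n)
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

fisher_combine(c(0.08, 0.06, 0.11))  # ~0.02: jointly stronger than any one p-value
```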
46 votes • 5 answers

Why is multiple comparison a problem?

I find it hard to understand what the issue with multiple comparisons really is. With a simple analogy, it is said that a person who makes many decisions will make many mistakes. So a very conservative precaution is applied, like Bonferroni…
AgCl • 603 • 5 • 6
40 votes • 3 answers

Significance contradiction in linear regression: significant t-test for a coefficient vs non-significant overall F-statistic

I'm fitting a multiple linear regression model between 4 categorical variables (with 4 levels each) and a numerical output. My dataset has 43 observations. Regression gives me the following $p$-values from the $t$-test for every slope coefficient:…
39 votes • 5 answers

The meaning of "positive dependency" as a condition to use the usual method for FDR control

Benjamini and Hochberg developed the first (and still most widely used, I think) method for controlling the false discovery rate (FDR). I want to start with a bunch of P values, each for a different comparison, and decide which ones are low enough…
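The Benjamini–Hochberg step-up procedure this question refers to is also built into R's p.adjust; a minimal sketch with hypothetical p-values (method = "BY" gives the Benjamini–Yekutieli variant, which is valid under arbitrary dependence):

```r
# BH-adjusted p-values: declare a discovery where the adjusted value < q
p_raw <- c(0.0001, 0.003, 0.02, 0.04, 0.25, 0.60)
p_bh  <- p.adjust(p_raw, method = "BH")
which(p_bh < 0.05)  # discoveries while controlling the FDR at 5% (here 1, 2, 3)
```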
36 votes • 1 answer

Multiple comparisons on a mixed effects model

I am trying to analyse some data using a mixed effects model. The data I collected represent the weights of some young animals of different genotypes over time. I am using the approach proposed…
nico • 4,246 • 3 • 28 • 42
35 votes • 2 answers

Should we address multiple comparisons adjustments when using confidence intervals?

Suppose we have a multiple comparisons scenario, such as post hoc inference on pairwise statistics or a multiple regression, where we are making a total of $m$ comparisons. Suppose also that we would like to support inference in these…
Alexis • 26,219 • 5 • 78 • 131
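One conventional answer (just one of several discussed under this question) is to widen each interval to confidence level $1 - \alpha/m$, the interval analogue of the Bonferroni correction. A minimal sketch for the means of $m$ invented groups:

```r
# Bonferroni-adjusted confidence intervals for m group means (toy data)
set.seed(7)
m      <- 4
alpha  <- 0.05
groups <- replicate(m, rnorm(30, mean = 5), simplify = FALSE)

ci <- function(x, level) {
  se <- sd(x) / sqrt(length(x))
  mean(x) + c(-1, 1) * qt(1 - (1 - level) / 2, df = length(x) - 1) * se
}

lapply(groups, ci, level = 1 - alpha)      # unadjusted: 95% each
lapply(groups, ci, level = 1 - alpha / m)  # adjusted: >=95% simultaneous coverage
```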
34 votes • 5 answers

Data "exploration" vs data "snooping"/"torturing"?

Many times I have come across informal warnings against "data snooping" (here's one amusing example), and I think I have an intuitive idea of roughly what that means, and why it may be a problem. On the other hand, "exploratory data analysis" seems…
kjo • 1,817 • 1 • 16 • 24
30 votes • 6 answers

Variable selection procedure for binary classification

What variable/feature selection procedures do you prefer for binary classification when there are many more variables/features than observations in the learning set? The aim here is to discuss which feature selection procedure reduces the…
29 votes • 2 answers

How to cope with exploratory data analysis and data dredging in small-sample studies?

Exploratory data analysis (EDA) often leads one to explore other "tracks" that do not necessarily belong to the initial set of hypotheses. I face such a situation in the case of studies with a limited sample size and a lot of data gathered through…
chl • 50,972 • 18 • 205 • 364
29 votes • 4 answers

Correcting p values for multiple tests where tests are correlated (genetics)

I have p values from a lot of tests and would like to know whether there is actually something significant after correcting for multiple testing. The complication: my tests are not independent. The method I am thinking about (a variant of Fisher's…
29 votes • 4 answers

Why don't Bayesian methods require multiple testing corrections?

Andrew Gelman wrote an extensive article on why Bayesian A/B testing doesn't require multiple hypothesis correction: Why We (Usually) Don't Have to Worry About Multiple Comparisons, 2012. I don't quite understand: why don't Bayesian methods require…
user46925
27 votes • 1 answer

Comparing levels of factors after a GLM in R

Here is a little background about my situation: my data refer to the number of prey successfully eaten by a predator. As the number of prey is limited (25 available) in each trial, I had a column "Sample" representing the number of available prey…
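A common R route for this kind of post hoc comparison of factor levels after a GLM (not necessarily the approach the question refers to) is multcomp::glht, which supplies multiplicity-adjusted p-values. A minimal sketch with invented data, keeping the 25 available prey per trial from the question:

```r
# Tukey-style all-pairwise comparisons of a factor after a binomial GLM
# (data and model are invented for illustration; requires the multcomp package)
library(multcomp)

set.seed(3)
d <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 20)),
  eaten     = rbinom(60, size = 25, prob = rep(c(0.3, 0.5, 0.6), each = 20))
)

fit <- glm(cbind(eaten, 25 - eaten) ~ treatment, family = binomial, data = d)
summary(glht(fit, linfct = mcp(treatment = "Tukey")))  # adjusted p-values
```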