P-value corrections on a growing dataset

Question

My study design is as follows:

I have N different financial funds and their performance over T years. At the start of each year, I want to test the hypothesis that each fund performs better than the benchmark.

Question 1: Since I am performing N independent tests at some level $\alpha$, I should apply a correction for multiple testing, e.g. Bonferroni, and test each hypothesis at level $\frac{\alpha}{N}$. Is this correct?
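To make the rule in Question 1 concrete, here is a minimal sketch of the Bonferroni decision for one year. The p-values are made up for illustration and the function name `bonferroni_reject` is hypothetical. Note that the bound behind Bonferroni is a union bound, so it does not rely on the tests being independent.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return the per-test level alpha/N and one rejection flag per p-value."""
    n_tests = len(p_values)
    per_test_level = alpha / n_tests
    # Reject a null only if its p-value is at or below the corrected level.
    return per_test_level, [p <= per_test_level for p in p_values]


# Illustrative p-values for N = 5 funds (not real data).
p_values = [0.001, 0.012, 0.04, 0.20, 0.65]
level, rejected = bonferroni_reject(p_values, alpha=0.05)
print(level, rejected)
```

With these illustrative inputs, only the first fund clears the corrected level, although three of the five p-values are below the uncorrected 0.05.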

A more difficult question concerns the time dynamics. Because I test the funds each year, I repeatedly reuse the data: each year's dataset is the old data plus the most recent year.

Question 2: Does this mean that, for example, in Year X I should test all N funds at level $\frac{\alpha}{X \cdot N}$, or is there a more complicated relationship, given that the tests for each fund over time are not independent because they share a large amount of data?
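One way to examine the $\frac{\alpha}{X \cdot N}$ scheme is to add up the per-test levels of every test run through year T, which gives the union (Bonferroni) bound on the family-wise error rate over the whole horizon. The sketch below is only arithmetic on the levels, not a statement about the actual error rate, and the function names are hypothetical. Under the growing scheme the bound works out to $\alpha \sum_{X=1}^{T} \frac{1}{X}$, which exceeds $\alpha$ for any T > 1, whereas fixing the level at $\frac{\alpha}{N \cdot T}$ in advance keeps the bound at $\alpha$ but requires knowing T.

```python
def union_bound_growing(alpha, n_funds, n_years):
    """Sum of per-test levels when year X tests every fund at alpha / (X * N)."""
    total = 0.0
    for year in range(1, n_years + 1):
        per_test_level = alpha / (year * n_funds)
        total += n_funds * per_test_level  # N tests are run in this year
    return total


def union_bound_fixed(alpha, n_funds, n_years):
    """Sum of per-test levels when all N * T tests use alpha / (N * T)."""
    per_test_level = alpha / (n_funds * n_years)
    return n_funds * n_years * per_test_level


# Illustrative values: alpha = 0.05, N = 10 funds, T = 5 years.
print(union_bound_growing(0.05, 10, 5))
print(union_bound_fixed(0.05, 10, 5))
```

The union bound is conservative when the yearly tests overlap heavily in data, so the true family-wise rate under the growing scheme may be lower than this sum; the sketch only shows that the scheme does not guarantee $\alpha$ over the full horizon.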

I searched the internet for the multiple testing problem on a growing dataset, but the only paper I found was (Trunk and Coleman 1982), which shows that in the limit any null hypothesis will be rejected at a constant $\alpha$ level when the dataset grows forever.
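The effect of repeatedly testing a growing dataset at a constant $\alpha$ can be illustrated with a small simulation. This is a rough sketch under strong assumptions that are not part of the study above: a single fund, i.i.d. standard normal excess returns with a true mean of zero (so the null holds), known variance, and a one-sided z-test repeated after each new batch of observations. The function name and its parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist


def ever_rejected_rate(n_looks, obs_per_look=12, alpha=0.05, n_sims=2000, seed=0):
    """Fraction of simulated null datasets rejected at least once in n_looks tests."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    hits = 0
    for _ in range(n_sims):
        total, n_obs = 0.0, 0
        for _ in range(n_looks):
            # Append one new batch of data (e.g. 12 monthly excess returns).
            for _ in range(obs_per_look):
                total += rng.gauss(0.0, 1.0)
                n_obs += 1
            # z-statistic on ALL data so far: mean * sqrt(n) = total / sqrt(n).
            if total / math.sqrt(n_obs) > z_crit:
                hits += 1
                break  # count each simulated dataset at most once
    return hits / n_sims


for looks in (1, 5, 20):
    print(looks, ever_rejected_rate(looks))
```

With a single look the rejection rate should sit near the nominal $\alpha$, and it increases as more looks are taken at the same uncorrected level, which is the direction of the result cited above.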

See https://stats.stackexchange.com/questions/120362/whats-wrong-with-bonferroni-adjustments/ for why you should not apply the Bonferroni correction here. — fblundun, Apr 04 '21 at 17:39
Thank you for the link. I am actually comparing different multiple testing approaches, so even though I know it's not really appropriate here, I am using it. — Kobayashi, Apr 06 '21 at 11:03
How many hypotheses are you testing? Is it $N$ hypotheses, each of the form "the i-th fund outperforms the benchmark", or $NT$ hypotheses, each of the form "the i-th fund outperforms the benchmark in the j-th year"? Whatever that number is, divide $\alpha$ by it to find the Bonferroni-compliant significance level at which to individually test each hypothesis. This achieves the Bonferroni guarantee that your family-wise Type I error rate is bounded by $\alpha$ (not that there's any good reason you should care about this guarantee.) — fblundun, Apr 06 '21 at 12:47
Thank you. What is not totally clear to me is this: I wanted to emulate a "discovery process", as if the tests were performed as soon as the data became available, that is, as I wrote, cumulatively $1 \cdot N$ tests in Year 1, $2 \cdot N$ in Year 2, etc. Is this a valid approach? Now that I think of it, it's probably not, because it assumes that in a given year I know the study won't continue in future years, which is not true... — Kobayashi, Apr 06 '21 at 15:10
You can't use the Bonferroni correction unless you know exactly how many hypotheses you are testing. This is another problem with it. — fblundun, Apr 06 '21 at 15:18


0 Answers