Suppose I have $n$ i.i.d. normal observations in $\mathbb{R}^f$ with parameters $(\mu, \Sigma)$, where $\Sigma$ is known to be the identity matrix.
I have the following hypotheses: $H_0^i$: $\mu_i = 0$ for each coordinate $i = 1, \dots, f$.
I calculate the statistics $T_i = \frac{1}{\sqrt{n}} \sum_{j=1}^n x_{ji}$, which are $N(0, 1)$ under $H_0^i$, and the corresponding p-values.
I combine the test statistics/p-values in some way and test the null-hypothesis $H_0 = \bigcap_i H_0^i$.
If I can't reject, I declare $\mu = 0$. If I'm able to reject, I choose the $T_i$ with the lowest p-value, say $i = 3$, and declare $\mu = [0, 0, \frac{1}{n} \sum_{j=1}^n x_{j3}, 0, \dots]$.
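
To make the procedure concrete, here is a minimal sketch in Python. The Bonferroni-adjusted minimum p-value is just one illustrative choice of combination rule, and the true mean I simulate from is made up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, f = 100, 5
mu_true = np.array([0.0, 0.0, 0.5, 0.0, 0.0])   # hypothetical true mean
x = rng.normal(loc=mu_true, scale=1.0, size=(n, f))

# Per-coordinate statistics: T_i = (1/sqrt(n)) * sum_j x_ji ~ N(0, 1) under H_0^i
T = x.sum(axis=0) / np.sqrt(n)
p = 2 * stats.norm.sf(np.abs(T))                # two-sided p-values

# Combine via the minimum p-value with a Bonferroni adjustment
alpha = 0.05
mu_hat = np.zeros(f)
if p.min() < alpha / f:                         # global null rejected
    i = p.argmin()                              # coordinate with the lowest p-value
    mu_hat[i] = x[:, i].mean()                  # plug in that coordinate's sample mean
print(p, mu_hat)
```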
This seems like a bad idea:
- p-values for feature selection
- Comparing p-values from pairwise permutation tests
- Comparing two p-values from samples of different sizes
etc.
But maybe it's a good idea? https://stats.stackexchange.com/a/207396/142710
I'm reading the well-known paper "Unbiased Recursive Partitioning: A Conditional Inference Framework" by Hothorn, Hornik, and Zeileis, and if I'm reading it correctly, this is exactly the procedure they advocate.
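
As I understand it, their variable-selection step at each node looks roughly like the toy version below. This is my own permutation-based sketch, using a simple correlation statistic rather than the authors' actual test statistics, so treat it as an illustration of the structure, not their method:

```python
import numpy as np

def select_split_variable(X, y, alpha=0.05, n_perm=999, rng=None):
    """Toy variable selection in the spirit of conditional inference trees:
    a permutation p-value per covariate, a Bonferroni-adjusted min-p global
    test of independence, and the smallest p-value picking the variable."""
    rng = rng or np.random.default_rng()
    n, f = X.shape
    pvals = np.empty(f)
    for i in range(f):
        obs = abs(np.corrcoef(X[:, i], y)[0, 1])   # observed association
        perm = np.array([abs(np.corrcoef(X[:, i], rng.permutation(y))[0, 1])
                         for _ in range(n_perm)])
        pvals[i] = (1 + (perm >= obs).sum()) / (n_perm + 1)
    if pvals.min() >= alpha / f:    # global null of independence not rejected
        return None                 # stop: don't split this node
    return int(pvals.argmin())      # otherwise split on the strongest covariate
```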
My question is: is it sometimes acceptable to compare p-values for model selection? What is the right way to think about them in this case?