I want to calculate the pooled p-value of a regression coefficient across K-fold cross-validation. I have a model
$$Y \sim \mathrm{Intercept} + \mathrm{Cov}_1 + \mathrm{Cov}_2 + \mathrm{Cov}_3 + X$$
and I'm interested in the pooled p-value of the variable X, after adjusting for the three covariates. To this end I perform K-fold cross-validation and fit this model, a logistic regression, for each fold.
Following the procedure described in "Calculating pooled p-values manually", I get a K×3 matrix, where the first column contains the coefficients, the second column the variances, and the last column the p-values. However, I am not 100% sure that this procedure is applicable here: I'm not doing imputation, I just have K folds, so there is no variation due to imputation, only sampling variation. The main problem is that, for some variables X, I obtain pooled p-values > 1 (after multiplying the t_test result by 2).
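For concreteness, here is a minimal sketch of the pooling step. It assumes the linked procedure is Rubin's rules as used for multiple imputation, with the Barnard-Rubin degrees-of-freedom adjustment. Whether that is valid for cross-validation folds is exactly what is being asked, so this is an illustration only. The names `pool_rubin` and `t_upper_tail`, and the arguments `n` (sample size) and `k` (number of fitted parameters), are hypothetical. The t tail probability is obtained by numerical integration only to keep the sketch dependency-free; `scipy.stats.t.sf` would be the usual choice.

```python
import math
from statistics import mean, variance


def t_upper_tail(t_abs, df, steps=2000):
    """P(T > t_abs) for Student's t with df degrees of freedom (t_abs >= 0).

    Uses Simpson's rule on the density over [0, t_abs]; steps must be even.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t_abs / steps
    total = pdf(0.0) + pdf(t_abs)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 0.5 - total * h / 3)


def pool_rubin(coefs, variances, n, k):
    """Pool per-fold estimates with Rubin's rules (hypothetical helper).

    coefs, variances: per-fold coefficient of X and its squared standard error.
    n: sample size used for the complete-data degrees of freedom (N or fold size).
    k: number of fitted parameters (5 for the model above, intercept included).
    Assumes the coefficients are not all identical (between-fold variance > 0).
    """
    m = len(coefs)
    q_bar = mean(coefs)                      # pooled coefficient
    u_bar = mean(variances)                  # within-fold variance
    b = variance(coefs)                      # between-fold variance (divides by m - 1)
    total_var = u_bar + (1 + 1 / m) * b
    lam = (1 + 1 / m) * b / total_var        # share of variance that is between-fold
    df_old = (m - 1) / lam ** 2
    df_com = n - k
    df_obs = (df_com + 1) / (df_com + 3) * df_com * (1 - lam)
    df = df_old * df_obs / (df_old + df_obs)
    t_stat = q_bar / math.sqrt(total_var)
    # Two-sided p-value: double the tail beyond |t|, so the result is never above 1.
    p_two_sided = 2 * t_upper_tail(abs(t_stat), df)
    return q_bar, total_var, df, t_stat, p_two_sided


if __name__ == "__main__":
    # Made-up numbers for K = 5 folds, purely for illustration.
    print(pool_rubin([0.5, 0.6, 0.4, 0.55, 0.45], [0.04] * 5, n=200, k=5))
```

One possible source of a doubled p-value above 1 is doubling the lower-tail probability at a positive t (or the upper-tail probability at a negative t). Doubling the tail beyond |t|, as in the sketch, keeps the result in [0, 1].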
Based on the link above I have a few questions:
- Is this procedure applicable to this task at all, even without imputation? If not, is there another way to pool the p-values?
- I am only using the p-values to test for significance, so is it really a problem that some are > 1 (since anything > 0.05 is considered not significant anyway)?
- For n, should I use the complete sample size (N), or the sample size in each fold (~ N * ((K-1) / K))?
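
To make the two candidates for n in the last question concrete, here is a small sketch of the per-fold training sizes. It assumes the folds are made as equal as possible, with the first N mod K test folds one observation larger; `fold_train_sizes` is a hypothetical helper, and the values of N and K are made up.

```python
def fold_train_sizes(n_total, k_folds):
    """Training-set size for each of k_folds folds (hypothetical helper)."""
    base, extra = divmod(n_total, k_folds)
    # The first `extra` test folds hold one more observation than the rest.
    test_sizes = [base + 1 if i < extra else base for i in range(k_folds)]
    return [n_total - s for s in test_sizes]


if __name__ == "__main__":
    N, K = 103, 5  # made-up values
    print("complete sample size:", N)
    print("per-fold training sizes:", fold_train_sizes(N, K))
    print("approximation N * (K - 1) / K:", N * (K - 1) / K)
```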