tl;dr: While inspecting the data I found a better model than the one I had originally planned, and I performed a few steps of variable selection/model fine-tuning. I assume this is a (mild) case of inference after model selection.
I performed an experiment that went wrong, but the error made it possible to test another hypothesis, which I formulated upon noticing the error (and before knowing the outcome): parameter A should have been held constant but was not, so I hypothesized that Y depends on A, which is biologically plausible. Because of the repeated measures and heteroskedasticity, I used the following generalized least squares model (gls from the nlme package):
library(nlme)  # provides gls(), corAR1(), varPower(), varIdent()
gls(Y ~ A,
    data = dat,  # 'dat' is a placeholder name for the long-format data set
    correlation = corAR1(form = ~ 1 | individual),
    weights = varPower())
While inspecting the data at the level of the individuals, I noticed a group that behaved differently from the rest. This group corresponded to an actual group in the experiment, but I didn't initially expect this grouping factor (called B in the following) to be relevant. So I updated the model (the AIC improved):
gls(Y ~ A * B,
    data = dat,
    correlation = corAR1(form = ~ 1 | individual),
    weights = varPower())
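For concreteness, here is a simplified sketch of how such an AIC comparison can be set up (not my exact code; `dat` again stands for my data frame). As far as I understand, models that differ in their fixed effects have to be refit with method = "ML", because the default REML likelihoods are not comparable across different fixed-effects structures:

# simplified sketch (placeholder data frame 'dat'): compare the two fixed-effects
# structures by AIC, refitting with maximum likelihood because the default REML
# fits are not comparable when the fixed effects differ
m1 <- gls(Y ~ A,
          data = dat, method = "ML",
          correlation = corAR1(form = ~ 1 | individual),
          weights = varPower())
m2 <- gls(Y ~ A * B,
          data = dat, method = "ML",
          correlation = corAR1(form = ~ 1 | individual),
          weights = varPower())
AIC(m1, m2)    # the A*B model had the lower AIC
anova(m1, m2)  # likelihood ratio test of the nested models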
I also checked whether other correlation and variance structures would lead to a better model, mainly by visually inspecting the residuals, and ended up with this model (the AIC improved again):
gls(Y ~ A * B,
    data = dat,
    correlation = corAR1(form = ~ 1 | individual),
    # the same test was repeated five times in all individuals; B varied across the repetitions
    weights = varIdent(form = ~ 1 | test))
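The comparison of the variance structures looked roughly like this (again a simplified sketch with the placeholder `dat`, not my exact code). If I understand correctly, the default REML fit is fine here because only the variance structure changes while the fixed effects stay the same:

# same fixed effects, different variance structures; the default REML fits are
# comparable here because the mean model does not change
m_pow   <- gls(Y ~ A * B,
               data = dat,
               correlation = corAR1(form = ~ 1 | individual),
               weights = varPower())
m_ident <- gls(Y ~ A * B,
               data = dat,
               correlation = corAR1(form = ~ 1 | individual),
               weights = varIdent(form = ~ 1 | test))
AIC(m_pow, m_ident)  # the varIdent model had the lower AIC
# visual check of the normalized residuals that guided the choice
plot(m_ident, resid(., type = "normalized") ~ fitted(.) | test, abline = 0)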
Finally, I checked whether two other covariates had any effect on the model. I did not expect them to, and they did not (the AIC worsened, and I also inspected the effect graphically), but I expected to be asked whether this had been tested.
Possible solutions I thought of/read about:
- Warn the reader that one limitation of the study, and of the interpretability of the results, is that model selection and inference were performed on the same dataset.
- Split the dataset (66% train / 33% test). However, grouping factor B is not balanced (many more observations come from one of the groups), which could cause other problems, and I fear a loss of power since the dataset is rather small; see the sketch after this list.
- Use the full model for inference (useless parameters and covariates included).
- Add noise to the data and redo the model selection, as I have read here: https://arxiv.org/pdf/1507.06739.pdf. However:
  - this approach might not be entirely honest, since I now know which model is best and might be biased when checking whether the features that drove my decisions are still present in the noisy data
  - this approach might only be applicable with specific model-selection (forward stepwise) and variable-selection (lasso) methods
  - I am unsure how to implement this method (specifically, how to estimate the mean and variance of the noise), and I found no R package that does this.
- Somehow adjust the results for post-selection inference, although I could not find a method applicable to the comparison of manually selected models (and I do not have the knowledge to adapt the conditional-probabilities method described in this article to my needs).
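To make the splitting idea above a bit more concrete, this is roughly what I would try (placeholder names `dat`, `individual`, `B`): split at the level of individuals, so that all repeated measures of one individual end up in the same part, and then check how the unbalanced factor B is distributed over the two parts:

# sketch of a train/test split by individual (placeholder names: dat, individual, B)
set.seed(1)
ids       <- unique(dat$individual)
train_ids <- sample(ids, size = ceiling(2/3 * length(ids)))
train     <- dat[dat$individual %in% train_ids, ]
test      <- dat[!(dat$individual %in% train_ids), ]
# with B unbalanced, check how the groups end up distributed
table(train$B)
table(test$B)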
Resulting questions:
- Does this actually constitute a case of inference after model selection, or did I misunderstand (e.g., is this only relevant when choosing from a larger number of models and/or when performing variable selection)?
- Is the selection of the optimal correlation/variance structure also affected by this problem?
- Do any of the solutions above make sense for overcoming this problem, or do you have another suggestion?
I apologize if this is completely wrong or unclear; I have limited experience in statistics/bioinformatics and no theoretical background...