I'm asking this question because I found little explanation of this phenomenon elsewhere. I am wondering how best to deal with the overfitting that comes from the model selection step itself. Say I want to run a regression on a set of observations. My choice of which model class to use (linear, logarithmic, exponential) is already, in some sense, a parameterisation, and even more so if I run several regressions with different models and then choose the best one. For example, if I compare a linear with an exponential model, am I not (implicitly) fitting a regression of the sort:
$$\hat y = I\,(\beta_0 + \beta_1 x) + (1 - I)\,\gamma_0 e^{\gamma_1 x},$$
where $I$ is a binary indicator that I also determine from a fitting procedure. Is there a way to quantify (or at least qualify) how much a model may be overfitted because of this freedom in model selection?
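One way to make this selection-induced optimism visible is to hold out data that the selection step never sees. Below is a minimal simulation sketch (plain NumPy; the data-generating process, noise level, and the two candidate model classes are my own illustrative assumptions, not a prescribed method): the truth is linear, we fit both a linear and an exponential model on a training sample, pick the winner by in-sample error, and then measure how much worse the winner does on fresh data drawn from the same truth. The average train-vs-test gap of the *selected* model is one concrete estimate of the optimism introduced by selection.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(x, y):
    # Least-squares fit of y ~ b0 + b1*x; returns a prediction function.
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda z: coef[0] + coef[1] * z

def fit_exponential(x, y):
    # Fit y ~ c*exp(d*x) by regressing log(y) on x (requires y > 0).
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    return lambda z: np.exp(coef[0] + coef[1] * z)

def mse(f, x, y):
    return float(np.mean((f(x) - y) ** 2))

n_sims = 200
optimism = []
for _ in range(n_sims):
    x = np.linspace(0.1, 2.0, 30)
    truth = 1.0 + 0.5 * x                       # true model is linear
    y_train = truth + rng.normal(0.0, 0.3, x.size)
    y_test = truth + rng.normal(0.0, 0.3, x.size)
    y_train = np.clip(y_train, 1e-6, None)      # keep log() well-defined

    # Model selection: pick the candidate with the smaller in-sample error.
    fits = [fit_linear(x, y_train), fit_exponential(x, y_train)]
    best = min(fits, key=lambda f: mse(f, x, y_train))

    # Optimism of the selected model: fresh-data error minus in-sample error.
    optimism.append(mse(best, x, y_test) - mse(best, x, y_train))

print(f"mean optimism of selected model: {np.mean(optimism):.4f}")
```

Because the winner is chosen partly for fitting the training noise, its in-sample error understates its error on fresh data, so the mean gap comes out positive. Nested cross-validation is the standard generalisation of this idea: an inner loop selects the model, an outer loop estimates its error on data untouched by selection.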