IMHO, this is fine, provided that it is acknowledged that you don't have an unbiased performance estimate for the final model. Whether over-fitting is a problem depends on how many models there are to choose from, how well each is specified by the training data, and the variance of the hold-out estimate used to select the best model. In many cases, the over-fitting from the discrete choice between three models is likely to be negligible.
Consider this alternative procedure. We use k-fold cross-validation to estimate the performance of the model. In each fold, we use a hold-out set to choose between M1, M2 and M3 at each stage. We then build the final model, using all of the available data, again with a hold-out set to choose between M1, M2 and M3.

In this case, we build the final model in the same way suggested in the question, but the performance estimate is approximately unbiased (for LOOCV at least): because the choice between M1, M2 and M3 is performed separately in each fold, the performance estimate includes a component due to the over-fitting caused by choosing between models on a hold-out set. The procedure set out in the question is essentially the same, just without the attempt to get a performance estimate.
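A minimal sketch of that nested procedure, in case it helps. The dataset, the three candidate models standing in for M1, M2 and M3 (a linear SVM, an RBF SVM and a random forest) and the split sizes are all illustrative assumptions, not part of the answer itself:

```python
# Sketch of the nested procedure: model selection via an internal hold-out
# set, repeated inside every fold of an outer cross-validation loop.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)  # toy data (assumption)
candidates = {"M1": SVC(kernel="linear"),
              "M2": SVC(kernel="rbf"),
              "M3": RandomForestClassifier(random_state=0)}

def choose_by_holdout(X_tr, y_tr):
    """Pick the best of M1/M2/M3 using an internal hold-out set."""
    X_fit, X_hold, y_fit, y_hold = train_test_split(
        X_tr, y_tr, test_size=0.25, random_state=0)
    scores = {name: clone(m).fit(X_fit, y_fit).score(X_hold, y_hold)
              for name, m in candidates.items()}
    return max(scores, key=scores.get)

# Outer k-fold CV: the choice between M1, M2 and M3 is repeated inside every
# fold, so the outer score includes the over-fitting due to that choice.
outer_scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    best = choose_by_holdout(X[train_idx], y[train_idx])
    model = clone(candidates[best]).fit(X[train_idx], y[train_idx])
    outer_scores.append(model.score(X[test_idx], y[test_idx]))
print("approximately unbiased performance estimate:", np.mean(outer_scores))

# Final model: repeat exactly the same selection once, on all of the data.
final_name = choose_by_holdout(X, y)
final_model = clone(candidates[final_name]).fit(X, y)
```

The point of the outer loop is only to estimate performance; the final model itself comes from the last three lines, which are the same procedure described in the question.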