Suppose we run a DIY market and, for all the wrenches we sell, we collect a number of attributes of varying importance (size, weight, price, hardness of material, etc., all of them quantitative). We also know how many items we sold last year, so we can compute the revenue. I'd like to model the revenue as a linear function of the attributes.
Choosing the right model is obviously an important question. I've thought of generating all possible linear models (in R: revenue ~ size, revenue ~ weight, revenue ~ size + weight, revenue ~ size + weight + price, ...) and comparing them. The method of comparison would be leave-one-out cross validation, comparing the average squared prediction error of each model.
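To make the idea concrete, here is a minimal sketch of the brute-force approach I have in mind, assuming the data sits in a data frame called wrenches with a revenue column and one column per attribute (the column names here are made up):

```r
# Assumed data frame 'wrenches' with columns: revenue, size, weight, price, hardness
attrs <- setdiff(names(wrenches), "revenue")

# Enumerate all non-empty subsets of attributes, one formula per candidate model
subsets <- unlist(
  lapply(seq_along(attrs), function(k) combn(attrs, k, simplify = FALSE)),
  recursive = FALSE
)
formulas <- lapply(subsets, function(s) reformulate(s, response = "revenue"))

# Leave-one-out CV: refit the model n times, average the squared prediction errors
loocv_mse <- function(f, data) {
  errs <- sapply(seq_len(nrow(data)), function(i) {
    fit <- lm(f, data = data[-i, ])
    (data$revenue[i] - predict(fit, newdata = data[i, , drop = FALSE]))^2
  })
  mean(errs)
}

scores <- sapply(formulas, loocv_mse, data = wrenches)
best <- formulas[[which.min(scores)]]   # model with the lowest LOOCV error
```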
Is this approach to model comparison a good idea? If not, why not? I'm aware of the possible memory and runtime issues if the data has too many attributes (i.e., columns in the data frame): with p attributes there are 2^p − 1 candidate models, so the number of fits explodes quickly. In a Stack Overflow answer on how to generate all possible models in R I found the following warning:
Make very sure that is what you want, in general this kind of model comparison is strongly advised against. Forget about any kind of inference as well when you do this.
Is the squared error from leave-one-out cross validation a good measure for picking the best model, especially when comparing models with different numbers of independent variables? Are there other or better ways to compare the models?
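As a side note on the LOOCV computation itself: for ordinary least squares the leave-one-out residuals can, as far as I understand, be read off a single fit via the hat values (the PRESS identity), so the cross validation itself shouldn't be the bottleneck. A sketch, again assuming the wrenches data frame from above:

```r
# LOOCV MSE for an lm fit without refitting, using the PRESS identity:
# e_(i) = e_i / (1 - h_ii), where h_ii are the leverages (hat values)
loocv_mse_press <- function(fit) {
  r <- residuals(fit)
  h <- hatvalues(fit)
  mean((r / (1 - h))^2)
}

fit <- lm(revenue ~ size + weight, data = wrenches)
loocv_mse_press(fit)   # should match the brute-force LOOCV value for this model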