I have run three regressions and they are all very statistically significant.
How do I choose which one is the best to use?
e.g. do I look for a high F-statistic, low p-values, etc.?
The general approach to model selection involves assessing the accuracy of a model when fit to previously unseen data. This is the rationale behind the use of training and test data sets: models are first fit to training data, and the model that produces the most accurate predictions when applied to the test data is then chosen as "best".
In order to evaluate the performance of a statistical learning method on a given data set, we need some way to measure how well its predictions actually match the observed data. That is, we need to quantify the extent to which the predicted response value for a given observation is close to the true response value for that observation. In the regression setting, the most commonly-used measure is the mean squared error (MSE), given by $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{f}(x_{i}))^2$$ The MSE is computed using the training data that was used to fit the model, and so should more accurately be referred to as the training MSE. But in general, we do not really care how well the method works on the training data. Rather, we are interested in the accuracy of the predictions that we obtain when we apply our method to previously unseen test data.[1]
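A minimal sketch of the training-vs-test MSE distinction described above (using scikit-learn and synthetic data, both my own assumptions, not from the quoted text): fit a regression on a training split, then compute the MSE formula separately on the training and test portions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: a true linear signal plus noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

def mse(y_true, y_pred):
    # MSE = (1/n) * sum (y_i - f_hat(x_i))^2, as in the formula above
    return float(np.mean((y_true - y_pred) ** 2))

train_mse = mse(y_train, model.predict(X_train))
test_mse = mse(y_test, model.predict(X_test))
print(f"training MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

The point is that only `test_mse` is computed on data the model never saw, so it is the relevant quantity for choosing between models.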
Cross-validation can be employed to evaluate model performance and aid the model selection process:
Cross-validation in plain english
How to choose a predictive model after k-fold cross-validation?
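As a concrete illustration of the cross-validation idea from the links above, here is a sketch (assuming scikit-learn; the three candidate models are hypothetical stand-ins for your three regressions) that compares candidates by their mean held-out MSE across k folds:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with a linear true relationship (illustrative only)
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(150, 1))
y = 1.5 * X[:, 0] + rng.normal(scale=0.5, size=150)

# Three hypothetical candidate regressions
candidates = {
    "linear": LinearRegression(),
    "quadratic": make_pipeline(PolynomialFeatures(2), LinearRegression()),
    "cubic": make_pipeline(PolynomialFeatures(3), LinearRegression()),
}

cv_mse = {}
for name, model in candidates.items():
    # 5-fold CV; scikit-learn reports negated MSE, so flip the sign
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_squared_error")
    cv_mse[name] = -scores.mean()

best = min(cv_mse, key=cv_mse.get)
print(best, cv_mse)
```

Each model's score is an average over folds it was not trained on, so the comparison approximates test-set performance without needing a separate test set.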
Best to use for what purpose? If it's purely for prediction and you don't care about explanation, you can go with the one that has the lowest AIC, BIC, SBC or some similar score.
If it's for explanation, then go with the one that best advances the field you are researching.