I fitted a MARS model and an OLS model to my data. The main goal is prediction. How can I compare the results and decide which model is better? Since I don't have many records, I did not split the data into training and testing sets... So far I have only considered the R^2 fit on the fitted data. Can I do some cross-validation?
- Cross-validation is done by splitting the data into training and testing sets, so you basically contradict yourself: your fourth sentence against your sixth. – Richard Hardy Oct 30 '15 at 19:44
2 Answers
The standard approach to picking the model with the best predictive generalization is to do cross-validation and measure some appropriate out-of-sample metric. I don't know what might be the most appropriate for your context and problem, but one standard choice would be to pick the model that minimizes out-of-sample squared error.
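As an illustration, here is a minimal k-fold cross-validation sketch in plain numpy that compares two models by out-of-sample squared error. The data are simulated, and a cubic polynomial stands in for the flexible (MARS-like) model, since an actual MARS fit would need a dedicated package; only the CV mechanics are the point here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a mildly nonlinear signal with noise.
X = rng.uniform(-2, 2, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=60)

def fit_ols(X_tr, y_tr):
    """Ordinary least squares with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def fit_cubic(X_tr, y_tr):
    """Stand-in flexible model (cubic polynomial); a real MARS fit
    would require a dedicated library."""
    A = np.column_stack([X_tr**p for p in range(4)])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return lambda X_new: np.column_stack([X_new**p for p in range(4)]) @ coef

def cv_mse(fit_fn, X, y, k=5, seed=1):
    """k-fold cross-validated mean squared error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        predict = fit_fn(X[train], y[train])
        errs.append(np.mean((predict(X[test]) - y[test]) ** 2))
    return float(np.mean(errs))

mse_ols = cv_mse(fit_ols, X, y)
mse_flex = cv_mse(fit_cubic, X, y)
print(f"OLS CV-MSE:   {mse_ols:.3f}")
print(f"Cubic CV-MSE: {mse_flex:.3f}")
# Pick whichever model has the lower out-of-sample MSE.
```

The same loop works for any pair of fit functions: swap `fit_cubic` for the real MARS fit and the comparison is unchanged.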

Sycorax
- But how do I implement CV for the MARS model? And what about outliers? These could have a huge effect on the CV. – R_FF92 Oct 30 '15 at 14:56
- I'm not sure I understand your question -- what would make implementing CV in MARS different from any other algorithm? Outliers probably won't be predicted well, but that's the cost of fitting any model. – Sycorax Oct 30 '15 at 15:15
- The problem is that I used the Salford software for the MARS model and also built a CART model there. I think both use CV during model building. For a better fit I averaged MARS and CART, and now I don't know how CV works for this averaged model... – R_FF92 Oct 30 '15 at 15:47
- @fabian92 I'm not familiar with that software suite, and software questions are not within the scope of CV. To understand how the software works, you'll have to contact the manufacturer or ask on a forum dedicated to that purpose. – Sycorax Oct 30 '15 at 16:11
- @fabian92, if your data is cross-sectional, CV is trivial: just make different splits of the original sample into an estimation set and a test set, run your model on each estimation set, and test its performance on the corresponding test set. If your data is a time series, you could use rolling windows instead of random splits. – Richard Hardy Oct 30 '15 at 19:43
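The split-and-score recipe in the last comment extends directly to the averaged model asked about above: on each random split, fit both base models on the estimation set, average their predictions, and score the average on the test set. A minimal numpy sketch on simulated data, using polynomial fits of different degrees as stand-ins for MARS and CART (which need dedicated libraries):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=60)

def fit_poly(degree):
    """Return a fit function for a polynomial of the given degree."""
    def fit(X_tr, y_tr):
        A = np.column_stack([X_tr**p for p in range(degree + 1)])
        coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
        return lambda X_new: np.column_stack(
            [X_new**p for p in range(degree + 1)]) @ coef
    return fit

def averaged_test_mse(X, y, n_splits=20, test_frac=0.25, seed=1):
    """Repeated random splits; on each split, fit both models on the
    estimation set, average their predictions, score on the test set."""
    rng = np.random.default_rng(seed)
    n_test = int(len(y) * test_frac)
    errs = []
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        test, train = idx[:n_test], idx[n_test:]
        pred_a = fit_poly(1)(X[train], y[train])(X[test])  # rigid stand-in
        pred_b = fit_poly(3)(X[train], y[train])(X[test])  # flexible stand-in
        avg = 0.5 * (pred_a + pred_b)                      # the averaged model
        errs.append(np.mean((avg - y[test]) ** 2))
    return float(np.mean(errs))

print(f"Averaged-model test MSE: {averaged_test_mse(X, y):.3f}")
```

The key point is that the averaging happens inside each split, so the ensemble is evaluated exactly like any single model.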
R^2 on the fitted data is a valid measure, but the two models may have different bias/variance trade-offs, which is a separate subject altogether. What you are looking for is the model with the best R^2 under generalization, i.e., on data not used for fitting.
A good starting point is: Question about bias-variance tradeoff
Cross-validation is indeed one of the ways to assess this trade-off.
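To see why the in-sample R^2 can mislead, compare it with an R^2 computed from out-of-sample (leave-one-out) predictions. A small simulated sketch, in which a deliberately over-flexible polynomial plays the role of the flexible model:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(30, 1))
y = X[:, 0] + rng.normal(scale=0.5, size=30)

def design(X, degree):
    """Polynomial design matrix with intercept column."""
    return np.column_stack([X**p for p in range(degree + 1)])

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

degree = 8  # deliberately over-flexible for this sample size
A = design(X, degree)
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
r2_in = r2(y, A @ coef)

# Leave-one-out CV: refit without each point, predict it out-of-sample.
preds = np.empty(len(y))
for i in range(len(y)):
    mask = np.arange(len(y)) != i
    c, *_ = np.linalg.lstsq(design(X[mask], degree), y[mask], rcond=None)
    preds[i] = (design(X[i:i + 1], degree) @ c)[0]

r2_cv = r2(y, preds)
print(f"In-sample R^2:     {r2_in:.3f}")
print(f"Leave-one-out R^2: {r2_cv:.3f}")
# The in-sample figure flatters the flexible model; the CV figure
# reflects generalization.
```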
HTH.