
I am looking for a measure that tells me whether the model I have trained is predicting well. For example, I built a model that uses multiple independent features to predict a dependent feature. The model shown here is a Support Vector Regressor (SVR), although I use other models as well. During training I obtained an R-squared of 0.29, and on the test set the R-squared is -0.67.

SVR_model.score(X_train, y_train)  # 0.29
SVR_model.score(X_test, y_test)    # -0.67
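
For context, the scores above come from code along these lines (a minimal sketch, assuming scikit-learn; the synthetic dataset is a placeholder for my real data, so the printed values will differ):

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Placeholder data; in my case X holds the independent features and y the dependent one.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

SVR_model = SVR()
SVR_model.fit(X_train, y_train)

# .score() on a scikit-learn regressor returns the coefficient of determination (R-squared).
print(SVR_model.score(X_train, y_train))
print(SVR_model.score(X_test, y_test))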

I am looking for a single-value measure that helps me evaluate whether the model is predicting well or not.

  1. What could be the reasons for a positive R-squared in training and a negative R-squared in testing?
  2. How can I tell whether this model is predicting well?
  3. What is the best measure to use: Root Mean Squared Error (RMSE), adjusted R-squared?
xeon123

1 Answer


A positive R^2 on the training set and a negative R^2 on the test set is just an (extreme) example of overfitting. Your model simply generalizes poorly: a model that always predicted the mean of the test targets would perform better, since that constant prediction corresponds to R^2 = 0.
If your test set is well-posed, you can say with confidence that the model is not predicting well.
However, there is also the possibility that the problem lies with the test set rather than the model. If your test set is very small, knowing its mean already conveys a lot of information about it, so make sure your test set isn't pathologically small.
Changing the measure (MSE, RMSE, R^2, adjusted R^2) won't fix the issue.
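
To see concretely what a negative R^2 means, you can compare your model against a baseline that always predicts a constant mean (a minimal sketch, assuming scikit-learn; the data is a placeholder for your own):

from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Placeholder data; substitute your own features and target.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

svr = SVR().fit(X_train, y_train)
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

# R^2 = 0 corresponds to always predicting the mean of the test targets;
# a negative R^2 means the model does worse than that constant prediction.
print("SVR test R^2:     ", svr.score(X_test, y_test))
print("Baseline test R^2:", baseline.score(X_test, y_test))  # close to 0

If your SVR scores below the dummy baseline on the test set, you have a direct, single-number demonstration that it is not predicting well.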

Itamar Mushkin
  • Thank you for your comment. I am using cross-validation (KFold) to train the model, and I store the R-squared for each iteration. As a result, I have a list of R-squared values. Can I use the average of the R-squared values across all iterations to validate my model, or should I use another measure? – xeon123 Jul 30 '20 at 10:29
  • I'd inspect the list to see if it looks 'off' in any way, and then take the mean. – Itamar Mushkin Jul 30 '20 at 11:54
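
For instance, something like this produces the per-fold R^2 values to inspect before averaging (a minimal sketch, assuming scikit-learn; the data is a placeholder):

from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVR

# Placeholder data; substitute your own features and target.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(SVR(), X, y, cv=cv, scoring="r2")

# Inspect the per-fold scores first: a single very negative fold can drag
# the mean down and usually points at a problem with that split or the data.
print("Per-fold R^2:", scores)
print("Mean R^2:    ", scores.mean())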