
I have a dataframe containing the IDs of 2000 questions, a list of scores representing difficulty, and the following features: how often the question was answered, how often the answer was changed because the students were undecided, a normalized "frequency of changing the answers" (the previous two features divided), and the average time spent on a question. The most important feature seems to be the normalized frequency (50%), then the average time (22%), how often the question was answered (17%), and how often the answer was changed overall (11%).
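For concreteness, here is a minimal sketch of how such a dataframe might be laid out; all column names are invented for illustration, since the actual schema is not shown:

```python
import pandas as pd

# Hypothetical columns: question ID, difficulty target, and the four features.
df = pd.DataFrame({
    "question_id": [101, 102, 103],
    "difficulty":  [0.42, 0.71, 0.55],   # target: difficulty score
    "n_answered":  [320, 150, 410],      # how often the question was answered
    "n_changed":   [40, 60, 35],         # how often the answer was changed
    "avg_time":    [38.5, 62.0, 25.1],   # average time spent on the question
})

# Assumed definition of the normalized "frequency of changing the answers":
# the change count divided by the answer count.
df["change_freq"] = df["n_changed"] / df["n_answered"]
```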

I used Google AutoML, which is optimized for RMSE, and I got:

MAE = 0.135
RMSE = 0.177
RMSLE = 0.112
MAPE = 29.37%
R^2 = 0.394
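For reference, these metrics correspond to the standard scikit-learn definitions; `y_true` and `y_pred` below are placeholders, not my actual data:

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_squared_log_error,
    mean_absolute_percentage_error,
    r2_score,
)

# Placeholder arrays; substitute the real difficulty scores and predictions.
y_true = np.array([0.40, 0.55, 0.70, 0.30])
y_pred = np.array([0.50, 0.45, 0.80, 0.35])

mae   = mean_absolute_error(y_true, y_pred)
rmse  = np.sqrt(mean_squared_error(y_true, y_pred))
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))
mape  = mean_absolute_percentage_error(y_true, y_pred)  # returned as a fraction, not a percent
r2    = r2_score(y_true, y_pred)
print(mae, rmse, rmsle, mape, r2)
```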

Should I worry about the R^2? How come the others look good? Is the model underfitted?

1 Answer


How do you know that the model fits the data well? How do you know that the results look good?

RMSE and MAE are in the units of your target value, and they have no common/general range, so the values could be bad. If the true value is around 0.01 meters and you predict 0.145 meters, that gives an MAE of 0.135; that is not good. But if the true value is 34.001 meters and your output is 34.136, that looks good, and it has the same MAE.
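A quick sketch of that point with scikit-learn, using the numbers from the example above:

```python
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

# Both scenarios have an absolute error of exactly 0.135,
# but the relative error differs by orders of magnitude.
for name, y_true, y_pred in [("small scale", [0.01], [0.145]),
                             ("large scale", [34.001], [34.136])]:
    mae = mean_absolute_error(y_true, y_pred)
    mape = mean_absolute_percentage_error(y_true, y_pred)
    print(f"{name}: MAE = {mae:.3f}, MAPE = {mape:.1%}")
# small scale: MAE = 0.135, MAPE = 1350.0%
# large scale: MAE = 0.135, MAPE = 0.4%
```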

The two values that are dimensionless and have a common range are MAPE and R^2. Your MAPE is 29%, which means that you are erring (on average) by almost 30% of the correct value. Is this good or bad for your case?

Jacques Wainer
    [There are known shortcomings of MAPE](https://stats.stackexchange.com/questions/299712/what-are-the-shortcomings-of-the-mean-absolute-percentage-error-mape), but knowing the context for errors is important, absolutely. – Dave Jul 09 '21 at 18:56