Study on Human Behavior: what is a good value of RMSE when a linear regression is performed on a dataset that represents human behavior?

Question

I am completely new to machine learning. I am working on my undergrad thesis, which tries to predict how satisfied a person will be in a specific area using linear regression. Since this study involves human behavior, i.e. different people's satisfaction levels will differ in the same area even if their demographics are the same, the dataset contains a lot of noise. The dependent variable (Satisfaction) ranges from 1 to 5. What can I consider a good value of RMSE here? Currently the value is 1.08, but it changes if I split my dataset by some demographic, e.g. job or gender; the value is always more than 1.

The proposed duplicate is stated in terms of SMAPE and MASE, but the exact same reasoning applies: there is no general benchmark. Compare your model to a very simple "benchmark" one, like the overall average (and do so with actual predictions, out-of-sample). If you can't even beat the simplest possible model, you should be thinking about what you are doing. And simple models can be surprisingly hard to beat. — Stephan Kolassa, Nov 23 '20 at 15:42
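The comparison suggested in the comment above can be sketched as follows. This is a minimal illustration, not the asker's actual pipeline: the function names `rmse` and `holdout_rmse_vs_mean`, the 70/30 split, and the synthetic data in the usage lines are all assumptions made for the example. It fits ordinary least squares on a training split and compares its hold-out RMSE with that of always predicting the training mean.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def holdout_rmse_vs_mean(X, y, test_fraction=0.3, seed=0):
    """Compare hold-out RMSE of a linear regression with a mean-only baseline.

    Hypothetical helper for illustration; X is (n_samples, n_features).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Random train/test split (assumed 70/30 by default).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test, train = idx[:n_test], idx[n_test:]

    # Baseline: predict the training-set mean for every test case.
    baseline_pred = np.full(n_test, y[train].mean())

    # Linear regression with an intercept, fitted by least squares.
    A_train = np.column_stack([np.ones(len(train)), X[train]])
    A_test = np.column_stack([np.ones(n_test), X[test]])
    coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
    model_pred = A_test @ coef

    return {
        "baseline_rmse": rmse(y[test], baseline_pred),
        "model_rmse": rmse(y[test], model_pred),
    }


# Usage on synthetic data (made up for this sketch, not real survey data).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 3))
latent = 3 + 0.4 * X_demo[:, 0] + rng.normal(scale=1.0, size=500)
y_demo = np.clip(np.rint(latent), 1, 5)  # noisy 1-5 satisfaction scores
print(holdout_rmse_vs_mean(X_demo, y_demo))
```

If `model_rmse` is not clearly below `baseline_rmse`, the predictors add little beyond the overall average, whatever the absolute RMSE value is.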
@StephanKolassa Do you even think RMSE is an appropriate metric? This sounds like an ordinal regression, not a regression where $1$ means half of $2$. (Or is RMSE some kind of square root of an ordinal Brier score? Hmm...might that be the question I post today?) — Dave, Nov 23 '20 at 15:43
@Dave: very good point, that. Satisfaction is indeed likely an ordinal variable, so RMSE does not make a lot of sense. Hm. So we should look, on the one hand, at the likelihood of some ordinal logistic model or similar, and alternatively at the quality of out-of-sample predictions, to guard against overfitting. But no good error measure for ordinary predictions comes to mind, except possibly proper scoring rules for a full (ordinal) predictive density, but you at least knew I was going to say that (my hammer is proper scoring rules). ;-) — Stephan Kolassa, Nov 24 '20 at 06:16
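As one concrete instance of a proper scoring rule for an ordinal predictive distribution, the ranked probability score (RPS) compares cumulative predicted probabilities with the cumulative indicator of the observed category. The comment does not name a specific rule, so the choice of RPS here is an assumption, as are the function name `ranked_probability_score` and the unnormalized form (some definitions divide by the number of categories minus one). Lower is better.

```python
import numpy as np


def ranked_probability_score(probs, outcomes):
    """Mean (unnormalized) ranked probability score.

    probs:    array of shape (n, K); row i holds predicted probabilities
              for categories 1..K and should sum to 1.
    outcomes: array of shape (n,) with observed categories in 1..K.
    """
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=int)
    n_categories = probs.shape[1]

    # Cumulative predicted probability P(Y <= k) for k = 1..K-1.
    cum_pred = np.cumsum(probs, axis=1)[:, :-1]

    # Cumulative observation indicator 1{y <= k} for k = 1..K-1.
    thresholds = np.arange(1, n_categories)
    cum_obs = (outcomes[:, None] <= thresholds[None, :]).astype(float)

    return float(np.mean(np.sum((cum_pred - cum_obs) ** 2, axis=1)))


# Usage: a forecast that misses by one category is penalized less than
# one that misses by four, which a plain multi-class Brier score ignores.
near_miss = ranked_probability_score([[0, 0, 0, 1, 0]], [5])
far_miss = ranked_probability_score([[1, 0, 0, 0, 0]], [5])
print(near_miss, far_miss)
```

The probabilities could come from an ordinal logistic model evaluated out-of-sample, with the observed category frequencies of the training data serving as a simple benchmark forecast.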
