
I am very new to data science and I have a problem I can't explain. The data I am using is 3-D time series data. I split the time series by year and predict the last four years with a multiple linear regression. When I compute the spatial R² and spatial RMSE for each year, they behave as expected. But when I do the same for the detrended data, my R² becomes horrible, even though the RMSE stays similar.

Please help.

The functions I am using are these (using scipy and scikit-learn):

import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error

rmse = np.sqrt(mean_squared_error(y_obs_detrended, y_pred_detrended))

corr, _ = pearsonr(y_obs_detrended, y_pred_detrended)
r2 = corr**2
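For what it's worth, the divergence can be reproduced with synthetic data (the arrays below are made up purely for illustration, not the questioner's data). RMSE depends only on the residuals, while the squared Pearson correlation also divides by the variance of each series, so removing a strong shared trend leaves RMSE unchanged but collapses r²:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 200
trend = np.linspace(0.0, 20.0, n)  # strong shared trend (hypothetical)

y_obs = trend + rng.normal(scale=1.0, size=n)
y_pred = trend + rng.normal(scale=1.0, size=n)  # tracks the trend, noisy otherwise

rmse = np.sqrt(mean_squared_error(y_obs, y_pred))
r2 = pearsonr(y_obs, y_pred)[0] ** 2  # high: the trend dominates the variance

# Remove the shared trend from both series
y_obs_d = y_obs - trend
y_pred_d = y_pred - trend

rmse_d = np.sqrt(mean_squared_error(y_obs_d, y_pred_d))
r2_d = pearsonr(y_obs_d, y_pred_d)[0] ** 2  # near zero: only noise is left

# The residuals y_obs - y_pred are unchanged by detrending, so RMSE is
# identical, while r^2 collapses because the variance in its denominator shrank.
print(f"raw:       RMSE={rmse:.3f}  r^2={r2:.3f}")
print(f"detrended: RMSE={rmse_d:.3f}  r^2={r2_d:.3f}")
```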
Tobi
  • You might find that the examples I present at https://stats.stackexchange.com/a/13317/919 shed some light on this issue. In particular, $R^2$ likely is not telling you what you think it is. – whuber Jan 14 '20 at 14:05
  • Thanks for your answer. However all of the examples in my understanding would lead to a bad RMSE if the R² is bad. However my RMSE is relatively good... – Tobi Jan 14 '20 at 14:16
  • Then I believe you did not see how those examples apply to your problem. In particular, after you detrended the data, you completely changed the denominator of $R^2,$ so the values aren't even remotely comparable. The main point is that $R^2$ represents a complex relationship between the MSE and other factors which often are irrelevant; as such, you shouldn't even be looking at $R^2:$ in your application it's worse than useless. – whuber Jan 14 '20 at 14:26
  • Hi, whuber. Thanks again for taking the time to answer. Yes, I did not see how it applies. So you would argue that detrending changes the variance of the data? Where exactly did you read that? – Tobi Jan 14 '20 at 18:32
  • I don't need to read it: it's a mathematical theorem. If your detrending is any good, the variance of the residuals is less than that of the original response. (Otherwise your detrending procedure has a *negative* $R^2$!) – whuber Jan 14 '20 at 18:54

0 Answers