This blog post has a nice description of when the square of the Pearson correlation coefficient, $r$, is equal to the coefficient of determination, $R^2$. Specifically, it states that they will be the same when the model, $f$, meets these three conditions:
- $f$ is the model that minimizes squared-error loss
- Because it is the optimum (in the sense of item 1), there is no shift of $f$ that will improve the fit.
- Because it is the optimum (in the sense of item 1), there is no scaling of $f$ that will improve the fit.
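
To make the claim concrete, here is a minimal numerical sketch of how I understand it. It is not from the blog post: the simulated data, the straight-line fit with an intercept via `np.polyfit`, and the helper names `r_squared` and `pearson_sq` are all my own assumptions for illustration. It compares $r^2$ (between $y$ and the predictions) with $R^2 = 1 - SS_{res}/SS_{tot}$ for the least-squares fit, and then for deliberately shifted and scaled copies of that fit.

```python
import numpy as np

def r_squared(y, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def pearson_sq(y, pred):
    """Squared Pearson correlation between observations and predictions."""
    return np.corrcoef(y, pred)[0, 1] ** 2

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + 1.0 + rng.normal(size=200)

# Least-squares line with an intercept; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
f = slope * x + intercept

# Deliberately break the "no shift / no scaling helps" conditions.
shifted = f + 0.5
scaled = 1.5 * f

for name, pred in [("least squares", f), ("shifted", shifted), ("scaled", scaled)]:
    print(f"{name:>13}: r^2 = {pearson_sq(y, pred):.4f}, R^2 = {r_squared(y, pred):.4f}")
```

In this sketch $r^2$ is the same for all three sets of predictions, since correlation ignores shifts and positive rescalings, while $R^2$ matches it only for the unmodified least-squares fit.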
Surely the last two conditions aren't redundant, but I can't think of an instance where $f$ is optimal in the least-squares sense yet could still be improved by shifting or scaling. Can someone explain how that might happen?