First of all, I have to say that my knowledge of statistics is very basic. I was trying to fit data with a linear regression in Matlab, and I came across the problem of how $R^2$ is defined. I am using the free Ezyfit toolbox, but my question is about which definition of $R^2$ I should use in my analysis. I fitted the data both with a linear regression with intercept, $y = ax + b$, and without intercept, $y = ax$. I calculated $R^2$ from the definition $$ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} $$
where $SS_{res}$ is the sum of squares of residuals and $SS_{tot}$ is the total sum of squares.
I saw that in the Ezyfit toolbox (and, from what I can understand, in the Matlab Statistics Toolbox as well, http://uk.mathworks.com/help/stats/coefficient-of-determination-r-squared.html) the definition of $R^2$ used instead is
$$ R^2 = \frac{SS_{reg}}{SS_{tot}} $$
where $SS_{reg}$ is the regression sum of squares.
From what I understand, the first definition is the more general one, and it is equivalent to the second only in some cases (when $SS_{tot} = SS_{reg} + SS_{res}$, see https://en.wikipedia.org/wiki/Coefficient_of_determination). When I calculated $R^2$ for my linear regression with intercept, the two definitions gave the same result. But when I calculated it for the regression with zero intercept, the two definitions gave different results. In my case, with the first definition the regression with intercept is the better fit (it has the higher $R^2$), while with the second definition the zero-intercept regression is the better fit. Moreover, Ezyfit, which finds the regression parameters by minimising $SS_{res}$, gives me an intercept significantly different from 0, even though the $R^2$ of the zero-intercept fit is higher according to the definition used inside the toolbox.
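To make the comparison concrete, here is a small sketch of what I am computing, written in Python/NumPy rather than Matlab/Ezyfit (the data below are made up for illustration, not my actual data):

```python
import numpy as np

def fit_and_r2(x, y, intercept=True):
    """Ordinary least-squares fit, returning R^2 under both definitions."""
    # Design matrix: [x, 1] with intercept, [x] without.
    X = np.column_stack([x, np.ones_like(x)]) if intercept else x[:, None]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ coef
    ss_res = np.sum((y - yhat) ** 2)         # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares (about the mean)
    ss_reg = np.sum((yhat - y.mean()) ** 2)  # regression ("explained") sum of squares
    r2_first = 1 - ss_res / ss_tot           # first definition
    r2_second = ss_reg / ss_tot              # second definition
    return r2_first, r2_second

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 50)
y = 2.0 * x + 3.0 + rng.normal(0, 1, x.size)  # true intercept is nonzero

r2a, r2b = fit_and_r2(x, y, intercept=True)
r2c, r2d = fit_and_r2(x, y, intercept=False)
print(r2a, r2b)  # with intercept: the two definitions coincide
print(r2c, r2d)  # without intercept: they disagree
```

With the intercept included, OLS residuals are orthogonal to the constant column, so $SS_{tot} = SS_{reg} + SS_{res}$ holds exactly and the two numbers agree; without the intercept that decomposition fails and they diverge, which is exactly the behaviour I am asking about.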
I am interested in understanding why Matlab and Ezyfit use the second definition, and in which cases it is possible or preferable to use it in general. In my particular case, which of the two definitions is more appropriate? Thank you in advance.
Edited: I think my question is different from Removal of statistically significant intercept term increases $R^2$ in linear model, because there it is R that uses a modified definition of $R^2$ when the intercept is omitted, while here I am calculating $R^2$ myself. From my results, the two definitions are equivalent for a fit with intercept but differ without intercept, and I would like to understand why.
Edited (with answer): Thanks everyone. The core of my problem was that the two versions of $R^2$ I calculated gave different results for a linear regression without intercept. This was my mistake: the second version of $R^2$ is not applicable to a fit without intercept. I now have a different problem, which I am going to ask in a separate question.