So I've fitted a linear trend to my data and calculated R^2 in two different ways in Matlab: one using corrcoef and the other "by hand". They return different results, and both seem to make sense, so I'm not sure why they differ. My methods are as follows, with x being the number of years and y my values:

(1)
rsq1 = corrcoef(x, y);

(2)
%// fitting the model
p = polyfit(x,y,1);
yfit = polyval(p,x);

%// calculating R^2
yresid = y - yfit;
SSresid = sum(yresid.^2);
SStotal = (length(y)-1) * var(y);
rsq2 = 1 - SSresid/SStotal;

Since I'm very new to this I can't seem to figure out why rsq1 and rsq2 are different. I have a feeling I'm missing something obvious... does anyone have an idea?

Thanks for any help!

Kim H

1 Answer

The R-squared you get out of a linear regression is equal to the square of the Pearson correlation coefficient, which you have calculated as rsq1. So both rsq1 and rsq1^2 have meaning: the former is the Pearson correlation, and the latter is the R-squared value you would obtain by linearly regressing y against x, which in your example is rsq2.
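
To check this numerically, here is a minimal sketch. The data vectors are made up for illustration and are not the asker's data. Note that for two vectors corrcoef returns a 2x2 correlation matrix, so the correlation itself is the off-diagonal entry; squaring that entry should reproduce the "by hand" value up to rounding error.

```matlab
% Hypothetical example data (not from the question): a noisy linear trend
x = (1:10)';
y = [2.1 2.9 4.2 4.8 6.1 7.2 7.8 9.1 9.9 11.2]';

% corrcoef returns a 2x2 correlation matrix; r is the off-diagonal entry
R = corrcoef(x, y);
r = R(1, 2);
rsq1 = r^2;                          % square of the Pearson correlation

% R^2 from the least-squares line, as in method (2) of the question
p = polyfit(x, y, 1);
yfit = polyval(p, x);
SSresid = sum((y - yfit).^2);
SStotal = (length(y) - 1) * var(y);
rsq2 = 1 - SSresid / SStotal;

% The last two printed values should agree up to floating-point error
fprintf('r = %.4f, r^2 = %.4f, R^2 = %.4f\n', r, rsq1, rsq2);
```

The equality holds for a straight-line fit with an intercept, which is what polyfit(x,y,1) gives; it is not claimed here for other models.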

Alex
  • +1 because it is the correct answer and it took a bit of luck given that the initial problem formulation was somewhat ambiguous. – usεr11852 Feb 18 '16 at 00:21
  • @Alex I have one follow up question: so if I have two variables, e.g. temperature and plant growth, I could calculate the correlation coefficient which tells me how they are correlated. But when I have e.g. plant growth over **time**, and I fit a linear trend to it, maybe I'll rather use R^2, because it can tell me how well the trend fits the data. Would you say this makes sense? – Kim H Feb 18 '16 at 18:04
  • I think that would qualify as a separate question, for which 'canonical' answers can be found here: http://stats.stackexchange.com/questions/83347/relationship-between-r2-and-correlation-coefficient. These answers all seem very math heavy, so I will give my take on it: the correlation coefficient is used as a measure of how strong the linear relationship between the two variables is...
  • ... R-squared on the other hand compares two models for your observed `y`. The first model returns the following: given x, guess the mean of y as the value for observed y. The second model uses the line of best fit: given x, guess the y value of this line as the value for the observed y. R-squared quantifies how much better the second guess is when compared to the first guess. – Alex Feb 18 '16 at 22:34
  • so to answer your question: "how well does the linear trend fit the data (observed y)", you should use R-squared. – Alex Feb 18 '16 at 22:34
  • @Alex Thank you that makes sense; and I'll have a look into it! – Kim H Feb 23 '16 at 14:44
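
The "two guesses" description of R-squared in the comments above can be written as a formula. This is only a sketch; the symbols are assumptions introduced here and not used in the thread: $\bar{y}$ is the mean of the observed y (the first guess), $\hat{y}_i$ is the value of the fitted line at $x_i$ (the second guess), and $n$ is the number of observations.

```latex
R^2 \;=\; 1 - \frac{SS_{\mathrm{resid}}}{SS_{\mathrm{total}}}
    \;=\; 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
                   {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

The numerator corresponds to SSresid in the question's code, and the denominator to SStotal, since (length(y)-1) * var(y) equals the sum of squared deviations of y from its mean. R^2 near 1 means the line's guesses leave far less squared error than always guessing the mean.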