
This question was edited to clarify what I am asking; for the old version, see the edit log.
I found this about regression and correlation:

Regression is different from correlation because it try to put variables into equation and thus explain causal relationship between them, for example the most simple linear equation is written : Y=aX+b

Based on the information mentioned here:

However such results do not allow any causal explanation of the effect of x on y, indeed x could act on y in various way that are not always direct, all we can say from the correlation is that these two variables are linked somehow, to really explain and measure causal effect of x on y we need to use regression method, which will come next.

If we plot the data and see a clear linear trend, we can test whether this trend is significant using a correlation test (I assume). If it is, we can fit a linear model and then inspect the p-value of the slope to determine whether the slope is probably the same as we would expect in our population.

I'm not sure whether the above assumption is correct, so I'm wondering: what does the p-value of the correlation test tell us, and what does the p-value of the slope tell us?

KingBoomie
  • 2
    Short answer is always to follow the regression. It's more flexible if you change the model whereas the correlation answers one question only. The linear fit looks plausible for the data but an exponential fit also works well (try it) and doesn't have the downside of going negative at some point. (Still, if biological plausibility is a goal here, as it should be, it also seems possible that too much acid would just kill the fungus, so growth might stop any way.) – Nick Cox Jan 18 '17 at 14:17
  • What do you mean with "follow the regression"? @NickCox – KingBoomie Jan 18 '17 at 14:22
  • 1
    What could "slope" possibly mean without an assumption of a linear relationship in the first place?? – whuber Jan 18 '17 at 14:22
  • I understand @whuber, what I meant is cor.test is to test the significance of a possible linear relationship. However what would the significance of the slope tell me? – KingBoomie Jan 18 '17 at 14:24
  • 2
    The regression is the analysis to follow. It's just a coincidence here that regression and correlation overlap so much. If you decided that the relationship was curved, and here there is a serious case for that, then the correlation is immediately secondary to the regression and not directly pertinent. – Nick Cox Jan 18 '17 at 14:24
  • I plot the data, see that there is a probable linear relationship, and test its significance using a correlation test, but then I'm confused about what the significance of the slope tells me. @NickCox I'm just new to statistics, so it's hard to understand what you mean – KingBoomie Jan 18 '17 at 14:28
  • 1
    The p-value tells you whether the coefficient is statistically distinguishable from zero, and therefore whether the coefficient is useful in the regression. – SmallChess Jan 18 '17 at 14:31
  • Your low p-value tells you the predictor is strong in your linear relationship. – SmallChess Jan 18 '17 at 14:32
  • 1
    Note also that with just one predictor the $p$-values are identical. – mdewey Jan 18 '17 at 14:37
  • I am, believe me, sympathetic but unfortunately "I'm confused" isn't a question. If your comments just recycle the tone of the question, we can't tell what you don't understand. You are using an introductory book which I know about but haven't used. As you will want to do a good job on the science here, presumably your main goal, you need a book that talks about which models make biological sense too. I am not a biologist and don't know your mathematical level, so can't advise on that. – Nick Cox Jan 18 '17 at 14:41
  • In your sample data set, you can use the p-value for the regression to conclude your linear relationship is super strong. – SmallChess Jan 18 '17 at 14:55
  • One other thing to consider is that correlation is usually employed when you want to know how two variables co-vary. For example "fungus growth" and "fungus weight". However, in this case you actually added something that will actively influence fungus growth, i.e. the acid, it can be considered an actual treatment and qualifies as a **predictor** variable. I would also use a regression analysis in this case since it will allow you to predict fungus growth for other acid concentrations. In a plot that examines correlation, I also would not add a regression line! – Stefan Jan 18 '17 at 15:02
  • 1
    If you want to test how a predictor $X$ (or treatment) influences your outcome $Y$ use regression. If you just want to check how two variables co-vary (you don't know if $X$ is predicting $Y$), e.g. height and weight , use correlation. – Stefan Jan 18 '17 at 15:32
  • It's your question, but I am disappointed that you deleted the data plot which made your question concrete. Also, several comments here (e.g. alternative analyses or mentioning the variables in your example) are now made more difficult to follow. – Nick Cox Jan 18 '17 at 17:53
  • @Stefan It might surprise you, then, that the p-values produced by `R`'s `lm` and `cor.test` functions will be identical (provided neither variable is a constant). – whuber Jan 18 '17 at 23:46
  • @whuber I am not sure why you say it would surprise me? In this simple regression example with one predictor, both p-values (`lm()` and `cor.test()`) must be the same because both are testing the same thing, i.e. whether the slope is different from zero. Was there something wrong with what I said? – Stefan Jan 19 '17 at 01:42
  • 1
    @whuber I shouldn't have been so quick with my last comment. `cor.test()` actually tests whether `alternative hypothesis: true correlation is not equal to 0`. So it doesn't test whether the slope is equal to 0. However I found out that [The correlation coefficient, $r$, is the slope of the regression line when both variables have been standardized first.](http://tinyurl.com/hg4pjeo) and [How does the correlation coefficient differ from regression slope?](http://tinyurl.com/h2qesbl). Thanks for poking me! Learned something new! – Stefan Jan 19 '17 at 05:28
  • 2
    @Stefan One reason it might surprise you is that the assumptions underlying a correlation test and a regression slope test are different--so even when we understand that the correlation and slope are really measuring the same thing, *why should their p-values be the same?* That shows how these issues go deeper than simply whether $r$ and $\beta$ should be numerically equal. – whuber Jan 19 '17 at 19:04
  • @whuber this is very interesting! I didn't know that and I will follow up later with a separate question. – Stefan Jan 19 '17 at 19:28

1 Answer


There are several questions in this question. Neither correlation nor linear regression can prove a causal relationship. But in your mind and in the model, correlation is not directed, whereas regression is. For correlation, it makes no difference whether you think of one variable as the cause of the other, whereas the formulation of a linear regression model usually implies a direction. At least with ordinary least squares, it is not the same whether you write $Y = aX+b$ or $X = cY+d$. However, $cor(X,Y) = cor(Y,X)$.
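To make the asymmetry concrete, here is a minimal sketch in plain Python (not the R functions mentioned in the comments). The data are made up purely for illustration, and the helper names `cov_sum`, `ols_slope` and `pearson_r` are hypothetical. It fits both $Y = aX+b$ and $X = cY+d$ by ordinary least squares and compares the two slopes with the correlation computed in both directions.

```python
# Hypothetical data, invented only to illustrate the point.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 5.5, 4.0, 9.0, 7.5, 12.0]


def mean(v):
    return sum(v) / len(v)


def cov_sum(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def ols_slope(pred, resp):
    """Ordinary least-squares slope when regressing resp on pred."""
    return cov_sum(pred, resp) / cov_sum(pred, pred)


def pearson_r(u, v):
    """Pearson correlation coefficient."""
    return cov_sum(u, v) / (cov_sum(u, u) * cov_sum(v, v)) ** 0.5


a = ols_slope(x, y)  # slope in Y = aX + b
c = ols_slope(y, x)  # slope in X = cY + d

print("slope of Y on X:      ", a)
print("slope of X on Y:      ", c)
print("1 / (slope of Y on X):", 1 / a)  # differs from c unless |r| = 1
print("cor(X, Y):", pearson_r(x, y))
print("cor(Y, X):", pearson_r(y, x))    # same value as cor(X, Y)
```

If the two regressions described the same line, `c` would equal `1 / a`. With noisy data it does not, while the correlation is unchanged when the variables are swapped.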

Correlation and linear regression are closely related: the link is the $R^2$ value, which results from linear regression and, in a simple regression with one predictor, is indeed the square of the correlation coefficient $r$. You have not mentioned $R^2$ in your post, so maybe this will help you get a better understanding.
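The relation between $R^2$ and $r$ can be checked numerically. The following is only a sketch on made-up data, in plain Python rather than R. It computes $R^2$ from the residuals of a one-predictor least-squares fit and compares it with the squared Pearson correlation.

```python
# Hypothetical data, invented only to illustrate the point.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 5.5, 4.0, 9.0, 7.5, 12.0]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Sums of squares and cross-products of deviations from the means.
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

# Least-squares fit of Y = aX + b.
a = sxy / sxx
b = my - a * mx

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - ss_res / syy

# Pearson correlation coefficient.
r = sxy / (sxx * syy) ** 0.5

print("R^2 from the regression:", r_squared)
print("r squared:              ", r ** 2)
```

The two printed values agree up to floating-point rounding. This identity holds for a simple regression with an intercept and one predictor; with several predictors, $R^2$ is no longer the square of a single pairwise correlation.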

The p-value mainly tells you whether your sample is large enough to conclude which sign the correlation coefficient $r$ and the regression coefficient $a$ have.
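The comments on the question note that, with a single predictor, the p-value of the slope and the p-value of the correlation test are identical. A sketch of why, on made-up data and in plain Python: both tests use the same $t$ statistic with $n-2$ degrees of freedom. The p-value itself is not computed here because that needs the Student $t$ distribution, which the Python standard library does not provide.

```python
import math

# Hypothetical data, invented only to illustrate the point.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 5.5, 4.0, 9.0, 7.5, 12.0]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

# Regression side: slope, its standard error, and t = slope / SE.
slope = sxy / sxx
intercept = my - slope * mx
ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
se_slope = math.sqrt(ss_res / (n - 2) / sxx)
t_slope = slope / se_slope

# Correlation side: t = r * sqrt(n - 2) / sqrt(1 - r^2).
r = sxy / math.sqrt(sxx * syy)
t_cor = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print("t statistic for the slope:      ", t_slope)
print("t statistic for the correlation:", t_cor)
```

Because the two statistics coincide and are referred to the same reference distribution, the resulting p-values coincide as well. Both answer the same question: is there enough data to distinguish the linear association from zero?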

Bernhard
  • Can you please explain when correlation is useful and when linear regression is useful? – KingBoomie Jan 18 '17 at 16:36
  • As a general rule, correlation is easy to grasp and will give a correlation coefficient and a p-value. If that is all you need, use correlation for its simplicity. Linear regression is more complex and yields all the information of correlation and then some more. Use regression if you want to know whether X values are usually larger than Y values. Use regression if you want to take a third or fourth variable into account, e.g. if you want to see whether the slope differs between women and men. Use regression if you want to predict Y-values from X-values. – Bernhard Jan 18 '17 at 16:41
  • Once you start to deal with regression, you can develop it further and further, and it will become more and more powerful. There are so many types and techniques of regression built upon linear regression that you can probably keep learning about them for the rest of your life. However, sometimes there is a lot of sense in brevity. When you write a poster or an abstract, or you are in a situation with very little time to speak, sometimes "r = .008" or "r = 0.996" says it all and will be understood by people who have only had an introductory course in statistics. That's why you should learn both. – Bernhard Jan 18 '17 at 16:47
  • @RickBeeloo http://stats.stackexchange.com/questions/2125/whats-the-difference-between-correlation-and-simple-linear-regression – SmallChess Jan 18 '17 at 22:54