It's known that two variables must have a linear relationship for the Pearson correlation between them to be computed. Are there any formal tests (preferably), or at least a graphical way, to check the data for linearity? As an R user, I'd be glad to see an R code example as well.

Denis

2 Answers

Actually, the Pearson correlation can be computed for any data, not just linear data. However, its distribution theory relies on linearity (which you need if you want to test the null hypothesis $\rho=0$ or compute a confidence interval), and interpretation may be trickier if the relationship is not linear.
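
As a minimal sketch of the inferential side mentioned above, base R's `cor.test()` reports both the test of $\rho=0$ and a confidence interval. The simulated data (seed, sample size, and noise level) are assumptions made purely for illustration.

```r
# Simulated, roughly linear data (illustrative assumption, not from the question)
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

# Pearson correlation with a test of H0: rho = 0 and a 95% confidence interval
res <- cor.test(x, y, method = "pearson", conf.level = 0.95)
res$estimate   # sample correlation coefficient
res$p.value    # p-value for H0: rho = 0
res$conf.int   # confidence interval (based on Fisher's z transformation)
```

Both the p-value and the interval lean on the distribution theory, so they are only as trustworthy as the linearity they assume.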

The best check for linearity is a scatterplot of the data, where you can see whether there are obvious issues with it. There are formal tests, but I'm not very keen on them for model assumption checking: they test linearity against specific alternatives, and a non-rejection doesn't mean that linearity actually holds. The idea that a formal test will reliably tell you whether you can or can't use this-or-that method is a myth.
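
A minimal sketch of both approaches in base R, using simulated data as an illustrative assumption: a scatterplot with the least-squares line and a `lowess()` smoother, followed by an F-test of the straight-line fit against one specific alternative (a quadratic term). A non-significant result here would only say that this particular alternative was not detected.

```r
# Simulated data with a curved relationship (illustrative assumption)
set.seed(42)
x <- runif(100, 0, 10)
y <- 0.5 * x^2 + rnorm(100, sd = 2)

# Graphical check: scatterplot, straight-line fit, and a smoother
plot(x, y, main = "Scatterplot with linear fit and lowess smoother")
fit_lin <- lm(y ~ x)
abline(fit_lin, col = "blue")              # least-squares line
lines(lowess(x, y), col = "red", lwd = 2)  # smoother; bends if the trend is curved

# Formal check against one specific alternative: an added quadratic term
fit_quad <- lm(y ~ x + I(x^2))
cmp <- anova(fit_lin, fit_quad)            # F-test comparing the nested models
cmp
```

If the smoother stays close to the straight line across the range of `x`, a linear description is at least not obviously wrong.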

Christian Hennig

A linear relationship has nothing to do with data linearity; it just means that when one variable increases or decreases, the other variable increases or decreases too.

For example,

> x <- 1:10
> y <- x^2
> cor(x,y)
[1] 0.9745586

Here $y$ is not a linear function of $x$, but it is linearly correlated with $x$ because it increases when $x$ increases.

> x <- -10:10
> y <- x^2
> cor(x,y)
[1] 0

Here $y$ is not linearly correlated with $x$ because:

  • if $x<0$, then $y$ decreases when $x$ increases;
  • if $x>0$, then $y$ increases when $x$ increases.
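
The two cases in the list can be checked separately by computing the correlation on each half of the data. This is only a sketch; `neg` and `pos` are helper names introduced here, and `x` and `y` are redefined so that the snippet runs on its own.

```r
x <- -10:10
y <- x^2

neg <- x < 0   # left half of the parabola
pos <- x > 0   # right half of the parabola

cor(x[neg], y[neg])  # negative: y decreases as x increases
cor(x[pos], y[pos])  # positive: y increases as x increases
```

The two halves have opposite signs and cancel out over the full range, which is why the overall correlation is zero.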
Sergio
  • Thank you very much! But I'm confused now. I understand your explanation of `linear relationship`. I'm wondering, what is `data linearity`? So, in the end, is the assumption for `Pearson correlation` a `linear relationship` or `data linearity`? – Denis Aug 18 '20 at 13:35
  • @Denis Linear relationship. For another example: x – Sergio Aug 18 '20 at 13:59
  • @Denis It's not a matter of assumptions: the Pearson correlation coefficient measures the linear relationship between two variables. It may be $0$ if they are not _linearly_ related. – Sergio Aug 18 '20 at 14:05
  • Thank you again. Sorry, do you know where I can find a clear explanation of the data `linearity` concept and how I can check my own data for it in `R`? I realize now that it's not related to `Pearson correlation`, but I have seen this term many times in papers as an important assumption for other statistical tests. – Denis Aug 19 '20 at 17:07
  • @Denis For example, in a linear regression you should check that there actually is a linear relationship between the predictors and the outcome, and you can use scatterplots and residual analysis for that, but it's just one example. You should be more specific, and possibly post another question. – Sergio Aug 20 '20 at 10:06
  • Thank you very much for your help. I posted a separate question last year, but because of my limited knowledge of statistics I didn't find a clear way to do it in `R`. Please check my post: https://stats.stackexchange.com/questions/424202/check-if-dependency-between-two-variables-is-linear – Denis Aug 23 '20 at 15:22
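
The scatterplot-and-residuals check suggested in the comments can be sketched in base R as follows. The data are simulated for illustration only (the logarithmic trend and noise level are assumptions); with real data, replace `x` and `y` with your own variables.

```r
# Simulated data with a mildly curved relationship (illustrative assumption)
set.seed(7)
x <- seq(1, 20, length.out = 80)
y <- 3 * log(x) + rnorm(80, sd = 0.3)

fit <- lm(y ~ x)

# Residuals vs fitted values: under linearity the points scatter evenly
# around zero; a systematic arch or U-shape suggests a nonlinear relationship
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
lines(lowess(fitted(fit), resid(fit)), col = "red")
```

Calling `plot(fit, which = 1)` should produce a similar residuals-vs-fitted display directly from the fitted model.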