4

Let's say I have many objects described by X and Y, both of which are numbers. These numbers change over time, and I want to figure out whether a rise in X also correlates with a rise in Y.

I have no idea where to even start, and I'd appreciate any pointers on which directions I should investigate.

Fabis
  • 141
  • 1
  • 1
    Until someone who knows more answers, you could compare the derivatives of both values. Since the values are discrete, calculate the discrete derivative of X (the difference between consecutive values) to see how it changes in time, and compare that with the rate of growth of Y. – Pleasant94 Jul 30 '20 at 16:43
  • 2
    You have the data $\{X(t),Y(t)\};$ why not just regress $Y=mX+b?$ – Adrian Keister Jul 30 '20 at 18:00
  • 1
    A good starting point would be to do some research through Google by searching for something like "correlation", "find correlation" or "how to find correlation between two variables". The fact that you already know it's called "correlation" should make your search fairly trivial. You could most likely also find code for that by adding the language of your choice to your search query. If you'd like a deeper understanding and easier introduction of the topic, I might suggest taking an introductory statistics course that also covers this on somewhere like Coursera. – Bernhard Barker Jul 31 '20 at 08:51
  • You might also be interested in reading https://stats.stackexchange.com/questions/133155/how-to-use-pearson-correlation-correctly-with-time-series/133171#133171 and https://stats.stackexchange.com/questions/27691/how-do-i-interpret-my-regression-with-first-differenced-variables – Adrian Aug 01 '20 at 19:45
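
The differencing idea from the comments could be sketched in R roughly as follows. This is only an illustration: the series `x` and `y` are made-up values for a single object, assumed to be sampled at equally spaced times, and the choice of Spearman correlation on the differences is one option among several.

```r
# Hypothetical measurements for one object at equally spaced time steps.
x <- c(1.0, 1.5, 1.4, 2.0, 2.6, 2.5, 3.1, 3.8)
y <- c(10, 12, 11, 14, 17, 16, 19, 23)

# First differences: the change from one time step to the next.
dx <- diff(x)
dy <- diff(y)

# Correlation between the changes rather than the raw levels.
rho <- cor(dx, dy, method = "spearman")

# Share of time steps where X and Y moved in the same direction.
same_direction <- mean(sign(dx) == sign(dy))

# Linear regression of the change in Y on the change in X.
fit <- lm(dy ~ dx)
coef(fit)
```

Working on differences instead of levels helps avoid reporting a strong correlation merely because both series trend upwards over time.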

2 Answers

5

If X and Y are roughly normally distributed and linearly related, you can use a Pearson correlation. If they are not, you can use a Spearman rank correlation, which only requires the relationship to be monotonic.

Here is some R code.

> a <- c(1,2,3,4,5,6,7)
> b <- c(2,4,6,8,10,12,14)
> c <- c(2,5,4,10,8,13,11)
> d <- c(7,6,5,4,3,2,1)
> e <- runif(7, min=1,max=14)
> e
[1]  6.938054  1.347591  1.561456 10.867986
[5]  1.044163  1.870397 12.238245
> 
> cor(a,b, method="spearman")
[1] 1
> cor(a,c, method="spearman")
[1] 0.8928571
> cor(a,d, method="spearman")
[1] -1
> cor(a,e, method="spearman")
[1] 0.2857143
> 

A perfect positive correlation has a value of 1. A perfect negative correlation has a value of -1. A value of 0 means there is no correlation. The runif() command generates uniformly distributed random numbers, so the values of e (and the last correlation) will differ from run to run. The c() command creates a vector. The data could also be put in tables, with rows and columns.
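
To illustrate how the two methods can disagree, here is a small sketch with made-up data (the vectors `x_lin` and `y_exp` are hypothetical and not part of the session above). The relationship is strictly increasing but far from a straight line.

```r
# Hypothetical data: y grows exponentially with x, so the relationship
# is monotonic but not linear.
x_lin <- 1:10
y_exp <- exp(x_lin)

# Pearson measures linear association, so it falls below 1 here.
r_pearson <- cor(x_lin, y_exp, method = "pearson")

# Spearman works on ranks, and the ranks of x and y agree exactly.
r_spearman <- cor(x_lin, y_exp, method = "spearman")

r_pearson
r_spearman
```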

This page describes what I did here more fully: http://www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r
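
If a p-value is wanted in addition to the coefficient, base R's cor.test() can be used. The sketch below reuses the values of a and c from the session above; the name `c_vals` is my own, chosen to avoid masking the c() function.

```r
# Same values as vectors a and c in the session above.
a <- c(1, 2, 3, 4, 5, 6, 7)
c_vals <- c(2, 5, 4, 10, 8, 13, 11)

# Test the null hypothesis of no association between the two vectors.
res <- cor.test(a, c_vals, method = "spearman")

res$estimate  # Spearman's rho, the same value cor() reports
res$p.value   # p-value of the test
```

With only seven points the test has little power, so a real analysis would want considerably more observations.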

3

As Larry suggested in another answer, a simple correlation might be sufficient. If you want to allow that the relationship can be delayed or lagged, you can use cross-correlation. This is similar to autocorrelation, which is a cross-correlation of a function with itself, in that it will give you many coefficients, each corresponding to a value of lag.
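
A minimal sketch of this in R, using the ccf() function from the base stats package. The data here are simulated purely for illustration: `y` is assumed to be a copy of `x` delayed by `true_lag` steps plus noise, and all variable names are hypothetical.

```r
set.seed(1)   # fixed seed so the simulated data is reproducible

n <- 200
true_lag <- 3

# x is white noise; y repeats x with a delay of true_lag steps, plus noise.
x <- rnorm(n)
y <- c(rep(0, true_lag), head(x, n - true_lag)) + rnorm(n, sd = 0.3)

# One correlation coefficient per lag from -10 to 10, without plotting.
cc <- ccf(x, y, lag.max = 10, plot = FALSE)

# ccf(x, y) at lag k estimates the correlation between x[t + k] and y[t],
# so a peak at a negative lag means that x leads y.
best_lag <- cc$lag[which.max(abs(cc$acf))]
best_lag
```

For this simulated pair the strongest coefficient should sit at lag -3, reflecting that changes in `x` show up in `y` three steps later. With real data, trends and autocorrelation in each series can inflate the coefficients, so differencing the series first is often worth considering.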

Fato39
  • 762
  • 8
  • 21