
I am trying to determine the correlation between two sets of data points which span the same time period (20 minutes) but have different resolutions. The first set was recorded at 1-minute intervals, while the second was recorded at 2-second intervals. Both sets of data contain 10 minutes before an event and 10 minutes after the event. The recordings are of completely different effects resulting from the event, so the 1-minute and 2-second data do not measure the same quantity.

When I plot the two sets against their common time scale, I get a visual match in the pattern of the graphs which indicates that there is a definite correlation between the two effects - the same reaction profiles are present in the data. The events were repeated 19 times with multiple recording devices for each effect, 94% of which indicate this visual correlation.

However, I need to do the proper statistics to back up my claim that there is a strong correlation, and this isn't my field of expertise. I was advised to use Pearson (via an online tool into which you paste the two data sets and it does all the work), but this requires two data sets of equal length.
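For reference, this is the standard definition of the Pearson coefficient. It pairs the samples index by index, which is where the equal-length requirement comes from. Here $x_i$ and $y_i$ stand for the two series and $n$ for their common length (the symbols are introduced only for illustration):

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```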

I have looked into downsampling the 2-second data to 1-minute data through aggregation, but this will take an extremely long time. I've also considered downsampling by keeping only every 30th data point (discarding the other 29 points) and relabelling it as a 1-minute measurement. Decimation will also take a long time, but less than aggregation.
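To make the two options concrete, here is a minimal Python sketch of both, followed by a Pearson calculation on the resulting equal-length series. It is only an illustration under stated assumptions: the function names (`aggregate`, `decimate`, `pearson`) and the sample data are hypothetical, aggregation is taken to mean averaging each block of 30 points, and the real data would come from the spreadsheet rather than from generated lists.

```python
from math import sqrt
from statistics import mean


def aggregate(samples, factor=30):
    """Average each consecutive block of `factor` samples (2 s -> 1 min for factor=30)."""
    n_blocks = len(samples) // factor  # an incomplete trailing block is dropped
    return [mean(samples[i * factor:(i + 1) * factor]) for i in range(n_blocks)]


def decimate(samples, factor=30, offset=0):
    """Keep every `factor`-th sample starting at index `offset`; discard the rest."""
    return list(samples[offset::factor])


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("Pearson needs two series of equal length")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data: 20 one-minute readings and 600 two-second readings
# covering the same 20 minutes (not real measurements).
slow = [float(minute) for minute in range(20)]
fast = [step / 30.0 for step in range(600)]

r_aggregated = pearson(slow, aggregate(fast))
r_decimated = pearson(slow, decimate(fast))
```

Both functions reduce the 600-point series to 20 points, so either result can be paired with the 1-minute series. Averaging smooths out noise within each minute, whereas keeping every 30th point preserves individual raw values but ignores the other 29.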

Q1. Is there a different statistical method that I am not aware of that will let me compare the two sets as they are?

Q2. If I downsample, which method is better? Aggregation or Decimation?

I am currently working in LibreOffice Calc, a spreadsheet program similar to Excel, on Linux.

Ani
  • Why should downsampling (via aggregation) take extremely long? If you only want correlations, then aggregating is the way to go. There are various ways to do the aggregation, and the correlation might be different in each case, but that is OK. You can think of your problem as determining the correlation between one variable (the lower-resolution one) and many variables (the high-resolution variable sampled within one low-resolution period). – mpiktas Oct 03 '14 at 10:52
  • Computers should let us do repetitive operations easily and quickly. Think about creative ways to use your software to allow you to manipulate the data with a minimal amount of human effort. – Joel W. Oct 05 '14 at 13:36
  • @mpiktas Downsampling would take very long as I am dealing with 19 separate events, using 3 stations for the first effect, each recording 4 different directions (1-minute data), and up to 4 stations recording the second effect (2-second data). This results in a 19-page spreadsheet with multiple columns and hundreds of rows. Each event has different characteristics, resulting in different key time values at which the downsampling needs to occur and different ranges over which the correlation needs to be performed. I'm trying decimation, as aggregation may change the statistical distribution. – Ani Oct 07 '14 at 23:11
  • @JoelW. If there was a faster way to do it, I would use it. But as I have explained in my previous post, each case varies. Each station, each direction, each effect has differing characteristics for which I need to reduce the data. I can't just write some formulas and replicate them because each one has different input values lying in different cells. I have to do a lot of manual manipulation. Thank you both for your input though! Decimation does not change the statistical distribution of the data, so I am using it. It's also a little faster. – Ani Oct 07 '14 at 23:14
  • You say that when you plot the data you see a clear pattern. There should be a way to capture and measure that pattern statistically, without cleaning up the data manually. – Joel W. Oct 08 '14 at 00:08
  • @JoelW. That is exactly what I was looking for, but I haven't found it yet: a method to measure the pattern statistically. I was recommended Pearson, but I can now say for certain that Pearson doesn't work in this case. My correlation coefficient results are... useless. I'm not sure if it's possible to statistically correlate these two effects at the data point level. Perhaps a visual correlation is the best that can be done, at least by me, anyway. Thank you for your comments though! – Ani Oct 08 '14 at 13:15
  • Can you post the plot you mention and part of the data file that produced the plot? – Joel W. Oct 08 '14 at 14:03
  • @JoelW. I'm not sure how to upload an image? – Ani Oct 09 '14 at 17:51
