I have temperature and humidity readings from two different sensors located in the same geographical area. The data is logged every hour, with timestamps in the form 2/1/2019 15:00, and I have more than a year of it. I want to validate that the sensors are working correctly, i.e. that they produce similar results over a given time period: in general, their hourly readings shouldn't differ much. How do I statistically measure their similarity?

I have tried a two-sample Kolmogorov–Smirnov test from scipy. However, I am not sure this is the right approach, as it returned a very small p-value (pvalue=4.082677479167249e-05). The histograms of the two series look almost identical (roughly normal), and from inspection the sensors appear to be producing similar results.

from scipy import stats
stats.ks_2samp(df1.temp, df2.temp)

  • The KS test checks whether the two year-long records have the same distribution. However, it assumes the observations are independent, and I doubt that temperature or humidity readings in adjacent hours are independent. Also, there are (admittedly improbable) circumstances under which pairs of readings at the same hour would always be unacceptably different and yet the year-long distributions would be essentially identical. You have a lot of data. How about picking 1000 specific hours during the year at random and taking the difference between the pair of sensors at each, then testing whether the mean (or median) difference is 0? – BruceET Aug 21 '19 at 17:17
  • Measurement comparison in health studies usually relies on methods developed by Bland and Altman. There are some questions on this site tagged bland-altman-plot which may give you ideas. Their papers are well worth reading too. As @BruceET hints, you may get more defensible results by sampling from your streams. – mdewey Aug 21 '19 at 17:39
  • Thank you for the suggestions, I will try them out. – Sakib Shahriar Aug 22 '19 at 04:55
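
For anyone wanting to try the sampling idea from BruceET's comment, here is a rough sketch. It assumes each DataFrame has a `timestamp` column and a `temp` column (the `timestamp` name, the function name `paired_difference_test`, and the synthetic data are my own placeholders, not from the question). It pairs readings by hour, samples 1000 hours at random, and tests whether the mean (one-sample t-test) or median (Wilcoxon signed-rank test) difference is 0. Random sampling reduces the dependence between adjacent hours but does not remove it entirely, so treat the p-values as indicative.

```python
import numpy as np
import pandas as pd
from scipy import stats


def paired_difference_test(df1, df2, column="temp", n_hours=1000, seed=0):
    """Test whether the paired sensor difference is centred on zero."""
    # Pair readings by timestamp so each difference compares the same hour.
    # ("timestamp" is an assumed column name.)
    merged = pd.merge(df1, df2, on="timestamp", suffixes=("_1", "_2"))
    # Draw a random subset of hours, without replacement.
    sample = merged.sample(n=min(n_hours, len(merged)), random_state=seed)
    diff = (sample[f"{column}_1"] - sample[f"{column}_2"]).to_numpy()
    t_res = stats.ttest_1samp(diff, 0.0)   # H0: mean difference is 0
    w_res = stats.wilcoxon(diff)           # H0: differences symmetric about 0
    return {
        "mean_diff": float(diff.mean()),
        "median_diff": float(np.median(diff)),
        "t_pvalue": float(t_res.pvalue),
        "wilcoxon_pvalue": float(w_res.pvalue),
    }


# Synthetic stand-in for a year of hourly data: sensor 2 reads 0.5 degrees high.
rng = np.random.default_rng(0)
n = 24 * 365
timestamps = pd.to_datetime("2019-01-01") + pd.to_timedelta(np.arange(n), unit="h")
true_temp = 15 + 10 * np.sin(2 * np.pi * np.arange(n) / n)
df1 = pd.DataFrame({"timestamp": timestamps, "temp": true_temp + rng.normal(0, 0.3, n)})
df2 = pd.DataFrame({"timestamp": timestamps, "temp": true_temp + 0.5 + rng.normal(0, 0.3, n)})

result = paired_difference_test(df1, df2)
print(result)
```

Unlike `ks_2samp`, this compares the sensors hour by hour, so a constant offset between them shows up directly as a non-zero mean difference.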

1 Answer

As suggested by @mdewey in the comments, I think the Bland-Altman plot provides a solution to my problem.
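
A minimal sketch of the quantities behind a Bland-Altman plot, in case it helps someone else. It assumes the two series are already aligned hour by hour; the function name `bland_altman_stats` and the synthetic readings are placeholders of mine, and the 1.96 standard deviation limits of agreement assume the differences are roughly normal.

```python
import numpy as np


def bland_altman_stats(a, b):
    """Return per-pair means and differences, the bias, and the limits of agreement."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mean = (a + b) / 2.0          # x-axis of the plot: average of the two sensors
    diff = a - b                  # y-axis of the plot: sensor 1 minus sensor 2
    bias = diff.mean()            # systematic offset between the sensors
    sd = diff.std(ddof=1)         # sample standard deviation of the differences
    lower = bias - 1.96 * sd      # lower limit of agreement
    upper = bias + 1.96 * sd      # upper limit of agreement
    return mean, diff, bias, lower, upper


# Synthetic aligned readings: sensor 2 reads 0.5 degrees higher on average.
rng = np.random.default_rng(1)
true_temp = rng.normal(15, 8, 5000)
sensor1 = true_temp + rng.normal(0, 0.3, true_temp.size)
sensor2 = true_temp + 0.5 + rng.normal(0, 0.3, true_temp.size)

mean, diff, bias, lower, upper = bland_altman_stats(sensor1, sensor2)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

The plot itself is a scatter of `mean` against `diff` with horizontal lines at `bias`, `lower`, and `upper`. Whether the sensors agree well enough then depends on whether those limits are narrow enough for the application, which is a judgment call and not something a p-value settles.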