
I have daily temperatures for two cities, and I am trying to see whether we can conclude that one city is warmer than the other. I could use a Mann–Whitney test on a whole year of data, or I could bin the temperatures, say by week or into groups of 2–3 days. Could I perhaps use a chi-squared test? Are there also technical issues to consider when binning (as opposed to subject-matter issues, e.g., how temperature is experienced)?

kjetil b halvorsen
MSIS
  • If you're comparing the two cities for the same year, then I wonder if you have paired data. If so, you want a Wilcoxon signed-rank test for 365 days. If you want to compare 'how temperature is experienced', maybe you want a temperature index that takes humidity and wind into account: 'feels-like' temperatures. // Your description is sketchy and telegraphic, but just from what you say, I see no advantage to binning. – BruceET Nov 19 '19 at 21:09
  • @BruceET: I don't get it; if I have data for each city taken on each day, how is it not paired data? – MSIS Nov 19 '19 at 21:11
  • I'm saying I think it is paired data. But don't use the Mann-Whitney-Wilcoxon 2-sample test for paired data. – BruceET Nov 19 '19 at 21:12
  • Beware; these are time series; the temperature differences are not going to be independent. I agree with "don't bin" advice; it's rarely beneficial. Sometimes it doesn't hurt much. – Glen_b Nov 20 '19 at 02:15
  • @Glen_b-ReinstateMonica: I worry too about the variance of individual data points being high enough that ranks may be flipped, e.g., if the difference on one day is 0.8 but the variance is 1. How would we address this? – MSIS Nov 20 '19 at 19:30
  • I'm not sure I follow; that doesn't seem to relate to the points being made. – Glen_b Nov 21 '19 at 04:50
  • @Glen_b-ReinstateMonica: Yes, it is somewhat oblique, maybe unrelated. Maybe an example illustrates it best: if temp1 = 23, temp2 = 22 and variance(temp1) = 3, then 20 temp2 may not be reflective of the "True Rank". – MSIS Nov 22 '19 at 23:52

1 Answer


Since these are temperature time series, there is almost certainly autocorrelation, which must be taken into account. Let the time series be $Y_{jt},\ j=1,2;\quad t=1,2,\dotsc,T$. Since the interest is in the paired comparison, calculate the difference time series $D_t = Y_{2t}-Y_{1t}$. The mean temperature difference can be estimated by the mean of $D_t$ (other estimators, such as the median or a trimmed mean, could replace the mean).
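To make the role of autocorrelation concrete, here is the standard expression for the variance of the sample mean $\bar D = \frac{1}{T}\sum_{t=1}^{T} D_t$. It assumes (this is an added assumption, not something stated above) that $D_t$ is weakly stationary with variance $\sigma^2$ and lag-$k$ autocorrelation $\rho_k$:

$$\operatorname{Var}(\bar D) = \frac{\sigma^2}{T}\left[1 + 2\sum_{k=1}^{T-1}\left(1-\frac{k}{T}\right)\rho_k\right]$$

The bracketed factor equals 1 only when every $\rho_k = 0$. With the positive autocorrelation typical of daily temperatures it is larger than 1, so the usual $\sigma/\sqrt{T}$ understates the uncertainty.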

But the autocorrelation makes it non-trivial to find the standard error of this estimate. Some ideas:

  1. Estimate the autocorrelation function and use it to find the standard error.

  2. Use a moving block bootstrap?

  3. Calculate an autocorrelation-resistant standard error (for example, of the Newey-West type)?
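The ideas in this list can be sketched roughly as follows. This is a minimal illustration using only NumPy, not a definitive implementation. The difference series is simulated as an AR(1) process purely for demonstration. The helper names (`naive_se`, `newey_west_se`, `moving_block_bootstrap_se`, `simulate_ar1`), the rule-of-thumb truncation lag, and the block length of 14 days are all assumptions made for the example. The Newey-West estimator combines ideas 1 and 3, since it builds the standard error from estimated autocovariances with Bartlett weights.

```python
import numpy as np


def naive_se(d):
    """Standard error of the mean, (wrongly) assuming independent observations."""
    d = np.asarray(d, dtype=float)
    return d.std(ddof=1) / np.sqrt(d.size)


def newey_west_se(d, max_lag=None):
    """Autocorrelation-robust SE of the mean using Bartlett (Newey-West) weights."""
    d = np.asarray(d, dtype=float)
    n = d.size
    if max_lag is None:
        # Rule-of-thumb truncation lag; an assumption, not a tuned choice.
        max_lag = int(np.floor(4 * (n / 100.0) ** (2.0 / 9.0)))
    e = d - d.mean()
    long_run_var = e @ e / n  # lag-0 autocovariance
    for k in range(1, max_lag + 1):
        gamma_k = e[k:] @ e[:-k] / n  # lag-k sample autocovariance
        long_run_var += 2.0 * (1.0 - k / (max_lag + 1.0)) * gamma_k
    return np.sqrt(long_run_var / n)


def moving_block_bootstrap_se(d, block_len, n_boot=2000, seed=0):
    """SE of the mean from resampling overlapping blocks of consecutive days."""
    d = np.asarray(d, dtype=float)
    n = d.size
    rng = np.random.default_rng(seed)
    n_blocks = int(np.ceil(n / block_len))
    n_starts = n - block_len + 1  # number of admissible block start positions
    offsets = np.arange(block_len)
    means = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n_starts, size=n_blocks)
        # Concatenate the sampled blocks and trim to the original length.
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n]
        means[b] = d[idx].mean()
    return means.std(ddof=1)


def simulate_ar1(n, phi, mean, sigma, seed):
    """Hypothetical stand-in for D_t: a stationary AR(1) series around `mean`."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0.0, sigma)
    return mean + x


# Simulated daily differences D_t = Y_2t - Y_1t (illustrative data only).
d = simulate_ar1(n=365, phi=0.7, mean=1.0, sigma=1.0, seed=42)

d_bar = d.mean()
se_naive = naive_se(d)
se_nw = newey_west_se(d)
se_mbb = moving_block_bootstrap_se(d, block_len=14)

print(f"mean difference:         {d_bar:.3f}")
print(f"naive SE (independence): {se_naive:.3f}")
print(f"Newey-West SE:           {se_nw:.3f}")
print(f"moving block bootstrap:  {se_mbb:.3f}")
```

With real data, `d` would simply be the day-by-day difference of the two cities' temperatures. Both the truncation lag and the block length affect the result, so it is worth checking how sensitive the standard error is to them. A full year of temperatures also has a seasonal cycle, which the stationarity assumption behind these methods does not cover without further modelling.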

Related questions on this site with interesting answers:
Determining standard error of the mean from a correlated, stationary time series using known autocorrelation without block averaging,
How to estimate confidence interval of the sample mean of a non-stationary time series?,
T-test in the presence of autocorrelation,
Estimating accurately the mean of an autocorrelated bounded integer time series,
Calculating error of mean of time series,
Newey-West t-statistics

kjetil b halvorsen