Using Binning before Mann–Whitney for Temperature Data

Question

I have daily temperatures for two cities, and I am trying to see whether we can conclude that one city is warmer than the other. I could run a Mann–Whitney test on a whole year of data, or I could first bin the temperatures, say into weeks or 2–3 days at a time. Could I also use a chi-squared test? Are there technical issues (not related to domain knowledge, i.e., how temperature is experienced) to consider when binning?

If you're comparing the two cities for the same year, then I wonder if you have paired data. If so, you want Wilcoxon signed rank test for 365 days. If you want to compare 'how temperature is experienced' maybe you want a temperature index that takes humidity and wind into account: 'feels-like' temperatures. // Your description is sketchy and telegraphic, but just from what you say, I see no advantage to binning. — BruceET, Nov 19 '19 at 21:09
@BruceET: I don't get it, if I have data for each city taken at each day, how is not paired data? — MSIS, Nov 19 '19 at 21:11
I'm saying I think it is paired data. But don't use Mann-Whitney-Wilcoxon 2-sample test for paired data. — BruceET, Nov 19 '19 at 21:12
Beware; these are time series; the temperature differences are not going to be independent. I agree with "don't bin" advice; it's rarely beneficial. Sometimes it doesn't hurt much. — Glen_b, Nov 20 '19 at 02:15
@Glen_b-ReinstateMonica: I worry too about variance of individual data points being high-enough that ranks may be flipped, e.g., if the difference in one day is 0.8 but variance is 1. How would we address this? — MSIS, Nov 20 '19 at 19:30
I'm not sure I follow; that doesn't seem to relate to the points being made. — Glen_b, Nov 21 '19 at 04:50
@Glen_b-ReinstateMonica: Yes, it is somewhat -oblique, maybe unrelated. Maybe an example illustrates best: If temp1 =23, temp2=22 and variance(temp1)=3 , then 20 temp2 may not be reflective of the " True Rank". — MSIS, Nov 22 '19 at 23:52

kjetil b halvorsen · Accepted Answer · 2021-05-16T04:18:29.017

Since these are temperature time series, there is certainly autocorrelation, which must be taken into account. Let the time series be $Y_{jt},\; j=1,2;\quad t=1,2,\dotsc,T$. Since the interest is in the paired comparison, calculate the difference time series $D_t = Y_{2t}-Y_{1t}$. The mean temperature difference can be estimated by the mean of $D_t$ (other estimators, such as the median or a trimmed mean, could replace the mean).
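As a minimal sketch of this step, the following Python computes the difference series $D_t$ and two location estimates. The function name `paired_difference_summary`, the argument names `y1` and `y2`, and the simulated toy temperatures are all assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np

def paired_difference_summary(y1, y2):
    """Return D_t = Y_2t - Y_1t and simple location estimates of it.

    y1, y2: daily temperatures for city 1 and city 2, aligned by day
    (hypothetical argument names).
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    if y1.shape != y2.shape:
        raise ValueError("both series must cover the same days")
    d = y2 - y1  # paired difference for each day
    return d, {"mean": float(d.mean()), "median": float(np.median(d))}

# Toy data (simulated, not real measurements): a shared seasonal cycle,
# with city 2 shifted up by 1.5 degrees plus independent daily noise.
rng = np.random.default_rng(0)
days = np.arange(365)
season = 10.0 * np.sin(2 * np.pi * days / 365)
city1 = 15.0 + season + rng.normal(scale=2.0, size=days.size)
city2 = 16.5 + season + rng.normal(scale=2.0, size=days.size)

d, summary = paired_difference_summary(city1, city2)
print(summary)
```

Differencing removes the seasonal cycle the two cities share, which is the main reason to work with $D_t$ rather than with the two raw series. It does not remove autocorrelation in the differences themselves.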

But the autocorrelation makes it non-trivial to find the standard error of this estimate. Some ideas:

Estimate the autocorrelation function and use it to find the standard error.
Use a moving block bootstrap?
Calculate an autocorrelation-resistant standard error?
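Two of these ideas can be sketched in plain NumPy. The first function computes a Newey-West style standard error for the mean of $D_t$ from Bartlett-weighted sample autocovariances; the second estimates the same quantity with a moving block bootstrap. The function names, the rule-of-thumb lag `int(n ** (1/3))`, the block length of 14 days, and the simulated AR(1) series are assumptions chosen for illustration, not recommendations from the answer; in practice the lag and block length need to be chosen with the data's autocorrelation in mind.

```python
import numpy as np

def newey_west_se(d, max_lag):
    """Standard error of mean(d) from Bartlett-weighted autocovariances."""
    d = np.asarray(d, dtype=float)
    n = d.size
    e = d - d.mean()
    lrv = np.dot(e, e) / n  # lag-0 autocovariance
    for k in range(1, max_lag + 1):
        gamma_k = np.dot(e[k:], e[:-k]) / n   # lag-k sample autocovariance
        weight = 1.0 - k / (max_lag + 1)      # Bartlett kernel weight
        lrv += 2.0 * weight * gamma_k
    return float(np.sqrt(lrv / n))

def moving_block_bootstrap_se(d, block_len, n_boot=2000, seed=0):
    """Standard error of mean(d) by resampling overlapping blocks."""
    d = np.asarray(d, dtype=float)
    n = d.size
    rng = np.random.default_rng(seed)
    n_blocks = int(np.ceil(n / block_len))
    offsets = np.arange(block_len)
    means = np.empty(n_boot)
    for b in range(n_boot):
        # Draw block start positions, glue the blocks, trim to length n.
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n]
        means[b] = d[idx].mean()
    return float(means.std(ddof=1))

# Simulated difference series (an assumption): AR(1) noise around 1.0.
rng = np.random.default_rng(1)
n, phi = 365, 0.7
noise = rng.normal(size=n)
diff = np.empty(n)
diff[0] = noise[0]
for t in range(1, n):
    diff[t] = phi * diff[t - 1] + noise[t]
diff += 1.0

naive_se = diff.std(ddof=1) / np.sqrt(n)  # ignores autocorrelation
print("naive SE:          ", naive_se)
print("Newey-West SE:     ", newey_west_se(diff, max_lag=int(n ** (1 / 3))))
print("block bootstrap SE:", moving_block_bootstrap_se(diff, block_len=14))
```

With positively autocorrelated differences, the naive standard error is typically too small, so a test or interval built on it would overstate the evidence that one city is warmer.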

Related Qs on site with interesting answers:
Determining standard error of the mean from a correlated, stationary time series using known autocorrelation without block averaging,
How to estimate confidence interval of the sample mean of a non-stationary time series?,
T-test in the presence of autocorrelation,
Estimating accurately the mean of an autocorrelated bounded integer time series,
Calculating error of mean of time series,
Newey-West t-statistics
