
I am trying to do a study to determine whether average annual temperature is related to the number of cases of a particular disease. I have data for 15 different states over ten years. I have tried multiple transformations and cannot get the average annual temperature to be approximately normally distributed. Does anyone have any suggestions for getting temperature normally distributed? The histogram doesn't look awful; there are just dips in the middle where there should be peaks.

Is a correlation robust to deviations from normal if the sample size is 150? If not, can you suggest a good nonparametric alternative? Any suggestions would be greatly appreciated!

Nick Stauner
  • Why do you want average annual temperature to look normal? For that matter, what are you using correlation to do - simply to answer the question "are they related?"... ? If that's the case, the ordinary correlation may not be a good choice for how related they are (it would seem that something that measures monotonic association might be more relevant). But presumably this isn't random sampling. What is it you hope to show? – Glen_b Mar 24 '14 at 03:01
  • An illustration of what can be done appears in the analysis of the relationship between temperature and another variable presented at http://stats.stackexchange.com/a/35717. This definitely is not the only approach! – whuber Mar 24 '14 at 20:10
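As a rough illustration of the "monotonic association" idea raised in the comments, Spearman's rank correlation is one common nonparametric choice: it is the ordinary correlation computed on ranks, so it does not depend on the marginal distribution of temperature. The sketch below uses only the standard library; the temperatures and case counts are made up for illustration and the helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Ordinary (Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data for illustration only (not from the question).
temp_f = [35.6, 41.0, 46.4, 53.6, 59.0, 66.2, 71.6]
cases = [30, 34, 41, 39, 52, 58, 61]

rho_f = spearman(temp_f, cases)
# Converting to Celsius leaves the ranks, and hence rho, unchanged.
rho_c = spearman([(f - 32) * 5 / 9 for f in temp_f], cases)
print(rho_f, rho_c)
```

Note that with 15 states observed over ten years the 150 observations are unlikely to be independent, so any p-value attached to such a coefficient would need care.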

1 Answer


There are two parts to this:

  1. Why are you transforming at all? Regression models don't require that any marginal distribution (outcome or predictors) be normal (Gaussian). This point is made repeatedly in threads on transformation in this forum, so I advise searching previous posts.

  2. I advise against trying any transformations of temperature, assuming that you have either Fahrenheit or Celsius measurements, which are both standard examples of interval scale variables with arbitrary zeros. You run a great risk of producing something highly arbitrary, if not meaningless.

You don't say so, but I guess here that 15 states means 15 states of the United States. If so, then I guess that you are using Fahrenheit and none of your annual temperatures are zero or negative, but in principle they could be and it is a bad idea to use any transformation that would be undefined for possible values of your data. (Zero or negative temperatures would not mix with reciprocal, logarithmic and square root transformations.)

Further, whatever you would do to Fahrenheit should mesh with whatever you would do to Celsius measurements, and vice versa. That rule alone renders most transformations moot as contingent on an arbitrary choice of measurement units.
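A minimal sketch of why the choice of units matters, using made-up numbers rather than anything from the question: the ordinary correlation is identical for Celsius and Fahrenheit because one is a linear function of the other, but after a logarithmic transformation the two scales give different answers. The `pearson` helper and the data are assumptions for illustration.

```python
import math

def pearson(x, y):
    """Ordinary (Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical annual mean temperatures (Celsius) and case counts.
temp_c = [2.0, 5.0, 8.0, 12.0, 15.0, 19.0, 22.0]
cases = [30, 34, 41, 39, 52, 58, 61]

# The same temperatures expressed in Fahrenheit.
temp_f = [c * 9 / 5 + 32 for c in temp_c]

# Untransformed: the linear change of units does not affect the correlation.
r_c = pearson(temp_c, cases)
r_f = pearson(temp_f, cases)

# Log-transformed: the result now depends on the arbitrary zero of the scale.
r_log_c = pearson([math.log(c) for c in temp_c], cases)
r_log_f = pearson([math.log(f) for f in temp_f], cases)

print(f"raw: C {r_c:.4f}  F {r_f:.4f}")
print(f"log: C {r_log_c:.4f}  F {r_log_f:.4f}")
```

The logarithm here only works at all because every made-up temperature is positive on both scales; a single annual mean at or below zero would make it undefined.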

Your bimodal distribution sounds like an artefact of which states you have chosen (or for which you have available data): one fairly cold group with low averages and another group with rather higher averages. Even if you had 50 states, annual temperature would not necessarily be normally distributed. But this bimodality is as likely to be useful as harmful, and I advise leaving the distribution as it is.

(A side-issue is that this is a thoroughly international forum, so spelling out which areas you are working with does no harm. There is no presumption that it's one particular country.)

Nick Cox