
I am trying to determine whether the month of the year affects the geographical distribution of hurricanes in the Atlantic Ocean.

I know, for example, that there is a hurricane season, so the time of year affects how often hurricanes occur. But does it also affect where they occur?

Would this be a good use of ANOVA, where month is the category and location is the continuous variable? And if so, how would you apply ANOVA when there are two dependent variables that you care about (latitude and longitude)? Would you just do it twice, once for latitude and once for longitude?
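To make the "do it twice" idea concrete, here is a minimal sketch of a one-way ANOVA F statistic computed separately for latitude and longitude, using only the Python standard library. The function name `one_way_anova_f` and the tiny `records` list are hypothetical and invented for illustration; they are not real hurricane data. Running the test once per coordinate treats latitude and longitude separately and ignores any correlation between them, which is why a joint multivariate test such as MANOVA is usually considered for two response variables.

```python
from collections import defaultdict
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    groups = [list(g) for g in groups if len(g) > 0]
    k = len(groups)                      # number of non-empty groups
    n = sum(len(g) for g in groups)      # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical (month, latitude, longitude) records, made up for illustration.
records = [
    (8, 15.2, -45.0), (8, 17.9, -52.3), (8, 14.1, -38.7),
    (9, 18.4, -60.1), (9, 21.0, -55.6), (9, 19.7, -63.2),
    (10, 16.3, -80.4), (10, 18.8, -83.1), (10, 20.5, -78.9),
]

lat_by_month, lon_by_month = defaultdict(list), defaultdict(list)
for month, lat, lon in records:
    lat_by_month[month].append(lat)
    lon_by_month[month].append(lon)

# One ANOVA per coordinate: month is the factor, the coordinate is the response.
print("latitude :", one_way_anova_f(lat_by_month.values()))
print("longitude:", one_way_anova_f(lon_by_month.values()))
```

The F statistic would still need to be compared against an F distribution with the returned degrees of freedom to obtain a p-value, which this sketch leaves out.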

kjetil b halvorsen
rocksNwaves
  • I am not sure it makes sense to "hold the number of hurricanes fixed". Also, look into spatial statistics for modeling locations. Regarding months, *don't* treat months as a factor. That would amount to treating April 1 and April 30 as identical, but May 1 as completely different from either one. That doesn't make sense. Also, you will need to expend 11 degrees of freedom to model 12 months. Much better to use a (possibly periodic) spline transform of the date. You will get a smooth relationship with enough flexibility for a fraction of the dfs. – Stephan Kolassa Apr 01 '21 at 16:27
  • Hi @StephanKolassa, I'm afraid you just spoke a lot of Greek as far as I'm concerned. I understand only the very basics of probability distributions and statistical tests, so you'll have to dumb it down a bit. Sorry about that :/ – rocksNwaves Apr 01 '21 at 16:57
  • No problem. I recommend you read up on splines. Frank Harrell's *Regression Modeling Strategies* has a very readable introduction. However, if you want to do spatial modeling, you should really dig rather deeply into the appropriate statistics, or find a statistician as a collaborator. – Stephan Kolassa Apr 01 '21 at 17:03
  • @StephanKolassa Fair enough. I thought this question was effort in the direction of digging into "appropriate statistics". Sorry if it seemed like a shallow question, but it's hard to know where to start. Also thanks for your comments. I'm used to a lot more snarkiness when I ask questions, so this is a welcome exchange. – rocksNwaves Apr 01 '21 at 17:10
  • It is definitely a good question, not shallow at all, and a great place to start! The problem I see is that the treatment of time (monthly dummies vs. spline transform) is much easier than a reasonable spatial model, so your underlying challenge is really not the one you started with here. Unfortunately, while I could discuss the time transform, I am not qualified to opine on spatial statistics. We do have experts on spatial statistics here, it's just that it sounds like you would most profit from a textbook first. – Stephan Kolassa Apr 01 '21 at 17:15
  • I would start out with visualization! If you can share your dataset we could try ... You could add the tag [tag:data-visualization] – kjetil b halvorsen Apr 02 '21 at 10:38
  • @kjetilbhalvorsen That's a good idea. I have visualized the data via geopandas, but I'm not sure if the difference in distributions month to month is due to the sparsity of the data after aggregation by month or some actual relationship. That's why I want to find an appropriate statistical test. – rocksNwaves Apr 02 '21 at 13:12
  • I don't know about geopandas, but the ideas in [R's `coplot` function](https://stats.stackexchange.com/questions/381072/how-to-test-whether-the-association-between-two-continuous-variables-varies-by-a/381082#381082) could be useful. Then you do not need a strict binning into 12 months or 4 quarters! – kjetil b halvorsen Apr 03 '21 at 17:55
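
The point in the comments about not treating month as a factor can be illustrated with a small sketch. Note that this is not the periodic spline that was recommended: it swaps in a simpler periodic basis (sine and cosine harmonics of the day of year), which shares the key property that nearby dates get nearby feature values, including across the year boundary. The function names `cyclic_features` and `distance`, and the choices of two harmonics and a 365.25-day period, are assumptions made for illustration.

```python
import math


def cyclic_features(day_of_year, period=365.25, harmonics=2):
    """Encode a day of the year as sin/cos pairs so that the encoding wraps around."""
    feats = []
    for k in range(1, harmonics + 1):
        angle = 2.0 * math.pi * k * day_of_year / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Day-of-year values for a non-leap year.
apr30, may1 = cyclic_features(120), cyclic_features(121)
dec31, jan1 = cyclic_features(365), cyclic_features(1)
jul1 = cyclic_features(182)

# Adjacent days stay close even when they fall in different calendar months,
# and the end of the year connects smoothly to the beginning.
print("Apr 30 vs May 1:", distance(apr30, may1))
print("Dec 31 vs Jan 1:", distance(dec31, jan1))
# Dates half a year apart are far from each other.
print("Jan 1 vs Jul 1: ", distance(jan1, jul1))
```

With monthly dummies, April 30 and May 1 would fall in different categories with no notion of closeness, whereas here each harmonic costs only two parameters per response variable instead of the 11 needed for 12 months.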

0 Answers