I am trying to determine if wildfires occur closer to forests or not. For this, I have two matrices with the same shape:
- A matrix of ones and zeros where each value represents whether a fire is present or not.
- A matrix where each value represents the distance to the nearest forest. As this was derived from another matrix that represented forest areas, all the distances are integers starting from zero.
Following the approach of Kumar et al. (2014), I extracted two different groups of distances to compare:
- The distances where a fire occurred (N = 2407).
- All the distances in the study area (N = 58544).
and ran a two-sample Kolmogorov-Smirnov test to check whether the two groups of distances belong to the same distribution.
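For clarity, the extraction step can be sketched roughly as follows. The matrices here are tiny made-up stand-ins, and the names `fire_mat`, `dist_mat`, `fire_dist` and `all_dist` are only illustrative, not the ones from my actual workflow:

```r
# Hypothetical toy matrices standing in for the real rasters (same shape).
fire_mat <- matrix(c(0, 1, 0,
                     0, 0, 1,
                     1, 0, 0), nrow = 3, byrow = TRUE)  # 1 = fire, 0 = no fire
dist_mat <- matrix(c(0, 1, 2,
                     1, 2, 3,
                     2, 3, 4), nrow = 3, byrow = TRUE)  # distance to nearest forest

# Both matrices must have the same shape for cell-wise indexing to make sense.
stopifnot(identical(dim(fire_mat), dim(dist_mat)))

# Group 1: distances at the cells where a fire occurred.
fire_dist <- dist_mat[fire_mat == 1]

# Group 2: distances for every cell in the study area.
all_dist <- as.vector(dist_mat)

length(fire_dist)
length(all_dist)
```

Note that with this construction the fire cells are a subset of the "all" group, so the two samples are not independent.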
For some context, here is a plot with the ECDFs of the first group of distances (red) and the second (blue).
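A plot of this kind can be drawn with base R along these lines. This is a minimal sketch on simulated data (the Poisson samples and the names `fire_dist` and `all_dist` are assumptions for illustration, not my real data):

```r
# Simulated integer distances, purely for illustration.
set.seed(1)
all_dist  <- rpois(1000, lambda = 5)  # stand-in for all cells in the study area
fire_dist <- rpois(100,  lambda = 4)  # stand-in for cells where a fire occurred

# Empirical CDFs as step functions.
ecdf_all  <- ecdf(all_dist)
ecdf_fire <- ecdf(fire_dist)

# Blue: all distances; red: distances at fire cells.
plot(ecdf_all, col = "blue", do.points = FALSE, verticals = TRUE,
     main = "ECDF of distance to nearest forest",
     xlab = "Distance to nearest forest", ylab = "Cumulative proportion")
lines(ecdf_fire, col = "red", do.points = FALSE, verticals = TRUE)
legend("bottomright", legend = c("Fire cells", "All cells"),
       col = c("red", "blue"), lty = 1)
```

If the red curve lies above the blue one, fire cells tend to have smaller distances, i.e. fires sit closer to the forest.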
Should anyone want to access the data and reproduce the test, here is a link to download it and a small R snippet to load it:
> all <- scan("all.txt")
> fire <- scan("fire.txt")
I got the following result for the Kolmogorov-Smirnov test:
> ks.test(fire, all)
Two-sample Kolmogorov-Smirnov test
data: fire and all
D = 0.056674, p-value = 7.098e-07
alternative hypothesis: two-sided
Warning message:
In ks.test(fire, all) : p-value will be approximate in the presence of ties
My interpretation, looking at the small p-value, is that the test suggests that the two groups of distances come from different distributions. However, the D statistic is relatively small, suggesting that they might belong to the same distribution after all. Also, given the nature of the data (integer distances), there are many ties. I'm wondering if there is a better approach to achieve my objective.
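To make explicit what I understand D to be: it is the largest vertical gap between the two ECDFs, so D = 0.057 would mean the curves never differ by more than about 5.7 percentage points. Here is a minimal sketch that recomputes it by hand on simulated data (the samples and variable names are assumptions for illustration only):

```r
# Simulated integer distances with many ties, purely for illustration.
set.seed(42)
all_dist  <- rpois(5000, lambda = 5)
fire_dist <- rpois(300,  lambda = 4)

F_fire <- ecdf(fire_dist)
F_all  <- ecdf(all_dist)

# The ECDFs only change at observed values, so it is enough to compare them there.
grid  <- sort(unique(c(fire_dist, all_dist)))
diffs <- F_fire(grid) - F_all(grid)

# D is the maximum absolute vertical distance between the two ECDFs.
D_manual <- max(abs(diffs))

# The same statistic as reported by ks.test (the ties warning is silenced here).
D_test <- unname(suppressWarnings(ks.test(fire_dist, all_dist))$statistic)

# Distance value at which the gap is largest, and its sign:
# a positive difference means the fire ECDF is above the "all" ECDF there.
d_at   <- grid[which.max(abs(diffs))]
d_sign <- sign(diffs[which.max(abs(diffs))])

c(D_manual = D_manual, D_test = D_test)
```

Because the p-value also depends on the sample sizes, a small D can still produce a very small p-value when N is large, which may be part of what I am seeing.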
I took a look at a similar question and one of the answers suggests using a Chi-Squared test. As the groups of distances do not share the same size (N), I'd think this is not a possibility.
Furthermore, I have other study areas: in some cases the distributions seem to match, in others fires appear to occur closer to the forest, and in others fires appear to occur farther from the forest. Is there any test that can statistically tell me whether the fire distances are "smaller" (closer), "larger" (farther) or similar compared with the distances over the whole study area?