How to analyse spatial data where the dependent variable is binary

Question

I have to test which factors influence game damage in fields. I mapped areas with damage and areas without. It was not always possible to map 100% of a field, so there are also areas where the status is uncertain. Because a "unit" of damage is not objective (it is not possible to determine where one damaged patch ends and the next starts), I put a grid over the area and calculated, for every cell independently, the distance to different structures (forest, roads etc.). The resulting data look like this:

| damage |  id | dist_forest | dist_maiz | dist_roads |...
|0       |   51|           30|         20|          70|...   
|0       |   51|           20|         10|          60|...   
|0       |   52|           60|         10|          80|...   
|0       |   52|           40|         70|          10|...   
|0       |   52|           20|         60|          50|...   
|1       |   53|           10|         10|          50|...   
|1       |   53|           05|         20|          30|...   
|1       |   54|           20|         30|          20|...   
|1       |   54|           30|         20|          90|...   
|1       |   54|           40|         10|          10|...

(I have about 100 individual polygons, which lead to about 100000 rows when resolved to square metres.)
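For readers who want to reproduce this kind of table, here is a minimal sketch of the grid-and-distance step using the sf package. The field, forest edge and road geometries are made up for illustration, as is the 10 m cell size; no coordinate reference system is set, so the units are simply whatever the coordinates are in.

```r
library(sf)

# Hypothetical field: a 100 x 60 rectangle (assumed to be in metres)
field <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(100, 0), c(100, 60), c(0, 60), c(0, 0)
))))

# Hypothetical structures: a forest edge and a road as line features
forest <- st_sfc(st_linestring(rbind(c(0, 70), c(100, 70))))
road   <- st_sfc(st_linestring(rbind(c(-10, 0), c(-10, 60))))

# Regular grid over the field; each cell is represented by its centroid
grid  <- st_make_grid(field, cellsize = 10)
cells <- st_centroid(grid)

# One row per grid cell with the distance to each structure
cell_data <- data.frame(
  dist_forest = as.numeric(st_distance(cells, forest)),
  dist_roads  = as.numeric(st_distance(cells, road))
)
head(cell_data)
```

The damage indicator and polygon id would then be joined onto `cell_data` from the mapped polygons.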

I wanted to use a binary logistic regression with random effects. To deal with the non-independence of the data, I added the id of the damage polygons as a random factor. The resulting model looked like this:

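library(lme4)  # glm() cannot fit random-effect terms such as (1|cat); glmer() from lme4 can
model <- glmer(damage ~ dist_forest + dist_maiz + dist_roads + (1|cat), family = binomial(link = "logit"), data = data)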

The problem now is that all my parameters are highly significant. I already asked about that here, and it was suggested that I use a special model for spatial data:

If data are (substantially) spatially dependent then standard significance tests don't apply anyway!

Does anybody have any suggestions how to proceed further?

Just because your data are spatial it doesn't necessarily follow that they are spatially dependent. For an explanation of spatial dependence see http://stats.stackexchange.com/questions/18406/what-is-the-difference-between-spatial-dependence-and-spatial-heterogeneity. — Adam Bailey, Jan 16 '14 at 07:51
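One way to check whether spatial dependence is actually a problem is to test the residuals of the non-spatial model for autocorrelation. Below is a minimal sketch in base R that computes Moran's I by hand with a permutation test. The data are simulated, and the inverse-distance weights, the column names x and y, and the single covariate are all assumptions made for illustration.

```r
set.seed(1)
n <- 300

# Simulated point data: coordinates plus one covariate
d <- data.frame(x = runif(n, 0, 100), y = runif(n, 0, 100))
d$dist_forest <- runif(n, 0, 100)

# Damage depends on dist_forest and on a smooth spatial trend that the model below omits
d$damage <- rbinom(n, 1, plogis(1 - 0.03 * d$dist_forest + 2 * sin(d$x / 15)))

# Non-spatial logistic regression and its Pearson residuals
fit <- glm(damage ~ dist_forest, family = binomial, data = d)
res <- residuals(fit, type = "pearson")

# Inverse-distance weights between all pairs of points (zero on the diagonal)
w <- 1 / as.matrix(dist(d[, c("x", "y")]))
diag(w) <- 0

# Moran's I for values z under weight matrix w
morans_i <- function(z, w) {
  z <- z - mean(z)
  (length(z) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

i_obs <- morans_i(res, w)

# Permutation test: shuffle residuals over locations to get a null distribution
i_perm  <- replicate(499, morans_i(sample(res), w))
p_value <- (1 + sum(i_perm >= i_obs)) / (1 + length(i_perm))

c(moran_i = i_obs, p_value = p_value)
```

A small p-value suggests the residuals are spatially autocorrelated, in which case the standard errors of the non-spatial model should not be trusted.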

score 2 · Answer 1 · answered Jan 16 '14 at 02:18

You may want to try something like indicator kriging: plot the locations where "damage" occurred and then run a kriging model. You can then extract the calculated probability of "damage" at the location of each feature and fit a logistic model based on the distances to the remaining features. This way, you know that the distance to one of your features is 0, so you are effectively controlling for a feature in each instance.

Anyway, the above is just a sketch of some spatial modelling you could do. I'd look into kriging/geostatistical or spatial statistical models if you are modelling geometric relationships.
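To make the indicator kriging step more concrete, here is a rough sketch using the gstat and sp packages. It covers only the kriging part (ordinary kriging of the 0/1 damage indicator), not the subsequent logistic model. The data are simulated, and the coordinate names, the spherical variogram model with its starting values, and the target locations are assumptions for illustration.

```r
library(sp)
library(gstat)

set.seed(1)
n <- 400

# Simulated observations: coordinates and a spatially structured 0/1 damage indicator
obs <- data.frame(x = runif(n, 0, 100), y = runif(n, 0, 100))
obs$damage <- rbinom(n, 1, plogis(3 * sin(obs$x / 20) * cos(obs$y / 20)))
coordinates(obs) <- ~ x + y

# Empirical variogram of the indicator and a fitted spherical model
# (the starting values passed to vgm() are rough guesses)
v_emp <- variogram(damage ~ 1, obs)
v_fit <- fit.variogram(v_emp, vgm(psill = 0.1, model = "Sph", range = 30, nugget = 0.15))

# Hypothetical locations at which the damage probability is wanted
targets <- data.frame(x = c(10, 50, 90), y = c(20, 50, 80))
coordinates(targets) <- ~ x + y

# Ordinary kriging of the indicator; predictions are read as probabilities
ik <- krige(damage ~ 1, obs, targets, model = v_fit)

# Kriged indicators can fall slightly outside [0, 1], so clamp them
p_damage <- pmin(pmax(ik$var1.pred, 0), 1)
p_damage
```

Whether a spherical model is appropriate would need to be checked against the empirical variogram of the real data.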

score 0 · Answer 2 · answered Mar 25 '14 at 11:47

I have now solved the problem with a generalized additive model (GAM). I included the term s(lat,long) to account for the spatial dependency of the points.
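A minimal sketch of what such a model could look like with the mgcv package is below. The data are simulated, and apart from s(lat, long) and the covariate names from the question, everything (sample size, effect sizes, the REML smoothing method) is an assumption for illustration.

```r
library(mgcv)

set.seed(1)
n <- 1000

# Simulated data with the same column names as in the question
d <- data.frame(
  lat         = runif(n),
  long        = runif(n),
  dist_forest = runif(n, 0, 100),
  dist_maiz   = runif(n, 0, 100),
  dist_roads  = runif(n, 0, 100)
)

# Damage depends on dist_forest plus a smooth spatial surface
eta <- 0.5 - 0.03 * d$dist_forest + 1.5 * sin(2 * pi * d$lat) * cos(2 * pi * d$long)
d$damage <- rbinom(n, 1, plogis(eta))

# Logistic GAM: linear covariate effects plus a 2-D smooth of location
# that absorbs residual spatial structure
fit <- gam(
  damage ~ dist_forest + dist_maiz + dist_roads + s(lat, long),
  family = binomial,
  data   = d,
  method = "REML"
)
summary(fit)
```

With around 100000 rows, mgcv's bam() function, which is intended for large data sets and accepts the same formula, may be worth trying instead of gam().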

meles
