Interpreting a log-log-linear model of two continuous variables with or without interaction terms

Question

I am regressing ecological distances between communities (expressed as similarities) on their spatial and temporal distances, using a regular grid of 360 sampling stations divided over six time points. The approach is known as distance-decay or time-decay in community analysis, and is a standard method to assess the influence of geographic or temporal distance on the relatedness of two communities. It regresses all pairwise community distances, expressed as similarities (e.g. (1 - Bray-Curtis) or (1 - Jaccard)), against all pairwise spatial or temporal distances. The linear regression is very often formulated as log/log-linear (see onlinelibrary.wiley.com/doi/full/10.1111/…).

Since there are samples with large spatial distances (up to 10 m) sampled at the same time (so time.elapsed=0), and samples with large temporal distances (1 year) but small spatial distances (as low as 50 cm; no location was sampled twice), the question is how space and time interact with each other to shape communities.

My non-interactive log-log-linear model, glm(log10(Similarity) ~ log10(Time + 1) + log10(Space)), yields:

glm(formula = log10(com.sim) ~ log10(Space) + log10(Time + 1), data = a)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.28927  -0.01443   0.01030   0.02782   0.07214  

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -0.1075605  0.0004254 -252.84   <2e-16 ***
log10(Space)    -0.0086474  0.0004695  -18.42   <2e-16 ***
log10(Time + 1) -0.0087252  0.0001608  -54.27   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.001929398)

    Null deviance: 252.89  on 127805  degrees of freedom
Residual deviance: 246.58  on 127803  degrees of freedom
AIC: -436155

Number of Fisher Scoring iterations: 2

(As you can see, I have ~70000 data points, which is another story in itself.)

So, the effect sizes are very small, but increasing the time and the space between two samples each decreases community similarity. If I am correct in interpreting log-log-linear regression, a 1% change in Space (or Time + 1) would result in a decrease of 0.008 similarity units.
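
For reference, a log-log slope is usually read as an elasticity (a percent change in the response per percent change in the predictor) rather than a change in raw units. The sketch below only does arithmetic on the coefficients reported above; `ratio_change` is a hypothetical helper, not part of the fitted model, and the result is independent of the log base. Note that for time the factor applies to Time + 1, not to Time itself.

```r
# Coefficient copied from the non-interactive model output above.
b_space <- -0.0086474

# In a log-log model, log10(y) = a + b * log10(x), so y = 10^a * x^b.
# Multiplying x by a factor k therefore multiplies y by k^b.
# (ratio_change is a hypothetical helper name used for illustration.)
ratio_change <- function(b, k) k^b

# Percent change in similarity for a 1% increase, a doubling,
# and a tenfold increase in Space.
factors <- c(one_percent = 1.01, doubling = 2, tenfold = 10)
pct_space <- 100 * (ratio_change(b_space, factors) - 1)
print(round(pct_space, 4))
```

Under this reading, a 1% increase in Space goes with a decrease in similarity of roughly 0.0086%, and a tenfold increase with a decrease of about 2%.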

My interactive model looks like this:

glm(formula = log10(com.sim) ~ log10(Space) * log10(Time + 1), data = a)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.29134  -0.01438   0.01033   0.02779   0.07238  

Coefficients:
                               Estimate Std. Error  t value Pr(>|t|)    
(Intercept)                  -0.1031889  0.0007938 -129.994  < 2e-16 ***
log10(Space)                 -0.0152780  0.0011197  -13.645  < 2e-16 ***
log10(Time + 1)              -0.0113717  0.0004364  -26.056  < 2e-16 ***
log10(Space):log10(Time + 1)  0.0040206  0.0006164    6.523 6.94e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.001928771)

    Null deviance: 252.89  on 127805  degrees of freedom
Residual deviance: 246.50  on 127802  degrees of freedom
AIC: -436195

Number of Fisher Scoring iterations: 2

So, here the slopes for the individual terms are larger in magnitude than in the non-interactive model. As I have learned from this answer, the interaction term expresses the elasticity of an increase of 1% in Time + 1 with respect to Space, which is 0.00004.
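
One way to see why the individual slopes change: with an interaction, each main-effect coefficient is the slope when the other log predictor equals zero. The log10(Space) coefficient is the slope at Time = 0 (since log10(0 + 1) = 0), and the log10(Time + 1) coefficient is the slope at Space = 1 in whatever unit the data use. At other values, the slope of log10(Space) is b_space + b_inter * log10(Time + 1). The sketch below is only arithmetic on the reported coefficients; the function names are hypothetical, and time is assumed to be in days, as described in the comments.

```r
# Coefficients copied from the interaction model output above.
b_space <- -0.0152780
b_time  <- -0.0113717
b_inter <-  0.0040206

# Slope of log10(Space) at a given elapsed time
# (assumed to be in days; hypothetical helper name).
space_slope_at <- function(time_days) b_space + b_inter * log10(time_days + 1)

# Slope of log10(Time + 1) at a given spatial distance
# (same unit as Space in the data; hypothetical helper name).
time_slope_at <- function(space) b_time + b_inter * log10(space)

# Space slope for samples taken on the same day and at increasing time lags.
time_days <- c(0, 9, 99, 300)
slopes <- data.frame(time_days = time_days,
                     space_slope = space_slope_at(time_days))
print(slopes)
```

Because the interaction coefficient is positive while both main effects are negative, the spatial slope moves toward zero as the time lag grows, and the temporal slope likewise weakens with distance. In the non-interactive model each slope is a single compromise value across the whole range of the other variable.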

My questions are: Why are the slopes of the individual terms so different between the interactive and non-interactive models? Is my interpretation of the interaction term correct? How would I explain the influence of time on space (and vice versa)? Is it possible to formulate a simple sentence to summarize the model?

Thank you.

The specific question in your title is answerable, but the logic underpinning your choice of model is hard to understand. The latter seems like a bigger concern to address if you are trying to draw any sort of inference from it. Could you expand on exactly what your goal is, and how your model achieves it? — mkt, Aug 05 '19 at 08:14
The approach is known as distance-decay or time-decay in community analysis, and is a standard method to assess the influence of geographic or temporal distances between two communities on their relatedness. It regresses all pairwise community distances, expressed as similarities (e.g. (1 - Bray-Curtis) or (1 - Jaccard)), against all pairwise spatial or temporal distances. The linear regression is very often formulated as log/log-linear (see https://onlinelibrary.wiley.com/doi/full/10.1111/j.0906-7590.2007.04817.x). Usually, only one of the two distances is tested (time or space). — nouse, Aug 05 '19 at 11:06
Thanks, that helps (I recommend editing that information into the question too). But also: why time+1, and what would that mean? And why use a GLM with a Gaussian family - do you have reason to expect that the variance will scale with the mean? Aside from these questions, I'd recommend plotting the output of your models to get a better understanding of what they mean. — mkt, Aug 05 '19 at 11:31
(time + 1) because I have samples taken on the same day, for which the temporal distance is 0 and log10(0) = -Inf. I am adding a single day to every temporal distance (range between 1 and 300 days). — nouse, Aug 05 '19 at 12:07
About my model assumptions: you won't be satisfied with this, but basically the model tries to fit a straight line through a heteroscedastic cloud of 70,000 points. Thus, my residuals are skewed to the right. It looks almost exactly like https://i.postimg.cc/3RWPL2tv/Unbenannt.jpg, which is from Wu et al., 2019, Nature Microb, about waste water treatment plants, who also fitted Gaussian lm's. My goal is to compare my slopes to this paper (and others), so for now I'd like to live with the shortcomings of my model. — nouse, Aug 05 '19 at 12:12
