This is similar to an earlier question which was never answered... Interpretation of interactions between categorical data... and which is much shorter so no problem if you would like to read/answer that instead:
I am doing an experiment where I have varied one parameter (timestep [tt] - factor, 5 levels: hourly, 3hrly, 6hrly, 12hrly, daily), added in a couple of covariates (release location [location], release month [m]) - so far straightforward - and have a response variable (where it starts getting complicated).
I am trying to see how timestep affects the track of a particle which is traced over time, but further than that suggest the maximum timestep which can be used with minimal effect on particle fate. "Particle fate" is a complicated concept which varies in direction and distance after release, so I have created a response variable which is already a moving comparison - i.e. distance from the daily timestep position, per day (e.g. below [day_3]). This allows me to exclude tracking day from the test which gets a bit around the timeseries/autocorrelation issue, but I realise I will need to be careful to remember that I am looking at the same particles at different times and patterns are likely to be lost over time due to the increasing influence of random factors.
I figure an ANOVA/ANCOVA derivative would be an appropriate test since I am looking for differences between sample groups, so I have run an lm in R in order to utilise its factor-level specific ANOVA - here is an example output:
> summary(lm(day_3 ~ tt + m * location))
Call:
lm(formula = day_3 ~ tt + m * location)
Residuals:
Min 1Q Median 3Q Max
-6.7523 -0.3392 -0.0836 0.2803 2.4548
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.7602 0.7272 14.796 < 2e-16 ***
tt3 -0.1765 0.4719 -0.374 0.710065
tt6 -0.4822 0.4719 -1.022 0.312317
tt12 -1.8280 0.4719 -3.874 0.000345 ***
m4 0.3378 0.9438 0.358 0.722087
m7 -9.1490 0.9438 -9.694 1.36e-12 ***
m10 -5.8576 0.9438 -6.207 1.54e-07 ***
location180 -5.2777 0.9438 -5.592 1.25e-06 ***
location270 -8.6343 0.9438 -9.149 7.82e-12 ***
location360 -7.3264 0.9438 -7.763 7.64e-10 ***
m4:location180 -4.8520 1.3347 -3.635 0.000710 ***
m7:location180 5.5768 1.3347 4.178 0.000133 ***
m10:location180 4.1863 1.3347 3.137 0.003011 **
m4:location270 -0.5704 1.3347 -0.427 0.671166
m7:location270 10.3842 1.3347 7.780 7.21e-10 ***
m10:location270 8.8500 1.3347 6.631 3.60e-08 ***
m4:location360 -1.8426 1.3347 -1.381 0.174234
m7:location360 6.5613 1.3347 4.916 1.22e-05 ***
m10:location360 4.1764 1.3347 3.129 0.003073 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.335 on 45 degrees of freedom
Multiple R-squared: 0.8857, Adjusted R-squared: 0.84
F-statistic: 19.37 on 18 and 45 DF, p-value: 1.724e-15
In this instance, of all the timesteps, you can see that the intercept (hourly) and the tt12 (12hrly) is significant which I would interpret as hourly is different from daily (which would be an intercept of zero), 3hrly & 6hrly are no different from hourly and 12 hrly is different from hourly. However this does not tell me whether 12hrly is different from daily?
I ran the test again with a reordering of levels so that 12hrly is the reference level and got the following output:
> summary(lm(day_3 ~ tt + m * location))
Call:
lm(formula = day_3 ~ tt + m * location)
Residuals:
Min 1Q Median 3Q Max
-6.7523 -0.3392 -0.0836 0.2803 2.4548
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.9322 0.7272 12.283 5.69e-16 ***
tt1 1.8280 0.4719 3.874 0.000345 ***
tt3 1.6515 0.4719 3.500 0.001062 **
tt6 1.3458 0.4719 2.852 0.006538 **
m4 0.3378 0.9438 0.358 0.722087
m7 -9.1490 0.9438 -9.694 1.36e-12 ***
m10 -5.8576 0.9438 -6.207 1.54e-07 ***
location180 -5.2777 0.9438 -5.592 1.25e-06 ***
location270 -8.6343 0.9438 -9.149 7.82e-12 ***
location360 -7.3264 0.9438 -7.763 7.64e-10 ***
m4:location180 -4.8520 1.3347 -3.635 0.000710 ***
m7:location180 5.5768 1.3347 4.178 0.000133 ***
m10:location180 4.1863 1.3347 3.137 0.003011 **
m4:location270 -0.5704 1.3347 -0.427 0.671166
m7:location270 10.3842 1.3347 7.780 7.21e-10 ***
m10:location270 8.8500 1.3347 6.631 3.60e-08 ***
m4:location360 -1.8426 1.3347 -1.381 0.174234
m7:location360 6.5613 1.3347 4.916 1.22e-05 ***
m10:location360 4.1764 1.3347 3.129 0.003073 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.335 on 45 degrees of freedom
Multiple R-squared: 0.8857, Adjusted R-squared: 0.84
F-statistic: 19.37 on 18 and 45 DF, p-value: 1.724e-15
I figure this tells me that 12hrly is different from daily also, but then suggests that all other increments of timestep are different from 12hrly.
So my question is... is it appropriate to rehash these tests with each categorical level (timestep) as the reference level to see where they overlap and where they differ, or is there some "better"/easier way to cross-compare the timestep levels. I realise there are other clues in each output and there may be a way to interpret the "estimate" (slope)/ "t value" columns to see whether there is a significant difference between timestep values? If so how do you set a threshold value? What would you recommend in my shoes?
N.B. as an aside, I realise I am using m * location as an interaction - this is intentional and was tested prior to running the lm. And I realise that day of tracking needs to be carefully cross-considered also as they are autocorrelated results.
P.S. Apologies if this is overspecific!
P.P.S THANKS FOR READING ALL THIS!