Interpreting the output from a gls model

Question

I am quite new to R and coding, so please forgive the lack of in-depth information I may provide. I am also new to linear models, particularly with large data sets. I have used the gls() function in the nlme package to analyse water quality data, and I need to understand the output and what to report in an article.

I want to look at the relationship between water flow and various parameters (electrical conductivity, pH, etc.) over a long time period (50 years). The data are stationary, but there are many closely spaced data points, so autocorrelation is present (I tested for this elsewhere). This is why I am using gls rather than ordinary least-squares regression (a reviewer for a paper also suggested it). I ran the code to look at flow and electrical conductivity (ec); the dataset is named rivin.

I fitted a first model (m1) using the following code:

```r
library(nlme)
m1 <- gls(flow ~ ec, data = rivin)
```

and then a second model (m2) with an AR(1) correlation structure, using corAR1():

```r
m2 <- update(m1, correlation = corAR1())
```

I then used anova(m1, m2) to test whether the two models differ significantly; this is the output:

```
   Model df      AIC      BIC    logLik   Test  L.Ratio p-value
m1     1  3 183.2906 193.5522 -88.64531                        
m2     2  4 164.8020 178.4841 -78.40098 1 vs 2 20.48866  <.0001
```

Does this mean that m2 is significantly different from m1?
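For reference, the L.Ratio and p-value columns follow directly from the logLik and df columns: the likelihood-ratio statistic is twice the gain in log-likelihood, referred to a chi-squared distribution whose degrees of freedom equal the difference in parameter counts (4 - 3 = 1, the extra parameter being Phi). A small p-value means the AR(1) model fits significantly better than the model with independent errors. Because both models have the same fixed effects, comparing their REML fits this way is legitimate. A minimal sketch using only the numbers printed above:

```r
# Log-likelihoods and parameter counts copied from the anova() output above
ll_m1 <- -88.64531; df_m1 <- 3
ll_m2 <- -78.40098; df_m2 <- 4

# Likelihood-ratio statistic: twice the gain in log-likelihood
lr <- 2 * (ll_m2 - ll_m1)

# Upper-tail probability of a chi-squared on (df_m2 - df_m1) degrees of freedom
p_value <- pchisq(lr, df = df_m2 - df_m1, lower.tail = FALSE)

lr        # 20.48866, the L.Ratio column
p_value   # far below 0.0001, which anova() prints as "<.0001"
```

The AIC and BIC columns point the same way: both are smaller for m2, and smaller is better.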

I then looked at the summary of m2:

```
summary(m2)
Generalized least squares fit by REML
  Model: flow ~ ec 
  Data: rivin 
      AIC      BIC    logLik
  164.802 178.4841 -78.40098

Correlation Structure: AR(1)
 Formula: ~1 
 Parameter estimate(s):
      Phi 
0.3111562 

Coefficients:
                Value Std.Error   t-value p-value
(Intercept)  3.936472 0.5951170  6.614619       0
ec          -1.106382 0.2228789 -4.964047       0

 Correlation: 
   (Intr)
ec -0.999

Standardized residuals:
        Min          Q1         Med          Q3         Max 
-2.70663729 -0.58400432  0.03536558  0.33392867  5.01270171 

Residual standard error: 0.3557397 
Degrees of freedom: 228 total; 226 residual
```

Here I would like to know how to interpret the results and what to report in an article. Do the coefficient results indicate that ec decreases as flow increases, and that this is significant? What does the Correlation (Intr) show me? Does this value of -0.999 indicate collinearity between the two variables and make the model invalid? And what do the standardized residuals indicate?

Thank you in advance.
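On the standardized residuals: to my understanding, the "Standardized residuals" block that nlme's summary() prints is a five-number summary of the Pearson residuals. These are the raw residuals divided by the residual standard error (0.3557 above) when no variance function is used. A maximum of about 5 therefore means one observation lies roughly five residual standard errors above its fitted value, which is worth checking. Pearson residuals are not adjusted for the AR(1) correlation. The "normalized" residuals are, and they are the ones to inspect for leftover autocorrelation. The sketch below uses simulated data (hypothetical stand-ins for rivin) to show the relationship:

```r
library(nlme)

# Hypothetical stand-in for 'rivin': 228 observations with AR(1) errors
set.seed(42)
n   <- 228
sim <- data.frame(ec = rnorm(n, mean = 2.6, sd = 0.3))
sim$flow <- 4 - 1.1 * sim$ec +
  as.numeric(arima.sim(list(ar = 0.3), n = n, sd = 0.35))

fit <- gls(flow ~ ec, data = sim, correlation = corAR1())

# Pearson residuals = raw residuals / residual standard error (no variance function here)
pearson <- resid(fit, type = "pearson")
manual  <- resid(fit, type = "response") / fit$sigma

# Quantiles comparable to the "Standardized residuals" block of summary(fit)
quantile(pearson)

# Observations more than 3 residual standard errors from their fitted values
which(abs(pearson) > 3)

# Normalized residuals also remove the estimated AR(1) correlation;
# their ACF should show little remaining autocorrelation if AR(1) is adequate
acf(resid(fit, type = "normalized"))
```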

Welcome to CV. Many of your questions have already been answered [here](https://stats.stackexchange.com/q/5135/176202). The interpretation is hardly different for `gls`. If you want to estimate the effect of more than just `ec`, you should consider interactions between the variables and (even if there are none) use a single model with all explanatory variables. By running separate models, your coefficient estimates can be distorted by omitted-variable bias. — Frans Rodenburg, Jul 15 '19 at 10:28
Your question on the `anova()` output has already been answered [here](https://stats.stackexchange.com/a/274632/176202). — Frans Rodenburg, Jul 15 '19 at 10:32
I wouldn't worry too much about the correlation between intercept and slope: https://stats.stackexchange.com/questions/171125/correlation-between-ols-estimators-for-intercept-and-slope — Roland, Jul 15 '19 at 11:38
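To expand on Roland's comment: the -0.999 under "Correlation" is the estimated correlation between the intercept and slope *estimates*, not between flow and ec. It therefore says nothing about collinearity (with a single predictor there is nothing for ec to be collinear with). It typically arises when the predictor's values sit far from zero relative to their spread, so the intercept (the fitted flow at ec = 0) is an extrapolation. Centering ec removes it without changing the slope. A sketch on simulated data, where the ec and flow values are hypothetical and the fit uses independent errors so that the effect is exact:

```r
library(nlme)

# Hypothetical data: ec values far from zero relative to their spread
set.seed(1)
n    <- 228
ec   <- rnorm(n, mean = 2.6, sd = 0.1)
flow <- 4 - 1.1 * ec + rnorm(n, sd = 0.35)
sim  <- data.frame(flow = flow, ec = ec, ec_c = ec - mean(ec))  # ec_c: centered ec

raw_fit      <- gls(flow ~ ec,   data = sim)
centered_fit <- gls(flow ~ ec_c, data = sim)

# Correlation between the intercept and slope estimates,
# i.e. the "Correlation:" block printed by summary()
cor_raw      <- cov2cor(vcov(raw_fit))[1, 2]
cor_centered <- cov2cor(vcov(centered_fit))[1, 2]

cor_raw        # close to -1
cor_centered   # essentially 0

# The slope is unchanged by centering; only the meaning of the intercept changes
coef(raw_fit)[2]
coef(centered_fit)[2]
```

With an AR(1) correlation structure the same idea applies. After centering, the correlation is typically small rather than exactly zero.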

Isabella Ghement · Answer 1 · 2019-07-15T21:10:10.260

When you say the data are "stationary", do you mean all of the following:

- There is no evidence of a temporal trend in the values of flow over time;
- There is no evidence of increased variability in the values of flow over time;
- There is no evidence of a temporal trend in the values of ec over time;
- There is no evidence of increased variability in the values of ec over time?

If yes, it makes sense to relate flow to ec via a linear model, while allowing for the potential of temporally correlated model errors.
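One rough way to check the conditions listed above is sketched here on simulated data; the series and the time index `t_index` are hypothetical stand-ins for a column of rivin. It tests for a trend in level with a gls regression on time that allows for AR(1) errors. It also compares the variances of the first and second halves of the record with an F test (var.test). These are informal checks, not a formal stationarity test, and the F test assumes independent observations, so it is optimistic when autocorrelation is present. The same checks would be repeated for ec.

```r
library(nlme)

# Hypothetical stand-in for one series in 'rivin', indexed by sampling order
set.seed(7)
n   <- 228
dat <- data.frame(t_index = seq_len(n))
dat$flow <- 3 + as.numeric(arima.sim(list(ar = 0.3), n = n, sd = 0.4))

# Trend in level: slope of flow on time, allowing for AR(1) errors
trend_fit <- gls(flow ~ t_index, data = dat,
                 correlation = corAR1(form = ~ t_index))
trend_p <- summary(trend_fit)$tTable["t_index", "p-value"]

# Change in variability: F test comparing first-half and second-half variances
first_half <- dat$t_index <= n / 2
var_p <- var.test(dat$flow[first_half], dat$flow[!first_half])$p.value

# Small p-values would be evidence against stationarity
c(trend_p = trend_p, var_p = var_p)
```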

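You also have not specified whether you have data on the two variables for every year in the time period of interest. If you do, that's great! If you don't (some years are missing), then you would need to use the option na.action = na.omit (or na.action = na.exclude) in your gls() model call; gls() defaults to na.fail, which stops with an error when it meets missing values. It is also worth indexing the AR(1) structure by year, e.g. corAR1(form = ~ year), so that a gap is treated as a longer time lag rather than ignored.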
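A minimal sketch of a gls() call that copes with missing years. It assumes hypothetical annual data with one row per year and an integer `year` column; corAR1() requires an integer-valued time covariate that is unique within each group. In nlme, missing values are handled through the na.action argument (na.omit here). Indexing the AR(1) structure by year makes the correlation between two observations decay as Phi raised to the number of years between them, so neighbours across a gap are not treated as adjacent.

```r
library(nlme)

# Hypothetical annual data (one row per year) with a few missing flow values
set.seed(3)
dat <- data.frame(year = 1969:2018)
dat$ec   <- rnorm(nrow(dat), mean = 2.6, sd = 0.3)
dat$flow <- 4 - 1.1 * dat$ec +
  as.numeric(arima.sim(list(ar = 0.3), n = nrow(dat), sd = 0.35))
dat$flow[c(5, 6, 30)] <- NA   # years without a flow measurement

# na.action = na.omit drops the incomplete years (the default, na.fail, would stop);
# form = ~ year lets the AR(1) correlation account for the actual gap in years
m_gap <- gls(flow ~ ec, data = dat, na.action = na.omit,
             correlation = corAR1(form = ~ year))
summary(m_gap)
```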

In addition to the excellent comments you have received so far from Frans and Roland, I would suggest the following:

1. Fit your model m1 <- gls(flow ~ ec, rivin) to your data (taking care over whether or not you have any missing years, as explained above).
2. Produce ACF and PACF plots of the residuals of model m1 using the Acf() and Pacf() functions in the forecast package, to get a sense of the nature of the temporal correlation among its errors.
3. Apply the auto.arima() function to the model residuals to identify a process that describes this correlation. auto.arima() searches over ARIMA(p, d, q) models; if it selects an ARIMA(p, 0, 0) model, that indicates that an AR(p) process, where p could be 1, 2, etc., captures the temporal correlation present in your data.
4. Let's say that the previous step produced a value of p = 1. Proceed to fit your model m2 with this value of p:

   ```r
   p <- 1
   m2 <- update(m1, correlation = corARMA(p = p))
   ```

5. Compare model m2 against model m1 using the anova() function to determine whether m2 is an improvement over m1. If it is, interpret the results produced by m2. It will be helpful to produce confidence intervals for the regression model parameters using the intervals() function; see https://stat.ethz.ch/R-manual/R-devel/library/nlme/html/intervals.gls.html.
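The steps above can be sketched end to end in R. Two substitutions are made so that the sketch runs without extra packages, and both are flagged in the comments. Base R's acf()/pacf() stand in for forecast's Acf()/Pacf(). stats::ar(), which picks an AR order by AIC, stands in for auto.arima(), which also considers differencing and MA terms. The data are simulated, hypothetical stand-ins for rivin; with real data, `sim` would be replaced by `rivin`.

```r
library(nlme)

# Hypothetical stand-in for 'rivin' with AR(1) errors
set.seed(2019)
n   <- 228
sim <- data.frame(ec = rnorm(n, mean = 2.6, sd = 0.3))
sim$flow <- 4 - 1.1 * sim$ec +
  as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 0.35))

# Step 1: model assuming independent errors
m1 <- gls(flow ~ ec, data = sim)

# Step 2: ACF/PACF of the residuals (base R in place of forecast::Acf/Pacf)
r <- as.numeric(resid(m1))
acf(r)
pacf(r)

# Step 3: choose an AR order for the residuals by AIC
# (stats::ar in place of forecast::auto.arima)
p <- ar(r, aic = TRUE, order.max = 5)$order

# Step 4: refit with AR(p) errors
m2 <- update(m1, correlation = corARMA(p = p))

# Step 5: compare the two fits, then get 95% CIs for the regression coefficients
anova(m1, m2)
intervals(m2, which = "coef")
```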

Note that the model formulation for m2 in item 4 allows more flexibility if p > 1, and it is equivalent to your current formulation when p = 1.
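One way to check the p = 1 equivalence is to fit both structures to the same data and compare them. The sketch below uses simulated, hypothetical data. The two correlation classes parameterize Phi differently, so agreement is expected only up to optimizer tolerance.

```r
library(nlme)

# Hypothetical data with AR(1) errors
set.seed(11)
n   <- 228
sim <- data.frame(ec = rnorm(n, mean = 2.6, sd = 0.3))
sim$flow <- 4 - 1.1 * sim$ec +
  as.numeric(arima.sim(list(ar = 0.3), n = n, sd = 0.35))

fit_ar1  <- gls(flow ~ ec, data = sim, correlation = corAR1())
fit_arma <- gls(flow ~ ec, data = sim, correlation = corARMA(p = 1))

# Log-likelihoods and coefficients should agree closely
c(corAR1 = as.numeric(logLik(fit_ar1)), corARMA = as.numeric(logLik(fit_arma)))
rbind(corAR1 = coef(fit_ar1), corARMA = coef(fit_arma))

# Estimated Phi from each correlation structure, on the natural scale
coef(fit_ar1$modelStruct$corStruct, unconstrained = FALSE)
coef(fit_arma$modelStruct$corStruct, unconstrained = FALSE)
```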

Assuming p = 1, you can interpret your results from model m2 along these lines:

We found a statistically significant negative association between flow and ec over the period of study (p < 0.001) after accounting for the AR(1) temporal correlation present in the errors of the linear regression model relating the two variables. Specifically, each 1-unit increase in the values of ec was associated with a decrease in the expected value of flow of 1.106 units (95% CI: ____ to ______).
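The blanks are most easily filled with intervals(m2). For a gls fit, as far as I know, intervals() builds the coefficient intervals from the t distribution with the residual degrees of freedom (226 here). The numbers can therefore be cross-checked by hand from the summary table. A sketch using only the printed estimate and standard error for ec:

```r
# Estimate and standard error for ec, copied from summary(m2)
est      <- -1.106382
se       <-  0.2228789
df_resid <- 226   # "Degrees of freedom: 228 total; 226 residual"

t_value <- est / se                          # the printed t-value
p_value <- 2 * pt(-abs(t_value), df_resid)   # two-sided; summary() rounds it to 0

# 95% confidence interval: estimate +/- t quantile * standard error
ci <- est + c(-1, 1) * qt(0.975, df_resid) * se

round(t_value, 4)   # -4.964
ci
```

Both interval limits are negative, which is consistent with the significant negative association described above.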

Thank you very much for the responses. Regarding your questions about the data being stationary, the short answer is yes to all of them: there are no temporal trends or changes in variability in either flow or ec. I also have data on all of the variables for all of the years assessed. Thank you again; it all helped me a great deal. — Lizaan de Necker, Jul 30 '19 at 07:48
You’re welcome, @LizaandeNecker! Glad you found my responses helpful. — Isabella Ghement, Jul 30 '19 at 14:02
