Heavy-tailed errors in mixed-effects model

Question

I'm relatively new to statistical modelling and `R', so please let me know If I should provide any further information/plots. I did originally post this question here, but unfortunately have not received any responses yet.

I am using the lme() function from nlme in R to test the significance of fixed effects of a repeated measures design. My experiment involves subjects listening to a pair of sounds and adjusting the sound level (in decibels) until both are equally loud. This is done for a 40 different pairs of stimuli, with both orders tested (A/B and B/A). There are a total of 160 observations per subject (this includes 1 replication for every condition), and 14 participants in all.

The model:

LevelDifference ~ pairOfSounds*order, random = (1|Subject/pairOfSounds/order)

I have built the model up by AIC/BIC and likelihood ratio tests (method = "ML"). Residual plots for the within-group errors are shown below:

Residual plots for mixed-effects model

The top left plot shows standardised residuals vs fitted values. I don't see any systematic pattern in the residuals, so I assume that the constant variation assumption is valid, although further inspection of the subject-by-subject residuals do show some unevenness. In conjunction with the top right plot, I have no reason to suspect non-linearities.

My main concern is the lower left qqplot which reveals that the residuals are heavy-tailed. I'm not sure where to go from here. From reading Pinheiro and Bates (2000, p. 180), the fixed-effects tests tend to be more conservative when the tails are symmetrically distributed. So perhaps I'm OK if the p-values are very low?

The level two and three random effects show a similar departure from normality.

Basically:

How do such heavy-tailed residuals effect the inference for fixed-effects?
Would robust regression be an appropriate alternative to check?

I hope this question gets some more attention. I have had a similar experience with analysing some data sets using linear mixed-effects models. In my case I had a roughly symmetric residual distribution with $\text{Kurt}(\hat{\boldsymbol{\epsilon}}) \approx 9$, which is substantially more heavy-tailed than the normal distribution. It might be possible to use a generalised-error distribution for the model, but that would require some additional programming. I welcome other responses to this question. — Ben, Feb 20 '18 at 23:56
A good place to start here would be to fit the model as it is in [brms](http://paul-buerkner.github.io/brms/reference/brm.html), and then to fit it again as a robust model, using Student's t distribution instead of a Gaussian for the residuals (by setting `family='student'`). Comparing the two models and their predictions will give a good idea of what the heavy tails are doing. — Eoin, Oct 01 '20 at 17:34
@Eoinisonthejobmarket I looked into a similar situation not long ago. I found that provided the residuals were approximately symmetrical, the estimates were unbiased and the predictions were good. However if you wanted to compute p values for the fixed effects (which I generally don't advise) then the standard errors would be wrong. — Robert Long, Oct 03 '20 at 13:53
@RobertLong, I've given this a go myself and posted the results below. It's interesting that even when the standard error of the fixed effects doesn't change much, the cumulative effects on estimates for each group are pretty notable. — Eoin, Oct 05 '20 at 09:26

Eoin · Answer 1 · 2020-10-05T09:21:45.433

I took my own advice and tried this with some simulated data. This isn't as complete an answer as it could be, but it's what's on my hard drive.

I simulated data...

...with heavy-tailed residuals ($\frac{\epsilon}{10} \sim \text{Student}(\nu=1)$).

If fit two models using brms, one with Gaussian residuals, and one with Student residuals.

m_gauss = brm(y ~ x + (x|subject), data=data, file='fits/m_gauss')
m_student = brm(y ~ x + (x|subject), data=data, family=student, 
                file='fits/m_student')

In the Gaussian model, the fixed effects estimates are reasonable but noisy, (see true parameters in simulation code below), but sigma, the standard deviation of the residuals was estimated to around 60.

summary(m_gauss)
# Family: gaussian 
# Links: mu = identity; sigma = identity 
# Formula: y ~ x + (x | subject) 
# Data: data (Number of observations: 100) 
# Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
# total post-warmup samples = 4000
# 
# Group-Level Effects: 
#   ~subject (Number of levels: 5) 
#                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)       67.72     21.63    38.27   123.54 1.00     1445     2154
# sd(x)               21.72      7.33    11.58    40.11 1.00     1477     2117
# cor(Intercept,x)    -0.18      0.32    -0.73     0.48 1.00     1608     1368
# 
# Population-Level Effects: 
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept     2.90     19.65   -35.38    43.25 1.00     1959     2204
# x            -2.63      8.91   -19.36    16.19 1.00     1659     1678
# 
# Family Specific Parameters: 
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma    67.02      5.08    57.84    77.94 1.00     3790     3177

The Student model gives consistent parameter estimates, since the residuals are more-or-less symmetric, but with reduced standard errors. I was surprised by how small the change in standard error actually was here though. It correctly estimates $\sigma$ (10) and $\nu$ (degrees of freedom: 1) for the residuals.

summary(m_student)
# Family: student
# Links: mu = identity; sigma = identity; nu = identity
# Formula: y ~ x + (x | subject)
# Data: data (Number of observations: 100)
# Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
# total post-warmup samples = 4000
# 
# Group-Level Effects:
#   ~subject (Number of levels: 5)
#                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)       57.54     18.26    33.76   103.21 1.00     1069     1677
# sd(x)               22.99      8.29    12.19    43.89 1.00     1292     1302
# cor(Intercept,x)    -0.20      0.31    -0.73     0.45 1.00     2532     2419
# 
# Population-Level Effects:
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept     2.26     17.61   -32.62    39.11 1.00     1733     1873
# x            -3.42      9.12   -20.50    15.38 1.00     1641     1263
# 
# Family Specific Parameters:
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma    10.46      1.75     7.34    14.23 1.00     2473     2692
# nu        1.24      0.18     1.01     1.68 1.00     2213     1412

Interestingly, the reduction in standard errors for both the fixed and random effects leads to a considerable increase in the precision of posterior predictive means (that is, uncertainty about the location of the regression line, shown below).

So, using a model that assumes Gaussian residuals when your data has heavy-tailed residuals inflates the standard error of your fixed effects, as well as of other parameters. This is largely because if the residuals are assumed to be Gaussian, they must come from a Gaussian distribution with a huge standard deviation, and so the data is treated as being very noisy.

Using a model that correctly specifies the heavy-tailed nature of the Gaussians (this is also what non-Bayesian robust regression does) largely solves this issue, and even though the standard errors for individual parameters don't change very much, the cumulative effect on the estimated regression line is considerable!

Homogeneity of Variance

It's worth noting that even though all the residuals were drawn from the same distribution, the heavy tails mean that some groups will have lots of outliers (e.g. group 4), while others won't (e.g. group 2). Both models here assume that the residuals have the same variance in each group. This causes additional problems for the Gaussian model, since it's forced to conclude that even group 2, where the data are close to the regression line, has high residual variance, and so greater uncertainty. In other words, the presence of outliers in some groups, when not properly modelled using robust, heavy-tailed residual distribution, increases uncertainty even about groups without outliers.

Code

library(tidyverse)
library(brms)
dir.create('fits')
theme_set(theme_classic(base_size = 18))

# Simulate some data
n_subj = 5
n_trials = 20
subj_intercepts = rnorm(n_subj, 0, 50) # Varying intercepts
subj_slopes     = rnorm(n_subj, 0, 20) # Varying slopes

data = data.frame(subject   =   rep(1:n_subj, each=n_trials),
                  intercept = rep(subj_intercepts, each=n_trials),
                  slope     = rep(subj_slopes, each=n_trials)) %>%
  mutate(
    x = rnorm(n(), 0, 10),
    yhat = intercept + x*slope)

residuals = rt(nrow(data), df=1) * 10
hist(residuals, breaks = 50)
data$y = data$yhat + residuals

ggplot(data, aes(x, y, color=factor(subject))) +
  geom_point() +
  stat_smooth(method='lm', se=T) +
  labs(x='x', y='y', color='Group') +
  geom_hline(linetype='dashed', yintercept=0)


m_gauss = brm(y ~ x + (x|subject), data=data, file='fits/m_gauss')
m_student = brm(y ~ x + (x|subject), data=data,
                family=student, file='fits/m_student')
summary(m_gauss)
summary(m_student)

fixef(m_gauss)
fixef(m_student)

pred_gauss = data.frame(fitted(m_gauss))
names(pred_gauss) = paste0('gauss_', c('b', 'se', 'low', 'high'))

pred_student = data.frame(fitted(m_student))
names(pred_student) = paste0('student_', c('b', 'se', 'low', 'high'))

pred_df = cbind(data, pred_gauss, pred_student) %>%
  arrange(subject, x)

ggplot(pred_df, aes(x, gauss_b, 
                    ymin=gauss_low, ymax=gauss_high,
                    color=factor(subject),
                    fill=factor(subject))) +
  geom_path() + geom_ribbon(alpha=.2) +
  labs(title='Gaussian Model', color='Subject', fill='Subject', 
       y='Estimates')


ggplot(pred_df, aes(x, student_b, 
                    ymin=student_low, ymax=student_high,
                    color=factor(subject),
                    fill=factor(subject))) +
  geom_path() + geom_ribbon(alpha=.2) +
  labs(title='Heavy-tailed (Student) Model', color='Subject', fill='Subject', 
       y='Estimates')

Very cool! +1 for this, a good a answer as any on what the consequences of this is on CV imo. — sleepy, Oct 05 '20 at 01:16
This is nice, although the Cauchy/$t_1$-distribution is pathological due to non-existing variances and even expected values. The residuals in the question don't look as bad as this, so the case treated here seems worse than what appears in the question. Interesting and probably helpful anyway. Note by the way that in the absence of information of what the actual distribution is, the wider uncertainty ranges under the Gaussian model may actually be appropriate. — Christian Hennig, Oct 05 '20 at 10:32
Very true. I went with such an extreme distribution to make the effects as clear as possible. It's also interesting that brms does such a good job of estimating $\nu$ even in this pathological case. — Eoin, Oct 05 '20 at 10:36
(+1) excellent answer, I will play with your code a little later. BTW check your LinkedIn :) — Robert Long, Oct 05 '20 at 13:04

score 2 · Answer 2 · answered Oct 05 '20 at 10:41

Looking at models based on the t-distribution is potentially helpful as others wrote. However one reason to use the Gaussian assumption is that the Gaussian distribution minimises the Fisher-Information for given variance. This means that Gaussian parameters cannot be as precisely estimated as parameters of other distributions given the knowledge of the distribution. This is a good thing in the sense that using the Gaussian distribution is normally a conservative choice, as is also shown in the answer by Eoin. The larger uncertainty indicated by the Gaussian model may actually be realistic given that we don't know what the true distribution is.

Note that this argument technically doesn't apply to the $t_1$- and $t_2$-distribution, because these don't have existing variances. However your residuals do not look quite as evil as residuals of these tend to look.

If you run regressions, fitting Gaussians to heavy tailed distributions (even $t_1$ or $t_2$) will normally make confidence intervals and p-values larger, so if you have small p-values, these should still be reliable. It is far more dangerous to have outliers/heavy tails in the explanatory variables ("leverage points"), but as far as I understand your experiment, you don't have those.

Heavy-tailed errors in mixed-effects model

2 Answers2

Homogeneity of Variance

Code

Linked