
I have read that there are many methods for determining the degrees of freedom, and thus for calculating p-values for fixed effects in mixed models. I have also read that the Wald test is the worst and that the likelihood ratio test is a little better. There are so many names that I am a bit lost.

  1. So, when I get the output of a mixed model in any statistical package, I get a list of coefficients with their p-values. Are they Wald p-values?

  2. When I do an ANOVA on such model output, it performs a joint test on all coefficients belonging to a single effect, so the ANOVA gives me a p-value for each main effect, one by one. In certain packages, like R or SAS, one can choose among: LRT, Kenward-Roger, Satterthwaite, F test, Chi2 test. Which one is the Wald? Is this the Chi2 test? Is LRT the F test? Is the Kenward-Roger a "small-data adjustment" to the Wald?

  3. Briefly - which of them (Wald, t-test, F test, Chi2 test, LRT) refer to the model coefficients and which of them refer to the main effects? And which can, if any, refer to both?

Example: Software-agnostic report of the coefficients of the fixed part:

Intercept      Estimate   SE    t or z  p-value    <- these are Wald, not LRT, right?
Coeff 1        Estimate   SE    t or z  p-value
Coeff 2        Estimate   SE    t or z  p-value
Interactions   Estimate   SE    t or z  p-value

etc.

Software-agnostic report of ANOVA on the model, to get the main effects:

Main Effect 1    Estimate  SE   some_statistic  some_p_value  <- are these F/LRT? Chi2/Wald? (corrected with KR, for example)
Main Effect 2    Estimate  SE   some_statistic  some_p_value
Interaction      Estimate  SE   some_statistic  some_p_value
Damasco
  • Thank you for the hints. I updated the question with examples. I am afraid I cannot put a dump from any statistical software, because such questions are closed as "related to a software" and people are sent to StackOverflow. My question is software-agnostic. – Damasco Jan 08 '20 at 23:03
  • Welcome to CV, Domnisoara. Your edits are indeed an improvement. – Alexis Jan 09 '20 at 00:39
  • There is no problem including output from software in a question provided that the question is about a substantive statistical matter. – Robert Long Jan 10 '20 at 11:24
  • I don't think the output can be software-agnostic. Different software does different things. – Peter Flom Jan 10 '20 at 12:09
  • Dear Peter Flom and others, you close the questions again and again, yet in the meantime, before you manage to close it up, others give brilliant answers, like Ben Bolker below. He could - and he tried to answer it in a beautiful way. You - just close. Please, rethink what you do, because this way this place loses its usefulness. When people provide details from a statistical package - you close as "too much software related". When one just asks about general ideas - you close because it "cannot be software agnostic" (yet 99% of the software reports it the same way). Please, let others help. – Damasco Jan 15 '20 at 13:52

1 Answer


  1. So, when I get the output of a mixed model in any statistical package, I get a list of coefficients with their p-values. Are they Wald p-values?

Yes, generally they are. They may be $Z$-statistics/tests (i.e., assuming that the sample is big enough that the standard errors have no uncertainty) or $t$-statistics (allowing for the uncertainty in the standard errors due to finite sample size); this is usually indicated in the column names (and by the appearance of a "df" or "ddf" [(denominator) degrees of freedom] column in the output).
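As a rough illustration of the $Z$ case, a Wald p-value for a single coefficient can be computed from nothing more than the estimate and its standard error. This is a minimal sketch using only the Python standard library; the function name `wald_z_test` and the numbers are hypothetical, not taken from any package's output.

```python
import math

def wald_z_test(estimate, std_error):
    """Two-sided Wald z-test of H0: coefficient == 0.

    Treats the standard error as known, i.e. the large-sample (Z) case.
    """
    z = estimate / std_error
    # Two-sided standard-normal tail area: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical coefficient row: estimate 0.50 with standard error 0.20
z, p = wald_z_test(0.50, 0.20)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

The $t$ version uses the same ratio but refers it to a $t$ distribution with some (possibly approximated) degrees of freedom instead of the standard normal.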

In your second case (results of "ANOVA"), it's hard to know without reading the documentation exactly what tests are being done. It might be either Wald or LRT and might do some sort of finite-size correction or not (see details under #2).

  2. When I do an ANOVA on such model output, it performs a joint test on all coefficients belonging to a single effect, so the ANOVA gives me a p-value for each main effect, one by one. In certain packages, like R or SAS, one can choose among: LRT, Kenward-Roger, [Satterthwaite], F test, Chi2 test. Which one is the Wald? Is this the Chi2 test? Is LRT the F test? Is the Kenward-Roger a "small-data adjustment" to the Wald?

This is a little complicated.

  • Wald tests in general assume the log-likelihood surface is quadratic.
    • They may ignore the finiteness of the data set, in particular the uncertainty associated with nuisance parameters such as the residual standard deviation, in which case they are "Wald chi-square tests", because the test statistic is then $\chi^2$ (or scaled $\chi^2$) distributed
    • If they take the finiteness of the data set into account, they are "F tests" ($F$-distributed test statistic)
      • if the experimental design is balanced and nested, the denominator degrees of freedom (df) for the F-statistic can be computed exactly
      • if not, then some approximation such as Satterthwaite or Kenward-Roger must be used (so the answer to your question "is the K-R a 'small-data adjustment' to the Wald?" is "yes")
  • The likelihood ratio test (LRT) is also often labeled "Chi2", because its test statistic is (asymptotically) $\chi^2$-distributed; in R, if the output says just "Chi2" and not "Wald Chi2", it is probably an LRT. The LRT accounts for the non-quadratic shape of the log-likelihood surface, but not for the uncertainty of nuisance parameters due to finite sample size. Finite-size corrections to the LRT are complicated and rarely used.
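To make the "quadratic vs. non-quadratic log-likelihood" distinction concrete, here is a toy sketch that is deliberately not a mixed model: a one-parameter Poisson rate test, where both statistics have closed forms. The function names and the numbers are hypothetical, and only the Python standard library is used. Both statistics are referred to a $\chi^2$ distribution with 1 df, yet they differ because the Poisson log-likelihood is not exactly quadratic in the rate.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

def poisson_wald_and_lrt(total_count, n_obs, rate_null):
    """Wald and LRT statistics for H0: rate == rate_null (needs total_count > 0)."""
    rate_hat = total_count / n_obs  # maximum-likelihood estimate of the rate

    # Wald: squared distance from the null, scaled by the estimated
    # variance of rate_hat (rate_hat / n_obs); assumes a quadratic surface.
    wald = (rate_hat - rate_null) ** 2 / (rate_hat / n_obs)

    # LRT: twice the drop in log-likelihood between the MLE and the null.
    # Poisson log-likelihood up to a constant: S * log(rate) - n * rate
    def loglik(rate):
        return total_count * math.log(rate) - n_obs * rate

    lrt = 2.0 * (loglik(rate_hat) - loglik(rate_null))
    return wald, lrt

# Hypothetical data: 20 events over 10 observations, null rate of 1
wald, lrt = poisson_wald_and_lrt(total_count=20, n_obs=10, rate_null=1.0)
print(f"Wald chi2 = {wald:.3f}, p = {chi2_sf_1df(wald):.4f}")
print(f"LRT  chi2 = {lrt:.3f}, p = {chi2_sf_1df(lrt):.4f}")
```

Neither statistic here includes any finite-size correction; that is the separate issue that the F tests with Satterthwaite or Kenward-Roger df address.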
  3. Briefly - which of them (Wald, t-test, F test, Chi2 test, LRT) refer to the model coefficients and which of them refer to the main effects? And which can, if any, refer to both?

I'm guessing that by "coefficients" you mean tests of single coefficients (e.g. the slope in a regression model) vs. joint tests of multiple coefficients simultaneously equaling zero (e.g. the effect of a categorical predictor with >2 levels). t- and Z-tests specifically refer to single coefficients. F and Chi2 essentially test sums of squares of scaled coefficients, so can refer to single- or multiple-coefficient tests. Wald and LRT refer to assumptions about the shape of the log-likelihood surface, so are not specific to single- or multiple-coefficient tests.
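A joint Wald test of two coefficients can be sketched the same way: the statistic is $W = \hat\beta^\top V^{-1} \hat\beta$, which is referred either to a $\chi^2$ distribution with 2 df or, after dividing by 2, to an $F$ distribution with 2 numerator df. The sketch below uses only the Python standard library and the closed-form tail areas that exist for these two special cases. All names and numbers are hypothetical; in particular the denominator df of 10 is simply assumed, whereas in a mixed model it would come from the design or from an approximation such as Satterthwaite or Kenward-Roger.

```python
import math

def joint_wald_two_coefs(beta, cov):
    """Wald statistic W = beta' V^{-1} beta for two coefficients."""
    b1, b2 = beta
    (v11, v12), (v21, v22) = cov
    det = v11 * v22 - v12 * v21
    # Quadratic form with the explicit inverse of the 2x2 covariance matrix
    return (b1 * b1 * v22 - b1 * b2 * (v12 + v21) + b2 * b2 * v11) / det

def chi2_sf_2df(w):
    """Upper tail of chi-square with 2 df: exp(-w / 2)."""
    return math.exp(-w / 2.0)

def f_sf_2_num_df(f_stat, den_df):
    """Upper tail of F with 2 numerator df: (1 + 2 f / d) ** (-d / 2)."""
    return (1.0 + 2.0 * f_stat / den_df) ** (-den_df / 2.0)

# Hypothetical estimates and covariance matrix for a 3-level factor
beta = (0.8, -0.5)
cov = ((0.09, 0.01), (0.01, 0.04))

w = joint_wald_two_coefs(beta, cov)
p_chi2 = chi2_sf_2df(w)                     # ignores finite-sample uncertainty
p_f = f_sf_2_num_df(w / 2.0, den_df=10.0)   # assumed denominator df of 10
print(f"W = {w:.3f}, chi2 p = {p_chi2:.5f}, F p = {p_f:.5f}")
```

As the denominator df grows, the F-based p-value approaches the chi-square one, which is the sense in which the chi-square version "ignores the finiteness of the data set". With a single coefficient, $W$ reduces to the squared $Z$-statistic.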

See also: GLMM FAQ on denominator df; the "pvalue" help page for lme4; How can I obtain z-values instead of t-values in linear mixed-effect model (lmer vs glmer)?

Corrections and comments welcome.

Ben Bolker