How to understand the vertical bar (pipe) in R formulas

Question

I came upon this because I wanted to emulate Welch's t-test using gls. I found the answer here:

https://stats.stackexchange.com/a/144480/141304

and it says to add weights with

gls(y ~ group, data = dat, weights=varIdent(form = ~ 1 | group))

y and group are variables in the model. I don't know what form is. I read through help on gls, glm, weights, etc. but couldn't find anything that addressed the issue.

Some tutorials on R formulas filled me in that the pipe means conditioning, just like in probability. I understand conditioning in probability, but I can't wrap my head around what it means in regression.

Suppose I have four predictor variables A, B, C, D and a response variable X. A and B are continuous; C and D are categorical with two levels.

What would formulas such as the ones below (or any other ones an answerer might want to explain) mean?

X ~ A + A|B
X ~ A + B|C
X ~ A + B + C|D

score 5 · Answer 1 · answered Jul 09 '19 at 02:54

Assume there are only two groups: group 1 and group 2. The gls() call you specified effectively fits two sub-models to your $y$ observations: one sub-model for the $y$ observations in the first group and another for the $y$ observations in the second group.

The sub-model for the observations $y$ in group 1 postulates that $y = \beta_0 + \epsilon$, where $\epsilon$ denotes a random error term coming from a normal distribution with mean 0 and unknown variance $\sigma_1^2$. In other words, these observations are grouped about the true group mean $\beta_0$, with their spread about this true group mean being captured by $\sigma_1^2$.

The sub-model for the observations $y$ in group 2 postulates that $y = \beta_0 + \beta_1 + \epsilon$, where $\epsilon$ denotes a random error term coming from a normal distribution with mean 0 and unknown variance $\sigma_2^2$. In other words, these observations are grouped about the true group mean $\beta_0 + \beta_1$, with their spread about this true group mean being captured by $\sigma_2^2$.
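
Read together, the two sub-models amount to a single regression with a group indicator and a group-specific error variance. As a compact restatement (the observation index $i$ and the indicator $g_i$ are notation introduced here for illustration, and the coding assumes R's default treatment contrasts with group 1 as the reference level):

$$
y_i = \beta_0 + \beta_1 g_i + \epsilon_i, \qquad
\epsilon_i \sim
\begin{cases}
N(0, \sigma_1^2) & \text{if } g_i = 0 \text{ (group 1)} \\
N(0, \sigma_2^2) & \text{if } g_i = 1 \text{ (group 2)}
\end{cases}
$$

Under this coding, $\beta_1$ is the difference between the two true group means.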

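The gls() call you provided allows the spread (or variability) of the $y$ values in the two groups about their respective true group means to differ across groups (that is, it allows $\sigma_1^2$ to be different from $\sigma_2^2$) via the option weights=varIdent(form = ~ 1 | group). In that one-sided formula, the variable to the right of the vertical bar is the grouping variable, so a separate variance parameter is estimated for each level of group.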
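
To see this in practice, here is a minimal sketch on simulated data. The data frame dat, the group labels g1 and g2, the sample sizes, and the seed are all made up for illustration. It fits the gls() model from the question and compares it with Welch's t-test from t.test(). Under the default REML fit, the two t statistics should agree up to sign and numerical tolerance, while the degrees of freedom (and hence the p-values) generally differ.

```r
library(nlme)  # provides gls() and varIdent()

set.seed(1)

# Hypothetical data: two groups with different means and different spreads
dat <- data.frame(
  group = factor(rep(c("g1", "g2"), times = c(20, 30))),
  y     = c(rnorm(20, mean = 0, sd = 1), rnorm(30, mean = 1, sd = 3))
)

# One mean per group, plus one residual variance per level of `group`
fit <- gls(y ~ group, data = dat,
           weights = varIdent(form = ~ 1 | group))

# Residual SD of the reference group, and the estimated ratio of the
# other group's residual SD to it
sigma_ref <- fit$sigma
sd_ratio  <- coef(fit$modelStruct$varStruct, unconstrained = FALSE)
print(sigma_ref)
print(sd_ratio)

# Welch's t-test for comparison
welch <- t.test(y ~ group, data = dat, var.equal = FALSE)

# t statistic for the group effect in the gls fit
t_gls   <- summary(fit)$tTable["groupg2", "t-value"]
t_welch <- unname(welch$statistic)

# t.test() reports g1 minus g2 while gls reports g2 minus g1,
# so the signs are opposite; compare absolute values
print(c(gls = abs(t_gls), welch = abs(t_welch)))
```

Dropping the weights argument gives the equal-variance model instead, which corresponds to the classical pooled-variance t-test.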
