R: emmeans back tranform clr data using clrInv

Question

I have a set of data that I am transforming using the clr function

library(compositions)
clr(my_data)

Now I used lmer to build mixed effect linear models and I am extracting the estimated means and the contrasts using emmeans. I am aware of the options that can be used to back-transform the data

e.g.

  model.rg <- update(ref_grid(model), tran =  "asin.sqrt")
  emm1<-emmeans(model.rg, specs = ~ drug:age:time, type = "response")

However, I cannot figure out how to back-transform the data CLR transformed. Any idea how could I achieve the back-transformation within emmeans?

Thank you

thank you, the point is that I need to account for the geometric mean of the data. I might create my own function but then I am not sure how to provide this to `tran`. — efrem, Nov 09 '20 at 11:45
Thanks, I just saw it, quite a complex task. I would need o basically write a `link` function myself and then parse into `tran`. I guess something like: https://stackoverflow.com/questions/15931403/modify-glm-function-to-adopt-user-specified-link-function-in-r — efrem, Nov 09 '20 at 13:16
I guess my problem is that I do not quite get how to implement the back transformation similarly to what Ben Bolker did in the link above — efrem, Nov 09 '20 at 14:49
The CLR is described at https://stats.stackexchange.com/a/259223/919: "the logs of the data in any observation are centered by subtracting their mean." — whuber, Nov 09 '20 at 16:25
I have deleted my "answer" because it doesn't make sense to apply the CLR transform to just one response variable. If applied to a multivariate response representing a composition, *all information about factors other than the compositions* is lost, so it makes no sense to try to recover information on individual response means. You can set `tran = "log"` and then with `type = "response"` you can use that to estimate the ratios of the various components. Nothing much else makes sense. — Russ Lenth, Nov 09 '20 at 17:26
Absolutely, true, I was just trying to write the link functions myself, and I realized this makes no sense at all! Thank you very much for looking into this. — efrem, Nov 09 '20 at 17:36

Russ Lenth · Answer 1 · 2020-11-12T17:56:57.927

It occurs to me that you can sort of have your cake and eat it too, by playing tricks with offsets. Suppose that your multivariate response (reflecting compositions) is in the matrix data$Y, and there are additional factors treat and block in play, and we don't care about block. We can do something like this:

library(emmeans)
logmean <- apply(log(data$Y), 1, mean)
mod <- lm(log(Y) ~ treat + block + offset(logmean), data = data)
emm <- emmeans(mod, ~ treat * comp, mult.name = "comp")

### Obtain geometric means. The offset is included in the estimates
summary(emm, by = "comp", type = "response")

### Contrast the CLR-transformed components for each treat
# doesn't work but maybe shoul: emm.clr <- update(emm, offset = 0)
emm.clr <- emm <- emmeans(mod, ~ treat * comp, mult.name = "comp", offset = 0)

pairs(emm.clr, by = "treat")   # diffs on CLR scale
pairs(emm.clr, by = "treat", type = "response")   # ratios

In making emm.clr, we set offset = 0 which ignores the offsets, effectively creating the CLR-transformed predictions.

These antics do not require the compositions package because we've done that part manually.

Data example

I tried this with a re-framing of the WhiteCells dataset in compositions, creating a 60-row dataset with factors for method and sample in the above roles. Everything comes out sensibly:

> confint(emm, type="r")
 method comp response       SE df lower.CL upper.CL
 i      G      0.6124 0.009054 29   0.5942   0.6312
 m      G      0.5629 0.008322 29   0.5461   0.5802
 i      L      0.1838 0.002782 29   0.1782   0.1896
 m      L      0.2054 0.003109 29   0.1992   0.2119
 i      M      0.0444 0.000402 29   0.0435   0.0452
 m      M      0.0432 0.000391 29   0.0424   0.0440

Results are averaged over the levels of: sample 
Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

> confint(emm.clr, type="r")
 method comp response      SE df lower.CL upper.CL
 i      G       3.583 0.05297 29    3.476    3.693
 m      G       3.293 0.04869 29    3.195    3.394
 i      L       1.076 0.01628 29    1.043    1.109
 m      L       1.202 0.01819 29    1.165    1.240
 i      M       0.260 0.00235 29    0.255    0.264
 m      M       0.253 0.00229 29    0.248    0.257

Results are averaged over the levels of: sample 
Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

> pairs(emm.clr, type = "response", by = "method")
method = i:
 contrast ratio     SE df t.ratio p.value
 G / L     3.33 0.0950 29  42.196 <.0001 
 G / M    13.81 0.2663 29 136.109 <.0001 
 L / M     4.14 0.0832 29  70.795 <.0001 

method = m:
 contrast ratio     SE df t.ratio p.value
 G / L     2.74 0.0781 29  35.347 <.0001 
 G / M    13.03 0.2514 29 133.126 <.0001 
 L / M     4.76 0.0955 29  77.656 <.0001 

Results are averaged over the levels of: sample 
P value adjustment: tukey method for comparing a family of 3 estimates 
Tests are performed on the log scale 

> ### Might as well average over method:
> emm.clrc = emmeans(emm.clr, "comp")
> pairs(emm.clrc, type = "r")
 contrast ratio     SE df t.ratio p.value
 G / L     3.02 0.0609 29  54.831 <.0001 
 G / M    13.41 0.1829 29 190.378 <.0001 
 L / M     4.44 0.0631 29 104.971 <.0001 

Results are averaged over the levels of: sample, method 
P value adjustment: tukey method for comparing a family of 3 estimates 
Tests are performed on the log scale 

> ### Check on G/M = G/L * L/M:
> 3.02*4.44
[1] 13.4088

So we have about 3 times as much G as L, and about 4.4 times as much L as M

R: emmeans back tranform clr data using clrInv

1 Answers1

Data example