This question is both related to math and coding, and for transparency, I also posted it on Stack Overflow for the coding part. I need to modify a log regression when I change my y-axis data from y to 100-y (y is a percentage). I understand that log(100-y)~x is a different model to log(y)~x, but considering the shape of the curve f(x)=-exp(x), I would expect an exponential lm() model to work. It seems I am wrong, why? I went through the site, it seems to be a known (and more complicated than I thought) issue/question, reading here, and here for example. Even if I don't get the exact "reversed" model, why does lm(exp(100-y)~x not provide a decent estimate? I also understand that a log-linked gaussian GLM could make it, but I have issues to properly code it and plot. And it is less convenient as I cannot really have a r² if I understand properly.
I use the following reprex, the initial data are:
m=c(3,1,3,2,4,7,10,2,1,44,11,24,1,3,5,11,7,17,8,10,17,34,31,10,7,3,1,6,4,3,4,7,11,14,6,8,34,80,23,54,2,3,2,1,13,8,10,15,2,15,8,11,8,12,9,18,85)
q=c(91,78,72,84,74,76,81,80,70,130,131,136,53,56,57,111,60,80,78,76,84,107,85,76,57,57,45,50,68,61,52,59,60,60,51,63,95,93,85,97,52,39,34,40,70,124,90,68,39,81,50,60,74,70,59,53,98)
Age=c(rep("Young",24),rep("Old",33))
df=data.frame(Age,m,q)
After looking at the plot, I decided to test a linear regression for the "Young" category, and a log one for the "Old" category:
model_lin_df <- df %>% group_by(Age) %>% do(model = glance(lm(m ~ q,data = .))) %>% unnest(model)
> model_lin_df
# A tibble: 2 x 13
Age r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 Old 0.370 0.350 16.4 18.2 0.000172 1 -138. 282. 287. 8339. 31 33
2 Young 0.418 0.391 8.97 15.8 0.000646 1 -85.7 177. 181. 1771. 22 24
model_exp_df <- df %>% group_by(Age) %>% do(model = glance(lm(log(m) ~ q,data = .))) %>% unnest(model)
> model_exp_df
# A tibble: 2 x 13
Age r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 Old 0.517 0.502 0.776 33.2 0.00000242 1 -37.4 80.8 85.3 18.7 31 33
2 Young 0.362 0.333 0.916 12.5 0.00187 1 -30.9 67.8 71.4 18.5 22 24
Here is the plot with the fitted lines:
model_lin <- lm(m ~ Age/q + 0, df,na.action=na.exclude)
model_exp <- lm(log(m) ~ Age/q + 0, df,na.action=na.exclude)
df_plot <- ggplot(df, aes(x = q, y = m, shape=Age, color=Age)) + geom_point() +
geom_line(aes(y = ifelse(Age == "Young", fitted(model_lin), NA))) +
geom_line(aes(y = ifelse(Age == "Old", exp(fitted(model_exp)), NA))) +
theme_bw() + theme(panel.grid = element_blank())
The nice part is that it is a lm model, I can add the r² and p-value. But I realized that in fact, I should plot the y-axis as 100-y, the data would biologicaly make more sense:
s=100-m
df=cbind(df,s)
The linear model provides the same r² and p-value, which was intuitive:
model_lin_df2 <- df %>% group_by(Age) %>% do(model = glance(lm(s ~ q,data = .))) %>% unnest(model)
> model_lin_df2
# A tibble: 2 x 13
Age r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 Old 0.370 0.350 16.4 18.2 0.000172 1 -138. 282. 287. 8339. 31 33
2 Young 0.418 0.391 8.97 15.8 0.000646 1 -85.7 177. 181. 1771. 22 24
But I don't find how to "reverse" / back transform the exponential model. I would intuitively think that a -exp function would do it, but no:
model_exp_df2 <- df %>% group_by(Age) %>% do(model = glance(lm(100-exp(s) ~ q,data = .))) %>% unnest(model)
> model_exp_df2
# A tibble: 2 x 13
Age r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 Old 0.216 0.191 2.26e42 8.55 0.00640 1 -3264. 6534. 6539. 1.58e86 31 33
2 Young 0.114 0.0736 3.19e42 2.83 0.107 1 -2382. 4770. 4773. 2.24e86 22 24
model_lin2 <- lm(s ~ Age/q + 0, df,na.action=na.exclude)
model_exp2 <- lm((100-exp(s)) ~ Age/q + 0, df,na.action=na.exclude)
df_plot2 <- ggplot(df, aes(x = q, y = s, shape=Age, color=Age)) + geom_point() +
geom_line(aes(y = ifelse(Age == "Young", fitted(model_lin2), NA))) +
geom_line(aes(y = ifelse(Age == "Old", exp(fitted(model_exp2)), NA))) +
theme_bw() + theme(panel.grid = element_blank())
Is there a lm() model that could fit the data for the "Old" category? It would be great because it easily provides r² and p-value. If not, why, and is a glm with log link the only option? I am less used to that one, I cannot figure out how to plot it for one category so far. Any insight will be appreciated!