I have an outcome variable that is right-skewed, so I log-transformed it. I fit a null (intercept-only) model to the log-transformed outcome, but when I exponentiate the intercept estimate, it does not equal the mean of the original variable.
Concerned that the problem was my data, I made a sample data set and found the same discrepancy. Why is this? What does the intercept represent in this model?
Here is the sample data and R code:
library(tidyverse)
test <- tibble(salary = c(10000, 23244, 2222222, 2353, 2353463, 5464564),
               perf = c(4, 2, 4, 2, 5, 7))
Here's my null model:
summary(lm(log(salary) ~ 1 , data = test))
The intercept is 11.971, and exp(11.971) gives 158102.7:
exp(11.971)
But the mean of salary is 1679308:
mean(test$salary)
As a sanity check, when I don't log-transform the outcome, the intercept does equal the mean:
summary(lm(salary ~ 1 , data = test))
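To pin down what each intercept matches numerically, here is a minimal base-R sketch. It rebuilds the same sample data with data.frame (so tidyverse is not required) and compares the log-scale intercept with mean(log(salary)), and its exponent with the geometric mean of salary. The variable names (fit_log, b0_log, geo_mean, and so on) are illustrative only.

```r
# Same sample data as above, built with base R so no packages are needed
test <- data.frame(salary = c(10000, 23244, 2222222, 2353, 2353463, 5464564),
                   perf   = c(4, 2, 4, 2, 5, 7))

fit_log <- lm(log(salary) ~ 1, data = test)  # null model on the log scale
fit_raw <- lm(salary ~ 1, data = test)       # null model on the original scale

b0_log <- unname(coef(fit_log))  # intercept on the log scale
b0_raw <- unname(coef(fit_raw))  # intercept on the original scale

mean_log   <- mean(log(test$salary))  # arithmetic mean of the logged values
geo_mean   <- exp(mean_log)           # geometric mean of salary
arith_mean <- mean(test$salary)       # arithmetic mean of salary

all.equal(b0_log, mean_log)       # TRUE: log-scale intercept is the mean of log(salary)
all.equal(exp(b0_log), geo_mean)  # TRUE: exponentiated intercept is the geometric mean
all.equal(b0_raw, arith_mean)     # TRUE: raw-scale intercept is the arithmetic mean
```

All three comparisons return TRUE, which is consistent with exp(intercept) being the geometric mean of salary; for positive, right-skewed data like this, the geometric mean is smaller than the arithmetic mean.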
I'd appreciate help with 1) how to interpret the intercept, 2) why it doesn't equal the mean, and 3) how I could get predictions on the original (non-log) scale from this model.
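For point 3, one commonly used option is Duan's smearing estimator, which multiplies exp(log-scale prediction) by the average of the exponentiated log-scale residuals. This technique is not mentioned above and is shown only as an illustration; the sketch assumes homoscedastic errors on the log scale and uses base R only, with illustrative variable names.

```r
# Sample data rebuilt with base R (same values as above)
test <- data.frame(salary = c(10000, 23244, 2222222, 2353, 2353463, 5464564),
                   perf   = c(4, 2, 4, 2, 5, 7))

fit_log <- lm(log(salary) ~ 1, data = test)

# Fitted values on the log scale
pred_log <- fitted(fit_log)

# Naive back-transform: exp() of the log-scale fit (the geometric mean here)
naive <- exp(pred_log)

# Duan's smearing factor: mean of exp(residuals) from the log-scale model
smear <- mean(exp(residuals(fit_log)))

# Smeared back-transform on the original salary scale
smeared <- exp(pred_log) * smear
```

In this intercept-only case the smeared prediction reduces exactly to mean(salary) for every row; with a predictor such as perf in the model, it would instead give a per-observation estimate of the conditional mean, under the same homoscedasticity assumption.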