Interpreting lm summary in R?

Question

Consider a data frame ("df") with three variables (Happiness, Smoke, Depression), where (1) Happiness (DV) is a continuous measure of happiness on a 1-10 scale, (2) Smoke (IV1) is a categorical variable indicating whether the person smokes (yes/no), and (3) Depression (IV2) is a continuous measure of depression on a 1-10 scale.

Happiness <- c(1, 2, 5, 6, 2, 7, 7, 3, 8, 9)
Smoke <- c("yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no")
Depression <- c(6, 8, 2, 1, 10, 4, 5, 1, 2, 3)
df <- data.frame(Happiness, Smoke, Depression)
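A quick way to see how lm() will treat Smoke is to inspect its coding before fitting. This is a minimal sketch that rebuilds the df above so it runs on its own. In R 4.x, data.frame() keeps Smoke as a character column, and lm() turns it into a factor with alphabetically sorted levels. "no" therefore becomes the reference level, and only a "Smokeyes" dummy is estimated.

```r
Happiness  <- c(1, 2, 5, 6, 2, 7, 7, 3, 8, 9)
Smoke      <- c("yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no")
Depression <- c(6, 8, 2, 1, 10, 4, 5, 1, 2, 3)
df <- data.frame(Happiness, Smoke, Depression)

# Make the coding explicit: the first level listed ("no") is the reference category.
df$Smoke <- factor(df$Smoke, levels = c("no", "yes"))
levels(df$Smoke)

# The dummy column lm() builds for Smoke: 1 for "yes", 0 for "no".
dummy <- model.matrix(~ Smoke, data = df)[, "Smokeyes"]
cbind(Smoke = as.character(df$Smoke), Smokeyes = dummy)
```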

Suppose I want to test whether the Smoke x Depression interaction predicts Happiness (in other words, whether the interaction between the two independent variables predicts the dependent variable). So I use this formula:

summary(lm(data = df, Happiness ~ Smoke*Depression))

which gives me this:

Call:
lm(formula = Happiness ~ Smoke * Depression, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.0000 -1.0460 -0.3788  0.8905  3.7826 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)  
(Intercept)           5.6154     2.4745   2.269   0.0637 .
Smokeyes             -1.3110     3.2748  -0.400   0.7028  
Depression            0.5769     0.9489   0.608   0.5655  
Smokeyes:Depression  -0.7943     1.0011  -0.793   0.4578  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.164 on 6 degrees of freedom
Multiple R-squared:  0.6098,    Adjusted R-squared:  0.4147 
F-statistic: 3.125 on 3 and 6 DF,  p-value: 0.1092

I need help interpreting this result.

1. Is it correct to use Smoke*Depression, rather than Smoke + Depression or Smoke:Depression, if I want to see the main effect of each independent variable and their interaction?
2. Do the values under Pr(>|t|) indicate the significance of the main effect of each variable?
3. If so, how do I test the main effect of non-smokers? (i.e., why is there only "Smokeyes" and no "Smokeno"?)
4. What does Smokeyes:Depression indicate? I suspect that it means the interaction between Smoke and Depression. If so, how is its Pr(>|t|) different from the overall p-value on the last line of the output?

Any help would be much appreciated. Thank you!

You can use `lm(formula = Happiness ~ Smoke * Depression + 0, data = df)` if you want separate columns for Smokeno and Smokeyes, but then you won't get an intercept. — G. Grothendieck, Feb 10 '21 at 23:32
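Here is a minimal sketch of the comment's suggestion, using the same data. Dropping the intercept gives one coefficient per Smoke level. Each of these is that group's predicted Happiness at Depression = 0. The p-values for Smokeno and Smokeyes then test whether each of those values is zero, not whether the groups differ. summary() also computes R-squared differently when there is no intercept.

```r
df <- data.frame(
  Happiness  = c(1, 2, 5, 6, 2, 7, 7, 3, 8, 9),
  Smoke      = factor(c("yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no")),
  Depression = c(6, 8, 2, 1, 10, 4, 5, 1, 2, 3)
)

fit_int <- lm(Happiness ~ Smoke * Depression, data = df)      # default: intercept + Smokeyes
fit_no0 <- lm(Happiness ~ Smoke * Depression + 0, data = df)  # one column per Smoke level

names(coef(fit_no0))

# Smokeno equals the default model's intercept;
# Smokeyes equals intercept + Smokeyes from the default model.
coef(fit_no0)[["Smokeno"]]  - coef(fit_int)[["(Intercept)"]]
coef(fit_no0)[["Smokeyes"]] - sum(coef(fit_int)[c("(Intercept)", "Smokeyes")])
```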
Thanks! What does the p-value under F-statistic indicate? (0.1092) — , Feb 12 '21 at 03:34
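On the comment's question: the p-value on the last line of summary() (0.1092) comes from the overall F-test. That test asks whether the three terms together explain more variance than an intercept-only model. Each Pr(>|t|) value, by contrast, tests a single coefficient with the other terms kept in the model. The following minimal sketch reproduces the overall test on the same data by comparing the two nested models with anova().

```r
df <- data.frame(
  Happiness  = c(1, 2, 5, 6, 2, 7, 7, 3, 8, 9),
  Smoke      = factor(c("yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no")),
  Depression = c(6, 8, 2, 1, 10, 4, 5, 1, 2, 3)
)

fit_full <- lm(Happiness ~ Smoke * Depression, data = df)
fit_null <- lm(Happiness ~ 1, data = df)  # intercept only: always predicts mean Happiness

# Nested-model F-test: do Smokeyes, Depression and Smokeyes:Depression jointly help?
cmp <- anova(fit_null, fit_full)
cmp

# The same F and p-value that summary() prints on its last line.
fs <- summary(fit_full)$fstatistic  # named vector: value, numdf, dendf
p_overall <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)
p_overall
```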

score 0 · Answer 1 · answered Feb 10 '21 at 21:58

Regarding your questions:

1. Yes.
2. Yes, these are p-values.
3. Smoke has been coded into a dummy variable, Smokeyes, which is 1 when Smoke is "yes" and 0 when it is "no".
4. Yes, it is the interaction, and that is its p-value.
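This minimal sketch shows what the dummy coding implies, using the same data. Fitting a separate straight line for each group reproduces every coefficient:

- (Intercept) and Depression are the non-smokers' intercept and slope.
- Smokeyes is the difference between the two groups' intercepts, i.e. the difference at Depression = 0, which is below the 1-10 scale.
- Smokeyes:Depression is the difference between the two groups' slopes.

So once the interaction is in the model, the Smokeyes and Depression p-values refer to these conditional effects rather than to overall main effects.

```r
df <- data.frame(
  Happiness  = c(1, 2, 5, 6, 2, 7, 7, 3, 8, 9),
  Smoke      = factor(c("yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no")),
  Depression = c(6, 8, 2, 1, 10, 4, 5, 1, 2, 3)
)

fit <- lm(Happiness ~ Smoke * Depression, data = df)

# Separate simple regressions, one per group.
fit_no  <- lm(Happiness ~ Depression, data = subset(df, Smoke == "no"))
fit_yes <- lm(Happiness ~ Depression, data = subset(df, Smoke == "yes"))

b <- coef(fit)  # order: (Intercept), Smokeyes, Depression, Smokeyes:Depression
comparison <- rbind(
  joint_model = b,
  by_group    = c(coef(fit_no)[[1]],                      # non-smokers' intercept
                  coef(fit_yes)[[1]] - coef(fit_no)[[1]], # difference in intercepts
                  coef(fit_no)[[2]],                      # non-smokers' slope
                  coef(fit_yes)[[2]] - coef(fit_no)[[2]]) # difference in slopes
)
comparison
```

Only the point estimates match. The standard errors, and therefore the p-values, differ because the single model pools the residual variance across both groups.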

Hope this helps.

score 0 · Answer 2 · answered Feb 10 '21 at 22:06

Your model output shows high p-values for all your terms, indicating that none of them (smoking, depression, or the interaction between smoking and depression) is statistically significant.

To answer your specific questions:

1. Smoke*Depression and Smoke + Depression + Smoke:Depression are the same thing. If you use interaction terms, you should also have the individual terms in the model.
2. The Pr(>|t|) values are the p-values, and p-values > 0.05 indicate that the corresponding terms are not statistically significant at the 5% level.
3. You only have Smokeyes because Smoke is a factor, so it has been coded as 1 for yes and 0 for no. This means non-smokers are the reference group absorbed into the intercept: the intercept (5.6154) is the predicted Happiness of a non-smoker with Depression = 0. Its p-value of 0.06 tests whether that predicted value differs from zero (not significant at the 5% level, though it is at the 10% level); it is not a test of an effect of not smoking. You can instead code the two levels as 1 for yes and -1 for no (effect coding). You still get a single Smoke coefficient, now equal to half the smoker minus non-smoker difference at Depression = 0 (the non-smoker deviation is its negative). The intercept then becomes the average of the two groups' predicted Happiness at Depression = 0.
4. You are correct about point 4: Smokeyes:Depression is the interaction between being a smoker and Depression. Its p-value indicates the significance of the interaction, i.e., whether the Depression slope differs between smokers and non-smokers. Non-smokers are coded 0, so 0 × Depression = 0, and no separate non-smoker term appears in the model.
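This minimal sketch checks two claims in this answer, using the same data. First, the `*` shorthand expands to both individual terms plus their interaction. Second, if you want a "Smokeno" coefficient in a model that keeps the intercept, you can make "yes" the reference level with relevel(). The Smoke and interaction estimates then flip sign, and the Depression coefficient becomes the smokers' slope.

```r
df <- data.frame(
  Happiness  = c(1, 2, 5, 6, 2, 7, 7, 3, 8, 9),
  Smoke      = factor(c("yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no")),
  Depression = c(6, 8, 2, 1, 10, 4, 5, 1, 2, 3)
)

fit_star <- lm(Happiness ~ Smoke * Depression, data = df)
fit_long <- lm(Happiness ~ Smoke + Depression + Smoke:Depression, data = df)
all.equal(coef(fit_star), coef(fit_long))  # same terms, same estimates

# Copy of the data with "yes" as the reference level, so the dummy becomes "Smokeno".
df_yes_ref <- df
df_yes_ref$Smoke <- relevel(df_yes_ref$Smoke, ref = "yes")
fit_yes_ref <- lm(Happiness ~ Smoke * Depression, data = df_yes_ref)
coef(fit_yes_ref)
```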

Thanks for your reply! This might be a silly question, but when there is a main effect, for example if I were to find a significant main effect of Smoke on Happiness, would I say that "Smoke significantly predicted Happiness"? It sounds right, but I think "predicted" is also jargon that is not appropriate in some contexts... — , Feb 12 '21 at 04:05
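One way to get a Smoke coefficient closer to a "main effect" while keeping the interaction is to center Depression. In this minimal sketch on the same data, Depression_c is a column name introduced here for illustration. After centering, Smokeyes is the smoker vs. non-smoker difference at the sample's average Depression rather than at Depression = 0, which lies outside the 1-10 scale. The interaction estimate and its test are unchanged.

```r
df <- data.frame(
  Happiness  = c(1, 2, 5, 6, 2, 7, 7, 3, 8, 9),
  Smoke      = factor(c("yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no")),
  Depression = c(6, 8, 2, 1, 10, 4, 5, 1, 2, 3)
)

fit_raw <- lm(Happiness ~ Smoke * Depression, data = df)

# Center Depression at its sample mean (hypothetical helper column).
df$Depression_c <- df$Depression - mean(df$Depression)
fit_c <- lm(Happiness ~ Smoke * Depression_c, data = df)
summary(fit_c)$coefficients

# The centered Smokeyes equals the raw-model group difference evaluated at mean Depression.
coef(fit_raw)[["Smokeyes"]] + coef(fit_raw)[["Smokeyes:Depression"]] * mean(df$Depression)
```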
