
The help pages in R assume I know what those numbers mean, but I don't. I'm trying to really intuitively understand every number here. I will just post the output and comment on what I found out. There might (will) be mistakes, as I'll just write what I assume. Mainly I'd like to know what the t-value in the coefficients table means, and why the residual standard error is printed.

Call:
lm(formula = iris$Sepal.Width ~ iris$Petal.Width)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.09907 -0.23626 -0.01064  0.23345  1.17532 

This is a five-number summary of the residuals (their mean is always 0, right?). The numbers can be used (I'm guessing here) to quickly see whether there are any big outliers. You can also already see here whether the residuals are far from normally distributed (they should be normally distributed).

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       3.30843    0.06210  53.278  < 2e-16 ***
iris$Petal.Width -0.20936    0.04374  -4.786 4.07e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

The estimates $\hat{\beta_i}$ are computed by least squares regression. Also, the standard error is $\sigma_{\hat{\beta_i}}$. I'd like to know how this is calculated. I have no idea where the t-value and the corresponding p-value come from. I know $\hat{\beta}$ should be normally distributed, but how is the t-value calculated?

Residual standard error: 0.407 on 148 degrees of freedom

$\sqrt{ \frac{1}{n-p} \epsilon^T\epsilon }$, I guess. But why do we calculate that, and what does it tell us?

Multiple R-squared: 0.134,  Adjusted R-squared: 0.1282 

$ R^2 = \frac{s_\hat{y}^2}{s_y^2} $, which is $ \frac{\sum_{i=1}^n (\hat{y_i}-\bar{y})^2}{\sum_{i=1}^n (y_i-\bar{y})^2} $. The ratio is close to 1 if the points lie on a straight line, and 0 if they are random. What is the adjusted R-squared?
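To check my understanding, I computed this ratio myself (just a quick sketch in R on the same model):

```r
# Check the R^2 formula by hand against what summary() reports
mod  <- lm(iris$Sepal.Width ~ iris$Petal.Width)
y    <- iris$Sepal.Width
yhat <- fitted(mod)
sum((yhat - mean(y))^2) / sum((y - mean(y))^2)  # about 0.134, matching the output
summary(mod)$r.squared                          # same value
```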

F-statistic: 22.91 on 1 and 148 DF,  p-value: 4.073e-06 

F and p for the whole model, not only for single $\beta_i$s as before. The F value is $ \frac{s^2_{\hat{y}}}{\sum\epsilon_i} $. The bigger it grows, the more unlikely it is that the $\beta$'s do not have any effect at all.

gung - Reinstate Monica
Alexander Engelhardt
  • residuals are not so badly deviating from normality, why do you think so? – nico Dec 04 '10 at 13:14
  • @nico: I think @Alexx Hardt was speaking hypothetically. I.e. one *could* use the five number summary to see if residuals were deviating from normal – Gavin Simpson Dec 04 '10 at 13:39
  • @Gavin Simpson: you're right, I misread the sentence. Disregard my previous comment. – nico Dec 04 '10 at 14:34
  • Minor quibble: You cannot say anything about normality or non-normality based on those 5 quantiles alone. All you can say based on that summary is whether the estimated residuals are approximately symmetric around zero. You could divide the reported quantiles by the estimated residual standard error and compare these values to the respective quantiles of the N(0,1), but looking at a QQ-plot probably makes more sense. – fabians Dec 06 '10 at 09:29
  • One note here: the model $F$ is not $SS_{model} / SS_{error}$, rather it is $MS_{model} / MS_{error}$. $F$ is described correctly in the answer below, but it does not explicitly mention that it is mischaracterized in the question, so someone might not notice the discrepancy. – gung - Reinstate Monica Aug 23 '12 at 14:28

2 Answers


Five point summary

Yes, the idea is to give a quick summary of the distribution of the residuals. It should be roughly symmetric about the mean, the median should be close to 0, and the 1Q and 3Q values should ideally be of roughly similar magnitude.
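As a minimal sketch (assuming the model from the question, fitted as `mod`), you can reproduce that "Residuals" line and confirm the mean is zero:

```r
# Reproduce the five-number summary of the residuals by hand
mod <- lm(Sepal.Width ~ Petal.Width, data = iris)
r   <- residuals(mod)
quantile(r)   # Min, 1Q, Median, 3Q, Max -- the values printed by summary(mod)
mean(r)       # effectively 0: least squares with an intercept forces this
```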

Coefficients and $\hat{\beta_i}s$

Each coefficient in the model is a Gaussian (Normal) random variable. The $\hat{\beta_i}$ is the estimate of the mean of the distribution of that random variable, and the standard error is the square root of the variance of that distribution. It is a measure of the uncertainty in the estimate of the $\hat{\beta_i}$.

You can look at how these are computed (well the mathematical formulae used) on Wikipedia. Note that any self-respecting stats programme will not use the standard mathematical equations to compute the $\hat{\beta_i}$ because doing them on a computer can lead to a large loss of precision in the computations.
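For illustration only, the textbook formula for the standard errors can be evaluated directly (R itself does not literally invert $X'X$, for exactly the precision reasons just mentioned):

```r
# Textbook formula: se(beta-hat) = sqrt(diag(sigma^2 * (X'X)^-1))
mod <- lm(Sepal.Width ~ Petal.Width, data = iris)
X   <- model.matrix(mod)                          # design matrix (intercept + predictor)
s2  <- sum(residuals(mod)^2) / df.residual(mod)   # estimate of sigma^2
sqrt(diag(s2 * solve(t(X) %*% X)))                # matches the Std. Error column
```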

$t$-statistics

The $t$ statistics are the estimates ($\hat{\beta_i}$) divided by their standard errors ($\hat{\sigma_i}$), e.g. $t_i = \frac{\hat{\beta_i}}{\hat{\sigma_i}}$. Assuming you have the same model in object mod as in your Q:

> mod <- lm(Sepal.Width ~ Petal.Width, data = iris)

then the $t$ values R reports are computed as:

> tstats <- coef(mod) / sqrt(diag(vcov(mod)))
(Intercept) Petal.Width 
  53.277950   -4.786461 

Where coef(mod) are the $\hat{\beta_i}$, and sqrt(diag(vcov(mod))) gives the square roots of the diagonal elements of the covariance matrix of the model parameters, which are the standard errors of the parameters ($\hat{\sigma_i}$).

The p-value is the probability of achieving a $|t|$ as large as or larger than the observed absolute $t$ value if the null hypothesis ($H_0$) were true, where $H_0$ is $\beta_i = 0$. They are computed as (using tstats from above):

> 2 * pt(abs(tstats), df = df.residual(mod), lower.tail = FALSE)
 (Intercept)  Petal.Width 
1.835999e-98 4.073229e-06

So we compute the upper tail probability of achieving the $t$ values we did from a $t$ distribution with degrees of freedom equal to the residual degrees of freedom of the model. This represents the probability of achieving a $t$ value greater than the absolute values of the observed $t$s. It is multiplied by 2, because of course $t$ can be large in the negative direction too.

Residual standard error

The residual standard error is an estimate of the parameter $\sigma$. The assumption in ordinary least squares is that the residuals are individually described by a Gaussian (normal) distribution with mean 0 and standard deviation $\sigma$. The $\sigma$ relates to the constant variance assumption; each residual has the same variance and that variance is equal to $\sigma^2$.
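You can verify the reported value directly (a quick sketch; `sigma()` is the extractor in recent versions of R):

```r
# Residual standard error = sqrt(RSS / residual degrees of freedom)
mod <- lm(Sepal.Width ~ Petal.Width, data = iris)
sqrt(deviance(mod) / df.residual(mod))   # 0.407 on 148 df, as in the summary
sigma(mod)                               # same value via the extractor (R >= 3.3)
```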

Adjusted $R^2$

Adjusted $R^2$ is computed as:

$$1 - (1 - R^2) \frac{n - 1}{n - p - 1}$$

The adjusted $R^2$ is the same thing as $R^2$, but adjusted for the complexity (i.e. the number of parameters) of the model. Given a model with a single parameter, with a certain $R^2$, if we add another parameter to this model, the $R^2$ of the new model has to increase, even if the added parameter has no statistical power. The adjusted $R^2$ accounts for this by including the number of parameters in the model.
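Checking the adjustment by hand (here $n = 150$ observations and $p = 1$ predictor):

```r
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
mod <- lm(Sepal.Width ~ Petal.Width, data = iris)
n   <- nobs(mod)
p   <- length(coef(mod)) - 1             # predictors, excluding the intercept
r2  <- summary(mod)$r.squared
1 - (1 - r2) * (n - 1) / (n - p - 1)     # 0.1282, matching summary(mod)$adj.r.squared
```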

$F$-statistic

The $F$ is the ratio of two variances ($MSR/MSE$): the variance explained by the parameters in the model (the regression mean square, $MSR$) and the residual or unexplained variance (the error mean square, $MSE$). You can see this better if we get the ANOVA table for the model via anova():

> anova(mod)
Analysis of Variance Table

Response: Sepal.Width
             Df  Sum Sq Mean Sq F value    Pr(>F)    
Petal.Width   1  3.7945  3.7945   22.91 4.073e-06 ***
Residuals   148 24.5124  0.1656                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The $F$s are the same in the ANOVA output and the summary(mod) output. The Mean Sq column contains the two variances and $3.7945 / 0.1656 = 22.91$. We can compute the probability of achieving an $F$ that large under the null hypothesis of no effect, from an $F$-distribution with 1 and 148 degrees of freedom. This is what is reported in the final column of the ANOVA table. In the simple case of a single, continuous predictor (as per your example), $F = t_{\mathrm{Petal.Width}}^2$, which is why the p-values are the same. This equivalence only holds in this simple case.
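That equivalence is easy to confirm numerically (a small sketch, again assuming `mod` from above):

```r
# In this single-predictor model the overall F equals the slope t statistic squared
mod  <- lm(Sepal.Width ~ Petal.Width, data = iris)
Fval <- unname(summary(mod)$fstatistic["value"])
tval <- coef(summary(mod))["Petal.Width", "t value"]
c(F = Fval, t_squared = tval^2)          # both about 22.91
```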

Zhubarb
Gavin Simpson
  • This will take some time and playing around with R to understand. Big thanks for now, I might follow up with some questions at some point :) – Alexander Engelhardt Dec 04 '10 at 13:42
  • @Gavin (+1) Great response with nice illustrations! – chl Dec 04 '10 at 14:04
  • Nice job. One thing you might clarify, with regard to calculating t values: sqrt(diag(vcov(mod))) produces the SEs of the estimates. These are the same SEs that are output in the model summary. It is easier and clearer just to say that t = Estimate/SE(Estimate). In that sense it is no different from any other t value. – Brett Dec 04 '10 at 14:49
  • (+1) This is great. The only thing I'd add is that the $F$ value is the same as $t^2$ for the slope (which is why the p values are the same). This - of course - isn't true with multiple explanatory variables. –  Dec 04 '10 at 15:05
  • @Jay; thanks. I thought about mentioning that equivalence too. Wasn't sure if it was too much detail or not? I'll add something on this in a mo. – Gavin Simpson Dec 04 '10 at 15:43
  • @Brett; thanks. I've tried to clarify this a bit above as per your comment. – Gavin Simpson Dec 04 '10 at 15:54
  • Am I wrong in thinking that a less confusing name for "residual standard error" would be "residual standard deviation"? – Rasmus Bååth Dec 29 '13 at 23:26
  • "will not use the standard mathematical equations to compute" What will they use? – SmallChess Jan 08 '15 at 03:54
  • @StudentT R uses a [QR decomposition](http://en.wikipedia.org/wiki/QR_decomposition) to avoid inverting a matrix, which would be required if it just literally did the matrix algebra of the matrix equations shown in textbooks. Computers only do floating point arithmetic and that can cause issues in some circumstances, so algorithms have been developed to minimise those issues. – Gavin Simpson Jan 08 '15 at 04:02
  • In the line "The p-value is the probability of achieving .... They are computed as (using tstats from above)" you are writing that the null is $H_{0}: \hat{\beta}_i=0$, but in fact we are not testing whether the estimator is zero, but the true parameter. So it should be $H_{0}: \beta_i=0$, right? – Michael L. Jan 16 '18 at 07:58
  • @MichaelL. Right; we assume the true value of $\beta_i$ is zero and we know that the estimator ($\hat{\beta}_i$) is not zero because we just computed a non-zero value. Knowing me I was probably being lazy and copying and pasting latex around in the answer and forgot to correct this. – Gavin Simpson Jan 16 '18 at 13:49

Ronen Israel and Adrienne Ross (AQR) wrote a very nice paper on this subject: Measuring Factor Exposures: Uses and Abuses.

To summarize (see: p. 8),

  • Generally, the higher the $R^2$ the better the model explains portfolio returns.
  • When the t-statistic is greater than two, we can say with 95% confidence (or a 5% chance we are wrong) that the beta estimate is statistically different from zero. In other words, we can say that a portfolio has significant exposure to a factor.

R's lm() summary reports the p-value Pr(>|t|) for each coefficient. The smaller the p-value, the more significant the factor. A p-value of 0.05 is a common threshold.
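For the question's model, for instance, these p-values can be pulled out of the summary programmatically (a minimal sketch):

```r
# Extract the coefficient table; the last column is Pr(>|t|)
mod  <- lm(Sepal.Width ~ Petal.Width, data = iris)
ctab <- coef(summary(mod))         # Estimate, Std. Error, t value, Pr(>|t|)
ctab[, "Pr(>|t|)"]                 # the p-values
ctab[, "Pr(>|t|)"] < 0.05          # which coefficients clear the 0.05 threshold
```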

Steve Lihn
  • The kinds of misstatements in this paper, exemplified by "When the t-statistic is greater than two, we can say (with ... a 5% chance we are wrong) that the beta estimate is statistically different from zero" [at p. 11], are discussed at https://stats.stackexchange.com/questions/311763 and https://stats.stackexchange.com/questions/26450. – whuber Nov 03 '17 at 18:48
  • Link to the paper seems to be dead now. – Pake Jul 30 '21 at 18:00
  • @Pake fixed link – Steve Lihn Sep 16 '21 at 13:04