Specifically, I want to know if there is a difference between lm(y ~ x1 + x2) and glm(y ~ x1 + x2, family=gaussian). I think this particular case of glm is equivalent to lm. Am I wrong?

- Yes and no. As a statistical model, no. As a fitted object in R, yes; different returned objects, different algorithm used. – Gavin Simpson Nov 10 '15 at 18:39
- It seems to me there is a statistical question here, as well as an R coding one. – Silverfish Nov 10 '15 at 19:32
3 Answers
While for the specific form of model mentioned in the body of the question (i.e. lm(y ~ x1 + x2) vs glm(y ~ x1 + x2, family=gaussian)), regression and GLMs are the same model, the title question asks something slightly more general:
Is there any difference between lm and glm for the gaussian family of glm?
To which the answer is "Yes!".
The reason they can differ is that you can also specify a link function in the GLM. This allows you to fit particular forms of nonlinear relationship between $y$ (or rather its conditional mean) and the $x$-variables; while you can do this in nls as well, there is no need for starting values, the convergence is sometimes better, and the syntax is a bit easier.
Compare, for example, these models (you have R so I assume you can run these yourself):
x1=c(56.1, 26.8, 23.9, 46.8, 34.8, 42.1, 22.9, 55.5, 56.1, 46.9, 26.7, 33.9,
37.0, 57.6, 27.2, 25.7, 37.0, 44.4, 44.7, 67.2, 48.7, 20.4, 45.2, 22.4, 23.2,
39.9, 51.3, 24.1, 56.3, 58.9, 62.2, 37.7, 36.0, 63.9, 62.5, 44.1, 46.9, 45.4,
23.7, 36.5, 56.1, 69.6, 40.3, 26.2, 67.1, 33.8, 29.9, 25.7, 40.0, 27.5)
x2=c(12.29, 11.42, 13.59, 8.64, 12.77, 9.9, 13.2, 7.34, 10.67, 18.8, 9.84, 16.72,
10.32, 13.67, 7.65, 9.44, 14.52, 8.24, 14.14, 17.2, 16.21, 6.01, 14.23, 15.63,
10.83, 13.39, 10.5, 10.01, 13.56, 11.26, 4.8, 9.59, 11.87, 11, 12.02, 10.9, 9.5,
10.63, 19.03, 16.71, 15.11, 7.22, 12.6, 15.35, 8.77, 9.81, 9.49, 15.82, 10.94, 6.53)
y = c(1.54, 0.81, 1.39, 1.09, 1.3, 1.16, 0.95, 1.29, 1.35, 1.86, 1.1, 0.96,
1.03, 1.8, 0.7, 0.88, 1.24, 0.94, 1.41, 2.13, 1.63, 0.78, 1.55, 1.5, 0.96,
1.21, 1.4, 0.66, 1.55, 1.37, 1.19, 0.88, 0.97, 1.56, 1.51, 1.09, 1.23, 1.2,
1.62, 1.52, 1.64, 1.77, 0.97, 1.12, 1.48, 0.83, 1.06, 1.1, 1.21, 0.75)
lm(y ~ x1 + x2)
glm(y ~ x1 + x2, family=gaussian)
glm(y ~ x1 + x2, family=gaussian(link="log"))
nls(y ~ exp(b0+b1*x1+b2*x2), start=list(b0=-1,b1=0.01,b2=0.1))
Note that the first pair are the same model ($y_i \sim N(\beta_0+\beta_1 x_{1i}+\beta_2 x_{2i},\sigma^2)\,$), the second pair are the same model ($y_i \sim N(\exp(\beta_0+\beta_1 x_{1i}+\beta_2 x_{2i}),\sigma^2)\,$), and the fits are essentially the same within each pair.
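To check this numerically, here is a quick sketch you can run after the data above (the fit_* names are just for illustration):
fit_lm  <- lm(y ~ x1 + x2)
fit_glm <- glm(y ~ x1 + x2, family = gaussian)
all.equal(coef(fit_lm), coef(fit_glm))   # TRUE: the identity-link GLM reproduces lm
fit_log <- glm(y ~ x1 + x2, family = gaussian(link = "log"))
fit_nls <- nls(y ~ exp(b0 + b1*x1 + b2*x2),
               start = list(b0 = -1, b1 = 0.01, b2 = 0.1))
cbind(glm_log = coef(fit_log), nls = coef(fit_nls))  # essentially the same estimates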
So - in relation to the title question - you can fit a substantially wider variety of Gaussian models with a GLM than with regression.

- +1. On the computational side of things, I would also think that a GLM algorithm would use some IRWLS variant (in most cases) while an LM would rely on some closed-form solution variant. – usεr11852 Nov 11 '15 at 10:12
- @usεr11852 - I would have thought it was E-M, but they might be the same thing in this case. – EngrStudent May 22 '17 at 15:10
- @usεr11852, does this imply that if there are outliers in the data set, GLM will be less affected by their presence? – Chris Chiasson Sep 13 '17 at 14:42
- @EngrStudent Usually stats packages use Fisher scoring for GLMs (which boils down to a form of iterated least squares approximations - essentially as usεr11852 suggests). – Glen_b Sep 13 '17 at 21:38
- @ChrisChiasson Not from the algorithm used for fitting, no. Some GLMs are more robust to y-outliers (e.g. a Gamma GLM is more robust to a high outlier than LS is), but some are not -- it depends on the particular family and kind of outlier, not the fitting algorithm. – Glen_b Sep 13 '17 at 21:40
- @Glen_b I guess I was referring to usεr11852's mention of IRWLS... assuming the reweighting process scales down the weight of outliers due to extremely large variance/deviation. – Chris Chiasson Sep 13 '17 at 21:50
- It doesn't respond to seeing "outliers" (except via the likelihood as described above); reweighting is due to the effect of the variance function and the shift in the local linear approximation. – Glen_b Sep 13 '17 at 22:35
- @ChrisChiasson: +1 to Glen_b's comment. As mentioned, this is not related to the robustness of the algorithm in the presence of outliers. You might want to explore different families (e.g. suitably scaled $t$-distributions, or a Huber loss; check `MASS::rlm` for more on that) - sorry, just got online after a couple of days off. – usεr11852 Sep 14 '17 at 20:24
- So would it be fair to say that I should (must?) use a robust function to estimate the variation/deviance to make the glm robust to outliers... and that otherwise the glm would not be robust? (I use a different system than R) – Chris Chiasson Sep 16 '17 at 05:02
- You could achieve the kind of robustness I think you intend in a number of ways. However, with glms and regression-type models, you have to beware not just of outliers in the y-direction but of *influential* outliers, which can make themselves not look out of place. – Glen_b Sep 16 '17 at 08:15
From @Repmat's answer, the model summaries are the same, but the confidence intervals of the regression coefficients from confint are slightly different between lm and glm.
> confint(reg1, level=0.95)
                2.5 %    97.5 %
(Intercept) 2.474742 11.526174
x1          1.971466  2.014002
x2          2.958422  3.023291
> confint(reg2, level=0.95)
Waiting for profiling to be done...
                2.5 %    97.5 %
(Intercept) 2.480236 11.520680
x1          1.971492  2.013976
x2          2.958461  3.023251
The $t$-distribution is used in lm while the normal distribution is used in glm when constructing the intervals.
> beta    <- summary(reg1)$coefficients[, 1]
> beta_se <- summary(reg1)$coefficients[, 2]
> # n - 3 degrees of freedom: n observations minus the 3 estimated coefficients
> cbind(`2.5%`  = beta - qt(0.975, n - 3) * beta_se,
        `97.5%` = beta + qt(0.975, n - 3) * beta_se) # t
                2.5%     97.5%
(Intercept) 2.474742 11.526174
x1          1.971466  2.014002
x2          2.958422  3.023291
> cbind(`2.5%`  = beta - qnorm(0.975) * beta_se,
        `97.5%` = beta + qnorm(0.975) * beta_se) # normal
                2.5%     97.5%
(Intercept) 2.480236 11.520680
x1          1.971492  2.013976
x2          2.958461  3.023251
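As a small aside (assuming reg1 and reg2 are the lm and glm fits from @Repmat's answer), confint.default() always builds Wald intervals from the coefficient standard errors using normal quantiles, so applied to the glm fit it should reproduce the second set of intervals above:
confint.default(reg2, level = 0.95)  # beta +/- qnorm(0.975) * se, no profiling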
Short answer: they are exactly the same:
# Simulate data:
set.seed(42)
n <- 1000
x1 <- rnorm(n, mean = 150, sd = 3)
x2 <- rnorm(n, mean = 100, sd = 2)
u <- rnorm(n)
y <- 5 + 2*x1 + 3*x2 + u
# Estimate with OLS:
reg1 <- lm(y ~ x1 + x2)
# Estimate with GLM:
reg2 <- glm(y ~ x1 + x2, family=gaussian)
# Compare:
require(texreg)
screenreg(l = list(reg1, reg2))
=========================================
                Model 1      Model 2
-----------------------------------------
(Intercept)        6.37 **      6.37 **
                  (2.20)       (2.20)
x1                 1.99 ***     1.99 ***
                  (0.01)       (0.01)
x2                 3.00 ***     3.00 ***
                  (0.02)       (0.02)
-----------------------------------------
R^2                0.99
Adj. R^2           0.99
Num. obs.       1000         1000
RMSE               1.00
AIC                          2837.66
BIC                          2857.29
Log Likelihood              -1414.83
Deviance                      991.82
=========================================
*** p < 0.001, ** p < 0.01, * p < 0.05
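The point estimates and standard errors agree to numerical precision; a minimal check on the fitted objects above:
all.equal(coef(reg1), coef(reg2))   # TRUE: identical coefficients
all.equal(vcov(reg1), vcov(reg2))   # TRUE: identical variance-covariance matrix
max(abs(coef(reg1) - coef(reg2)))   # any difference is numerically negligible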
Longer answer: the glm function fits the model by MLE. However, because of the assumption you made about the link function (in this case normal), you end up with the OLS estimates.

- +1, a typo in the last sentence. The normal assumption is about the error distribution, not about the link function. In your example, the default link function is "identity". A more complete form for `glm` is `glm(y ~ x1 + x2, family = gaussian(link = "identity"))`. – Paul Aug 16 '18 at 23:29