I'm working on a small project where I need to build a multiple linear regression model to predict the frequency (freq) of some airline companies. I'm not sure whether I should remove the intercept: after removing prbi and then dist, its Pr(>|t|) is the largest among the remaining terms. Here is the sequence of fits, ending with the model I get after removing dist:
flights_lm = lm(freq~dist+capa+nbrt+depf+lcco+prbi)
summary(flights_lm)
##################################################################
# > summary(flights_lm)
#
# Call:
# lm(formula = freq ~ dist + capa + nbrt + depf + lcco + prbi)
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -204884  -12347    1145   12382  297908
#
# Coefficients:
#               Estimate Std. Error t value Pr(>|t|)
# (Intercept)  1.857e+04  1.487e+04   1.248  0.21437
# dist        -5.145e+00  6.729e+00  -0.765  0.44610
# capa        -7.928e+01  6.540e+01  -1.212  0.22784
# nbrt         7.665e+01  7.188e+00  10.663  < 2e-16 ***
# depf         3.408e-05  1.204e-05   2.832  0.00546 **
# lcco         3.531e+04  2.151e+04   1.642  0.10339
# prbi         4.084e+00  2.017e+01   0.203  0.83988
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 60280 on 116 degrees of freedom
# Multiple R-squared: 0.8719, Adjusted R-squared: 0.8653
# F-statistic: 131.6 on 6 and 116 DF, p-value: < 2.2e-16
#####################################################################
flights_lm2 = update(flights_lm, .~. -prbi)
summary(flights_lm2)
####################################################################
# Call:
# lm(formula = freq ~ dist + capa + nbrt + depf + lcco)
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -204913  -12471    1098   12201  297917
#
# Coefficients:
#               Estimate Std. Error t value Pr(>|t|)
# (Intercept)  1.918e+04  1.450e+04   1.323  0.18854
# dist        -5.132e+00  6.701e+00  -0.766  0.44528
# capa        -7.813e+01  6.488e+01  -1.204  0.23093
# nbrt         7.665e+01  7.158e+00  10.708  < 2e-16 ***
# depf         3.406e-05  1.199e-05   2.842  0.00529 **
# lcco         3.506e+04  2.138e+04   1.639  0.10382
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 60030 on 117 degrees of freedom
# Multiple R-squared: 0.8719, Adjusted R-squared: 0.8664
# F-statistic: 159.3 on 5 and 117 DF, p-value: < 2.2e-16
#####################################################################
flights_lm3 = update(flights_lm2, .~. -dist)
summary(flights_lm3)
#####################################################################
# Call:
# lm(formula = freq ~ capa + nbrt + depf + lcco)
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -206975  -12147    4077   11489  297630
#
# Coefficients:
#               Estimate Std. Error t value Pr(>|t|)
# (Intercept)  1.526e+04  1.355e+04   1.127  0.26212
# capa        -9.031e+01  6.279e+01  -1.438  0.15303
# nbrt         7.705e+01  7.127e+00  10.811  < 2e-16 ***
# depf         3.302e-05  1.189e-05   2.778  0.00637 **
# lcco         3.329e+04  2.122e+04   1.569  0.11939
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 59920 on 118 degrees of freedom
# Multiple R-squared: 0.8713, Adjusted R-squared: 0.8669
# F-statistic: 199.6 on 4 and 118 DF, p-value: < 2.2e-16
################################################################
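
To make the question concrete, here is a minimal sketch of what I understand "removing the intercept" would look like in R. I can't share the real data, so the `flights` data frame below is simulated and purely hypothetical (the column names match my model, the values do not), and `fit_with` / `fit_without` are names I made up for this example. As far as I know, `- 1` in the formula is how the intercept is dropped; whether doing so is appropriate here is exactly what I'm asking.

```r
# Hypothetical stand-in data: simulated only so that the sketch runs.
# It is NOT the real flights data and the coefficients mean nothing.
set.seed(1)
n <- 123
flights <- data.frame(
  capa = runif(n, 50, 400),
  nbrt = runif(n, 0, 5000),
  depf = runif(n, 1e8, 1e10),
  lcco = rbinom(n, 1, 0.3)
)
flights$freq <- 15000 + 77 * flights$nbrt + 3e-05 * flights$depf +
  rnorm(n, sd = 60000)

# Same predictors as flights_lm3, intercept kept (the default).
fit_with <- lm(freq ~ capa + nbrt + depf + lcco, data = flights)

# Drop the intercept: "- 1" forces the fitted plane through the origin.
fit_without <- update(fit_with, . ~ . - 1)

# Coefficient tables for both fits.
print(coef(summary(fit_with)))
print(coef(summary(fit_without)))

# summary() computes R-squared differently when there is no intercept
# (uncentered total sum of squares), so the two R-squared values are not
# directly comparable. Residual standard error and AIC are on the same
# footing for both models.
print(c(with = sigma(fit_with), without = sigma(fit_without)))
print(AIC(fit_with, fit_without))
```

With the real data, the same `update(flights_lm3, . ~ . - 1)` call would presumably give the no-intercept version of my last model.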