The model will output coefficients for every independent variable you provide. This does not mean that all of them significantly predict the outcome. Below, I generated three independent variables (x1, x2, x3). Then, I generated an outcome y to be predicted by only x1 and some error eps.
> set.seed(1839)
> x1 <- rnorm(100) # generating x1 data
> x2 <- rnorm(100) # generating x2 data
> x3 <- rnorm(100) # generating x3 data
> eps <- rnorm(100, 0, 4) # generating residuals
> y <- x1 + eps # creating the y data that is only predicted by x1 and error
> summary(lm(y ~ x1 + x2 + x3)) # running regression analysis
Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
     Min       1Q   Median       3Q      Max 
-12.4047  -2.4531   0.1129   2.3498   9.3450 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.34690    0.40100  -0.865   0.3892  
x1           1.05767    0.42671   2.479   0.0149 *
x2          -0.17768    0.41690  -0.426   0.6709  
x3          -0.07895    0.37309  -0.212   0.8329  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.937 on 96 degrees of freedom
Multiple R-squared:  0.06104,  Adjusted R-squared:  0.0317
F-statistic:  2.08 on 3 and 96 DF,  p-value: 0.1079
You can see that the model outputs coefficients for all of the variables, but only the coefficient for x1 is significantly different from zero. Note that x2 and x3 do not correlate with y:
> cor.test(x2, y)
Pearson's product-moment correlation
data: x2 and y
t = -0.29478, df = 98, p-value = 0.7688
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2248680 0.1676336
sample estimates:
cor
-0.02976456
> cor.test(x3, y)
Pearson's product-moment correlation
data: x3 and y
t = -0.054407, df = 98, p-value = 0.9567
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2016963 0.1911286
sample estimates:
cor
-0.005495866
You write:
I'm trying to figure out how a variable can have a coefficient and affect the dependent variable when there doesn't seem to be a correlation between the two.
The model returns a coefficient for each predictor because the model you specified allows every variable you include to have an influence on the DV. The point is that these predictors don't explain a significant amount of variance in the DV (i.e., their coefficients aren't significantly different from zero).
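As an aside, you don't have to read those p-values off the printed table. A quick sketch (fit is just a name I picked for the saved model object) pulls them out of the summary directly:
> fit <- lm(y ~ x1 + x2 + x3)              # same model as above, saved to an object
> summary(fit)$coefficients[, "Pr(>|t|)"]  # one p-value per coefficient, matching the printed table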
You can test whether x2 and x3 add anything by comparing a model with all of the IVs as predictors to a model with just the one significant predictor:
> mod.all <- lm(y ~ x1 + x2 + x3)
> mod.reduced <- lm(y ~ x1)
> anova(mod.all, mod.reduced)
Analysis of Variance Table

Model 1: y ~ x1 + x2 + x3
Model 2: y ~ x1
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     96 1488.2                           
2     98 1491.5 -2   -3.3013 0.1065 0.8991
So a model that includes only x1 predicts y just as well as a model that also includes x2 and x3. If you wish, you could therefore prune your model by dropping the non-significant predictors; whether that is appropriate depends on the goal of your analysis and on your research question.
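If you do decide to prune, one way to sketch it with base R (drop1() and update() are standard functions; mod.pruned is just an illustrative name) is to check the F-test for removing each predictor in turn and then refit without the ones that add nothing:
> drop1(mod.all, test = "F")                     # F-test for dropping each predictor from the full model
> mod.pruned <- update(mod.all, . ~ . - x2 - x3) # refit without the non-significant predictors
> summary(mod.pruned)                            # equivalent to lm(y ~ x1)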
In short: linear regression models will output a coefficient for each independent variable that you specify, regardless of significance. If a variable has no relationship with the dependent variable, its coefficient will simply be small relative to its standard error, which is why it is not significant.
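Another way to see this (again just a sketch, using base R's confint() on the full model fitted above) is to look at the 95% confidence intervals: for predictors with no real relationship to the DV, the interval should include zero.
> confint(mod.all)  # 95% CIs for each coefficient; those for x2 and x3 should include zero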