The model will output coefficients for every independent variable you provide. This does not mean that all of them significantly predict the outcome. Below, I generated three independent variables (x1, x2, x3). Then, I generated an outcome y to be predicted by only x1 and some error eps.
> set.seed(1839)
> x1 <- rnorm(100) # generating x1 data
> x2 <- rnorm(100) # generating x2 data
> x3 <- rnorm(100) # generating x3 data
> eps <- rnorm(100, 0, 4) # generating residuals
> y <- x1 + eps # creating the y data that is only predicted by x1 and error
> summary(lm(y ~ x1 + x2 + x3)) # running regression analysis
Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
     Min       1Q   Median       3Q      Max 
-12.4047  -2.4531   0.1129   2.3498   9.3450 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.34690    0.40100  -0.865   0.3892  
x1           1.05767    0.42671   2.479   0.0149 *
x2          -0.17768    0.41690  -0.426   0.6709  
x3          -0.07895    0.37309  -0.212   0.8329  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.937 on 96 degrees of freedom
Multiple R-squared:  0.06104,  Adjusted R-squared:  0.0317
F-statistic:  2.08 on 3 and 96 DF,  p-value: 0.1079
You can see that the model outputs coefficients for all of the variables, but only the coefficient for x1 is significantly different from zero. Note that x2 and x3 do not correlate with y:
> cor.test(x2, y)
Pearson's product-moment correlation
data: x2 and y
t = -0.29478, df = 98, p-value = 0.7688
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2248680 0.1676336
sample estimates:
cor
-0.02976456
> cor.test(x3, y)
Pearson's product-moment correlation
data: x3 and y
t = -0.054407, df = 98, p-value = 0.9567
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2016963 0.1911286
sample estimates:
cor
-0.005495866
You write:
I'm trying to figure out how a variable can have a coefficient and affect the dependent variable when there doesn't seem to be a correlation between the two.
The model returns a coefficient for each predictor because the model you specified allows every variable you include to have an influence on the DV. The point is that these predictors don't explain a significant amount of variance in the DV (i.e., their coefficients aren't significantly different from zero).
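As an aside, you don't have to read those p-values off the printed table. A quick sketch (fit is just a name I picked for the saved model object) pulls them out of the summary directly:
> fit <- lm(y ~ x1 + x2 + x3)              # same model as above, saved to an object
> summary(fit)$coefficients[, "Pr(>|t|)"]  # one p-value per coefficient, matching the printed table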
You can test whether x2 and x3 add anything by comparing a model with all of the IVs as predictors to a model with just the one significant predictor:
> mod.all <- lm(y ~ x1 + x2 + x3)
> mod.reduced <- lm(y ~ x1)
> anova(mod.all, mod.reduced)
Analysis of Variance Table

Model 1: y ~ x1 + x2 + x3
Model 2: y ~ x1
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     96 1488.2                           
2     98 1491.5 -2   -3.3013 0.1065 0.8991
So a model that includes only x1 predicts y just as well as a model that also includes x2 and x3. If you wish, you could therefore prune your model by dropping the non-significant predictors; whether that is appropriate depends on the goal of your analysis and on your research question.
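If you do decide to prune, one way to sketch it with base R (drop1() and update() are standard functions; mod.pruned is just an illustrative name) is to check the F-test for removing each predictor in turn and then refit without the ones that add nothing:
> drop1(mod.all, test = "F")                     # F-test for dropping each predictor from the full model
> mod.pruned <- update(mod.all, . ~ . - x2 - x3) # refit without the non-significant predictors
> summary(mod.pruned)                            # equivalent to lm(y ~ x1)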
In short: linear regression models will output a coefficient for each independent variable that you specify, regardless of significance. If a variable has no relationship with the dependent variable, its coefficient will simply be small relative to its standard error, which is why it is not significant.
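Another way to see this (again just a sketch, using base R's confint() on the full model fitted above) is to look at the 95% confidence intervals: for predictors with no real relationship to the DV, the interval should include zero.
> confint(mod.all)  # 95% CIs for each coefficient; those for x2 and x3 should include zero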