Coefficient with a high cor and low p-value in a high R² regression. How to interpret?

Question

This is my first time asking a question on Cross Validated.

I'm doing an analysis on the performance of marketing campaigns. I've done a few linear regressions with the dataset trying to explain as much variance as possible. This is my final model:

Call:
lm(formula = log(regular.volume.2) ~ ., data = dt.regular.2)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.64384 -0.10257 -0.00436  0.11570  0.52346 

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    1.365e+01  1.676e+00   8.150 4.17e-13 ***
regular.campaign.b.spend      -4.516e-07  3.515e-07  -1.285   0.2013    
regular.campaign.e.spend       4.904e-07  3.165e-07   1.550   0.1239    
regular.campaign.d.spend       2.568e-07  1.345e-07   1.908   0.0587 .  
regular.campaign.c.spend      -7.104e-08  3.584e-07  -0.198   0.8432    
regular.campaign.a.spend       3.853e-07  3.672e-07   1.049   0.2961    
regular.campaign.f.spend       6.002e-07  4.789e-07   1.253   0.2125    
regular.distribution.2        -2.349e-03  1.650e-02  -0.142   0.8870    
regular.display.2              3.250e-02  2.226e-01   0.146   0.8841    
regular.feature.and.display.2  1.439e-02  1.705e-03   8.437 9.04e-14 ***
regular.feature.2              2.869e-03  1.387e-03   2.069   0.0407 *  
regular.multibuy.2            -3.908e-04  1.308e-03  -0.299   0.7657    
regular.price.2               -1.055e+00  4.944e-02 -21.348  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1978 on 119 degrees of freedom
Multiple R-squared:  0.9642,    Adjusted R-squared:  0.9606 
F-statistic: 266.8 on 12 and 119 DF,  p-value: < 2.2e-16

It is my understanding that:

- I have reached a high R², which means I have explained most of the variance.
- A high "estimate" of the independent variable means that it is strongly correlated with the dependent variable.
- A high p-value means that the independent variable is not statistically significant.

So my question is: what can be said about the campaigns that have a high p-value?

For instance: what can I say about campaign a, considering that it is not statistically significant (high p-value)? Can I claim that it did not have a significant impact on sales, so it should be considered a failure?

tldr: How should I interpret, in a business sense, the effect of a campaign on sales (linear regression) if that campaign shows a high p-value in a regression in which most of the variance has been explained (96%)?

tldr:tldr: help me. I'm desperate.

score 1 · Answer 1 · answered Mar 23 '18 at 22:19

The first bullet is correct; the second bullet is not; the third bullet may (or may not) be correct.

For the second bullet, the estimate is a slope, not a correlation: it is the expected change in the dependent variable (here, log volume) per one-unit change in the predictor. Its size therefore depends on the units of both variables, so you can't compare the absolute sizes across predictors. I would recommend looking at the standardized coefficients to compare the overall effects of the independent variables with each other.
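As a rough illustration of standardized coefficients, here is a minimal R sketch on simulated data. The data frame `d` and the columns `spend`, `price`, and `log_volume` are made-up stand-ins, not the asker's dataset. It refits the model after scaling every column to mean 0 and standard deviation 1, and checks that this matches the shortcut of multiplying each raw slope by sd(x)/sd(y).

```r
# Simulated stand-in data; all names here are hypothetical.
set.seed(1)
n <- 200
d <- data.frame(
  spend = rnorm(n, mean = 5e5, sd = 1e5),  # large-scale predictor (currency units)
  price = rnorm(n, mean = 10, sd = 1)      # small-scale predictor
)
d$log_volume <- 12 + 4e-07 * d$spend - 1.0 * d$price + rnorm(n, sd = 0.2)

# Raw fit: the slope for spend is tiny only because spend is measured in large units.
raw <- lm(log_volume ~ spend + price, data = d)

# Refit after centering and scaling every column to mean 0, sd 1.
d_std <- as.data.frame(scale(d))
std <- lm(log_volume ~ spend + price, data = d_std)

# Equivalent shortcut: raw slope * sd(x) / sd(y).
shortcut <- coef(raw)[-1] * sapply(d[c("spend", "price")], sd) / sd(d$log_volume)

print(coef(raw))
print(coef(std))
print(shortcut)
```

The standardized slopes are on a common "standard deviations of y per standard deviation of x" scale, which makes them comparable in a way the raw estimates are not.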

For the third bullet, it is important to remember what the p-values in this table actually assess. Briefly, the null hypothesis tested in any single row is that adding this row's variable to a model that already contains all the other rows gives no additional gain in explaining the variance of the dependent variable. In other words, each row compares the model with all variables against the model with all variables but that one.
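This "full model versus model without that row" reading can be checked directly. The sketch below uses simulated data with hypothetical predictors `a`, `b`, and `price`. It compares the p-value in the coefficient table with the p-value from a partial F-test between the two nested models, and the two should agree.

```r
# Simulated stand-in data; predictor names are hypothetical.
set.seed(2)
n <- 132
d <- data.frame(a = rnorm(n), b = rnorm(n), price = rnorm(n))
d$y <- 1 + 0.05 * d$a + 0.5 * d$b - 1.0 * d$price + rnorm(n)

full    <- lm(y ~ a + b + price, data = d)
reduced <- lm(y ~ b + price, data = d)   # same model without `a`

# p-value for `a` as printed in the coefficient table of the full model
p_row <- summary(full)$coefficients["a", "Pr(>|t|)"]

# p-value from the partial F-test comparing the nested models
p_f <- anova(reduced, full)[2, "Pr(>F)"]

print(c(t_test = p_row, partial_F = p_f))
```

So a high p-value for one campaign says that, given everything else already in the model, the data cannot distinguish its coefficient from zero. It does not by itself show that the campaign had no effect.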

This means the variable may still be related to the dependent variable while its impact is obscured by other features of the multiple regression model, for example correlation with the other predictors.
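One way such obscuring can happen is when two predictors carry almost the same information. The following sketch is purely illustrative: `x1` and `x2` are simulated, and nothing here is known about the asker's campaigns. Both variables truly affect `y`, yet the standard error of `x1` is much larger once its near-duplicate `x2` is also in the model.

```r
# Simulated example; x1 and x2 are hypothetical, strongly correlated predictors.
set.seed(3)
n  <- 132
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly a copy of x1
y  <- 1 + x1 + x2 + rnorm(n)    # both predictors truly matter

alone <- lm(y ~ x1)
joint <- lm(y ~ x1 + x2)

se_alone <- summary(alone)$coefficients["x1", "Std. Error"]
se_joint <- summary(joint)$coefficients["x1", "Std. Error"]

print(cor(x1, x2))
print(c(se_alone = se_alone, se_joint = se_joint))
print(summary(joint)$coefficients)
```

Checking the correlations among the spend variables in the real data would show whether something similar could be inflating the standard errors there.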

Not sure this helps answer your query, but I hope it is useful.

score 0 · Answer 2 · answered Mar 24 '18 at 01:40

I just wanted to add to Gregg H's comment:

Why is this your final regression model? There are still a lot of predictors in there that are not significant (the ones that don't have stars by the p-values). For example, regular.campaign.a.spend is not a significant predictor. You should consider trying stepwise regression (described well at https://people.duke.edu/~rnau/regstep.htm).

You can use the leaps package to step through different combinations of variables to figure out which ones to use as predictors; see the question "Stepwise regression in R - How does it work?".
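A minimal sketch of what that could look like, assuming the leaps package is installed and using simulated data with hypothetical column names (`camp_a`, `camp_b`, `camp_c`, `price`). `regsubsets()` searches over subsets of predictors, and its summary reports which variables are in the best model of each size along with the adjusted R².

```r
library(leaps)  # assumes the leaps package is installed

# Simulated stand-in data; column names are hypothetical.
set.seed(4)
n <- 132
d <- data.frame(
  camp_a = rnorm(n), camp_b = rnorm(n),
  camp_c = rnorm(n), price  = rnorm(n)
)
d$y <- 1 + 0.5 * d$camp_b - 1.0 * d$price + rnorm(n)

# Best subset of each size, from 1 up to 4 predictors.
subsets <- regsubsets(y ~ ., data = d, nvmax = 4)
s <- summary(subsets)

print(s$which)   # logical matrix: which predictors enter at each size
print(s$adjr2)   # adjusted R² of the best model of each size

# Predictors in the subset with the highest adjusted R².
best_size <- which.max(s$adjr2)
print(names(which(s$which[best_size, ])))
```

Note that p-values from a model chosen this way tend to look better than they should, because the same data were used both to select the variables and to test them.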

Good luck :)

Hi, thank you for your answer. While stepwise is a great tool for picking predictors while maximizing R², it does not help me because I can't drop predictors. For instance, if I drop campaign c (which has a very high p-value), I won't be able to measure its impact on volume sales (coefficients). - Danilo Araujo, Mar 24 '18 at 09:38
