We are trying to understand the impact of the number of workdays on sales.

Please find reprex below:

library(tidyverse)

# Work days for January from 2010 - 2018
data = data.frame(work_days = c(20,21,22,20,20,22,21,21),
           sale = c(1205,2111,2452,2054,2440,1212,1211,2111))

# Apply linear regression
model = lm(sale ~ work_days, data)

summary(model)
Call:
lm(formula = sale ~ work_days, data = data)

Residuals:
   Min     1Q Median     3Q    Max 
-677.8 -604.5  218.7  339.0  645.3 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2643.82    5614.16   0.471    0.654
work_days     -38.05     268.75  -0.142    0.892

Residual standard error: 593.4 on 6 degrees of freedom
Multiple R-squared:  0.00333,   Adjusted R-squared:  -0.1628 
F-statistic: 0.02005 on 1 and 6 DF,  p-value: 0.892

Could you please help me understand the coefficients? Does every workday decrease sales by 38.05?


data = data.frame(work_days = c(20,21,22,20,20,22,21,21),
           sale = c(1212,1211,2111,1205,2111,2452,2054,2440))

model = lm(sale ~ work_days, data)

summary(model)
Call:
lm(formula = sale ~ work_days, data = data)

Residuals:
   Min     1Q Median     3Q    Max 
-686.8 -301.0   -8.6  261.3  599.7 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -6220.0     4555.9  -1.365    0.221
work_days      386.6      218.1   1.772    0.127

Residual standard error: 481.5 on 6 degrees of freedom
Multiple R-squared:  0.3437,    Adjusted R-squared:  0.2343 
F-statistic: 3.142 on 1 and 6 DF,  p-value: 0.1267

Does this mean that every workday increases sales by 387? And how should I interpret the negative intercept?

I found similar questions but couldn't apply what I learned from them:

Interpreting regression coefficients in R

Interpreting coefficients from Logistic Regression from R

Linear combination of regression coefficients in R

  • The F-statistics of both your models suggest that the distribution of sales is not conditional on workdays. In the second dataset, the p-value is 0.1267, so your model is significant only at about the 87% confidence level. – Dayne Sep 19 '19 at 09:29
  • Why did you repost this question? You got your answer two days ago: https://stackoverflow.com/a/57957391/1412059 – Roland Sep 19 '19 at 10:12
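
The p-value quoted in the first comment can be recovered from the F-statistic of the second model. The following is a minimal sketch, assuming the second dataset from the question; the variable names `f`, `p_value`, and `t_slope` are illustrative and not from the original post. With a single predictor, the F-statistic is just the square of the slope's t value, so both tests give the same p-value.

```r
# Second dataset from the question
data <- data.frame(
  work_days = c(20, 21, 22, 20, 20, 22, 21, 21),
  sale      = c(1212, 1211, 2111, 1205, 2111, 2452, 2054, 2440)
)
model <- lm(sale ~ work_days, data = data)

# summary()$fstatistic is a named vector: value, numdf, dendf
f <- summary(model)$fstatistic

# Upper-tail probability of the F distribution gives the overall p-value
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# t value of the slope; its square equals the F-statistic in simple regression
t_slope <- summary(model)$coefficients["work_days", "t value"]

print(p_value)
print(c(F = f[["value"]], t_squared = t_slope^2))
```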

1 Answer

Both of your interpretations are correct.

The intercept is the fitted value if all predictors have a value of zero. So in your second model, zero workdays would imply sales of -6220, which illustrates why you can only interpret models over the actually observed range of the predictors. I assume none of your observations come with zero workdays.
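
To make the point about the intercept concrete, here is a minimal sketch using the second dataset from the question. The names `observed_range`, `pred_zero`, and `pred_in_range` are illustrative. It shows that predicting at zero workdays simply returns the intercept, which lies far outside the observed range of 20 to 22 workdays, while predictions inside that range are ordinary positive sales figures.

```r
# Second dataset from the question
data <- data.frame(
  work_days = c(20, 21, 22, 20, 20, 22, 21, 21),
  sale      = c(1212, 1211, 2111, 1205, 2111, 2452, 2054, 2440)
)
model <- lm(sale ~ work_days, data = data)

# Range of the predictor that was actually observed
observed_range <- range(data$work_days)

# Prediction at zero workdays reproduces the intercept (an extrapolation)
pred_zero <- unname(predict(model, newdata = data.frame(work_days = 0)))

# Predictions inside the observed range are the ones worth interpreting
pred_in_range <- predict(model, newdata = data.frame(work_days = 20:22))

print(observed_range)
print(pred_zero)
print(pred_in_range)
```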

Stephan Kolassa
  • Thanks for the response. However, I was wondering whether my interpretation really makes sense, because the p-value is not less than .05 and the sample size is just 8 points. Additionally, the R-squared implies that the model can't explain even 50% of the variation. – Abhishek Sep 19 '19 at 08:53
  • $p>.05$ means that the *true* coefficient could easily be zero, or even have the opposite sign. (After all, your coefficient is just an *estimate* of the true value based on your particular sample.) A sample size of 8 is indeed small, and it contributes to a large $p$ value. All of which does not change the *interpretation* of your model and just means that you should treat your results with caution. $R^2<0.50$ just means that your observations are not explained very well, which is a common occurrence, see [this question](https://stats.stackexchange.com/q/414349/1352). – Stephan Kolassa Sep 19 '19 at 09:06
  • If this were a real data set, I would conclude that there is insufficient evidence for an impact of number of workdays on sales. In interpreting coefficients, just remember the coefficients are just part of the equation for a line: predicted sales = -6220 + 386.6*work_days – Jdub Sep 19 '19 at 15:37
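
The two points made in the comments above (the true slope could plausibly be zero or negative, and the coefficients are just the equation of a line) can be checked directly. This is a minimal sketch on the second dataset from the question; `slope_ci`, `includes_zero`, `manual_21`, and `fitted_21` are illustrative names, and 21 workdays is an arbitrary value chosen from within the observed range.

```r
# Second dataset from the question
data <- data.frame(
  work_days = c(20, 21, 22, 20, 20, 22, 21, 21),
  sale      = c(1212, 1211, 2111, 1205, 2111, 2452, 2054, 2440)
)
model <- lm(sale ~ work_days, data = data)

# 95% confidence interval for the slope
slope_ci <- confint(model, level = 0.95)["work_days", ]

# TRUE if zero lies inside the interval, i.e. the sign of the slope is uncertain
includes_zero <- slope_ci[[1]] < 0 && slope_ci[[2]] > 0

# The fitted line written out by hand: intercept + slope * work_days
b <- coef(model)
manual_21 <- b[["(Intercept)"]] + b[["work_days"]] * 21

# The same prediction via predict()
fitted_21 <- unname(predict(model, newdata = data.frame(work_days = 21)))

print(slope_ci)
print(includes_zero)
print(c(manual = manual_21, predict = fitted_21))
```

An interval that spans zero is consistent with the p-value of 0.127 reported in the summary output: the point estimate of 386.6 is the best single guess, but the data do not rule out no effect at all.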