I fitted both a linear and a log-linear regression to predict the price of a smartphone from its characteristics. I now have a question about how the coefficients compare between the two models.
In the linear regression model, the coefficient of the dummy variable GPS (included or not) is 47.7. This means that a smartphone with built-in GPS costs on average 47.7 euros more than one without, holding the other variables in the model constant.
lm <- lm(Price ~ ., data=data_price2)
summary(lm)
Call:
lm(formula = Price ~ ., data = data_price2)
Residuals:
    Min      1Q  Median      3Q     Max
-702.43  -46.68   -6.49   37.59 1522.53
Coefficients: (38 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         44.62802   70.21355   0.636 0.525128
Screensize          -6.78973    7.14553  -0.950 0.342155
Multitouch          11.20542   12.62356   0.888 0.374861
nbrCores            14.58104    2.67044   5.460 5.53e-08 ***
Processorspeed      46.84652    9.54521   4.908 1.02e-06 ***
Memory             -24.12829    6.02706  -4.003 6.54e-05 ***
nbrSims             -9.23095    8.00187  -1.154 0.248842
CameraBack           3.10923    0.62724   4.957 7.94e-07 ***
CameraFront         10.69124    2.45340   4.358 1.40e-05 ***
Autofocus          -20.51415    9.40548  -2.181 0.029326 *
Flitsertype         10.63140    7.10996   1.495 0.135043
5-GHzOndersteuning        NA         NA      NA       NA
GPS                 47.68043   11.81778   4.035 5.73e-05 ***
....
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 102.3 on 1556 degrees of freedom
Multiple R-squared: 0.7766, Adjusted R-squared: 0.7613
F-statistic: 51.02 on 106 and 1556 DF, p-value: < 2.2e-16
Next, in the log-linear regression model, the coefficient for the GPS variable is 2.249e-02, which means that the smartphone retail price increases by about 2.27% (exp(0.02249) − 1 ≈ 0.0227) when GPS is included, holding the other variables in the model constant.
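As a quick check of that arithmetic, the percentage effect implied by a dummy coefficient in a log-linear model can be computed directly. This is only a minimal sketch using the GPS estimate copied from the output; the names b_gps and pct_effect are mine and not part of the fitted model.

```r
# GPS coefficient copied from the log-linear model output
b_gps <- 2.249e-02

# Percentage change in price when the GPS dummy goes from 0 to 1
pct_effect <- (exp(b_gps) - 1) * 100

round(pct_effect, 2)  # about 2.27
```

For a coefficient this small, exp(b) − 1 is close to b itself, so the effect is only slightly above 2.249%.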
lm3 <- lm(log(Price) ~ ., data = data_price2 )
summary(lm3)
Call:
lm(formula = log(Price) ~ ., data = data_price2)
Residuals:
    Min      1Q  Median      3Q     Max
-2.3367 -0.1964 -0.0008  0.1896  3.1645
Coefficients: (38 not defined because of singularities)
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         3.268e+00  2.598e-01  12.575  < 2e-16 ***
Screensize          4.878e-02  2.644e-02   1.845 0.065255 .
Multitouch          2.155e-02  4.672e-02   0.461 0.644685
nbrCores            5.670e-02  9.883e-03   5.737 1.16e-08 ***
Processorspeed      7.306e-02  3.533e-02   2.068 0.038787 *
Memory              8.273e-03  2.231e-02   0.371 0.710761
nbrSims            -3.488e-02  2.961e-02  -1.178 0.239022
CameraBack          9.779e-03  2.321e-03   4.213 2.67e-05 ***
CameraFront         5.348e-02  9.080e-03   5.890 4.73e-09 ***
Autofocus           1.061e-02  3.481e-02   0.305 0.760654
Flitsertype         1.080e-01  2.631e-02   4.105 4.26e-05 ***
5-GHzOndersupport          NA         NA      NA       NA
GPS                 2.249e-02  4.374e-02   0.514 0.607221
....
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3785 on 1556 degrees of freedom
Multiple R-squared: 0.7974, Adjusted R-squared: 0.7835
F-statistic: 57.76 on 106 and 1556 DF, p-value: < 2.2e-16
The average price of a smartphone in my data is 232€. In the log-linear model the GPS effect is exp(0.02249) − 1 ≈ 2.27%, and 2.27% of 232€ is only about 5.28€. Why is this value so different from the 47.7€ obtained from the linear regression model?
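To make the comparison concrete, here is a small sketch that converts the log-linear GPS effect into euros at the average price and puts it next to the linear estimate. All numbers are copied from the question and the two outputs; the variable names are my own, and evaluating the effect at the mean price is just one convenient choice, since the log-linear model implies a euro effect that scales with the price level.

```r
mean_price <- 232        # average price in euros, as stated above
b_gps_log  <- 2.249e-02  # GPS coefficient, log-linear model
b_gps_lin  <- 47.68043   # GPS coefficient, linear model

# Euro effect implied by the log-linear model at the average price
eur_log <- (exp(b_gps_log) - 1) * mean_price

c(log_linear = round(eur_log, 2), linear = round(b_gps_lin, 2))
```

This gives roughly 5.28€ for the log-linear model against 47.68€ for the linear one, which is the gap I am asking about.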