I am using the R "forecast" package to predict churn, with external variables included as regressors.
However, the results are a bit confusing. I would expect that when more titles are introduced, fewer people will leave. Yet the coefficient for titles_live is negative, whereas the one for toptitles_new is positive.
In the correlation analysis, both variables are negatively correlated with churn:
1- Correlation
> cor(churn_rate, titles_live)
[1] -0.6511904
> cor(churn_rate, toptitles_new)
[1] -0.3265537
In a simple linear model, however, titles_live has a negative relationship with churn rate, while toptitles_new is positively associated with it.
2- Simple Linear Model
summary(lm(churn_rate ~ titles_live + toptitles_new,
           data = in_out_p_month))
Call:
lm(formula = churn_rate ~ titles_live + toptitles_new,
data = in_out_p_month)
Residuals:
    Min      1Q  Median      3Q     Max
-6.7563 -1.5096  0.1252  1.7473  9.3720
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    32.29875    3.32502   9.714 3.87e-10 ***
titles_live    -0.05407    0.01428  -3.787 0.000813 ***
toptitles_new   0.03337    0.37485   0.089 0.929750
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.684 on 26 degrees of freedom
Multiple R-squared: 0.4242, Adjusted R-squared: 0.3799
F-statistic: 9.578 on 2 and 26 DF, p-value: 0.0007644
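One thing that may be worth checking here is whether the two regressors are correlated with each other. When they are, a predictor's marginal correlation with the response and its coefficient in a multiple regression can have opposite signs. The following is a minimal sketch with simulated data; the names `x1`, `x2`, `y` and all numbers are hypothetical and are not taken from my data set.

```r
# Simulated illustration only: x1, x2, y and every number below are made up.
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.3)  # x2 is strongly correlated with x1
y  <- -2 * x1 + 1 * x2 + rnorm(n)    # true partial effect of x2 is positive

# Marginal association: x2 on its own looks negatively related to y,
# because it mostly tracks x1, whose effect is strongly negative.
marginal_cor <- cor(y, x2)

# Partial association: effect of x2 with x1 held fixed.
fit          <- lm(y ~ x1 + x2)
partial_coef <- coef(fit)[["x2"]]

print(c(marginal_cor = marginal_cor, partial_coef = partial_coef))
```

If `cor(titles_live, toptitles_new)` turns out to be high in my data, the same effect might explain the positive toptitles_new coefficient. Its standard error (0.37485) is also much larger than the estimate (0.03337), so the sign may simply not be determined by the data.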
3- Forecasting with auto.arima
When I include these two variables as external regressors in my ARIMA model, the predicted churn is quite close to the actual values, even though one of the regressors is not significantly associated with churn.
The direction of each relationship is the same as in the linear regression. As far as I understand, the regression coefficients in a dynamic regression can be interpreted. Can someone explain what could cause this odd relationship? Also, could you please help me interpret the model output?
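For readers who want to reproduce the general setup, here is a rough sketch of a regression with AR(2) errors fitted to simulated data. It is not my actual code: my model came from `auto.arima()` with the two variables passed as `xreg`, whereas this sketch uses base R's `arima()` with a fixed order so that it runs without the forecast package. The simulated series, the sample size and the "true" coefficients are all assumptions chosen for illustration.

```r
# Simulated stand-in data; none of these values come from the real data set.
set.seed(42)
n    <- 120
xreg <- cbind(titles_live   = rnorm(n, mean = 200, sd = 30),
              toptitles_new = rnorm(n, mean = 5,   sd = 2))

# AR(2) errors, then a response that depends linearly on both regressors.
err <- arima.sim(model = list(ar = c(0.8, -0.5)), n = n)
y   <- ts(30 - 0.05 * xreg[, "titles_live"] +
               0.3  * xreg[, "toptitles_new"] + as.numeric(err))

# Regression with ARIMA(2,0,0) errors: the xreg coefficients describe the
# linear effect of each regressor, the AR terms model the leftover
# autocorrelation in the regression errors.
fit <- arima(y, order = c(2, 0, 0), xreg = xreg)
print(coef(fit))
```

The coefficient names returned by `coef(fit)` follow the same layout as the output shown in this post: the AR terms first, then the intercept, then one coefficient per regressor column.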
summary(arima_model_churn_rate)
Series: ts_churn_rate_train
Regression with ARIMA(2,0,0) errors
Coefficients:
         ar1      ar2  intercept  titles_live  toptitles_new
      0.8090  -0.5021    32.5879      -0.0573         0.3096
s.e.  0.1742   0.1833     4.5682       0.0190         0.3121
sigma^2 estimated as 9.454: log likelihood=-58.67
AIC=129.34 AICc=134.28 BIC=136.41
Training set error measures:
                      ME     RMSE      MAE       MPE     MAPE      MASE        ACF1
Training set -0.05194746 2.735777 2.347408 -2.053318 12.17807 0.3342007 -0.06623817
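Since this output does not include p-values, a rough way to judge each coefficient is to divide the estimate by its standard error. The sketch below does that with the numbers printed above. The p-values use a normal approximation, which is my assumption and is only a crude guide with so few observations.

```r
# Estimates and standard errors copied from the model output above.
est <- c(ar1 = 0.8090, ar2 = -0.5021, intercept = 32.5879,
         titles_live = -0.0573, toptitles_new = 0.3096)
se  <- c(ar1 = 0.1742, ar2 = 0.1833, intercept = 4.5682,
         titles_live = 0.0190, toptitles_new = 0.3121)

z <- est / se               # approximate z ratios
p <- 2 * pnorm(-abs(z))     # two-sided p-values, normal approximation

print(round(cbind(estimate = est, s.e. = se, z = z, p = p), 4))
```

For titles_live the ratio is about -3.0 (-0.0573 / 0.0190), while for toptitles_new it is about 0.99 (0.3096 / 0.3121), so the second coefficient is within roughly one standard error of zero.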
Many thanks in advance!