
Right now I'm doing survival analysis for marketing purposes and I'm picking the covariates.

One variable that has a lot of impact on churn is the age band of the customer at the time of the first interaction. Younger customers churn faster.

Another variable that seems to have an impact is the client's "persona", i.e. the segment the client has been classified into. One particular segment churns faster than the others, but I suspect this may be because most of the clients in this segment are young.

These two covariates are therefore correlated. To find out whether the persona effect is really just an age effect, I've fitted a Cox regression with both variables.

The global p-value is reported as 0, and the above-mentioned persona still churns faster than the baseline (p-value 4.4e-06). However, the age band factor is also significant: the 20-29 age band churns faster than the baseline (customers aged 16-20), with an even smaller p-value (< 2e-16).

Can I discard the hypothesis that the segment churns faster than the baseline?

Ferdi
Negarev

1 Answer


The p-values in your question are for tests against the null hypothesis that the parameter in question (the $\beta$s for the dummy variables for the persona value and the 20-29 age band) is equal to 0. The small p-values indicate that (depending on your threshold for statistical significance) you can infer that those parameters are not equal to 0.
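A minimal sketch of the test behind those per-coefficient p-values, assuming (as in R's `summary()` output for a `coxph` fit) that each one is a Wald z-test: $z = \hat\beta / \mathrm{SE}(\hat\beta)$ compared with a standard normal. The function name `wald_test` and the numbers in the usage lines are hypothetical, not taken from the question.

```python
import math


def wald_test(coef, se):
    """Two-sided Wald test of H0: beta = 0 for a single Cox coefficient."""
    z = coef / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


if __name__ == "__main__":
    # Hypothetical coefficient and standard error, not the asker's values
    z, p = wald_test(0.35, 0.076)
    print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```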

In particular, you can infer that the $\beta$ for the persona is not equal to 0.

If you want to infer that the $\beta$ is greater than 0, then you need a one-sided test: divide the two-sided p-value by 2, since you're only asking whether the test statistic falls in the right tail (for the persona, 4.4e-06 becomes 2.2e-06). In your case, (depending on your threshold for statistical significance) you can infer that the $\beta$ for the persona is greater than 0.$^*$ The other p-values aren't relevant, because they're for different tests: the "global p-value" is roughly for a test against the null that all of the $\beta$s are equal to 0, and the p-value for the age band is for a test against the null that the $\beta$ for the dummy for that age band is equal to 0.
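The two-sided to one-sided conversion can be sketched as follows, assuming a symmetric test statistic such as the Wald z. `one_sided_p` is a hypothetical helper; with the persona's reported p-value of 4.4e-06 and a positive estimate, it returns half of that value.

```python
def one_sided_p(p_two_sided, estimate_positive):
    """One-sided p-value for H1: beta > 0, derived from a two-sided Wald p-value.

    Only valid for a symmetric statistic such as the Wald z.
    """
    if estimate_positive:
        # The observed z lies in the right tail, so only half the two-sided mass counts
        return p_two_sided / 2.0
    # The estimate points the other way, so the evidence for beta > 0 is weak
    return 1.0 - p_two_sided / 2.0


if __name__ == "__main__":
    print(one_sided_p(4.4e-06, estimate_positive=True))  # 2.2e-06
```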

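$^*$ This assumes that the estimate points in the direction you're testing, i.e. that the persona has a higher hazard than the baseline. Depending on the software, the summary of a Cox PH model may report the "raw" coefficient $\hat\beta$, the exponentiated coefficient (the hazard ratio) $\exp(\hat\beta)$, or both. For a raw value, this assumes $\hat\beta > 0$; for an exponentiated value, it assumes $\exp(\hat\beta) > 1$, which is the same condition.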
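To see the raw and exponentiated conventions side by side, here is a sketch that turns a raw coefficient into a hazard ratio with an approximate 95% Wald interval, $\exp(\hat\beta \pm 1.96 \cdot \mathrm{SE})$. The helper name `hazard_ratio` and the input values are hypothetical. Because $\exp$ is strictly increasing, a coefficient above 0 always maps to a hazard ratio above 1.

```python
import math


def hazard_ratio(coef, se, z_crit=1.96):
    """Convert a raw Cox coefficient into a hazard ratio with an approximate 95% Wald interval."""
    hr = math.exp(coef)
    lower = math.exp(coef - z_crit * se)
    upper = math.exp(coef + z_crit * se)
    return hr, lower, upper


if __name__ == "__main__":
    # Hypothetical values: a positive coef gives HR > 1, a negative coef gives HR < 1
    for coef in (0.35, -0.35):
        hr, lo, hi = hazard_ratio(coef, 0.076)
        print(f"coef = {coef:+.2f} -> HR = {hr:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```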

Dan Hicks
  • Thanks a lot for your answer! Is there a way to compare these two betas to find out whether the persona is only mimicking the age band? I'm still doubtful that the persona is significant – Negarev Dec 06 '17 at 04:12
  • When you include both variables in the same regression model, you can think of the estimate for one as "controlling for" the effects of the other. So $\beta_{\text{persona}}$ is the effect of persona controlling for the effect of age band. – Dan Hicks Dec 06 '17 at 04:16
  • Great, that answers my question! Thank you so much! – Negarev Dec 06 '17 at 04:19
  • I also have another [question](https://stats.stackexchange.com/questions/317336/interpreting-r-coxph-cox-zph) on Cox PH; maybe you can help with that one as well? – Negarev Dec 06 '17 at 04:26
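To illustrate the "controlling for" point from the comments, here is a small simulation on made-up data (not the asker's): churn depends only on age band, but persona is strongly correlated with age band. It fits a Cox model by Newton-Raphson on the partial likelihood, hand-rolled in standard-library Python under the simplifying assumptions that every customer churns (no censoring) and there are no tied times. In practice you would use a library such as R's `survival::coxph` or Python's `lifelines`. The names `simulate`, `solve` and `cox_fit` are hypothetical. Fitting persona alone should give a clearly positive coefficient, while adding the age band should pull the persona coefficient close to 0.

```python
import math
import random


def simulate(n, seed=0):
    """Made-up data: churn depends on age band only; persona is correlated with age band."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        young = 1 if rng.random() < 0.5 else 0
        # Persona is much more common among young customers
        persona = 1 if rng.random() < (0.8 if young else 0.2) else 0
        hazard = 2.0 if young else 1.0  # persona has no effect of its own
        rows.append((rng.expovariate(hazard), young, persona))
    return rows


def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def cox_fit(times, X, iters=50, tol=1e-10):
    """Newton-Raphson on the Cox partial likelihood (no censoring, no tied times)."""
    n, p = len(times), len(X[0])
    # Latest time first: walking in this order, the running sums cover each risk set
    order = sorted(range(n), key=lambda i: times[i], reverse=True)
    beta = [0.0] * p
    for _ in range(iters):
        s0 = 0.0
        s1 = [0.0] * p
        s2 = [[0.0] * p for _ in range(p)]
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for i in order:
            x = X[i]
            w = math.exp(sum(bk * xk for bk, xk in zip(beta, x)))
            s0 += w
            for j in range(p):
                s1[j] += w * x[j]
                for k in range(p):
                    s2[j][k] += w * x[j] * x[k]
            # Subject i churns at times[i]; its risk set is everyone with time >= times[i]
            for j in range(p):
                mean_j = s1[j] / s0
                score[j] += x[j] - mean_j
                for k in range(p):
                    info[j][k] += s2[j][k] / s0 - mean_j * (s1[k] / s0)
        # Newton step: beta += information^-1 @ score
        step = solve(info, score)
        beta = [bk + sk for bk, sk in zip(beta, step)]
        if max(abs(sk) for sk in step) < tol:
            break
    return beta


if __name__ == "__main__":
    rows = simulate(4000, seed=1)
    times = [t for t, _, _ in rows]
    crude = cox_fit(times, [[persona] for _, _, persona in rows])
    adjusted = cox_fit(times, [[young, persona] for _, young, persona in rows])
    print("persona only:       beta_persona = %.3f" % crude[0])
    print("persona + age band: beta_young = %.3f, beta_persona = %.3f" % tuple(adjusted))
```

With the age band in the model, the persona coefficient measures only what persona adds beyond age, which is the comparison the asker was after.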