I am trying to predict whether a client will take up a promotion using a random forest model. The variable importance plot shows that the number of days since the day of contact (which I binned into 5-day intervals) is the most important variable in the model.

However, when I test this variable with a chi-square test, the p-value is greater than 0.05. How do I reconcile these two results? Or is there a better way to test whether the number of days is related to whether a client takes up the promotion?

kjetil b halvorsen
AGZH12

1 Answer

You lost information by binning; see "Why should binning be avoided at all costs?". As an alternative, use logistic regression and model the day-of-contact variable with a spline. For details, see "Logistic Regression with regression splines in R" or "Using splines to address non-linearity in logistic regression".
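
To make the spline suggestion concrete, here is a minimal sketch in Python (the linked posts use R). It fits a logistic regression on the unbinned number of days through a cubic spline basis and compares it with an intercept-only model using a likelihood-ratio test. Everything specific is an assumption for illustration only: the data are simulated, the knots at 15, 30 and 45 days are arbitrary, the truncated-power basis stands in for whatever spline basis your software provides, and the names `spline_basis`, `fit_logistic` and `spline_lr_test` are hypothetical helpers, not part of any library.

```python
import numpy as np
from scipy.stats import chi2


def spline_basis(x, knots):
    """Truncated-power cubic spline basis (no intercept column)."""
    x = np.asarray(x, dtype=float)
    cols = [x, x**2, x**3] + [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def fit_logistic(X, y, n_iter=100, tol=1e-6):
    """Fit a logistic regression by Newton-Raphson; return (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                      # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return beta, loglik


def spline_lr_test(days, took_up, knots):
    """Likelihood-ratio test of a spline in `days` against an intercept-only model."""
    x = np.asarray(days, dtype=float)
    y = np.asarray(took_up, dtype=float)
    lo, hi = x.min(), x.max()
    xs = (x - lo) / (hi - lo)                     # rescale to [0, 1] for numerical stability
    ks = [(k - lo) / (hi - lo) for k in knots]
    ones = np.ones((len(xs), 1))
    X_full = np.hstack([ones, spline_basis(xs, ks)])
    _, ll_null = fit_logistic(ones, y)
    _, ll_full = fit_logistic(X_full, y)
    stat = 2.0 * (ll_full - ll_null)
    df = X_full.shape[1] - 1                      # number of spline terms being tested
    return stat, df, chi2.sf(stat, df)


# Simulated example data (an assumption, not the asker's data):
# the take-up probability varies smoothly and non-monotonically with days.
rng = np.random.default_rng(0)
days = rng.uniform(0, 60, size=2000)
true_logit = -0.5 + 1.5 * np.sin(days / 10.0)
took_up = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

stat, df, p_value = spline_lr_test(days, took_up, knots=[15, 30, 45])
print(f"LR statistic = {stat:.1f}, df = {df}, p-value = {p_value:.3g}")
```

The test uses the raw number of days rather than 5-day bins, so no information is discarded before testing. With real data you would add the other covariates to both the null and the full model, so the test asks whether days matters after adjusting for them.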

kjetil b halvorsen
  • Thanks for the reply. I tried logistic regression before, but its accuracy was not as good as the random forest model's. I would just like to check whether another hypothesis test would be suitable here. – AGZH12 Apr 16 '21 at 14:23
  • Then please edit the post to give some more info, like other covariables, sample size, show us a plot, ... – kjetil b halvorsen Apr 16 '21 at 15:54