My dataset is about clients in arrears and contains the following two columns: cured, which takes the value 1 when the client cured and 0 otherwise, and self_cured, which takes the value 1 when the client cured by themselves and 0 when they cured after being contacted. The dataset also includes other behaviour and application variables.

I'm interested in studying the likelihood that a client in arrears self-cures. As you can see, there's a conditionality in my data: the self_cured variable is only relevant when the client cured (that is, when cured is 1).

To account for this, I thought I could do what I call a "two-phase regression", which would look like this:

reg 1: cured ~ application_vars + behaviour_vars + u 

which gives the probability of cure, i.e. it identifies the clients who "will" cure

reg 2: self_cured ~  application_vars + behaviour_vars + s 

Here reg 1 would be a logistic regression run on the entire dataset, and reg 2 would be run only on the clients that the first regression found to be cured.
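To make the idea concrete, here is a minimal sketch of the two regressions in plain Python. It rests on several assumptions that are not part of my real setup: the data are synthetic, the two features `x[0]` and `x[1]` are hypothetical stand-ins for application_vars and behaviour_vars, reg 2 is also taken to be a logistic regression, and reg 2 is fitted on the clients observed as cured (cured == 1), which is only one possible reading of "found cured from the first regression". The small gradient-descent fitter is there only to keep the example self-contained; a statistics package would normally be used instead.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit a logistic regression by batch gradient descent.

    Returns the weights as [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x):
    # Predicted probability that the response equals 1 for feature vector x.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Synthetic data (assumption): self_cured is only defined for cured clients.
random.seed(0)
rows = []
for _ in range(500):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    cured = 1 if random.random() < sigmoid(0.5 + 1.0 * x[0]) else 0
    self_cured = None
    if cured:
        self_cured = 1 if random.random() < sigmoid(-0.2 + 0.8 * x[1]) else 0
    rows.append((x, cured, self_cured))

# reg 1: cured ~ features, fitted on the entire dataset.
w1 = fit_logistic([r[0] for r in rows], [r[1] for r in rows])

# reg 2: self_cured ~ features, fitted only on clients observed as cured.
cured_rows = [r for r in rows if r[1] == 1]
w2 = fit_logistic([r[0] for r in cured_rows], [r[2] for r in cured_rows])

# Combine the two stages for a hypothetical new client.
new_client = [0.3, -0.1]
p_cure = predict_proba(w1, new_client)             # P(cured = 1)
p_self_given_cure = predict_proba(w2, new_client)  # P(self_cured = 1 | cured = 1)
p_self_cure = p_cure * p_self_given_cure           # P(cured = 1 and self_cured = 1)

print(f"P(cure) = {p_cure:.3f}")
print(f"P(self cure | cure) = {p_self_given_cure:.3f}")
print(f"P(self cure) = {p_self_cure:.3f}")
```

In this sketch the second model only ever describes clients who cured, so the overall probability of a self cure is the product of the two predicted probabilities.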

My question is: is there a name for this kind of approach? Do you have any suggestions about it?

  • Does this answer your question? [How do you deal with "nested" variables in a regression model?](https://stats.stackexchange.com/questions/372257/how-do-you-deal-with-nested-variables-in-a-regression-model) – kjetil b halvorsen Aug 02 '20 at 18:50
  • @kjetilbhalvorsen not really! That deals with nested variables affecting explanatory variables. In this case it would be with response variables – Aug 02 '20 at 19:15
  • So your *cured* variable is the response, and you are using logistic regression? If so, please add that explicitly to your post. – kjetil b halvorsen Aug 02 '20 at 19:18
  • Maybe you could model this as a *graphical model* or *Bayesian network*, for example see https://stats.stackexchange.com/questions/313955/how-to-train-a-bayesian-network-with-bernoulli-switch-variable – kjetil b halvorsen Aug 21 '20 at 03:48
