AB testing: test for nested variables

Question

Sorry if my title is not clear; I'm very new to the field of randomized experiments.

I will start right off the bat with an example, hoping this clarifies things.

Let's say I am a bank, which is looking into implementing a new KYC method for its loan underwriting process.

I am mainly concerned about two things: how many customers I get (conversion rate), and how many of them will default on the loan I give them (default rate).

I assume that my new KYC process will of course decrease the conversion rate (fraudsters are more likely to quit at that step because they don't have the credentials required to go further), but the customers who do go through the KYC will also default less (because the aforementioned fraudsters will not get the chance to obtain a loan).

I can build a simple A/B test for my customers, where each customer is randomly assigned to go through the KYC process or not.

Building a test only for conversion is rather easy: conversion is a binomial variable with a parameter $\theta$, so I can start by stating my null hypothesis and the one-sided alternative:

$H_0: \theta_{KYC} = \theta_{N}$ ($N$ for no KYC)

$H_1: \theta_{KYC} < \theta_{N}$

Given a sufficient sample size, I can just compute a chi-squared test (or the equivalent two-proportion z-test, since the alternative is one-sided) and (hopefully) get significant results.
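As a rough illustration of the conversion test described above, here is a minimal sketch of a pooled two-proportion z-test for the one-sided alternative $\theta_{KYC} < \theta_{N}$. For a 2x2 table the Pearson chi-squared statistic (without continuity correction) equals $z^2$, so the two approaches agree up to the direction of the alternative. The function name and the counts are hypothetical and chosen only for the example.

```python
import math

def one_sided_two_proportion_z(conv_kyc, n_kyc, conv_no, n_no):
    """Return (z, p) for H0: theta_KYC = theta_N vs H1: theta_KYC < theta_N.

    Uses the pooled-variance normal approximation, so it assumes large
    samples and a pooled conversion rate strictly between 0 and 1.
    """
    p_kyc = conv_kyc / n_kyc
    p_no = conv_no / n_no
    pooled = (conv_kyc + conv_no) / (n_kyc + n_no)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_kyc + 1 / n_no))
    z = (p_kyc - p_no) / se
    # Lower-tail probability Phi(z): small when KYC conversion is clearly lower.
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value

# Hypothetical counts: 400/1000 conversions with KYC, 450/1000 without.
z, p_value = one_sided_two_proportion_z(400, 1000, 450, 1000)
print(f"z = {z:.3f}, one-sided p = {p_value:.4f}")
```

The one-sided p-value is half the two-sided chi-squared p-value whenever the observed difference points in the direction of $H_1$.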

I am also interested in verifying whether customers who go through the KYC default less. Default is also a binomial variable, so if default and conversion were unrelated, I would be happy simply running a second chi-squared test on whether my customers default or not, or looking into a multinomial test, even though I am less familiar with that.

However, I cannot observe default for users who do not convert! More than that, there is probably a strong correlation between the risk of default and the likelihood of conversion when a KYC process is required.

So I am a bit at a loss as to how to estimate the effect of the KYC process on default risk.
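To make the concern about correlated conversion and default concrete, here is a small sketch that computes expected rates for a toy population made of "fraudsters" and "honest" applicants. All parameter values and the function name are assumptions invented for illustration, not estimates from any real data. In this toy setup each group's default probability is the same in both arms, yet the default rate among converted customers differs because KYC changes who converts.

```python
def arm_rates(share_fraud, conv_fraud, conv_honest, def_fraud, def_honest):
    """Expected rates for one arm of the experiment.

    Returns (conversion rate, defaults per randomized applicant,
    default rate among converted customers).
    """
    share_honest = 1 - share_fraud
    conversion = share_fraud * conv_fraud + share_honest * conv_honest
    # "Converted AND defaulted", counted over everyone who was randomized.
    defaults_per_applicant = (share_fraud * conv_fraud * def_fraud
                              + share_honest * conv_honest * def_honest)
    default_given_conversion = defaults_per_applicant / conversion
    return conversion, defaults_per_applicant, default_given_conversion

# Hypothetical parameters: 10% fraudsters; KYC mostly deters fraudsters.
no_kyc = arm_rates(0.10, conv_fraud=0.50, conv_honest=0.50,
                   def_fraud=0.80, def_honest=0.05)
kyc = arm_rates(0.10, conv_fraud=0.10, conv_honest=0.45,
                def_fraud=0.80, def_honest=0.05)

for name, (conv, per_app, given_conv) in (("no KYC", no_kyc), ("KYC", kyc)):
    print(f"{name}: conversion={conv:.3f}, "
          f"defaults/applicant={per_app:.4f}, "
          f"default|converted={given_conv:.4f}")
```

The per-applicant quantity is defined for every randomized user (it is zero for non-converters), whereas the conditional rate compares two differently composed groups of converters.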

Would doing a chi-squared test as if the probability of default were independent of conversion under the KYC process be of any value?
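For reference, here is a sketch of what such a chi-squared test could look like with two different denominators: all randomized applicants (non-converters counted as "no default") and converted customers only. The counts are hypothetical and the helper `chi2_2x2` is written for this example. The first comparison keeps the two randomized groups intact; the second compares only those who converted, so any difference also reflects who was deterred by KYC.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p-value) with 1 degree of freedom. Assumes all
    row and column totals are positive.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-squared variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical data: 1000 applicants randomized to each arm.
kyc_applicants, kyc_converted, kyc_defaults = 1000, 415, 28
no_applicants, no_converted, no_defaults = 1000, 500, 62

# Rows are arms, columns are (default, no default).
stat_app, p_app = chi2_2x2(kyc_defaults, kyc_applicants - kyc_defaults,
                           no_defaults, no_applicants - no_defaults)
stat_conv, p_conv = chi2_2x2(kyc_defaults, kyc_converted - kyc_defaults,
                             no_defaults, no_converted - no_defaults)

print(f"per applicant:  chi2 = {stat_app:.2f}, p = {p_app:.4f}")
print(f"per conversion: chi2 = {stat_conv:.2f}, p = {p_conv:.4f}")
```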

If not, would a more complex model (even one with additional variables such as revenue) help me better estimate the effect that introducing the KYC process has on default?

Take a look at [this post on the sequential logit model](https://stats.stackexchange.com/a/92773/7071) and this one [on the two-part model](https://stats.stackexchange.com/questions/93998/how-can-i-determine-statistical-significance-in-an-a-b-test-in-which-the-kpi-is/94024#94024). — dimitriy, Jan 27 '20 at 17:47
The analysis will depend on what you are trying to determine. If you are trying to determine if defaults are less with the KYC treatment, then comparing the binomial test for default rate between treatment and no treatment is valid. On the other hand if you are comparing profitability where the treatment is higher expense, lower loans but less defaults then that becomes more complicated. — Dave2e, Jan 29 '20 at 16:09
