Logistic regression, separating variable (moderator) true in population!

Question

I have already checked other posts in this area but couldn't find one that fits my issue. I have the following preconditions:

Software: preferred SPSS v21, possibly R
Sample size: 5655 (will grow to around 8 times that in 2 weeks, but I already want to identify statistical pitfalls)
DV: binary; 1 = success, 0 = fail (sales conversion)
IV1: categorical; 5 different conditions - different designs
IV2: categorical; 3 different conditions - different discount (0,5,10)

and the following problem:

DV = 1 has only 36 cases. IV2 is naturally related to DV, since the non-zero conditions of IV2 are only applied when DV = 1, meaning that for all cases with DV = 0, IV2 also equals 0. Conditions of IV1 are randomly assigned to each case. Essentially, I want to find out whether IV2 is a moderator of IV1 -> DV; however, SPSS gives me very strange pseudo-R² values, odds ratios and so on.
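
The structure described above can be seen in a simple cross-tabulation. The following is a minimal Python sketch using made-up counts (not the real data). The names `records` and `separating_levels` are hypothetical and only illustrate the pattern: every DV = 0 case has IV2 = 0, so the discount levels 5 and 10 occur only together with DV = 1.

```python
from collections import Counter

# Hypothetical toy records of (dv, iv2); these are NOT the real data.
# Every failure (dv = 0) has iv2 = 0; successes can have iv2 in {0, 5, 10}.
records = [(0, 0)] * 50 + [(1, 0)] * 3 + [(1, 5)] * 2 + [(1, 10)] * 1

# Cross-tabulation: number of cases per (dv, iv2) cell.
table = Counter(records)


def separating_levels(rows):
    """Return the IV2 levels in which only one DV outcome is ever observed."""
    outcomes_by_level = {}
    for dv, iv2 in rows:
        outcomes_by_level.setdefault(iv2, set()).add(dv)
    return sorted(
        level for level, seen in outcomes_by_level.items() if len(seen) == 1
    )


# Print the 2 x 3 table: one row per IV2 level, failures then successes.
for level in (0, 5, 10):
    print(level, table[(0, level)], table[(1, level)])

print("levels with a single observed outcome:", separating_levels(records))
```

In this toy table the cells (DV = 0, IV2 = 5) and (DV = 0, IV2 = 10) are empty. Levels that contain only successes have no finite maximum-likelihood coefficient, which is the usual signature of quasi-complete separation.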

After googling, I found out it might be a problem of quasi-complete separation. Subsequently, I tried to run a Firth logistic regression in SPSS, which never finished running, and in R, which gave me the following error:

Error in chol.default(x) : the leading minor of order 15 is not positive definite
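
For context, this message is raised by R's `chol` when the matrix it is asked to factor is not positive definite. The sketch below is a generic, hand-written Cholesky factorization in Python that reports the order of the first leading minor at which the factorization breaks down. It is only an illustration of what the message means, not a reproduction of what the R package does internally, and the function name `cholesky_failure_order` and the example matrix are hypothetical. If the model contained both main effects and their interaction with dummy coding (an assumption, since the model formula is not shown), it would have 1 + 4 + 2 + 8 = 15 coefficients, so "order 15" would point at the last of them.

```python
import math


def cholesky_failure_order(a, tol=1e-12):
    """Attempt a Cholesky factorization of the symmetric matrix `a`.

    Returns the (1-based) order of the first leading minor that is not
    positive definite, or None if the whole matrix is positive definite.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal pivot; it must be strictly positive to continue.
        pivot = a[j][j] - sum(lower[j][k] ** 2 for k in range(j))
        if pivot <= tol:
            return j + 1
        lower[j][j] = math.sqrt(pivot)
        # Fill in the rest of column j below the diagonal.
        for i in range(j + 1, n):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = (a[i][j] - partial) / lower[j][j]
    return None


# Hypothetical example: the third row/column is the sum of the first two,
# so the matrix is singular and the factorization fails at order 3.
singular = [
    [2.0, 1.0, 3.0],
    [1.0, 2.0, 3.0],
    [3.0, 3.0, 6.0],
]
identity = [[1.0, 0.0], [0.0, 1.0]]

print(cholesky_failure_order(singular))
print(cholesky_failure_order(identity))
```

A matrix becomes singular in this way when one column carries no independent information, for example a dummy variable or interaction term that is redundant with, or perfectly determined by, the others.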

So I have the following questions:

is my issue solely related to the sample size?
how can I fix the leading minor error?
how can I approach the problem in general (methods)?

Additional info: The different discounts (IV2) are applied to specific sets of products (let's call them set A (0 discount), set B (5 discount) and set C (10 discount)). A visitor arrives and is randomly assigned to one of the categories of IV1, which affects how A, B and C are displayed to the visitor.

The only things tracked are which category of IV1 the visitor was assigned to and which product (from set A, B, C or none of them) was purchased.

I only want to find out the conversion rates of A, B and C, and especially which category of IV1 predicts a higher conversion. On top of that, I assume the discount (IV2) has a moderating effect on the effects of IV1.
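
As a rough illustration of the quantities described above, here is a minimal Python sketch that computes the conversion rate for each product set within each IV1 category. The toy `visits` log, its counts and the function name `conversion_rates` are all hypothetical. The sketch assumes each record holds only the assigned design and the purchased set ("A", "B", "C" or none), as in the tracking described above.

```python
from collections import Counter

# Hypothetical toy log of (design, purchased); these are NOT the real data.
# `purchased` is "A", "B", "C", or None when the visitor bought nothing.
visits = (
    [(1, None)] * 8 + [(1, "A")] * 1 + [(1, "B")] * 1
    + [(2, None)] * 9 + [(2, "C")] * 1
)


def conversion_rates(rows, products=("A", "B", "C")):
    """Return {(design, product): purchases / visitors in that design}."""
    visitors = Counter(design for design, _ in rows)
    purchases = Counter(
        (design, product) for design, product in rows if product is not None
    )
    return {
        (design, product): purchases[(design, product)] / visitors[design]
        for design in sorted(visitors)
        for product in products
    }


rates = conversion_rates(visits)
for (design, product), rate in rates.items():
    print(design, product, rate)
```

Note that this tabulation treats the purchased set as part of the outcome (A, B, C or nothing) rather than as a predictor, since the discount is only known once a purchase has happened.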

I know it's probably not optimal to do it like this, but it was not possible any other way. Maybe my reasoning is incorrect, but I hoped to test the above for a moderation effect.

I ran a bayesglm today, and it did drive down the odds ratios, so that is already a small success, although the results are mostly non-significant, which is probably due to the small number of successes. However, I am not even sure whether bayesglm applies to my situation. Further, I am unsure about the prior means and scales, which I am trying to understand at the moment. Any feedback on how I can approach my situation is welcome, thanks!

This thread may also be of interest. http://stats.stackexchange.com/questions/11109/how-to-deal-with-perfect-separation-in-logistic-regression Note that the entire Hauck-Donner tag deals explicitly with this phenomenon. — Sycorax, Jul 06 '15 at 13:49
Thanks for your answer, but I couldn't really relate the threads you posted to my problem. I am not that advanced in statistics. Could you maybe have a look at my problem description? I can also send a small sample of my data if needed. — Prof_Z, Jul 06 '15 at 16:37
Are the discounts only offered *after* the conversion? If so, how could IV2 possibly be a moderator? If not, how could there possibly be no 5s & 10s among the unconverted? — gung - Reinstate Monica, Jul 07 '15 at 12:03
Sorry, I might not have explained it very well. I have edited in additional information. — Prof_Z, Jul 07 '15 at 21:31
